Daniel Dai commented on PIG-890:

1. Can you include a unit test?
2. In PoissonSampleLoader.java:
                try {
                        numSplits = 
                } catch (NumberFormatException e) {
                        numSplits = 1;
We should throw an exception here rather than continue with a default.
The same applies to:
                try {
                        float f = (Runtime.getRuntime().maxMemory() * heapPerc)
                                / (float) (FileLocalizer.getSize(fname) * convFactor);
                        baseNumSamples = (long) Math.ceil(1.0 / f);
                } catch (IOException e) {
                        baseNumSamples = 1; // default value 
3. Are PoissonSampleLoader.next and PoissonSampleLoader.bindTo the same as in
RandomSampleLoader? If so, we should move them into a base class rather than
copying them.
4. For DEFAULT_SAMPLE_RATE, can you list a few other values in the comment,
for example for confidence levels of 90% and 85%, and add a link that explains
how these magic numbers are derived? I know they come from the Poisson CDF,
but it is better to have something we can check quickly.
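
For item 4, a quick check could look something like the sketch below. It is
not part of the patch: the class name PoissonCheck, its methods, and the mean
of 10 are placeholders chosen for illustration, and the real mean and
confidence behind DEFAULT_SAMPLE_RATE should be taken from the patch itself.
The sketch finds the smallest k such that P(X <= k) >= confidence for a
Poisson-distributed X.

```java
// Hypothetical helper for checking Poisson CDF values by hand.
// Not part of the PIG-890 patch; all names here are illustrative.
final class PoissonCheck {

    /** Returns P(X <= k) for X ~ Poisson(mean), summed term by term. */
    static double cdf(int k, double mean) {
        double term = Math.exp(-mean); // P(X = 0)
        double sum = term;
        for (int i = 1; i <= k; i++) {
            term *= mean / i;          // P(X = i) derived from P(X = i - 1)
            sum += term;
        }
        return sum;
    }

    /**
     * Returns the smallest k with P(X <= k) >= confidence.
     * Adequate for small means only: Math.exp(-mean) underflows to zero
     * once the mean is in the high hundreds.
     */
    static int quantile(double mean, double confidence) {
        if (mean <= 0 || confidence <= 0 || confidence >= 1) {
            throw new IllegalArgumentException("need mean > 0 and 0 < confidence < 1");
        }
        int k = 0;
        double term = Math.exp(-mean);
        double sum = term;
        // Stop as well if the terms underflow, so the loop always terminates.
        while (sum < confidence && term > 0) {
            k++;
            term *= mean / k;
            sum += term;
        }
        return k;
    }

    public static void main(String[] args) {
        double mean = 10.0; // placeholder value, not taken from the patch
        for (double confidence : new double[] {0.85, 0.90, 0.95}) {
            System.out.printf("confidence %.2f -> k = %d%n",
                    confidence, quantile(mean, confidence));
        }
    }
}
```

Running main prints one line per confidence level, which could be pasted into
the comment next to DEFAULT_SAMPLE_RATE once the actual mean is filled in.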

> Create a sampler interface and improve the skewed join sampler
> --------------------------------------------------------------
>                 Key: PIG-890
>                 URL: https://issues.apache.org/jira/browse/PIG-890
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Sriranjan Manjunath
>         Attachments: sampler.patch
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler
