[ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745830#action_12745830
 ] 

Daniel Dai commented on PIG-890:
--------------------------------

Comments:
1. Can you include a unit test?
2. In PoissonSampleLoader.java:
{noformat}
                try {
                        numSplits = Integer.valueOf(pcProps.getProperty(MAPSPLITS_COUNT));
                } catch (NumberFormatException e) {
                        numSplits = 1;
                }
{noformat}
We should throw an exception here rather than silently continue with a
default value. The same applies to:
{noformat}
                try {
                        float f = (Runtime.getRuntime().maxMemory() * heapPerc) / (float) (FileLocalizer.getSize(fname) * convFactor);
                        baseNumSamples = (long) Math.ceil(1.0 / f);
                } catch (IOException e) {
                        baseNumSamples = 1; // default value 
                }
{noformat}
3. Are PoissonSampleLoader.next and PoissonSampleLoader.bindTo the same as in
RandomSampleLoader? If so, we should move them into a base class rather than
copy them.
4. For DEFAULT_SAMPLE_RATE, can you list the values for some other confidence
levels in the comment, such as 90% and 85%, and add a link explaining how
these magic numbers are derived? I know this comes from the Poisson CDF, but
it is better to have something we can check quickly.
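
To illustrate the kind of quick check meant in point 4, here is a minimal sketch that tabulates Poisson quantiles for a few confidence levels. The class name PoissonTable and the mean of 10 are hypothetical and chosen only for illustration. Whether DEFAULT_SAMPLE_RATE in the patch is defined as exactly this quantile is an assumption, so treat the output as a way to sanity-check the comment, not as the patch's definition.

```java
import java.util.Locale;

// Hypothetical helper, not part of the patch: tabulates Poisson quantiles.
final class PoissonTable {

    /** Returns P(X <= k) for X ~ Poisson(mean), summing the pmf term by term. */
    static double cdf(int k, double mean) {
        double term = Math.exp(-mean); // P(X = 0)
        double sum = term;
        for (int i = 1; i <= k; i++) {
            term *= mean / i;          // P(X = i) derived from P(X = i - 1)
            sum += term;
        }
        return sum;
    }

    /** Returns the smallest k such that P(X <= k) >= confidence (confidence < 1). */
    static int quantile(double mean, double confidence) {
        int k = 0;
        while (cdf(k, mean) < confidence) {
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        double assumedMean = 10.0; // assumption for illustration only
        for (double confidence : new double[] {0.85, 0.90, 0.95}) {
            System.out.printf(Locale.ROOT, "confidence %.2f -> k = %d%n",
                    confidence, quantile(assumedMean, confidence));
        }
    }
}
```

A small table like the one this prints, placed in the comment next to the constant together with the link, would let a reader verify the number without redoing the derivation.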

> Create a sampler interface and improve the skewed join sampler
> --------------------------------------------------------------
>
>                 Key: PIG-890
>                 URL: https://issues.apache.org/jira/browse/PIG-890
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Sriranjan Manjunath
>         Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
