[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745830#action_12745830 ]

Daniel Dai commented on PIG-890:
--------------------------------

Comments:

1. Can you include a unit test?

2. PoissonSampleLoader.java
{noformat}
try {
    numSplits = Integer.valueOf(pcProps.getProperty(MAPSPLITS_COUNT));
} catch (NumberFormatException e) {
    numSplits = 1;
}
{noformat}
We should throw an exception here rather than continue with a default value. The same applies to:
{noformat}
try {
    float f = (Runtime.getRuntime().maxMemory() * heapPerc) / (float) (FileLocalizer.getSize(fname) * convFactor);
    baseNumSamples = (long) Math.ceil(1.0 / f);
} catch (IOException e) {
    baseNumSamples = 1; // default value
}
{noformat}

3. Are PoissonSampleLoader.next and PoissonSampleLoader.bindTo the same as their counterparts in RandomSampleLoader? If so, we should put them in a base class rather than copying them.

4. For DEFAULT_SAMPLE_RATE, can you give the values for some other confidence levels in the comment, such as 90% and 85%, and also add a link explaining how to derive these magic numbers? I know this comes from the Poisson CDF, but it is better to have something we can check quickly.

> Create a sampler interface and improve the skewed join sampler
> --------------------------------------------------------------
>
>                 Key: PIG-890
>                 URL: https://issues.apache.org/jira/browse/PIG-890
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Sriranjan Manjunath
>         Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a
> better sampling interface. The design of the same is described here:
> http://wiki.apache.org/pig/PigSampler

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
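
For reference on comment 2, here is a minimal sketch of what failing fast could look like for the split count. It is not taken from sampler.patch: the class name SplitCountParser, the property key string, and the choice of IOException as the exception type are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.Properties;

// Hypothetical helper, not part of the patch under review.
final class SplitCountParser {

    // Assumed key for illustration; the real MAPSPLITS_COUNT value lives in the patch.
    static final String MAPSPLITS_COUNT = "example.map.splits.count";

    // Returns the configured split count, or throws instead of silently using 1.
    static int parseSplitCount(Properties props) throws IOException {
        String raw = props.getProperty(MAPSPLITS_COUNT);
        if (raw == null) {
            throw new IOException("Missing property " + MAPSPLITS_COUNT);
        }
        try {
            int n = Integer.parseInt(raw.trim());
            if (n <= 0) {
                throw new IOException(
                        "Property " + MAPSPLITS_COUNT + " must be positive, got " + n);
            }
            return n;
        } catch (NumberFormatException e) {
            // Surface the bad configuration and keep the original cause attached.
            throw new IOException(
                    "Invalid value for " + MAPSPLITS_COUNT + ": '" + raw + "'", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty(MAPSPLITS_COUNT, "4");
        System.out.println(parseSplitCount(props));
    }
}
```

The same pattern would apply to the baseNumSamples computation: let the IOException from the file size lookup propagate (or wrap it with context) rather than falling back to 1, so that a misconfigured job fails visibly instead of sampling with a wrong rate.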