[ https://issues.apache.org/jira/browse/PIG-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789069#action_12789069 ]
Thejas M Nair commented on PIG-1143:
------------------------------------

If the data is going to be in BinStorage, my comments regarding the approach for this patch are not applicable. But the patch does not need to be ported to the load-store redesign branch.

> Poisson Sample Loader should compute the number of samples required only once
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1143
>                 URL: https://issues.apache.org/jira/browse/PIG-1143
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Sriranjan Manjunath
>            Assignee: Sriranjan Manjunath
>
> The current Poisson sampler forces each of the maps to compute the sample
> number. This is redundant and causes issues when a large directory is
> specified in the join. The sampler should be changed to calculate the sample
> count only once and this information should be shared with the remaining
> mappers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.