[ https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778654#action_12778654 ]
Pradeep Kamath commented on PIG-1062: ------------------------------------- Review comments: In SampleLoader.java ==================== Isn't the idea of SampleLoader only to carry common code for RandomSampleLoader and PoissonLoader and add a computeSamples() method? - Looks like now it has the getNext() implementation needed by RandomSampleLoader in it now. Should we move that to RandomSampleLoader instead? {code} 134 System.err.println("Sample " + samples[nextSampleIdx]); {code} Debug statement above should be removed. Why is skipNext() needed? Can't loader.getNext() == null be used instead? If so, is recordReader needed? In RandomSampleLoader.java ========================== XXX FIXME comment (put in by me :))should be removed I think we should move the actual getNext() implementation code from SampleLoader to here In PoissonSampleLoader.java ============================ {code} 40 // this will be value of first column in the special row {code} I think this is no longer the case - should be removed. {code} 58 // memory per sample. divide this by avgTupleMemSize to get skipInterval 59 private long memPerSample=0; 60 {code} Should the above be called memToSkipPerSample? {code} 104 if(skipInterval == -1){ {code} It doesn't look like skipInterval is initialized to -1 Instead of keeping track of max. num of columns in the different rows and then appending the special marker string and num of rows at the end, would it be better to just have these as the first two fields of the last tuple emitted and then introduce a split-union combination to ensure that the foreach pipeline gets the regular tuples (excluding the special tuple)? > load-store-redesign branch: change SampleLoader and subclasses to work with > new LoadFunc interface > --------------------------------------------------------------------------------------------------- > > Key: PIG-1062 > URL: https://issues.apache.org/jira/browse/PIG-1062 > Project: Pig > Issue Type: Sub-task > Reporter: Thejas M Nair > Assignee: Thejas M Nair > Attachments: PIG-1062.patch, PIG-1062.patch.3 > > > This is part of the effort to implement new load store interfaces as laid out > in http://wiki.apache.org/pig/LoadStoreRedesignProposal . > PigStorage and BinStorage are now working. > SampleLoader and subclasses -RandomSampleLoader, PoissonSampleLoader need to > be changed to work with new LoadFunc interface. > Fixing SampleLoader and RandomSampleLoader will get order-by queries working. > PoissonSampleLoader is used by skew join. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.