[ 
https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778654#action_12778654
 ] 

Pradeep Kamath commented on PIG-1062:
-------------------------------------

Review comments:
In SampleLoader.java
====================
Isn't the idea of SampleLoader only to carry common code for RandomSampleLoader 
and PoissonLoader
and add a computeSamples() method? - Looks like now it has the getNext() 
implementation
needed by RandomSampleLoader in it now. Should we move that to 
RandomSampleLoader instead?


{code}
134             System.err.println("Sample " + samples[nextSampleIdx]);
{code}
Debug statement above should be removed.


Why is skipNext() needed? Can't loader.getNext() == null be used instead? If 
so, is recordReader
needed?

In RandomSampleLoader.java
==========================
XXX FIXME comment (put in by me :))should be removed

I think we should move the actual getNext() implementation code from 
SampleLoader to here

In PoissonSampleLoader.java
============================

{code}
 40         // this will be value of first column in the special row   
{code}
I think this is no longer the case - should be removed.


{code}
    58     // memory per sample. divide this by avgTupleMemSize to get 
skipInterval 
     59     private long memPerSample=0;
     60 
{code}
Should the above be called memToSkipPerSample?


{code}
 104         if(skipInterval == -1){
{code}
It doesn't look like skipInterval is initialized to -1


Instead of keeping track of max. num of columns in the different rows and then 
appending the
special marker string and num of rows at the end, would it be better to just 
have these as the
first two fields of the last tuple emitted and then introduce a split-union 
combination to 
ensure that the foreach pipeline gets the regular tuples (excluding the special 
tuple)?



> load-store-redesign branch: change SampleLoader and subclasses to work with 
> new LoadFunc interface 
> ---------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1062
>                 URL: https://issues.apache.org/jira/browse/PIG-1062
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>         Attachments: PIG-1062.patch, PIG-1062.patch.3
>
>
> This is part of the effort to implement new load store interfaces as laid out 
> in http://wiki.apache.org/pig/LoadStoreRedesignProposal .
> PigStorage and BinStorage are now working.
> SampleLoader and subclasses -RandomSampleLoader, PoissonSampleLoader need to 
> be changed to work with new LoadFunc interface.  
> Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
> PoissonSampleLoader is used by skew join. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to