Thejas M Nair commented on PIG-1062:

In SampleLoader.java
Isn't the idea of SampleLoader only to carry common code for RandomSampleLoader 
and PoissonSampleLoader, and to add a computeSamples() method? It looks like it 
now also contains the getNext() needed by RandomSampleLoader. Should we move 
that to RandomSampleLoader instead? 
RandomSampleLoader.getNext() is fairly generic; it can be used by any new 
sample loader class where the number of samples to take in each map is 
known in advance. So having this getNext() implementation in SampleLoader can 
be useful in the future.
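
As a rough illustration of that point, here is a minimal sketch of a shared 
getNext() living in the base class, with a subclass that only supplies the 
sample count. Everything in it is an assumption made for illustration: 
SimpleLoader, SketchSampleLoader, FixedCountSampleLoader, ListLoader and the 
fixed skipInterval are hypothetical names, and none of this is the code in 
the patch or the real LoadFunc interface.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the wrapped loader; NOT the real Pig LoadFunc API.
interface SimpleLoader {
    String getNext(); // next record, or null at end of input
}

// Shared base class: getNext() lives here, subclasses only choose the sample count.
abstract class SketchSampleLoader implements SimpleLoader {
    private final SimpleLoader wrapped;
    private final int skipInterval; // records dropped between samples (assumed fixed)
    protected int numSamples;       // set by computeSamples()
    private int emitted = 0;
    private boolean initialized = false;

    SketchSampleLoader(SimpleLoader wrapped, int skipInterval) {
        this.wrapped = wrapped;
        this.skipInterval = skipInterval;
    }

    // Subclass hook: decide how many samples this map should return.
    protected abstract void computeSamples();

    @Override
    public String getNext() {
        if (!initialized) {
            computeSamples();
            initialized = true;
        }
        if (emitted >= numSamples) {
            return null; // quota reached
        }
        if (emitted > 0) {
            // Drop skipInterval records between consecutive samples.
            for (int i = 0; i < skipInterval; i++) {
                if (wrapped.getNext() == null) {
                    return null; // input exhausted while skipping
                }
            }
        }
        String record = wrapped.getNext();
        if (record != null) {
            emitted++;
        }
        return record;
    }
}

// A subclass that knows its sample count in advance reuses getNext() unchanged.
class FixedCountSampleLoader extends SketchSampleLoader {
    private final int fixedCount;

    FixedCountSampleLoader(SimpleLoader wrapped, int skipInterval, int fixedCount) {
        super(wrapped, skipInterval);
        this.fixedCount = fixedCount;
    }

    @Override
    protected void computeSamples() {
        numSamples = fixedCount;
    }
}

// In-memory loader used only to exercise the sketch.
class ListLoader implements SimpleLoader {
    private final Iterator<String> it;

    ListLoader(List<String> records) {
        this.it = records.iterator();
    }

    @Override
    public String getNext() {
        return it.hasNext() ? it.next() : null;
    }
}
```

In this sketch, any new sampler whose count is known up front would only need 
to override computeSamples().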

Why is skipNext() needed? Can't loader.getNext() == null be used instead? If 
so, is recordReader
skipNext() calls recordReader.getNext(), which does not parse the record into a 
tuple, unlike loader.getNext(). This way records can be more efficiently skipped.
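
To make the difference concrete, here is a small sketch under stated 
assumptions: RawReader and ParsingLoader are hypothetical classes, nextRaw() 
stands in for the record reader call mentioned above, and the comma split 
stands in for tuple parsing. It is not the patch code; it only shows that 
skipping through the raw reader avoids the parse step.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical raw reader; a stand-in for the record reader, whose real API differs.
class RawReader {
    private final Iterator<String> lines;

    RawReader(List<String> lines) {
        this.lines = lines.iterator();
    }

    // Returns the next unparsed record, or null at end of input.
    String nextRaw() {
        return lines.hasNext() ? lines.next() : null;
    }
}

// Hypothetical loader that parses each raw record into fields.
class ParsingLoader {
    private final RawReader reader;
    int parseCount = 0; // number of records actually parsed

    ParsingLoader(RawReader reader) {
        this.reader = reader;
    }

    // Reads AND parses the next record; null at end of input.
    String[] getNext() {
        String raw = reader.nextRaw();
        if (raw == null) {
            return null;
        }
        parseCount++;
        return raw.split(","); // stand-in for building a tuple
    }

    // Advances past one record without parsing it; false at end of input.
    boolean skipNext() {
        return reader.nextRaw() != null;
    }
}
```

Calling skipNext() twice and then getNext() once on a four-record input parses 
only the third record, whereas three getNext() calls would parse all three.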

I will create a new patch addressing other comments.

> load-store-redesign branch: change SampleLoader and subclasses to work with 
> new LoadFunc interface 
> ---------------------------------------------------------------------------------------------------
>                 Key: PIG-1062
>                 URL: https://issues.apache.org/jira/browse/PIG-1062
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>         Attachments: PIG-1062.patch, PIG-1062.patch.3
> This is part of the effort to implement new load store interfaces as laid out 
> in http://wiki.apache.org/pig/LoadStoreRedesignProposal .
> PigStorage and BinStorage are now working.
> SampleLoader and its subclasses (RandomSampleLoader, PoissonSampleLoader) need 
> to be changed to work with the new LoadFunc interface.
> Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
> PoissonSampleLoader is used by skew join. 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
