[ https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779573#action_12779573 ]
Thejas M Nair commented on PIG-1062:
------------------------------------
Instead of adding the num-rows information as a separate, special last tuple,
I am changing this so that it is part of the last tuple, appended to its end
(as a special marker column followed by a num-rows column).
{quote}
Instead of keeping track of max. num of columns in the different rows and then
appending the special marker string and num of rows at the end, would it be
better to just have these as the first two fields of the last tuple emitted and
then introduce a split-union combination to ensure that the foreach pipeline
gets the regular tuples (excluding the special tuple)?
{quote}
In the implementation in my upcoming patch, the foreach pipeline that evaluates
the join expression (in the map of the sampling MR job) gets regular tuples,
except in the case of the last tuple. This is safer than the existing
implementation in trunk, where every tuple has a disk-size column appended to
it. The split-union approach proposed above helps the special tuple bypass the
foreach, but getting it around the sort in the reduce stage (of the sampling MR
job) would involve a lot more changes (if the special tuple has the marker and
num-rows as its first two columns).
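
To make the idea concrete, here is a minimal sketch of appending the marker and
num-rows columns to the end of the last tuple and recognizing them again
downstream. It is only an illustration, not the code in the patch: tuples are
modeled as plain List<Object> rather than Pig's Tuple, and the class name, the
method names and the marker value are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in only: a tuple is modeled as List<Object>, not Pig's Tuple. */
final class RowCountMarkerSketch {
    // Hypothetical marker value; the marker string used by the patch is not stated here.
    static final String MARKER = "__hypothetical_num_rows_marker__";

    private RowCountMarkerSketch() {}

    /** Returns a copy of the last sample tuple with the marker and row count appended. */
    static List<Object> appendRowCount(List<Object> lastTuple, long numRows) {
        List<Object> out = new ArrayList<>(lastTuple);
        out.add(MARKER);   // second-to-last column: special marker
        out.add(numRows);  // last column: number of rows (boxed as Long)
        return out;
    }

    /** True if the final two fields are the marker followed by a Long row count. */
    static boolean hasRowCount(List<Object> tuple) {
        int n = tuple.size();
        return n >= 2
                && MARKER.equals(tuple.get(n - 2))
                && tuple.get(n - 1) instanceof Long;
    }

    /** Reads the row count from a marked tuple. */
    static long rowCount(List<Object> tuple) {
        if (!hasRowCount(tuple)) {
            throw new IllegalArgumentException("tuple carries no row-count marker");
        }
        return (Long) tuple.get(tuple.size() - 1);
    }

    /** Returns the tuple without the two trailing columns; unmarked tuples pass through. */
    static List<Object> stripRowCount(List<Object> tuple) {
        return hasRowCount(tuple)
                ? new ArrayList<>(tuple.subList(0, tuple.size() - 2))
                : tuple;
    }
}
```

Because the two extra columns go at the end, the leading columns of the last
tuple stay where the join expression and the sort expect them, which is the
reason for preferring this over putting them in the first two positions.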
> load-store-redesign branch: change SampleLoader and subclasses to work with
> new LoadFunc interface
> ---------------------------------------------------------------------------------------------------
>
> Key: PIG-1062
> URL: https://issues.apache.org/jira/browse/PIG-1062
> Project: Pig
> Issue Type: Sub-task
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Attachments: PIG-1062.patch, PIG-1062.patch.3
>
>
> This is part of the effort to implement new load store interfaces as laid out
> in http://wiki.apache.org/pig/LoadStoreRedesignProposal .
> PigStorage and BinStorage are now working.
> SampleLoader and its subclasses (RandomSampleLoader, PoissonSampleLoader) need
> to be changed to work with the new LoadFunc interface.
> Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
> PoissonSampleLoader is used by skew join.