[
https://issues.apache.org/jira/browse/PIG-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583232#action_12583232
]
Shravan Matthur Narayanamurthy commented on PIG-162:
----------------------------------------------------
What would be the best way to implement the Split operator? The problem
with implementing it as an operator is the buffering it requires. Since we
are following the single-threaded model, a blocking getNext() by, say, a filter
operator might actually read all the tuples from the split, which may well
be on the reduce side. Since the other branch of the split will execute only
after the filter, there is no option but to buffer all the tuples.
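To make the buffering problem concrete, here is a minimal sketch. PullSplit and
its method names are hypothetical and are not part of the Pig code base; it only
assumes a single-threaded, pull-based split with two readers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical stand-in for a split feeding two pull-based branches. */
class PullSplit<T> {
    private final Iterator<T> input;
    private final Deque<T> left = new ArrayDeque<>();   // tuples owed to the left branch
    private final Deque<T> right = new ArrayDeque<>();  // tuples owed to the right branch

    PullSplit(Iterator<T> input) {
        this.input = input;
    }

    /** Next tuple for the left branch, or null once the input is exhausted. */
    T nextLeft() {
        return next(left, right);
    }

    /** Next tuple for the right branch, or null once the input is exhausted. */
    T nextRight() {
        return next(right, left);
    }

    /** Number of tuples currently buffered for the right branch. */
    int pendingRight() {
        return right.size();
    }

    private T next(Deque<T> mine, Deque<T> other) {
        if (!mine.isEmpty()) {
            return mine.poll();
        }
        if (!input.hasNext()) {
            return null;
        }
        T t = input.next();
        other.add(t); // the sibling branch has not seen t yet, so it must be kept
        return t;
    }
}
```

If the left branch is drained first, as a blocking filter would do, every input
tuple ends up queued for the right branch before that branch reads anything.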
One way would be to replicate the pipeline during the logical to physical
translation.
Another would be to construct a databag explicitly inside the Split and store
all tuples from its input in the bag, then attach the bag's iterator to the
split readers. But this doesn't sound very efficient to me.
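A rough sketch of the bag approach, assuming an in-memory list stands in for the
databag. BagSplit and newReader() are hypothetical names used only for
illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical split that materializes its whole input before any reader runs. */
class BagSplit<T> {
    // Stand-in for the databag; a real bag might spill to disk.
    private final List<T> bag = new ArrayList<>();

    BagSplit(Iterator<T> input) {
        while (input.hasNext()) {
            bag.add(input.next());
        }
    }

    /** Each split reader gets its own read-only iterator over the same bag. */
    Iterator<T> newReader() {
        return Collections.unmodifiableList(bag).iterator();
    }
}
```

Every reader sees the full input independently of the others, but the cost is
that the whole input is stored once before the first tuple is handed out.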
Another would be to handle the split differently in the map and reduce
phases. On the map side, we can follow the above approach of using a bag, since
the amount of data is restricted. On the reduce side, since we will have only
one package, we can use plan folding. That is, make the plan that the split
operator feeds an attribute plan of the split. getNext() on the split will read
a tuple, attach it to the attribute plan, and return whatever the plan's
root operator's getNext() returns. The folded plan can be implemented as on the
map side.
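Plan folding could look roughly like the sketch below. FoldedSplit is a
hypothetical name, and the attribute plan is modeled as a plain function from
one input tuple to an optional result. Skipping tuples for which the plan
returns nothing is my assumption, not something decided above.

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical split that owns the downstream plan instead of feeding it. */
class FoldedSplit<T, R> {
    private final Iterator<T> input;
    // Stand-in for the attribute plan: attach one tuple, pull from the plan's root.
    private final Function<T, Optional<R>> foldedPlan;

    FoldedSplit(Iterator<T> input, Function<T, Optional<R>> foldedPlan) {
        this.input = input;
        this.foldedPlan = foldedPlan;
    }

    /** Returns the next result of the folded plan, or empty at end of input. */
    Optional<R> getNext() {
        while (input.hasNext()) {
            Optional<R> out = foldedPlan.apply(input.next());
            if (out.isPresent()) {
                return out;
            }
            // Assumption: a tuple the plan drops (e.g. filtered out) is skipped.
        }
        return Optional.empty();
    }
}
```

Since each tuple goes through the folded plan as soon as it is read, the split
itself keeps no tuples around between calls to getNext().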
Any suggestions?
> Rework mapreduce submission and monitoring
> ------------------------------------------
>
> Key: PIG-162
> URL: https://issues.apache.org/jira/browse/PIG-162
> Project: Pig
> Issue Type: Sub-task
> Environment: This bug tracks work to rework the submission and
> monitoring interface to map reduce as described in
> http://wiki.apache.org/pig/PigTypesFunctionalSpec
> Reporter: Alan Gates
> Assignee: Alan Gates
>