This is with respect to how work is assigned to a Task. For a shuffle edge, a Task's input is determined by the partitions and by how those partitions are assigned to tasks. For a vertex reading data from HDFS (an initial input), the assignment is effectively random: the input data is split up and the splits are then assigned to tasks.
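To illustrate why the shuffle edge side is deterministic: a hash partitioner routes every record with the same key to the same downstream partition, no matter which upstream task emitted it. The sketch below mirrors the usual Hadoop/Tez-style hash-partitioning logic; the class and method names are illustrative, not part of the Tez API.

```java
// Illustrative sketch (not Tez API): deterministic key-to-partition routing
// as done by a typical hash partitioner on a shuffle edge.
public class HashPartitionSketch {

    // Every upstream task computes the same partition for the same key,
    // so the data a downstream task receives is fully determined by its
    // partition index.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numDownstreamTasks = 4;
        // The same key always lands on the same downstream task.
        int p1 = partitionFor("user-42", numDownstreamTasks);
        int p2 = partitionFor("user-42", numDownstreamTasks);
        System.out.println(p1 == p2); // deterministic routing
    }
}
```

By contrast, there is no such key-based rule for HDFS splits: the framework is free to hand any split to any task, which is why combining the two inputs needs explicit coordination.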
When trying to combine the data, the user would need to write a custom vertex manager to correctly assign data from the initial input and the shuffle edge in a deterministic manner (along with any other user-specific conditions, such as performing a "join") for the processing to be done correctly. I believe Hive has a couple of cases where this is implemented. You should ask on the dev@hive list for more details.

— Hitesh

On May 18, 2015, at 9:00 AM, Oleg Zhurakousky <[email protected]> wrote:

> Also, while trying something related to this I've noticed the following: "A
> vertex with an Initial Input and a Shuffle Input are not supported at the
> moment".
> Is there a target timeframe for this? JIRA?
>
> Thanks
> Oleg
>
>> On May 18, 2015, at 10:27 AM, Oleg Zhurakousky
>> <[email protected]> wrote:
>>
>> Is it possible to allow a Tez processor implementation which has multiple
>> inputs to become available as soon as at least one input is available to be
>> read?
>> This could allow some computation to begin while waiting for other
>> inputs. Other inputs could (if logic allows) be processed as they become
>> available.
>>
>>
>> Thanks
>> Oleg
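To make the custom vertex manager idea concrete, here is a hedged, self-contained sketch of the kind of deterministic placement decision such a manager would have to compute when a vertex has both an initial (HDFS) input and a shuffle input. All names here are hypothetical, not Tez API; a real implementation would extend org.apache.tez.dag.api.VertexManagerPlugin and assign splits to tasks when the root input is initialized. The assumed convention that a split's name ends with its bucket id (e.g. "part-3") is purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Tez API): deterministically co-locate HDFS
// splits with shuffle partitions so that, e.g., a bucketed join sees
// matching data on both inputs of the same task.
public class DeterministicAssignmentSketch {

    // One task per shuffle partition; each HDFS split is routed to the
    // task whose partition index matches the split's bucket, so the
    // mapping is the same on every run of the same input.
    static List<List<String>> assignSplits(List<String> splitBuckets, int numPartitions) {
        List<List<String>> perTask = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            perTask.add(new ArrayList<>());
        }
        for (String split : splitBuckets) {
            // Assumed naming convention: split name ends with its bucket
            // id, e.g. "part-3" -> bucket 3.
            int bucket = Integer.parseInt(split.substring(split.lastIndexOf('-') + 1));
            perTask.get(bucket % numPartitions).add(split);
        }
        return perTask;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("part-0", "part-1", "part-2", "part-3");
        // Buckets 0,2 -> task 0; buckets 1,3 -> task 1.
        System.out.println(assignSplits(splits, 2));
        // [[part-0, part-2], [part-1, part-3]]
    }
}
```

The point of the sketch is only that the split-to-task mapping is a pure function of the input, which is the determinism property Hitesh describes above; the real vertex manager would additionally react to events (source task completions, root input initialization) via the VertexManagerPlugin callbacks.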
