Hi folks,

While debugging the DAG generated by a Scalding / Cascading job, I noticed
that in Tez we end up with two input vertices, one for each input path.
With Hadoop, by contrast, a single map phase reads from both input
datasets. Is reading multiple input paths in one vertex supported in Tez?

I noticed that Cascading currently uses MRInput
<https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/input/MRInput.java>
to set up its Tez inputs. I wasn't sure whether we could use MultiMRInput
<https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/input/MultiMRInput.java>
to read from multiple input directories in the same vertex, or whether it
serves a different purpose. If we can use it, is it safe for public
consumption? (I noticed it is still annotated with @Evolving.)

Thanks,

-- 
- Piyush
