On 1/20/15, 3:45 PM, David Pollak wrote:

> In Tez, is there a concept of shipping data back to the machine (likely not
> part of the Hadoop cluster) that spawned the Tez job?

The standard practice is to write to an HDFS directory and read the data back from there, instead of opening up ports between the containers and the client.

That's really an infrastructure workaround for secure clusters that don't have permissive firewalls.

The added plus is that the Tez containers can be preempted without worrying about data loss.
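To make that concrete, here is a minimal client-side sketch of the read-back half of that pattern. The output directory, the "part-" file naming and the plain-text record format are illustrative assumptions, not anything Tez mandates:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}
  import scala.io.Source

  object ReadBack {
    // Hypothetical helper: once the DAG has finished, the client pulls
    // the results out of the agreed-upon HDFS output directory instead
    // of listening on a port for the containers to connect back to it.
    def readJobOutput(outputDir: String): Seq[String] = {
      val fs = FileSystem.get(new Configuration())
      fs.listStatus(new Path(outputDir))
        .filter(_.getPath.getName.startsWith("part-")) // skip _SUCCESS etc.
        .toSeq
        .flatMap { status =>
          val in = fs.open(status.getPath)
          try Source.fromInputStream(in).getLines().toList
          finally in.close()
        }
    }
  }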

> (like a Spark foreach). I'm looking for the same thing, except when the job
> is run via Yarn.

Tez works at a slightly different layer than the data formats as such, but the following code should give you a good idea of what happens when a Driver-side call to .collect() is translated into the Tez execution context.

https://github.com/hortonworks/spark-native-yarn/blob/master/src/main/scala/org/apache/spark/tez/TezJobExecutionContext.scala#L178

.forEach() is similar, almost literally a loop over a collection.
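As a rough sketch only (reusing the hypothetical readJobOutput helper above, not the actual TezJobExecutionContext code), the shape on the client ends up being:

  // "collect" == read the finished DAG's output back from HDFS ...
  val collected: Seq[String] = ReadBack.readJobOutput("/tmp/tez-job-output")

  // ... and "foreach" is then literally a loop over that collection,
  // running on the client machine that spawned the job.
  collected.foreach(println)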

Cheers,
Gopal
