RE: Guide to write map-reduce code using Tez API

Bikas Saha Mon, 01 Aug 2016 14:32:02 -0700

Hi Sudheer,

It would be good to understand what use cases you are trying to solve.

The typical and easiest way to use Tez is through Hive or Pig, with Tez turned 
on as the execution engine for those tools.

But if you want to hand-code your jobs because of specific optimizations in 
your business logic, then there is a case for using the MR APIs. The question 
then becomes which of the following you want to do:

1) Run existing MR jobs using Tez. This can be done with the steps mentioned 
below and may give you a 0-20% perf improvement from generic container reuse 
and other optimizations.
2) Refactor MR job code to use Tez to create longer jobs. If your overall 
business job consists of a DAG of MR jobs, then you could get significant perf 
gains by changing your M1->R1->HDFS->M2->R2->HDFS->M3->R3->HDFS chain to a 
single M1->R1+M2->R2+M3->R3->HDFS chain. This requires refactoring your code 
to wrap your existing MR code in TezProcessors. Since the Tez runtime library 
component for shuffling intermediate data is compatible with the MR shuffle, 
almost all your MR code could be reused with minimal changes.

Bikas

-----Original Message-----
From: zhiyuan yang [mailto:[email protected]] 
Sent: Monday, August 1, 2016 10:26 AM
To: [email protected]
Subject: Re: Guide to write map-reduce code using Tez API

The nice thing about Tez is that it’s compatible with the MapReduce API. So if 
you just want to run MapReduce on Tez, you only need to write standard 
MapReduce and change the execution engine to Tez.

To change the execution engine of MapReduce, change the configuration property 
mapreduce.framework.name. (Not 100% sure about this, correct me if I’m wrong.)
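
As a rough sketch of that configuration change: the property would normally go 
in mapred-site.xml. The value yarn-tez is not stated in this thread; it is an 
assumption taken from the Tez install guide, and it also assumes the Tez 
libraries are already deployed and visible to the cluster (for example via 
tez.lib.uris in tez-site.xml).

```xml
<!-- mapred-site.xml: ask MapReduce to submit jobs through Tez -->
<property>
  <name>mapreduce.framework.name</name>
  <!-- Assumed value from the Tez install guide; the stock Hadoop value is "yarn" -->
  <value>yarn-tez</value>
</property>
```

If the job's driver parses generic Hadoop options (for example through 
ToolRunner), the same property can probably be set for a single run with 
-Dmapreduce.framework.name=yarn-tez instead of editing the cluster-wide file.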

Thanks!
Zhiyuan

> On Aug 1, 2016, at 10:19 AM, Sudhir.Kumar <[email protected]> wrote:
> 
> Hello All,
>  
> I have just started to read about Tez.
>  
> Is there a document that explains the Tez Java APIs which can be used to 
> write map-reduce code?
>  
> Thanks,
>  
> Sudhir
