Hi Sudheer,
It would be good to understand what use cases you are trying to solve.
The typical and easiest way of using Tez is via Hive and Pig, by turning Tez on
as the execution engine for those tools.
But if you want to hand-code your jobs because of specific optimizations in
your business logic, then there is a case for using the MR APIs directly. The
question then becomes one of the following:
1) Run existing MR jobs on Tez - this can be done with the steps mentioned
below and may give you a 0-20% perf improvement due to generic container reuse
and other optimizations.
2) Refactor MR job code to use Tez to create longer jobs - if your overall
business job consists of a DAG of MR jobs, then you could get significant perf
gains by changing your M1->R1->HDFS->M2->R2->HDFS->M3->R3->HDFS chain into a
single M1->R1+M2->R2+M3->R3->HDFS chain. This requires you to refactor your
code to wrap your existing MR code in TezProcessors. Since the Tez runtime
library component for shuffling intermediate data is compatible with the MR
shuffle, almost all of your MR code can be reused with minimal changes.
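To make option 2 concrete, a minimal sketch of the DAG wiring with the Tez Java
API might look like the snippet below. Note the assumptions: MyMapProcessor,
MyReduceMap1Processor, MyReduceMap2Processor, and MyReduceProcessor are
hypothetical wrappers you would write around your existing MR logic, the
key/value types are placeholders for your job's actual types, and tez-api plus
tez-runtime-library must be on the classpath. This is a sketch, not a complete
driver (it omits TezClient setup, data sources, and sinks).

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.Edge;
import org.apache.tez.dag.api.ProcessorDescriptor;
import org.apache.tez.dag.api.Vertex;
import org.apache.tez.runtime.library.conf.OrderedPartitionedKVEdgeConfig;
import org.apache.tez.runtime.library.partitioner.HashPartitioner;

// One sorted-shuffle edge config, reused for every hop in the chain.
// Key/value/partitioner classes here are assumptions; use your job's own.
OrderedPartitionedKVEdgeConfig shuffle = OrderedPartitionedKVEdgeConfig
    .newBuilder(Text.class.getName(), IntWritable.class.getName(),
        HashPartitioner.class.getName())
    .build();

// Each vertex wraps a stage of the old MR chain; R1+M2 and R2+M3 fuse a
// reducer with the following mapper, eliminating the intermediate HDFS write.
Vertex m1   = Vertex.create("M1",
    ProcessorDescriptor.create(MyMapProcessor.class.getName()), numM1Tasks);
Vertex r1m2 = Vertex.create("R1+M2",
    ProcessorDescriptor.create(MyReduceMap1Processor.class.getName()), numR1Tasks);
Vertex r2m3 = Vertex.create("R2+M3",
    ProcessorDescriptor.create(MyReduceMap2Processor.class.getName()), numR2Tasks);
Vertex r3   = Vertex.create("R3",
    ProcessorDescriptor.create(MyReduceProcessor.class.getName()), numR3Tasks);

DAG dag = DAG.create("chained-mr-job")
    .addVertex(m1).addVertex(r1m2).addVertex(r2m3).addVertex(r3)
    .addEdge(Edge.create(m1,   r1m2, shuffle.createDefaultEdgeProperty()))
    .addEdge(Edge.create(r1m2, r2m3, shuffle.createDefaultEdgeProperty()))
    .addEdge(Edge.create(r2m3, r3,   shuffle.createDefaultEdgeProperty()));
```

The whole chain then runs as one DAG in one Tez session, so only the final R3
output needs to be written to HDFS.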
Bikas
-Original Message-
From: zhiyuan yang [mailto:sjtu@gmail.com]
Sent: Monday, August 1, 2016 10:26 AM
To: user@tez.apache.org
Subject: Re: Guide to write map-reduce code using Tez API
The nice thing about Tez is that it's compatible with the MapReduce API. So if
you just want to run MapReduce on Tez, you only need to learn how to write
standard MapReduce and then change the execution engine to Tez.
To change the execution engine for MapReduce, change the configuration property
mapreduce.framework.name. (Not 100% sure about this, correct me if I'm wrong.)
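As a config fragment, the value documented for running plain MR jobs through
Tez is yarn-tez. This sketch assumes Tez is already installed on the cluster
and tez.lib.uris in tez-site.xml points at the Tez jars/tarball on HDFS:

```xml
<!-- mapred-site.xml: route MapReduce jobs through the Tez runtime.
     Assumes a working Tez install referenced by tez.lib.uris. -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
```

Alternatively, for a single job whose driver uses GenericOptionsParser, the
same property can be passed on the command line with
-Dmapreduce.framework.name=yarn-tez.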
Thanks!
Zhiyuan
> On Aug 1, 2016, at 10:19 AM, Sudhir.Kumar wrote:
>
> Hello All,
>
> I have just started to read about Tez.
>
> Is there a document to understand the Tez Java APIs which can be used to
> write map-reduce code?
>
> Thanks,
>
> Sudhir