Hi Sudheer,

It would be good to understand what use cases you are trying to solve.
The typical and easiest way of using Tez is via Hive and Pig, by turning Tez on for these engines. But if you want to hand-code your jobs because of specific optimizations in your business logic, then there is a case for using the MR APIs. The question then becomes which of the following you want:

1) Run existing MR jobs using Tez. This can be done with the steps mentioned below and may give you a 0-20% perf improvement due to generic container reuse and other optimizations.

2) Refactor MR job code to use Tez to create longer jobs. If your overall business job consists of a DAG of MR jobs, then you could get significant perf gains by changing your M1->R1->HDFS->M2->R2->HDFS->M3->R3->HDFS chain to a single M1->R1+M2->R2+M3->R3->HDFS chain. This requires you to refactor your code to wrap your existing MR code into TezProcessors. Since the Tez runtime library component for shuffling intermediate data is compatible with MR shuffle, almost all of your MR code could be reused with minimal changes.

Bikas

-----Original Message-----
From: zhiyuan yang [mailto:[email protected]]
Sent: Monday, August 1, 2016 10:26 AM
To: [email protected]
Subject: Re: Guide to write map-reduce code using Tez API

The nice thing about Tez is that it’s compatible with the MapReduce API. So if you just want to run MapReduce on Tez, you only need to learn how to write standard MapReduce and change the execution engine to Tez. To change the execution engine of MapReduce, please change the configuration property mapreduce.framework.name. (Not 100% sure about this; correct me if I’m wrong.)

Thanks!
Zhiyuan

> On Aug 1, 2016, at 10:19 AM, Sudhir.Kumar <[email protected]> wrote:
>
> Hello All,
>
> I have just started to read about Tez.
>
> Is there a document to understand the Tez Java APIs which can be used to
> write map-reduce code?
>
> Thanks,
>
> Sudhir
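Option 2 in Bikas's reply is easiest to see as a change in how many times intermediate data goes through HDFS. What follows is a toy Java model, not Tez API code. The class and method names (`ChainVsDag`, `mrChain`, `tezDag`, `hdfsWrites`) are hypothetical; they only rebuild the two stage chains from the reply as strings and count the HDFS writes. In a real Tez job the fused stages would be vertices of a DAG built with the classes in `org.apache.tez.dag.api` (such as `DAG`, `Vertex` and `Edge`), with the existing MR logic wrapped in processors.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model only (hypothetical names, NOT the Tez API): compares how often
 * intermediate data is written to HDFS when a business job runs as a chain
 * of separate MR jobs versus as a single Tez DAG.
 */
public class ChainVsDag {

    /** Chain of separate MR jobs: every job ends by writing to HDFS. */
    static List<String> mrChain(int jobs) {
        List<String> plan = new ArrayList<>();
        for (int i = 1; i <= jobs; i++) {
            plan.add("M" + i);
            plan.add("R" + i);
            plan.add("HDFS"); // each job boundary is materialized in HDFS
        }
        return plan;
    }

    /** Single DAG: reducer i and mapper i+1 are fused into one stage. */
    static List<String> tezDag(int jobs) {
        List<String> plan = new ArrayList<>();
        plan.add("M1");
        for (int i = 1; i < jobs; i++) {
            // Intermediate data moves between stages via shuffle, not HDFS.
            plan.add("R" + i + "+M" + (i + 1));
        }
        plan.add("R" + jobs);
        plan.add("HDFS"); // only the final output goes to HDFS
        return plan;
    }

    /** Counts how many times the plan materializes data in HDFS. */
    static long hdfsWrites(List<String> plan) {
        return plan.stream().filter("HDFS"::equals).count();
    }

    public static void main(String[] args) {
        List<String> chain = mrChain(3);
        List<String> dag = tezDag(3);
        System.out.println(String.join("->", chain) + "  HDFS writes: " + hdfsWrites(chain));
        System.out.println(String.join("->", dag) + "  HDFS writes: " + hdfsWrites(dag));
    }
}
```

For three chained jobs this prints the two chains from the reply, with three HDFS writes for the MR version and one for the DAG. In this model the MR plan writes once per job while the DAG writes once in total, so the gap grows with the length of the chain.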
