You are right. In fact, it’s a very interesting use case.
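Bikas confirms Wojciech's reasoning. A Tez session keeps one ApplicationMaster running, so an iterative Map/Reduce loop pays the startup cost once instead of once per iteration. The loop can be sketched in plain Java as below. `DagSession`, `start`, `submitAndWait` and `stop` are hypothetical stand-ins, not the Tez API; the real classes differ between releases (`TezSession` in 0.4, `TezClient` from 0.5). The sketch only shows the pattern: start once, submit one DAG per iteration, and stop in `finally`.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionLoopSketch {

    // Hypothetical stand-in for a Tez session; NOT the real Tez API.
    interface DagSession {
        void start();                      // launch the (simulated) ApplicationMaster once
        String submitAndWait(String dag);  // run one DAG and block until it finishes
        void stop();                       // shut the session down
    }

    // Simulated session that only records calls, so the pattern can be checked.
    static class RecordingSession implements DagSession {
        int starts = 0;
        int stops = 0;
        final List<String> submitted = new ArrayList<>();

        public void start() { starts++; }
        public String submitAndWait(String dag) { submitted.add(dag); return "SUCCEEDED"; }
        public void stop() { stops++; }
    }

    // Runs one Map/Reduce DAG per iteration inside a single session.
    static void runIterations(DagSession session, int iterations) {
        session.start();                   // startup cost is paid once, not per iteration
        try {
            for (int i = 0; i < iterations; i++) {
                String status = session.submitAndWait("iteration-" + i);
                if (!"SUCCEEDED".equals(status)) {
                    throw new IllegalStateException("Iteration " + i + " failed: " + status);
                }
            }
        } finally {
            session.stop();                // release the session even if an iteration fails
        }
    }

    public static void main(String[] args) {
        RecordingSession session = new RecordingSession();
        runIterations(session, 3);
        System.out.println(session.starts + " start, " + session.submitted.size()
                + " DAGs, " + session.stops + " stop");
    }
}
```

Stopping the session in `finally` matters for a long-running loop: an iteration that fails should not leave an idle ApplicationMaster holding cluster resources.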
Are you using MapProcessor and ReduceProcessor, or have you written your own processor and are only using the Tez inputs/outputs? The latest WordCount.java in the Tez code base shows the current best practice for using the API. To follow those best practices, compile against the current master branch, which tracks the upcoming 0.5 release. If you are building Tez locally, use the master branch; otherwise, Maven artifacts (for a dependency on 0.5.0-incubating-SNAPSHOT) are available at https://repository.apache.org/content/groups/snapshots/org/apache/tez

Let us know if this helps!

Bikas

*From:* Wojciech Indyk [mailto:[email protected]]
*Sent:* Wednesday, May 21, 2014 1:58 AM
*To:* [email protected]
*Subject:* Re: Sequence file as an output

When I remove MRHelpers.doJobClientMagic, a NullPointerException occurs in the Configuration class. Could you point me to a base class (class and branch/release) that shows good practice for MapReduce jobs in Tez?

I've rewritten my MR job to use Counters (not available in MapReduce on Tez) and Sessions (to speed up iterative processing). I have just a Map and a Reduce phase, and the job runs in a loop (several iterations), so I think using a session can improve performance. Am I right?

Kind regards
Wojciech Indyk

2014-05-21 0:33 GMT+02:00 Siddharth Seth <[email protected]>:

It's possible that the old OutputFormat is being used (mapred vs mapreduce). Could you try forcing this to use the new API with the following:

finalVertex.setBoolean("mapred.mapper.new-api", true);

Also, if you happen to be using MRHelpers.doJobClientMagic, remove that, since it could reset this parameter. This is a little messed up, but we're working on making this much easier to use in 0.5.

Thanks
- Sid

On Tue, May 20, 2014 at 3:19 PM, Wojciech Indyk <[email protected]> wrote:

Hi all!
I use tez-0.4 on HDP 2.1. I tried to save the results of a DAG as a SequenceFile.
I use:

finalVertex.set(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR, SequenceFileOutputFormat.class.getName());

The problem is that the output is saved as TextOutputFormat. I use a SequenceFile as the input to the DAG, and that works fine (I use SequenceFileInputFormat).

Kind regards
Wojciech Indyk
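Sid's diagnosis (old mapred API vs new mapreduce API) can be illustrated with a simplified model. In Hadoop, the old `mapred` API reads the output format from `mapred.output.format.class`, while the new `mapreduce` API reads `mapreduce.job.outputformat.class` (the value of `MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR`). Both fall back to `TextOutputFormat` when their key is unset. So if the vertex runs with the old API, the SequenceFile setting is never read and the output comes out as text. The sketch uses a plain `Map` in place of Hadoop's `Configuration` and assumes, for simplicity, that the single `mapred.mapper.new-api` flag decides which API the output uses. Tez's real output logic may consult other keys (for example, the reducer flag), so treat this as an illustration, not a description of Tez internals.

```java
import java.util.HashMap;
import java.util.Map;

public class OutputFormatResolution {

    // Hadoop key names; a plain Map stands in for org.apache.hadoop.conf.Configuration.
    static final String NEW_API_FLAG = "mapred.mapper.new-api";
    static final String NEW_API_OUTPUT_FORMAT = "mapreduce.job.outputformat.class"; // MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR
    static final String OLD_API_OUTPUT_FORMAT = "mapred.output.format.class";

    static final String NEW_TEXT = "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat";
    static final String OLD_TEXT = "org.apache.hadoop.mapred.TextOutputFormat";
    static final String NEW_SEQUENCE = "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat";

    // Simplified model (assumption): one flag selects which key is read,
    // and an unset key falls back to that API's TextOutputFormat.
    static String resolveOutputFormat(Map<String, String> conf) {
        boolean useNewApi = Boolean.parseBoolean(conf.getOrDefault(NEW_API_FLAG, "false"));
        if (useNewApi) {
            return conf.getOrDefault(NEW_API_OUTPUT_FORMAT, NEW_TEXT);
        }
        return conf.getOrDefault(OLD_API_OUTPUT_FORMAT, OLD_TEXT);
    }

    public static void main(String[] args) {
        Map<String, String> vertexConf = new HashMap<>();
        // As in the original question: only the new-API output format key is set.
        vertexConf.put(NEW_API_OUTPUT_FORMAT, NEW_SEQUENCE);
        System.out.println("without flag: " + resolveOutputFormat(vertexConf));

        // Sid's suggestion: also force the new API.
        vertexConf.put(NEW_API_FLAG, "true");
        System.out.println("with flag:    " + resolveOutputFormat(vertexConf));
    }
}
```

In this model the first lookup falls back to the old-API TextOutputFormat and the second picks SequenceFileOutputFormat, which matches the symptom reported in the question.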
