RE: Sequence file as an output

Bikas Saha Wed, 21 May 2014 10:40:51 -0700

You are right. In fact, it’s a very interesting use case.



Are you using MapProcessor and ReduceProcessor? Or have you written your
own processor and are just using Tez inputs/outputs?



If you look at the latest WordCount.java in the Tez code base, you can
see the current best practice for using the API. To follow these
practices, compile against the current master, which tracks the upcoming
0.5 release. If you are building Tez locally, that is the master branch.
Otherwise, Maven snapshot artifacts (for a dependency on
0.5.0-incubating-SNAPSHOT) are at
https://repository.apache.org/content/groups/snapshots/org/apache/tez





Let us know if this helps!

Bikas



*From:* Wojciech Indyk [mailto:[email protected]]
*Sent:* Wednesday, May 21, 2014 1:58 AM
*To:* [email protected]
*Subject:* Re: Sequence file as an output



When I remove MRHelpers.doJobClientMagic, a NullPointerException occurs in
the Configuration class.



Could you recommend a base class (class and branch/release) that shows good
practice for MapReduce jobs in Tez? I've rewritten my MR job to use
Counters (not available in MapReduce on Tez) and Sessions (to improve
iterative processing speed). I have just a Map and a Reduce phase, and the
job runs in a loop (several iterations), so I think using a session can
improve performance. Am I right?


Kind regards

Wojciech Indyk



2014-05-21 0:33 GMT+02:00 Siddharth Seth <[email protected]>:

It's possible that the old OutputFormat API is being used (mapred vs
mapreduce).

Could you try forcing the new API with the following:

    finalVertex.setBoolean("mapred.mapper.new-api", true);

Also, if you happen to be using MRHelpers.doJobClientMagic, remove it,
since it could reset this parameter.
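
To illustrate why the flag matters, here is a simplified model of the lookup, not the actual Tez or Hadoop code. It assumes the output picks between the old-API key and the new-API key based on "mapred.mapper.new-api", and that each path falls back to its own TextOutputFormat when its key is unset. The configuration key names are the real Hadoop ones; the OutputFormatLookup class and its resolve helper are hypothetical and use a plain Map in place of a Configuration.

```java
import java.util.HashMap;
import java.util.Map;

public class OutputFormatLookup {
    // Real Hadoop configuration keys.
    static final String NEW_API_FLAG = "mapred.mapper.new-api";
    // Value of MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR (new "mapreduce" API).
    static final String NEW_API_FORMAT_KEY = "mapreduce.job.outputformat.class";
    // Key read by the old "mapred" API.
    static final String OLD_API_FORMAT_KEY = "mapred.output.format.class";

    static final String OLD_DEFAULT = "org.apache.hadoop.mapred.TextOutputFormat";
    static final String NEW_DEFAULT =
            "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat";

    // Hypothetical helper: models which output format class would be chosen.
    // Assumption: the flag defaults to false, so the old-API key is consulted
    // unless the flag is set explicitly.
    static String resolve(Map<String, String> conf) {
        boolean useNewApi =
                Boolean.parseBoolean(conf.getOrDefault(NEW_API_FLAG, "false"));
        return useNewApi
                ? conf.getOrDefault(NEW_API_FORMAT_KEY, NEW_DEFAULT)
                : conf.getOrDefault(OLD_API_FORMAT_KEY, OLD_DEFAULT);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Only the new-API key is set, as in the original report.
        conf.put(NEW_API_FORMAT_KEY,
                "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat");

        // Flag unset: the old-API path ignores the new-API key.
        System.out.println(resolve(conf));

        // Flag set: the new-API key is honoured.
        conf.put(NEW_API_FLAG, "true");
        System.out.println(resolve(conf));
    }
}
```

Under these assumptions, setting only the new-API key leaves the old-API default (TextOutputFormat) in effect, which matches the reported symptom; setting the flag as well selects SequenceFileOutputFormat.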



This is a little messed up, but we're working on making this much easier to
use in 0.5.



Thanks

- Sid





On Tue, May 20, 2014 at 3:19 PM, Wojciech Indyk <[email protected]>
wrote:

Hi all!

I use tez-0.4 on HDP 2.1. I tried to save the results of a DAG as a
SequenceFile.

I use:

finalVertex.set(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR,
SequenceFileOutputFormat.class.getName());

The problem is that the output is saved as TextOutputFormat. I use a
SequenceFile as the input to the DAG and it works fine (with
SequenceFileInputFormat).


Kind regards

Wojciech Indyk
