Hi,
     I'm using the newer version of the mapreduce API 
(org.apache.hadoop.mapreduce, not org.apache.hadoop.mapred). I was under the 
impression that the older API doesn't work well with the newer one. Do you have 
an example of using the older OutputFormat with the newer API by any chance?

Cheers,

David Poisson

________________________________________
From: Azuryy Yu [[email protected]]
Sent: Saturday, June 29, 2013 10:34 AM
To: [email protected]
Subject: RE: Profiling map reduce jobs?

I'd just advise using MultipleOutputFormat instead of MultipleOutputs.write

--Sent from my Sony mobile.
On Jun 29, 2013 9:16 PM, "David Poisson" <[email protected]>
wrote:

> Just thought I'd provide some insight into our problem.
>
> It appears that the slowdown was caused by our use of
> multipleOutputs.write(output, key, keyValue, path) (going from memory
> here). After looking at the implementation of that write function in
> MultipleOutputs.java, it appears that every call to
> write(output, key, keyValue, path) creates a context, fetches a conf, and
> gets a new recordWriter.
>
> We have changed all of those calls to write(output, key, keyValue) (which
> doesn't do any of that extra work), and it seems to help.
>
> Does anyone else have any tips for using MultipleOutputs?
>
> We are taking our input and splitting it into 3 files. So it seems to be a
> natural choice for MultipleOutputs. Performance is a bit slow though.
>
> Cheers!
>
> David
> ________________________________________
> From: David Poisson [[email protected]]
> Sent: Thursday, June 27, 2013 4:22 PM
> To: [email protected]
> Subject: Profiling map reduce jobs?
>
> Howdy,
>      I want to take a look at a MR job which seems to be slower than I had
> hoped. Mind you, this MR job is only running on a pseudo-distributed VM
> (cloudera cdh4).
>
> I have modified my mapred-site.xml with the following (the last property
> is commented out because it crashes my MR job):
>
>   <property>
>     <name>mapred.task.profile</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>mapred.task.profile.maps</name>
>     <value>0-2</value>
>   </property>
>   <property>
>     <name>mapred.task.profile.reduces</name>
>     <value>0-2</value>
>   </property>
>   <!--property>
>     <name>mapred.task.profile.params</name>
>     <value>agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s</value>
>   </property-->
> Are there any resources that explain how to interpret the results?
> Or maybe an open-source app that could help display the results in a more
> intuitive manner?
>
> Ideally, we'd want to know where we are spending most of our time.
>
> Cheers,
>
> David

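On the question of how to read the profiler output: with cpu=samples, the hprof text report contains a table between the lines "CPU SAMPLES BEGIN" and "CPU SAMPLES END", with the columns rank, self, accum, count, trace and method. The "self" column is the share of samples in which that method was at the top of the stack, which is usually the first thing to look at when asking where the time goes. Below is a minimal sketch that pulls out the top rows of that table. It assumes the default text format of hprof; the class name HprofCpuSamples and the file path passed on the command line are hypothetical, not part of Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the hottest methods from an hprof text report. */
public class HprofCpuSamples {

    /** One row of the "CPU SAMPLES" table. */
    public static final class Sample {
        public final int rank;
        public final double selfPercent;   // samples with this method on top of the stack
        public final double accumPercent;  // running total of selfPercent
        public final int count;            // raw number of samples
        public final String method;

        Sample(int rank, double selfPercent, double accumPercent, int count, String method) {
            this.rank = rank;
            this.selfPercent = selfPercent;
            this.accumPercent = accumPercent;
            this.count = count;
            this.method = method;
        }
    }

    /** Parses the rows between "CPU SAMPLES BEGIN" and "CPU SAMPLES END". */
    public static List<Sample> parse(List<String> lines) {
        List<Sample> samples = new ArrayList<>();
        boolean inSection = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("CPU SAMPLES BEGIN")) {
                inSection = true;
                continue;
            }
            if (line.startsWith("CPU SAMPLES END")) {
                break;
            }
            // Skip everything outside the table, blank lines, and the column header.
            if (!inSection || line.isEmpty() || line.startsWith("rank")) {
                continue;
            }
            // Assumed column order: rank self accum count trace method
            String[] f = line.split("\\s+");
            if (f.length < 6) {
                continue;
            }
            samples.add(new Sample(Integer.parseInt(f[0]), percent(f[1]), percent(f[2]),
                    Integer.parseInt(f[3]), f[5]));
        }
        return samples;
    }

    private static double percent(String field) {
        return Double.parseDouble(field.replace("%", ""));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the hprof text output of one profiled task (hypothetical).
        for (Sample s : parse(Files.readAllLines(Paths.get(args[0])))) {
            if (s.rank > 10) {
                break; // only show the ten hottest methods
            }
            System.out.printf("%2d %6.2f%% %6d  %s%n", s.rank, s.selfPercent, s.count, s.method);
        }
    }
}
```

The trace column refers to a numbered stack trace elsewhere in the same report; the depth option controls how many frames each of those traces keeps. One more thing that may be worth checking: the default value Hadoop ships for mapred.task.profile.params begins with a leading dash (-agentlib:hprof=...), which the commented-out value above does not have.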