Re: Debugging M/R job with tez

Madhusudan Ramanna Wed, 28 Sep 2016 11:11:32 -0700

Not sure if this will help, but if you're running into class loading issues,
take a look here:
FAQ - Tez - Apache Software Foundation

    On Wednesday, September 28, 2016 9:39 AM, Manuel Godbert 
<manuel.godb...@gmail.com> wrote:

Hello,

In non-local mode my M/R jobs generally behave as expected with Tez. However,
some still misbehave, and I am trying to run them locally to understand
whether they can work with some changes (either in my code or in the Tez code;
in the latter case I plan to contribute to the Tez effort). Running
WordCount locally is only a first step.

I won't be able to provide source code easily for the real problematic jobs, as
we use a fairly large homemade framework on top of Hadoop that is not open
source. In a few words, most of my issues seem to come from the management of
task attempt IDs. We have subclassed the output committers to manage
multiple outputs, and when we reach the commit-task step the produced files are
not always where expected in the temporary task attempt paths. It is hard to
say exactly what happens, which is why I wanted to reproduce the issue
locally before sharing it.

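To see where the files actually land before the commit-task step, a small helper along these lines may be useful. It is only a sketch: it assumes the job writes to the local file system (file:///, as in local mode), and the class and method names (OutputTreeLister, listFiles) are made up for illustration. Directory names such as _temporary are what Hadoop's FileOutputCommitter uses and may differ with custom committers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical debugging helper, not part of Tez or Hadoop.
final class OutputTreeLister {

    /** Returns every regular file under root as a sorted list of paths relative to root. */
    static List<String> listFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       // Normalise separators so the output looks the same on every platform.
                       .map(p -> root.relativize(p).toString().replace('\\', '/'))
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the job output directory as the first argument (defaults to the current directory).
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String file : listFiles(root)) {
            System.out.println(file);
        }
    }
}
```

Running it against the job output directory at a breakpoint just before the task commit, once under plain YARN and once under Tez, should show whether the task attempt directories differ between the two.
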
Besides this, another minor issue is that we used to package our
application jars with nested dependencies in /lib, and these are ignored by Tez.
We could easily work around this by expanding the nested jars and adapting our
classpath.
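
For reference, the expansion step can be sketched roughly as follows. This is an illustration under assumptions, not our actual tooling: it assumes the nested dependencies sit under a lib/ prefix inside the application jar, and the names NestedJarExpander and expandNestedJars are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Tez or Hadoop.
final class NestedJarExpander {

    /** Copies every lib/*.jar entry of appJar into targetDir and returns the extracted files. */
    static List<Path> expandNestedJars(Path appJar, Path targetDir) throws IOException {
        List<Path> extracted = new ArrayList<>();
        Files.createDirectories(targetDir);
        try (JarFile jar = new JarFile(appJar.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                // Assumption: nested dependencies live under lib/ and end in .jar.
                if (entry.isDirectory() || !name.startsWith("lib/") || !name.endsWith(".jar")) {
                    continue;
                }
                // Keep only the file name so an entry cannot be written outside targetDir.
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                Path out = targetDir.resolve(fileName);
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(out);
            }
        }
        return extracted;
    }
}
```

The returned paths can then be added to the classpath in whatever way the job is launched.
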
Regards

On Wed, Sep 28, 2016 at 5:46 PM, Hitesh Shah <hit...@apache.org> wrote:

Hello Manuel,

Thanks for reporting the issue. Let me try and reproduce this locally to see 
what is going on.

A quick question in general though - are you hitting issues when running in 
non-local mode too? Would you mind sharing the details of the issues you hit?

thanks
— Hitesh

> On Sep 27, 2016, at 9:53 AM, Manuel Godbert <manuel.godb...@gmail.com> wrote:
>
> Hello,
>
> I have map/reduce jobs that work as expected within YARN, and I want to see 
> if Tez can help me improve their performance. Alas, I am experiencing 
> issues and I want to understand what happens, to see whether I can adapt my 
> code or suggest Tez enhancements. For this I need to be able to debug 
> jobs from within Eclipse, with breakpoints in the Tez source code, etc.
>
> I am working on a Linux (Ubuntu) platform.
> I use the latest Tez version I found, i.e. 0.9.0-SNAPSHOT (also tried with 
> 0.7.0).
> I have set up the Hortonworks mini dev cluster:
> https://github.com/hortonworks/mini-dev-cluster
> I am trying to run the basic WordCount2 code found here:
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v2.0
> I added the following code to have Tez running locally:
>     conf.set("mapreduce.framework.name", "yarn-tez");
>     conf.setBoolean("tez.local.mode", true);
>     conf.set("fs.default.name", "file:///");
>     conf.setBoolean("tez.runtime.optimize.local.fetch", true);
>
> And I am getting the following error:
>
> 2016-09-27 18:32:34 Running Dag: dag_1474992804027_0003_1
> 2016-09-27 18:32:34 Running Dag: dag_1474992804027_0003_1
> Exception in thread "main" java.lang.NullPointerException
>       at org.apache.tez.client.LocalClient.getApplicationReport(LocalClient.java:153)
>       at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getAppReport(DAGClientRPCImpl.java:231)
>       at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.createAMProxyIfNeeded(DAGClientRPCImpl.java:251)
>       at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatus(DAGClientRPCImpl.java:96)
>       at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusViaAM(DAGClientImpl.java:360)
>       at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusInternal(DAGClientImpl.java:220)
>       at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatus(DAGClientImpl.java:268)
>       at org.apache.tez.dag.api.client.MRDAGClient.getDAGStatus(MRDAGClient.java:58)
>       at org.apache.tez.mapreduce.client.YARNRunner.getJobStatus(YARNRunner.java:710)
>       at org.apache.tez.mapreduce.client.YARNRunner.submitJob(YARNRunner.java:650)
>       at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>       at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>       at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>       at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
>       at WordCount2.main(WordCount2.java:136)
>
> Please help me understand what I am doing wrong!
>
> Regards
