[jira] [Updated] (TEZ-3459) Issues running M/R jobs with Tez

2016-10-21 Thread Manuel Godbert (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manuel Godbert updated TEZ-3459:

Description: 
After applying the patch delivered in TEZ-3330, I enriched the MapredColorCount 
example to reproduce some of the other issues I encountered on the jobs I wish 
to see running with Tez.

I am attaching a jar to the JIRA, including source code, and a script file 
detailing the observed results in comments.

It addresses 4 issues:
- The embedded jars in /lib are ignored by Tez, whereas YARN uses them without 
additional configuration
- The use of a combiner causes a NullPointerException
- The counters incremented in the Reporter objects stay at 0
- The additional output configured is missing from the final job output folder. 
It seems that we actually have 2 issues at task commit time:
-> there is no task commit for maps in a map+reduce job, but in our example 
we generate outputs in the map phase using MultipleOutputs
-> the temporary task folder used for files coming from MultipleOutputs is 
not always the same as the one used for the main output files (this is harder 
to illustrate with a simple example), which causes issues at task commit time.

For information, we observe a performance gain of about 10% using Tez, once the 
above issues are worked around, in our use cases with production data volumes, 
which is really great!

I am using HDP2.4
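
The commit-time symptom described above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: it uses plain java.nio instead of Hadoop or Tez, the folder and file names (attempt_main, attempt_side, extra-m-00000) are hypothetical, and commitTask is a stand-in for whatever the real output committer does. It only shows why a file written to a temporary folder that is never committed does not reach the final output folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TaskCommitSketch {

    /** Hypothetical commit step: promotes the files of ONE attempt folder to the output folder. */
    static void commitTask(Path attemptDir, Path outputDir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(attemptDir)) {
            files = listing.collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.move(file, outputDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Returns the sorted names of the regular files directly under the output folder. */
    static List<String> listOutput(Path outputDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (Stream<Path> listing = Files.list(outputDir)) {
            listing.filter(Files::isRegularFile)
                   .forEach(p -> names.add(p.getFileName().toString()));
        }
        Collections.sort(names);
        return names;
    }

    /** Writes a main output and a side output to different temporary folders, commits only one. */
    static List<String> run() {
        try {
            Path outputDir = Files.createTempDirectory("job-output");
            Path mainAttempt = Files.createDirectories(outputDir.resolve("_temporary/attempt_main"));
            Path sideAttempt = Files.createDirectories(outputDir.resolve("_temporary/attempt_side"));
            Files.write(mainAttempt.resolve("part-00000"), "main".getBytes());
            Files.write(sideAttempt.resolve("extra-m-00000"), "side".getBytes());

            // Only the main attempt folder is committed; the side folder is left behind.
            commitTask(mainAttempt, outputDir);
            return listOutput(outputDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The side output is absent from the listing because its folder was never committed.
        System.out.println(run());
    }
}
```

In this model the side file only shows up in the final folder if its own temporary folder is also committed, which matches the two sub-issues listed above (no commit for map tasks, and a different temporary folder for MultipleOutputs files).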

  was:
After applying the patch delivered in TEZ-3330, I enriched the MapredColorCount 
example to reproduce some of the other issues I encountered on the jobs I wish 
to see running with Tez.

I am attaching a jar to the JIRA, including source code, and a script file 
detailing the observed results in comments.

It addresses 4 issues:
- the embedded jars in /lib are ignored by Tez, but YARN uses them without 
additional configuration
- The use of a combiner causes a NullPointerException
- The counters incremented in the Reporter objects stay at 0
- The additional output configured is missing in the final job output folder. 
It seems the problem occurs at task commit time, as the new output file is not 
in the same folder as the main output file.

I am using HDP2.4


> Issues running M/R jobs with Tez
> 
>
> Key: TEZ-3459
> URL: https://issues.apache.org/jira/browse/TEZ-3459
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: colorCount.sh, colorCount.sh, mr-example.jar, 
> mr-example.jar
>
>
> After applying the patch delivered in TEZ-3330, I enriched the 
> MapredColorCount example to reproduce some of the other issues I encountered 
> on the jobs I wish to see running with Tez.
> I am attaching a jar to the JIRA, including source code, and a script file 
> detailing the observed results in comments.
> It addresses 4 issues:
> - The embedded jars in /lib are ignored by Tez, whereas YARN uses them without 
> additional configuration
> - The use of a combiner causes a NullPointerException
> - The counters incremented in the Reporter objects stay at 0
> - The additional output configured is missing from the final job output folder. 
> It seems that we actually have 2 issues at task commit time:
> -> there is no task commit for maps in a map+reduce job, but in our 
> example we generate outputs in the map phase using MultipleOutputs
> -> the temporary task folder used for files coming from MultipleOutputs 
> is not always the same as the one used for the main output files (this is 
> harder to illustrate with a simple example), which causes issues at task 
> commit time.
> For information, we observe a performance gain of about 10% using Tez, once 
> the above issues are worked around, in our use cases with production data 
> volumes, which is really great!
> I am using HDP2.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3459) Issues running M/R jobs with Tez

2016-10-13 Thread Manuel Godbert (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manuel Godbert updated TEZ-3459:

Description: 
After applying the patch delivered in TEZ-3330, I enriched the MapredColorCount 
example to reproduce some of the other issues I encountered on the jobs I wish 
to see running with Tez.

I am attaching a jar to the JIRA, including source code, and a script file 
detailing the observed results in comments.

It addresses 4 issues:
- the embedded jars in /lib are ignored by Tez, but YARN uses them without 
additional configuration
- The use of a combiner causes a NullPointerException
- The counters incremented in the Reporter objects stay at 0
- The additional output configured is missing in the final job output folder. 
It seems the problem occurs at task commit time, as the new output file is not 
in the same folder as the main output file.

I am using HDP2.4

  was:
After applying the patch delivered in TEZ-3330, I enriched the MapredColorCount 
example to reproduce some of the other issues I encountered on the jobs I wish 
to see running with Tez.

I am attaching a jar to the JIRA, including source code, and a script file 
detailing the observed results in comments.

It addresses 3 issues:
- the embedded jars in /lib are ignored by Tez, but YARN uses them without 
additional configuration
- The use of a combiner causes a NullPointerException
- The counters incremented in the Reporter objects stay at 0

I am using HDP2.4


> Issues running M/R jobs with Tez
> 
>
> Key: TEZ-3459
> URL: https://issues.apache.org/jira/browse/TEZ-3459
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: colorCount.sh, colorCount.sh, mr-example.jar, 
> mr-example.jar
>
>
> After applying the patch delivered in TEZ-3330, I enriched the 
> MapredColorCount example to reproduce some of the other issues I encountered 
> on the jobs I wish to see running with Tez.
> I am attaching a jar to the JIRA, including source code, and a script file 
> detailing the observed results in comments.
> It addresses 4 issues:
> - the embedded jars in /lib are ignored by Tez, but YARN uses them without 
> additional configuration
> - The use of a combiner causes a NullPointerException
> - The counters incremented in the Reporter objects stay at 0
> - The additional output configured is missing in the final job output folder. 
> It seems the problem occurs at task commit time, as the new output file is 
> not in the same folder as the main output file.
> I am using HDP2.4





[jira] [Updated] (TEZ-3459) Issues running M/R jobs with Tez

2016-10-13 Thread Manuel Godbert (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manuel Godbert updated TEZ-3459:

Attachment: colorCount.sh
mr-example.jar

Updated the attached example to address an additional issue that occurs when 
using multiple outputs. Updating the JIRA general description too.

> Issues running M/R jobs with Tez
> 
>
> Key: TEZ-3459
> URL: https://issues.apache.org/jira/browse/TEZ-3459
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: colorCount.sh, colorCount.sh, mr-example.jar, 
> mr-example.jar
>
>
> After applying the patch delivered in TEZ-3330, I enriched the 
> MapredColorCount example to reproduce some of the other issues I encountered 
> on the jobs I wish to see running with Tez.
> I am attaching a jar to the JIRA, including source code, and a script file 
> detailing the observed results in comments.
> It addresses 3 issues:
> - the embedded jars in /lib are ignored by Tez, but YARN uses them without 
> additional configuration
> - The use of a combiner causes a NullPointerException
> - The counters incremented in the Reporter objects stay at 0
> I am using HDP2.4





[jira] [Created] (TEZ-3461) Tez not working in local mode for M/R jobs

2016-10-06 Thread Manuel Godbert (JIRA)
Manuel Godbert created TEZ-3461:
---

 Summary: Tez not working in local mode for M/R jobs
 Key: TEZ-3461
 URL: https://issues.apache.org/jira/browse/TEZ-3461
 Project: Apache Tez
  Issue Type: Bug
Reporter: Manuel Godbert


I have map/reduce jobs that work as expected within YARN, and I want to see if 
Tez can help me improve their performance. Alas, I am experiencing issues and 
I want to understand what happens, to see whether I can adapt my code or 
suggest Tez enhancements. For this I need to be able to debug jobs from within 
Eclipse, with breakpoints in Tez source code etc.

I am working on a Linux (Ubuntu) platform.
I use the latest Tez version I found, i.e. 0.9.0-SNAPSHOT (also tried with 
0.7.0).
I have set up the Hortonworks mini dev cluster: 
https://github.com/hortonworks/mini-dev-cluster
I am trying to run the basic WordCount2 code found here: 
https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v2.0
I added the following code to have Tez running locally:
conf.set("mapreduce.framework.name", "yarn-tez");
conf.setBoolean("tez.local.mode", true);
conf.set("fs.default.name", "file:///");
conf.setBoolean("tez.runtime.optimize.local.fetch", true);

And I am getting the following error:

2016-09-27 18:32:34 Running Dag: dag_1474992804027_0003_1
2016-09-27 18:32:34 Running Dag: dag_1474992804027_0003_1
Exception in thread "main" java.lang.NullPointerException
    at org.apache.tez.client.LocalClient.getApplicationReport(LocalClient.java:153)
    at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getAppReport(DAGClientRPCImpl.java:231)
    at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.createAMProxyIfNeeded(DAGClientRPCImpl.java:251)
    at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatus(DAGClientRPCImpl.java:96)
    at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusViaAM(DAGClientImpl.java:360)
    at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusInternal(DAGClientImpl.java:220)
    at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatus(DAGClientImpl.java:268)
    at org.apache.tez.dag.api.client.MRDAGClient.getDAGStatus(MRDAGClient.java:58)
    at org.apache.tez.mapreduce.client.YARNRunner.getJobStatus(YARNRunner.java:710)
    at org.apache.tez.mapreduce.client.YARNRunner.submitJob(YARNRunner.java:650)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at WordCount2.main(WordCount2.java:136)





[jira] [Commented] (TEZ-3459) Issues running M/R jobs with Tez

2016-10-05 Thread Manuel Godbert (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549419#comment-15549419
 ] 

Manuel Godbert commented on TEZ-3459:
-

It is included in the jar

> Issues running M/R jobs with Tez
> 
>
> Key: TEZ-3459
> URL: https://issues.apache.org/jira/browse/TEZ-3459
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: colorCount.sh, mr-example.jar
>
>
> After applying the patch delivered in TEZ-3330, I enriched the 
> MapredColorCount example to reproduce some of the other issues I encountered 
> on the jobs I wish to see running with Tez.
> I am attaching a jar to the JIRA, including source code, and a script file 
> detailing the observed results in comments.
> It addresses 3 issues:
> - the embedded jars in /lib are ignored by Tez, but YARN uses them without 
> additional configuration
> - The use of a combiner causes a NullPointerException
> - The counters incremented in the Reporter objects stay at 0
> I am using HDP2.4





[jira] [Updated] (TEZ-3459) Issues running M/R jobs with Tez

2016-10-05 Thread Manuel Godbert (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manuel Godbert updated TEZ-3459:

Description: 
After applying the patch delivered in TEZ-3330, I enriched the MapredColorCount 
example to reproduce some of the other issues I encountered on the jobs I wish 
to see running with Tez.

I am attaching a jar to the JIRA, including source code, and a script file 
detailing the observed results in comments.

It addresses 3 issues:
- the embedded jars in /lib are ignored by Tez, but YARN uses them without 
additional configuration
- The use of a combiner causes a NullPointerException
- The counters incremented in the Reporter objects stay at 0

I am using HDP2.4

  was:
After applying the patch delivered in TEZ-3330, I enriched the MapredColorCount 
example to reproduce some of the other issues I encountered on the jobs I wish 
to see running with Tez.

I am attaching a jar to the JIRA, including source code, and a script file 
detailing the observed results in comments.

It addresses 3 issues:
- the embedded jars in /lib are ignored by Tez, but YARN uses them without 
additional configuration
- The use of a combiner causes a NullPointerException
- The counters incremented in the Reporter objects stay at 0


> Issues running M/R jobs with Tez
> 
>
> Key: TEZ-3459
> URL: https://issues.apache.org/jira/browse/TEZ-3459
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: colorCount.sh, mr-example.jar
>
>
> After applying the patch delivered in TEZ-3330, I enriched the 
> MapredColorCount example to reproduce some of the other issues I encountered 
> on the jobs I wish to see running with Tez.
> I am attaching a jar to the JIRA, including source code, and a script file 
> detailing the observed results in comments.
> It addresses 3 issues:
> - the embedded jars in /lib are ignored by Tez, but YARN uses them without 
> additional configuration
> - The use of a combiner causes a NullPointerException
> - The counters incremented in the Reporter objects stay at 0
> I am using HDP2.4





[jira] [Created] (TEZ-3459) Issues running M/R jobs with Tez

2016-10-05 Thread Manuel Godbert (JIRA)
Manuel Godbert created TEZ-3459:
---

 Summary: Issues running M/R jobs with Tez
 Key: TEZ-3459
 URL: https://issues.apache.org/jira/browse/TEZ-3459
 Project: Apache Tez
  Issue Type: Bug
Reporter: Manuel Godbert
 Attachments: colorCount.sh, mr-example.jar

After applying the patch delivered in TEZ-3330, I enriched the MapredColorCount 
example to reproduce some of the other issues I encountered on the jobs I wish 
to see running with Tez.

I am attaching a jar to the JIRA, including source code, and a script file 
detailing the observed results in comments.

It addresses 3 issues:
- the embedded jars in /lib are ignored by Tez, but YARN uses them without 
additional configuration
- The use of a combiner causes a NullPointerException
- The counters incremented in the Reporter objects stay at 0





[jira] [Updated] (TEZ-3459) Issues running M/R jobs with Tez

2016-10-05 Thread Manuel Godbert (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manuel Godbert updated TEZ-3459:

Attachment: mr-example.jar
colorCount.sh

> Issues running M/R jobs with Tez
> 
>
> Key: TEZ-3459
> URL: https://issues.apache.org/jira/browse/TEZ-3459
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: colorCount.sh, mr-example.jar
>
>
> After applying the patch delivered in TEZ-3330, I enriched the 
> MapredColorCount example to reproduce some of the other issues I encountered 
> on the jobs I wish to see running with Tez.
> I am attaching a jar to the JIRA, including source code, and a script file 
> detailing the observed results in comments.
> It addresses 3 issues:
> - the embedded jars in /lib are ignored by Tez, but YARN uses them without 
> additional configuration
> - The use of a combiner causes a NullPointerException
> - The counters incremented in the Reporter objects stay at 0





[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-09-20 Thread Manuel Godbert (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15506084#comment-15506084
 ] 

Manuel Godbert commented on TEZ-3330:
-

I am afraid I do not understand what you expect from me; I am not used to git 
patches, just basic push and pull... I will let you finalize the work!

> Error on avro M/R job with Tez: missing configuration property
> --
>
> Key: TEZ-3330
> URL: https://issues.apache.org/jira/browse/TEZ-3330
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: TEZ-3330.temp.2.patch, TEZ-3330.temp.patch
>
>
> I tried running the simple avro M/R job MapredColorCount, that I found in the 
> examples of avro release 1.7.7.
> It failed with the following trace:
> {code}
> errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:50)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.apache.avro.Schema.parse(Schema.java:966)
>     at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>     at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>     at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
>     ... 6 more
> {code}
> Digging a bit I saw that during shuffle Tez can't access some of the 
> configuration properties of the job. In our example it is the 
> avro.output.schema that is missing.
> With some more complicated code I could get one step further and a similar 
> issue happened when the valuesIterator for the reducer was being built:
> {code}
> java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:50)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.apache.avro.Schema.parse(Schema.java:966)
>     at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>     at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
>     at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
>     at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80)
>     at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287)
> {code}
> I am using HDP2.4, Tez 0.7.0, avro 1.7.4
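
Both traces in the description above end in the java.io.StringReader constructor, which is consistent with a configuration lookup returning null and that null being handed to a parser. The following is a minimal sketch of that mechanism only: a plain Map stands in for the job configuration, and openSchema and openSchemaChecked are hypothetical helpers, not Avro or Tez code. The property name avro.output.schema is the one named in the description.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class MissingSchemaSketch {

    /** Hypothetical stand-in for a parser entry point that wraps the schema text in a reader. */
    static StringReader openSchema(Map<String, String> conf, String key) {
        // Map.get returns null for an absent key; new StringReader(null) then throws
        // NullPointerException inside the constructor, as in the traces.
        return new StringReader(conf.get(key));
    }

    /** Same lookup, but failing early with a message that names the missing property. */
    static StringReader openSchemaChecked(Map<String, String> conf, String key) {
        String schema = conf.get(key);
        if (schema == null) {
            throw new IllegalStateException("Missing configuration property: " + key);
        }
        return new StringReader(schema);
    }

    public static void main(String[] args) {
        // Model of a task-side configuration that never received the property.
        Map<String, String> taskConf = new HashMap<>();

        try {
            openSchema(taskConf, "avro.output.schema");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from the StringReader constructor");
        }

        try {
            openSchemaChecked(taskConf, "avro.output.schema");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The checked variant is only there to show how an explicit null check turns an opaque NullPointerException into an error that names the property that did not reach the task.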





[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-09-15 Thread Manuel Godbert (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493310#comment-15493310
 ] 

Manuel Godbert commented on TEZ-3330:
-

Thanks, this solves the issue!






[jira] [Comment Edited] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-22 Thread Manuel Godbert (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384408#comment-15384408
 ] 

Manuel Godbert edited comment on TEZ-3330 at 7/22/16 4:52 PM:
--

New edit of my feedback: I did not set all my classpaths properly on the first 
try, so I deleted my comments.
Actually, the avro-related properties are now passed properly, except 
io.serializations, which does not include the expected AvroSerialization, 
probably because that property already exists with some other value: 
the "this.conf.addResource(conf)" in the patch does not affect properties 
already present in the initial conf.
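
To make the observation above concrete, here is a simplified model of it, written with plain Maps rather than Hadoop's Configuration class. It is a sketch under assumptions: addMissing mimics only the behaviour reported in this comment (existing keys are left untouched), unionCsv is a hypothetical helper showing one possible way to combine list-valued properties, and the property values are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class ConfMergeSketch {

    static final String WRITABLE = "org.apache.hadoop.io.serializer.WritableSerialization";
    static final String AVRO = "org.apache.avro.mapred.AvroSerialization";

    /** Copies job properties into the task configuration without replacing existing keys. */
    static void addMissing(Map<String, String> taskConf, Map<String, String> jobConf) {
        for (Map.Entry<String, String> entry : jobConf.entrySet()) {
            taskConf.putIfAbsent(entry.getKey(), entry.getValue());
        }
    }

    /** Unions two comma-separated lists, keeping first-seen order and dropping duplicates. */
    static String unionCsv(String existing, String extra) {
        Set<String> values = new LinkedHashSet<>();
        for (String csv : new String[] {existing, extra}) {
            if (csv == null) {
                continue;
            }
            for (String value : csv.split(",")) {
                String trimmed = value.trim();
                if (!trimmed.isEmpty()) {
                    values.add(trimmed);
                }
            }
        }
        return String.join(",", values);
    }

    /** Builds the two illustrative configurations and applies the non-overriding merge. */
    static Map<String, String> demo() {
        Map<String, String> taskConf = new LinkedHashMap<>();
        taskConf.put("io.serializations", WRITABLE);      // already present before the merge

        Map<String, String> jobConf = new LinkedHashMap<>();
        jobConf.put("io.serializations", WRITABLE + "," + AVRO);
        jobConf.put("avro.output.schema", "\"string\"");  // illustrative schema value

        addMissing(taskConf, jobConf);
        return taskConf;
    }

    public static void main(String[] args) {
        Map<String, String> taskConf = demo();
        // New keys arrive, but the pre-existing io.serializations keeps its old value.
        System.out.println(taskConf);
        // One possible remedy for list-valued keys: union the two lists explicitly.
        System.out.println(unionCsv(taskConf.get("io.serializations"), WRITABLE + "," + AVRO));
    }
}
```

In this model the avro.output.schema property reaches the task while io.serializations stays at its initial value, which is the situation described above; whether a union or a plain override is the right fix in Tez itself is left open here.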


was (Author: manuel.godbert):
Hello, thanks for the patch. I just tested it, it solves the shuffle error but 
not the second issue. The full trace is:

{code}
task:java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:50)
    at org.apache.avro.Schema$Parser.parse(Schema.java:917)
    at org.apache.avro.Schema.parse(Schema.java:966)
    at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
    at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
    at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
    at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:81)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:280)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.waitForInputReady(OrderedGroupedKVInput.java:176)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.getReader(OrderedGroupedKVInput.java:240)
    at org.apache.tez.mapreduce.processor.reduce.ReduceProcessor.run(ReduceProcessor.java:130)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}

Regards

Edit: Actually I am not so sure it even solved the first issue, it has come 
back... there may be some random behaviour.

> Error on avro M/R job with Tez: missing configuration property
> --
>
> Key: TEZ-3330
> URL: https://issues.apache.org/jira/browse/TEZ-3330
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: TEZ-3330.temp.patch
>
>
> I tried running the simple avro M/R job MapredColorCount, that I found in the 
> examples of avro release 1.7.7.
> It failed with the following trace:
> {code}
> errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:50)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.apache.avro.Schema.parse(Schema.java:966)
>     at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>     at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>     at org.ap

[jira] [Comment Edited] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-20 Thread Manuel Godbert (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384408#comment-15384408
 ] 

Manuel Godbert edited comment on TEZ-3330 at 7/20/16 4:25 PM:
--

Hello, thanks for the patch. I just tested it, it solves the shuffle error but 
not the second issue. The full trace is:

{code}
task:java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:50)
    at org.apache.avro.Schema$Parser.parse(Schema.java:917)
    at org.apache.avro.Schema.parse(Schema.java:966)
    at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
    at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
    at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
    at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:81)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:280)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.waitForInputReady(OrderedGroupedKVInput.java:176)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.getReader(OrderedGroupedKVInput.java:240)
    at org.apache.tez.mapreduce.processor.reduce.ReduceProcessor.run(ReduceProcessor.java:130)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}

Regards

Edit: Actually I am not so sure it even solved the first issue, it has come 
back... there may be some random behaviour.


was (Author: manuel.godbert):
Hello, thanks for the patch. I just tested it, it solves the shuffle error but 
not the second issue. The full trace is:

{code}
task:java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:50)
    at org.apache.avro.Schema$Parser.parse(Schema.java:917)
    at org.apache.avro.Schema.parse(Schema.java:966)
    at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
    at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
    at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
    at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:81)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:280)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.waitForInputReady(OrderedGroupedKVInput.java:176)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.getReader(OrderedGroupedKVInput.java:240)
    at org.apache.tez.mapreduce.processor.reduce.ReduceProcessor.run(ReduceProcessor.java:130)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}

Regards

> Error on avro M/R job with Tez: missing configuration property
> --
>
> Key: TEZ-3330
> URL: https://issues.apache.org/jira/browse/TEZ-3330
> Project: Apac

[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-19 Thread Manuel Godbert (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384408#comment-15384408
 ] 

Manuel Godbert commented on TEZ-3330:
-

Hello, thanks for the patch. I just tested it: it solves the shuffle error but 
not the second issue. The full trace is:

{code}
task:java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.apache.avro.Schema$Parser.parse(Schema.java:917)
at org.apache.avro.Schema.parse(Schema.java:966)
at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:81)
at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:280)
at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.waitForInputReady(OrderedGroupedKVInput.java:176)
at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.getReader(OrderedGroupedKVInput.java:240)
at org.apache.tez.mapreduce.processor.reduce.ReduceProcessor.run(ReduceProcessor.java:130)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{code}

Regards

> Error on avro M/R job with Tez: missing configuration property
> --
>
> Key: TEZ-3330
> URL: https://issues.apache.org/jira/browse/TEZ-3330
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: TEZ-3330.temp.patch
>

[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-12 Thread Manuel Godbert (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372798#comment-15372798
 ] 

Manuel Godbert commented on TEZ-3330:
-

I have actually already tried that, without success: the configuration property 
becomes available during shuffle, but its value is the constant set in 
tez-site.xml, not the value built dynamically at job setup.
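
To make the distinction concrete, here is a minimal plain-Java sketch that assumes nothing about Tez internals. Two HashMaps stand in for the site file and the job configuration, and visibleDuringShuffle with its "tez." prefix rule is a hypothetical filter invented for illustration. It only models the observation: a job-level key dropped by the filter falls back to whatever constant the site file holds.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticVersusJobValueSketch {
    // Hypothetical filter rule: only job-level keys with this prefix survive.
    // This is an assumption for illustration, not Tez's actual behaviour.
    static final String KEPT_PREFIX = "tez.";

    // Builds the configuration a shuffle-side reader would see in this model:
    // site-file constants first, then only the job-level keys that pass the filter.
    static Map<String, String> visibleDuringShuffle(Map<String, String> site,
                                                    Map<String, String> job) {
        Map<String, String> visible = new HashMap<>(site);
        for (Map.Entry<String, String> e : job.entrySet()) {
            if (e.getKey().startsWith(KEPT_PREFIX)) {
                visible.put(e.getKey(), e.getValue());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        site.put("avro.output.schema", "\"constant-from-site-file\"");

        Map<String, String> job = new HashMap<>();
        job.put("avro.output.schema", "\"built-at-job-setup\"");

        // The job-level value is filtered out, so the site constant is what remains visible.
        System.out.println(visibleDuringShuffle(site, job).get("avro.output.schema"));
    }
}
```

In this model the only ways to see the job-level value are to let the key through the filter or to carry it under a key the filter already keeps.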

> Error on avro M/R job with Tez: missing configuration property
> --
>
> Key: TEZ-3330
> URL: https://issues.apache.org/jira/browse/TEZ-3330
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-11 Thread Manuel Godbert (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371714#comment-15371714
 ] 

Manuel Godbert commented on TEZ-3330:
-

This would be nice.
Until a fix is available, do you know of a way to parameterize the filter, for 
example by defining the keys I need to keep in a special place? Or is there any 
other kind of workaround?

> Error on avro M/R job with Tez: missing configuration property
> --
>
> Key: TEZ-3330
> URL: https://issues.apache.org/jira/browse/TEZ-3330
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
>





[jira] [Created] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-08 Thread Manuel Godbert (JIRA)
Manuel Godbert created TEZ-3330:
---

 Summary: Error on avro M/R job with Tez: missing configuration property
 Key: TEZ-3330
 URL: https://issues.apache.org/jira/browse/TEZ-3330
 Project: Apache Tez
  Issue Type: Bug
Reporter: Manuel Godbert


I tried running the simple Avro M/R job MapredColorCount, which I found in the 
examples of the Avro 1.7.7 release.
It failed with the following trace:
{code}
errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.apache.avro.Schema$Parser.parse(Schema.java:917)
at org.apache.avro.Schema.parse(Schema.java:966)
at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
... 6 more
{code}

Digging a bit, I saw that during shuffle Tez cannot access some of the job's 
configuration properties. In our example, it is avro.output.schema that is 
missing.
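
The failure mode can be reproduced without Hadoop, Tez or Avro on the classpath. The sketch below is only an illustration under assumptions: a plain HashMap stands in for the job configuration, and lookupSchema, openSchemaReader and failsWithNpe are hypothetical helpers, not Avro or Tez methods. It shows that a missing avro.output.schema value comes back as null, and that passing null to the StringReader constructor then raises the NullPointerException seen in the trace.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class MissingSchemaSketch {
    // Property name taken from the report above.
    static final String OUTPUT_SCHEMA = "avro.output.schema";

    // Hypothetical stand-in for a configuration lookup: returns null when the key is absent.
    static String lookupSchema(Map<String, String> conf) {
        return conf.get(OUTPUT_SCHEMA);
    }

    // Hypothetical helper: wraps the schema JSON in a reader, as a parser would before parsing.
    static StringReader openSchemaReader(Map<String, String> conf) {
        return new StringReader(lookupSchema(conf));
    }

    // Returns true when opening the reader fails with a NullPointerException.
    static boolean failsWithNpe(Map<String, String> conf) {
        try {
            openSchemaReader(conf).close();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> submitted = new HashMap<>();
        submitted.put(OUTPUT_SCHEMA, "\"string\"");

        // Models the shuffle-side view in which the property was not propagated.
        Map<String, String> seenDuringShuffle = new HashMap<>();

        System.out.println("client side fails: " + failsWithNpe(submitted));
        System.out.println("shuffle side fails: " + failsWithNpe(seenDuringShuffle));
    }
}
```

Only the second call fails, which matches the symptom: the schema is present where the job is configured but absent where the comparator is instantiated.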

With some more complicated code I could get one step further, and a similar 
issue occurred when the ValuesIterator for the reducer was being built:
{code}
java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.apache.avro.Schema$Parser.parse(Schema.java:917)
at org.apache.avro.Schema.parse(Schema.java:966)
at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80)
at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287)
{code}

I am using HDP2.4, Tez 0.7.0, avro 1.7.4


