[jira] [Updated] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-18 Thread Piyush Narang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piyush Narang updated TEZ-3348:
---
Attachment: 11.patch

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Attachments: 11.patch, 11.patch
>
>
> I am trying to run some Tez MR jobs that write data to HDFS using Parquet. 
> When I do so, I end up seeing an NPE in the Parquet code:
> {code}
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.Path.<init>(Path.java:105)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:94)
>   at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getDefaultWorkFile(DeprecatedParquetOutputFormat.java:69)
>   at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.access$100(DeprecatedParquetOutputFormat.java:36)
>   at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat$RecordWriterWrapper.<init>(DeprecatedParquetOutputFormat.java:89)
>   at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getRecordWriter(DeprecatedParquetOutputFormat.java:77)
>   at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:416)
> {code}
> The flow seems to be:
> 1) The Parquet deprecated output format class tries to read the 
> workOutputPath - 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetOutputFormat.java#L69
> 2) This calls FileOutputFormat.getWorkOutputPath(...) - 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileOutputFormat.java#L229
> 3) That in turn reads the value stored under JobContext.TASK_OUTPUT_DIR 
> ("mapreduce.task.output.dir") from the job configuration. 
> 4) This value ends up being null, and the Parquet code then hits an NPE in 
> the Path class. 
> Looking at the Tez code, we are setting the workOutputPath in the 
> MROutput.initCommitter method - 
> https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/output/MROutput.java#L445.
>  
> This call, however, is made after the workOutputPath has already been 
> accessed as part of outputFormat.getRecordWriter(). 
> I tried out a run where I moved this initCommitter call up:
> {code}
> else {
>   oldApiTaskAttemptContext =
>       new org.apache.tez.mapreduce.hadoop.mapred.TaskAttemptContextImpl(
>           jobConf, taskAttemptId,
>           new MRTaskReporter(getContext()));
>   initCommitter(jobConf, useNewApi); // before the getRecordWriter call
>   oldOutputFormat = jobConf.getOutputFormat();
>   outputFormatClassName = oldOutputFormat.getClass().getName();
>   FileSystem fs = FileSystem.get(jobConf);
>   String finalName = getOutputName();
>   oldRecordWriter =
>       oldOutputFormat.getRecordWriter(
>           fs, jobConf, finalName, new MRReporter(getContext().getCounters()));
> }
> {code}
> I tried out a run with this and it seems to succeed. If this sounds 
> reasonable, I can cut a PR. 
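
The ordering problem described above can be illustrated with a small, self-contained sketch. This is not the Tez, Hadoop, or Parquet code: the class InitOrderSketch, the map standing in for the job configuration, the example directory, and the simplified initCommitter/getRecordWriter methods are all hypothetical stand-ins. Only the key "mapreduce.task.output.dir" comes from the report. The sketch assumes the writer fails with a NullPointerException whenever that key has not been set before it is read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class InitOrderSketch {
    // Configuration key named in the report; everything else is a stand-in.
    static final String TASK_OUTPUT_DIR = "mapreduce.task.output.dir";

    // Hypothetical stand-in for the step that publishes the work output path.
    static void initCommitter(Map<String, String> conf) {
        conf.put(TASK_OUTPUT_DIR, "/tmp/out/_temporary/attempt_0");
    }

    // Hypothetical stand-in for an output format that derives its file path
    // from the configured work output directory.
    static String getRecordWriter(Map<String, String> conf, String name) {
        String dir = conf.get(TASK_OUTPUT_DIR); // null if the committer has not run yet
        // Models the NPE that the report observes inside the Path constructor.
        Objects.requireNonNull(dir, "work output path");
        return dir + "/" + name;
    }

    // committerFirst = true models the proposed ordering; false models the failing one.
    static String initialize(boolean committerFirst) {
        Map<String, String> conf = new HashMap<>();
        if (committerFirst) {
            initCommitter(conf);
        }
        String writerPath = getRecordWriter(conf, "part-00000");
        if (!committerFirst) {
            initCommitter(conf); // too late: the writer already needed the path
        }
        return writerPath;
    }

    public static void main(String[] args) {
        System.out.println(initialize(true));
        try {
            initialize(false);
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

With the committer step first, the writer sees a non-null directory. With the original ordering, the lookup returns null and the writer fails, which is the shape of the failure in the stack trace.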



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-18 Thread Piyush Narang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piyush Narang updated TEZ-3348:
---
Attachment: (was: 11.patch)

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Attachments: 11.patch

[jira] [Updated] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-18 Thread Piyush Narang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piyush Narang updated TEZ-3348:
---
Attachment: (was: 11.patch)

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Attachments: 11.patch

[jira] [Updated] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-18 Thread Piyush Narang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piyush Narang updated TEZ-3348:
---
Attachment: (was: 11.patch)

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Attachments: 11.patch

[jira] [Updated] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-15 Thread Piyush Narang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piyush Narang updated TEZ-3348:
---
Attachment: 11.patch

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Attachments: 11.patch, 11.patch, 11.patch, 11.patch

[jira] [Updated] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-15 Thread Piyush Narang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piyush Narang updated TEZ-3348:
---
Attachment: 11.patch

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Attachments: 11.patch, 11.patch, 11.patch

[jira] [Updated] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-15 Thread Piyush Narang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piyush Narang updated TEZ-3348:
---
Attachment: 11.patch

Updated patch

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Attachments: 11.patch, 11.patch

[jira] [Updated] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-15 Thread Piyush Narang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piyush Narang updated TEZ-3348:
---
Attachment: (was: 11_07142016.patch)

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Attachments: 11.patch

[jira] [Updated] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-14 Thread Piyush Narang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piyush Narang updated TEZ-3348:
---
Attachment: 11_07142016.patch

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Attachments: 11.patch, 11_07142016.patch

[jira] [Updated] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-14 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3348:
-
Assignee: Piyush Narang

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Attachments: 11.patch
>
>




[jira] [Updated] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-14 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3348:
-
Attachment: 11.patch

Attaching patch from pull request to trigger pre-commit 

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Piyush Narang
> Attachments: 11.patch
>
>



