Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-17 Thread Yin Huai
So, the second attempts of those tasks that failed with the NPE can complete,
and the job eventually finished?

On Mon, Jun 15, 2015 at 10:37 PM, Night Wolf nightwolf...@gmail.com wrote:

 Hey Yin,

 Thanks for the link to the JIRA. I'll add details to it. But I'm able to
 reproduce it, at least within the same shell session: every time I do a
 write, I get a random number of tasks failing on the first run with the NPE.

 Using dynamic allocation of executors in YARN mode. Speculative
 execution is not enabled.

 On Tue, Jun 16, 2015 at 3:11 PM, Yin Huai yh...@databricks.com wrote:

 I saw it once, but I was not sure how to reproduce it. The jira I created
 is https://issues.apache.org/jira/browse/SPARK-7837.

 More information would be very helpful. Were those errors from speculative
 tasks or regular tasks (the first attempt of the task)? Is this error
 deterministic (can you reproduce it every time you run this command)?

 Thanks,

 Yin

 On Mon, Jun 15, 2015 at 8:59 PM, Night Wolf nightwolf...@gmail.com
 wrote:

 Looking at the logs of the executor, it looks like it fails to find the
 file; e.g. for task 10323.0:


 15/06/16 13:43:13 ERROR output.FileOutputCommitter: Hit IOException
 trying to rename
 maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340__m_010181_0/part-r-353626.gz.parquet
 to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353626.gz.parquet
 java.io.IOException: Invalid source or target
 at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
 at
 org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
 at
 org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
 at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
 $apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 15/06/16 13:43:13 ERROR mapred.SparkHadoopMapRedUtil: Error committing
 the output of task: attempt_201506161340__m_010181_0
 java.io.IOException: Invalid source or target
 at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
 at
 org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
 at
 org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
 at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
 $apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 15/06/16 13:43:16 ERROR output.FileOutputCommitter: Hit IOException
 trying to rename
 maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161341__m_010323_0/part-r-353768.gz.parquet
 to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353768.gz.parquet
 java.io.IOException: Invalid 

Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-17 Thread Cheng Lian
What's the size of this table? Is the data skewed (so that speculation
might be triggered)?
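
A quick way to check is to count rows per partition of the source data in
the shell. A minimal sketch, assuming the same qc context and reusing the
source path from the original post:

// rough skew check (sketch): rows in each partition of the source data
val df = qc.read.parquet("/user/sparkuser/data/staged/item_sales_basket_id.parquet")
val perPartition = df.rdd
  .mapPartitionsWithIndex((i, rows) => Iterator((i, rows.size)))
  .collect()
perPartition.sortBy(-_._2).take(10).foreach(println)  // heaviest partitions first

If a handful of partitions dominate, stragglers (and speculation, where
enabled) become much more likely.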


Cheng

On 6/15/15 10:37 PM, Night Wolf wrote:

Hey Yin,

Thanks for the link to the JIRA. I'll add details to it. But I'm able
to reproduce it, at least within the same shell session: every time I
do a write, I get a random number of tasks failing on the first run
with the NPE.


Using dynamic allocation of executors in YARN mode. Speculative
execution is not enabled.


On Tue, Jun 16, 2015 at 3:11 PM, Yin Huai yh...@databricks.com wrote:


I saw it once, but I was not sure how to reproduce it. The jira I
created is https://issues.apache.org/jira/browse/SPARK-7837.

More information would be very helpful. Were those errors from
speculative tasks or regular tasks (the first attempt of the
task)? Is this error deterministic (can you reproduce it every time
you run this command)?

Thanks,

Yin

On Mon, Jun 15, 2015 at 8:59 PM, Night Wolf nightwolf...@gmail.com wrote:

Looking at the logs of the executor, it looks like it fails to
find the file; e.g. for task 10323.0:


15/06/16 13:43:13 ERROR output.FileOutputCommitter: Hit
IOException trying to rename

maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340__m_010181_0/part-r-353626.gz.parquet
to
maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353626.gz.parquet
java.io.IOException: Invalid source or target
at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
at

org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
at

org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
at

org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
at

org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
at

org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
at

org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
at

org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
at

org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at

org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/06/16 13:43:13 ERROR mapred.SparkHadoopMapRedUtil: Error
committing the output of task:
attempt_201506161340__m_010181_0
java.io.IOException: Invalid source or target
at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
at

org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
at

org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
at

org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
at

org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
at

org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
at

org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
at

org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
at

org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at

org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at

Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-15 Thread Night Wolf
Looking at the logs of the executor, it looks like it fails to find the
file; e.g. for task 10323.0:


15/06/16 13:43:13 ERROR output.FileOutputCommitter: Hit IOException trying
to rename
maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340__m_010181_0/part-r-353626.gz.parquet
to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353626.gz.parquet
java.io.IOException: Invalid source or target
at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
at
org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
at
org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
at
org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
at
org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
at
org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at
org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/06/16 13:43:13 ERROR mapred.SparkHadoopMapRedUtil: Error committing the
output of task: attempt_201506161340__m_010181_0
java.io.IOException: Invalid source or target
at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
at
org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
at
org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
at
org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
at
org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
at
org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at
org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/06/16 13:43:16 ERROR output.FileOutputCommitter: Hit IOException trying
to rename
maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161341__m_010323_0/part-r-353768.gz.parquet
to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353768.gz.parquet
java.io.IOException: Invalid source or target
at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
at
org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
at
org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
at
org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
at
org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
at
org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at

Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-15 Thread Night Wolf
Hi guys,

Using Spark 1.4, I'm trying to save a DataFrame as a table. It's a really
simple test, but I'm getting a bunch of NPEs.

The code I'm running is very simple:

 
qc.read.parquet("/user/sparkuser/data/staged/item_sales_basket_id.parquet").write.format("parquet").saveAsTable("is_20150617_test2")

Logs of the lost tasks:

[Stage 0:=(8771 + 450) /
13000]15/06/16 03:42:30 WARN TaskSetManager: Lost task 10681.0 in stage 0.0
(TID 8757, qtausc-pphd0146): java.lang.NullPointerException
at
parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
at
parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
at
org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
at
org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
at
org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at
org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

[Stage 0:==   (9006 + 490) /
13000]15/06/16 03:43:22 WARN TaskSetManager: Lost task 10323.0 in stage 0.0
(TID 8896, qtausc-pphd0167): java.lang.NullPointerException
at
parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
at
parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
at
org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
at
org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
at
org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at
org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
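
A minimal variation for isolating whether the saveAsTable/metastore path is
involved would be a direct write to a plain location (a sketch; the target
directory below is hypothetical):

qc.read.parquet("/user/sparkuser/data/staged/item_sales_basket_id.parquet")
  .write.format("parquet")
  .save("/user/sparkuser/tmp/is_20150617_test2_direct")  // hypothetical scratch path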


Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-15 Thread Yin Huai
I saw it once, but I was not sure how to reproduce it. The jira I created
is https://issues.apache.org/jira/browse/SPARK-7837.

More information would be very helpful. Were those errors from speculative
tasks or regular tasks (the first attempt of the task)? Is this error
deterministic (can you reproduce it every time you run this command)?
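
As a quick check from the shell, this should show whether speculation is on
in the session (a sketch using the standard conf key):

// sketch: "false" is Spark's default when the key is unset
sc.getConf.get("spark.speculation", "false")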

Thanks,

Yin

On Mon, Jun 15, 2015 at 8:59 PM, Night Wolf nightwolf...@gmail.com wrote:

 Looking at the logs of the executor, it looks like it fails to find the
 file; e.g. for task 10323.0:


 15/06/16 13:43:13 ERROR output.FileOutputCommitter: Hit IOException trying
 to rename
 maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340__m_010181_0/part-r-353626.gz.parquet
 to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353626.gz.parquet
 java.io.IOException: Invalid source or target
 at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
 at
 org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
 at
 org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
 at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
 $apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 15/06/16 13:43:13 ERROR mapred.SparkHadoopMapRedUtil: Error committing the
 output of task: attempt_201506161340__m_010181_0
 java.io.IOException: Invalid source or target
 at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
 at
 org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
 at
 org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
 at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
 $apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 15/06/16 13:43:16 ERROR output.FileOutputCommitter: Hit IOException trying
 to rename
 maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161341__m_010323_0/part-r-353768.gz.parquet
 to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353768.gz.parquet
 java.io.IOException: Invalid source or target
 at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
 at
 

Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-15 Thread Night Wolf
Hey Yin,

Thanks for the link to the JIRA. I'll add details to it. But I'm able to
reproduce it, at least within the same shell session: every time I do a
write, I get a random number of tasks failing on the first run with the NPE.

Using dynamic allocation of executors in YARN mode. Speculative
execution is not enabled.
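
For reference, that setup corresponds to conf settings along these lines
(a sketch; normally these are passed via spark-shell --conf flags, and the
explicit spark.speculation entry just makes the default visible):

// sketch of the equivalent SparkConf for this setup
val conf = new org.apache.spark.SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")  // required by dynamic allocation on YARN
  .set("spark.speculation", "false")             // the default; shown explicitly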

On Tue, Jun 16, 2015 at 3:11 PM, Yin Huai yh...@databricks.com wrote:

 I saw it once, but I was not sure how to reproduce it. The jira I created
 is https://issues.apache.org/jira/browse/SPARK-7837.

 More information would be very helpful. Were those errors from speculative
 tasks or regular tasks (the first attempt of the task)? Is this error
 deterministic (can you reproduce it every time you run this command)?

 Thanks,

 Yin

 On Mon, Jun 15, 2015 at 8:59 PM, Night Wolf nightwolf...@gmail.com
 wrote:

 Looking at the logs of the executor, it looks like it fails to find the
 file; e.g. for task 10323.0:


 15/06/16 13:43:13 ERROR output.FileOutputCommitter: Hit IOException
 trying to rename
 maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340__m_010181_0/part-r-353626.gz.parquet
 to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353626.gz.parquet
 java.io.IOException: Invalid source or target
 at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
 at
 org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
 at
 org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
 at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
 $apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 15/06/16 13:43:13 ERROR mapred.SparkHadoopMapRedUtil: Error committing
 the output of task: attempt_201506161340__m_010181_0
 java.io.IOException: Invalid source or target
 at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
 at
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
 at
 org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
 at
 org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
 at
 org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
 at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org
 $apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 15/06/16 13:43:16 ERROR output.FileOutputCommitter: Hit IOException
 trying to rename
 maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161341__m_010323_0/part-r-353768.gz.parquet
 to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353768.gz.parquet
 java.io.IOException: Invalid source or target
 at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
 at