Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail
So, the second attempt of those tasks that failed with the NPE can complete, and the job eventually finished?

On Mon, Jun 15, 2015 at 10:37 PM, Night Wolf nightwolf...@gmail.com wrote:

Hey Yin, thanks for the link to the JIRA. I'll add details to it. I am able to reproduce it, at least within the same shell session: every time I do a write, a random number of tasks fail on the first run with the NPE. I'm using dynamic allocation of executors in YARN mode; no speculative execution is enabled.

On Tue, Jun 16, 2015 at 3:11 PM, Yin Huai yh...@databricks.com wrote:

I saw it once but I was not clear how to reproduce it. The JIRA I created is https://issues.apache.org/jira/browse/SPARK-7837. More information will be very helpful. Were those errors from speculative tasks or from regular tasks (the first attempt of the task)? Is this error deterministic (can you reproduce it every time you run this command)?

Thanks,
Yin

On Mon, Jun 15, 2015 at 8:59 PM, Night Wolf nightwolf...@gmail.com wrote:

Looking at the logs of the executor, it looks like it fails to find the file; e.g. for task 10323.0.
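For context on the question above: by default Spark retries a failed task attempt before failing the stage, which is why a job can still finish even though the first attempts of some tasks died with the NPE. A minimal sketch of the relevant setting (illustrative only; the app name is made up, and 4 is Spark's default for spark.task.maxFailures):

import org.apache.spark.{SparkConf, SparkContext}

// A failed first attempt is retried; only after spark.task.maxFailures failed
// attempts of the same task does Spark fail the whole stage/job.
val conf = new SparkConf()
  .setAppName("task-retry-illustration")  // hypothetical app name
  .set("spark.task.maxFailures", "4")     // 4 is the default
val sc = new SparkContext(conf)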
Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail
What's the size of this table? Is the data skewed (so that speculation is probably triggered)?

Cheng

On 6/15/15 10:37 PM, Night Wolf wrote:

Hey Yin, thanks for the link to the JIRA. I'll add details to it. I am able to reproduce it, at least within the same shell session: every time I do a write, a random number of tasks fail on the first run with the NPE. I'm using dynamic allocation of executors in YARN mode; no speculative execution is enabled.

On Tue, Jun 16, 2015 at 3:11 PM, Yin Huai yh...@databricks.com wrote:

I saw it once but I was not clear how to reproduce it. The JIRA I created is https://issues.apache.org/jira/browse/SPARK-7837. More information will be very helpful. Were those errors from speculative tasks or from regular tasks (the first attempt of the task)? Is this error deterministic (can you reproduce it every time you run this command)?

Thanks,
Yin

On Mon, Jun 15, 2015 at 8:59 PM, Night Wolf nightwolf...@gmail.com wrote:

Looking at the logs of the executor, it looks like it fails to find the file; e.g. for task 10323.0.
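As a quick sanity check on Cheng's speculation question, the flag can be read back from the running shell (a minimal sketch; sc is the usual spark-shell SparkContext, and false is Spark's default for spark.speculation):

// Speculative execution is off unless spark.speculation was explicitly enabled.
val speculationEnabled = sc.getConf.getBoolean("spark.speculation", false)
println(s"spark.speculation = $speculationEnabled")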
Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail
Looking at the logs of the executor, it looks like it fails to find the file; e.g. for task 10323.0:

15/06/16 13:43:13 ERROR output.FileOutputCommitter: Hit IOException trying to rename maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340__m_010181_0/part-r-353626.gz.parquet to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353626.gz.parquet
java.io.IOException: Invalid source or target
    at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
    at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
    at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/16 13:43:13 ERROR mapred.SparkHadoopMapRedUtil: Error committing the output of task: attempt_201506161340__m_010181_0
java.io.IOException: Invalid source or target
    at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
    at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
    at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/16 13:43:16 ERROR output.FileOutputCommitter: Hit IOException trying to rename maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161341__m_010323_0/part-r-353768.gz.parquet to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353768.gz.parquet
java.io.IOException: Invalid source or target
    at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
    at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
    at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
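What that committer step does, roughly, is rename the task attempt's temporary file into the final table directory, so the "Invalid source or target" from MapR-FS is consistent with the temporary attempt file no longer being there at commit time, as noted above. A simplified sketch of that step using the plain Hadoop FileSystem API (paths here are hypothetical; this is not Spark's actual committer code):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Simplified view of FileOutputCommitter.commitTask: move the task attempt's
// output out of the _temporary attempt directory into the final table directory.
val conf = new Configuration()
val src  = new Path("maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/attempt_x/part-r-0.gz.parquet") // hypothetical attempt path
val dst  = new Path("maprfs:///user/hive/warehouse/is_20150617_test2/part-r-0.gz.parquet")                      // hypothetical final path

val fs = FileSystem.get(src.toUri, conf)
// If src has already been removed (e.g. by another attempt's cleanup), the
// rename cannot succeed -- which matches the IOException seen in the log above.
if (!fs.rename(src, dst)) {
  sys.error(s"rename $src -> $dst failed")
}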
Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail
Hi guys,

Using Spark 1.4, trying to save a DataFrame as a table (a really simple test), I'm getting a bunch of NPEs. The code I'm running is very simple:

qc.read.parquet("/user/sparkuser/data/staged/item_sales_basket_id.parquet").write.format("parquet").saveAsTable("is_20150617_test2")

Logs of the lost tasks:

[Stage 0:=(8771 + 450) / 13000]15/06/16 03:42:30 WARN TaskSetManager: Lost task 10681.0 in stage 0.0 (TID 8757, qtausc-pphd0146): java.lang.NullPointerException
    at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
    at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
    at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
    at org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
    at org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[Stage 0:== (9006 + 490) / 13000]15/06/16 03:43:22 WARN TaskSetManager: Lost task 10323.0 in stage 0.0 (TID 8896, qtausc-pphd0167): java.lang.NullPointerException
    at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
    at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
    at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
    at org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
    at org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
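If it helps isolate things, one way to narrow this down (just a sketch against the same qc session as above; the output path is made up) is to take the Hive metastore out of the picture and write the same data straight to a Parquet path:

// Same read, but writing directly to a path instead of saveAsTable, to see
// whether the NPE follows the metastore-backed table write or the Parquet
// file write/commit itself. The output path below is hypothetical.
val df = qc.read.parquet("/user/sparkuser/data/staged/item_sales_basket_id.parquet")
df.write.format("parquet").save("/user/sparkuser/data/out/is_20150617_test2_direct")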
Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail
I saw it once but I was not clear how to reproduce it. The JIRA I created is https://issues.apache.org/jira/browse/SPARK-7837. More information will be very helpful. Were those errors from speculative tasks or from regular tasks (the first attempt of the task)? Is this error deterministic (can you reproduce it every time you run this command)?

Thanks,
Yin

On Mon, Jun 15, 2015 at 8:59 PM, Night Wolf nightwolf...@gmail.com wrote:

Looking at the logs of the executor, it looks like it fails to find the file; e.g. for task 10323.0.
Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail
Hey Yin,

Thanks for the link to the JIRA. I'll add details to it. I am able to reproduce it, at least within the same shell session: every time I do a write, a random number of tasks fail on the first run with the NPE. I'm using dynamic allocation of executors in YARN mode; no speculative execution is enabled.

On Tue, Jun 16, 2015 at 3:11 PM, Yin Huai yh...@databricks.com wrote:

I saw it once but I was not clear how to reproduce it. The JIRA I created is https://issues.apache.org/jira/browse/SPARK-7837. More information will be very helpful. Were those errors from speculative tasks or from regular tasks (the first attempt of the task)? Is this error deterministic (can you reproduce it every time you run this command)?

Thanks,
Yin

On Mon, Jun 15, 2015 at 8:59 PM, Night Wolf nightwolf...@gmail.com wrote:

Looking at the logs of the executor, it looks like it fails to find the file; e.g. for task 10323.0.
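For reference, the setup described above (YARN with dynamic executor allocation, speculation off) corresponds roughly to a configuration like the following (illustrative values only; the app name is made up, and dynamic allocation on YARN also requires the external shuffle service):

import org.apache.spark.SparkConf

// Rough shape of the configuration described in the thread: dynamic allocation
// on YARN with speculation left at its default (off). Values are illustrative.
val conf = new SparkConf()
  .setAppName("parquet-write-repro")               // hypothetical app name
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")    // needed by dynamic allocation on YARN
  .set("spark.speculation", "false")               // the default; speculation not enabled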