So, the second attempts of those tasks that failed with the NPE can complete, and the job eventually finished?
On Mon, Jun 15, 2015 at 10:37 PM, Night Wolf <nightwolf...@gmail.com> wrote:

> Hey Yin,
>
> Thanks for the link to the JIRA. I'll add details to it. But I'm able to
> reproduce it, at least in the same shell session: every time I do a write,
> I get a random number of tasks failing on the first run with the NPE.
>
> We're using dynamic allocation of executors in YARN mode. Speculative
> execution is not enabled.
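For reference, a session like the one described above would look roughly as
follows when built by hand. This is a sketch only: the app name and values
are assumptions; the config keys themselves are standard Spark 1.x settings.

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reconstruction of the reporter's session settings.
val conf = new SparkConf()
  .setAppName("parquet-saveAsTable-repro")         // assumed name
  .set("spark.dynamicAllocation.enabled", "true")  // dynamic executor allocation on YARN
  .set("spark.shuffle.service.enabled", "true")    // required when dynamic allocation is on
  .set("spark.speculation", "false")               // speculative execution off
val sc = new SparkContext(conf)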
> On Tue, Jun 16, 2015 at 3:11 PM, Yin Huai <yh...@databricks.com> wrote:
>
>> I saw it once, but I was not clear on how to reproduce it. The JIRA I
>> created is https://issues.apache.org/jira/browse/SPARK-7837.
>>
>> More information will be very helpful. Were those errors from speculative
>> tasks or regular tasks (the first attempt of the task)? Is this error
>> deterministic (can you reproduce it every time you run this command)?
>>
>> Thanks,
>>
>> Yin
>>
>> On Mon, Jun 15, 2015 at 8:59 PM, Night Wolf <nightwolf...@gmail.com> wrote:
>>
>>> Looking at the logs of the executor, it looks like it fails to find the
>>> file; e.g. for task 10323.0:
>>>
>>> 15/06/16 13:43:13 ERROR output.FileOutputCommitter: Hit IOException trying to rename maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340_0000_m_010181_0/part-r-353626.gz.parquet to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353626.gz.parquet
>>> java.io.IOException: Invalid source or target
>>>   at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
>>>   at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> 15/06/16 13:43:13 ERROR mapred.SparkHadoopMapRedUtil: Error committing the output of task: attempt_201506161340_0000_m_010181_0
>>> java.io.IOException: Invalid source or target
>>>   at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
>>>   at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> 15/06/16 13:43:16 ERROR output.FileOutputCommitter: Hit IOException trying to rename maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161341_0000_m_010323_0/part-r-353768.gz.parquet to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353768.gz.parquet
>>> java.io.IOException: Invalid source or target
>>>   at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
>>>   at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> 15/06/16 13:43:16 ERROR mapred.SparkHadoopMapRedUtil: Error committing the output of task: attempt_201506161341_0000_m_010323_0
>>> java.io.IOException: Invalid source or target
>>>   at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
>>>   at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> 15/06/16 13:43:20 INFO codec.CodecConfig: Compression: GZIP
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet block size to 134217728
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet page size to 1048576
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Dictionary is on
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Validation is off
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
>>> 15/06/16 13:43:20 INFO codec.CodecConfig: Compression: GZIP
>>> 15/06/16 13:43:20 INFO codec.CodecConfig: Compression: GZIP
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet block size to 134217728
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet block size to 134217728
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet page size to 1048576
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet page size to 1048576
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Dictionary is on
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Dictionary is on
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Validation is off
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Validation is off
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
>>> 15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
>>> 15/06/16 13:43:20 ERROR fs.MapRFileSystem: Failed to delete path maprfs:/user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340_0000_m_010181_0, error: No such file or directory (2)
>>> 15/06/16 13:43:21 ERROR sources.DefaultWriterContainer: Task attempt attempt_201506161340_0000_m_010181_0 aborted.
>>> 15/06/16 13:43:21 ERROR sources.InsertIntoHadoopFsRelation: Aborting task.
>>> java.lang.RuntimeException: Failed to commit task
>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:398)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.io.IOException: Invalid source or target
>>>   at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
>>>   at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
>>>   ... 9 more
>>> 15/06/16 13:43:21 ERROR fs.MapRFileSystem: Failed to delete path maprfs:/user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161341_0000_m_010323_0, error: No such file or directory (2)
>>> 15/06/16 13:43:21 INFO compress.CodecPool: Got brand-new compressor [.gz]
>>> 15/06/16 13:43:21 INFO compress.CodecPool: Got brand-new compressor [.gz]
>>> 15/06/16 13:43:21 INFO compress.CodecPool: Got brand-new compressor [.gz]
>>> 15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: at row 0. reading next block
>>> 15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: at row 0. reading next block
>>> 15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: at row 0. reading next block
>>> 15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: block read in memory in 124 ms. row count = 998525
>>> 15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: block read in memory in 201 ms. row count = 983534
>>> 15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: block read in memory in 217 ms. row count = 970355
>>> 15/06/16 13:43:22 ERROR sources.DefaultWriterContainer: Task attempt attempt_201506161341_0000_m_010323_0 aborted.
>>> 15/06/16 13:43:22 ERROR sources.InsertIntoHadoopFsRelation: Aborting task.
>>> java.lang.RuntimeException: Failed to commit task
>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:398)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.io.IOException: Invalid source or target
>>>   at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
>>>   at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
>>>   at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
>>>   ... 9 more
>>> 15/06/16 13:43:22 ERROR fs.MapRFileSystem: Failed to delete path maprfs:/user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161341_0000_m_010323_0, error: No such file or directory (2)
>>> 15/06/16 13:43:22 ERROR fs.MapRFileSystem: Failed to delete path maprfs:/user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340_0000_m_010181_0, error: No such file or directory (2)
>>> 15/06/16 13:43:22 ERROR sources.DefaultWriterContainer: Task attempt attempt_201506161341_0000_m_010323_0 aborted.
>>> 15/06/16 13:43:22 ERROR sources.DefaultWriterContainer: Task attempt attempt_201506161340_0000_m_010181_0 aborted.
>>> 15/06/16 13:43:22 ERROR executor.Executor: Exception in task 10323.0 in stage 0.0 (TID 8896)
>>> java.lang.NullPointerException
>>>   at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
>>>   at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
>>>   at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
>>>   at org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> 15/06/16 13:43:22 ERROR executor.Executor: Exception in task 10181.0 in stage 0.0 (TID 8835)
>>> java.lang.NullPointerException
>>>   at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
>>>   at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
>>>   at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
>>>   at org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> 15/06/16 13:43:22 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 9552
>>> 15/06/16 13:43:22 INFO executor.Executor: Running task 11093.0 in stage 0.0 (TID 9552)
>>> 15/06/16 13:43:22 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 9553
>>> 15/06/16 13:43:22 INFO executor.Executor: Running task 10323.1 in stage 0.0 (TID 9553)
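Reading the two errors together: the task first fails to commit (the MapR
rename above), InsertIntoHadoopFsRelation then calls abortTask, and abortTask
closes the Parquet writer a second time. The NPE in flushRowGroupToStore is
consistent with a writer whose internal row-group store was already torn down
by the first close, so the NPE is likely masking the real commit failure. A
minimal sketch of that double-close shape (hypothetical demo class, not the
actual parquet-mr code):

// Hypothetical writer showing why a second close() can throw an NPE:
// the first close() flushes and nulls its internal store, so a cleanup
// path that closes again dereferences null. This mirrors the shape of
// the failure above; it is not the real InternalParquetRecordWriter.
class DemoRecordWriter {
  private var store: java.io.StringWriter = new java.io.StringWriter

  def write(row: String): Unit = store.write(row)

  def close(): Unit = {
    store.flush() // stand-in for flushRowGroupToStore; NPE when store is null
    store = null  // internal state torn down after the first close
  }
}

object DoubleCloseDemo extends App {
  val writer = new DemoRecordWriter
  writer.write("some row")
  writer.close()     // the write path closes the writer before committing
  // commitTask then fails (e.g. the rename above) and the abort path runs:
  try writer.close() // second close -> java.lang.NullPointerException
  catch { case e: NullPointerException => println("abort-path close failed: " + e) }
}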
>>> On Tue, Jun 16, 2015 at 1:47 PM, Night Wolf <nightwolf...@gmail.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> Using Spark 1.4, I'm trying to save a DataFrame as a table, a really
>>>> simple test, but I'm getting a bunch of NPEs.
>>>>
>>>> The code I'm running is very simple:
>>>>
>>>> qc.read.parquet("/user/sparkuser/data/staged/item_sales_basket_id.parquet").write.format("parquet").saveAsTable("is_20150617_test2")
>>>>
>>>> Logs of the lost tasks:
>>>>
>>>> [Stage 0:=================================> (8771 + 450) / 13000]
>>>> 15/06/16 03:42:30 WARN TaskSetManager: Lost task 10681.0 in stage 0.0 (TID 8757, qtausc-pphd0146): java.lang.NullPointerException
>>>>   at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
>>>>   at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
>>>>   at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
>>>>   at org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
>>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
>>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
>>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> [Stage 0:==================================> (9006 + 490) / 13000]
>>>> 15/06/16 03:43:22 WARN TaskSetManager: Lost task 10323.0 in stage 0.0 (TID 8896, qtausc-pphd0167): java.lang.NullPointerException
>>>>   at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
>>>>   at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
>>>>   at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
>>>>   at org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
>>>>   at org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
>>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
>>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>>   at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
>>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>   at java.lang.Thread.run(Thread.java:745)
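Worth noting for anyone hitting this on MapR-FS: the root cause in these logs
is FileOutputCommitter's rename out of _temporary failing, not the NPE itself.
One avenue that may be worth testing is Spark 1.4's direct Parquet output
committer, which writes files straight to the destination and skips the
rename. A sketch only, assuming the spark.sql.parquet.output.committer.class
setting and the DirectParquetOutputCommitter class are present in your build
and honored by this write path; it is not a confirmed fix for this bug:

// Hypothetical workaround sketch: swap in the rename-free committer before
// writing. The "_direct" table name is just an illustrative choice.
qc.setConf(
  "spark.sql.parquet.output.committer.class",
  "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")

qc.read.parquet("/user/sparkuser/data/staged/item_sales_basket_id.parquet")
  .write.format("parquet").saveAsTable("is_20150617_test2_direct")

The trade-off is that the direct committer writes in place, so failed or
speculative tasks can leave partial files behind; that is why it is not the
default.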