[jira] [Assigned] (SPARK-18672) Close recordwriter in SparkHadoopMapReduceWriter before committing
[ https://issues.apache.org/jira/browse/SPARK-18672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-18672:

    Assignee: Apache Spark

> Close recordwriter in SparkHadoopMapReduceWriter before committing
> ------------------------------------------------------------------
>
>                 Key: SPARK-18672
>                 URL: https://issues.apache.org/jira/browse/SPARK-18672
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>            Reporter: Hyukjin Kwon
>            Assignee: Apache Spark
>
> It seems some APIs such as {{PairRDDFunctions.saveAsHadoopDataset()}} do not close the record writer before issuing the commit for the task.
> On Windows, the output file in the temp directory is still open when the output committer tries to rename it from the temp directory to the output directory after writing finishes, so the move fails.
> It seems we should close the writer before committing the task, as other writers such as {{FileFormatWriter}} do.
> The identified failure is below:
> {code}
> FAILURE! - in org.apache.spark.JavaAPISuite
> writeWithNewAPIHadoopFile(org.apache.spark.JavaAPISuite) Time elapsed: 0.25 sec <<< ERROR!
> org.apache.spark.SparkException: Job aborted.
>     at org.apache.spark.JavaAPISuite.writeWithNewAPIHadoopFile(JavaAPISuite.java:1231)
> Caused by: org.apache.spark.SparkException:
> Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows
>     at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:182)
>     at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:100)
>     at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:99)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     at org.apache.spark.scheduler.Task.run(Task.scala:108)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Could not rename file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/_temporary/attempt_20161201005155__r_00_0 to file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/task_20161201005155__r_00
>     at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:436)
>     at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:415)
>     at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
>     at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
>     at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:153)
>     at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:167)
>     at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:156)
>     at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
>     at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:168)
>     ... 8 more
> Driver stacktrace:
>     at org.apache.spark.JavaAPISuite.writeWithNewAPIHadoopFile(JavaAPISuite.java:1231)
> Caused by: org.apache.spark.SparkException: Task failed while writing rows
> Caused by: java.io.IOException: Could not rename file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/_temporary/attempt_20161201005155__r_00_0 to file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/task_20161201005155__r_00
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18672) Close recordwriter in SparkHadoopMapReduceWriter before committing
[ https://issues.apache.org/jira/browse/SPARK-18672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-18672:

    Assignee:     (was: Apache Spark)

> Close recordwriter in SparkHadoopMapReduceWriter before committing
> ------------------------------------------------------------------
>
>                 Key: SPARK-18672
>                 URL: https://issues.apache.org/jira/browse/SPARK-18672
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>            Reporter: Hyukjin Kwon
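
The ordering the issue asks for (close the record writer first, then commit the task by renaming its output) can be sketched with the Java standard library alone. This is not Spark or Hadoop code and is not the patch for SPARK-18672. The class name, method name, and file names are hypothetical, and a plain file move stands in for the committer's rename. It also does not attempt to reproduce the Windows failure. It only shows the writer being closed before the rename runs.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Illustrative only: not Spark or Hadoop code. All names here are hypothetical.
final class CloseBeforeCommit {

    // Writes rows to attemptFile, closes the writer, then "commits" by
    // renaming attemptFile to committedFile. Returns the committed path.
    static Path writeAndCommit(Path attemptFile, Path committedFile, List<String> rows)
            throws IOException {
        // try-with-resources closes the writer, releasing the file handle,
        // when this block exits, which is before the rename below runs.
        try (BufferedWriter writer =
                 Files.newBufferedWriter(attemptFile, StandardCharsets.UTF_8)) {
            for (String row : rows) {
                writer.write(row);
                writer.newLine();
            }
        }
        // Commit step (assumed stand-in for the committer's rename):
        // only reached once the writer has been closed.
        return Files.move(attemptFile, committedFile);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("close-before-commit");
        Path committed = writeAndCommit(
            dir.resolve("attempt_0"), dir.resolve("task_0"), Arrays.asList("a", "b"));
        System.out.println(Files.readAllLines(committed)); // prints [a, b]
    }
}
```

Placing the rename after the try-with-resources block means the commit can never run while the writer still holds the file, even if a later change adds more work inside the block.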