[jira] [Commented] (SPARK-20886) HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null
[ https://issues.apache.org/jira/browse/SPARK-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171386#comment-16171386 ]

Steve Loughran commented on SPARK-20886:

No, but related. This patch handles the situation where the committer is a classic FileOutputCommitter (or a subclass, like the Parquet one), but returns null when you ask it for a working dir. SPARK-21549 looks like there are hard-coded expectations of a dest dir being set via the (private) config option used by FileOutputCommitter, with an NPE if it's not there.

> HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null
> ---
>
> Key: SPARK-20886
> URL: https://issues.apache.org/jira/browse/SPARK-20886
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Trivial
> Fix For: 2.3.0
>
> This is minor, and the root cause is my fault *elsewhere*, but it's the patch
> I used to track down the problem.
> If {{HadoopMapReduceCommitProtocol}} has a {{FileOutputCommitter}} for
> committing things, and *somehow* that's been configured with a
> {{JobAttemptContext}}, not a {{TaskAttemptContext}}, then the committer NPEs.
> A {{require()}} statement can validate the working path and so point the
> blame at whoever's code is confused.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20886) HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null
[ https://issues.apache.org/jira/browse/SPARK-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171227#comment-16171227 ]

Sergey Zhemzhitsky commented on SPARK-20886:

[~ste...@apache.org], [~hyukjin.kwon], does this patch also fix the NPE described in SPARK-21549?
[jira] [Commented] (SPARK-20886) HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null
[ https://issues.apache.org/jira/browse/SPARK-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024914#comment-16024914 ]

Apache Spark commented on SPARK-20886:

User 'steveloughran' has created a pull request for this issue:
https://github.com/apache/spark/pull/18111
[jira] [Commented] (SPARK-20886) HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null
[ https://issues.apache.org/jira/browse/SPARK-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024886#comment-16024886 ]

Steve Loughran commented on SPARK-20886:

Stack trace: after

{code}
2017-05-25 16:22:10,807 [dag-scheduler-event-loop] INFO scheduler.DAGScheduler (Logging.scala:logInfo(54)) - ResultStage 2 (apply at Transformer.scala:22) failed in 0.065 s due to Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:263)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:182)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:181)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: requirement failed: Committer has no workpath FileOutputCommitter{outputPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/tmp/spark-41b05e5f-93eb-4b1b-8e9d-7fd930641267, workPath=null, algorithmVersion=2, skipCleanup=false, ignoreCleanupFailures=false}
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.newTaskTempFile(HadoopMapReduceCommitProtocol.scala:78)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:291)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:305)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:249)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1365)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:252)
    ... 8 more
{code}
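The "after" trace above fails inside scala.Predef.require with a message that prints the committer state, including workPath=null. As a rough illustration of that kind of check, here is a minimal, self-contained sketch. It is not the code from the pull request: StubCommitter, WorkPathCheck and taskTempFile are hypothetical stand-ins invented for this example, and no Hadoop or Spark classes are used.

```scala
// Hypothetical stand-in for a Hadoop FileOutputCommitter; it models only
// the two fields that appear in the failure message (an assumption made
// purely for illustration).
final case class StubCommitter(outputPath: String, workPath: String) {
  def getWorkPath: String = workPath
}

object WorkPathCheck {
  // Validate the work path up front, so the failure names the committer
  // and its state instead of surfacing later as a bare NullPointerException.
  def taskTempFile(committer: StubCommitter, fileName: String): String = {
    require(committer.getWorkPath != null,
      s"Committer has no workpath $committer")
    s"${committer.getWorkPath}/$fileName"
  }

  def main(args: Array[String]): Unit = {
    // A committer with a work path yields a temp file under that path.
    val ok = StubCommitter("file:/tmp/out", "file:/tmp/out/_temporary/attempt_0")
    println(taskTempFile(ok, "part-00000"))

    // A committer without one fails fast with a descriptive message.
    val broken = StubCommitter("file:/tmp/out", null)
    val message =
      try {
        taskTempFile(broken, "part-00000")
        "no failure"
      } catch {
        case e: IllegalArgumentException => e.getMessage
      }
    println(message)
  }
}
```

require throws an IllegalArgumentException whose message is prefixed with "requirement failed: ", which matches the shape of the "Caused by" line in the trace. Putting the committer's string form in the message is what makes the null work path visible to whoever reads the log.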
[jira] [Commented] (SPARK-20886) HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null
[ https://issues.apache.org/jira/browse/SPARK-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024885#comment-16024885 ]

Steve Loughran commented on SPARK-20886:

Stack trace: before

{code}
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
    ...
Cause: org.apache.spark.SparkException: Task failed while writing rows
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:263)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:182)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:181)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    ...
Cause: java.lang.NullPointerException:
    at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.newTaskTempFile(HadoopMapReduceCommitProtocol.scala:76)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:291)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:305)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:249)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1365)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:252)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:182)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:181)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
{code}