Sergey Zhemzhitsky created SPARK-21549:
------------------------------------------

             Summary: Spark fails to abort job correctly in case of custom 
OutputFormat implementations
                 Key: SPARK-21549
                 URL: https://issues.apache.org/jira/browse/SPARK-21549
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0
         Environment: Spark 2.2.0, Scala 2.11
            Reporter: Sergey Zhemzhitsky
            Priority: Critical


Spark fails to abort a job correctly when a custom OutputFormat 
implementation is used.

Some OutputFormat implementations do not need the standard Hadoop property 
*mapreduce.output.fileoutputformat.outputdir* and therefore never set it.

[Spark, however, always reads this property from the 
configuration|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala#L79]
while setting up an OutputCommitter, so outputPath is null when the property is unset:
{code:javascript}
val committer = FileCommitProtocol.instantiate(
  className = classOf[HadoopMapReduceCommitProtocol].getName,
  jobId = stageId.toString,
  outputPath = conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
  isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
committer.setupJob(jobContext)
{code}
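
The null value can be reproduced outside Spark. The following is a minimal sketch, assuming hadoop-common is on the classpath; the object name NullOutputDirDemo and the helper pathError are illustrative and not part of Spark or Hadoop. It relies only on Configuration.get returning null for an unset key and on the Path(String) constructor rejecting null.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import scala.util.Try

// Illustrative only: mimics what happens when a custom OutputFormat
// never sets mapreduce.output.fileoutputformat.outputdir.
object NullOutputDirDemo {
  val OutputDirKey = "mapreduce.output.fileoutputformat.outputdir"

  // Returns the error message raised when building a Path from the
  // configured output directory, or None if the Path can be created.
  def pathError(conf: Configuration): Option[String] = {
    val outputPath: String = conf.get(OutputDirKey) // null when the key is unset
    Try(new Path(outputPath)).failed.toOption.map(_.getMessage)
  }

  def main(args: Array[String]): Unit = {
    // loadDefaults = false gives an empty configuration without *-site.xml files
    val unset = new Configuration(false)
    println(pathError(unset))

    val set = new Configuration(false)
    set.set(OutputDirKey, "/tmp/out")
    println(pathError(set))
  }
}
```

With the property unset, the helper reports the same IllegalArgumentException message that appears in the stack trace in this report; with the property set, no error is raised.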

In that case, if the job fails, Spark executes 
[committer.abortJob|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala#L106]
{code:javascript}
committer.abortJob(jobContext)
{code}
... and fails with the following exception
{code}
Can not create a Path from a null string
java.lang.IllegalArgumentException: Can not create a Path from a null string
  at org.apache.hadoop.fs.Path.checkPathArg(Path.java:123)
  at org.apache.hadoop.fs.Path.<init>(Path.java:135)
  at org.apache.hadoop.fs.Path.<init>(Path.java:89)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:58)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.abortJob(HadoopMapReduceCommitProtocol.scala:141)
  at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:106)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1084)
{code}
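
One possible way to avoid the failure is to compute the staging directory only when an output path is actually configured. The following is a rough sketch under that assumption, not the change adopted by Spark; StagingDirGuard, stagingDir, and abortJobAction are hypothetical names, and the "_temporary-" prefix is used here purely for illustration. It assumes hadoop-common is on the classpath.

```scala
import org.apache.hadoop.fs.Path

// Hypothetical guard: treat a null or empty output path as
// "this OutputFormat does not write to a file system directory".
object StagingDirGuard {
  // Builds the staging directory only if an output path is present.
  def stagingDir(outputPath: String, jobId: String): Option[Path] =
    Option(outputPath)          // Option(null) is None
      .filter(_.nonEmpty)
      .map(p => new Path(p, "_temporary-" + jobId))

  // Describes what an abort would do; a real implementation would
  // delete the directory through the Hadoop FileSystem API instead.
  def abortJobAction(outputPath: String, jobId: String): String =
    stagingDir(outputPath, jobId) match {
      case Some(dir) => s"delete staging directory $dir"
      case None      => "no output directory configured; skip staging cleanup"
    }
}
```

With such a guard, an abort for an OutputFormat that never sets the output directory would skip the staging cleanup rather than throw while constructing the Path.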


