[GitHub] [spark] viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file

2020-10-16 Thread GitBox


viirya commented on a change in pull request #26312:
URL: https://github.com/apache/spark/pull/26312#discussion_r506086784



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
     } catch {
       case e: FetchFailedException =>
         throw e
+      case f: FileAlreadyExistsException =>

Review comment:
   I see. Thanks for the details. We have different standpoints: for your cases the first option looks like the better choice. The customers we had are using HDFS, where `FileAlreadyExistsException` isn't recoverable, so the pain point is the extra time spent on a job that is going to fail anyway.
   
   I believe that even after SPARK-27194 is resolved, fast-failing a job on `FileAlreadyExistsException`, or on other errors we know in advance are unrecoverable, is still useful.
   
   Seems to me there are two options: one is to revert this completely, the second is to add a config for the fast-fail behavior and set it to false by default. I prefer the second one for the reason above: we can relieve the pain of wasting time on a failed job if users want it.
   
   WDYT?
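   
   For concreteness, a minimal sketch of the second option. This is hedged: the config name, the `fastFailEnabled` plumbing, and the locally defined exception class are illustrative assumptions, not the actual `FileFormatWriter` change.
   
   ```scala
   import org.apache.hadoop.fs.FileAlreadyExistsException
   
   // Local stand-in for the PR's private[spark] TaskOutputFileAlreadyExistException,
   // defined here only so the sketch compiles on its own.
   class TaskOutputFileAlreadyExistException(cause: Throwable) extends Exception(cause)
   
   object FastFailSketch {
     // In Spark this flag would come from a (hypothetical) config entry such as
     // "spark.sql.files.fastFailOnFileAlreadyExists", defaulting to false.
     val fastFailEnabled: Boolean = false
   
     def runWriteTask(writeTask: () => Unit): Unit = {
       try {
         writeTask()
       } catch {
         case f: FileAlreadyExistsException if fastFailEnabled =>
           // Fast-fail path: wrap in a dedicated exception type so the scheduler
           // can recognize it and abort the task set instead of retrying.
           throw new TaskOutputFileAlreadyExistException(f)
         case f: FileAlreadyExistsException =>
           // Default path: rethrow and let the normal task retry mechanism proceed.
           throw f
       }
     }
   }
   ```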
   
   








[GitHub] [spark] viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file

2020-10-15 Thread GitBox


viirya commented on a change in pull request #26312:
URL: https://github.com/apache/spark/pull/26312#discussion_r506027849



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
     } catch {
       case e: FetchFailedException =>
         throw e
+      case f: FileAlreadyExistsException =>

Review comment:
   Created SPARK-33167 and will submit a PR in next days. Thanks.








[GitHub] [spark] viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file

2020-10-15 Thread GitBox


viirya commented on a change in pull request #26312:
URL: https://github.com/apache/spark/pull/26312#discussion_r506025571



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
     } catch {
       case e: FetchFailedException =>
         throw e
+      case f: FileAlreadyExistsException =>

Review comment:
   @cloud-fan Sure. Will submit a PR.








[GitHub] [spark] viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file

2020-10-15 Thread GitBox


viirya commented on a change in pull request #26312:
URL: https://github.com/apache/spark/pull/26312#discussion_r506020849



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
     } catch {
       case e: FetchFailedException =>
         throw e
+      case f: FileAlreadyExistsException =>

Review comment:
   It is not a rare case, actually. We had customers encounter this issue in production jobs occasionally, and other people have reported it as well. It matters because this `FileAlreadyExistsException` cannot be handled by user code: it happens internally in Spark, and users have no way to deal with it. Imagine your production jobs failing after 10+ hours because of `FileAlreadyExistsException`. I believe it is worth failing such jobs fast.
   
   It also sounds really unreliable to rely on task retries to handle a rare race that causes `FileAlreadyExistsException`. What if later attempts cannot succeed either? Seems to me a production job won't rely on that and needs to address the issue seriously. In most cases this can be handled by catching `FileAlreadyExistsException` and retrying in the caller. As for the rare case, if any, where some race causes `FileAlreadyExistsException` and nothing can be changed: it sounds pretty rare to me, and even if it really happens we can retry the call to the library/service to work around it. So I'm not sure we should sacrifice the cases seen in production for such a rare case.
   
   
   
   








[GitHub] [spark] viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file

2020-10-15 Thread GitBox


viirya commented on a change in pull request #26312:
URL: https://github.com/apache/spark/pull/26312#discussion_r505946915



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
     } catch {
       case e: FetchFailedException =>
         throw e
+      case f: FileAlreadyExistsException =>

Review comment:
   If users really want to retry, I believe the way to do it is to catch `FileAlreadyExistsException` in user code and retry there with a different filename, instead of relying on Spark to retry it.
   
   Relying on Spark's task attempts to retry file writing is unreliable: a task can fail for many reasons, not just because the file already exists.








[GitHub] [spark] viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file

2019-11-10 Thread GitBox
viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop 
task set if FileAlreadyExistsException was thrown when writing to output file
URL: https://github.com/apache/spark/pull/26312#discussion_r344586962
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/TaskOutputFileAlreadyExistException.scala
 ##
 @@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+/**
+ * Exception thrown when a task cannot write to an output file because the file already exists.
+ */
+private[spark] class TaskOutputFileAlreadyExistException(error: Throwable) extends Exception(error)
 
 Review comment:
   Seems not necessary? I saw there is `TaskNotSerializableException`, which does not extend it either.





[GitHub] [spark] viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file

2019-10-30 Thread GitBox
viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop 
task set if FileAlreadyExistsException was thrown when writing to output file
URL: https://github.com/apache/spark/pull/26312#discussion_r340926443
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
 ##
 @@ -735,4 +737,35 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
       assert(msg.contains("Cannot write nullable values to non-null column 's'"))
     }
   }
+
+  test("SPARK-29649: Stop task set if FileAlreadyExistsException was thrown") {
+    withSQLConf("fs.file.impl" -> classOf[FileExistingTestFileSystem].getName,
+      "fs.file.impl.disable.cache" -> "true") {
+      withTable("t") {
+        sql(
+          """
+            |create table t(i int, part1 int) using parquet
+            |partitioned by (part1)
+          """.stripMargin)
+
+        val df = Seq((1, 1)).toDF("i", "part1")
+        val err = intercept[SparkException] {
+          df.write.mode("overwrite").format("parquet").insertInto("t")
+        }
+        assert(err.getCause.getMessage.contains("can not write to output file: " +
+          "org.apache.hadoop.fs.FileAlreadyExistsException"))
 
 Review comment:
   I tried to test with the max failures config; however, in the test environment the max failures value is fixed at 1. Currently this checks that the exception message comes from the code path added in TaskSetManager.

