[GitHub] [spark] viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file
viirya commented on a change in pull request #26312: URL: https://github.com/apache/spark/pull/26312#discussion_r506086784

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
     } catch {
       case e: FetchFailedException => throw e
+      case f: FileAlreadyExistsException =>

Review comment:
I see, thanks for the details. We are coming from different standpoints. For your cases the first option looks like the better choice. The customers we had are using HDFS, where `FileAlreadyExistsException` isn't recoverable, so the pain point is the extra time spent on a job that is already doomed to fail. Even once SPARK-27194 is resolved, I believe fast-failing a job on `FileAlreadyExistsException` (or on other errors we know in advance are unrecoverable) is still useful.

Seems to me there are two options: one is to revert this completely; the other is to add a config for the fast-fail behavior and set it to false by default. I prefer the second one for the reason above: users who want it can avoid wasting time on a job that is bound to fail. WDYT?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
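The config-gated option described above could look roughly like the following sketch. This is a minimal standalone model, not actual Spark code: the flag name `fastFailOnFileExists` and the `TaskOutputFileAlreadyExistException` wrapper stand in for whatever config key and exception type a follow-up PR would actually define, and `java.nio.file.FileAlreadyExistsException` substitutes for the Hadoop one.

```scala
import java.nio.file.FileAlreadyExistsException

// Hypothetical wrapper the scheduler could treat as a non-retryable failure.
class TaskOutputFileAlreadyExistException(cause: Throwable) extends Exception(cause)

// Sketch of a config-gated catch clause. The flag stands in for whatever
// SQL config key would actually be defined; false (no fast-fail) by default.
def handleWriteError(e: Throwable, fastFailOnFileExists: Boolean = false): Throwable =
  e match {
    case f: FileAlreadyExistsException if fastFailOnFileExists =>
      new TaskOutputFileAlreadyExistException(f) // scheduler can fail the task set fast
    case other => other // ordinary failure: normal task retry still applies
  }

val err = new FileAlreadyExistsException("/out/part-00000")
val fastFail = handleWriteError(err, fastFailOnFileExists = true)
val default = handleWriteError(err)
```

With the flag off, the exception propagates unchanged and the existing retry behavior is untouched, which is what makes a false default a safe compromise.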
viirya commented on a change in pull request #26312: URL: https://github.com/apache/spark/pull/26312#discussion_r506027849

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
     } catch {
       case e: FetchFailedException => throw e
+      case f: FileAlreadyExistsException =>

Review comment:
Created SPARK-33167 and will submit a PR in the next few days. Thanks.
viirya commented on a change in pull request #26312: URL: https://github.com/apache/spark/pull/26312#discussion_r506025571

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
     } catch {
       case e: FetchFailedException => throw e
+      case f: FileAlreadyExistsException =>

Review comment:
@cloud-fan Sure. Will submit a PR.
viirya commented on a change in pull request #26312: URL: https://github.com/apache/spark/pull/26312#discussion_r506020849

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
     } catch {
       case e: FetchFailedException => throw e
+      case f: FileAlreadyExistsException =>

Review comment:
It is not a rare case, actually. We had a customer hit this issue occasionally in production jobs, and other people have reported it as well. It is notable because this `FileAlreadyExistsException` cannot be handled by user code: it happens internally in Spark, and users have no way to deal with it. Imagine your production jobs failing after 10+ hours because of a `FileAlreadyExistsException`. I believe it is worth failing such jobs fast.

It sounds really unreliable to rely on retry to handle a rare race that causes `FileAlreadyExistsException`. What if later attempts cannot succeed either? Seems to me a production job won't rely on that and needs to address this issue seriously. In most cases this can be handled by catching `FileAlreadyExistsException` and retrying. As for the rare case, if any, where a race causes `FileAlreadyExistsException` and nothing can be changed: that sounds pretty unlikely to me, and even if it really happens we can retry the call to the library/service to solve it, so I'm not sure we should sacrifice the cases seen in production for such a rare one.
viirya commented on a change in pull request #26312: URL: https://github.com/apache/spark/pull/26312#discussion_r505946915

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
     } catch {
       case e: FetchFailedException => throw e
+      case f: FileAlreadyExistsException =>

Review comment:
If users really want to retry, I believe the way to do it is to catch `FileAlreadyExistsException` and retry in user code with a changed filename, instead of relying on Spark to retry it. Rely on Spark's task attempts for file-writing retry? That is unreliable, because a task failure can have many causes, not just the file already existing.
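The user-side retry described above might look like the sketch below. It uses `java.nio.file` purely for illustration (production code would go through the Hadoop `FileSystem` API instead); the helper name `writeWithRetry` and the `.retryN` suffix scheme are invented here.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{FileAlreadyExistsException, Files, Path, StandardOpenOption}

// Retry the write in user code, switching to a fresh filename on each
// FileAlreadyExistsException, instead of leaning on Spark task retry.
def writeWithRetry(dir: Path, base: String, data: String, maxAttempts: Int = 3): Path = {
  var attempt = 0
  while (attempt < maxAttempts) {
    val target = dir.resolve(if (attempt == 0) base else s"$base.retry$attempt")
    try {
      // CREATE_NEW fails if the file already exists, similar to an HDFS create.
      return Files.write(target, data.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE_NEW)
    } catch {
      case _: FileAlreadyExistsException => attempt += 1 // pick a new name, try again
    }
  }
  throw new FileAlreadyExistsException(s"$base: gave up after $maxAttempts attempts")
}

// Simulate the race: the intended output name is already taken.
val dir = Files.createTempDirectory("retry-demo")
Files.write(dir.resolve("part-00000"), "stale".getBytes(StandardCharsets.UTF_8))
val written = writeWithRetry(dir, "part-00000", "fresh data")
```

The key point of the sketch is that the caller controls the filename on each attempt, which is exactly what a Spark task attempt cannot do when the conflicting name is fixed by the committer.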
viirya commented on a change in pull request #26312: URL: https://github.com/apache/spark/pull/26312#discussion_r344586962

## File path: core/src/main/scala/org/apache/spark/TaskOutputFileAlreadyExistException.scala
##
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+/**
+ * Exception thrown when a task cannot write to the output file because the file already exists.
+ */
+private[spark] class TaskOutputFileAlreadyExistException(error: Throwable) extends Exception(error)

Review comment:
Seems not necessary? I saw there is `TaskNotSerializableException`, which does not extend it either.
viirya commented on a change in pull request #26312: URL: https://github.com/apache/spark/pull/26312#discussion_r340926443

## File path: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
##
@@ -735,4 +737,35 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
     assert(msg.contains("Cannot write nullable values to non-null column 's'"))
   }
 }
+
+  test("SPARK-29649: Stop task set if FileAlreadyExistsException was thrown") {
+    withSQLConf("fs.file.impl" -> classOf[FileExistingTestFileSystem].getName,
+      "fs.file.impl.disable.cache" -> "true") {
+      withTable("t") {
+        sql(
+          """
+            |create table t(i int, part1 int) using parquet
+            |partitioned by (part1)
+          """.stripMargin)
+
+        val df = Seq((1, 1)).toDF("i", "part1")
+        val err = intercept[SparkException] {
+          df.write.mode("overwrite").format("parquet").insertInto("t")
+        }
+        assert(err.getCause.getMessage.contains("can not write to output file: " +
+          "org.apache.hadoop.fs.FileAlreadyExistsException"))

Review comment:
I tried to test with the max failures config; however, in this test the max failures setting is fixed at 1. Currently the test checks that the exception message comes from the code path added in TaskSetManager.
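The test above relies on a `FileExistingTestFileSystem` registered via `fs.file.impl`, i.e. a filesystem whose create path always throws `FileAlreadyExistsException` so the error branch is hit deterministically. A dependency-free model of that trick (plain Scala with `java.nio.file`'s exception standing in for Hadoop's, and invented names throughout) looks like this:

```scala
import java.nio.file.FileAlreadyExistsException

// Minimal stand-in for a FileSystem: create() always fails as if the
// output file already existed, which is what the test needs in order
// to force the FileAlreadyExistsException code path deterministically.
trait Fs {
  def create(path: String): Unit
}

object FileExistingTestFs extends Fs {
  def create(path: String): Unit =
    throw new FileAlreadyExistsException(path)
}

// A writer that surfaces the unrecoverable error with a recognizable
// message, mirroring the "can not write to output file:" check above.
def writeOutput(fs: Fs, path: String): Either[String, Unit] =
  try Right(fs.create(path))
  catch {
    case f: FileAlreadyExistsException =>
      Left(s"can not write to output file: ${f.getClass.getName}")
  }

val result = writeOutput(FileExistingTestFs, "/tmp/t/part-00000")
```

Injecting a failing filesystem keeps the test independent of timing and of real storage state, which is why the real suite swaps the `file://` scheme implementation rather than trying to provoke an actual race.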