cloud-fan commented on a change in pull request #23606: [SPARK-26666][SQL] Support DSv2 overwrite and dynamic partition overwrite.
URL: https://github.com/apache/spark/pull/23606#discussion_r257112471
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##########
 @@ -144,14 +146,22 @@ object DataSourceV2Strategy extends Strategy {
       WriteToDataSourceV2Exec(writer, planLater(query)) :: Nil
 
     case AppendData(r: DataSourceV2Relation, query, _) =>
-      val writeBuilder = r.newWriteBuilder(query.schema)
-      writeBuilder match {
-        case s: SupportsSaveMode =>
-          val write = s.mode(SaveMode.Append).buildForBatch()
-          assert(write != null)
-          WriteToDataSourceV2Exec(write, planLater(query)) :: Nil
-        case _ => throw new AnalysisException(s"data source ${r.name} does not support SaveMode")
-      }
+      AppendDataExec(
+        r.table.asBatchWritable, r.options.toDataSourceOptions, planLater(query)) :: Nil
+
+    case OverwriteByExpression(r: DataSourceV2Relation, deleteExpr, query, _) =>
+      // fail if any filter cannot be converted. correctness depends on removing all matching data.
+      val filters = splitConjunctivePredicates(deleteExpr).map {
+        filter => DataSourceStrategy.translateFilter(deleteExpr).getOrElse(
+          throw new SparkException(s"Cannot translate expression to source filter: $filter"))
 
 Review comment:
   I agree that `AnalysisException` is not appropriate here, as this is not the analysis phase. But `SparkException` is worse: it should be used on the executor side, where it indicates that a Spark job has been launched.

   One option is to do this conversion in the analysis phase, i.e. the logical plan `OverwriteByExpression` would take an `Array[Filter]` as well. The downside is that advanced users who write Catalyst rules directly would no longer be able to support arbitrary expressions.

   So I think `AnalysisException` is OK here: this is a check we should do in the analysis phase, but we don't, for other reasons.
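To make the trade-off concrete, here is a minimal Scala sketch of doing the translation at analysis time. It uses toy stand-ins rather than Spark's real `Expression` and `Filter` classes: `Expr`, `SourceFilter`, `PlanningError` (standing in for `AnalysisException`), `analyzeOverwrite`, and the `OverwriteByFilters` node are all hypothetical. Each conjunct of the delete expression is translated on its own, and any conjunct without a filter equivalent is a hard error, because silently dropping it would change which rows the overwrite removes.

```scala
// Toy stand-ins (hypothetical) for Catalyst expressions and data source filters.
sealed trait Expr
case class And(left: Expr, right: Expr) extends Expr
case class EqualTo(column: String, value: Any) extends Expr
case class GreaterThan(column: String, value: Any) extends Expr
// A predicate with no source-filter equivalent, e.g. an opaque UDF call.
case class Udf(name: String, column: String) extends Expr

sealed trait SourceFilter
case class FilterEqualTo(attribute: String, value: Any) extends SourceFilter
case class FilterGreaterThan(attribute: String, value: Any) extends SourceFilter

// Stand-in for the analysis-time error type (AnalysisException in Spark).
class PlanningError(message: String) extends Exception(message)

// Hypothetical logical node that carries already-translated filters.
case class OverwriteByFilters(table: String, filters: Seq[SourceFilter])

// Break a predicate into its top-level AND-ed conjuncts.
def splitConjuncts(e: Expr): Seq[Expr] = e match {
  case And(left, right) => splitConjuncts(left) ++ splitConjuncts(right)
  case other            => Seq(other)
}

// Translate a single conjunct; None when the source cannot express it.
def translate(e: Expr): Option[SourceFilter] = e match {
  case EqualTo(c, v)     => Some(FilterEqualTo(c, v))
  case GreaterThan(c, v) => Some(FilterGreaterThan(c, v))
  case _                 => None
}

// All-or-nothing translation: every conjunct must convert, otherwise fail.
def toDeleteFilters(deleteExpr: Expr): Seq[SourceFilter] =
  splitConjuncts(deleteExpr).map { conjunct =>
    translate(conjunct).getOrElse(
      throw new PlanningError(s"Cannot translate expression to source filter: $conjunct"))
  }

// Intended to run during analysis, so failures surface before any job launches.
def analyzeOverwrite(table: String, deleteExpr: Expr): OverwriteByFilters =
  OverwriteByFilters(table, toDeleteFilters(deleteExpr))

// Usage: both conjuncts translate, so this succeeds.
val plan = analyzeOverwrite("events", And(EqualTo("day", "2019-02-15"), GreaterThan("hour", 10)))
println(plan)

// Usage: the UDF conjunct has no filter form, so this throws PlanningError.
val rejected = scala.util.Try(
  analyzeOverwrite("events", And(EqualTo("day", "2019-02-15"), Udf("isLate", "ts"))))
println(rejected.isFailure)
```

Once the node stores `Seq[SourceFilter]`, a predicate like the `Udf` case is rejected up front, even where a custom Catalyst rule might have handled the original expression; that is the downside described in the comment.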

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.