cloud-fan commented on a change in pull request #23606: [SPARK-26666][SQL]
Support DSv2 overwrite and dynamic partition overwrite.
URL: https://github.com/apache/spark/pull/23606#discussion_r257112471
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
##########
@@ -144,14 +146,22 @@ object DataSourceV2Strategy extends Strategy {
WriteToDataSourceV2Exec(writer, planLater(query)) :: Nil
case AppendData(r: DataSourceV2Relation, query, _) =>
- val writeBuilder = r.newWriteBuilder(query.schema)
- writeBuilder match {
- case s: SupportsSaveMode =>
- val write = s.mode(SaveMode.Append).buildForBatch()
- assert(write != null)
- WriteToDataSourceV2Exec(write, planLater(query)) :: Nil
- case _ => throw new AnalysisException(s"data source ${r.name} does not support SaveMode")
- }
+ AppendDataExec(
+ r.table.asBatchWritable, r.options.toDataSourceOptions, planLater(query)) :: Nil
+
+ case OverwriteByExpression(r: DataSourceV2Relation, deleteExpr, query, _) =>
+ // fail if any filter cannot be converted. correctness depends on removing all matching data.
+ val filters = splitConjunctivePredicates(deleteExpr).map {
+ filter => DataSourceStrategy.translateFilter(deleteExpr).getOrElse(
+ throw new SparkException(s"Cannot translate expression to source filter: $filter"))
Review comment:
I agree that `AnalysisException` is not quite right here, since this is not the
analysis phase. But `SparkException` is worse: it should be used on the executor
side, where it indicates that a Spark job has been launched.
One thing we could do is perform this conversion during the analysis phase, i.e.
the logical plan `OverwriteByExpression` would take an `Array[Filter]` as well.
The downside is that advanced users who work with Catalyst rules directly would
not be able to support arbitrary expressions.
So I think `AnalysisException` is OK here: this is a check we should be doing in
the analysis phase, but we don't for other reasons.
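
To make the trade-off concrete, here is a minimal, self-contained sketch of the
"split the delete expression, translate every conjunct, fail if any cannot be
translated" step. It does not use Spark's classes: `Expr`, `SourceFilter`,
`TranslationException`, and the `OverwriteSketch` helpers are hypothetical
stand-ins for Catalyst expressions, `sources.Filter`, and whichever exception
type is chosen, so treat it as an illustration rather than the PR's code.

```scala
// Hypothetical stand-ins for Catalyst expressions; NOT Spark's real classes.
sealed trait Expr
case class And(left: Expr, right: Expr) extends Expr
case class EqualTo(column: String, value: Any) extends Expr
case class Udf(name: String) extends Expr // something a source cannot evaluate

// Hypothetical stand-in for a data source filter.
sealed trait SourceFilter
case class EqualToFilter(column: String, value: Any) extends SourceFilter

// Hypothetical stand-in for the planning/analysis-time error being discussed.
class TranslationException(message: String) extends Exception(message)

object OverwriteSketch {
  // Flatten nested ANDs into a flat list of conjuncts.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other => Seq(other)
  }

  // Translate a single conjunct; None if the source cannot express it.
  def translate(e: Expr): Option[SourceFilter] = e match {
    case EqualTo(c, v) => Some(EqualToFilter(c, v))
    case _ => None
  }

  // Translate every conjunct individually and fail on the first one that
  // cannot be converted, since correctness depends on deleting all matches.
  def translateAll(deleteExpr: Expr): Seq[SourceFilter] =
    splitConjuncts(deleteExpr).map { conjunct =>
      translate(conjunct).getOrElse(
        throw new TranslationException(
          s"Cannot translate expression to source filter: $conjunct"))
    }
}
```

If the logical plan carried the already-translated filters instead, a check
like `translateAll` would run during analysis, and the failure would naturally
be an analysis error; the cost is that untranslatable expressions could no
longer reach the planner at all.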