Re: [PR] [SPARK-48821][SQL] Support `Update` in `DataFrameWriterV2` [spark]

via GitHub Wed, 18 Sep 2024 06:34:29 -0700


grundprinzip commented on code in PR #47233:
URL: https://github.com/apache/spark/pull/47233#discussion_r1765071539



##########
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -1654,6 +1655,43 @@ class Dataset[T] private[sql](
     new MergeIntoWriterImpl[T](table, this, condition)
   }
 
+  /**
+   * Update rows in a table.
+   *
+   * Scala Example:
+   * {{{
+   *   spark.table("source")
+   *    .update(Map("salary" -> lit(200)))
+   *    .execute()
+   * }}}
+   * @param assignments A Map of column names to Column expressions representing the updates
+   *     to be applied.
+   * @group basic
+   * @since 4.0.0
+   */
+  def update(assignments: Map[String, Column]): Unit = {
+    updateInternal(assignments)
+  }
+
+  /**
+   * Update rows in a table that match a condition.
+   *
+   * Scala Example:
+   * {{{
+   *   spark.table("source")
+   *    .update(Map("salary" -> lit(200)), $"salary" === 100)
+   *    .execute()
+   * }}}
+   * @param assignments A Map of column names to Column expressions representing the updates
+   *     to be applied.
+   * @param condition the update condition
+   * @group basic
+   * @since 4.0.0
+   */
+  def update(assignments: Map[String, Column], condition: Column): Unit = {
+    updateInternal(assignments, Some(condition))
+  }

Review Comment:
   I think we are getting a bit too side-tracked by mapping this onto "one" particular SQL query instead of looking at it from a usability perspective.
   
   IMHO, there are a couple of ways to look at this:
   
   ```
   spark.table("simple").write.update(mapping, cond)
   ```
   
   This one is in line with the API of the existing DataFrame writer.
   
   ```
   spark.table("simple").update()
   ```
   
   This one doesn't follow the builder pattern of the writer and is much more of a new API.
   
   ```
   spark.catalog.getTable().update()
   ```
   
   This one seems a bit lazy: it trades the complexity of handling analysis errors on the input relation for a weirder API.
   
   Why not just keep it in the actual dataframe writer API?
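
   To make the comparison concrete, here is a minimal sketch of the first two call shapes. Everything in it (`Column`, `UpdatePlan`, `Writer`, `Dataset`, `UpdateShapes`) is a hypothetical stand-in written for illustration, not Spark's real classes and not the code in this PR; it only shows where the entry point would live in each case.

   ```scala
   // Hypothetical stand-ins for illustration; these are NOT Spark's real classes.
   final case class Column(expr: String)

   // What an update call would carry: target table, assignments, optional condition.
   final case class UpdatePlan(
       table: String,
       assignments: Map[String, Column],
       condition: Option[Column])

   // Writer-style entry point: update sits next to the other write operations.
   final class Writer(table: String) {
     def update(assignments: Map[String, Column], condition: Option[Column] = None): UpdatePlan =
       UpdatePlan(table, assignments, condition)
   }

   final class Dataset(table: String) {
     // Shape 1: reach update through the writer, as in `.write.update(mapping, cond)`.
     def write: Writer = new Writer(table)

     // Shape 2: update directly on the dataset, as in `.update(...)`.
     def update(assignments: Map[String, Column], condition: Option[Column] = None): UpdatePlan =
       UpdatePlan(table, assignments, condition)
   }

   object UpdateShapes {
     def main(args: Array[String]): Unit = {
       val mapping = Map("salary" -> Column("200"))
       val cond = Some(Column("salary = 100"))
       val simple = new Dataset("simple")

       val viaWriter = simple.write.update(mapping, cond)
       val direct = simple.update(mapping, cond)

       // Both shapes describe the same update; only the entry point differs.
       println(viaWriter == direct)
     }
   }
   ```

   In this sketch both shapes carry the same information, so the choice is about discoverability and consistency with the existing writer rather than about what can be expressed.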
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


