cloud-fan commented on code in PR #37011:
URL: https://github.com/apache/spark/pull/37011#discussion_r913404742


##########
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -464,6 +464,25 @@ class Dataset[T] private[sql](
    */
   def as[U : Encoder]: Dataset[U] = Dataset[U](sparkSession, logicalPlan)
 
+  /**
+   * Returns a new DataFrame where each row is reconciled to match the specified schema. Spark will:
+   * - Reorder columns and/or inner fields by name to match the specified schema.
+   * - Project away columns and/or inner fields that are not needed by the specified schema. Missing
+   *   columns and/or inner fields lead to failures.
+   * - Cast the columns and/or inner fields to match the data types in the specified schema, if the
+   *   types are compatible, e.g., numeric to numeric (error if overflows), but not string to int.
+   * - The columns and/or inner fields will carry the metadata from the specified schema, while
+   *   still keeping their own metadata if not overwritten by the specified schema.
+   * - Fail if the nullability is not compatible. For example, the column and/or inner field is
+   *   nullable but the specified schema requires it to be non-nullable.

Review Comment:
   My question is: what can go wrong if you expect a nullable column but it's actually non-nullable? If nothing can go wrong, I prefer keeping the current behavior.
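For reference, the reconciliation rules listed in the proposed scaladoc can be sketched in plain Scala without a Spark dependency. This is an illustrative model only, not Spark's actual implementation; `Field`, `reconcile`, and the sample schemas are hypothetical names invented for this sketch:

```scala
// Hypothetical model of the reconciliation rules described in the scaladoc:
// reorder by name, project away extras, fail on missing fields, and fail
// when a nullable source field maps to a non-nullable target field.
case class Field(name: String, dataType: String, nullable: Boolean)

object ReconcileSketch {
  def reconcile(source: Seq[Field], target: Seq[Field]): Seq[Field] = {
    val byName = source.map(f => f.name -> f).toMap
    target.map { t =>
      // Missing columns and/or inner fields lead to failures.
      val s = byName.getOrElse(t.name,
        throw new IllegalArgumentException(s"missing field: ${t.name}"))
      // Fail if the source is nullable but the target requires non-nullable.
      if (s.nullable && !t.nullable)
        throw new IllegalArgumentException(s"nullability mismatch: ${t.name}")
      // Carry over the target's declared nullability.
      s.copy(nullable = t.nullable)
    }
  }

  def main(args: Array[String]): Unit = {
    val src = Seq(
      Field("b", "int", nullable = false),
      Field("a", "string", nullable = true),
      Field("c", "int", nullable = true)) // projected away
    val tgt = Seq(
      Field("a", "string", nullable = true),
      Field("b", "int", nullable = false))
    println(reconcile(src, tgt).map(_.name).mkString(",")) // prints "a,b"
  }
}
```

Under this model, the reviewer's question is about the last rule: whether a non-nullable source field satisfying a nullable target field should ever be treated as an error.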



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
