EnricoMi commented on a change in pull request #26969: [SPARK-30319][SQL] Add a
stricter version of `as[T]`
URL: https://github.com/apache/spark/pull/26969#discussion_r369655551
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -495,6 +495,25 @@ class Dataset[T] private[sql](
select(newCols : _*)
}
+ /**
+ * Returns a new Dataset where each record has been mapped on to the specified type.
+ * This only supports `U` being a class. Fields for the class will be mapped to columns of the
+ * same name (case sensitivity is determined by `spark.sql.caseSensitive`).
+ *
+ * If the schema of the Dataset does not match the desired `U` type, you can use `select`
+ * along with `alias` or `as` to rearrange or rename as required.
+ *
+ * This method eagerly projects away any columns that are not present in the specified class.
+ * It further guarantees the order of columns as well as data types to match `U`.
+ *
+ * @group basic
+ * @since 3.0.0
+ */
+ def toDS[U : Encoder]: Dataset[U] = {
+ val columns = implicitly[Encoder[U]].schema.fields.map(f => col(f.name).cast(f.dataType))
Review comment:
I have added checks to `toDS[T]` that throw a meaningful `AnalysisException`
when the columns do not line up with the encoder's schema. The method is now
general enough to support any encoder whose schema can be produced by
projection alone. That holds for the standard encoders of case classes
(excluding classes whose inner case classes have extra fields or a different
field order), for sequences and maps of case classes (again without extra
fields or a different field order), and for any Java class
(`Encoders.javaSerialization`). Where the columns do not line up with the
encoder's schema and `as[T]` would fail as well, it throws the same exception
as `as[T]`.
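
To make the projection concrete, here is a minimal sketch of the idea from
the diff, written as a standalone helper rather than as a `Dataset` method.
The names `Person`, `ProjectionSketch` and `projectTo` are hypothetical and
exist only for this illustration. The sketch does not include the additional
checks described above, so the error raised on a mismatch may differ from
what the PR produces.

```scala
import org.apache.spark.sql.{Dataset, Encoder, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example type, not part of the PR.
case class Person(name: String, age: Int)

object ProjectionSketch {
  // Hypothetical helper mirroring the projection in the diff: select the
  // encoder's fields by name, in schema order, and cast each one to the
  // data type the encoder expects.
  def projectTo[U: Encoder](ds: Dataset[_]): Dataset[U] = {
    val columns = implicitly[Encoder[U]].schema.fields
      .map(f => col(f.name).cast(f.dataType))
    ds.select(columns: _*).as[U]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("projection-sketch")
      .getOrCreate()
    import spark.implicits._

    // The columns are in a different order than in Person, `age` is a Long,
    // and `extra` has no counterpart in Person.
    val df = Seq((30L, "Alice", true)).toDF("age", "name", "extra")

    // The projection drops `extra`, reorders the columns and casts `age`.
    val people = projectTo[Person](df)
    people.printSchema()

    spark.stop()
  }
}
```

Plain `as[Person]` on the same input would keep the `extra` column in the
underlying schema, which is the difference the eager projection is meant to
address.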