Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12952#discussion_r62367029
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -340,7 +340,18 @@ class Dataset[T] private[sql](
        */
       // This is declared with parentheses to prevent the Scala compiler from treating
       // `ds.toDF("1")` as invoking this toDF and then apply on the returned DataFrame.
    -  def toDF(): DataFrame = new Dataset[Row](sparkSession, queryExecution, RowEncoder(schema))
    +  def toDF(): DataFrame = {
    +    val rowEncoder = RowEncoder(schema)
    +
    +    if (schema == logicalPlan.schema) {
    +      new Dataset[Row](sparkSession, queryExecution, rowEncoder)
    +    } else {
    +      // SPARK-15112: Adjust output column order so that query plan schema and encoder schema are
    +      // consistent in the result DataFrame
    +      val output = schema.map(f => UnresolvedAttribute(f.name))
    +      new Dataset[Row](sparkSession, Project(output, logicalPlan), rowEncoder)
    --- End diff --
    
    After rethinking it, I should probably remove this projection, since it only
    handles top-level columns: fields of nested inner structs may also be out of
    order. Essentially, this requires a simplified version of the de/serializer
    (one that doesn't instantiate Java objects) to recursively adjust all field
    orders.
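    To illustrate the recursive adjustment described above, here is a minimal
    sketch in plain Scala (not Catalyst code). `DType`, `Struct`, and `reorder`
    are hypothetical names for illustration only; a top-level `Project` only
    fixes the outermost field order, whereas this walk also fixes nested
    structs:

    ```scala
    // Hypothetical, simplified model of a schema: a field is either an
    // atomic type or a struct of named fields whose order matters.
    sealed trait DType
    case object Atom extends DType
    case class Struct(fields: List[(String, DType)]) extends DType

    // Recursively rewrite `actual` so that every struct's field order
    // matches `expected`, at all nesting levels (not just the top level).
    def reorder(expected: DType, actual: DType): DType = (expected, actual) match {
      case (Struct(ef), Struct(af)) =>
        val byName = af.toMap
        Struct(ef.map { case (name, et) => name -> reorder(et, byName(name)) })
      case _ => actual
    }
    ```

    A top-level-only projection would correspond to applying the `Struct` case
    once without recursing into `et`/`byName(name)`, which is exactly why inner
    struct fields can remain out of order.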

