GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/8228

    [SPARK-10005] [SQL] Fixes schema merging for nested structs

    When schema merging is enabled, we only handled first-level fields when converting Parquet groups to `InternalRow`s; nested struct fields were not handled properly.
    
    For example, the schema of a Parquet file to be read can be:
    
    ```
    message individual {
      required group f1 {
        optional binary f11 (utf8);
      }
    }
    ```
    
    while the global schema is:
    
    ```
    message global {
      required group f1 {
        optional binary f11 (utf8);
        optional int32 f12;
      }
    }
    ```
    
    This PR fixes the issue by padding missing fields with nulls when creating the actual converters, so that rows read from files with narrower schemas conform to the global schema.
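
    The following spark-shell snippet is a minimal sketch of the scenario above (it is not part of this PR; the scratch path `/tmp/spark-10005` and the case class names are purely illustrative):

    ```
    // Minimal reproduction sketch, assuming a Spark 1.5 spark-shell with a
    // `sqlContext` in scope; the path and case class names are illustrative.
    import sqlContext.implicits._

    case class F1Narrow(f11: String)          // nested struct in the file schema
    case class F1Wide(f11: String, f12: Int)  // nested struct in the global schema

    // Write two Parquet files whose nested struct `f1` has different widths.
    Seq(Tuple1(F1Narrow("foo"))).toDF("f1").write.parquet("/tmp/spark-10005/part=1")
    Seq(Tuple1(F1Wide("bar", 1))).toDF("f1").write.parquet("/tmp/spark-10005/part=2")

    // With schema merging enabled, the merged schema of `f1` contains both
    // `f11` and `f12`.  Rows read from part=1 should have the missing nested
    // field `f12` padded with null.
    sqlContext.read.option("mergeSchema", "true").parquet("/tmp/spark-10005").show()
    ```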

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-10005/nested-schema-merging

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8228
    
----
commit 63eb7644223a65edf3ca507c3a1604f01b61de22
Author: Cheng Lian <[email protected]>
Date:   2015-08-16T11:14:55Z

    Fixes schema merging for nested structs

----


