GitHub user AndreSchumacher opened a pull request:

    https://github.com/apache/spark/pull/360

    SPARK-1293 [SQL] WIP Parquet support for nested types

    It should be possible to import and export data stored in Parquet's 
columnar format that contains nested types. For example:
    ```java
    message AddressBook {
       required string owner;
       optional group ownerPhoneNumbers {
          repeated string values;
       }
       repeated group contacts {
          required string name;
          optional string phoneNumber;
       }
    }
    ```
    The example could model a type (AddressBook) that contains records made of 
strings (owner), lists (ownerPhoneNumbers), and a table of contacts (e.g., a 
list of pairs of a map). The tasks are as follows:
    
    <h6>Implement support for converting nested Parquet types to Spark/Catalyst 
types:</h6>
    - [x] Structs
    - [x] Lists
    - [ ] Maps
    
    <h6>Implement import (via ``parquetFile``) of nested Parquet types (first 
version in this PR)</h6>
    - [x] Initial version (without maps)
     
    <h6>Implement export (via ``saveAsParquetFile``)</h6>
    - [ ] Initial version (missing)
    
    <h6>Test support for AvroParquet, etc.</h6>
    
    Example:
    ```scala
    val data = TestSQLContext
      .parquetFile("input.dir")
      .toSchemaRDD
    data.registerAsTable("data")
    sql("SELECT owner, contacts[1].name FROM data").collect()
    ```
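    
    As a rough illustration of what the query above is meant to return, here is 
    a plain-Scala sketch that mirrors the AddressBook schema with case classes 
    and evaluates the same projection in memory. It does not use Spark or 
    Parquet. The names ``Contact``, ``NestedQuerySketch``, 
    ``ownerAndSecondContact`` and the sample records are made up for this 
    illustration. It also assumes that ``contacts[1]`` is zero-based and yields 
    a null (here ``None``) when the index is out of range.
    ```scala
    // Hypothetical in-memory mirror of the AddressBook schema (illustration only).
    // required -> plain field, optional -> Option, repeated -> Seq, group -> case class
    case class Contact(name: String, phoneNumber: Option[String])
    case class AddressBook(
        owner: String,
        ownerPhoneNumbers: Option[Seq[String]],
        contacts: Seq[Contact])

    object NestedQuerySketch {
      // Mirrors `SELECT owner, contacts[1].name FROM data`.
      // Assumption: indexing is zero-based and an out-of-range index gives None.
      def ownerAndSecondContact(books: Seq[AddressBook]): Seq[(String, Option[String])] =
        books.map(b => (b.owner, b.contacts.lift(1).map(_.name)))

      // Made-up sample records, not taken from the PR's test data.
      val sample: Seq[AddressBook] = Seq(
        AddressBook(
          "alice",
          Some(Seq("555 0100")),
          Seq(Contact("bob", Some("555 0101")), Contact("carol", None))),
        AddressBook("dave", None, Seq.empty))
    }
    ```
    Calling ``NestedQuerySketch.ownerAndSecondContact(NestedQuerySketch.sample)`` 
    gives one pair per record: the owner and the name of the second contact, if 
    there is one.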

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AndreSchumacher/spark nested_parquet

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/360.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #360
    
----
commit 7318fe19eac8caf471feec8e4830a538aa647770
Author: Andre Schumacher <[email protected]>
Date:   2014-03-26T07:46:10Z

    Adding conversion of nested Parquet schemas

commit 0649f3b407632041df63d3306773b657255dbcb3
Author: Andre Schumacher <[email protected]>
Date:   2014-03-27T16:24:13Z

    First commit nested Parquet read converters

commit 341d7e55e7a66e04f0ce45b5b1fa9f8cb7debbeb
Author: Andre Schumacher <[email protected]>
Date:   2014-03-27T17:48:16Z

    First working nested Parquet record input

commit 832d263c1056efe04fe353b9718ce3f0ad307c28
Author: Andre Schumacher <[email protected]>
Date:   2014-04-01T13:17:02Z

    Completing testcase for nested data (AddressBook)

commit 7f5bd07876aa2a228db67b3bd7d2baa938d1c79c
Author: Andre Schumacher <[email protected]>
Date:   2014-04-01T14:15:23Z

    Extending tests for nested Parquet data

commit e9da236fdad31071fddd668dde3b7c303cd08d79
Author: Andre Schumacher <[email protected]>
Date:   2014-04-02T12:42:19Z

    Fixing one problem with nested arrays

commit e4375db6d50baf4f629dd71e82b92881841c3b04
Author: Andre Schumacher <[email protected]>
Date:   2014-04-02T14:00:46Z

    fixing one problem with nested structs and breaking up files

commit 7c4e79aa61fcbeba0f06c8e40e23a2f486e0cce8
Author: Andre Schumacher <[email protected]>
Date:   2014-04-02T14:45:22Z

    added struct converter

commit 04e97d1c355e054c9db51766e2582700f299751e
Author: Andre Schumacher <[email protected]>
Date:   2014-04-03T15:11:40Z

    fixing one problem with arrayconverter

commit 0cc0edb93f5b89197697286ec9cc705cd8fd5edf
Author: Andre Schumacher <[email protected]>
Date:   2014-04-04T16:56:56Z

    Documenting conversions, bugfix, wrappers of Rows

commit 0fae86af7a463bb2ad04db571c970b85bc6de333
Author: Andre Schumacher <[email protected]>
Date:   2014-04-06T14:19:23Z

    Fixing some problems introduced during rebase

commit 2dc7adc23deb1ba05bef123db08b939a0d386082
Author: Andre Schumacher <[email protected]>
Date:   2014-04-06T16:04:44Z

    For primitive rows fall back to more efficient converter, code reorg

commit 8df7d0c1c710bb44fa904165f6b0352732c83468
Author: Andre Schumacher <[email protected]>
Date:   2014-04-08T07:27:26Z

    Adding resolution of complex ArrayTypes

commit 79b6a7a1c126e54bbd31614b202e39fc1d882e93
Author: Andre Schumacher <[email protected]>
Date:   2014-04-08T14:55:46Z

    Scalastyle

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
