GitHub user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/360#discussion_r11932494
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala ---
    @@ -68,14 +93,119 @@ object ParquetTestData {
     
       lazy val testData = new ParquetRelation(testDir.toURI.toString)
     
    +  val testNestedSchema1 =
    +    // based on blogpost example, source:
    +    // https://blog.twitter.com/2013/dremel-made-simple-with-parquet
    +    // note: instead of string we have to use binary (?) otherwise
    +    // Parquet gives us:
    +    // IllegalArgumentException: expected one of [INT64, INT32, BOOLEAN,
    +    //   BINARY, FLOAT, DOUBLE, INT96, FIXED_LEN_BYTE_ARRAY]
    +    // Also repeated primitives seem tricky to convert (AvroParquet
    +    // only uses them in arrays?) so only use at most one in each group
    +    // and nothing else in that group (-> is mapped to array)!
    +    // The "values" inside ownerPhoneNumbers is a keyword currently
    +    // so that array types can be translated correctly.
    +    """
    +      |message AddressBook {
    +        |required binary owner;
    +        |optional group ownerPhoneNumbers {
    +          |repeated binary array;
    +        |}
    +        |optional group contacts {
    +          |repeated group array {
    +            |required binary name;
    +            |optional binary phoneNumber;
    +          |}
    +        |}
    +      |}
    +    """.stripMargin
    +
    +
    +  val testNestedSchema2 =
    +    """
    +      |message TestNested2 {
    +        |required int32 firstInt;
    +        |optional int32 secondInt;
    +        |optional group longs {
    +          |repeated int64 array;
    +        |}
    +        |required group entries {
    +          |repeated group array {
    +            |required double value;
    +            |optional boolean truth;
    +          |}
    +        |}
    +        |optional group outerouter {
    +          |repeated group array {
    +            |repeated group array {
    +              |repeated int32 array;
    +            |}
    +          |}
    +        |}
    +      |}
    +    """.stripMargin
    +
    +  val testNestedSchema3 =
    +    """
    +      |message TestNested3 {
    +        |required int32 x;
    +        |optional group booleanNumberPairs {
    +          |repeated group array {
    +            |required int32 key;
    +            |optional group value {
    +              |repeated group array {
    +                |required double nestedValue;
    +                |optional boolean truth;
    +              |}
    +            |}
    +          |}
    +        |}
    +      |}
    +    """.stripMargin
    +
    +  val testNestedSchema4 =
    +    """
    +      |message TestNested4 {
    --- End diff --
    
    Nit: why are the margins aligned with the indentation? I think that results in an unindented string.
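
    To illustrate the nit, here is a minimal sketch. The object name and the
    small `Example` schema are made up for illustration and are not part of
    the PR. `stripMargin` drops the leading whitespace and the `|` on each
    line, so when the pipes follow the nesting every line ends up starting at
    column 0. Keeping the pipes in one column and indenting after them
    preserves the nesting in the resulting string.

```scala
object StripMarginDemo {
  // Pipes follow the nesting, as in the diff above. stripMargin removes
  // everything up to and including each '|', so the nesting is lost.
  val pipesFollowNesting: String =
    """
      |message Example {
        |required int32 x;
        |optional group g {
          |repeated int32 array;
        |}
      |}
    """.stripMargin

  // Pipes kept in one column. Indentation written after '|' survives.
  val pipesInOneColumn: String =
    """
      |message Example {
      |  required int32 x;
      |  optional group g {
      |    repeated int32 array;
      |  }
      |}
    """.stripMargin

  def main(args: Array[String]): Unit = {
    println(pipesFollowNesting) // every schema line starts at column 0
    println(pipesInOneColumn)   // nested fields keep their indentation
  }
}
```

    As far as I can tell this only affects how the schema reads when it is
    printed or logged, which is why it is just a nit.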


