[GitHub] [spark] wangyum edited a comment on pull request #30517: [DO-NOT-MERGE][test-maven] Test compatibility for Parquet 1.11.1, Avro 1.10.0 and Hive 2.3.8

GitBox Sun, 29 Nov 2020 05:06:28 -0800


wangyum edited a comment on pull request #30517:
URL: https://github.com/apache/spark/pull/30517#issuecomment-735367655



   `sql/hive`, `sql/thriftserver` and `external/avro` should be fine.  
`sql/core` has some issues, e.g.:
   ```
   mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.execution.datasources.parquet.ParquetV2SchemaPruningSuite test
   ```
   
   ```
   - Spark vectorized reader - with partition data column - select nullable complex field and having is not null predicate *** FAILED ***
     Results do not match for query:
     Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
     Timezone Env:
   
     == Parsed Logical Plan ==
     'Project ['employer.company]
     +- 'Filter (isnotnull('employer) AND ('p = 1))
        +- 'UnresolvedRelation [contacts], [], false
   
     == Analyzed Logical Plan ==
     company: struct<name:string,address:string>
     Project [employer#7739.company AS company#7772]
     +- Filter (isnotnull(employer#7739) AND (p#7741 = 1))
        +- SubqueryAlias contacts
           +- RelationV2[id#7733, name#7734, address#7735, pets#7736, friends#7737, relatives#7738, employer#7739, relations#7740, p#7741] parquet file:/root/opensource/spark/sql/core/target/tmp/spark-bdb1b34b-cf6a-462d-8caa-fcd923df3fe3/contacts
   
     == Optimized Logical Plan ==
     Project [employer#7739.company AS company#7772]
     +- Filter isnotnull(employer#7739)
        +- RelationV2[employer#7739, p#7741] parquet file:/root/opensource/spark/sql/core/target/tmp/spark-bdb1b34b-cf6a-462d-8caa-fcd923df3fe3/contacts
   
     == Physical Plan ==
     *(1) Project [employer#7739.company AS company#7772]
     +- *(1) Filter isnotnull(employer#7739)
        +- BatchScan[employer#7739, p#7741] ParquetScan DataFilters: [isnotnull(employer#7739)], Format: parquet, Location: InMemoryFileIndex[file:/root/opensource/spark/sql/core/target/tmp/spark-bdb1b34b-cf6a-462d-8caa-f..., PartitionFilters: [isnotnull(p#7741), (p#7741 = 1)], PushedFilers: [IsNotNull(p), EqualTo(p,1)], ReadSchema: struct<employer:struct<company:struct<name:string,address:string>>>, PushedFilters: [IsNotNull(p), EqualTo(p,1)]
   
     == Results ==
   
     == Results ==
     !== Correct Answer - 2 ==      == Spark Answer - 0 ==
      struct<>                      struct<>
     ![[abc,123 Business Street]]
     ![null] (QueryTest.scala:243)
   ```
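
   To make the mismatch concrete, here is a minimal plain-Python model of what the failing query (`employer.company` where `employer` is not null and `p = 1`) is expected to return. It does not use Spark. The rows, the `contacts` list and the `select_company` helper are hypothetical stand-ins; only the company values `abc` / `123 Business Street` come from the log above, and the real suite data may differ.

   ```python
   # Hypothetical stand-in rows for the `contacts` table. Only the company
   # values ("abc", "123 Business Street") are taken from the log above.
   contacts = [
       {"id": 0, "employer": {"company": {"name": "abc", "address": "123 Business Street"}}, "p": 1},
       {"id": 1, "employer": {"company": None}, "p": 1},   # employer present, company null
       {"id": 2, "employer": None, "p": 1},                # filtered out: employer is null
       {"id": 3, "employer": {"company": {"name": "xyz", "address": "456 Other Street"}}, "p": 2},  # other partition
   ]


   def select_company(rows):
       """Model of: SELECT employer.company FROM contacts
       WHERE employer IS NOT NULL AND p = 1"""
       return [
           row["employer"]["company"]
           for row in rows
           if row["employer"] is not None and row["p"] == 1
       ]


   result = select_company(contacts)
   print(result)
   ```

   Under these assumptions the query yields two rows, one struct and one null, which matches the shape of the "Correct Answer - 2" column, whereas the Spark run above returned no rows.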


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

