[ https://issues.apache.org/jira/browse/DRILL-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Phillips resolved DRILL-2286.
------------------------------------
    Resolution: Duplicate

> Parquet compression causes read errors
> --------------------------------------
>
>                 Key: DRILL-2286
>                 URL: https://issues.apache.org/jira/browse/DRILL-2286
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 0.8.0
>            Reporter: Adam Gilmore
>            Assignee: Steven Phillips
>            Priority: Critical
>
> Since compression was added to the Parquet writer, read errors can occur.
> Types such as timestamp and decimal are stored as int64 with logical-type
> metadata. When a column chunk is compressed, the reader tries to read the
> int64 values into a timestamp/decimal vector, which causes a cast error.
> Here's the JSON file I'm using:
> {code}
> { "a": 1.5 }
> { "a": 3.5 }
> { "a": 1.5 }
> { "a": 2.5 }
> { "a": 1.5 }
> { "a": 5.5 }
> { "a": 1.5 }
> { "a": 6.0 }
> { "a": 1.5 }
> {code}
> Now create a Parquet table like so:
> {code}
> create table dfs.tmp.test as (select cast(a as decimal(18,8)) from
> dfs.tmp.`test.json`)
> {code}
> Now query it:
> {noformat}
> 0: jdbc:drill:zk=local> select * from dfs.tmp.test;
> Query failed: RemoteRpcException: Failure while running fragment.,
> org.apache.drill.exec.vector.NullableDecimal18Vector cannot be cast to
> org.apache.drill.exec.vector.NullableBigIntVector [
> 91e23d42-fa06-4429-b78e-3ff32352e660 on ...:31010 ]
> [ 91e23d42-fa06-4429-b78e-3ff32352e660 on ...:31010 ]
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> {noformat}
> The same failure occurs for timestamps, for example.
> The relevant code is in ColumnReaderFactory: when a column chunk is encoded,
> it creates readers based on the primitive type of the column (here int64)
> rather than the logical type (timestamp/decimal).
> This is fairly severe, as compression now appears to be enabled by default.
> I do note that with only 1-2 records in the JSON file, the writer doesn't
> bother compressing, and the queries then work fine.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
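The failure mode described above can be sketched independently of Drill's internals: a reader factory keyed only on the Parquet primitive type (INT64) produces a vector whose runtime class does not match the vector class the table schema demands, and the downstream cast fails. This is a minimal hypothetical Java sketch; the class names echo the error message, but they are toy stand-ins, not Drill's real vector classes or its real ColumnReaderFactory.

```java
// Toy stand-ins for Drill's value-vector hierarchy. The names mirror the
// error message above, but these are hypothetical simplifications.
abstract class ValueVector {}

class NullableBigIntVector extends ValueVector {}      // raw int64 storage
class NullableDecimal18Vector extends ValueVector {}   // decimal(18,x) storage

public class CastDemo {
    // Hypothetical factory that, like the bug describes, chooses the vector
    // from the Parquet primitive type alone (INT64), ignoring the logical
    // type annotation (DECIMAL / TIMESTAMP).
    static ValueVector readerVectorFor(String primitiveType) {
        if ("INT64".equals(primitiveType)) {
            return new NullableBigIntVector();
        }
        throw new IllegalArgumentException("unsupported: " + primitiveType);
    }

    public static void main(String[] args) {
        ValueVector v = readerVectorFor("INT64");
        try {
            // The table schema says decimal(18,8), so consuming code casts
            // to the decimal vector type -- and fails at runtime, just as
            // the reported query does.
            NullableDecimal18Vector d = (NullableDecimal18Vector) v;
            System.out.println("cast succeeded");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The fix direction implied by the report is to key reader selection on the column's logical type as well as its primitive type, so the compressed-chunk path allocates the same vector class as the uncompressed path.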