[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-04-02 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1166
  
@rajrahul thanks for making all the changes (and of course for the fix)!


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-04-01 Thread rajrahul
Github user rajrahul commented on the issue:

https://github.com/apache/drill/pull/1166
  
@vdiravka removed the extra line.


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-29 Thread rajrahul
Github user rajrahul commented on the issue:

https://github.com/apache/drill/pull/1166
  
@vdiravka Done. Please review.


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-29 Thread rajrahul
Github user rajrahul commented on the issue:

https://github.com/apache/drill/pull/1166
  
@vdiravka I have made similar changes for 
testSparkParquetBinaryAsTimeStamp_DictChange, 
testHiveParquetTimestampAsInt96_basic and 
testImpalaParquetBinaryAsTimeStamp_DictChange. All tests are passing, please 
have a look.


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-24 Thread vdiravka
Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/1166
  
@rajrahul Unit test from your PR relies on particular timezone similar to 
`TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange`.

Could you please edit test case for working within any time zone?
Please see this PR #904 for more details. 


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-21 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1166
  
+1. LGTM


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-15 Thread rajrahul
Github user rajrahul commented on the issue:

https://github.com/apache/drill/pull/1166
  
The schema given below creates the issue, as @vdiravka pointed int96 is 
marked required here. This parquet was generated with an older version of spark 
and is included in the test case.

```
message spark_schema {
  optional binary article_no (UTF8);
  optional binary qty (UTF8);
  required int96 run_date;
}

```
Newer spark version created the schema below where int96 has become 
optional.

```
message spark_schema {
  optional binary country (UTF8);
  optional double sales;
  optional int96 targetDate;
}
```


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-15 Thread rajrahul
Github user rajrahul commented on the issue:

https://github.com/apache/drill/pull/1166
  
@parthchandra @vdiravka I have added the test case using the same parquet 
file(2.9k bytes). I tried creating a smaller file using Spark, but could not 
replicate the behavior. I have rebased the changes on the same commit and PR.


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-14 Thread vdiravka
Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/1166
  
@parthchandra I have compared meta of files from 
`TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange` and the meta 
from Rahul's dataset and found that test case indeed makes a query from two 
parquet files: one is dictionary encoded and other isn't. But the dataMode of 
column is `Optional`, that's why `Nullable` column reader is used.
Rahul's dataset contains `required` mode for INT96 column. This is a 
difference. Therefore other non-nullable column reader is necessary. 

But I believe we have some mess in names of that column readers. Maybe to 
make some refactoring would be a good point. What do you think? For example to 
remove `Dictionary` prefixes from nested classes, but to leave it for top class 
name.


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-14 Thread rajrahul
Github user rajrahul commented on the issue:

https://github.com/apache/drill/pull/1166
  
@parthchandra I will create a unit test with few time stamp fields.


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-14 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1166
  
@rajrahul this link is good. As expected, the int96 column is dictionary 
encoded. 
Is it possible for you to extract just a couple of records from this file 
and then use that for a unit test? 
see 
[TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange](https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java#L784)

@vdiravka TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange 
also uses an int96 that is dictionary encoded. Any idea whether (and why) it 
might be going thru a different code path?




---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-14 Thread rajrahul
Github user rajrahul commented on the issue:

https://github.com/apache/drill/pull/1166
  
@parthchandra please use the link 
https://github.com/rajrahul/files/raw/master/result.tar.gz
The files are present inside result/parquet/latest.


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-14 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1166
  
@rajrahul, thanks for submitting the patch. It looks good. I guess we 
missed dictionary encoded int96 timestamps (even though timestamps with 
nanosecond precision) are the one thing that should never, ever, be dictionary 
encoded!

Just to make sure, I tried the use the sample file in DRILL-6016, but I 
could not even unzip it! Can you please check and see if the file is correct? 
WE can use that to create the unit test as well.



---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-14 Thread rajrahul
Github user rajrahul commented on the issue:

https://github.com/apache/drill/pull/1166
  
@parthchandra @vdiravka  I do not have a test case for this. I have 
manually verified the scenario with and without the patch. The sample input 
file is attached with https://issues.apache.org/jira/browse/DRILL-6016.




---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-14 Thread priteshm
Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1166
  
@parthchandra would you please review this?


---