[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2017-03-31 Thread Bridget Bevens (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951764#comment-15951764
 ] 

Bridget Bevens commented on DRILL-4373:
---

Link to doc: http://drill.apache.org/docs/parquet-format/#about-int96-support

> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. If I then define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin.
> Implementation: 
> Added an int96-to-timestamp converter for both parquet readers, controlled 
> by the system / session option "store.parquet.reader.int96_as_timestamp".
> The option is false by default so that old query scripts using the 
> "convert_from TIMESTAMP_IMPALA" function keep working.
> When the option is true, that function is unnecessary and can cause the 
> query to fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2017-03-22 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937343#comment-15937343
 ] 

Rahul Challapalli commented on DRILL-4373:
--

Thanks [~knguyen]. For 3 and 4 we can use 
`store.parquet.reader.int96_as_timestamp`=true/false for testing. Also, do 
you know if we can somehow enable the configuration option in a view (as a 
table option)? Otherwise the end-user experience will not be good, as users 
have to know which table is generated from hive and which from drill. With a 
view that defines how to interpret the timestamp column, end users can be 
decoupled from this knowledge. What do you think?



[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2017-03-22 Thread Krystal (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937318#comment-15937318
 ] 

Krystal commented on DRILL-4373:


[~rkins] For #1, I will check. For #2, I tested that and it works as expected. 
For #3 and #4, the TIMESTAMP_IMPALA_LOCALTIMEZONE function was removed as part 
of DRILL-5034.



[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2017-03-22 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937305#comment-15937305
 ] 

Rahul Challapalli commented on DRILL-4373:
--

[~knguyen] I am wondering how this timestamp column would work in the 
scenarios below:

1. Generating a metadata cache on top of a file that contains a hive-generated 
timestamp: what would happen, and what would the contents of the cache file 
look like?
2. Running the drill native parquet reader on top of hive tables that have 
timestamp data types.
3. Running drill timestamp functions after converting hive timestamps using 
the IMPALA_TIMESTAMP_LOCALTIMEZONE function.
4. Running the IMPALA_TIMESTAMP_LOCALTIMEZONE function in a view.



[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-11-11 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657809#comment-15657809
 ] 

Vitalii Diravka commented on DRILL-4373:


There is a known [issue|https://issues.apache.org/jira/browse/HIVE-9482] with 
hive: it stores timestamp values in parquet files in a way that depends on the 
local time zone. That's why we should take the local time zone into account 
when we retrieve data from such a table.
On the other hand, parquet files themselves are not tied to a particular time 
zone, so when we just read the file we shouldn't consider the local time zone. 
This is also standard drill behaviour with normal int64 timestamps.
So I decided that we need two IMPALA_TIMESTAMP functions: one for hive and one 
for regular parquet files.
I left the IMPALA_TIMESTAMP function without local time zone retention and 
added an IMPALA_TIMESTAMP_LOCALTIMEZONE function (it is used implicitly when 
reading hive timestamps with the drill native parquet reader enabled). 
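
For illustration only, the difference between the two functions can be sketched in Java as below. This is not the code from the pull request: the class and method names (`Int96TimestampSketch`, `toEpochMillis`, `toEpochMillisRetainingLocalZone`, `encode`) are hypothetical, and the direction of the local time zone shift is an assumption. The sketch relies on the commonly used INT96 timestamp layout: 8 bytes of nanoseconds within the day followed by 4 bytes of Julian day number, both little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical sketch; names are illustrative and do not come from Drill.
final class Int96TimestampSketch {
    // Julian day number of 1970-01-01, the Unix epoch.
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    // Decodes the 12-byte value with no time zone adjustment at all.
    static long toEpochMillis(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0..7
        long julianDay = buf.getInt();   // bytes 8..11
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
                + nanosOfDay / NANOS_PER_MILLI;
    }

    // Assumed behaviour of a "local time zone" variant: take the decoded
    // instant, read its wall-clock time in the given zone, and return that
    // wall-clock time as if it were UTC (i.e. add the zone offset).
    static long toEpochMillisRetainingLocalZone(byte[] int96, ZoneId zone) {
        Instant instant = Instant.ofEpochMilli(toEpochMillis(int96));
        LocalDateTime wallClock = LocalDateTime.ofInstant(instant, zone);
        return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Helper for building sample values in the layout described above.
    static byte[] encode(int julianDay, long nanosOfDay) {
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }
}
```

With this sketch, the value for Julian day 2440588 and zero nanoseconds decodes to epoch millisecond 0 in the zone-free variant, while the zone-aware variant shifts it by the offset of whatever zone is passed in. Passing the zone explicitly, rather than reading the JVM default, is a design choice made here to keep the example deterministic.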

> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.9.0


[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-11-02 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630388#comment-15630388
 ] 

Kunal Khatua commented on DRILL-4373:
-

[~rkins] Please verify and close this bug.



[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626573#comment-15626573
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/600
  
+1. Yes this is fine, and the tests all pass as well.


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Parth Chandra
>  Labels: doc-impacting
> Fix For: 1.9.0


[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626338#comment-15626338
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/600
  
@parthchandra There is a known issue with hive: it stores timestamp values 
in parquet files in a way that depends on the local time zone. That's why we 
should take the local time zone into account when we retrieve data from such a 
table.
On the other hand, parquet files themselves are not tied to a particular time 
zone, so when we just read the file we shouldn't consider the local time zone. 
This is also standard drill behaviour with normal int64 timestamps.
So I decided that we need two `IMPALA_TIMESTAMP` functions: one for hive and 
one for regular parquet files.
I left the `IMPALA_TIMESTAMP` function without local time zone retention and 
added an `IMPALA_TIMESTAMP_LOCALTIMEZONE` function (used implicitly for hive 
timestamps when the drill native parquet reader is enabled). 

Please let me know if this approach is good.
The changes are in a new commit for easy review.




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623806#comment-15623806
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/600
  
@vdiravka Looks like the test 
TestHiveStorage.readAllSupportedHiveDataTypesNativeParquet:214 is also failing. 
(The timestamp_field value is not matching the baseline). Can you take a look?




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615277#comment-15615277
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r85522267
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
@@ -739,30 +741,76 @@ public void runTestAndValidate(String selection, String validationSelection, Str
   }
 
   /*
-  Test the reading of an int96 field. Impala encodes timestamps as int96 fields
+    Impala encodes timestamp values as int96 fields. Test the reading of an int96 field with two converters:
+    the first one converts parquet INT96 into drill VARBINARY and the second one (works while
+    store.parquet.reader.int96_as_timestamp option is enabled) converts parquet INT96 into drill TIMESTAMP.
    */
   @Test
   public void testImpalaParquetInt96() throws Exception {
     compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
+    try {
+      test("alter session set %s = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+      compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
+    } finally {
+      test("alter session reset %s", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+    }
   }
 
   /*
-  Test the reading of a binary field where data is in dicationary _and_ non-dictionary encoded pages
+  Test the reading of a binary field as drill varbinary where data is in dicationary _and_ non-dictionary encoded pages
    */
   @Test
-  public void testImpalaParquetVarBinary_DictChange() throws Exception {
+  public void testImpalaParquetBinaryAsVarBinary_DictChange() throws Exception {
     compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_dict_change.parquet`");
   }
 
   /*
+  Test the reading of a binary field as drill timestamp where data is in dicationary _and_ non-dictionary encoded pages
+   */
+  @Test
+  public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
+    final String WORKING_PATH = TestTools.getWorkingPath();
+    final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
+    try {
+      testBuilder()
+          .sqlQuery("select int96_ts from dfs_test.`%s/parquet/int96_dict_change`", TEST_RES_PATH)
+          .optionSettingQueriesForTestQuery(
+              "alter session set `%s` = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP)
+          .ordered()
+          .csvBaselineFile("testframework/testParquetReader/testInt96DictChange/q1.tsv")
+          .baselineTypes(TypeProtos.MinorType.TIMESTAMP)
+          .baselineColumns("int96_ts")
+          .build().run();
+    } finally {
+      test("alter system reset `%s`", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+    }
+  }
+
+  /*
   Test the conversion from int96 to impala timestamp
    */
   @Test
-  public void testImpalaParquetTimestampAsInt96() throws Exception {
+  public void testTimestampImpalaConvertFrom() throws Exception {
     compareParquetReadersColumnar("convert_from(field_impala_ts, 'TIMESTAMP_IMPALA')", "cp.`parquet/int96_impala_1.parquet`");
   }
 
   /*
+    Test reading parquet Int96 as TimeStamp and comparing obtained values with the
+    old results (reading the same values as VarBinary and convert_fromTIMESTAMP_IMPALA function using)
+   */
+  @Test
+  public void testImpalaParquetTimestampInt96AsTimeStamp() throws Exception {
--- End diff --

This test compares the results of the new converter (Int96 to TimeStamp) 
with those of the old one (Int96 to VarBinary) followed by the 
`convert_fromTIMESTAMP_IMPALA` function. 
The issue was in `ConvertFromImpalaTimestamp` ([link to the code](https://github.com/apache/drill/pull/600/commits/a45490af2dd663168220cc3bda62a2d79170db62#diff-5d8360c5e3cf7d2f6ac7bfe58b6d319aL57)): 
changing the time zone shouldn't affect the resulting timestamp values.
I removed the time zone handling there, so now all tests pass successfully, 
even across different time zones.
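
A time-zone-independence check of the kind discussed here could look roughly like the following sketch. It is a hypothetical, self-contained harness, not Drill's test framework: `sameInAllZones`, `zoneFree`, and `zoneDependent` are made-up names, and `zoneDependent` only imitates a conversion that consults the JVM default time zone.

```java
import java.util.TimeZone;
import java.util.function.LongUnaryOperator;

// Hypothetical sketch; it does not use or reproduce any Drill classes.
final class TimeZoneIndependenceSketch {

    // Runs the conversion under each default time zone and reports whether
    // every run produced the same value. The original default is restored.
    static boolean sameInAllZones(LongUnaryOperator conversion, long input, String... zoneIds) {
        TimeZone original = TimeZone.getDefault();
        try {
            boolean first = true;
            long expected = 0L;
            for (String id : zoneIds) {
                TimeZone.setDefault(TimeZone.getTimeZone(id));
                long actual = conversion.applyAsLong(input);
                if (first) {
                    expected = actual;
                    first = false;
                } else if (actual != expected) {
                    return false;
                }
            }
            return true;
        } finally {
            TimeZone.setDefault(original);
        }
    }

    // A conversion that never looks at the default time zone.
    static long zoneFree(long utcMillis) {
        return utcMillis;
    }

    // A conversion that adds the default zone's offset, so its result
    // changes with the machine the test runs on.
    static long zoneDependent(long utcMillis) {
        return utcMillis + TimeZone.getDefault().getOffset(utcMillis);
    }
}
```

Calling `sameInAllZones(TimeZoneIndependenceSketch::zoneDependent, 0L, "UTC", "America/Los_Angeles", "Asia/Kolkata")` reports a mismatch, while the same call with `zoneFree` does not, which mirrors why a baseline recorded in one time zone can fail in another.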



[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613669#comment-15613669
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/600
  
Changing this to -1 until unit test failure is addressed.


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Karthikeyan Manivannan
>  Labels: doc-impacting
> Fix For: 1.9.0


[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613668#comment-15613668
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r85449218
  

The test testImpalaParquetTimestampInt96AsTimeStamp fails when run in a 
different timezone. Can you mark this as @Ignore unless you can fix the test to 
run across different timezones?



[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610130#comment-15610130
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/600
  
+1. LGTM




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-26 Thread Karthikeyan Manivannan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610055#comment-15610055
 ] 

Karthikeyan Manivannan commented on DRILL-4373:
---

+1



[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-26 Thread Karthikeyan Manivannan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610053#comment-15610053
 ] 

Karthikeyan Manivannan commented on DRILL-4373:
---

Looks good to me.



[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15608503#comment-15608503
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/600
  
@bitblender @parthchandra 
I made the changes requested in the comments and rebased the branch onto 
master. 
Could you please review?




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15608480#comment-15608480
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r85124582
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
@@ -739,30 +739,54 @@ public void runTestAndValidate(String selection, String validationSelection, Str
   }
 
   /*
-  Test the reading of an int96 field. Impala encodes timestamps as int96 fields
+Impala encodes timestamp values as int96 fields. Test the reading of an int96 field with two converters:
+the first one converts parquet INT96 into drill VARBINARY and the second one (works while
+store.parquet.reader.int96_as_timestamp option is enabled) converts parquet INT96 into drill TIMESTAMP.
*/
   @Test
   public void testImpalaParquetInt96() throws Exception {
 compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
+try {
+  test("alter session set %s = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+  compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
--- End diff --

The above comment applies to the 
[testImpalaParquetBinaryAsTimeStamp_DictChange](https://github.com/apache/drill/pull/600/commits/81c48c9cd5cdc3905ea78c6cad07a9d818d5026f#diff-aab74a5027942e775c846cebc06c32a4R771)
 method.


The test was updated:
The old, incorrect file int96_dict_change.parquet was replaced with two new 
files that have an int96 timestamp field and differently encoded pages 
(dictionary and non-dictionary).
A CSV baseline file was also added.




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586088#comment-15586088
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83908798
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
@@ -739,30 +739,54 @@ public void runTestAndValidate(String selection, String validationSelection, Str
   }
 
   /*
-  Test the reading of an int96 field. Impala encodes timestamps as int96 fields
+Impala encodes timestamp values as int96 fields. Test the reading of an int96 field with two converters:
+the first one converts parquet INT96 into drill VARBINARY and the second one (works while
+store.parquet.reader.int96_as_timestamp option is enabled) converts parquet INT96 into drill TIMESTAMP.
*/
   @Test
   public void testImpalaParquetInt96() throws Exception {
 compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
+try {
+  test("alter session set %s = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+  compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
--- End diff --

GitHub seems to have swallowed the previous comments, so I am including 
@vdiravka's questions here:

>  1) Is it better to compare the result with baseline columns and values from 
the file, or is it OK to compare with sqlBaselineQuery and the new 
PARQUET_READER_INT96_AS_TIMESTAMP option disabled?
> While investigating this test I found that the primitive data type of the 
column in the file int96_dict_change.parquet is BINARY, not INT96.
> 2) I am a little confused by this. Do we need to convert this BINARY 
to TIMESTAMP as well? The CONVERT_FROM function with the TIMESTAMP_IMPALA 
argument works properly for this field. I will investigate further whether 
Impala and Hive can store timestamps as parquet BINARY.

For 1) I think it is better to compare with the values from the file than to 
run with PARQUET_READER_INT96_AS_TIMESTAMP disabled.
For 2) Can you correct the int96 data in the file? AFAIK, the data should 
be int96 for the test.




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585690#comment-15585690
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83853146
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -132,6 +132,8 @@
   OptionValidator PARQUET_VECTOR_FILL_CHECK_THRESHOLD_VALIDATOR = new PositiveLongValidator(PARQUET_VECTOR_FILL_CHECK_THRESHOLD, 100l, 10l);
   String PARQUET_NEW_RECORD_READER = "store.parquet.use_new_reader";
   OptionValidator PARQUET_RECORD_READER_IMPLEMENTATION_VALIDATOR = new BooleanValidator(PARQUET_NEW_RECORD_READER, false);
+  String PARQUET_READER_INT96_AS_TIMESTAMP = "store.parquet.int96_as_timestamp";
--- End diff --

Agree. Done.




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585692#comment-15585692
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83853471
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -45,4 +53,34 @@ public static int getIntFromLEBytes(byte[] input, int start) {
 }
 return out;
   }
+
+  /**
+   * Utilities for converting from parquet INT96 binary (impala, hive timestamp)
+   * to date time value. This utilizes the Joda library.
+   */
+  public static class NanoTimeUtils {
+
+public static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);
+public static final long NANOS_PER_HOUR = TimeUnit.HOURS.toNanos(1);
+public static final long NANOS_PER_MINUTE = TimeUnit.MINUTES.toNanos(1);
+public static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);
+public static final long NANOS_PER_MILLISECOND =  TimeUnit.MILLISECONDS.toNanos(1);
+
+  /**
+   * @param binaryTimeStampValue
+   *  hive, impala timestamp values with nanoseconds precision
+   *  are stored in parquet Binary as INT96
+   *
+   * @return  the number of milliseconds since January 1, 1970, 00:00:00 GMT
+   *  represented by @param binaryTimeStampValue .
+   */
+public static long getDateTimeValueFromBinary(Binary binaryTimeStampValue) {
+  NanoTime nt = NanoTime.fromBinary(binaryTimeStampValue);
+  int julianDay = nt.getJulianDay();
+  long nanosOfDay = nt.getTimeOfDayNanos();
+  return DateTimeUtils.fromJulianDay(julianDay-0.5d) + nanosOfDay/NANOS_PER_MILLISECOND;
--- End diff --

The comment has been removed, and the numbers have been replaced with 
constants from ParquetReaderUtility and DateTimeConstants.




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585691#comment-15585691
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83852721
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
@@ -899,18 +883,21 @@ public void testLastPageOneNull() throws Exception {
 "cp.`parquet/last_page_one_null.parquet`");
   }
 
-  private void compareParquetInt96Converters(String newInt96ConverterQuery,
-  String oldInt96ConverterAndConvertFromFunctionQuery) throws Exception {
-testBuilder()
-.ordered()
-.sqlQuery(newInt96ConverterQuery)
-.optionSettingQueriesForTestQuery(
-"alter session set `%s` = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP)
-.sqlBaselineQuery(oldInt96ConverterAndConvertFromFunctionQuery)
-.optionSettingQueriesForBaseline(
-"alter session set `%s` = false", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP)
-.build()
-.run();
+  private void compareParquetInt96Converters(String selection, String table) throws Exception {
+try {
--- End diff --

I refactored my helper method to make the code clearer.
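
The diff above is cut off at `try {`, so the rest of the refactored helper is not visible in this message. A common shape for such a helper, sketched here under assumptions, is to enable the option inside `try` and restore it in `finally` so that later tests are not affected. The `SessionOptions` class below is a hypothetical in-memory stand-in for session options, not Drill's API; only the option name is taken from this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for session options; not Drill's API. */
final class SessionOptions {

  static final String INT96_AS_TIMESTAMP = "store.parquet.reader.int96_as_timestamp";

  private final Map<String, Boolean> values = new HashMap<>();

  boolean get(String name) {
    // The issue description states that the option is false by default.
    return values.getOrDefault(name, false);
  }

  void set(String name, boolean value) {
    values.put(name, value);
  }

  /** Runs the body with the option enabled, then restores the previous value. */
  <T> T withOptionEnabled(String name, Supplier<T> body) {
    boolean previous = get(name);
    try {
      set(name, true);
      return body.get();
    } finally {
      set(name, previous);  // runs even if the body throws
    }
  }

  public static void main(String[] args) {
    SessionOptions options = new SessionOptions();
    boolean seenInside = options.withOptionEnabled(
        INT96_AS_TIMESTAMP, () -> options.get(INT96_AS_TIMESTAMP));
    System.out.println(seenInside);
    System.out.println(options.get(INT96_AS_TIMESTAMP));
  }
}
```

Restoring the previous value, rather than hard-coding `false`, keeps the helper safe to use even when a caller has already changed the option.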




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583934#comment-15583934
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83761501
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -45,4 +53,34 @@ public static int getIntFromLEBytes(byte[] input, int start) {
 }
 return out;
   }
+
+  /**
+   * Utilities for converting from parquet INT96 binary (impala, hive timestamp)
+   * to date time value. This utilizes the Joda library.
+   */
+  public static class NanoTimeUtils {
+
+public static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);
+public static final long NANOS_PER_HOUR = TimeUnit.HOURS.toNanos(1);
+public static final long NANOS_PER_MINUTE = TimeUnit.MINUTES.toNanos(1);
+public static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);
+public static final long NANOS_PER_MILLISECOND =  TimeUnit.MILLISECONDS.toNanos(1);
+
+  /**
+   * @param binaryTimeStampValue
+   *  hive, impala timestamp values with nanoseconds precision
+   *  are stored in parquet Binary as INT96
+   *
+   * @return  the number of milliseconds since January 1, 1970, 00:00:00 GMT
+   *  represented by @param binaryTimeStampValue .
+   */
+public static long getDateTimeValueFromBinary(Binary binaryTimeStampValue) {
+  NanoTime nt = NanoTime.fromBinary(binaryTimeStampValue);
+  int julianDay = nt.getJulianDay();
+  long nanosOfDay = nt.getTimeOfDayNanos();
+  return DateTimeUtils.fromJulianDay(julianDay-0.5d) + nanosOfDay/NANOS_PER_MILLISECOND;
--- End diff --

Sorry for the late reply. For some reason, I did not see these comments 
till now. 
About 1) Yes, you are correct. I just want the comments in 
ConvertFromImpalaTimestamp to be removed.




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583114#comment-15583114
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83710133
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
@@ -754,15 +764,45 @@ public void testImpalaParquetVarBinary_DictChange() throws Exception {
 compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_dict_change.parquet`");
   }
 
+  @Test
+  public void testImpalaParquetBinaryTimeStamp_DictChange() throws Exception {
+try {
+  test("alter session set %s = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+  compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_dict_change.parquet`");
--- End diff --

1. Is it better to compare the result with baseline columns and values from the 
file, or is it OK to compare with `sqlBaselineQuery` and the new 
`PARQUET_READER_INT96_AS_TIMESTAMP` option disabled?
2. While investigating this test I found that the primitive 
data type of the column in the file `int96_dict_change.parquet` is BINARY, not 
INT96.  
I am a little confused by this. Do we need to convert this BINARY to 
TIMESTAMP as well?
The CONVERT_FROM function with the TIMESTAMP_IMPALA argument works properly for 
this field.
I will investigate further whether Impala and Hive can store 
timestamps as parquet BINARY. 




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572829#comment-15572829
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83284350
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
@@ -754,15 +764,45 @@ public void testImpalaParquetVarBinary_DictChange() throws Exception {
 compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_dict_change.parquet`");
   }
 
+  @Test
+  public void testImpalaParquetBinaryTimeStamp_DictChange() throws Exception {
+try {
+  test("alter session set %s = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+  compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_dict_change.parquet`");
--- End diff --

This is not a good enough test. Both the baseline and test case queries 
will use the getDateTimeValueFromBinary method and if there is a bug in that 
method, the test will still pass as both will produce the same incorrect value. 
Better to compare with the actual baseline value in the file.
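
The weakness described here can be shown with a small, self-contained sketch. The names below (`BaselineSketch`, `CORRECT`, `BUGGY`) are hypothetical and only stand in for the converter under test; this is not Drill test code. The point is that a check which compares a converter with itself cannot catch its bugs, while a check against a value known in advance can.

```java
import java.util.function.LongBinaryOperator;

/** Hypothetical illustration; not Drill test code. */
final class BaselineSketch {

  static final long JULIAN_DAY_OF_EPOCH = 2440588L;  // JDN of 1970-01-01
  static final long MILLIS_PER_DAY = 86_400_000L;
  static final long NANOS_PER_MILLISECOND = 1_000_000L;

  /** Reference conversion: (julianDay, nanosOfDay) -> epoch milliseconds. */
  static final LongBinaryOperator CORRECT =
      (julianDay, nanosOfDay) ->
          (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
              + nanosOfDay / NANOS_PER_MILLISECOND;

  /** Deliberately wrong conversion: off by one day. */
  static final LongBinaryOperator BUGGY =
      (julianDay, nanosOfDay) ->
          (julianDay - JULIAN_DAY_OF_EPOCH - 1) * MILLIS_PER_DAY
              + nanosOfDay / NANOS_PER_MILLISECOND;

  /** Weak check: the test query and the baseline query share one converter. */
  static boolean passesSelfComparison(LongBinaryOperator converter, long day, long nanos) {
    return converter.applyAsLong(day, nanos) == converter.applyAsLong(day, nanos);
  }

  /** Stronger check: compare with a value known independently of the converter. */
  static boolean passesBaseline(
      LongBinaryOperator converter, long day, long nanos, long expectedMillis) {
    return converter.applyAsLong(day, nanos) == expectedMillis;
  }

  public static void main(String[] args) {
    long day = 2440589L;          // 1970-01-02
    long expected = 86_400_000L;  // midnight of that day in epoch milliseconds
    System.out.println(passesSelfComparison(BUGGY, day, 0L));
    System.out.println(passesBaseline(BUGGY, day, 0L, expected));
    System.out.println(passesBaseline(CORRECT, day, 0L, expected));
  }
}
```

The buggy converter passes the self-comparison and fails only the baseline check, which is why values recorded from the file make the stronger test.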




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572828#comment-15572828
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83281827
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -132,6 +132,8 @@
   OptionValidator PARQUET_VECTOR_FILL_CHECK_THRESHOLD_VALIDATOR = new PositiveLongValidator(PARQUET_VECTOR_FILL_CHECK_THRESHOLD, 100l, 10l);
   String PARQUET_NEW_RECORD_READER = "store.parquet.use_new_reader";
   OptionValidator PARQUET_RECORD_READER_IMPLEMENTATION_VALIDATOR = new BooleanValidator(PARQUET_NEW_RECORD_READER, false);
+  String PARQUET_READER_INT96_AS_TIMESTAMP = "store.parquet.int96_as_timestamp";
--- End diff --

This should be renamed to store.parquet.reader.abc to make it clear that this 
is a reader-only property.




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553744#comment-15553744
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r82314071
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -45,4 +53,34 @@ public static int getIntFromLEBytes(byte[] input, int start) {
 }
 return out;
   }
+
+  /**
+   * Utilities for converting from parquet INT96 binary (impala, hive timestamp)
+   * to date time value. This utilizes the Joda library.
+   */
+  public static class NanoTimeUtils {
+
+public static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);
+public static final long NANOS_PER_HOUR = TimeUnit.HOURS.toNanos(1);
+public static final long NANOS_PER_MINUTE = TimeUnit.MINUTES.toNanos(1);
+public static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);
+public static final long NANOS_PER_MILLISECOND =  TimeUnit.MILLISECONDS.toNanos(1);
+
+  /**
+   * @param binaryTimeStampValue
+   *  hive, impala timestamp values with nanoseconds precision
+   *  are stored in parquet Binary as INT96
+   *
+   * @return  the number of milliseconds since January 1, 1970, 00:00:00 GMT
+   *  represented by @param binaryTimeStampValue .
+   */
+public static long getDateTimeValueFromBinary(Binary binaryTimeStampValue) {
+  NanoTime nt = NanoTime.fromBinary(binaryTimeStampValue);
+  int julianDay = nt.getJulianDay();
+  long nanosOfDay = nt.getTimeOfDayNanos();
+  return DateTimeUtils.fromJulianDay(julianDay-0.5d) + nanosOfDay/NANOS_PER_MILLISECOND;
--- End diff --

1.  I would recommend not using Joda. Do the calculations directly, like in 
ConvertFromImpalaTimestamp. Joda uses non-standard, and hence confusing, 
terminology. What Joda calls and uses as JulianDay is actually the Julian Date. 
It seems you have identified this discrepancy and adjusted for it by 
subtracting 0.5 from _julianDay_. 

Note (I guess you have already figured this out): the actual code and 
the Joda code in the comment in ConvertFromImpalaTimestamp are inconsistent. 
It took me a day to figure out the reason behind this! A bug should be opened 
to delete the comment. 

2. Can you please also leave a comment stating that 2440588 is the JDN for 
the Unix Epoch?

3. Please leave a comment stating that the order of the calls to get 
_julianDay_ and _nanosOfDay_ matters. You can do this by just stating how 
timestamps are stored in INT96, i.e. 32-bit JDN followed by 64-bit nanosOfDay.

4. Consistent (single or none) spacing for the binary operators (+-/) used 
here would be nice. Single spacing would be preferable.
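
The direct calculation recommended in point 1, together with the facts asked for in points 2 and 3, can be sketched as follows. This is a minimal, self-contained illustration and not the code from the pull request: the class `Int96TimestampSketch` and all of its method names are hypothetical. It assumes that the 12 bytes hold the nanoseconds of the day first (8 bytes) and the Julian day number last (4 bytes), both little-endian, so the day number sits in the most significant bytes; that byte order is an assumption to verify against the reader in use. The `fromJulianDate` method is likewise an assumed formula for a helper that works on Julian Dates, included only to show why subtracting 0.5 gives the same result.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, for illustration only; not part of Drill. */
final class Int96TimestampSketch {

  /** Julian Day Number (JDN) of the Unix epoch, 1970-01-01. */
  static final long JULIAN_DAY_OF_EPOCH = 2440588L;
  static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);
  static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1);

  private Int96TimestampSketch() {}

  /** Direct calculation: whole days since the epoch plus the time of day. */
  static long toEpochMillis(int julianDay, long nanosOfDay) {
    return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
        + nanosOfDay / NANOS_PER_MILLISECOND;
  }

  /**
   * Assumed layout of the 12 bytes: nanoseconds of day (8 bytes), then the
   * Julian day number (4 bytes), both little-endian. The two reads below
   * consume the buffer sequentially, so their order matters.
   */
  static long toEpochMillis(byte[] int96) {
    ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buf.getLong();
    int julianDay = buf.getInt();
    return toEpochMillis(julianDay, nanosOfDay);
  }

  /**
   * Assumed formula for a Julian Date based helper. A Julian Date counts
   * fractional days from noon, and 2440587.5 is the Julian Date of the Unix
   * epoch at midnight, so callers holding a JDN pass (jdn - 0.5).
   */
  static long fromJulianDate(double julianDate) {
    return (long) ((julianDate - 2440587.5d) * MILLIS_PER_DAY);
  }

  /** Builds an INT96 value in the assumed layout; handy for experiments. */
  static byte[] encode(int julianDay, long nanosOfDay) {
    return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
        .putLong(nanosOfDay).putInt(julianDay).array();
  }

  public static void main(String[] args) {
    // One day and 1.5 ms after the epoch.
    byte[] value = encode(2440589, 1_500_000L);
    System.out.println(toEpochMillis(value));
    System.out.println(
        fromJulianDate(2440589 - 0.5d) + 1_500_000L / NANOS_PER_MILLISECOND);
  }
}
```

Both lines printed by `main` should agree, which is the equivalence between the direct JDN arithmetic and the Julian Date formula with the 0.5 adjustment.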




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553313#comment-15553313
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/600
  
@bitblender Sorry about this. It was caused by hidden `\u` symbols.
Fixed.




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553230#comment-15553230
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user bitblender commented on the issue:

https://github.com/apache/drill/pull/600
  
I can't see NullableFixedByteAlignedReaders.java. It shows up as a binary file.




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-01 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15538934#comment-15538934
 ] 

Khurram Faraaz commented on DRILL-4373:
---

The description says that Drill and Hive have incompatible timestamp 
representations in parquet. Ideally, we want parquet files generated by Drill 
to be usable in Hive, Impala, Spark, etc., and vice versa.



[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-09-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15536285#comment-15536285
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

GitHub user vdiravka opened a pull request:

https://github.com/apache/drill/pull/600

DRILL-4373: Drill and Hive have incompatible timestamp representations in 
parquet

- added sys/sess option "store.parquet.int96_as_timestamp";
- added int96 to timestamp converter for both readers;
- added unit tests;

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vdiravka/drill DRILL-4373

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/600.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #600


commit 0d768c42f7c732360cafcacc91e29b67ae44fca4
Author: Vitalii Diravka 
Date:   2016-09-02T21:43:50Z

DRILL-4373: Drill and Hive have incompatible timestamp representations in 
parquet
- added sys/sess option "store.parquet.int96_as_timestamp";
- added int96 to timestamp converter for both readers;
- added unit tests;




> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin





[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-09-30 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15535895#comment-15535895
 ] 

Vitalii Diravka commented on DRILL-4373:


I added an int96-to-timestamp converter for both parquet readers, controlled 
by the system/session option "store.parquet.int96_as_timestamp". 
The option is false by default so that existing query scripts that use the 
"convert_from TIMESTAMP_IMPALA" function keep working. 

When the option is true, that function is unnecessary and can cause the query 
to fail. 
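
The interaction between the option and the explicit conversion can be modelled 
with a small sketch. This is not Drill code: `read_int96_column` and 
`convert_from_timestamp_impala` are hypothetical Python stand-ins for the 
parquet reader and for the "convert_from TIMESTAMP_IMPALA" call, and the INT96 
layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day 
number, both little-endian) is assumed to be the one Hive and Impala commonly 
write.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
JULIAN_DAY_OF_EPOCH = 2440588  # Julian day number of 1970-01-01


def convert_from_timestamp_impala(value):
    """Hypothetical stand-in for the explicit conversion: binary in, timestamp out."""
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError("expected 12 raw INT96 bytes, got " + type(value).__name__)
    # Assumed layout: nanoseconds of day (int64), then Julian day (int32).
    nanos_of_day, julian_day = struct.unpack("<qi", bytes(value))
    return EPOCH + timedelta(days=julian_day - JULIAN_DAY_OF_EPOCH,
                             microseconds=nanos_of_day // 1000)


def read_int96_column(raw, int96_as_timestamp=False):
    """Hypothetical reader: returns raw bytes unless the option asks for a timestamp."""
    return convert_from_timestamp_impala(raw) if int96_as_timestamp else raw


raw = struct.pack("<qi", 0, JULIAN_DAY_OF_EPOCH)  # midnight, 1970-01-01

# Option off (the default): old scripts convert explicitly and keep working.
old_style = convert_from_timestamp_impala(read_int96_column(raw))

# Option on: the reader already returns a timestamp, so no conversion is needed.
new_style = read_int96_column(raw, int96_as_timestamp=True)
```

In this model, calling `convert_from_timestamp_impala(new_style)` raises a 
`TypeError` because the value is no longer binary, which mirrors why the 
explicit conversion can fail once the option is enabled.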


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin





[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-08-31 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451891#comment-15451891
 ] 

Vitalii Diravka commented on DRILL-4373:


[~rkins] As I see it, the error occurs because Drill and Hive use different 
physical types for the timestamp logical type: Hive uses int96 (for nanosecond 
precision), while Drill uses int64 (the type for timestamps with the 
appropriate annotation, per the 
[parquet documentation|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md],
 used for microsecond or millisecond precision). Drill therefore stores 
timestamps correctly, and Hive should be able to read such parquet files: 
https://issues.apache.org/jira/browse/HIVE-13435.

A separate issue is that Drill can read Hive timestamps from parquet files only 
by using the CONVERT_FROM function; by default, Drill reads INT96 as VARBINARY.
In the context of this jira, I am going to implement the ability for Drill to 
interpret Hive timestamps in parquet files as timestamps implicitly by default, 
controlled by a session/system option (in case a new data type is ever stored 
as INT96 in parquet files).
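
To make the difference between the two representations concrete, here is a 
minimal sketch of decoding an INT96 value and re-encoding it as INT64 
milliseconds. It assumes the INT96 layout commonly written by Hive and Impala 
(8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, 
both little-endian); the function names are illustrative and are not part of 
Drill or Hive.

```python
import struct
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp into a UTC datetime.

    Python datetimes hold microseconds, so sub-microsecond digits are dropped.
    """
    if len(raw) != 12:
        raise ValueError("an INT96 timestamp must be exactly 12 bytes")
    # Assumed layout: nanoseconds of day (int64), then Julian day (int32).
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return UNIX_EPOCH + timedelta(days=days, microseconds=nanos_of_day // 1000)


def datetime_to_int64_millis(ts: datetime) -> int:
    """Encode a timezone-aware datetime as INT64 milliseconds since the Unix epoch."""
    return (ts - UNIX_EPOCH) // timedelta(milliseconds=1)


# One hour past midnight on 1970-01-02, written the INT96 way.
sample = struct.pack("<qi", 3_600_000_000_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
decoded = int96_to_datetime(sample)
as_millis = datetime_to_int64_millis(decoded)
```

A reader that expects one encoding and is handed the other sees either a 
12-byte binary value or a bare 64-bit integer, which is consistent with the 
failures reported in this issue.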


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - Parquet
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin





[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-02-09 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139445#comment-15139445
 ] 

Rahul Challapalli commented on DRILL-4373:
--

Hive itself fails to read the file with the error below (the same error occurs 
through Drill's hive plugin as well):

{code}
2016-02-09 19:12:21,980 ERROR [main]: CliDriver (SessionState.java:printError(833)) - Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:153)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1707)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:221)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:153)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:364)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:631)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:584)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:576)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:139)
    ... 12 more
Caused by: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveWritableObject(ParquetStringInspector.java:52)
    at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:225)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:438)
    at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:422)
    at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
    at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
    at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
    at org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87)
    ... 19 more
{code}



[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-02-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138294#comment-15138294
 ] 

Jason Altekruse commented on DRILL-4373:


Could you also post the failure message?



[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-02-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138293#comment-15138293
 ] 

Jason Altekruse commented on DRILL-4373:


Just to isolate the issue a little more, can you read the table with Hive 
itself (rather than reading hive through the drill plugin)?
