[jira] [Commented] (DRILL-4664) ScanBatch.isNewSchema() returns wrong result for map datatype

2016-06-08 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320882#comment-15320882
 ] 

Vitalii Diravka commented on DRILL-4664:


[~hgunes] Could you please review this?

> ScanBatch.isNewSchema() returns wrong result for map datatype
> -
>
> Key: DRILL-4664
> URL: https://issues.apache.org/jira/browse/DRILL-4664
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Priority: Minor
>
> The isNewSchema() method checks whether the top-level schema or any of the 
> deeper map schemas has changed. The latter check doesn't work properly with 
> the count function: "deeperSchemaChanged" equals true even when two map 
> schemas have the same child fields.
> Discovered while trying to fix [DRILL-2385|DRILL-2385].
> Dataset test.json for reproducing (MAP datatype object):
> {code}{"oooi":{"oa":{"oab":{"oabc":1}}}}{code}
> Example query:
> {code}select count(t.oooi) from dfs.tmp.`test.json` t{code}
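For context, the deep comparison isNewSchema() is expected to perform can be modeled roughly as below (a hypothetical sketch with simplified types; Drill's real schema class is MaterializedField, whose API differs). Two map schemas should compare equal whenever names, types, and child fields all match, regardless of object identity:

```java
import java.util.List;

// Hypothetical, simplified model of a schema node -- not Drill's
// MaterializedField API, only an illustration of the expected comparison.
class FieldSchema {
    final String name;
    final String type;
    final List<FieldSchema> children;

    FieldSchema(String name, String type, List<FieldSchema> children) {
        this.name = name;
        this.type = type;
        this.children = children;
    }

    // Two schemas are the same when names, types, and all children match
    // recursively -- object identity must not matter.
    static boolean sameSchema(FieldSchema a, FieldSchema b) {
        if (!a.name.equals(b.name) || !a.type.equals(b.type)
                || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!sameSchema(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Under this model, two independently built schemas for the same map structure compare equal, which is what the count query above relies on.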



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4664) ScanBatch.isNewSchema() returns wrong result for map datatype

2016-06-07 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319057#comment-15319057
 ] 

Vitalii Diravka commented on DRILL-4664:


Noticed that this probably happens because, after the "previous 
InternalBatch" is created in StreamingAggTemplate, a new MapVector in 
MapTransferPair is created with the same SchemaChangeCallBack instance 
(likewise in RepeatedMapVector). I assume we need a new SchemaChangeCallBack there.
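The suspected mechanism can be illustrated with a minimal sketch (hypothetical class and method names, loosely modeled on Drill's SchemaChangeCallBack; not the actual implementation): when two vectors share one reset-on-read callback, a change flagged while setting up the transfer pair is later observed through the other vector, which then reports a schema change that never happened.

```java
// Minimal model of a "schema changed" callback. Drill's real
// SchemaChangeCallBack differs; this only illustrates the sharing hazard.
class ChangeCallback {
    private boolean changed = false;

    void markChanged() { changed = true; }

    // Reads and clears the flag, as a reset-on-read callback would.
    boolean getChangedAndReset() {
        boolean result = changed;
        changed = false;
        return result;
    }
}

class Vector {
    final ChangeCallback callback;
    Vector(ChangeCallback callback) { this.callback = callback; }
}
```

Giving each vector its own callback, as the comment suggests, removes the false positive: the copy no longer inherits a flag set on the original.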

> ScanBatch.isNewSchema() returns wrong result for map datatype
> -
>
> Key: DRILL-4664
> URL: https://issues.apache.org/jira/browse/DRILL-4664
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Priority: Minor
>
> The isNewSchema() method checks whether the top-level schema or any of the 
> deeper map schemas has changed. The latter check doesn't work properly with 
> the count function: "deeperSchemaChanged" equals true even when two map 
> schemas have the same child fields.
> Discovered while trying to fix [DRILL-2385|DRILL-2385].
> Dataset test.json for reproducing (MAP datatype object):
> {code}{"oooi":{"oa":{"oab":{"oabc":1}}}}{code}
> Example query:
> {code}select count(t.oooi) from dfs.tmp.`test.json` t{code}





[jira] [Resolved] (DRILL-2385) count on complex objects failed with missing function implementation

2016-06-21 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-2385.

Resolution: Fixed

Fixed in f86c4fa8.

> count on complex objects failed with missing function implementation
> 
>
> Key: DRILL-2385
> URL: https://issues.apache.org/jira/browse/DRILL-2385
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 0.8.0
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.7.0
>
>
> #Wed Mar 04 01:23:42 EST 2015
> git.commit.id.abbrev=71b6bfe
> Have a complex type that looks like the following:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from 
> `complex.json` t limit 1;
> ++
> |sia |
> ++
> | [1,11,101,1001] |
> ++
> {code}
> A count on the complex type will fail with missing function implementation:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) 
> countsia from `complex.json` t group by t.gbyi;
> Query failed: RemoteRpcException: Failure while running fragment., Schema is 
> currently null.  You must call buildSchema(SelectionVectorMode) before this 
> container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on 
> qa-node119.qa.lab:31010 ]
> [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ]
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> drillbit.log
> {code}
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR 
> o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
> org.apache.drill.exec.exception.SchemaChangeException: Failure while 
> materializing expression.
> Error in expression at index 0.  Error: Missing function implementation: 
> [count(BIGINT-REPEATED)].  Full expression: null.
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing 
> fragment
> java.lang.NullPointerException: Schema is currently null.  You must call 
> buildSchema(SelectionVectorMode) before this container can return a schema.
> at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) 
> ~[guava-14.0.1.jar:na]
> at 
> org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:155)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> 

[jira] [Commented] (DRILL-3272) HIve : Using 'if' function in hive results in an ExpressionParsingException

2016-06-16 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334130#comment-15334130
 ] 

Vitalii Diravka commented on DRILL-3272:


[~rkins] The Hive "IF" UDF works now in Drill 1.5.0 and 1.6.0. Please check 
it; it looks like this was already fixed. 

> HIve : Using 'if' function in hive results in an ExpressionParsingException
> ---
>
> Key: DRILL-3272
> URL: https://issues.apache.org/jira/browse/DRILL-3272
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive
>Reporter: Rahul Challapalli
> Fix For: Future
>
> Attachments: error.log
>
>
> git.commit.id.abbrev=5f26b8b
> The query below fails in Drill; it works properly from Hive, however.
> {code}
> select if(1999 > 2000, 'latest', 'old') from lineitem limit 1;
> Error: SYSTEM ERROR: 
> org.apache.drill.common.exceptions.ExpressionParsingException: Expression has 
> syntax error! line 1:28:mismatched input ',' expecting CParen
> Fragment 1:1
> [Error Id: 007e7d7d-62dc-42fd-b526-07762c33719c on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> I attached the error log. Let me know if you need anything else.





[jira] [Assigned] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-02-09 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-2282:
--

Assignee: Vitalii Diravka  (was: Mehant Baid)

> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Commented] (DRILL-975) Null-on-exception option for cast functions

2016-02-05 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133859#comment-15133859
 ] 

Vitalii Diravka commented on DRILL-975:
---

I think we can close this issue, since we have the option 
drill.exec.functions.cast_empty_string_to_null = true.
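For illustration, the null-on-failure semantics this issue asks for can be sketched as follows (a hypothetical helper, not Drill's generated cast code; note the session option above covers only the empty-string case, while the issue asks for null on any failed cast):

```java
final class SafeCast {
    private SafeCast() {}

    // Returns null instead of throwing when the input is empty or not a
    // valid integer, mirroring the proposed null-on-exception cast mode.
    static Integer castToIntOrNull(String input) {
        if (input == null || input.trim().isEmpty()) {
            return null;   // empty string -> null, as the option provides
        }
        try {
            return Integer.valueOf(input.trim());
        } catch (NumberFormatException e) {
            return null;   // unparsable value -> null instead of query failure
        }
    }
}
```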

> Null-on-exception option for cast functions
> ---
>
> Key: DRILL-975
> URL: https://issues.apache.org/jira/browse/DRILL-975
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Steven Phillips
> Fix For: Future
>
> Attachments: DRILL-975.patch
>
>
> Currently, if a particular value cannot be cast to the target type, an 
> exception is thrown and the query fails. We should have a mode that treats 
> the output of all cast functions as nullable and returns a null value if the 
> cast fails, rather than throwing an exception.
> An important example is the Text reader. 
> The text reader always produces a single RepeatedVarChar column. The columns 
> are then cast to the appropriate type. For the columns that are cast to 
> numeric types, if there is no value (i.e. it's an empty string), this 
> currently throws a NumberFormatException. What we really want is for it to 
> produce a null value.





[jira] [Commented] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-02-10 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140490#comment-15140490
 ] 

Vitalii Diravka commented on DRILL-2282:


[~mehant] I have updated the patch, because the structure of the project 
changed and new classes were added. But to confirm that these changes make 
sense, I want to reproduce the errors you mentioned. 
Do they appear when we query HBase tables, as in 
[DRILL-1496|https://issues.apache.org/jira/browse/DRILL-1496]?
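For illustration, the kind of name cleanup this JIRA proposes could be sketched as below (a hypothetical helper with assumed rules; the actual change edits the function templates by hand rather than transforming names at runtime):

```java
final class FunctionNames {
    private FunctionNames() {}

    // Replaces runs of characters that are unsafe in serialized plans
    // (spaces, punctuation) with a single underscore.
    static String sanitize(String name) {
        return name.trim().replaceAll("[^A-Za-z0-9_$]+", "_");
    }
}
```

A name such as "date diff" would become "date_diff", so it survives a round trip through plan serialization without a parse ambiguity.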

> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Created] (DRILL-4346) NumberFormatException when casting empty string to int in hbase/maprdb

2016-02-03 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4346:
--

 Summary: NumberFormatException when casting empty string to int in 
hbase/maprdb
 Key: DRILL-4346
 URL: https://issues.apache.org/jira/browse/DRILL-4346
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.2.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka


Querying HBase values that contain no data and casting them to INTEGER results 
in a NumberFormatException: 
{code}
Data 

row1,1,2 
row2,,4 
row3,5,6 
row4,7,8 

Create Table 

$ maprcli table create -path /user/cmatta/projects/cmatta_test 
$ maprcli table cf create -path /user/cmatta/projects/cmatta_test -cfname a 

Load into Hbase table: 

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' 
-Dimporttsv.columns=HBASE_ROW_KEY,a:c1,a:c2 /user/cmatta/projects/cmatta_test 
maprfs:///user/cmatta/projects/testdata_hbase_null 
{code}
{code}
0: jdbc:drill:> select cast(x.`row_key` as varchar(128)) as `row_key`, 
CAST(x.`a`.`c1` as INTEGER) from maprfs.cmatta.`cmatta_test` x; 
Error: SYSTEM ERROR: NumberFormatException: 

Fragment 0:0 

[Error Id: 05a0e5ed-d830-4926-a442-569c9d70d0b4 on se-node11.se.lab:31010] 
(state=,code=0) 
{code}





[jira] [Commented] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-02-23 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159462#comment-15159462
 ] 

Vitalii Diravka commented on DRILL-2282:


I think it isn't necessary to remove spaces and special symbols from function 
names, because editing them makes no observable difference.
Perhaps the only reason would be to bring them to a common style.



> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Comment Edited] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-02-24 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153973#comment-15153973
 ] 

Vitalii Diravka edited comment on DRILL-2282 at 2/24/16 7:28 PM:
-

[~mehant] I tried to reproduce the issues mentioned in this JIRA but couldn't. 
Every query with spaces and special symbols in function names works properly, 
as far as I can tell. 
A test demonstrating that such queries succeed is available here: [Updated 
patch 
version|https://github.com/vdiravka/drill/commit/72aec00985b2a385f34c1861eb44a5fb83f0bb9b]



was (Author: vitalii):
[~mehant] I tried to reproduce issues mentioned in this jira but didn't get 
their. Every query with spaces and special symbols in functions works properly 
I mean. 
Here is available the test as proof for successful work such queries.[Updated 
patch|https://github.com/vdiravka/drill/commit/72aec00985b2a385f34c1861eb44a5fb83f0bb9b]


> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Commented] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-02-19 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153973#comment-15153973
 ] 

Vitalii Diravka commented on DRILL-2282:


I tried to reproduce the issues mentioned in this JIRA but couldn't. Every 
query with spaces and special symbols in function names works properly, as far 
as I can tell. 
A test demonstrating that such queries succeed is available here: [Updated 
patch|https://github.com/vdiravka/drill/commit/72aec00985b2a385f34c1861eb44a5fb83f0bb9b]


> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Comment Edited] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-02-19 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153973#comment-15153973
 ] 

Vitalii Diravka edited comment on DRILL-2282 at 2/19/16 9:28 AM:
-

I tried to reproduce the issues mentioned in this JIRA but couldn't. Every 
query with spaces and special symbols in function names works properly, as far 
as I can tell. 
A test demonstrating that such queries succeed is available here: [Updated 
patch|https://github.com/vdiravka/drill/commit/72aec00985b2a385f34c1861eb44a5fb83f0bb9b]



was (Author: vitalii):
I tried to reproduce issues mentioned in this jira but didn't get their. Every 
query with spaces and special symbols in functions works properly I mean. 
Here is available the test as proof for successful work such queries.[Updates 
patch|https://github.com/vdiravka/drill/commit/72aec00985b2a385f34c1861eb44a5fb83f0bb9b]


> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Comment Edited] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-02-19 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153973#comment-15153973
 ] 

Vitalii Diravka edited comment on DRILL-2282 at 2/19/16 4:57 PM:
-

[~Mehant Baid] I tried to reproduce the issues mentioned in this JIRA but 
couldn't. Every query with spaces and special symbols in function names works 
properly, as far as I can tell. 
A test demonstrating that such queries succeed is available here: [Updated 
patch|https://github.com/vdiravka/drill/commit/72aec00985b2a385f34c1861eb44a5fb83f0bb9b]



was (Author: vitalii):
I tried to reproduce issues mentioned in this jira but didn't get their. Every 
query with spaces and special symbols in functions works properly I mean. 
Here is available the test as proof for successful work such queries.[Updated 
patch|https://github.com/vdiravka/drill/commit/72aec00985b2a385f34c1861eb44a5fb83f0bb9b]


> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Comment Edited] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-02-19 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153973#comment-15153973
 ] 

Vitalii Diravka edited comment on DRILL-2282 at 2/19/16 4:57 PM:
-

[~mehant] I tried to reproduce the issues mentioned in this JIRA but couldn't. 
Every query with spaces and special symbols in function names works properly, 
as far as I can tell. 
A test demonstrating that such queries succeed is available here: [Updated 
patch|https://github.com/vdiravka/drill/commit/72aec00985b2a385f34c1861eb44a5fb83f0bb9b]



was (Author: vitalii):
[~Mehant Baid] I tried to reproduce issues mentioned in this jira but didn't 
get their. Every query with spaces and special symbols in functions works 
properly I mean. 
Here is available the test as proof for successful work such queries.[Updated 
patch|https://github.com/vdiravka/drill/commit/72aec00985b2a385f34c1861eb44a5fb83f0bb9b]


> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Updated] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-03-10 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-2282:
---
Attachment: DRILL-2282-updated.patch

Updated version of DRILL-2282.patch.

> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282-updated.patch, DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Commented] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-03-10 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189043#comment-15189043
 ] 

Vitalii Diravka commented on DRILL-2282:


The updated patch DRILL-2282-updated.patch will be ready to apply once the bug 
in the Apache Arrow subproject [ARROW-61 Method can return the value bigger 
than long MAX_VALUE|https://issues.apache.org/jira/browse/ARROW-61] is fixed.

> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282-updated.patch, DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Assigned] (DRILL-3577) Counting nested fields on CTAS-created-parquet file/s reports inaccurate results

2016-04-13 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-3577:
--

Assignee: Vitalii Diravka  (was: Mehant Baid)

> Counting nested fields on CTAS-created-parquet file/s reports inaccurate 
> results
> 
>
> Key: DRILL-3577
> URL: https://issues.apache.org/jira/browse/DRILL-3577
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.1.0
>Reporter: Hanifi Gunes
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.7.0
>
>
> I have not tried this at a smaller scale nor on a JSON file directly, but 
> the following seems to reproduce the issue:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> This should read
> {code}
> Fragment Number of records written
> 0_0   20200
> {code}
> 3. Count on nested fields via
> {code:sql}
> select count(t.others.additional) from dfs.`tmp`.`tp` t
> OR
> select count(t.others.other) from dfs.`tmp`.`tp` t
> {code}
> reports no rows as follows
> {code}
> EXPR$0
> 0
> {code}
> While
> {code:sql}
> select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not 
> null
> {code}
> reports expected 200 rows
> {code}
> EXPR$0
> 200
> {code}





[jira] [Issue Comment Deleted] (DRILL-4584) JDBC/ODBC Client IP in Drill audit logs

2016-04-08 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4584:
---
Comment: was deleted

(was: Is this the IP address of the client machine running the Drill web 
console, Drill shell, or JDBC/ODBC client?
Or is it the IP address of the foreman node? If it is the foreman's, which is 
better to show: the hostname, the IP address, or ip:port?
!https://drill.apache.org/docs/img/query-flow-client.png!)

> JDBC/ODBC Client IP in Drill audit logs
> ---
>
> Key: DRILL-4584
> URL: https://issues.apache.org/jira/browse/DRILL-4584
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC, Client - ODBC
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.7.0
>
>
> Currently the Drill audit logs - sqlline_queries.json and 
> drillbit_queries.json - provide information about the client username that 
> fired the query. It would be good to also have the client IP from which the 
> query was fired.





[jira] [Created] (DRILL-4584) JDBC/ODBC Client IP in Drill audit logs

2016-04-06 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4584:
--

 Summary: JDBC/ODBC Client IP in Drill audit logs
 Key: DRILL-4584
 URL: https://issues.apache.org/jira/browse/DRILL-4584
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - JDBC, Client - ODBC
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
Priority: Minor
 Fix For: 1.7.0


Currently the Drill audit logs - sqlline_queries.json and drillbit_queries.json 
- provide information about the client username that fired the query. It would 
be good to also have the client IP from which the query was fired.





[jira] [Commented] (DRILL-4584) JDBC/ODBC Client IP in Drill audit logs

2016-04-06 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228592#comment-15228592
 ] 

Vitalii Diravka commented on DRILL-4584:


Is this the IP address of the client machine running the Drill web console, 
Drill shell, or JDBC/ODBC client?
Or is it the IP address of the foreman node? If it is the foreman's, which is 
better to show: the hostname, the IP address, or ip:port?
!https://drill.apache.org/docs/img/query-flow-client.png!

> JDBC/ODBC Client IP in Drill audit logs
> ---
>
> Key: DRILL-4584
> URL: https://issues.apache.org/jira/browse/DRILL-4584
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC, Client - ODBC
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.7.0
>
>
> Currently the Drill audit logs - sqlline_queries.json and 
> drillbit_queries.json - provide information about the client username that 
> fired the query. It would be good to also have the client IP from which the 
> query was fired.





[jira] [Updated] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-03-19 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-2282:
---
Issue Type: Improvement  (was: Bug)

> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
> Attachments: DRILL-2282-updated.patch, DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Created] (DRILL-4459) SchemaChangeException while querying hive json table

2016-03-01 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4459:
--

 Summary: SchemaChangeException while querying hive json table
 Key: DRILL-4459
 URL: https://issues.apache.org/jira/browse/DRILL-4459
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill, Functions - Hive
Affects Versions: 1.4.0
 Environment: MapR-Drill 1.4.0
Hive-1.2.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: 1.6.0


Getting a SchemaChangeException while querying JSON documents stored in a Hive 
table.
{noformat}
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
 
Error in expression at index -1.  Error: Missing function implementation: 
[castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
{noformat}
minimum reproduce
{noformat}
Created sample JSON documents using the attached script (randomdata.sh):
hive> create table simplejson(json string);
hive> load data local inpath '/tmp/simple.json' into table simplejson;
Now query it through Drill.
Drill Version
select * from sys.version;
commit_id:      eafe0a245a0d4c0234bfbead10c6b2d7c8ef413d
commit_message: DRILL-3901: Don't do early expansion of directory in the
                non-metadata-cache case because it already happens during
                ParquetGroupScan's metadata gathering operation.
commit_time:    07.10.2015 @ 17:12:57 UTC
build_email:    Unknown
build_time:     07.10.2015 @ 17:36:16 UTC

0: jdbc:drill:zk=> select * from hive.`default`.simplejson where 
GET_JSON_OBJECT(simplejson.json, '$.DocId') = 'DocId2759947' limit 1;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
 
Error in expression at index -1.  Error: Missing function implementation: 
[castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..

Fragment 1:1

[Error Id: 74f054a8-6f1d-4ddd-9064-3939fcc82647 on ip-10-0-0-233:31010] 
(state=,code=0)
{noformat}
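For reference, GET_JSON_OBJECT extracts a value from a JSON string by a '$.a.b'-style path. A minimal Python sketch of that behavior (simple dot paths only, no arrays or wildcards - an illustration of the semantics, not Hive's implementation):

```python
import json

def get_json_object(doc, path):
    # Walk dot-separated keys of a '$.a.b' path through the parsed document;
    # return None when any key is missing (mirroring a NULL result).
    obj = json.loads(doc)
    for key in path.lstrip('$').lstrip('.').split('.'):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj
```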





[jira] [Commented] (DRILL-3894) Directory functions (MaxDir, MinDir ..) should have optional filename parameter

2016-04-05 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226155#comment-15226155
 ] 

Vitalii Diravka commented on DRILL-3894:


[~tdunning] I have implemented your approach as new functions with one 
parameter:
https://github.com/vdiravka/drill/commit/966d76a06f82dcb265849b90bcff8ce8a770f4ec

> Directory functions (MaxDir, MinDir ..) should have optional filename 
> parameter
> ---
>
> Key: DRILL-3894
> URL: https://issues.apache.org/jira/browse/DRILL-3894
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.2.0
>Reporter: Neeraja
>Assignee: Vitalii Diravka
>
> https://drill.apache.org/docs/query-directory-functions/
> The directory functions documented above should accept the second parameter 
> (file name) as optional.





[jira] [Assigned] (DRILL-3894) Directory functions (MaxDir, MinDir ..) should have optional filename parameter

2016-03-30 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-3894:
--

Assignee: Vitalii Diravka

> Directory functions (MaxDir, MinDir ..) should have optional filename 
> parameter
> ---
>
> Key: DRILL-3894
> URL: https://issues.apache.org/jira/browse/DRILL-3894
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.2.0
>Reporter: Neeraja
>Assignee: Vitalii Diravka
>
> https://drill.apache.org/docs/query-directory-functions/
> The directory functions documented above should accept the second parameter 
> (file name) as optional.





[jira] [Comment Edited] (DRILL-2100) Drill not deleting spooling files

2016-03-30 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214433#comment-15214433
 ] 

Vitalii Diravka edited comment on DRILL-2100 at 3/30/16 12:30 PM:
--

[~haozhu], 

Added deletion of the whole directory for the query profile ID 
("/tmp/drill/spill/2aa9600f-016a-5283-f98e-ef22942981c2" for example) when the 
FileSystem is closed. 
Added a closeSpillFileSystem method to Foreman.close().
https://github.com/vdiravka/drill/commit/a5f891dbba06c2f15c8478c7843394c809de25c0


was (Author: vitalii):
[~haozhu], 

Added deleting of the whole directory for SQL profile Id 
("/tmp/drill/spill/2aa9600f-016a-5283-f98e-ef22942981c2" for example) when 
FileSystem is closed. 
https://github.com/vdiravka/drill/commit/a5f891dbba06c2f15c8478c7843394c809de25c0

> Drill not deleting spooling files
> -
>
> Key: DRILL-2100
> URL: https://issues.apache.org/jira/browse/DRILL-2100
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
>Assignee: Vitalii Diravka
> Fix For: Future
>
>
> Currently, forcing queries to use an external sort by switching off 
> hash join/agg causes spill-to-disk files to accumulate. 
> This causes issues with disk space availability when the spill is configured 
> to be on the local file system (/tmp/drill). It is also not optimal when 
> configured to use DFS (custom). 
> Drill must clean up all temporary files created after a query completes or 
> after a drillbit restart. 





[jira] [Commented] (DRILL-2100) Drill not deleting spooling files

2016-03-28 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214433#comment-15214433
 ] 

Vitalii Diravka commented on DRILL-2100:


[~haozhu], 

Added deletion of the whole directory for the query profile ID 
("/tmp/drill/spill/2aa9600f-016a-5283-f98e-ef22942981c2" for example) when the 
FileSystem is closed. 
https://github.com/vdiravka/drill/commit/a5f891dbba06c2f15c8478c7843394c809de25c0

> Drill not deleting spooling files
> -
>
> Key: DRILL-2100
> URL: https://issues.apache.org/jira/browse/DRILL-2100
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
> Fix For: Future
>
>
> Currently, forcing queries to use an external sort by switching off 
> hash join/agg causes spill-to-disk files to accumulate. 
> This causes issues with disk space availability when the spill is configured 
> to be on the local file system (/tmp/drill). It is also not optimal when 
> configured to use DFS (custom). 
> Drill must clean up all temporary files created after a query completes or 
> after a drillbit restart. 





[jira] [Assigned] (DRILL-2100) Drill not deleting spooling files

2016-03-28 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-2100:
--

Assignee: Vitalii Diravka

> Drill not deleting spooling files
> -
>
> Key: DRILL-2100
> URL: https://issues.apache.org/jira/browse/DRILL-2100
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
>Assignee: Vitalii Diravka
> Fix For: Future
>
>
> Currently, forcing queries to use an external sort by switching off 
> hash join/agg causes spill-to-disk files to accumulate. 
> This causes issues with disk space availability when the spill is configured 
> to be on the local file system (/tmp/drill). It is also not optimal when 
> configured to use DFS (custom). 
> Drill must clean up all temporary files created after a query completes or 
> after a drillbit restart. 





[jira] [Assigned] (DRILL-3510) Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers

2016-05-19 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-3510:
--

Assignee: Vitalii Diravka

> Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL 
> identifiers 
> --
>
> Key: DRILL-3510
> URL: https://issues.apache.org/jira/browse/DRILL-3510
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Jinfeng Ni
>Assignee: Vitalii Diravka
> Fix For: Future
>
> Attachments: DRILL-3510.patch, DRILL-3510.patch
>
>
> Currently Drill's SQL parser uses backticks as identifier quotes, the same as 
> MySQL does. However, this differs from the ANSI SQL specification, 
> where double quotes are used as identifier quotes.  
> MySQL has an option "ANSI_QUOTES", which can be switched on/off by the user. 
> Drill should follow the same approach, so that Drill users do not have to 
> rewrite their existing queries if those queries use double quotes. 
> {code}
> SET sql_mode='ANSI_QUOTES';
> {code}
>
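The effect of such an option on identifier handling can be sketched as follows (a hypothetical Python illustration; Drill's parser is Calcite-based Java, and the function below is an assumption for demonstration only):

```python
def strip_identifier(token, ansi_quotes=False):
    # Backticks always delimit identifiers; with ANSI_QUOTES enabled,
    # double quotes delimit identifiers as well.
    quotes = ('`', '"') if ansi_quotes else ('`',)
    for q in quotes:
        if len(token) >= 2 and token[0] == q and token[-1] == q:
            return token[1:-1]
    return token
```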





[jira] [Updated] (DRILL-4673) Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

2016-05-19 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4673:
---
Priority: Minor  (was: Major)

> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command return
> -
>
> Key: DRILL-4673
> URL: https://issues.apache.org/jira/browse/DRILL-4673
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: drill
>
> Implement "DROP TABLE IF EXISTS" for Drill to prevent a FAILED status when 
> "DROP TABLE" is run on a table that doesn't exist.
> The same applies to "DROP VIEW IF EXISTS".
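The requested semantics boil down to making the drop idempotent. A minimal sketch (a hypothetical Python model of the command's status, not Drill's implementation):

```python
def drop_table(catalog, name, if_exists=False):
    # With IF EXISTS, dropping a missing table is a successful no-op
    # instead of returning a FAILED status.
    if name not in catalog:
        return "OK" if if_exists else "FAILED"
    del catalog[name]
    return "OK"
```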





[jira] [Updated] (DRILL-4664) ScanBatch.isNewSchema() returns wrong result for map datatype

2016-05-11 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4664:
---
Description: 
The isNewSchema() method checks whether the top-level schema or any of the 
deeper map schemas has changed. The latter check doesn't work properly with the 
count function: "deeperSchemaChanged" is true even when two map schemas have 
the same children fields.

Discovered while trying to fix [DRILL-2385|DRILL-2385].

Dataset test.json for reproducing (MAP datatype object):
{code}{"oooi":{"oa":{"oab":{"oabc":1}}}}{code}

Example of query:
{code}select count(t.oooi) from dfs.tmp.`test.json` t{code}

  was:
isNewSchema() method checks if top-level schema or any of the deeper map 
schemas has changed. The last one doesn't work properly.
"deeperSchemaChanged" equals true even when two map strings have the same 
children fields.

Discovered while trying to fix [DRILL-2385|DRILL-2385].

Dataset for reproducing (MAP datatype object):
{code}{"oooi":{"oa":{"oab":{"oabc":1}}}}{code}


> ScanBatch.isNewSchema() returns wrong result for map datatype
> -
>
> Key: DRILL-4664
> URL: https://issues.apache.org/jira/browse/DRILL-4664
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Priority: Minor
>
> The isNewSchema() method checks whether the top-level schema or any of the 
> deeper map schemas has changed. The latter check doesn't work properly with 
> the count function: "deeperSchemaChanged" is true even when two map schemas 
> have the same children fields.
> Discovered while trying to fix [DRILL-2385|DRILL-2385].
> Dataset test.json for reproducing (MAP datatype object):
> {code}{"oooi":{"oa":{"oab":{"oabc":1}}}}{code}
> Example of query:
> {code}select count(t.oooi) from dfs.tmp.`test.json` t{code}
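The check the issue asks for amounts to structural equality of map schemas. A hypothetical Python sketch of that idea (Drill's actual code compares materialized field trees in Java; the names below are assumptions):

```python
def same_schema(a, b):
    # Two map schemas are equal when they have the same children fields,
    # compared recursively by name and type, not by object identity.
    if set(a) != set(b):
        return False
    return all(
        same_schema(a[f], b[f]) if isinstance(a[f], dict) and isinstance(b[f], dict)
        else a[f] == b[f]
        for f in a
    )

# Two independently built schemas for the test.json map should compare equal:
s1 = {"oooi": {"oa": {"oab": {"oabc": "BIGINT"}}}}
s2 = {"oooi": {"oa": {"oab": {"oabc": "BIGINT"}}}}
```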





[jira] [Comment Edited] (DRILL-2100) Drill not deleting spooling files

2016-05-18 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288641#comment-15288641
 ] 

Vitalii Diravka edited comment on DRILL-2100 at 5/18/16 9:00 AM:
-

Fixed in 
[38e1016|https://github.com/apache/drill/commit/38e1016c49786acaacb153ee37784b3ce3023eb5].


was (Author: vitalii):
Fixed in 38e1016.

> Drill not deleting spooling files
> -
>
> Key: DRILL-2100
> URL: https://issues.apache.org/jira/browse/DRILL-2100
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> Currently, forcing queries to use an external sort by switching off 
> hash join/agg causes spill-to-disk files to accumulate. 
> This causes issues with disk space availability when the spill is configured 
> to be on the local file system (/tmp/drill). It is also not optimal when 
> configured to use DFS (custom). 
> Drill must clean up all temporary files created after a query completes or 
> after a drillbit restart. 





[jira] [Created] (DRILL-4673) Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

2016-05-12 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4673:
--

 Summary: Implement "DROP TABLE IF EXISTS" for drill to prevent 
FAILED status on command return
 Key: DRILL-4673
 URL: https://issues.apache.org/jira/browse/DRILL-4673
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka


Implement "DROP TABLE IF EXISTS" for Drill to prevent a FAILED status when 
"DROP TABLE" is run on a table that doesn't exist.






[jira] [Updated] (DRILL-4673) Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

2016-05-12 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4673:
---
Description: 
Implement "DROP TABLE IF EXISTS" for Drill to prevent a FAILED status when 
"DROP TABLE" is run on a table that doesn't exist.


  was:
Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command 
"DROP TABLE" return if table isn't exists.



> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command return
> -
>
> Key: DRILL-4673
> URL: https://issues.apache.org/jira/browse/DRILL-4673
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>  Labels: drill
>
> Implement "DROP TABLE IF EXISTS" for Drill to prevent a FAILED status when 
> "DROP TABLE" is run on a table that doesn't exist.





[jira] [Resolved] (DRILL-3894) Directory functions (MaxDir, MinDir ..) should have optional filename parameter

2016-05-04 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-3894.

   Resolution: Implemented
Fix Version/s: 1.7.0

Implemented in 
[a6a85ab|https://github.com/apache/drill/commit/a6a85ab66360cac81ab4777cec20292470ac483d].

> Directory functions (MaxDir, MinDir ..) should have optional filename 
> parameter
> ---
>
> Key: DRILL-3894
> URL: https://issues.apache.org/jira/browse/DRILL-3894
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.2.0
>Reporter: Neeraja
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> https://drill.apache.org/docs/query-directory-functions/
> The directory functions documented above should accept the second parameter 
> (file name) as optional.





[jira] [Commented] (DRILL-2100) Drill not deleting spooling files

2016-05-04 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270384#comment-15270384
 ] 

Vitalii Diravka commented on DRILL-2100:


[~jaltekruse] Not exactly in that way. 
Spill directories are deleted immediately after a successful, failed, or 
canceled query due to fs.delete() in the close() method of 
ExternalSortBatch.java (the spill directories are also removed from the 
deleteOnExit set). This works without relying on fs.deleteOnExit(). 
As for fs.deleteOnExit(), I use it to delete temporary spill folders for the 
case when the drillbit process is killed.
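The two-layer cleanup described here can be sketched in miniature (a hypothetical Python analogy; Drill's fix is Java against the Hadoop FileSystem API, and the class and method names below are assumptions):

```python
import shutil
import tempfile
from pathlib import Path

class SpillSet:
    # Per-query spill directory that is removed eagerly on close.
    def __init__(self, base, query_id):
        self.dir = Path(base) / query_id
        self.dir.mkdir(parents=True, exist_ok=True)

    def close(self):
        # Eager cleanup on query success, failure, or cancellation;
        # a deleteOnExit-style hook would remain only as a safety net
        # for the case when the process is killed.
        shutil.rmtree(self.dir, ignore_errors=True)
```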

> Drill not deleting spooling files
> -
>
> Key: DRILL-2100
> URL: https://issues.apache.org/jira/browse/DRILL-2100
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>
> Currently, forcing queries to use an external sort by switching off 
> hash join/agg causes spill-to-disk files to accumulate. 
> This causes issues with disk space availability when the spill is configured 
> to be on the local file system (/tmp/drill). It is also not optimal when 
> configured to use DFS (custom). 
> Drill must clean up all temporary files created after a query completes or 
> after a drillbit restart. 





[jira] [Resolved] (DRILL-4584) JDBC/ODBC Client IP in Drill audit logs

2016-05-04 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-4584.

Resolution: Done

Implemented in 
[2d9f9ab|https://github.com/apache/drill/commit/2d9f9abb4c47d08f8462599c8d6076a61a1708fe].

> JDBC/ODBC Client IP in Drill audit logs
> ---
>
> Key: DRILL-4584
> URL: https://issues.apache.org/jira/browse/DRILL-4584
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC, Client - ODBC
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.7.0
>
>
> Currently Drill audit logs - sqlline_queries.json and drillbit_queries.json - 
> provide information about the client username who fired the query. It would be 
> good to also have the client IP from which the query was fired.





[jira] [Updated] (DRILL-3894) Directory functions (MaxDir, MinDir ..) should have optional filename parameter

2016-05-04 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-3894:
---
Labels: doc-impacting  (was: )

> Directory functions (MaxDir, MinDir ..) should have optional filename 
> parameter
> ---
>
> Key: DRILL-3894
> URL: https://issues.apache.org/jira/browse/DRILL-3894
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.2.0
>Reporter: Neeraja
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> https://drill.apache.org/docs/query-directory-functions/
> The directory functions documented above should accept the second parameter 
> (file name) as optional.





[jira] [Updated] (DRILL-4584) JDBC/ODBC Client IP in Drill audit logs

2016-05-04 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4584:
---
Labels: documentation  (was: )

> JDBC/ODBC Client IP in Drill audit logs
> ---
>
> Key: DRILL-4584
> URL: https://issues.apache.org/jira/browse/DRILL-4584
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC, Client - ODBC
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: documentation
> Fix For: 1.7.0
>
>
> Currently Drill audit logs - sqlline_queries.json and drillbit_queries.json - 
> provide information about the client username who fired the query. It would be 
> good to also have the client IP from which the query was fired.





[jira] [Updated] (DRILL-4584) JDBC/ODBC Client IP in Drill audit logs

2016-05-04 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4584:
---
Labels: doc-impacting  (was: documentation)

> JDBC/ODBC Client IP in Drill audit logs
> ---
>
> Key: DRILL-4584
> URL: https://issues.apache.org/jira/browse/DRILL-4584
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC, Client - ODBC
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> Currently Drill audit logs - sqlline_queries.json and drillbit_queries.json - 
> provide information about the client username who fired the query. It would be 
> good to also have the client IP from which the query was fired.





[jira] [Created] (DRILL-4664) ScanBatch.isNewSchema() returns wrong result for map datatype

2016-05-10 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4664:
--

 Summary: ScanBatch.isNewSchema() returns wrong result for map 
datatype
 Key: DRILL-4664
 URL: https://issues.apache.org/jira/browse/DRILL-4664
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.6.0
Reporter: Vitalii Diravka
Priority: Minor


The isNewSchema() method checks whether the top-level schema or any of the 
deeper map schemas has changed. The latter check doesn't work properly: 
"deeperSchemaChanged" is true even when two map schemas have the same 
children fields.

Discovered while trying to fix [DRILL-2385|DRILL-2385].





[jira] [Updated] (DRILL-4664) ScanBatch.isNewSchema() returns wrong result for map datatype

2016-05-10 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4664:
---
Description: 
The isNewSchema() method checks whether the top-level schema or any of the 
deeper map schemas has changed. The latter check doesn't work properly: 
"deeperSchemaChanged" is true even when two map schemas have the same 
children fields.

Discovered while trying to fix [DRILL-2385|DRILL-2385].

Dataset for reproducing (MAP datatype object):
{"oooi":{"oa":{"oab":{"oabc":1}}}}

  was:
isNewSchema() method checks if top-level schema or any of the deeper map 
schemas has changed. The last one doesn't work properly.
"deeperSchemaChanged" equals true even when two map strings have the same 
children fields.

Discovered while trying to fix [DRILL-2385|DRILL-2385].


> ScanBatch.isNewSchema() returns wrong result for map datatype
> -
>
> Key: DRILL-4664
> URL: https://issues.apache.org/jira/browse/DRILL-4664
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Priority: Minor
>
> The isNewSchema() method checks whether the top-level schema or any of the 
> deeper map schemas has changed. The latter check doesn't work properly: 
> "deeperSchemaChanged" is true even when two map schemas have the same 
> children fields.
> Discovered while trying to fix [DRILL-2385|DRILL-2385].
> Dataset for reproducing (MAP datatype object):
> {"oooi":{"oa":{"oab":{"oabc":1}}}}





[jira] [Updated] (DRILL-4664) ScanBatch.isNewSchema() returns wrong result for map datatype

2016-05-10 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4664:
---
Description: 
The isNewSchema() method checks whether the top-level schema or any of the 
deeper map schemas has changed. The latter check doesn't work properly: 
"deeperSchemaChanged" is true even when two map schemas have the same 
children fields.

Discovered while trying to fix [DRILL-2385|DRILL-2385].

Dataset for reproducing (MAP datatype object):
{code}{"oooi":{"oa":{"oab":{"oabc":1}}}}{code}

  was:
isNewSchema() method checks if top-level schema or any of the deeper map 
schemas has changed. The last one doesn't work properly.
"deeperSchemaChanged" equals true even when two map strings have the same 
children fields.

Discovered while trying to fix [DRILL-2385|DRILL-2385].

Dataset for reproducing (MAP datatype object):
{"oooi":{"oa":{"oab":{"oabc":1}}}}


> ScanBatch.isNewSchema() returns wrong result for map datatype
> -
>
> Key: DRILL-4664
> URL: https://issues.apache.org/jira/browse/DRILL-4664
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Priority: Minor
>
> The isNewSchema() method checks whether the top-level schema or any of the 
> deeper map schemas has changed. The latter check doesn't work properly: 
> "deeperSchemaChanged" is true even when two map schemas have the same 
> children fields.
> Discovered while trying to fix [DRILL-2385|DRILL-2385].
> Dataset for reproducing (MAP datatype object):
> {code}{"oooi":{"oa":{"oab":{"oabc":1}}}}{code}





[jira] [Comment Edited] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories

2016-04-18 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245991#comment-15245991
 ] 

Vitalii Diravka edited comment on DRILL-4614 at 4/18/16 4:56 PM:
-

Discovered while investigating the issue in 
[DRILL-3577|https://issues.apache.org/jira/browse/DRILL-3577]


was (Author: vitalii):
Discovered while investigating the issue in 
[-DRILL-3577-|https://issues.apache.org/jira/browse/DRILL-3577]

> Drill must appoint one data type per one column for self-describing data 
> while querying directories 
> 
>
> Key: DRILL-4614
> URL: https://issues.apache.org/jira/browse/DRILL-4614
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> While Drill selects data from a directory and detects data types on the fly, 
> it is possible that one field will have several data types. 
> For example:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case will be created parquet table as the folder with two files.
> 3. Select the data
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select will be a mix of EXPR$0 columns with different data 
> types.*
> It happens because Drill defines the column data type per file.  
> The same result occurs with JSON files.
> Since streaming aggregate does not support schema changes, this issue makes 
> it impossible to use aggregate functions on such query results.
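One way to "appoint one data type per column" is to merge the per-file types into a single promoted type. A hypothetical sketch of that idea (the promotion table below is an assumption for illustration, not Drill's actual rules):

```python
def resolve_column_type(file_types):
    # Fold the types observed for one column across files into a single type,
    # widening where a known promotion exists and falling back to text.
    promote = {
        frozenset({"INT", "BIGINT"}): "BIGINT",
        frozenset({"INT", "FLOAT8"}): "FLOAT8",
        frozenset({"BIGINT", "FLOAT8"}): "FLOAT8",
    }
    result = file_types[0]
    for t in file_types[1:]:
        if t != result:
            result = promote.get(frozenset({result, t}), "VARCHAR")
    return result
```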





[jira] [Comment Edited] (DRILL-3577) Counting nested fields on CTAS-created-parquet file/s reports inaccurate results

2016-04-18 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246034#comment-15246034
 ] 

Vitalii Diravka edited comment on DRILL-3577 at 4/18/16 5:10 PM:
-

1. Partially fixed in 
[-DRILL-3551-|https://issues.apache.org/jira/browse/DRILL-3551]
{code}
0: jdbc:drill:zk=local> select count(t.others.other) from dfs.`tmp`.`tp` t;
+-+
| EXPR$0  |
+-+
| 20203   |
+-+
1 row selected (0.165 seconds)
{code}
2. {code}
0: jdbc:drill:zk=local> select count(t.others.additional) from dfs.`tmp`.`tp` t;
Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema 
changes
{code}
This error can be resolved after fixing the 
[DRILL-4614|https://issues.apache.org/jira/browse/DRILL-4614]


was (Author: vitalii):
1. Partially fixed in 
[-DRILL-3551-|https://issues.apache.org/jira/browse/DRILL-3551]
{code}
0: jdbc:drill:zk=local> select count(t.others.other) from dfs.`tmp`.`tp` t;
+-+
| EXPR$0  |
+-+
| 20203   |
+-+
1 row selected (0.165 seconds)
{code}
2. {code}
0: jdbc:drill:zk=local> select count(t.others.additional) from dfs.`tmp`.`tp` t;
Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema 
changes
{code}
This error can be resolved after fixing the 
[DRILL-3551|https://issues.apache.org/jira/browse/DRILL-3551]

> Counting nested fields on CTAS-created-parquet file/s reports inaccurate 
> results
> 
>
> Key: DRILL-3577
> URL: https://issues.apache.org/jira/browse/DRILL-3577
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.1.0
>Reporter: Hanifi Gunes
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.7.0
>
>
> I have not tried this at a smaller scale nor on a JSON file directly, but the 
> following seems to reproduce the issue:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> This should read
> {code}
> Fragment Number of records written
> 0_0   20200
> {code}
> 3. Count on nested fields via
> {code:sql}
> select count(t.others.additional) from dfs.`tmp`.`tp` t
> OR
> select count(t.others.other) from dfs.`tmp`.`tp` t
> {code}
> reports no rows as follows
> {code}
> EXPR$0
> 0
> {code}
> While
> {code:sql}
> select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not 
> null
> {code}
> reports expected 200 rows
> {code}
> EXPR$0
> 200
> {code}
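As a sanity check of the expected semantics, the counts for the idealized 20,200-row dataset above can be reproduced in plain Python (records simplified to the fields that matter; this models what count() should return, not what Drill currently does):

```python
# 20K rows without the nested "additional" field, 200 rows with it.
rows = 20000 * [{"some": "yes", "others": {"other": "true"}}] \
     + 200 * [{"some": "yes", "others": {"other": "true", "additional": "last entries only"}}]

# count(t.others.additional): only rows where the nested field is non-null.
count_additional = sum(1 for r in rows if r["others"].get("additional") is not None)
# count(t.others.other): present in every row.
count_other = sum(1 for r in rows if r["others"].get("other") is not None)
```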





[jira] [Commented] (DRILL-3577) Counting nested fields on CTAS-created-parquet file/s reports inaccurate results

2016-04-18 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246034#comment-15246034
 ] 

Vitalii Diravka commented on DRILL-3577:


1. Partially fixed in 
[-DRILL-3551-|https://issues.apache.org/jira/browse/DRILL-3551]
{code}
0: jdbc:drill:zk=local> select count(t.others.other) from dfs.`tmp`.`tp` t;
+-+
| EXPR$0  |
+-+
| 20203   |
+-+
1 row selected (0.165 seconds)
{code}
2. {code}
0: jdbc:drill:zk=local> select count(t.others.additional) from dfs.`tmp`.`tp` t;
Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema 
changes
{code}
This error can be resolved after fixing the 
[DRILL-3551|https://issues.apache.org/jira/browse/DRILL-3551]

> Counting nested fields on CTAS-created-parquet file/s reports inaccurate 
> results
> 
>
> Key: DRILL-3577
> URL: https://issues.apache.org/jira/browse/DRILL-3577
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.1.0
>Reporter: Hanifi Gunes
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.7.0
>
>
> I have not tried this at a smaller scale nor on a JSON file directly, but the
> following seems to reproduce the issue
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> This should read
> {code}
> Fragment Number of records written
> 0_0   20200
> {code}
> 3. Count on nested fields via
> {code:sql}
> select count(t.others.additional) from dfs.`tmp`.`tp` t
> OR
> select count(t.others.other) from dfs.`tmp`.`tp` t
> {code}
> reports no rows as follows
> {code}
> EXPR$0
> 0
> {code}
> While
> {code:sql}
> select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not 
> null
> {code}
> reports expected 200 rows
> {code}
> EXPR$0
> 200
> {code}





[jira] [Commented] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories

2016-04-18 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245991#comment-15245991
 ] 

Vitalii Diravka commented on DRILL-4614:


Discovered while investigating the issue in 
[-DRILL-3577-|https://issues.apache.org/jira/browse/DRILL-3577]

> Drill must appoint one data type per one column for self-describing data 
> while querying directories 
> 
>
> Key: DRILL-4614
> URL: https://issues.apache.org/jira/browse/DRILL-4614
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> While Drill selects data from a directory and detects data types on the fly,
> it is possible for one field to end up with several data types.
> For example:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case the created Parquet table is a folder containing two files.
> 3. Select the data
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select will be a mix of EXPR$0 columns of two different data types.*
> This happens because Drill defines the column data type per file.
> The same happens with JSON files.
> Since the streaming aggregate does not support schema changes, this issue
> makes it impossible to use aggregate functions on the query results.
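The per-file inference described above can be sketched in a few lines of plain Python (not Drill's reader code; the `INT` default is an assumption standing in for whatever placeholder type an absent or all-null field receives):

```python
# Each file's schema is inferred independently, so a field that is absent or
# all-null in one file gets a placeholder type there, while another file sees
# its real type -- yielding one logical column with two data types.

def infer_file_schema(records, default_type="INT"):
    """Infer a column -> type map for a single file's records."""
    schema = {}
    for rec in records:
        for field, value in rec.items():
            if value is None:
                schema.setdefault(field, default_type)  # placeholder type
            else:
                schema[field] = type(value).__name__    # concrete type wins
    return schema

file1 = [{"some": "yes"}, {"some": "yes", "additional": None}]
file2 = [{"some": "yes", "additional": "last entries only"}]

s1, s2 = infer_file_schema(file1), infer_file_schema(file2)
print(s1["additional"], s2["additional"])  # INT str
```

The `some` column agrees across both files, but `additional` comes out with two different types, which is exactly the mixed-EXPR$0 symptom in the description.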





[jira] [Updated] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories

2016-04-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4614:
---
Description: 
While Drill selects data from a directory and detects data types on the fly,
it is possible for one field to end up with several data types.

For example:

1. Create an input file as follows
20K rows with the following - 
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
200 rows with the following - 
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
entries only"}}

2. CTAS as follows
{code:sql}
CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
{code}

In this case the created Parquet table is a folder containing two files.

3. Select the data
{code}
select t.others.additional from dfs.`tmp`.`tp` t
{code}
*The result of the select will be a mix of EXPR$0 columns of two different data types.*

This happens because Drill defines the column data type per file.
The same happens with JSON files.
Since the streaming aggregate does not support schema changes, this issue
makes it impossible to use aggregate functions on the query results.

  was:
While Drill selects data from a directory and detects data types on the fly,
it is possible for one field to end up with several data types.

For example:

1. Create an input file as follows
20K rows with the following - 
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
200 rows with the following - 
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
entries only"}}

2. CTAS as follows
{code:sql}
CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
{code}

In this case the created Parquet table is a folder containing two files.

3. Select the data
{code}
select t.others.additional from dfs.`tmp`.`tp` t
{code}
The result of the select will be a mix of EXPR$0 columns of two different data types.

This happens because Drill defines the column data type per file.
The same happens with JSON files.
Since the streaming aggregate does not support schema changes, this issue
makes it impossible to use aggregate functions on the query results.


> Drill must appoint one data type per one column for self-describing data 
> while querying directories 
> 
>
> Key: DRILL-4614
> URL: https://issues.apache.org/jira/browse/DRILL-4614
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> While Drill selects data from a directory and detects data types on the fly,
> it is possible for one field to end up with several data types.
> For example:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case the created Parquet table is a folder containing two files.
> 3. Select the data
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select will be a mix of EXPR$0 columns of two different data types.*
> This happens because Drill defines the column data type per file.
> The same happens with JSON files.
> Since the streaming aggregate does not support schema changes, this issue
> makes it impossible to use aggregate functions on the query results.





[jira] [Created] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories

2016-04-18 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4614:
--

 Summary: Drill must appoint one data type per one column for 
self-describing data while querying directories 
 Key: DRILL-4614
 URL: https://issues.apache.org/jira/browse/DRILL-4614
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.6.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: 1.7.0


While Drill selects data from a directory and detects data types on the fly,
it is possible for one field to end up with several data types.

For example:

1. Create an input file as follows
20K rows with the following - 
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
200 rows with the following - 
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
entries only"}}

2. CTAS as follows
{code:sql}
CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
{code}

In this case the created Parquet table is a folder containing two files.

3. Select the data
{code}
select t.others.additional from dfs.`tmp`.`tp` t
{code}
The result of the select will be a mix of EXPR$0 columns of two different data types.

This happens because Drill defines the column data type per file.
The same happens with JSON files.
Since the streaming aggregate does not support schema changes, this issue
makes it impossible to use aggregate functions on the query results.





[jira] [Updated] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories

2016-04-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4614:
---
Attachment: (was: DRILL-3551.json)

> Drill must appoint one data type per one column for self-describing data 
> while querying directories 
> 
>
> Key: DRILL-4614
> URL: https://issues.apache.org/jira/browse/DRILL-4614
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> While Drill selects data from a directory and detects data types on the fly,
> it is possible for one field to end up with several data types.
> For example:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case the created Parquet table is a folder containing two files.
> 3. Select the data
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select will be a mix of EXPR$0 columns of two different data types.*
> This happens because Drill defines the column data type per file.
> The same happens with JSON files.
> Since the streaming aggregate does not support schema changes, this issue
> makes it impossible to use aggregate functions on the query results.





[jira] [Updated] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories

2016-04-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4614:
---
Attachment: DRILL-3551.json

> Drill must appoint one data type per one column for self-describing data 
> while querying directories 
> 
>
> Key: DRILL-4614
> URL: https://issues.apache.org/jira/browse/DRILL-4614
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> While Drill selects data from a directory and detects data types on the fly,
> it is possible for one field to end up with several data types.
> For example:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case the created Parquet table is a folder containing two files.
> 3. Select the data
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select will be a mix of EXPR$0 columns of two different data types.*
> This happens because Drill defines the column data type per file.
> The same happens with JSON files.
> Since the streaming aggregate does not support schema changes, this issue
> makes it impossible to use aggregate functions on the query results.





[jira] [Updated] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories

2016-04-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4614:
---
Attachment: data.json

> Drill must appoint one data type per one column for self-describing data 
> while querying directories 
> 
>
> Key: DRILL-4614
> URL: https://issues.apache.org/jira/browse/DRILL-4614
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
> Attachments: data.json
>
>
> While Drill selects data from a directory and detects data types on the fly,
> it is possible for one field to end up with several data types.
> For example:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case the created Parquet table is a folder containing two files.
> 3. Select the data
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select will be a mix of EXPR$0 columns of two different data types.*
> This happens because Drill defines the column data type per file.
> The same happens with JSON files.
> Since the streaming aggregate does not support schema changes, this issue
> makes it impossible to use aggregate functions on the query results.





[jira] [Reopened] (DRILL-4459) SchemaChangeException while querying hive json table

2016-04-20 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reopened DRILL-4459:


> SchemaChangeException while querying hive json table
> 
>
> Key: DRILL-4459
> URL: https://issues.apache.org/jira/browse/DRILL-4459
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill, Functions - Hive
>Affects Versions: 1.4.0
> Environment: MapR-Drill 1.4.0
> Hive-1.2.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> Getting a SchemaChangeException while querying JSON documents stored in a
> Hive table.
> {noformat}
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> {noformat}
> minimal reproduction
> {noformat}
> create sample JSON documents using the attached script (randomdata.sh)
> hive>create table simplejson(json string);
> hive>load data local inpath '/tmp/simple.json' into table simplejson;
> now query it through Drill.
> Drill Version
> select * from sys.version;
> commit_id: eafe0a245a0d4c0234bfbead10c6b2d7c8ef413d
> commit_message: DRILL-3901: Don't do early expansion of directory in the
> non-metadata-cache case because it already happens during ParquetGroupScan's
> metadata gathering operation.
> commit_time: 07.10.2015 @ 17:12:57 UTC
> build_email: Unknown
> build_time: 07.10.2015 @ 17:36:16 UTC
> 0: jdbc:drill:zk=> select * from hive.`default`.simplejson where 
> GET_JSON_OBJECT(simplejson.json, '$.DocId') = 'DocId2759947' limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 1:1
> [Error Id: 74f054a8-6f1d-4ddd-9064-3939fcc82647 on ip-10-0-0-233:31010] 
> (state=,code=0)
> {noformat}





[jira] [Resolved] (DRILL-4459) SchemaChangeException while querying hive json table

2016-04-20 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-4459.

Resolution: Fixed

> SchemaChangeException while querying hive json table
> 
>
> Key: DRILL-4459
> URL: https://issues.apache.org/jira/browse/DRILL-4459
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill, Functions - Hive
>Affects Versions: 1.4.0
> Environment: MapR-Drill 1.4.0
> Hive-1.2.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> Getting a SchemaChangeException while querying JSON documents stored in a
> Hive table.
> {noformat}
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> {noformat}
> minimal reproduction
> {noformat}
> create sample JSON documents using the attached script (randomdata.sh)
> hive>create table simplejson(json string);
> hive>load data local inpath '/tmp/simple.json' into table simplejson;
> now query it through Drill.
> Drill Version
> select * from sys.version;
> commit_id: eafe0a245a0d4c0234bfbead10c6b2d7c8ef413d
> commit_message: DRILL-3901: Don't do early expansion of directory in the
> non-metadata-cache case because it already happens during ParquetGroupScan's
> metadata gathering operation.
> commit_time: 07.10.2015 @ 17:12:57 UTC
> build_email: Unknown
> build_time: 07.10.2015 @ 17:36:16 UTC
> 0: jdbc:drill:zk=> select * from hive.`default`.simplejson where 
> GET_JSON_OBJECT(simplejson.json, '$.DocId') = 'DocId2759947' limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 1:1
> [Error Id: 74f054a8-6f1d-4ddd-9064-3939fcc82647 on ip-10-0-0-233:31010] 
> (state=,code=0)
> {noformat}





[jira] [Closed] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories

2016-04-20 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka closed DRILL-4614.
--
Resolution: Duplicate

> Drill must appoint one data type per one column for self-describing data 
> while querying directories 
> 
>
> Key: DRILL-4614
> URL: https://issues.apache.org/jira/browse/DRILL-4614
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
> Fix For: 1.7.0
>
> Attachments: data.json
>
>
> While Drill selects data from a directory and detects data types on the fly,
> it is possible for one field to end up with several data types.
> For example:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case the created Parquet table is a folder containing two files.
> 3. Select the data
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select will be a mix of EXPR$0 columns of two different data types.*
> This happens because Drill defines the column data type per file.
> The same happens with JSON files.
> Since the streaming aggregate does not support schema changes, this issue
> makes it impossible to use aggregate functions on the query results.





[jira] [Closed] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories

2016-04-20 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka closed DRILL-4614.
--
Resolution: Fixed

The problem is already mentioned here:
https://issues.apache.org/jira/browse/DRILL-3806

> Drill must appoint one data type per one column for self-describing data 
> while querying directories 
> 
>
> Key: DRILL-4614
> URL: https://issues.apache.org/jira/browse/DRILL-4614
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
> Attachments: data.json
>
>
> While Drill selects data from a directory and detects data types on the fly,
> it is possible for one field to end up with several data types.
> For example:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case the created Parquet table is a folder containing two files.
> 3. Select the data
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select will be a mix of EXPR$0 columns of two different data types.*
> This happens because Drill defines the column data type per file.
> The same happens with JSON files.
> Since the streaming aggregate does not support schema changes, this issue
> makes it impossible to use aggregate functions on the query results.





[jira] [Reopened] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories

2016-04-20 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reopened DRILL-4614:

  Assignee: (was: Vitalii Diravka)

> Drill must appoint one data type per one column for self-describing data 
> while querying directories 
> 
>
> Key: DRILL-4614
> URL: https://issues.apache.org/jira/browse/DRILL-4614
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
> Fix For: 1.7.0
>
> Attachments: data.json
>
>
> While Drill selects data from a directory and detects data types on the fly,
> it is possible for one field to end up with several data types.
> For example:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case the created Parquet table is a folder containing two files.
> 3. Select the data
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select will be a mix of EXPR$0 columns of two different data types.*
> This happens because Drill defines the column data type per file.
> The same happens with JSON files.
> Since the streaming aggregate does not support schema changes, this issue
> makes it impossible to use aggregate functions on the query results.





[jira] [Commented] (DRILL-3806) add metadata for untyped null and simple type promotion

2016-04-20 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249665#comment-15249665
 ] 

Vitalii Diravka commented on DRILL-3806:


I saw the same problem with JSON and Parquet files.
Here is an example that reproduces it:
[-DRILL-4614-|https://issues.apache.org/jira/browse/DRILL-4614]

> add metadata for untyped null and simple type promotion
> ---
>
> Key: DRILL-3806
> URL: https://issues.apache.org/jira/browse/DRILL-3806
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Julien Le Dem
> Fix For: Future
>
>
> Currently when a field has literal null values in JSON the type will be 
> assigned as BIGINT by default for lack of better type.
> ```
> {
>   "a": null
> }
> ```
> if later on a is assigned with a string value the query will fail with a 
> schema change error,
> The idea is to capture the notion of "untyped null" and implement simple type 
> promotion from untyped null to the actual type.
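The untyped-null proposal above can be sketched as a small type-promotion rule (hypothetical Python, not the eventual Drill implementation; `UNTYPED_NULL` is an assumed marker name, not an existing Drill type):

```python
# Sketch of the proposed rule: a column seen only as null starts as
# "untyped null" and is promoted to the first concrete type observed,
# instead of being pinned to BIGINT and later clashing with a string.

UNTYPED_NULL = "UNTYPED_NULL"

def promote(current, observed):
    """Promote the column type; conflicting concrete types still clash."""
    if current == UNTYPED_NULL:
        return observed
    if observed in (UNTYPED_NULL, current):
        return current
    raise ValueError(f"schema change: {current} vs {observed}")

col_type = UNTYPED_NULL
for value in [None, None, "a string value"]:
    observed = UNTYPED_NULL if value is None else type(value).__name__
    col_type = promote(col_type, observed)

print(col_type)  # str
```

Leading nulls no longer pin the column to a numeric default, so the later string value promotes cleanly; a genuine conflict between two concrete types would still raise, which is the behavior a real schema change should keep.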





[jira] [Closed] (DRILL-4459) SchemaChangeException while querying hive json table

2016-04-20 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka closed DRILL-4459.
--
Resolution: Fixed

> SchemaChangeException while querying hive json table
> 
>
> Key: DRILL-4459
> URL: https://issues.apache.org/jira/browse/DRILL-4459
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill, Functions - Hive
>Affects Versions: 1.4.0
> Environment: MapR-Drill 1.4.0
> Hive-1.2.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> Getting a SchemaChangeException while querying JSON documents stored in a
> Hive table.
> {noformat}
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> {noformat}
> minimal reproduction
> {noformat}
> create sample JSON documents using the attached script (randomdata.sh)
> hive>create table simplejson(json string);
> hive>load data local inpath '/tmp/simple.json' into table simplejson;
> now query it through Drill.
> Drill Version
> select * from sys.version;
> commit_id: eafe0a245a0d4c0234bfbead10c6b2d7c8ef413d
> commit_message: DRILL-3901: Don't do early expansion of directory in the
> non-metadata-cache case because it already happens during ParquetGroupScan's
> metadata gathering operation.
> commit_time: 07.10.2015 @ 17:12:57 UTC
> build_email: Unknown
> build_time: 07.10.2015 @ 17:36:16 UTC
> 0: jdbc:drill:zk=> select * from hive.`default`.simplejson where 
> GET_JSON_OBJECT(simplejson.json, '$.DocId') = 'DocId2759947' limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 1:1
> [Error Id: 74f054a8-6f1d-4ddd-9064-3939fcc82647 on ip-10-0-0-233:31010] 
> (state=,code=0)
> {noformat}





[jira] [Comment Edited] (DRILL-3577) Counting nested fields on CTAS-created-parquet file/s reports inaccurate results

2016-04-20 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246034#comment-15246034
 ] 

Vitalii Diravka edited comment on DRILL-3577 at 4/20/16 12:06 PM:
--

1. Fixed in [-DRILL-3551-|https://issues.apache.org/jira/browse/DRILL-3551]
{code}
0: jdbc:drill:zk=local> select count(t.others.other) from dfs.`tmp`.`tp` t;
+---------+
| EXPR$0  |
+---------+
| 20203   |
+---------+
1 row selected (0.165 seconds)
{code}
2. {code}
0: jdbc:drill:zk=local> select count(t.others.additional) from dfs.`tmp`.`tp` t;
Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema 
changes
{code}

This is a known issue, already mentioned in
[DRILL-4505|https://issues.apache.org/jira/browse/DRILL-4505],
[DRILL-3806|https://issues.apache.org/jira/browse/DRILL-3806], and
[DRILL-4538|https://issues.apache.org/jira/browse/DRILL-4538]

It does work now with an explicit cast:
{code}
0: jdbc:drill:zk=local> select count(CAST(t.others.additional as VARCHAR)) from 
dfs.`tmp`.`tp` t;
+---------+
| EXPR$0  |
+---------+
| 201     |
+---------+
1 row selected (0.126 seconds)
{code}
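The CAST workaround above succeeds because coercing the column to one type gives every batch the same schema before the aggregate runs. A rough Python analogy (not Drill code; the batch shapes are invented for illustration):

```python
# Casting the conflicting column to a single type (VARCHAR, modeled as str)
# removes the per-batch schema difference; nulls stay null, so COUNT still
# only counts rows where the field was actually present.

def cast_column_to_str(rows, column):
    """Return rows with `column` cast to str, preserving nulls."""
    return [dict(row, **{column: None if row.get(column) is None
                         else str(row[column])})
            for row in rows]

batch_int_typed = [{"additional": None}, {"additional": None}]   # file 1
batch_str_typed = [{"additional": "last entries only"}]          # file 2

casted = [cast_column_to_str(b, "additional")
          for b in (batch_int_typed, batch_str_typed)]
count = sum(1 for batch in casted for row in batch
            if row["additional"] is not None)
print(count)  # 1
```

After the cast both batches carry a string-typed column, so a streaming count over them sees no schema change and the non-null rows are counted correctly.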


was (Author: vitalii):
1. Partially fixed in 
[-DRILL-3551-|https://issues.apache.org/jira/browse/DRILL-3551]
{code}
0: jdbc:drill:zk=local> select count(t.others.other) from dfs.`tmp`.`tp` t;
+---------+
| EXPR$0  |
+---------+
| 20203   |
+---------+
1 row selected (0.165 seconds)
{code}
2. {code}
0: jdbc:drill:zk=local> select count(t.others.additional) from dfs.`tmp`.`tp` t;
Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema 
changes
{code}
This error can be resolved once
[DRILL-4614|https://issues.apache.org/jira/browse/DRILL-4614] is fixed.

> Counting nested fields on CTAS-created-parquet file/s reports inaccurate 
> results
> 
>
> Key: DRILL-3577
> URL: https://issues.apache.org/jira/browse/DRILL-3577
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.1.0
>Reporter: Hanifi Gunes
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.7.0
>
>
> I have not tried this at a smaller scale nor on a JSON file directly, but the
> following seems to reproduce the issue
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> This should read
> {code}
> Fragment Number of records written
> 0_0   20200
> {code}
> 3. Count on nested fields via
> {code:sql}
> select count(t.others.additional) from dfs.`tmp`.`tp` t
> OR
> select count(t.others.other) from dfs.`tmp`.`tp` t
> {code}
> reports no rows as follows
> {code}
> EXPR$0
> 0
> {code}
> While
> {code:sql}
> select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not 
> null
> {code}
> reports expected 200 rows
> {code}
> EXPR$0
> 200
> {code}





[jira] [Assigned] (DRILL-2385) count on complex objects failed with missing function implementation

2016-04-20 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-2385:
--

Assignee: Vitalii Diravka

> count on complex objects failed with missing function implementation
> 
>
> Key: DRILL-2385
> URL: https://issues.apache.org/jira/browse/DRILL-2385
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 0.8.0
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.7.0
>
>
> #Wed Mar 04 01:23:42 EST 2015
> git.commit.id.abbrev=71b6bfe
> Have a complex type that looks like the following:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from 
> `complex.json` t limit 1;
> +------------------+
> | sia              |
> +------------------+
> | [1,11,101,1001]  |
> +------------------+
> {code}
> A count on the complex type will fail with missing function implementation:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) 
> countsia from `complex.json` t group by t.gbyi;
> Query failed: RemoteRpcException: Failure while running fragment., Schema is 
> currently null.  You must call buildSchema(SelectionVectorMode) before this 
> container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on 
> qa-node119.qa.lab:31010 ]
> [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ]
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> drillbit.log
> {code}
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR 
> o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
> org.apache.drill.exec.exception.SchemaChangeException: Failure while 
> materializing expression.
> Error in expression at index 0.  Error: Missing function implementation: 
> [count(BIGINT-REPEATED)].  Full expression: null.
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing 
> fragment
> java.lang.NullPointerException: Schema is currently null.  You must call 
> buildSchema(SelectionVectorMode) before this container can return a schema.
> at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) 
> ~[guava-14.0.1.jar:na]
> at 
> org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:155)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> 

[jira] [Reopened] (DRILL-3577) Counting nested fields on CTAS-created-parquet file/s reports inaccurate results

2016-04-20 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reopened DRILL-3577:


> Counting nested fields on CTAS-created-parquet file/s reports inaccurate 
> results
> 
>
> Key: DRILL-3577
> URL: https://issues.apache.org/jira/browse/DRILL-3577
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.1.0
>Reporter: Hanifi Gunes
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.7.0
>
>
> I have not tried this at a smaller scale nor on JSON file directly but the 
> following seems to reproduce the issue
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> This should read
> {code}
> Fragment Number of records written
> 0_0   20200
> {code}
> 3. Count on nested fields via
> {code:sql}
> select count(t.others.additional) from dfs.`tmp`.`tp` t
> OR
> select count(t.others.other) from dfs.`tmp`.`tp` t
> {code}
> reports no rows as follows
> {code}
> EXPR$0
> 0
> {code}
> While
> {code:sql}
> select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not 
> null
> {code}
> reports expected 200 rows
> {code}
> EXPR$0
> 200
> {code}





[jira] [Updated] (DRILL-3577) Counting nested fields on CTAS-created-parquet file/s reports inaccurate results

2016-04-20 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-3577:
---
Fix Version/s: (was: 1.7.0)
   1.2.0

> Counting nested fields on CTAS-created-parquet file/s reports inaccurate 
> results
> 
>
> Key: DRILL-3577
> URL: https://issues.apache.org/jira/browse/DRILL-3577
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.1.0
>Reporter: Hanifi Gunes
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.2.0
>
>
> I have not tried this at a smaller scale nor on JSON file directly but the 
> following seems to reproduce the issue
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> This should read
> {code}
> Fragment Number of records written
> 0_0   20200
> {code}
> 3. Count on nested fields via
> {code:sql}
> select count(t.others.additional) from dfs.`tmp`.`tp` t
> OR
> select count(t.others.other) from dfs.`tmp`.`tp` t
> {code}
> reports no rows as follows
> {code}
> EXPR$0
> 0
> {code}
> While
> {code:sql}
> select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not 
> null
> {code}
> reports expected 200 rows
> {code}
> EXPR$0
> 200
> {code}





[jira] [Assigned] (DRILL-4682) Allow full schema identifier in SELECT clause

2016-07-26 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-4682:
--

Assignee: Vitalii Diravka

> Allow full schema identifier in SELECT clause
> -
>
> Key: DRILL-4682
> URL: https://issues.apache.org/jira/browse/DRILL-4682
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Andries Engelbrecht
>Assignee: Vitalii Diravka
>
> Currently Drill requires aliases to identify columns in the SELECT clause 
> when working with multiple tables/workspaces.
> Many BI/Analytical and other tools by default will use the full schema 
> identifier in the select clause when generating SQL statements for execution 
> for generic JDBC or ODBC sources. Not supporting this feature causes issues 
> and slows the adoption of Drill as an execution engine within the larger 
> Analytical SQL community.
> Propose to support 
> SELECT ... FROM 
> ..
> Also see DRILL-3510 for double quote support as per ANSI_QUOTES
> SELECT ""."".""."" FROM 
> ""."".""
> Which is very common generic SQL being generated by most tools when dealing 
> with a generic SQL data source.





[jira] [Assigned] (DRILL-4763) Parquet file with DATE logical type produces wrong results for simple SELECT

2016-07-06 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-4763:
--

Assignee: Vitalii Diravka

> Parquet file with DATE logical type produces wrong results for simple SELECT
> 
>
> Key: DRILL-4763
> URL: https://issues.apache.org/jira/browse/DRILL-4763
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Vitalii Diravka
> Attachments: date.parquet, int_16.parquet
>
>
> Created a simple Parquet file with the following schema:
> message test { required int32 index; required int32 value (DATE); required 
> int32 raw; }
> That is, a file with an int32 storage type and a DATE logical type. Then, 
> created a number of test values:
> 0 (which should be interpreted as 1970-01-01) and
> (int) (System.currentTimeMillis() / (24*60*60*1000) ), which should be 
> interpreted as the number of days between 1970-01-01 and today.
> According to the Parquet spec 
> (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md), 
> Parquet dates are expressed as "the number of days from the Unix epoch, 1 
> January 1970."
> Java timestamps are expressed as "measured in milliseconds, between the 
> current time and midnight, January 1, 1970 UTC."
> There is ambiguity here: Parquet dates are presumably local times not 
> absolute times, so the math above will actually tell us the date in London 
> right now, but that's close enough.
> Generate the local file to date.parquet. Query it with:
> SELECT * from `local`.`root`.`date.parquet`;
> The results are incorrect:
> index value raw
> 1 -11395-10-18T00:00:00.000-07:52:58  0
> Here, we have a value of 0. The displayed date is decidedly not 
> 1970-01-01T00:00:00. We actually have many problems:
> 1. The date is far off.
> 2. The output shows time. But the Parquet DATE format explicitly does NOT 
> include time, so it makes no sense to include it.
> 3. The output attempts to show a time zone, but a time zone of -07:52:58, 
> while close to PST, is not right (there is no time zone that is off by 
> 7:52:58 from UTC).
> 4. The data has no time zone; Parquet DATE is explicitly a local time, so it 
> is impossible to know the relationship between that date and UTC.
> The correct output (in ISO format) would be: 1970-01-01
> The last line should be today's date, but instead is:
> 6 -11348-04-20T00:00:00.000-07:52:58  16986
> Expected:
> 2016-07-04
> Note that all the information to produce the right information is available 
> to Drill:
> 1. The DATE annotation specifies the meaning of the signed 32-bit integer.
> 2. Given the starting point and duration in days, the conversion to Drill's 
> own internal date format is unambiguous.
> 3. The DATE annotation says that the date is local, so Drill should not 
> attempt to convert to UTC. (That is, a Java Date object can't be used, 
> instead a Joda/Java 8 LocalDate is necessary.)
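
The unambiguous conversion described in point 2 is plain epoch-plus-days arithmetic; a minimal sketch using the values from this report:

```python
from datetime import date, timedelta

def parquet_date_to_local(days_since_epoch: int) -> date:
    """Interpret a Parquet DATE value as a local date: the number of
    days from the Unix epoch, with no time or time-zone component."""
    return date(1970, 1, 1) + timedelta(days=days_since_epoch)

print(parquet_date_to_local(0))      # 1970-01-01
print(parquet_date_to_local(16986))  # 2016-07-04, the "today" value above
```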





[jira] [Assigned] (DRILL-4309) Make this option store.hive.optimize_scan_with_native_readers=true default

2016-07-06 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-4309:
--

Assignee: Vitalii Diravka  (was: Arina Ielchiieva)

> Make this option store.hive.optimize_scan_with_native_readers=true default
> --
>
> Key: DRILL-4309
> URL: https://issues.apache.org/jira/browse/DRILL-4309
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Vitalii Diravka
> Fix For: Future
>
>
> This new feature has been around and has been used/tested in many scenarios. 
> We should enable this feature by default.





[jira] [Created] (DRILL-4799) Schema#getTable should return null when the table does not exist.

2016-07-22 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4799:
--

 Summary: Schema#getTable should return null when the table does 
not exist.
 Key: DRILL-4799
 URL: https://issues.apache.org/jira/browse/DRILL-4799
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow, Storage - HBase
Affects Versions: 1.7.0
Reporter: Vitalii Diravka
 Fix For: Future


There is an unwritten rule: _schema#getTable_ should return null if the table 
does not exist (continuation of the conversation in the 
[DRILL-4673|https://issues.apache.org/jira/browse/DRILL-4673]).

1. That should be documented to ensure that all plugins follow this rule.

> 2. Accordingly, in the HBase plugin _HBaseSchemaFactory#getTable_ should return 
> null when the table is not found instead of throwing _TableNotFoundException_.





[jira] [Commented] (DRILL-4673) Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

2016-07-04 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15361023#comment-15361023
 ] 

Vitalii Diravka commented on DRILL-4673:


It was decided to use the "IF EXISTS" statement.
To implement it, the "IF" keyword is added to the reserved words list.

As a result, the "IF" function (loaded from Hive) will stop working. In this 
case users will have two options:
a) surround "if" with backticks (e.g. select `if`(condition, option1, option2) 
from table)
b) replace the "if" function with a CASE statement
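
For option (b) the rewrite is mechanical, since Hive's IF(cond, option1, option2) returns option1 when cond is true and option2 otherwise, exactly like CASE WHEN cond THEN option1 ELSE option2 END. A sketch of that equivalence (helper names are illustrative, not Drill code):

```python
def hive_if(cond, option1, option2):
    """Semantics of Hive's IF(cond, option1, option2)."""
    return option1 if cond else option2

def case_when(cond, option1, option2):
    """Semantics of CASE WHEN cond THEN option1 ELSE option2 END."""
    return option1 if cond else option2

# Both forms of `if`(c1 > 10, 'big', 'small') agree for any input.
for c1 in (5, 15):
    assert hive_if(c1 > 10, "big", "small") == case_when(c1 > 10, "big", "small")
```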

> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command return
> -
>
> Key: DRILL-4673
> URL: https://issues.apache.org/jira/browse/DRILL-4673
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: drill
>
> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command "DROP TABLE" return if table doesn't exist.
> The same for "DROP VIEW IF EXISTS"





[jira] [Commented] (DRILL-4763) Parquet file with DATE logical type produces wrong results for simple SELECT

2016-08-16 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422785#comment-15422785
 ] 

Vitalii Diravka commented on DRILL-4763:


The following logic is used to calculate parquet dates in drill now:
{code}
(Julian_day)*2 = unix_first_day
((4713_BE+1970)*365,26)*2 = 4881176
{code}
According to the drill doc it should be:
{code}
(Julian_day +_1970)*365,26 = unix_first_day
(4713_BE+1970)*365,26 = 2457615
{code}
According to the parquet doc it should be (the right case):
{code}
unix_first_day = 0
{code}


*For example:*
Parquet file created from hive:
{code}
hive> select * from test_parquet;
OK
1970-01-05  17:51   Visakh
Time taken: 0.046 seconds, Fetched: 1 row(s)
{code}
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /tmp/parquetFolder/test_parquet/
dt = 4
tm = 17:51
nm = Visakh
{code}
{code}
Running 
org.apache.drill.exec.store.parquet.columnreaders.TestDateReader#testParquetDate
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
1 row(s):
---------------------------------------------------
| dt                          | tm     | nm       |
---------------------------------------------------
| -11395-10-22T00:00:00.000Z  | 17:51  | Visakh   |
---------------------------------------------------
Total record count: 1
{code}
Parquet file created from drill:
{code}
0: jdbc:drill:zk=local> select * from drill_parquet;
+---------------+-----------------+
| current_date  | unix_first_day  |
+---------------+-----------------+
| 2016-08-15    | 1970-01-01      |
+---------------+-----------------+
1 row selected (0.142 seconds)
{code}

{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /tmp/drill_parquet/
current_date = 4898204
unix_first_day = 4881176
{code}

*With fix:*
{code}
0: jdbc:drill:zk=local> create table drill_parquet_with_fix as SELECT 
current_date, CAST('1970-01-05' as date) as unix_fifth_day, CAST('1970-01-01' 
as date) as unix_first_day FROM (VALUES(1));
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1                          |
+-----------+----------------------------+
1 row selected (0.257 seconds)
0: jdbc:drill:zk=local> select * from drill_parquet_with_fix;
+---------------+-----------------+-----------------+
| current_date  | unix_fifth_day  | unix_first_day  |
+---------------+-----------------+-----------------+
| 2016-08-15    | 1970-01-05      | 1970-01-01      |
+---------------+-----------------+-----------------+
1 row selected (0.174 seconds)
{code}
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /tmp/drill_parquet_with_fix
current_date = 17028
unix_fifth_day = 4
unix_first_day = 0
{code}
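
The constants above line up once you note that the Julian Day Number of 1970-01-01 is 2440588 (a standard value, not taken from this comment): the buggy writer shifts every date by twice that, so unix_first_day comes out as 2*2440588 = 4881176 and 2016-08-15 as 17028 + 4881176 = 4898204. A quick check of the arithmetic:

```python
from datetime import date

JDN_UNIX_EPOCH = 2440588  # Julian Day Number of 1970-01-01

# Buggy writer: days since the epoch plus twice the epoch's Julian day.
def buggy_stored(d: date) -> int:
    return (d - date(1970, 1, 1)).days + 2 * JDN_UNIX_EPOCH

# Fixed writer: plain days since the Unix epoch, per the Parquet spec.
def fixed_stored(d: date) -> int:
    return (d - date(1970, 1, 1)).days

print(buggy_stored(date(1970, 1, 1)))   # 4881176 (broken unix_first_day)
print(fixed_stored(date(1970, 1, 1)))   # 0
print(fixed_stored(date(2016, 8, 15)))  # 17028 (current_date with the fix)
print(buggy_stored(date(2016, 8, 15)))  # 4898204 (current_date before it)
```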

> Parquet file with DATE logical type produces wrong results for simple SELECT
> 
>
> Key: DRILL-4763
> URL: https://issues.apache.org/jira/browse/DRILL-4763
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Vitalii Diravka
> Attachments: date.parquet, int_16.parquet
>
>
> Created a simple Parquet file with the following schema:
> message test { required int32 index; required int32 value (DATE); required 
> int32 raw; }
> That is, a file with an int32 storage type and a DATE logical type. Then, 
> created a number of test values:
> 0 (which should be interpreted as 1970-01-01) and
> (int) (System.currentTimeMillis() / (24*60*60*1000) ) Which should be 
> interpreted as the number of days since 1970-01-01 and today.
> According to the Parquet spec 
> (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md), 
> Parquet dates are expressed as "the number of days from the Unix epoch, 1 
> January 1970."
> Java timestamps are expressed as "measured in milliseconds, between the 
> current time and midnight, January 1, 1970 UTC."
> There is ambiguity here: Parquet dates are presumably local times not 
> absolute times, so the math above will actually tell us the date in London 
> right now, but that's 

[jira] [Comment Edited] (DRILL-4763) Parquet file with DATE logical type produces wrong results for simple SELECT

2016-08-16 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422785#comment-15422785
 ] 

Vitalii Diravka edited comment on DRILL-4763 at 8/16/16 2:34 PM:
-

The following logic is used to calculate parquet date in drill now:
{code}
(Julian_day)*2 = unix_first_day
((4713_BE+1970)*365,26)*2 = 4881176
{code}
According to the drill doc the following logic should be used:
{code}
(Julian_day +_1970)*365,26 = unix_first_day
(4713_BE+1970)*365,26 = 2457615
{code}
According to the parquet doc the following logic should be used (the right case):
{code}
unix_first_day = 0
{code}


*For example:*
Parquet file created from hive:
{code}
hive> select * from test_parquet;
OK
1970-01-05  17:51   Visakh
Time taken: 0.046 seconds, Fetched: 1 row(s)
{code}
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /tmp/parquetFolder/test_parquet/
dt = 4
tm = 17:51
nm = Visakh
{code}
{code}
Running 
org.apache.drill.exec.store.parquet.columnreaders.TestDateReader#testParquetDate
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
1 row(s):
---------------------------------------------------
| dt                          | tm     | nm       |
---------------------------------------------------
| -11395-10-22T00:00:00.000Z  | 17:51  | Visakh   |
---------------------------------------------------
Total record count: 1
{code}
Parquet file created from drill:
{code}
0: jdbc:drill:zk=local> select * from drill_parquet;
+---------------+-----------------+
| current_date  | unix_first_day  |
+---------------+-----------------+
| 2016-08-15    | 1970-01-01      |
+---------------+-----------------+
1 row selected (0.142 seconds)
{code}

{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /tmp/drill_parquet/
current_date = 4898204
unix_first_day = 4881176
{code}

*With fix:*
{code}
0: jdbc:drill:zk=local> create table drill_parquet_with_fix as SELECT 
current_date, CAST('1970-01-05' as date) as unix_fifth_day, CAST('1970-01-01' 
as date) as unix_first_day FROM (VALUES(1));
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1                          |
+-----------+----------------------------+
1 row selected (0.257 seconds)
0: jdbc:drill:zk=local> select * from drill_parquet_with_fix;
+---------------+-----------------+-----------------+
| current_date  | unix_fifth_day  | unix_first_day  |
+---------------+-----------------+-----------------+
| 2016-08-15    | 1970-01-05      | 1970-01-01      |
+---------------+-----------------+-----------------+
1 row selected (0.174 seconds)
{code}
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /tmp/drill_parquet_with_fix
current_date = 17028
unix_fifth_day = 4
unix_first_day = 0
{code}


was (Author: vitalii):
Such logic is used to calculate parquet date in drill now:
{code}
(Julian_day)*2 = unix_first_day
((4713_BE+1970)*365,26)*2 = 4881176
{code}
Accordingly drill doc must be:
{code}
(Julian_day +_1970)*365,26 = unix_first_day
(4713_BE+1970)*365,26 = 2457615
{code}
Accordinly parquet doc must be (right case):
{code}
unix_first_day = 0
{code}


*For example:*
Parquet file created from hive:
{code}
hive> select * from test_parquet;
OK
1970-01-05  17:51   Visakh
Time taken: 0.046 seconds, Fetched: 1 row(s)
{code}
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /tmp/parquetFolder/test_parquet/
dt = 4
tm = 17:51
nm = Visakh
{code}
{code}
Running 
org.apache.drill.exec.store.parquet.columnreaders.TestDateReader#testParquetDate
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
1 row(s):
---
| dt  | tm   
| nm   |

[jira] [Updated] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC

2017-02-07 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-5034:
---
Labels:   (was: ready-to-commit)

> Select timestamp from hive generated parquet always return in UTC
> -
>
> Key: DRILL-5034
> URL: https://issues.apache.org/jira/browse/DRILL-5034
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Krystal
>Assignee: Vitalii Diravka
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data against a hive parquet table from drill automatically 
> converts the timestamp data to UTC. 
> {code}
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +----------------------------------------------+
> |                    EXPR$0                    |
> +----------------------------------------------+
> | 2016-11-10 12:33:26.547 America/Los_Angeles  |
> +----------------------------------------------+
> {code}
> data schema:
> {code}
> message hive_schema {
>   optional int32 voter_id;
>   optional binary name (UTF8);
>   optional int32 age;
>   optional binary registration (UTF8);
>   optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
>   optional int32 voterzone;
>   optional int96 create_timestamp;
>   optional int32 create_date (DATE);
> }
> {code}
> Using drill-1.8, the returned timestamps match the table data:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> `/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-23 20:03:58.0  |
> | null                   |
> | 2016-09-09 12:01:18.0  |
> | 2017-03-06 20:35:55.0  |
> | 2017-01-20 22:32:43.0  |
> +------------------------+
> 5 rows selected (1.032 seconds)
> {code}
> If the user timezone is changed to UTC, then the timestamp data is returned in 
> UTC time.
> Using drill-1.9, the returned timestamps get converted to UTC even though the 
> user timezone is PST.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
> {code}
> {code}
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +-------+---------------------------------------------------+
> |  ok   |                      summary                      |
> +-------+---------------------------------------------------+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +-------+---------------------------------------------------+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` 
> limit 5;
> +------------------------+
> |    create_timestamp    |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
> {code}
>  
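
The drill-1.9 values are the drill-1.8 values shifted forward by the local UTC offset (UTC-7 for America/Los_Angeles on those dates). A sketch of the shift with a fixed offset (a simplification; a real conversion would consult the full time-zone rules):

```python
from datetime import datetime, timedelta

PDT_OFFSET = timedelta(hours=-7)  # America/Los_Angeles during DST

local = datetime(2016, 10, 23, 20, 3, 58)  # value drill-1.8 returned
as_utc = local - PDT_OFFSET                # what drill-1.9 shows instead

print(as_utc)  # 2016-10-24 03:03:58
```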



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5002) Using hive's date functions on top of date column gives wrong results for local time-zone

2017-02-08 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858106#comment-15858106
 ] 

Vitalii Diravka commented on DRILL-5002:


I changed the UTC time-zone to a local one (-10:00) and reproduced the issue. So 
the root cause of the problem is the combination of hive's date functions and the 
local time-zone: Hive receives UTC time and converts it to local time, so it is 
necessary to pass UTC time to Hive.

But the issue corresponds to every data source. For example:
{code}
0: jdbc:drill:zk=local> select to_date('1994-01-01','yyyy-mm-dd') from 
(VALUES(1));
+-------------+
|   EXPR$0    |
+-------------+
| 1994-01-01  |
+-------------+
1 row selected (0.096 seconds)
0: jdbc:drill:zk=local> select last_day(to_date('1994-01-01','yyyy-mm-dd')) 
from (VALUES(1));
+-------------+
|   EXPR$0    |
+-------------+
| 1993-12-31  |
+-------------+
{code}
Therefore I changed the name of this ticket.
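
The last_day example fits the usual off-by-one mechanism: a date treated as midnight UTC rolls back to the previous day when rendered in a zone behind UTC. A sketch with the -10:00 offset used to reproduce the issue:

```python
from datetime import datetime, timedelta

TZ_OFFSET = timedelta(hours=-10)  # the local zone used to reproduce the bug

# 1994-01-01 as midnight UTC, then rendered in the -10:00 zone.
midnight_utc = datetime(1994, 1, 1, 0, 0, 0)
local_view = midnight_utc + TZ_OFFSET

print(local_view.date())  # 1993-12-31: the wrong last_day() result above
```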

> Using hive's date functions on top of date column gives wrong results for 
> local time-zone
> -
>
> Key: DRILL-5002
> URL: https://issues.apache.org/jira/browse/DRILL-5002
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive, Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Critical
> Attachments: 0_0_0.parquet
>
>
> git.commit.id.abbrev=190d5d4
> Wrong Result 1 :
> {code}
> select l_shipdate, `month`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1994-02-01' limit 2;
> +-------------+---------+
> | l_shipdate  | EXPR$1  |
> +-------------+---------+
> | 1994-02-01  | 1       |
> | 1994-02-01  | 1       |
> +-------------+---------+
> {code}
> Wrong Result 2 : 
> {code}
> select l_shipdate, `day`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1998-06-02' limit 2;
> +-------------+---------+
> | l_shipdate  | EXPR$1  |
> +-------------+---------+
> | 1998-06-02  | 1       |
> | 1998-06-02  | 1       |
> +-------------+---------+
> {code}
> Correct Result :
> {code}
> select l_shipdate, `month`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1998-06-02' limit 2;
> +-------------+---------+
> | l_shipdate  | EXPR$1  |
> +-------------+---------+
> | 1998-06-02  | 6       |
> | 1998-06-02  | 6       |
> +-------------+---------+
> {code}
> It looks like we are getting wrong results when the 'day' is '01'. I only 
> tried month and day hive functions but wouldn't be surprised if they have 
> similar issues too.





[jira] [Updated] (DRILL-5002) Using hive's date functions on top of date column gives wrong results for local time-zone

2017-02-08 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-5002:
---
Summary: Using hive's date functions on top of date column gives wrong 
results for local time-zone  (was: Using hive's date functions on top of date 
column in parquet gives wrong results)

> Using hive's date functions on top of date column gives wrong results for 
> local time-zone
> -
>
> Key: DRILL-5002
> URL: https://issues.apache.org/jira/browse/DRILL-5002
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive, Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Critical
> Attachments: 0_0_0.parquet
>
>
> git.commit.id.abbrev=190d5d4
> Wrong Result 1 :
> {code}
> select l_shipdate, `month`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1994-02-01' limit 2;
> +-+-+
> | l_shipdate  | EXPR$1  |
> +-+-+
> | 1994-02-01  | 1   |
> | 1994-02-01  | 1   |
> +-+-+
> {code}
> Wrong Result 2 : 
> {code}
> select l_shipdate, `day`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1998-06-02' limit 2;
> +-+-+
> | l_shipdate  | EXPR$1  |
> +-+-+
> | 1998-06-02  | 1   |
> | 1998-06-02  | 1   |
> +-+-+
> {code}
> Correct Result :
> {code}
> select l_shipdate, `month`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1998-06-02' limit 2;
> +-+-+
> | l_shipdate  | EXPR$1  |
> +-+-+
> | 1998-06-02  | 6   |
> | 1998-06-02  | 6   |
> +-+-+
> {code}
> It looks like we are getting wrong results when the 'day' is '01'. I only 
> tried month and day hive functions but wouldn't be surprised if they have 
> similar issues too.





[jira] [Updated] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC

2017-02-08 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-5034:
---
Labels: ready-to-commit  (was: )

> Select timestamp from hive generated parquet always return in UTC
> -
>
> Key: DRILL-5034
> URL: https://issues.apache.org/jira/browse/DRILL-5034
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Krystal
>Assignee: Vitalii Diravka
>  Labels: ready-to-commit
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data against a hive parquet table from drill automatically 
> converts the timestamp data to UTC. 
> {code}
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +--+
> |EXPR$0|
> +--+
> | 2016-11-10 12:33:26.547 America/Los_Angeles  |
> +--+
> {code}
> data schema:
> {code}
> message hive_schema {
>   optional int32 voter_id;
>   optional binary name (UTF8);
>   optional int32 age;
>   optional binary registration (UTF8);
>   optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
>   optional int32 voterzone;
>   optional int96 create_timestamp;
>   optional int32 create_date (DATE);
> }
> {code}
> Using drill-1.8, the returned timestamps match the table data:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> `/user/hive/warehouse/voter_hive_parquet` limit 5;
> ++
> | EXPR$0 |
> ++
> | 2016-10-23 20:03:58.0  |
> | null   |
> | 2016-09-09 12:01:18.0  |
> | 2017-03-06 20:35:55.0  |
> | 2017-01-20 22:32:43.0  |
> ++
> 5 rows selected (1.032 seconds)
> {code}
> If the user timezone is changed to UTC, then the timestamp data is returned in 
> UTC time.
> Using drill-1.9, the returned timestamps get converted to UTC even though the 
> user timezone is PST.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> ++
> | EXPR$0 |
> ++
> | 2016-10-24 03:03:58.0  |
> | null   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> ++
> {code}
> {code}
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +---+---+
> |  ok   |  summary  |
> +---+---+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +---+---+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` 
> limit 5;
> ++
> |create_timestamp|
> ++
> | 2016-10-24 03:03:58.0  |
> | null   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> ++
> {code}
>  
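The {{convert_from(create_timestamp, 'TIMESTAMP_IMPALA')}} calls above decode Hive's INT96 timestamp encoding. As a rough illustration (a sketch of the encoding as written by Hive/Impala, not Drill's actual implementation): an INT96 timestamp is 8 little-endian bytes of nanoseconds-of-day followed by 4 little-endian bytes of the Julian day number.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01

def int96_to_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte Parquet INT96 timestamp (Hive/Impala layout):
    8 little-endian bytes of nanoseconds-of-day, then
    4 little-endian bytes of the Julian day number."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return (datetime(1970, 1, 1, tzinfo=timezone.utc)
            + timedelta(days=days, microseconds=nanos_of_day // 1000))

# The Unix epoch itself: zero nanoseconds on Julian day 2440588
raw = struct.pack("<qi", 0, JULIAN_DAY_OF_UNIX_EPOCH)
print(int96_to_timestamp(raw))  # 1970-01-01 00:00:00+00:00
```

Note the decoded instant is in UTC; rendering it in the user's session timezone (PST above) is a separate step, which is where the 1.9 behavior differs.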



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC

2017-02-06 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-5034:
---
Labels: ready-to-commit  (was: )

> Select timestamp from hive generated parquet always return in UTC
> -
>
> Key: DRILL-5034
> URL: https://issues.apache.org/jira/browse/DRILL-5034
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Krystal
>Assignee: Vitalii Diravka
>  Labels: ready-to-commit
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data against a hive parquet table from drill automatically 
> converts the timestamp data to UTC. 
> {code}
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +--+
> |EXPR$0|
> +--+
> | 2016-11-10 12:33:26.547 America/Los_Angeles  |
> +--+
> {code}
> data schema:
> {code}
> message hive_schema {
>   optional int32 voter_id;
>   optional binary name (UTF8);
>   optional int32 age;
>   optional binary registration (UTF8);
>   optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
>   optional int32 voterzone;
>   optional int96 create_timestamp;
>   optional int32 create_date (DATE);
> }
> {code}
> Using drill-1.8, the returned timestamps match the table data:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> `/user/hive/warehouse/voter_hive_parquet` limit 5;
> ++
> | EXPR$0 |
> ++
> | 2016-10-23 20:03:58.0  |
> | null   |
> | 2016-09-09 12:01:18.0  |
> | 2017-03-06 20:35:55.0  |
> | 2017-01-20 22:32:43.0  |
> ++
> 5 rows selected (1.032 seconds)
> {code}
> If the user timezone is changed to UTC, then the timestamp data is returned in 
> UTC time.
> Using drill-1.9, the returned timestamps got converted to UTC even though the 
> user timezone is in PST.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> ++
> | EXPR$0 |
> ++
> | 2016-10-24 03:03:58.0  |
> | null   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> ++
> {code}
> {code}
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +---+---+
> |  ok   |  summary  |
> +---+---+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +---+---+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` 
> limit 5;
> ++
> |create_timestamp|
> ++
> | 2016-10-24 03:03:58.0  |
> | null   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> ++
> {code}
>  





[jira] [Updated] (DRILL-3510) Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers

2017-02-06 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-3510:
---
Fix Version/s: (was: Future)
   1.10.0

> Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL 
> identifiers 
> --
>
> Key: DRILL-3510
> URL: https://issues.apache.org/jira/browse/DRILL-3510
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Jinfeng Ni
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.10.0
>
> Attachments: DRILL-3510.patch, DRILL-3510.patch
>
>
> Currently Drill's SQL parser uses backtick as identifier quotes, the same as 
> what MySQL does. However, this is different from ANSI SQL specification, 
> where double quote is used as identifier quotes.  
> MySQL has an option "ANSI_QUOTES", which could be switched on/off by user. 
> Drill should follow the same way, so that Drill users do not have to rewrite 
> their existing queries, if their queries use double quotes. 
> {code}
> SET sql_mode='ANSI_QUOTES';
> {code}
>





[jira] [Commented] (DRILL-3510) Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers

2017-02-06 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854403#comment-15854403
 ] 

Vitalii Diravka commented on DRILL-3510:


Instead of a boolean ANSI_QUOTES option, I implemented an EnumeratedString 
QUOTING_IDENTIFIERS_CHARACTER option, so Drill will support three quoting 
identifier characters: BACK_TICK( ` ), DOUBLE_QUOTE( " ) and BRACKET( [ ). This 
matches the Calcite parser, which currently supports these three characters: 
[org.apache.calcite.avatica.util.Quoting.java|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/avatica/core/src/main/java/org/apache/calcite/avatica/util/Quoting.java#L20].

To let a JDBC client obtain the current server session options as metadata, a new 
RPC request is implemented.
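To illustrate the three quoting modes (the mode names follow Calcite's Quoting enum; the unquote() helper below is hypothetical, for illustration only — it is not part of Drill or Calcite):

```python
# Quoting characters per Calcite's org.apache.calcite.avatica.util.Quoting enum.
QUOTE_OPEN = {"BACK_TICK": "`", "DOUBLE_QUOTE": '"', "BRACKET": "["}
QUOTE_CLOSE = {"`": "`", '"': '"', "[": "]"}

def unquote(identifier: str, mode: str) -> str:
    """Strip the session's quoting characters from an identifier, if present."""
    q = QUOTE_OPEN[mode]
    if identifier.startswith(q) and identifier.endswith(QUOTE_CLOSE[q]):
        return identifier[1:-1]
    return identifier

print(unquote("`employee`", "BACK_TICK"))     # employee
print(unquote('"employee"', "DOUBLE_QUOTE"))  # employee
print(unquote("[employee]", "BRACKET"))       # employee
```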

> Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL 
> identifiers 
> --
>
> Key: DRILL-3510
> URL: https://issues.apache.org/jira/browse/DRILL-3510
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Jinfeng Ni
>Assignee: Vitalii Diravka
> Fix For: Future
>
> Attachments: DRILL-3510.patch, DRILL-3510.patch
>
>
> Currently Drill's SQL parser uses backtick as identifier quotes, the same as 
> what MySQL does. However, this is different from ANSI SQL specification, 
> where double quote is used as identifier quotes.  
> MySQL has an option "ANSI_QUOTES", which could be switched on/off by user. 
> Drill should follow the same way, so that Drill users do not have to rewrite 
> their existing queries, if their queries use double quotes. 
> {code}
> SET sql_mode='ANSI_QUOTES';
> {code}
>





[jira] [Updated] (DRILL-3510) Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers

2017-02-06 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-3510:
---
Labels: doc-impacting  (was: )

> Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL 
> identifiers 
> --
>
> Key: DRILL-3510
> URL: https://issues.apache.org/jira/browse/DRILL-3510
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Jinfeng Ni
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: Future
>
> Attachments: DRILL-3510.patch, DRILL-3510.patch
>
>
> Currently Drill's SQL parser uses backtick as identifier quotes, the same as 
> what MySQL does. However, this is different from ANSI SQL specification, 
> where double quote is used as identifier quotes.  
> MySQL has an option "ANSI_QUOTES", which could be switched on/off by user. 
> Drill should follow the same way, so that Drill users do not have to rewrite 
> their existing queries, if their queries use double quotes. 
> {code}
> SET sql_mode='ANSI_QUOTES';
> {code}
>





[jira] [Updated] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC

2017-02-13 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-5034:
---
Labels:   (was: ready-to-commit)

> Select timestamp from hive generated parquet always return in UTC
> -
>
> Key: DRILL-5034
> URL: https://issues.apache.org/jira/browse/DRILL-5034
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Krystal
>Assignee: Vitalii Diravka
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data against a hive parquet table from drill automatically 
> converts the timestamp data to UTC. 
> {code}
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +--+
> |EXPR$0|
> +--+
> | 2016-11-10 12:33:26.547 America/Los_Angeles  |
> +--+
> {code}
> data schema:
> {code}
> message hive_schema {
>   optional int32 voter_id;
>   optional binary name (UTF8);
>   optional int32 age;
>   optional binary registration (UTF8);
>   optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
>   optional int32 voterzone;
>   optional int96 create_timestamp;
>   optional int32 create_date (DATE);
> }
> {code}
> Using drill-1.8, the returned timestamps match the table data:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> `/user/hive/warehouse/voter_hive_parquet` limit 5;
> ++
> | EXPR$0 |
> ++
> | 2016-10-23 20:03:58.0  |
> | null   |
> | 2016-09-09 12:01:18.0  |
> | 2017-03-06 20:35:55.0  |
> | 2017-01-20 22:32:43.0  |
> ++
> 5 rows selected (1.032 seconds)
> {code}
> If the user timezone is changed to UTC, then the timestamp data is returned in 
> UTC time.
> Using drill-1.9, the returned timestamps got converted to UTC even though the 
> user timezone is in PST.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> ++
> | EXPR$0 |
> ++
> | 2016-10-24 03:03:58.0  |
> | null   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> ++
> {code}
> {code}
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +---+---+
> |  ok   |  summary  |
> +---+---+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +---+---+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` 
> limit 5;
> ++
> |create_timestamp|
> ++
> | 2016-10-24 03:03:58.0  |
> | null   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> ++
> {code}
>  





[jira] [Assigned] (DRILL-4203) Parquet File : Date is stored wrongly

2016-08-19 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-4203:
--

Assignee: Vitalii Diravka

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-4763) Parquet file with DATE logical type produces wrong results for simple SELECT

2016-08-16 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422785#comment-15422785
 ] 

Vitalii Diravka edited comment on DRILL-4763 at 8/16/16 3:54 PM:
-

The following logic is currently used for parquet dates in drill (for example, to 
calculate the first day of the unix epoch (1 January 1970)):
{code}
(Julian_day)*2 = unix_first_day
((4713_BE+1970)*365.25)*2 ≈ 4881176
{code}
According to the drill and parquet documentation, the following logic should be 
used instead:
{code}
unix_first_day = 0
{code}

*For example:*
Parquet file created from hive:
{code}
hive> select * from test_parquet;
OK
1970-01-05  17:51   Visakh
Time taken: 0.046 seconds, Fetched: 1 row(s)
{code}
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /tmp/parquetFolder/test_parquet/
dt = 4
tm = 17:51
nm = Visakh
{code}
{code}
Running 
org.apache.drill.exec.store.parquet.columnreaders.TestDateReader#testParquetDate
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
1 row(s):
---
| dt  | tm   
| nm   |
---
| -11395-10-22T00:00:00.000Z  | 17:51   
| Visakh  |
---
Total record count: 1
{code}
Parquet file created from drill:
{code}
0: jdbc:drill:zk=local> select * from drill_parquet;
+---+-+
| current_date  | unix_first_day  |
+---+-+
| 2016-08-15| 1970-01-01  |
+---+-+
1 row selected (0.142 seconds)
{code}

{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /tmp/drill_parquet/
current_date = 4898204
unix_first_day = 4881176
{code}

*With fix:*
{code}
0: jdbc:drill:zk=local> create table drill_parquet_with_fix as SELECT 
current_date, CAST('1970-01-05' as date) as unix_fifth_day, CAST('1970-01-01' 
as date) as unix_first_day FROM (VALUES(1));
+---++
| Fragment  | Number of records written  |
+---++
| 0_0   | 1  |
+---++
1 row selected (0.257 seconds)
0: jdbc:drill:zk=local> select * from drill_parquet_with_fix;
+---+-+-+
| current_date  | unix_fifth_day  | unix_first_day  |
+---+-+-+
| 2016-08-15| 1970-01-05  | 1970-01-01  |
+---+-+-+
1 row selected (0.174 seconds)
{code}
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /tmp/drill_parquet_with_fix
current_date = 17028
unix_fifth_day = 4
unix_first_day = 0
{code}
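The corrupted and corrected values above differ by a constant: 4881176 is twice the Julian day number of the Unix epoch (2 × 2440588). A minimal sketch of the arithmetic — the constant is derived from the values shown in this comment, and the actual DRILL-4203 fix may be structured differently:

```python
CORRUPT_DATE_SHIFT = 2 * 2440588  # 4881176: twice the Julian day number of 1970-01-01

def correct_date(stored_days: int) -> int:
    """Map a date written by the buggy writer back to days since the Unix epoch."""
    return stored_days - CORRUPT_DATE_SHIFT

print(correct_date(4881176))  # 0     -> 1970-01-01 (unix_first_day above)
print(correct_date(4898204))  # 17028 -> 2016-08-15 (current_date above)
```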


was (Author: vitalii):
The following logic is used to calculate parquet date in drill now:
{code}
(Julian_day)*2 = unix_first_day
((4713_BE+1970)*365,26)*2 = 4881176
{code}
According to drill doc should use the following logic:
{code}
(Julian_day +_1970)*365,26 = unix_first_day
(4713_BE+1970)*365,26 = 2457615
{code}
According to parquet doc should use the following logic which is the right case:
{code}
unix_first_day = 0
{code}


*For example:*
Parquet file created from hive:
{code}
hive> select * from test_parquet;
OK
1970-01-05  17:51   Visakh
Time taken: 0.046 seconds, Fetched: 1 row(s)
{code}
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat /tmp/parquetFolder/test_parquet/
dt = 4
tm = 17:51
nm = Visakh
{code}
{code}
Running 
org.apache.drill.exec.store.parquet.columnreaders.TestDateReader#testParquetDate
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
1 row(s):
---
| dt  | tm   
| nm   |

[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-08-22 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430540#comment-15430540
 ] 

Vitalii Diravka commented on DRILL-4203:


[~jaltekruse], I picked up your pull request 
[PR-341|https://github.com/apache/drill/pull/341#issuecomment-175917437]. 
This is my branch with your commits, rebased onto the current master: 
https://github.com/vdiravka/drill/commits/DRILL-4203. 

I'm going to open a new PR; please take a look at the last three commits in this 
branch. Are these all the changes you want to add to master in the context of 
DRILL-4203? There is also one commit of mine, please take a look at that too.


> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}





[jira] [Created] (DRILL-4874) "No UserGroupInformation while generating ORC splits" - hive known issue in 1.2.0-mapr-1607 release.

2016-09-02 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4874:
--

 Summary: "No UserGroupInformation while generating ORC splits" - 
hive known issue in 1.2.0-mapr-1607 release.
 Key: DRILL-4874
 URL: https://issues.apache.org/jira/browse/DRILL-4874
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.7.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: 1.8.0


Need to upgrade Drill to hive.version 1.2.0-mapr-1608, where Hive issue 
[HIVE-13120|https://issues.apache.org/jira/browse/HIVE-13120] is fixed.






[jira] [Updated] (DRILL-4874) "No UserGroupInformation while generating ORC splits" - hive known issue in 1.2.0-mapr-1607 release.

2016-09-02 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4874:
---
Description: 
Need to upgrade Drill to hive.version 1.2.0-mapr-1608, where [Hive issue 
HIVE-13120|https://issues.apache.org/jira/browse/HIVE-13120] is fixed.


  was:
Need upgrade drill to 1.2.0-mapr-1608 hive.version where hvie issue 
[HIVE-13120|https://issues.apache.org/jira/browse/HIVE-13120] is fixed.



> "No UserGroupInformation while generating ORC splits" - hive known issue in 
> 1.2.0-mapr-1607 release.
> 
>
> Key: DRILL-4874
> URL: https://issues.apache.org/jira/browse/DRILL-4874
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.7.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.8.0
>
>
> Need to upgrade Drill to hive.version 1.2.0-mapr-1608, where [Hive issue 
> HIVE-13120|https://issues.apache.org/jira/browse/HIVE-13120] is fixed.





[jira] [Updated] (DRILL-4874) "No UserGroupInformation while generating ORC splits" - hive known issue in 1.2.0-mapr-1607 release.

2016-09-05 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4874:
---
Fix Version/s: (was: 1.8.0)
   1.9.0

> "No UserGroupInformation while generating ORC splits" - hive known issue in 
> 1.2.0-mapr-1607 release.
> 
>
> Key: DRILL-4874
> URL: https://issues.apache.org/jira/browse/DRILL-4874
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.7.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.9.0
>
>
> Need to upgrade Drill to hive.version 1.2.0-mapr-1608, where [Hive issue 
> HIVE-13120|https://issues.apache.org/jira/browse/HIVE-13120] is fixed.





[jira] [Assigned] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-08-31 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-4373:
--

Assignee: Vitalii Diravka

> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin





[jira] [Commented] (DRILL-3290) Hive Storage : Add support for Hive complex types

2016-08-31 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451933#comment-15451933
 ] 

Vitalii Diravka commented on DRILL-3290:


Use case to reproduce it:
{code}CREATE TABLE `complex`(
   `col1` ARRAY, 
   `col2` MAP, 
   `col3` STRUCT,
   `col4` UNIONTYPE)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
  COLLECTION ITEMS TERMINATED BY '&' 
  MAP KEYS TERMINATED BY '#'
  LINES TERMINATED BY '\n' STORED AS TEXTFILE;
{code}
Create a csv file:
{code}arr1,101#map1&102#map2,11_1,0&3.141459
arr3,103#map3&104#map4,12_2,1
arr3,103#map3&104#map4,12_2,2 to go home{code}

Load this data into the hive table:
{code}LOAD DATA LOCAL INPATH '/home/dev/complex_data.csv' into table 
complex;{code}
Note: Assuming file is located at /home/dev/complex_data.csv


Result from hive:
{code}hive> select * from complex;
OK
["arr1","arr2"] {101:"map1",102:"map2"} {"c1":11,"c2":"varchar_1"}  {0:3}
["arr3","arr4"] {103:"map3",104:"map4"} {"c1":12,"c2":"varchar_2"}  {1:true}
["arr3","arr4"] {103:"map3",104:"map4"} {"c1":12,"c2":"varchar_2"}  
{2:"Time to go home"}
Time taken: 0.065 seconds, Fetched: 3 row(s){code}
Result from drill:
{code}0: jdbc:drill:> use hive;
+---+---+
|  ok   |  summary  |
+---+---+
| true  | Default schema changed to [hive]  |
+---+---+
1 row selected (0.252 seconds)
0: jdbc:drill:> select * from complex;
Error: SYSTEM ERROR: IndexOutOfBoundsException: index (4) must be less than 
size (4)


[Error Id: c6701738-0bd9-4702-a936-73f8a60f34fe on node1:31010] (state=,code=0)
0: jdbc:drill:> select col1 from complex;
Error: UNSUPPORTED_OPERATION ERROR: Unsupported Hive data type LIST. 
Following Hive data types are supported in Drill for querying: BOOLEAN, 
TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DATE, TIMESTAMP, BINARY, 
DECIMAL, STRING, and VARCHAR

Fragment 0:0

[Error Id: 824c9d6d-be1c-4707-8ff1-e8db03141f07 on node1:31010] (state=,code=0)
0: jdbc:drill:> select col2 from complex;
Error: UNSUPPORTED_OPERATION ERROR: Unsupported Hive data type MAP. 
Following Hive data types are supported in Drill for querying: BOOLEAN, 
TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DATE, TIMESTAMP, BINARY, 
DECIMAL, STRING, and VARCHAR

Fragment 0:0

[Error Id: 77f146fb-6102-43e0-bb39-3262e50d6357 on node1:31010] (state=,code=0)
0: jdbc:drill:> select col3 from complex;
Error: SYSTEM ERROR: IndexOutOfBoundsException: index (1) must be less than 
size (1)


[Error Id: e8a46ec1-b8d2-4fea-9eba-0cab20bd766e on node1:31010] (state=,code=0)
0: jdbc:drill:> select col4 from complex;
Error: UNSUPPORTED_OPERATION ERROR: Unsupported Hive data type UNION. 
Following Hive data types are supported in Drill for querying: BOOLEAN, 
TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DATE, TIMESTAMP, BINARY, 
DECIMAL, STRING, and VARCHAR

Fragment 0:0

[Error Id: 44d50ae2-21fc-44a0-924a-cc2a0a7c6cc1 on node1:31010] 
(state=,code=0){code}

> Hive Storage : Add support for Hive complex types
> -
>
> Key: DRILL-3290
> URL: https://issues.apache.org/jira/browse/DRILL-3290
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Hive, Storage - Hive
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
> Fix For: Future
>
>
> Improve the hive storage plugin to add support for complex types in hive. 
> Below are the complex types hive supports
> {code}
> ARRAY
>  MAP
> STRUCT
> UNIONTYPE
> {code}





[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-08-31 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451891#comment-15451891
 ] 

Vitalii Diravka commented on DRILL-4373:


[~rkins] As I see it, you get this error because drill and hive use different data 
types for the timestamp logical type: hive uses int96 (for nanosecond accuracy), 
while drill uses int64 (a data type for timestamps with the appropriate meta 
annotation per the [parquet 
documentation|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md],
 used for microsecond or millisecond accuracy). Therefore drill stores 
timestamps correctly, and hive must be able to read such parquet files: 
https://issues.apache.org/jira/browse/HIVE-13435.

Another issue is that Drill can read hive timestamps from parquet files only by 
using the CONVERT_FROM function, since by default drill reads INT96 as VARBINARY.
In the context of this jira I'm going to give drill the ability to interpret a 
hive timestamp in parquet files as a timestamp implicitly by default, while 
controlling it with a session/system option (for the case when a different 
datatype is stored as INT96 in the parquet file).


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - Parquet
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin





[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-09-30 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535895#comment-15535895
 ] 

Vitalii Diravka commented on DRILL-4373:


So I added an int96-to-timestamp converter for both parquet readers, controlled 
by the system/session option "store.parquet.int96_as_timestamp". 
The option is false by default so that old query scripts using the "convert_from 
TIMESTAMP_IMPALA" function keep working properly. 

When the option is true, using that function is unnecessary and can lead to 
query failure. 


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin





[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-05 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548372#comment-15548372
 ] 

Vitalii Diravka commented on DRILL-4203:


[~rkins] Even if the parquet metadata [contains "creator: 
parquet-mr"|https://github.com/vdiravka/drill/blob/501e7a2fc033a4b99c1cc75adcaa4d835a9acb2e/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java#L192]
 the file could still have been generated by drill. So to determine this we 
[check | 
https://github.com/vdiravka/drill/blob/501e7a2fc033a4b99c1cc75adcaa4d835a9acb2e/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java#L239]
 the column metadata min/max values.

Parquet files store date values as INT32, the number of days from the Unix epoch 
[https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
so it is possible to store dates more than 10,000 years away. [DRILL-4763 
|https://issues.apache.org/jira/browse/DRILL-4763] shows an example of how 
drill now correctly reads such parquet files generated by another tool 
(-11395-10-18T00:00:00.000-07:52:58).
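A sketch of the min/max heuristic described above. The plausibility cutoff (roughly the year 5000) and both helper names are hypothetical, chosen for illustration; the real check lives in ParquetReaderUtility and may use different bounds.

```python
PLAUSIBLE_MAX_DAYS = (5000 - 1970) * 365   # hypothetical cutoff, ignores leap days
CORRUPT_DATE_SHIFT = 2 * 2440588           # shift introduced by the buggy writer

def dates_look_corrupt(min_days: int) -> bool:
    """Heuristic: if even the column minimum lies far beyond any plausible date,
    assume the file was written by the buggy Drill date writer."""
    return min_days > PLAUSIBLE_MAX_DAYS

def repair(stored_days: int) -> int:
    """Undo the writer's shift, recovering days since the Unix epoch."""
    return stored_days - CORRUPT_DATE_SHIFT

# A DATE column whose statistics show min = 4881176 is flagged and repaired.
print(dates_look_corrupt(4881176))  # True
print(repair(4881176))              # 0 -> 1970-01-01
```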

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> +--------+-------------+
> |  name  | epoch_date  |
> +--------+-------------+
> | Epoch  | 1970-01-01  |
> +--------+-------------+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +-----------+-----------------------------+
> | Fragment  | Number of records written   |
> +-----------+-----------------------------+
> | 0_0       | 1                           |
> +-----------+-----------------------------+
> {code}
> When I read the file with parquet tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:  file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551878#comment-15551878
 ] 

Vitalii Diravka commented on DRILL-4203:


It was a special case: an exception while parsing the "created by" metadata, which 
was expected only for drill-generated files, but this file contains correct dates. 
So I changed the logic accordingly: column metadata is now checked for this case. 
Also added a unit test with this parquet file. The changes are in the last commit.
Thanks [~rkins].

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when i try to read parquet files produce by drill with  
> Spark,  all dates are corrupted.
> I think the problem come from drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet tools, i found  
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equals to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-05 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549577#comment-15549577
 ] 

Vitalii Diravka edited comment on DRILL-4203 at 10/5/16 6:27 PM:
-

[~rkins] It is right that drill auto-corrects it. And yes, you are right: you are 
using the wrong option. The behaviour of both readers is the same. If you want 
to disable "auto correction" you should use the parquet config in the plugin 
settings. Something like this:
{code}
  "formats": {
    "parquet": {
      "type": "parquet",
      "autoCorrectCorruptDates": false
    }
  }
{code}
Or you can try to use the next query: {code}select l_shipdate, l_commitdate 
from 
table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem`
 (type => 'parquet', autoCorrectCorruptDates => false)) limit 1;{code}

And it would be good to further investigate the possibility of storing dates 
beyond the year 10,000 from drill, because from the drill shell I can't get such values: {code}0: 
jdbc:drill:zk=local> select TO_DATE(26278490460) from (VALUES(1));
+-------------+
|   EXPR$0    |
+-------------+
| 297-04-27   |
+-------------+
{code}
But from drill unit test I can do it:
{code}
  @Test
  public void myTest() throws Exception {
    String query = "select TO_DATE(26278490460) from (VALUES(1))";
    setColumnWidths(new int[] {35});
    List<QueryDataBatch> sqlWithResults = testSqlWithResults(query);
    printResult(sqlWithResults);
  }
1 row(s):
------------------------------
| EXPR$0                     |
------------------------------
| 10297-04-27T22:50:00.000Z  |
------------------------------
{code}


was (Author: vitalii):
[~rkins] It is right that drill auto correct it. And yes, you are. You are 
using not that option. The behaviour of both readers is the same. If you want 
to disable "auto correction" you should use the parquet config in the plugin 
settings. Something like this: {code}  "formats": {
"parquet": {
  "type": "parquet",
  "autoCorrectCorruptDates": false
}{code}
Or you can try to use the next query: {code}select l_shipdate, l_commitdate 
from 
table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem`
 (type => 'parquet', autoCorrectCorruptDates => false)) limit 1;{code}

And it would be good more investigate the possibility to store from drill dates 
over  years, cause from drill shell I can't got such values: {code}0: 
jdbc:drill:zk=local> select TO_DATE(26278490460) from (VALUES(1));
+-+
|   EXPR$0|
+-+
| 297-04-27  |
+-+
{code}
But from drill unit test I can do it:
{code}  @Test
  public void myTest() throws Exception {
String query = "select TO_DATE(26278490460) from (VALUES(1))";
setColumnWidths(new int[] {35});
List sqlWithResults = testSqlWithResults(query);
printResult(sqlWithResults);
  }
1 row(s):
--
| EXPR$0 |
--
| 10297-04-27T22:50:00.000Z  |
--
{code}

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when i try to read parquet files produce by drill with  
> Spark,  all dates are corrupted.
> I think the problem come from drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet tools, i found  
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equals to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 


[jira] [Commented] (DRILL-4337) Drill fails to read INT96 fields from hive generated parquet files

2016-08-26 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440697#comment-15440697
 ] 

Vitalii Diravka commented on DRILL-4337:


[~rkins] Do you have the same errors while using CONVERT_FROM(timestamp_col, 
'TIMESTAMP_IMPALA')?
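For background, the INT96 timestamps Hive/Impala write are 12 bytes: eight little-endian bytes holding the nanoseconds within the day, followed by a four-byte Julian day number; that is the layout CONVERT_FROM with 'TIMESTAMP_IMPALA' decodes. A minimal sketch of the decoding (for illustration, not Drill's implementation):

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01

def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp: 8 LE bytes of nanos-of-day
    followed by 4 LE bytes holding the Julian day number."""
    nanos, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_UNIX_EPOCH
    return (datetime(1970, 1, 1, tzinfo=timezone.utc)
            + timedelta(days=days, microseconds=nanos // 1000))

# Midnight on the Unix epoch day round-trips to 1970-01-01 00:00 UTC:
raw = struct.pack("<qi", 0, JULIAN_UNIX_EPOCH)
print(decode_int96_timestamp(raw))  # 1970-01-01 00:00:00+00:00
```

Reading such a column as plain binary (without the conversion) yields the raw 12-byte values, which is why the unconverted output above shows `[B@...` byte arrays.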

> Drill fails to read INT96 fields from hive generated parquet files
> --
>
> Key: DRILL-4337
> URL: https://issues.apache.org/jira/browse/DRILL-4337
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Rahul Challapalli
>Priority: Critical
> Attachments: hive1_fewtypes_null.parquet
>
>
> git.commit.id.abbrev=576271d
> Cluster : 2 nodes running MaprFS 4.1
> The data file used in the below table is generated from hive. Below is output 
> from running the same query multiple times. 
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from 
> hive1_fewtypes_null;
> Error: SYSTEM ERROR: NegativeArraySizeException
> Fragment 0:0
> [Error Id: 5517e983-ccae-4c96-b09c-30f331919e56 on qa-node191.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from 
> hive1_fewtypes_null;
> Error: SYSTEM ERROR: IllegalArgumentException: Reading past RLE/BitPacking 
> stream.
> Fragment 0:0
> [Error Id: 94ed5996-d2ac-438d-b460-c2d2e41bdcc3 on qa-node191.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from 
> hive1_fewtypes_null;
> Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 0
> Fragment 0:0
> [Error Id: 41dca093-571e-49e5-a2ab-fd69210b143d on qa-node191.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from 
> hive1_fewtypes_null;
> +----------------+
> | timestamp_col  |
> +----------------+
> | null   |
> | [B@7c766115|
> | [B@3fdfe989|
> | null   |
> | [B@55d4222 |
> | [B@2da0c8ee|
> | [B@16e798a9|
> | [B@3ed78afe|
> | [B@38e649ed|
> | [B@16ff83ca|
> | [B@61254e91|
> | [B@5849436a|
> | [B@31e9116e|
> | [B@3c77665b|
> | [B@42e0ff60|
> | [B@419e19ed|
> | [B@72b83842|
> | [B@1c75afe5|
> | [B@726ef1fb|
> | [B@51d0d06e|
> | [B@64240fb8|
> +----------------+
> {code}
> Attached the log, hive ddl used to generate the parquet file and the parquet 
> file itself



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-3290) Hive Storage : Add support for Hive complex types

2016-08-26 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-3290:
--

Assignee: Vitalii Diravka

> Hive Storage : Add support for Hive complex types
> -
>
> Key: DRILL-3290
> URL: https://issues.apache.org/jira/browse/DRILL-3290
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Hive, Storage - Hive
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
> Fix For: Future
>
>
> Improve the hive storage plugin to add support for complex types in hive. 
> Below are the complex types hive supports
> {code}
> ARRAY
>  MAP
> STRUCT
> UNIONTYPE
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-09-28 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15529575#comment-15529575
 ] 

Vitalii Diravka commented on DRILL-4203:


[~rkins] You can expand your test plan according to the new unit tests 
[TestCorruptParquetDateCorrection|https://github.com/vdiravka/drill/blob/cc4abebc33a02fc712889395d20410902184c142/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java]
 with cases where an old parquet metadata cache file already exists for the 
parquet table.
And one note about: {quote}Ensure that the version number in the parquet header 
is updated immediately once the fix is in place.
This makes sure that we are not auto-correcting date column when there is 
actually no need.{quote} 
Once the fix is in place, the parquet header is updated with a new extra field 
"is.date.correct = true". This property makes sure that we are not 
auto-correcting the date column when there is actually no need.
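The detection order described in this thread can be sketched as a small decision function (a simplification for illustration, not Drill's actual ParquetReaderUtility code; the metadata keys are the ones visible in the file metadata):

```python
from enum import Enum

class DateCorruption(Enum):
    NOT_CORRUPT = "new file, explicitly flagged correct"
    CORRUPT = "pre-fix drill file, dates need the shift"
    POSSIBLY_CORRUPT = "fall back to column min/max statistics"

def corruption_status(extra_meta: dict) -> DateCorruption:
    """Simplified sketch of the detection order described above."""
    # New files carry the explicit flag, so no correction is applied.
    if extra_meta.get("is.date.correct") == "true":
        return DateCorruption.NOT_CORRUPT
    # A drill.version entry without the flag means a pre-fix drill file.
    if "drill.version" in extra_meta:
        return DateCorruption.CORRUPT
    # "creator: parquet-mr" alone is ambiguous (drill builds on parquet-mr),
    # so the reader must inspect column min/max values instead.
    return DateCorruption.POSSIBLY_CORRUPT

print(corruption_status({"drill.version": "1.9.0-SNAPSHOT",
                         "is.date.correct": "true"}))
```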

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
>
> Hello,
> I have some problems when i try to read parquet files produce by drill with  
> Spark,  all dates are corrupted.
> I think the problem come from drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet tools, i found  
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equals to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-4203) Parquet File : Date is stored wrongly

2016-09-28 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15529814#comment-15529814
 ] 

Vitalii Diravka edited comment on DRILL-4203 at 9/28/16 2:29 PM:
-

No, it will not. This property is in the parquet metadata. For example: 
{code}
creator: parquet-mr version 1.8.1-drill-r0 (build 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
extra:   drill.version = 1.9.0-SNAPSHOT 
extra:   is.date.correct = true
{code}
Old drill versions just can't read this meta info, but they can still read the 
new parquet files. However, since the new parquet files contain correct DATE 
values, old drill versions will of course read these DATE values incorrectly.


was (Author: vitalii):
No, it will not. This property in parquet metadata. For example: 
{code}
creator: parquet-mr version 1.8.1-drill-r0 (build 
6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
extra:   drill.version = 1.9.0-SNAPSHOT 
extra:   is.date.correct = true
{code}
Old drill versions just can’t read this meta info but can read new parquet 
files.
Although by reason of new parquet files will consist correct DATE values, old 
drill versions, surely, will read these DATE values incorrectly from new 
parquet files.

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
>
> Hello,
> I have some problems when i try to read parquet files produce by drill with  
> Spark,  all dates are corrupted.
> I think the problem come from drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet tools, i found  
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equals to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Created] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection

2016-10-28 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4980:
--

 Summary: Upgrading of the approach of parquet date correctness 
status detection
 Key: DRILL-4980
 URL: https://issues.apache.org/jira/browse/DRILL-4980
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.8.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: 1.9.0


This jira is an addition to 
[DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
The date-correctness label for newly generated parquet files should be 
upgraded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4996) Parquet Date auto-correction is not working in auto-partitioned parquet files generated by drill-1.6

2016-11-07 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645729#comment-15645729
 ] 

Vitalii Diravka commented on DRILL-4996:


Right. Moreover, drill-1.6.0 (which generated that file) will show incorrect 
date values too, because before the DRILL-4203 fix drill could not read correct 
date values in parquet files at all.
To check whether the date values are correct, you can use parquet-tools:
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar meta 
/home/vitalii/Downloads/1.6/0_0_1.parquet
file: file:/home/vitalii/Downloads/1.6/0_0_1.parquet 
creator:  parquet-mr version 1.8.1-drill-r0 (build 
6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
extra:drill.version = 1.6.0 
file schema:  root 

i_rec_start_date: OPTIONAL INT32 O:DATE R:0 D:1
i_rec_end_date:   OPTIONAL INT32 O:DATE R:0 D:1
...
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar 
parquet-tools-1.6.0rc3-SNAPSHOT.jar cat 
/home/vitalii/Downloads/1.6/0_0_1.parquet

i_rec_start_date = 10161
i_rec_end_date = 10891
.
{code}
Corrupt values are larger than the correct ones by 4881176 days. 
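The raw INT32 values above can be sanity-checked in a few lines (a sketch; the decoded dates follow from the day counts, and SHIFT is the corruption offset discussed in this thread):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
SHIFT = 4881176  # offset separating corrupt values from correct ones

def as_date(epoch_days: int) -> date:
    """Decode a parquet DATE value: INT32 days since the Unix epoch."""
    return EPOCH + timedelta(days=epoch_days)

# The values parquet-tools printed above decode to plausible dates,
# so this file holds correct (non-corrupt) values:
print(as_date(10161))  # 1997-10-27
print(as_date(10891))  # 1999-10-27

# A corrupt writer would instead have stored 10161 + SHIFT = 4891337,
# a date thousands of years away, which the plausibility check catches.
```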

> Parquet Date auto-correction is not working in auto-partitioned parquet files 
> generated by drill-1.6
> 
>
> Key: DRILL-4996
> URL: https://issues.apache.org/jira/browse/DRILL-4996
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: Rahul Challapalli
>Priority: Critical
> Attachments: item.tgz
>
>
> git.commit.id.abbrev=4ee1d4c
> Below are the steps I followed to generate the data :
> {code}
> 1. Generate a parquet file with date column using hive1.2
> 2. Use drill 1.6 to create auto-partitioned parquet files partitioned on the 
> date column
> {code}
> Now the below query returns wrong results :
> {code}
> select i_rec_start_date, i_size from 
> dfs.`/drill/testdata/parquet_date/auto_partition/item_multipart_autorefresh`  
> group by i_rec_start_date, i_size;
> +-------------------+--------------+
> | i_rec_start_date  |    i_size    |
> +-------------------+--------------+
> | null  | large|
> | 366-11-08| extra large  |
> | 366-11-08| medium   |
> | null  | medium   |
> | 366-11-08| petite   |
> | 364-11-07| medium   |
> | null  | petite   |
> | 365-11-07| medium   |
> | 368-11-07| economy  |
> | 365-11-07| large|
> | 365-11-07| small|
> | 366-11-08| small|
> | 365-11-07| extra large  |
> | 364-11-07| N/A  |
> | 366-11-08| economy  |
> | 366-11-08| large|
> | 364-11-07| small|
> | null  | small|
> | 364-11-07| large|
> | 364-11-07| extra large  |
> | 368-11-07| N/A  |
> | 368-11-07| extra large  |
> | 368-11-07| large|
> | 365-11-07| petite   |
> | null  | N/A  |
> | 365-11-07| economy  |
> | 364-11-07| economy  |
> | 364-11-07| petite   |
> | 365-11-07| N/A  |
> | 368-11-07| medium   |
> | null  | extra large  |
> | 368-11-07| small|
> | 368-11-07| petite   |
> | 366-11-08| N/A  |
> +-------------------+--------------+
> 34 rows selected (0.691 seconds)
> {code}
> However I tried generating the auto-partitioned parquet files using Drill 1.2 
> and then the above query returned the right results.
> I attached the required data sets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-5004) Parquet date correction gives null pointer exception if there is no createdBy entry in the metadata

2016-11-04 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-5004:
--

Assignee: Vitalii Diravka

> Parquet date correction gives null pointer exception if there is no createdBy 
> entry in the metadata
> ---
>
> Key: DRILL-5004
> URL: https://issues.apache.org/jira/browse/DRILL-5004
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>Assignee: Vitalii Diravka
> Attachments: DRILL-5004.parquet
>
>
> If the Parquet metadata does not contain a createdBy entry, the date 
> corruption detection code gives a NPE
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

