[jira] [Commented] (DRILL-5048) Fix type mismatch error in case statement with null timestamp

2017-03-27 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944582#comment-15944582
 ] 

Khurram Faraaz commented on DRILL-5048:
---

[~knguyen] I have the tests for this one, since we filed it when we tested 
case expressions with constants. I will add those tests to verify this fix.

> Fix type mismatch error in case statement with null timestamp
> -
>
> Key: DRILL-5048
> URL: https://issues.apache.org/jira/browse/DRILL-5048
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> AssertionError when we use CASE with a timestamp and null:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT res, CASE res WHEN true THEN 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) ELSE null END
> . . . . . . . . . . . . . . > FROM
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > SELECT
> . . . . . . . . . . . . . . > (CASE WHEN (false) THEN null ELSE 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) END) res
> . . . . . . . . . . . . . . > FROM (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> [Error Id: b56e0a4d-2f9e-4afd-8c60-5bc2f9d31f8f on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:1696) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSubset.add(RelSubset.java:295) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSet.add(RelSet.java:147) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1818)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1760)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1017)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1037)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1940)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:138)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> ... 16 common frames omitted
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5048) Fix type mismatch error in case statement with null timestamp

2017-03-27 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-5048:
--
Reviewer: Khurram Faraaz  (was: Krystal)

> Fix type mismatch error in case statement with null timestamp
> -
>
> Key: DRILL-5048
> URL: https://issues.apache.org/jira/browse/DRILL-5048
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> AssertionError when we use CASE with a timestamp and null:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT res, CASE res WHEN true THEN 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) ELSE null END
> . . . . . . . . . . . . . . > FROM
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > SELECT
> . . . . . . . . . . . . . . > (CASE WHEN (false) THEN null ELSE 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) END) res
> . . . . . . . . . . . . . . > FROM (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> [Error Id: b56e0a4d-2f9e-4afd-8c60-5bc2f9d31f8f on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:1696) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSubset.add(RelSubset.java:295) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSet.add(RelSet.java:147) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1818)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1760)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1017)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1037)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1940)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:138)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> ... 16 common frames omitted
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-4956) Temporary tables support

2017-03-27 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-4956.
-

Verified; tests are added here: framework/resources/Functional/cttas

> Temporary tables support
> 
>
> Key: DRILL-4956
> URL: https://issues.apache.org/jira/browse/DRILL-4956
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> Link to design doc - 
> https://docs.google.com/document/d/1gSRo_w6q2WR5fPx7SsQ5IaVmJXJ6xCOJfYGyqpVOC-g/edit
> Gist - 
> https://gist.github.com/arina-ielchiieva/50158175867a18eee964b5ba36455fbf#file-temporarytablessupport-md
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-5293) Poor performance of Hash Table due to same hash value as distribution below

2017-03-27 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua closed DRILL-5293.
---

Marking as verified and closing.

> Poor performance of Hash Table due to same hash value as distribution below
> ---
>
> Key: DRILL-5293
> URL: https://issues.apache.org/jira/browse/DRILL-5293
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.8.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> The computation of the hash value is basically the same whether for the Hash 
> Table (used by Hash Agg, and Hash Join), or for distribution of rows at the 
> exchange. As a result, a specific Hash Table (in a parallel minor fragment) 
> gets only rows "filtered out" by the partition below ("upstream"), so the 
> pattern of this filtering leads to a non-uniform usage of the hash buckets in 
> the table.
>   Here is a simplified example: An exchange partitions into TWO (minor 
> fragments), each running a Hash Agg. So the partition sends rows of EVEN hash 
> values to the first, and rows of ODD hash values to the second. Now the first 
> recomputes the _same_ hash value for its Hash table -- and only the even 
> buckets get used !!  (Or with a partition into EIGHT -- possibly only one 
> eighth of the buckets would be used !! ) 
> This would lead to longer hash chains and thus _poor performance_!
> A possible solution -- add a distribution function distFunc (only for 
> partitioning) that takes the hash value and "scrambles" it so that the 
> entropy in all the bits affects the low bits of the output. This function 
> should be applied (in HashPrelUtil) over the generated code that produces the 
> hash value, like:
>distFunc( hash32(field1, hash32(field2, hash32(field3, 0))) );
> Tested with a huge hash aggregate (64 M rows) and a parallelism of 8 ( 
> planner.width.max_per_node = 8 ); minor fragments 0 and 4 used only 1/8 of 
> their buckets, the others used 1/4 of their buckets.  Maybe the reason for 
> this variance is that distribution is using "hash32AsDouble" and hash agg is 
> using "hash32".  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4847) Window function query results in OOM Exception.

2017-03-27 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944548#comment-15944548
 ] 

Khurram Faraaz commented on DRILL-4847:
---

[~zelaine] Tried with the new external sort; the OOM still exists on Apache Drill 
1.11.0, commit id: adbf363

{noformat}
apache drill 1.11.0-SNAPSHOT
"a drill in the hand is better than two in the bush"
0: jdbc:drill:schema=dfs.tmp> SELECT clientname, audiencekey, spendprofileid, 
postalcd, provincecd, provincename, postalcode_json, country_json, 
province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER (PARTITION BY 
spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END) ASC, 
provincecd ASC) as rn FROM `MD593.parquet` limit 3;
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the 
query.

Failure while allocating buffer.
Fragment 0:0

[Error Id: 6e2ab7f6-ab80-44a7-a053-2d286a2d54f2 on centos-01.qa.lab:31010] 
(state=,code=0)
0: jdbc:drill:schema=dfs.tmp> ALTER SESSION SET `exec.sort.disable_managed` = 
false;
+---+-+
|  ok   |   summary   |
+---+-+
| true  | exec.sort.disable_managed updated.  |
+---+-+
1 row selected (0.149 seconds)
0: jdbc:drill:schema=dfs.tmp> SELECT clientname, audiencekey, spendprofileid, 
postalcd, provincecd, provincename, postalcode_json, country_json, 
province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER (PARTITION BY 
spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END) ASC, 
provincecd ASC) as rn FROM `MD593.parquet` limit 3;
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the 
query.

Failure while allocating buffer.
Fragment 0:0

[Error Id: 05262a48-58c0-44b7-b27f-3dd80c42754c on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

Details from drillbit.log

{noformat}
2017-03-28 04:56:36,062 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
272612fa-b7b4-bb93-3ddc-808d87156869: SELECT clientname, audiencekey, 
spendprofileid, postalcd, provincecd, provincename, postalcode_json, 
country_json, province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER 
(PARTITION BY spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 
0 END) ASC, provincecd ASC) as rn FROM `MD593.parquet` limit 3
2017-03-28 04:56:36,159 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
numFiles: 1
2017-03-28 04:56:36,160 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
numFiles: 1
2017-03-28 04:56:36,160 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
numFiles: 1
2017-03-28 04:56:36,160 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
numFiles: 1
2017-03-28 04:56:36,160 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
numFiles: 1
2017-03-28 04:56:36,160 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
numFiles: 1
2017-03-28 04:56:36,160 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
numFiles: 1
2017-03-28 04:56:36,160 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
numFiles: 1
2017-03-28 04:56:36,177 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
2017-03-28 04:56:36,182 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 
using 1 threads. Time: 5ms total, 5.223963ms avg, 5ms max.
2017-03-28 04:56:36,183 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 
using 1 threads. Earliest start: 1.595000 μs, Latest start: 1.595000 μs, 
Average start: 1.595000 μs .
2017-03-28 04:56:36,183 [272612fa-b7b4-bb93-3ddc-808d87156869:foreman] INFO  
o.a.d.exec.store.parquet.Metadata - Took 5 ms to read file metadata
2017-03-28 04:56:36,383 [272612fa-b7b4-bb93-3ddc-808d87156869:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 272612fa-b7b4-bb93-3ddc-808d87156869:0:0: 
State change requested AWAITING_ALLOCATION --> RUNNING
2017-03-28 04:56:36,384 [272612fa-b7b4-bb93-3ddc-808d87156869:frag:0:0] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 272612fa-b7b4-bb93-3ddc-808d87156869:0:0: 

[jira] [Commented] (DRILL-4938) Report UserException when constant expression reduction fails

2017-03-27 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944481#comment-15944481
 ] 

Khurram Faraaz commented on DRILL-4938:
---

[~zelaine] thanks for the info. The new PLAN ERROR seems better than the 
previous SYSTEM ERROR. Marking this one as closed.

> Report UserException when constant expression reduction fails
> -
>
> Key: DRILL-4938
> URL: https://issues.apache.org/jira/browse/DRILL-4938
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Serhii Harnyk
>Priority: Minor
> Fix For: 1.10.0
>
>
> We need a better error message instead of a DrillRuntimeException.
> Drill 1.9.0 git commit ID: 4edabe7a
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select (res1 = 2016/09/22) res2
> . . . . . . . . . . . . . . > from
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > select (case when (false) then null else 
> cast('2016/09/22' as date) end) res1
> . . . . . . . . . . . . . . > from (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing 
> expression in constant expression evaluator [CASE(false, =(null, /(/(2016, 
> 9), 22)), =(CAST('2016/09/22'):DATE NOT NULL, /(/(2016, 9), 22)))].  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-4938) Report UserException when constant expression reduction fails

2017-03-27 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-4938.
-

> Report UserException when constant expression reduction fails
> -
>
> Key: DRILL-4938
> URL: https://issues.apache.org/jira/browse/DRILL-4938
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Serhii Harnyk
>Priority: Minor
> Fix For: 1.10.0
>
>
> We need a better error message instead of a DrillRuntimeException.
> Drill 1.9.0 git commit ID: 4edabe7a
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select (res1 = 2016/09/22) res2
> . . . . . . . . . . . . . . > from
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > select (case when (false) then null else 
> cast('2016/09/22' as date) end) res1
> . . . . . . . . . . . . . . > from (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing 
> expression in constant expression evaluator [CASE(false, =(null, /(/(2016, 
> 9), 22)), =(CAST('2016/09/22'):DATE NOT NULL, /(/(2016, 9), 22)))].  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5378) Put more information into SchemaChangeException when HashJoin hit SchemaChangeException

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944339#comment-15944339
 ] 

ASF GitHub Bot commented on DRILL-5378:
---

Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/801
  
Agreed with @paul-rogers's comment regarding the overuse of SchemaChangeException 
in the code. I had actually planned to open a JIRA, if one has not been opened yet. 
We currently use SchemaChangeException in many situations, covering both real 
schema changes and operational errors where "something is wrong".


> Put more information into SchemaChangeException when HashJoin hit 
> SchemaChangeException
> ---
>
> Key: DRILL-5378
> URL: https://issues.apache.org/jira/browse/DRILL-5378
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Minor
>
> HashJoin currently does not allow schema change on either the build side or the 
> probe side. When HashJoin hits a SchemaChangeException in the middle of 
> execution, Drill reports a brief error message about the SchemaChangeException, 
> without providing any information about what schemas are in the incoming 
> batches. That makes it hard to analyze the error and understand what's going on. 
> It probably makes sense to put the two differing schemas in the error 
> message, so that the user can get a better idea about the schema change. 
> Until Drill can provide support for schema change in HashJoin, the detailed 
> error message would help the user debug the error. 
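> A minimal sketch of the kind of message construction this implies (the helper 
> and parameter names below are hypothetical, not the actual patch):
> {code}
> // Hypothetical helper: embed both schemas in the exception text so the
> // user can see exactly what changed between incoming batches.
> static String schemaChangeMessage(String operatorName,
>                                   Object priorSchema,
>                                   Object newSchema) {
>   return String.format(
>       "%s does not support schema changes. Prior schema: %s. New schema: %s.",
>       operatorName, priorSchema, newSchema);
> }
> {code}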



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5390) Casting as decimal does not make drill use the decimal value vector

2017-03-27 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5390:


 Summary: Casting as decimal does not make drill use the decimal 
value vector
 Key: DRILL-5390
 URL: https://issues.apache.org/jira/browse/DRILL-5390
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Rahul Challapalli



The below query should be using the decimal value vector; however, it looks like 
it is using the float vector. If we feed the output of the below query to a 
CTAS statement, then the Parquet file created has a double type instead of a 
decimal type.

{code}
alter session set `planner.enable_decimal_data_type` = true;
+---++
|  ok   |  summary   |
+---++
| true  | planner.enable_decimal_data_type updated.  |
+---++
1 row selected (0.39 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select typeof(col2) from (select 1 as 
col1, cast(2.0 as decimal(9,2)) as col2, cast(3.0 as decimal(9,2)) as col3 from 
cp.`tpch/lineitem.parquet` limit 1) d;
+-+
| EXPR$0  |
+-+
| FLOAT8  |
+-+
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5390) Casting as decimal does not make drill use the decimal value vector

2017-03-27 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-5390:
-
Component/s: Execution - Data Types

> Casting as decimal does not make drill use the decimal value vector
> ---
>
> Key: DRILL-5390
> URL: https://issues.apache.org/jira/browse/DRILL-5390
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>
> The below query should be using the decimal value vector; however, it looks 
> like it is using the float vector. If we feed the output of the below query 
> to a CTAS statement, then the Parquet file created has a double type instead 
> of a decimal type.
> {code}
> alter session set `planner.enable_decimal_data_type` = true;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | planner.enable_decimal_data_type updated.  |
> +---++
> 1 row selected (0.39 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select typeof(col2) from (select 1 as 
> col1, cast(2.0 as decimal(9,2)) as col2, cast(3.0 as decimal(9,2)) as col3 
> from cp.`tpch/lineitem.parquet` limit 1) d;
> +-+
> | EXPR$0  |
> +-+
> | FLOAT8  |
> +-+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5378) Put more information into SchemaChangeException when HashJoin hit SchemaChangeException

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944308#comment-15944308
 ] 

ASF GitHub Bot commented on DRILL-5378:
---

Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/801
  
+1


> Put more information into SchemaChangeException when HashJoin hit 
> SchemaChangeException
> ---
>
> Key: DRILL-5378
> URL: https://issues.apache.org/jira/browse/DRILL-5378
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Minor
>
> HashJoin currently does not allow schema change on either the build side or the 
> probe side. When HashJoin hits a SchemaChangeException in the middle of 
> execution, Drill reports a brief error message about the SchemaChangeException, 
> without providing any information about what schemas are in the incoming 
> batches. That makes it hard to analyze the error and understand what's going on. 
> It probably makes sense to put the two differing schemas in the error 
> message, so that the user can get a better idea about the schema change. 
> Until Drill can provide support for schema change in HashJoin, the detailed 
> error message would help the user debug the error. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5051) DRILL-5051: Fix incorrect result returned in nest query with offset specified

2017-03-27 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-5051:
-
Reviewer: Khurram Faraaz  (was: Rahul Challapalli)

> DRILL-5051: Fix incorrect result returned in nest query with offset specified
> -
>
> Key: DRILL-5051
> URL: https://issues.apache.org/jira/browse/DRILL-5051
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
> Environment: Fedora 24 / OpenJDK 8
>Reporter: Hongze Zhang
>Assignee: Hongze Zhang
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> My SQL:
> select count(1) from (select id from (select id from 
> cp.`tpch/lineitem.parquet` LIMIT 2) limit 1 offset 1) 
> This SQL returns nothing.
> Something goes wrong in LimitRecordBatch.java, and the reason is different 
> from [DRILL-4884|https://issues.apache.org/jira/browse/DRILL-4884?filter=-2]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5241) JDBC proxy driver: Do not put null value in map

2017-03-27 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944234#comment-15944234
 ] 

Rahul Challapalli commented on DRILL-5241:
--

A performance test is probably a good way to verify the fix.

> JDBC proxy driver: Do not put null value in map
> ---
>
> Key: DRILL-5241
> URL: https://issues.apache.org/jira/browse/DRILL-5241
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.9.0
>Reporter: David Haller
>Priority: Minor
>  Labels: ready-to-commit
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> See GitHub pull request: https://github.com/apache/drill/pull/724
> Hello everyone,
> proxyReturnClass is always null, so interfacesToProxyClassesMap will contain 
> null values only. Adding newProxyReturnClass should be correct.
> This bug does not affect functionality, but probably decreases performance 
> because you get "cache misses" all the time.
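> A condensed Java sketch of the bug pattern (the map and variable names come 
> from the text above; the class and factory are hypothetical stand-ins for 
> the proxy driver's internals):
> {code}
> import java.util.HashMap;
> import java.util.Map;
> 
> class ProxySketch {
>   private final Map<Class<?>, Class<?>> interfacesToProxyClassesMap = new HashMap<>();
>   private Class<?> proxyReturnClass;  // never assigned, so always null (the bug)
> 
>   Class<?> getProxyClass(Class<?> iface) {
>     Class<?> cached = interfacesToProxyClassesMap.get(iface);
>     if (cached != null) {
>       return cached;  // never taken today: only nulls ever get stored
>     }
>     Class<?> newProxyReturnClass = createProxyClass(iface);
>     // Bug: stores the stale null instead of the new class -> permanent misses.
>     interfacesToProxyClassesMap.put(iface, proxyReturnClass);
>     // Fix: interfacesToProxyClassesMap.put(iface, newProxyReturnClass);
>     return newProxyReturnClass;
>   }
> 
>   private Class<?> createProxyClass(Class<?> iface) { return iface; }  // stand-in
> }
> {code}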
> Best regards,
> David.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DRILL-5389) select 2 int96 using convert_from(col, 'TIMESTAMP_IMPALA') function fails

2017-03-27 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5389:
---

Assignee: Vitalii Diravka

> select 2 int96 using convert_from(col, 'TIMESTAMP_IMPALA') function fails
> -
>
> Key: DRILL-5389
> URL: https://issues.apache.org/jira/browse/DRILL-5389
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Krystal
>Assignee: Vitalii Diravka
>
> I have a table containing 2 int96 timestamp columns. If I select one column 
> at a time, it works.
> select convert_from(create_timestamp1, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
> ++
> | EXPR$0 |
> ++
> | 2017-04-14 02:27:55.0  |
> ++
> select convert_from(create_timestamp2, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
> ++
> | EXPR$0 |
> ++
> | 2017-05-30 19:30:11.0  |
> ++
> However, if I include both columns on the same select, it fails:
> select convert_from(create_timestamp1, 'TIMESTAMP_IMPALA'), 
> convert_from(create_timestamp2, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
> Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 0
> This is reproducible in drill-1.9 also.
> In drill-1.10, setting `store.parquet.reader.int96_as_timestamp`=true, the 
> same query works fine.
> select create_timestamp1,create_timestamp2 from 
> dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
> +++
> |   create_timestamp1|   create_timestamp2|
> +++
> | 2017-04-14 02:27:55.0  | 2017-05-30 19:30:11.0  |
> +++



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5378) Put more information into SchemaChangeException when HashJoin hit SchemaChangeException

2017-03-27 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5378:

Reviewer: Aman Sinha

> Put more information into SchemaChangeException when HashJoin hit 
> SchemaChangeException
> ---
>
> Key: DRILL-5378
> URL: https://issues.apache.org/jira/browse/DRILL-5378
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Minor
>
> HashJoin currently does not allow schema change on either the build side or the 
> probe side. When HashJoin hits a SchemaChangeException in the middle of 
> execution, Drill reports a brief error message about the SchemaChangeException, 
> without providing any information about what schemas are in the incoming 
> batches. That makes it hard to analyze the error and understand what's going on. 
> It probably makes sense to put the two differing schemas in the error 
> message, so that the user can get a better idea about the schema change. 
> Until Drill can provide support for schema change in HashJoin, the detailed 
> error message would help the user debug the error. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4847) Window function query results in OOM Exception.

2017-03-27 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944225#comment-15944225
 ] 

Zelaine Fong commented on DRILL-4847:
-

The OOM is coming from the external sort, not the window function.

[~khfaraaz] - can you try this with the new external sort to see if this is 
still an issue?

To enable the new sort, run

ALTER SESSION SET `exec.sort.disable_managed` = false;

> Window function query results in OOM Exception.
> ---
>
> Key: DRILL-4847
> URL: https://issues.apache.org/jira/browse/DRILL-4847
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Priority: Critical
>  Labels: window_function
> Attachments: drillbit.log
>
>
> Window function query results in OOM Exception.
> Drill version 1.8.0-SNAPSHOT git commit ID: 38ce31ca
> MapRBuildVersion 5.1.0.37549.GA
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT clientname, audiencekey, spendprofileid, 
> postalcd, provincecd, provincename, postalcode_json, country_json, 
> province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER (PARTITION BY 
> spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END) ASC, 
> provincecd ASC) as rn FROM `MD593.parquet` limit 3;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Failure while allocating buffer.
> Fragment 0:0
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-08-16 07:25:44,590 [284d4006-9f9d-b893-9352-4f54f9b1d52a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 284d4006-9f9d-b893-9352-4f54f9b1d52a: SELECT clientname, audiencekey, 
> spendprofileid, postalcd, provincecd, provincename, postalcode_json, 
> country_json, province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER 
> (PARTITION BY spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 
> ELSE 0 END) ASC, provincecd ASC) as rn FROM `MD593.parquet` limit 3
> ...
> 2016-08-16 07:25:46,273 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/284d4006-9f9d-b893-9352-4f54f9b1d52a_majorfragment0_minorfragment0_operator8/2
> 2016-08-16 07:25:46,283 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Failure while allocating buffer.
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
> while allocating buffer.
> at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:187)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.<init>(RepeatedMapVector.java:331)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.<init>(RepeatedMapVector.java:307)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector.getTransferPair(RepeatedMapVector.java:161)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.SimpleVectorWrapper.cloneAndTransfer(SimpleVectorWrapper.java:66)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.cloneAndTransfer(VectorContainer.java:204)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.getTransferClone(VectorContainer.java:157)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> 

[jira] [Created] (DRILL-5389) select 2 int96 using convert_from(col, 'TIMESTAMP_IMPALA') function fails

2017-03-27 Thread Krystal (JIRA)
Krystal created DRILL-5389:
--

 Summary: select 2 int96 using convert_from(col, 
'TIMESTAMP_IMPALA') function fails
 Key: DRILL-5389
 URL: https://issues.apache.org/jira/browse/DRILL-5389
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.9.0, 1.10.0
Reporter: Krystal


I have a table containing 2 int96 timestamp columns. If I select one column at 
a time, it works.

select convert_from(create_timestamp1, 'TIMESTAMP_IMPALA') from 
dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
++
| EXPR$0 |
++
| 2017-04-14 02:27:55.0  |
++

select convert_from(create_timestamp2, 'TIMESTAMP_IMPALA') from 
dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
++
| EXPR$0 |
++
| 2017-05-30 19:30:11.0  |
++

However, if I include both columns on the same select, it fails:
select convert_from(create_timestamp1, 'TIMESTAMP_IMPALA'), 
convert_from(create_timestamp2, 'TIMESTAMP_IMPALA') from 
dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 0

This is reproducible in drill-1.9 also.

In drill-1.10, setting `store.parquet.reader.int96_as_timestamp`=true, the same 
query works fine.
select create_timestamp1,create_timestamp2 from 
dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
+++
|   create_timestamp1|   create_timestamp2|
+++
| 2017-04-14 02:27:55.0  | 2017-05-30 19:30:11.0  |
+++




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5378) Put more information into SchemaChangeException when HashJoin hit SchemaChangeException

2017-03-27 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944219#comment-15944219
 ] 

Jinfeng Ni commented on DRILL-5378:
---

[~amansinha100], could you please help review this pull request? Thanks!


> Put more information into SchemaChangeException when HashJoin hit 
> SchemaChangeException
> ---
>
> Key: DRILL-5378
> URL: https://issues.apache.org/jira/browse/DRILL-5378
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Minor
>
> HashJoin currently does not allow schema change on either the build side or the 
> probe side. When HashJoin hits a SchemaChangeException in the middle of 
> execution, Drill reports a brief error message about the SchemaChangeException, 
> without providing any information about what schemas are in the incoming 
> batches. That makes it hard to analyze the error and understand what's going on. 
> It probably makes sense to put the two differing schemas in the error 
> message, so that the user can get a better idea about the schema change. 
> Until Drill can provide support for schema change in HashJoin, the detailed 
> error message would help the user debug the error. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5378) Put more information into SchemaChangeException when HashJoin hit SchemaChangeException

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944216#comment-15944216
 ] 

ASF GitHub Bot commented on DRILL-5378:
---

GitHub user jinfengni opened a pull request:

https://github.com/apache/drill/pull/801

DRILL-5378: Put more information for schema change exception in hash …

…join, hash agg and sort operator.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinfengni/incubator-drill DRILL-5378

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/801.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #801


commit e29909a08146c5c766590c07015b5ba9137a8dee
Author: Jinfeng Ni 
Date:   2017-03-22T22:28:22Z

DRILL-5378: Put more information for schema change exception in hash join, 
hash agg and sort operator.




> Put more information into SchemaChangeException when HashJoin hit 
> SchemaChangeException
> ---
>
> Key: DRILL-5378
> URL: https://issues.apache.org/jira/browse/DRILL-5378
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Minor
>
> HashJoin currently does not allow schema change on either the build side or the 
> probe side. When HashJoin hits a SchemaChangeException in the middle of 
> execution, Drill reports a brief error message about the SchemaChangeException, 
> without providing any information about what schemas are in the incoming 
> batches. That makes it hard to analyze the error and understand what's going on. 
> It probably makes sense to put the two differing schemas in the error 
> message, so that the user can get a better idea about the schema change. 
> Until Drill can provide support for schema change in HashJoin, the detailed 
> error message would help the user debug the error. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5065) Optimize count(*) queries on MapR-DB JSON Tables

2017-03-27 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-5065:
---
Reviewer: Abhishek Girish  (was: Rahul Challapalli)

> Optimize count(*) queries on MapR-DB JSON Tables
> 
>
> Key: DRILL-5065
> URL: https://issues.apache.org/jira/browse/DRILL-5065
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - MapRDB
>Affects Versions: 1.9.0
> Environment: Clusters with MapR v5.2.0 and above
>Reporter: Abhishek Girish
>Assignee: Smidth Panchamia
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> The JSON FileReader optimizes count(*) queries by only counting the number 
> of records in the files and discarding the data. This makes query 
> execution faster and more efficient. 
> We need a similar feature in the MapR format plugin (maprdb) to optimize 
> _id-only projection and count(*) queries on MapR-DB JSON Tables.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5293) Poor performance of Hash Table due to same hash value as distribution below

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5293:
-
Reviewer: Kunal Khatua  (was: Chunhui Shi)

> Poor performance of Hash Table due to same hash value as distribution below
> ---
>
> Key: DRILL-5293
> URL: https://issues.apache.org/jira/browse/DRILL-5293
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.8.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> The computation of the hash value is basically the same whether for the Hash 
> Table (used by Hash Agg, and Hash Join), or for distribution of rows at the 
> exchange. As a result, a specific Hash Table (in a parallel minor fragment) 
> gets only rows "filtered out" by the partition below ("upstream"), so the 
> pattern of this filtering leads to a non-uniform usage of the hash buckets in 
> the table.
>   Here is a simplified example: An exchange partitions into TWO (minor 
> fragments), each running a Hash Agg. So the partition sends rows of EVEN hash 
> values to the first, and rows of ODD hash values to the second. Now the first 
> recomputes the _same_ hash value for its Hash table -- and only the even 
> buckets get used !!  (Or with a partition into EIGHT -- possibly only one 
> eighth of the buckets would be used !! ) 
> This would lead to longer hash chains and thus _poor performance_!
> A possible solution -- add a distribution function distFunc (only for 
> partitioning) that takes the hash value and "scrambles" it so that the 
> entropy in all the bits affects the low bits of the output. This function 
> should be applied (in HashPrelUtil) over the generated code that produces the 
> hash value, like:
>distFunc( hash32(field1, hash32(field2, hash32(field3, 0))) );
> Tested with a huge hash aggregate (64 M rows) and a parallelism of 8 ( 
> planner.width.max_per_node = 8 ); minor fragments 0 and 4 used only 1/8 of 
> their buckets, the others used 1/4 of their buckets.  Maybe the reason for 
> this variance is that distribution is using "hash32AsDouble" and hash agg is 
> using "hash32".  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5290) Provide an option to build operator table once for built-in static functions and reuse it across queries.

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5290:
-
Reviewer: Kunal Khatua  (was: Sudheesh Katkam)

> Provide an option to build operator table once for built-in static functions 
> and reuse it across queries.
> -
>
> Key: DRILL-5290
> URL: https://issues.apache.org/jira/browse/DRILL-5290
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> Currently, DrillOperatorTable, which contains standard SQL operators and 
> functions as well as Drill User Defined Functions (UDFs) (built-in and dynamic), 
> gets built for each query as part of creating QueryContext. This is an expensive 
> operation (~30 msec to build) and allocates ~2 MB on the heap for each query. For 
> high-throughput, low-latency operational queries, we quickly run out of heap 
> memory, causing JVM hangs. Build the operator table once during startup for 
> static built-in functions and save it in DrillbitContext, so we can reuse it 
> across queries.
> Provide a system/session option to not use dynamic UDFs, so we can use the 
> operator table saved in DrillbitContext and avoid building it each time.
> *Please note, changes are adding new option exec.udf.use_dynamic which needs 
> to be documented.*
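> A minimal sketch of the caching pattern described above (class and method 
> names are hypothetical stand-ins, not Drill's actual code):
> {code}
> // Hypothetical sketch: build the static operator table once at startup and
> // share it, instead of rebuilding (~30 msec, ~2 MB of heap) per query.
> class DrillbitContextSketch {
>   interface OperatorTable {}  // stand-in for DrillOperatorTable
> 
>   private final OperatorTable staticTable = buildStaticTable();  // once, at startup
> 
>   OperatorTable tableFor(boolean useDynamicUdfs) {
>     // When dynamic UDFs are disabled via the option, reuse the shared table.
>     return useDynamicUdfs ? buildTableWithDynamicUdfs() : staticTable;
>   }
> 
>   private OperatorTable buildStaticTable() { return new OperatorTable() {}; }
>   private OperatorTable buildTableWithDynamicUdfs() { return new OperatorTable() {}; }
> }
> {code}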



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5287) Provide option to skip updates of ephemeral state changes in Zookeeper

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5287:
-
Reviewer: Kunal Khatua  (was: Sudheesh Katkam)

> Provide option to skip updates of ephemeral state changes in Zookeeper
> --
>
> Key: DRILL-5287
> URL: https://issues.apache.org/jira/browse/DRILL-5287
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> We put transient profiles in ZooKeeper and update state as the query progresses 
> and changes state. It is observed that this adds latency of ~45 msec for each 
> update in the query execution path. This gets even worse when a high number of 
> concurrent queries is in progress. For concurrency=100, the average query 
> response time even for short queries is 8 sec vs 0.2 sec with these updates 
> disabled. For short-lived queries in a high-throughput scenario, there is no 
> value in updating state changes in ZooKeeper. We need an option to disable 
> these updates for short-running operational queries.
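> A sketch of the gating this implies (all names below are hypothetical 
> stand-ins; only the option-gated skip is the point):
> {code}
> import java.util.HashMap;
> import java.util.Map;
> 
> class EphemeralStateSketch {
>   private final Map<String, Object> transientStore = new HashMap<>();  // ZK stand-in
>   private boolean skipTransientProfileUpdates;  // backed by the proposed option
> 
>   void updateEphemeralState(String queryId, Object profile) {
>     if (skipTransientProfileUpdates) {
>       return;  // avoid the ~45 msec ZooKeeper round trip per state change
>     }
>     transientStore.put(queryId, profile);  // existing update path
>   }
> }
> {code}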



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5304) Queries fail intermittently when there is skew in data distribution

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5304:
-
Reviewer: Abhishek Girish  (was: Jinfeng Ni)

> Queries fail intermittently when there is skew in data distribution
> ---
>
> Key: DRILL-5304
> URL: https://issues.apache.org/jira/browse/DRILL-5304
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
> Attachments: query1_drillbit.log.txt, query2_drillbit.log.txt
>
>
> In a distributed environment, we've observed certain queries fail 
> intermittently during execution, with an assignment-logic issue, when the 
> underlying data distribution is skewed. 
> For example the TPC-H [query 
> 7|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Advanced/tpch/tpch_sf100/parquet/07.q]
>  failed with the below error:
> {code}
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: 
> MinorFragmentId 105 has no read entries assigned
> ...
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: MinorFragmentId 105 has no read entries 
> assigned
> org.apache.drill.exec.work.foreman.Foreman.run():281
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():744
>   Caused By (java.lang.IllegalArgumentException) MinorFragmentId 105 has no 
> read entries assigned
> {code}
> Log containing full stack trace is attached.
> And for this query, the underlying TPC-H SF100 Parquet dataset was observed 
> to be located mostly on only 2-3 nodes of an 8-node DFS environment. The data 
> distribution skew on this cluster is most likely the triggering factor for 
> this case, as the same query, on the same dataset, does not show this failure 
> on a different test cluster (with possibly different data distribution). 
> Also, another 
> [query|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/limit0/window_functions/bugs/data/drill-3700.sql]
>  failed with a similar error when slice target was set to 1. 
> {code}
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: 
> MinorFragmentId 66 has no read entries assigned
> ...
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: MinorFragmentId 66 has no read entries 
> assigned
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5273) CompliantTextReader exhausts 4 GB memory when reading 5000 small files

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5273:
-
Reviewer: Kunal Khatua  (was: Chunhui Shi)

> CompliantTextReader exhausts 4 GB memory when reading 5000 small files
> --
>
> Key: DRILL-5273
> URL: https://issues.apache.org/jira/browse/DRILL-5273
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> A test case was created that consists of 5000 text files, each with a single 
> line containing the file number: 1 to 5001. Each file has a single record, and 
> at most 4 characters per record.
> Run the following query:
> {code}
> SELECT * FROM `dfs.data`.`5000files/text`
> {code}
> The query will fail with an OOM in the scan batch on around record 3700 on a 
> Mac with 4GB of direct memory.
> The code to read records in {{ScanBatch}} is complex. The following appears to 
> occur:
> * Iterate over the record readers for each file.
> * For each, call setup
> The setup code is:
> {code}
>   public void setup(OperatorContext context, OutputMutator outputMutator) 
> throws ExecutionSetupException {
> oContext = context;
> readBuffer = context.getManagedBuffer(READ_BUFFER);
> whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
> {code}
> The two buffers are in direct memory. There is no code that releases the 
> buffers.
> The sizes are:
> {code}
>   private static final int READ_BUFFER = 1024*1024;
>   private static final int WHITE_SPACE_BUFFER = 64*1024;
> = 1,048,576 + 65536 = 1,114,112
> {code}
> This is exactly the amount of memory that accumulates per call to 
> {{ScanBatch.next()}}
> {code}
> Ctor: 0  -- Initial memory in constructor
> Init setup: 1114112  -- After call to first record reader setup
> Entry Memory: 1114112  -- first next() call, returns one record
> Entry Memory: 1114112  -- second next(), eof and start second reader
> Entry Memory: 2228224 -- third next(), second reader returns EOF
> ...
> {code}
> If we leak 1 MB per file, with 5000 files we would leak 5 GB of memory, which 
> would explain the OOM when given only 4 GB.
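> A sketch of the missing cleanup implied by this analysis (the method body 
> below is hypothetical, and where it belongs in the reader lifecycle is an 
> assumption; the point is that the two managed buffers must be released when 
> each reader finishes):
> {code}
> // Hypothetical sketch: release the per-reader buffers on close so memory
> // stays flat instead of growing by 1,114,112 bytes per file.
> public void close() {
>   if (readBuffer != null) {
>     readBuffer.release();
>     readBuffer = null;
>   }
>   if (whitespaceBuffer != null) {
>     whitespaceBuffer.release();
>     whitespaceBuffer = null;
>   }
> }
> {code}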



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5263) Prevent left NLJoin with non scalar subqueries

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5263:
-
Reviewer: Abhishek Girish  (was: Aman Sinha)

> Prevent left NLJoin with non scalar subqueries
> --
>
> Key: DRILL-5263
> URL: https://issues.apache.org/jira/browse/DRILL-5263
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
> Attachments: tmp.tar.gz
>
>
> The nested loop join operator in Drill supports only inner join and returns 
> incorrect results for queries with a left join and non-scalar subqueries. Drill 
> should throw an error in this case. 
> Example:
> {code:sql}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> {code}
> Result:
> {noformat}
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5221) cancel message is delayed until queryid or data is received

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5221:
-
Reviewer: Khurram Faraaz

> cancel message is delayed until queryid or data is received
> ---
>
> Key: DRILL-5221
> URL: https://issues.apache.org/jira/browse/DRILL-5221
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Affects Versions: 1.9.0
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> When the user calls the cancel method of the C++ client, the client waits for 
> a message from the server before replying with a cancellation message.
> For queries taking a long time to return batch results, this means 
> cancellation won't be effective until the next batch is received, instead of 
> cancelling the query right away (assuming the query id has already been 
> received, which is generally the case).
> It seems this was foreseen by [~vkorukanti] in his initial patch 
> (https://github.com/vkorukanti/drill/commit/e0ef6349aac48de5828b6d725c2cf013905d18eb)
>  but was omitted when I backported it post metadata changes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5207) Improve Parquet scan pipelining

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5207:
-
Reviewer: Kunal Khatua  (was: Sudheesh Katkam)

> Improve Parquet scan pipelining
> ---
>
> Key: DRILL-5207
> URL: https://issues.apache.org/jira/browse/DRILL-5207
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> The Parquet reader's async page reader is not efficiently pipelined. 
> The default size of the disk read buffer is 4 MB, while the page reader reads 
> ~1 MB at a time. The Parquet decode also processes 1 MB at a time. This 
> means the disk is idle while the data is being processed. Reducing the buffer 
> to 1 MB will reduce the time the processing thread waits for the disk read 
> thread.
> Additionally, since the data needed to process a page may be more or less than 
> 1 MB, a queue of pages will help so that the disk scan does not block waiting 
> for the processing thread (until the queue is full).
> Additionally, the BufferedDirectBufInputStream class reads from disk as soon 
> as it is initialized. Since this is called at setup time, this increases the 
> setup time for the query and query execution does not begin until this is 
> completed.
> There are a few other inefficiencies - options are read every time a page 
> reader is created. Reading options can be expensive.
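> A generic Java sketch of the proposed page queue (illustrative only, not the 
> AsyncPageReader code): the disk thread reads ahead until the bounded queue 
> fills, while the decode thread consumes pages at its own pace:
> {code}
> import java.util.concurrent.ArrayBlockingQueue;
> import java.util.concurrent.BlockingQueue;
> 
> class PagePipelineSketch {
>   private final BlockingQueue<byte[]> pageQueue = new ArrayBlockingQueue<>(4);
> 
>   void diskReadLoop() throws InterruptedException {
>     byte[] page;
>     while ((page = readNextPageFromDisk()) != null) {
>       pageQueue.put(page);  // blocks only when the queue is full
>     }
>   }
> 
>   void decodeLoop() throws InterruptedException {
>     while (true) {
>       decodePage(pageQueue.take());  // disk thread has usually read ahead
>     }
>   }
> 
>   // Hypothetical stand-ins for the real I/O and decode steps.
>   private byte[] readNextPageFromDisk() { return null; }
>   private void decodePage(byte[] page) {}
> }
> {code}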



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5123) Write query profile after sending final response to client to improve latency

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5123:
-
Reviewer: Kunal Khatua  (was: Padma Penumarthy)

> Write query profile after sending final response to client to improve latency
> -
>
> Key: DRILL-5123
> URL: https://issues.apache.org/jira/browse/DRILL-5123
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> In testing a particular query, I used a test setup that does not write to the 
> "persistent store", causing query profiles to not be saved. I then changed 
> the config to save them (to local disk). This produced about a 200ms 
> difference in query run time as perceived by the client.
> I then moved writing the query profile _after_ sending the client the final 
> message. This resulted in an approximately 100ms savings, as perceived by the 
> client, in query run time on short (~3 sec.) queries.
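
A minimal sketch of the reordering, assuming hypothetical
sendFinalResponse/writeProfile helpers; the point is only that the profile
write moves off the client's critical path:
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: respond to the client first, then persist the profile.
class QueryCompletion {
  private final ExecutorService profileWriter = Executors.newSingleThreadExecutor();

  void complete(Object finalResult, Object profile) {
    sendFinalResponse(finalResult);  // client sees completion immediately
    profileWriter.submit(() -> writeProfile(profile));  // ~100-200ms off critical path
  }

  private void sendFinalResponse(Object r) { /* stand-in for the RPC reply */ }
  private void writeProfile(Object p) { /* stand-in for persistent-store write */ }
}
{code}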



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5121) A memory leak is observed when exact case is not specified for a column in a filter condition

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5121:
-
Reviewer: Chun Chang  (was: Paul Rogers)

> A memory leak is observed when exact case is not specified for a column in a 
> filter condition
> -
>
> Key: DRILL-5121
> URL: https://issues.apache.org/jira/browse/DRILL-5121
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.6.0, 1.8.0
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When the query SELECT XYZ FROM dfs.`/tmp/foo` WHERE xYZ LIKE 'abc' is
> executed on a setup where /tmp/foo has 2 Parquet files, 1.parquet and
> 2.parquet, where 1.parquet has the column XYZ but 2.parquet does not, there
> is a memory leak.
> This seems to happen because xYZ seems to be treated as a new column.
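
A minimal sketch of case-insensitive column resolution, so that xYZ maps to
the file's existing XYZ column instead of materializing a new one; the helper
is illustrative, not Drill's actual reader code:
{code}
import java.util.List;

// Illustrative only: resolve a requested column against the file schema
// ignoring case, so `xYZ` maps to the existing `XYZ` column.
class ColumnResolver {
  static String resolve(String requested, List<String> fileColumns) {
    for (String col : fileColumns) {
      if (col.equalsIgnoreCase(requested)) {
        return col;    // reuse the existing column
      }
    }
    return null;       // genuinely missing: project as a null column
  }

  public static void main(String[] args) {
    System.out.println(resolve("xYZ", List.of("XYZ", "abc")));  // prints XYZ
  }
}
{code}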



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-5119) Update MapR version to 5.2.0.40963-mapr

2017-03-27 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish closed DRILL-5119.
--

> Update MapR version to 5.2.0.40963-mapr
> ---
>
> Key: DRILL-5119
> URL: https://issues.apache.org/jira/browse/DRILL-5119
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.10.0
>Reporter: Abhishek Girish
>Assignee: Patrick Wong
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> This is for the "mapr" profile. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5097) Using store.parquet.reader.int96_as_timestamp gives IOOB whereas convert_from works

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5097:
-
Reviewer: Krystal  (was: Karthikeyan Manivannan)

> Using store.parquet.reader.int96_as_timestamp gives IOOB whereas convert_from 
> works
> ---
>
> Key: DRILL-5097
> URL: https://issues.apache.org/jira/browse/DRILL-5097
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
> Attachments: data.snappy.parquet
>
>
> Using store.parquet.reader.int96_as_timestamp gives IOOB whereas convert_from 
> works. 
> The below query succeeds:
> {code}
> select c, convert_from(d, 'TIMESTAMP_IMPALA') from 
> dfs.`/drill/testdata/parquet_timestamp/spark_generated/d3`;
> {code}
> The below query fails:
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `store.parquet.reader.int96_as_timestamp` = true;
> +---+---+
> |  ok   |  summary  |
> +---+---+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +---+---+
> 1 row selected (0.231 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select c, d from 
> dfs.`/drill/testdata/parquet_timestamp/spark_generated/d3`;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 131076 (expected: 0 <= readerIndex <= writerIndex <= capacity(131072))
> Fragment 0:0
> [Error Id: bd94f477-7c01-420f-8920-06263212177b on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5104) Foreman sets external sort memory allocation even for a physical plan

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5104:
-
Reviewer: Rahul Challapalli  (was: Boaz Ben-Zvi)

> Foreman sets external sort memory allocation even for a physical plan
> -
>
> Key: DRILL-5104
> URL: https://issues.apache.org/jira/browse/DRILL-5104
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> Consider the (disabled) unit test 
> {{TestSimpleExternalSort.outOfMemoryExternalSort}} which uses the physical 
> plan {{xsort/oom_sort_test.json}} that contains a setting for the amount of 
> memory to allocate:
> {code}
>{
> ...
> pop:"external-sort",
> ...
> initialAllocation: 100,
> maxAllocation: 3000
> },
> {code}
> When run, the amount of memory is set to 715827882. The reason is that code 
> was added to {{Foreman}} to compute the memory to allocate to the external 
> sort:
> {code}
>   private void runPhysicalPlan(final PhysicalPlan plan) throws 
> ExecutionSetupException {
> validatePlan(plan);
> MemoryAllocationUtilities.setupSortMemoryAllocations(plan, queryContext);
> {code}
> The problem is that a physical plan should execute exactly as provided, to
> enable detailed testing.
> To solve this problem, move the sort memory setup to the path taken by SQL
> queries, so that it is not applied to physical plans.
> This change is necessary to re-enable the previously-disabled external sort 
> tests.
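
A minimal structural sketch of the fix, with stand-in types (Plan and the
helpers are not Drill's classes): derive sort memory only on the SQL path,
and run hand-written physical plans exactly as provided:
{code}
// Illustrative structure only; Plan and the helpers are stand-ins.
class PlanRunner {
  static class Plan { long sortMemoryLimit = -1; }

  void runSql(String sql) {
    Plan plan = parseSql(sql);
    plan.sortMemoryLimit = computeSortMemory();  // derived limit, SQL path only
    execute(plan);
  }

  void runPhysicalPlan(Plan plan) {
    validate(plan);
    execute(plan);  // initialAllocation/maxAllocation kept as provided for testing
  }

  private Plan parseSql(String sql) { return new Plan(); }
  private long computeSortMemory() { return 715_827_882L; }  // e.g. a fraction of direct memory
  private void validate(Plan p) {}
  private void execute(Plan p) {}
}
{code}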



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5098) Improving fault tolerance for connection between client and foreman node.

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5098:
-
Reviewer: Chun Chang  (was: Paul Rogers)

> Improving fault tolerance for connection between client and foreman node.
> -
>
> Key: DRILL-5098
> URL: https://issues.apache.org/jira/browse/DRILL-5098
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> With DRILL-5015 we added support for specifying multiple Drillbits in the
> connection string and randomly choosing one of them. Over time some of the
> Drillbits specified in the connection string may die, and the client can
> fail to connect to a Foreman node if the random selection happens to pick a
> dead Drillbit.
> Even if ZooKeeper is used to select a random Drillbit from the registered
> ones, there is a small window between the client selecting a Drillbit and
> that Drillbit going down. The client will fail to connect to this Drillbit
> and error out.
> If instead we try multiple Drillbits (with a tries count configurable
> through the connection string), the probability of hitting this error window
> is reduced in both cases, improving fault tolerance. During further
> investigation it was also found that an authentication failure is thrown as
> a generic RpcException. We need to improve that as well, to capture this
> case explicitly, since on an auth failure we don't want to try other
> Drillbits.
> Connection string example with the new parameter:
> jdbc:drill:drillbit=<host>[:<port>][,<host>[:<port>]]...;tries=5
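
A hedged JDBC usage example; the host names and ports are hypothetical, and
tries=5 means the client attempts up to five of the listed Drillbits before
reporting a connection failure:
{code}
import java.sql.Connection;
import java.sql.DriverManager;

public class DrillConnectExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical hosts; with tries=5 the client attempts up to 5 of the
    // listed Drillbits before reporting a connection failure.
    String url = "jdbc:drill:drillbit=bit1.example.com:31010,"
        + "bit2.example.com:31010,bit3.example.com:31010;tries=5";
    try (Connection conn = DriverManager.getConnection(url)) {
      System.out.println("Connected: " + !conn.isClosed());
    }
  }
}
{code}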



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5080) Create a memory-managed version of the External Sort operator

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5080:
-
Reviewer: Rahul Challapalli  (was: Boaz Ben-Zvi)

> Create a memory-managed version of the External Sort operator
> -
>
> Key: DRILL-5080
> URL: https://issues.apache.org/jira/browse/DRILL-5080
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
> Attachments: ManagedExternalSortDesign.pdf
>
>
> We propose to create a "managed" version of the external sort operator that 
> works to a clearly-defined memory limit. Attached is a design specification 
> for the work.
> The project will include fixing a number of bugs related to the external
> sort, included as sub-tasks of this umbrella task.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5081) Excessive info level logging introduced in DRILL-4203

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5081:
-
Reviewer: Krystal  (was: Sudheesh Katkam)

> Excessive info level logging introduced in DRILL-4203
> -
>
> Key: DRILL-5081
> URL: https://issues.apache.org/jira/browse/DRILL-5081
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sudheesh Katkam
>Assignee: Vitalii Diravka
> Fix For: 1.10.0
>
>
> Excessive info level logging introduced in 
> [8461d10|https://github.com/apache/drill/commit/8461d10b4fd6ce56361d1d826bb3a38b6dc8473c].
>  A line is printed for every row group being read, and for every metadata 
> file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5065) Optimize count(*) queries on MapR-DB JSON Tables

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5065:
-
Reviewer: Rahul Challapalli

> Optimize count(*) queries on MapR-DB JSON Tables
> 
>
> Key: DRILL-5065
> URL: https://issues.apache.org/jira/browse/DRILL-5065
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - MapRDB
>Affects Versions: 1.9.0
> Environment: Clusters with MapR v5.2.0 and above
>Reporter: Abhishek Girish
>Assignee: Smidth Panchamia
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> The JSON FileReader optimizes count(*) queries by only counting the number
> of records in the files and discarding the data. This makes query execution
> faster and more efficient.
> We need a similar feature in the MapR format plugin (maprdb) to optimize
> _id-only projection and count(*) queries on MapR-DB JSON Tables.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5048) Fix type mismatch error in case statement with null timestamp

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5048:
-
Reviewer: Krystal  (was: Gautam Kumar Parai)

> Fix type mismatch error in case statement with null timestamp
> -
>
> Key: DRILL-5048
> URL: https://issues.apache.org/jira/browse/DRILL-5048
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> AssertionError when we use case with timestamp and null:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT res, CASE res WHEN true THEN 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) ELSE null END
> . . . . . . . . . . . . . . > FROM
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > SELECT
> . . . . . . . . . . . . . . > (CASE WHEN (false) THEN null ELSE 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) END) res
> . . . . . . . . . . . . . . > FROM (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> [Error Id: b56e0a4d-2f9e-4afd-8c60-5bc2f9d31f8f on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:1696) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSubset.add(RelSubset.java:295) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSet.add(RelSet.java:147) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1818)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1760)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1017)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1037)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1940)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:138)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> ... 16 common frames omitted
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5051) DRILL-5051: Fix incorrect result returned in nest query with offset specified

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5051:
-
Reviewer: Rahul Challapalli  (was: Sudheesh Katkam)

> DRILL-5051: Fix incorrect result returned in nest query with offset specified
> -
>
> Key: DRILL-5051
> URL: https://issues.apache.org/jira/browse/DRILL-5051
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
> Environment: Fedora 24 / OpenJDK 8
>Reporter: Hongze Zhang
>Assignee: Hongze Zhang
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> My SQL:
> select count(1) from (select id from (select id from
> cp.`tpch/lineitem.parquet` LIMIT 2) limit 1 offset 1)
> This SQL returns nothing.
> Something goes wrong in LimitRecordBatch.java, and the reason is different
> from [DRILL-4884|https://issues.apache.org/jira/browse/DRILL-4884?filter=-2]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala reassigned DRILL-5043:


Assignee: Arina Ielchiieva
Reviewer: Krystal  (was: Arina Ielchiieva)

> Function that returns a unique id per session/connection similar to MySQL's 
> CONNECTION_ID()
> ---
>
> Key: DRILL-5043
> URL: https://issues.apache.org/jira/browse/DRILL-5043
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Nagarajan Chinnasamy
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: CONNECTION_ID, SESSION, UDF, doc-impacting
> Fix For: 1.10.0
>
> Attachments: 01_session_id_sqlline.png, 
> 02_session_id_webconsole_query.png, 03_session_id_webconsole_result.png
>
>
> Design and implement a function that returns a unique id per
> session/connection, similar to MySQL's CONNECTION_ID().
> *Implementation details*
> A function *session_id* will be added. The function returns the current
> session's unique id, represented as a string. A parameter {code:java} boolean
> isNiladic{code} will be added to the UDF FunctionTemplate to indicate that a
> function is niladic (a function called without any parameters or
> parentheses).
> Please note, this function will override columns that have the same name. A
> table alias should be used to retrieve the column value from a table.
> Example:
> {code:sql}select session_id from <table>   // returns the value of niladic
> function session_id {code}
> {code:sql}select t1.session_id from <table> t1 // returns session_id column
> value from table {code}
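
A short usage sketch over JDBC; the connection URL is hypothetical, and the
VALUES subquery mirrors the style used in other issues in this digest:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SessionIdExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical local connection; session_id is niladic, so no parentheses.
    try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010");
         Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery("SELECT session_id FROM (VALUES(1))")) {
      while (rs.next()) {
        System.out.println("session id: " + rs.getString(1));  // same value per connection
      }
    }
  }
}
{code}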



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5032) Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5032:
-
Reviewer: Rahul Challapalli  (was: Jinfeng Ni)

> Drill query on hive parquet table failed with OutOfMemoryError: Java heap 
> space
> ---
>
> Key: DRILL-5032
> URL: https://issues.apache.org/jira/browse/DRILL-5032
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
> Attachments: plan, plan with fix
>
>
> The following query on a Hive parquet table failed with OOM (Java heap space):
> {code}
> select distinct(businessdate) from vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:02:03,597 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 283938c3-fde8-0fc6-37e1-9a568c7f5913: select distinct(businessdate) from 
> vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:05:58,502 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
> 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 1 ms
> 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 3 ms
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> 2016-08-31 08:05:58,664 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$1
> 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> 2016-08-31 08:09:42,355 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
>  ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
>  ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:136) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:76) 
> ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:457) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:166) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:76) 
> ~[na:1.8.0_74]
> at 
> com.google.protobuf.TextFormat$TextGenerator.write(TextFormat.java:538) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$TextGenerator.print(TextFormat.java:526) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:389) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286) 
> ~[protobuf-java-2.5.0.jar:na]
> at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273) 
> 

[jira] [Updated] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5034:
-
Reviewer: Krystal  (was: Karthikeyan Manivannan)

> Select timestamp from hive generated parquet always return in UTC
> -
>
> Key: DRILL-5034
> URL: https://issues.apache.org/jira/browse/DRILL-5034
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Krystal
>Assignee: Vitalii Diravka
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data from a Hive parquet table in Drill automatically
> converts the timestamp data to UTC.
> {code}
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +--+
> |EXPR$0|
> +--+
> | 2016-11-10 12:33:26.547 America/Los_Angeles  |
> +--+
> {code}
> data schema:
> {code}
> message hive_schema {
>   optional int32 voter_id;
>   optional binary name (UTF8);
>   optional int32 age;
>   optional binary registration (UTF8);
>   optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
>   optional int32 voterzone;
>   optional int96 create_timestamp;
>   optional int32 create_date (DATE);
> }
> {code}
> Using drill-1.8, the returned timestamps match the table data:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> `/user/hive/warehouse/voter_hive_parquet` limit 5;
> ++
> | EXPR$0 |
> ++
> | 2016-10-23 20:03:58.0  |
> | null   |
> | 2016-09-09 12:01:18.0  |
> | 2017-03-06 20:35:55.0  |
> | 2017-01-20 22:32:43.0  |
> ++
> 5 rows selected (1.032 seconds)
> {code}
> If the user timezone is changed to UTC, then the timestamp data is returned
> in UTC time.
> Using drill-1.9, the returned timestamps got converted to UTC even though
> the user timezone is PST.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> ++
> | EXPR$0 |
> ++
> | 2016-10-24 03:03:58.0  |
> | null   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> ++
> {code}
> {code}
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +---+---+
> |  ok   |  summary  |
> +---+---+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +---+---+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` 
> limit 5;
> ++
> |create_timestamp|
> ++
> | 2016-10-24 03:03:58.0  |
> | null   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> ++
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4987) Use ImpersonationUtil in RemoteFunctionRegistry

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4987:
-
Reviewer: Chun Chang

> Use ImpersonationUtil in RemoteFunctionRegistry
> ---
>
> Key: DRILL-4987
> URL: https://issues.apache.org/jira/browse/DRILL-4987
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>Priority: Minor
> Fix For: 1.10.0
>
>
> + Use ImpersonationUtil#getProcessUserName rather than  
> UserGroupInformation#getCurrentUser#getUserName in RemoteFunctionRegistry
> + Expose the process user's group info in ImpersonationUtil and use that in
> RemoteFunctionRegistry, rather than
> UserGroupInformation#getCurrentUser#getGroupNames
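
A before/after sketch; ImpersonationUtil.getProcessUserName() exists today,
while the group accessor is exactly what this issue proposes to add, so it is
not shown:
{code}
import java.io.IOException;

import org.apache.drill.exec.util.ImpersonationUtil;
import org.apache.hadoop.security.UserGroupInformation;

class RegistryUserInfo {
  // Before: Hadoop's UGI, which can throw IOException.
  static String userViaUgi() throws IOException {
    return UserGroupInformation.getCurrentUser().getUserName();
  }

  // After: Drill's own utility, as proposed above.
  static String userViaImpersonationUtil() {
    return ImpersonationUtil.getProcessUserName();
  }
}
{code}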



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4980:
-
Reviewer:   (was: Parth Chandra)

> Upgrading of the approach of parquet date correctness status detection
> --
>
> Key: DRILL-4980
> URL: https://issues.apache.org/jira/browse/DRILL-4980
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.10.0
>
>
> This jira is an addition to
> [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date correctness label for newly generated parquet files should be
> upgraded.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4938) Report UserException when constant expression reduction fails

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4938:
-
Reviewer: Khurram Faraaz  (was: Boaz Ben-Zvi)

> Report UserException when constant expression reduction fails
> -
>
> Key: DRILL-4938
> URL: https://issues.apache.org/jira/browse/DRILL-4938
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Serhii Harnyk
>Priority: Minor
> Fix For: 1.10.0
>
>
> We need a better error message instead of a DrillRuntimeException.
> Drill 1.9.0 git commit ID: 4edabe7a
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select (res1 = 2016/09/22) res2
> . . . . . . . . . . . . . . > from
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > select (case when (false) then null else 
> cast('2016/09/22' as date) end) res1
> . . . . . . . . . . . . . . > from (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing 
> expression in constant expression evaluator [CASE(false, =(null, /(/(2016, 
> 9), 22)), =(CAST('2016/09/22'):DATE NOT NULL, /(/(2016, 9), 22)))].  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4956) Temporary tables support

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4956:
-
Reviewer: Khurram Faraaz  (was: Paul Rogers)

> Temporary tables support
> 
>
> Key: DRILL-4956
> URL: https://issues.apache.org/jira/browse/DRILL-4956
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> Link to design doc - 
> https://docs.google.com/document/d/1gSRo_w6q2WR5fPx7SsQ5IaVmJXJ6xCOJfYGyqpVOC-g/edit
> Gist - 
> https://gist.github.com/arina-ielchiieva/50158175867a18eee964b5ba36455fbf#file-temporarytablessupport-md
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DRILL-4935) Allow drillbits to advertise a configurable host address to Zookeeper

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala reassigned DRILL-4935:


Assignee: Abhishek Girish
Reviewer: Abhishek Girish  (was: Khurram Faraaz)

> Allow drillbits to advertise a configurable host address to Zookeeper
> -
>
> Key: DRILL-4935
> URL: https://issues.apache.org/jira/browse/DRILL-4935
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - RPC
>Affects Versions: 1.8.0
>Reporter: Harrison Mebane
>Assignee: Abhishek Girish
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> There are certain situations, such as running Drill in distributed Docker 
> containers, in which it is desirable to advertise a different hostname to 
> Zookeeper than would be output by InetAddress.getLocalHost(). I propose
> adding a configuration variable 'drill.exec.rpc.bit.advertised.host' and 
> passing this address to Zookeeper when the configuration variable is 
> populated, otherwise falling back to the present behavior.
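
A minimal sketch of the fallback, assuming the Typesafe Config API that
Drill's configuration is built on; the property name comes from the proposal
above:
{code}
import java.net.InetAddress;
import com.typesafe.config.Config;

class AdvertisedHost {
  static final String ADVERTISED_HOST = "drill.exec.rpc.bit.advertised.host";

  // Advertise the configured hostname when present; otherwise keep today's behavior.
  static String hostToRegister(Config config) throws Exception {
    if (config.hasPath(ADVERTISED_HOST)) {
      return config.getString(ADVERTISED_HOST);  // e.g. the Docker host's name
    }
    return InetAddress.getLocalHost().getCanonicalHostName();
  }
}
{code}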



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4272) When sort runs out of memory and query fails, resources are seemingly not freed

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4272:
-
Reviewer: Rahul Challapalli

> When sort runs out of memory and query fails, resources are seemingly not 
> freed
> ---
>
> Key: DRILL-4272
> URL: https://issues.apache.org/jira/browse/DRILL-4272
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Paul Rogers
>Priority: Critical
> Fix For: 1.10.0
>
>
> Executed query11.sql from resources/Advanced/tpcds/tpcds_sf1/original/parquet
> Query runs out of memory:
> {code}
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Unable to allocate sv2 for 32768 records, and not enough batchGroups to spill.
> batchGroups.size 1
> spilledBatchGroups.size 0
> allocated memory 19961472
> allocator limit 2000
> Fragment 19:0
> [Error Id: 87aa32b8-17eb-488e-90cb-5f5b9aec on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> And leaves fragments running, holding resources:
> {code}
> 2016-01-14 22:46:32,435 [Drillbit-ShutdownHook#0] INFO  
> o.apache.drill.exec.server.Drillbit - Received shutdown request.
> 2016-01-14 22:46:32,546 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-136.qa.lab no longer 
> active.  Cancelling fragment 2967db08-cd38-925a-4960-9e881f537af8:19:0.
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2967db08-cd38-925a-4960-9e881f537af8:19:0: State change requested 
> CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2967db08-cd38-925a-4960-9e881f537af8:19:0: Ignoring unexpected state 
> transition CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-136.qa.lab no longer 
> active.  Cancelling fragment 2967db08-cd38-925a-4960-9e881f537af8:17:0.
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2967db08-cd38-925a-4960-9e881f537af8:17:0: State change requested 
> CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2967db08-cd38-925a-4960-9e881f537af8:17:0: Ignoring unexpected state 
> transition CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED
> 2016-01-14 22:46:33,563 [BitServer-1] INFO  
> o.a.d.exec.rpc.control.ControlClient - Channel closed /10.10.88.134:59069 
> <--> atsqa4-136.qa.lab/10.10.88.136:31011.
> 2016-01-14 22:46:33,563 [BitClient-1] INFO  
> o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:34802 <--> 
> atsqa4-136.qa.lab/10.10.88.136:31012.
> 2016-01-14 22:46:33,590 [BitClient-1] INFO  
> o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:36937 <--> 
> atsqa4-135.qa.lab/10.10.88.135:31012.
> 2016-01-14 22:46:33,595 [BitClient-1] INFO  
> o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:53860 <--> 
> atsqa4-133.qa.lab/10.10.88.133:31012.
> 2016-01-14 22:46:38,467 [BitClient-1] INFO  
> o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:48276 <--> 
> atsqa4-134.qa.lab/10.10.88.134:31012.
> 2016-01-14 22:46:39,470 [pool-6-thread-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - closed eventLoopGroup 
> io.netty.channel.nio.NioEventLoopGroup@6fb32dfb in 1003 ms
> 2016-01-14 22:46:39,470 [pool-6-thread-2] INFO  
> o.a.drill.exec.rpc.data.DataServer - closed eventLoopGroup 
> io.netty.channel.nio.NioEventLoopGroup@5c93dd80 in 1003 ms
> 2016-01-14 22:46:39,470 [pool-6-thread-1] INFO  
> o.a.drill.exec.service.ServiceEngine - closed userServer in 1004 ms
> 2016-01-14 22:46:39,470 [pool-6-thread-2] INFO  
> o.a.drill.exec.service.ServiceEngine - closed dataPool in 1005 ms
> 2016-01-14 22:46:39,483 [Drillbit-ShutdownHook#0] WARN  
> o.apache.drill.exec.work.WorkManager - Closing WorkManager but there are 2 
> running fragments.
> 2016-01-14 22:46:41,489 [Drillbit-ShutdownHook#0] ERROR 
> o.a.d.exec.server.BootStrapContext - Pool did not terminate
> 2016-01-14 22:46:41,498 [Drillbit-ShutdownHook#0] WARN  
> o.apache.drill.exec.server.Drillbit - Failure on close()
> java.lang.RuntimeException: Exception while closing
> at 
> org.apache.drill.common.DrillAutoCloseables.closeNoChecked(DrillAutoCloseables.java:46)
>  ~[drill-common-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> at 
> org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:127)
>  

[jira] [Updated] (DRILL-4919) Fix select count(1) / count(*) on csv with header

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4919:
-
Reviewer: Krystal  (was: Gautam Kumar Parai)

> Fix select count(1) / count(*) on csv with header
> -
>
> Key: DRILL-4919
> URL: https://issues.apache.org/jira/browse/DRILL-4919
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: F Méthot
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> This happens since  1.8
> Dataset (I used extended char for display purpose) test.csvh:
> a,b,c,d\n
> 1,2,3,4\n
> 5,6,7,8\n
> Storage config:
> "csvh": {
>   "type": "text",
>   "extensions" : [
>   "csvh"
>],
>"extractHeader": true,
>"delimiter": ","
>   }
> select count(1) from dfs.`test.csvh`
> Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header 
> names are supported
> column name columns
> column index
> Fragment 0:0



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4864) Add ANSI format for date/time functions

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4864:
-
Reviewer: Krystal  (was: Paul Rogers)

> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> TO_DATE() exposes the Joda string formatting conventions in the SQL layer.
> This does not follow the SQL conventions used by ANSI and many other
> database engines on the market.
> Add new UDFs:
> * sql_to_date(String, Format),
> * sql_to_time(String, Format),
> * sql_to_timestamp(String, Format)
> that require the Postgres datetime format (see the usage sketch after the
> tables below).
> Table of supported Postgres patterns
> ||Pattern name||Postgres format||
> |Full name of day|day|
> |Day of year|ddd|
> |Day of month|dd|
> |Day of week|d|
> |Name of month|month|
> |Abr name of month|mon|
> |Full era name|ee|
> |Name of day|dy|
> |Time zone|tz|
> |Hour 12|hh|
> |Hour 12|hh12|
> |Hour 24|hh24|
> |Minute of hour|mi|
> |Second of minute|ss|
> |Millisecond of minute|ms|
> |Week of year|ww|
> |Month|mm|
> |Halfday am|am|
> |Year|y|
> |ref.|https://www.postgresql.org/docs/8.2/static/functions-formatting.html|
> Table of acceptable Postgres pattern modifiers, which may be used in Format
> string
> ||Description||Pattern||
> |fill mode (suppress padding blanks and zeroes)|fm|
> |fixed format global option (see usage notes)|fx|
> |translation mode (print localized day and month names based on lc_messages)|tm|
> |spell mode (not yet implemented)|sp|
> |ref.|https://www.postgresql.org/docs/8.2/static/functions-formatting.html|
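
A hedged usage sketch of the new UDFs over JDBC; the connection URL is
hypothetical, and the format string simply composes patterns from the first
table (assuming a four-digit year is written yyyy):
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqlToDateExample {
  public static void main(String[] args) throws Exception {
    // 'yyyy-mm-dd' uses the Postgres-style y/mm/dd patterns from the table
    // above rather than Joda patterns.
    try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010");
         Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery(
             "SELECT sql_to_date('2017-03-27', 'yyyy-mm-dd') FROM (VALUES(1))")) {
      while (rs.next()) {
        System.out.println(rs.getDate(1));  // expected: 2017-03-27
      }
    }
  }
}
{code}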



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4812) Wildcard queries fail on Windows

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4812:
-
Reviewer: Kunal Khatua  (was: Paul Rogers)

> Wildcard queries fail on Windows
> 
>
> Key: DRILL-4812
> URL: https://issues.apache.org/jira/browse/DRILL-4812
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.7.0
> Environment: Windows 7
>Reporter: Mike Lavender
>  Labels: easyfix, easytest, ready-to-commit, windows
> Fix For: 1.10.0
>
>
> Wildcards within the path of a query are not handled on Windows and result
> in a "String index out of range" exception.
> for example:
> {noformat}
> 0: jdbc:drill:zk=local> SELECT SUM(qty) as num FROM 
> dfs.parquet.`/trends/2016/1/*/*/3701`;
> Error: VALIDATION ERROR: String index out of range: -1
> SQL Query null
> {noformat}
> 
> The problem exists within:
> exec\java-exec\src\main\java\org\apache\drill\exec\store\dfs\FileSelection.java
> private static Path handleWildCard(final String root)
> This function looks for the index of the system-specific PATH_SEPARATOR,
> which on Windows is '\' (from System.getProperty("file.separator")). The
> path passed in to handleWildCard will never contain that separator, because
> the Path constructor (from org.apache.hadoop.fs.Path) normalizes all path
> separators to '/'.
> NOTE:
> private static String removeLeadingSlash(String path)
> in that same file explicitly looks for '/' and does not use the system 
> specific PATH_SEPARATOR.
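
A minimal sketch of the fix direction, not the actual method body: split on
'/' (which hadoop's Path guarantees) instead of the platform separator:
{code}
// Illustrative only: hadoop's Path normalizes separators to '/', so the
// wildcard handling must split on '/' rather than file.separator ('\' on Windows).
class WildcardPaths {
  static String parentOfWildcard(String root) {
    int wildcard = root.indexOf('*');
    // Splitting on '/' works on every platform; searching for the Windows
    // file.separator ('\') here would return -1 and break substring().
    int lastSlash = root.lastIndexOf('/', wildcard);
    return root.substring(0, lastSlash);
  }

  public static void main(String[] args) {
    System.out.println(parentOfWildcard("/trends/2016/1/*/*/3701"));  // /trends/2016/1
  }
}
{code}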



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4764) Parquet file with INT_16, etc. logical types not supported by simple SELECT

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4764:
-
Reviewer: Rahul Challapalli  (was: Parth Chandra)

> Parquet file with INT_16, etc. logical types not supported by simple SELECT
> ---
>
> Key: DRILL-4764
> URL: https://issues.apache.org/jira/browse/DRILL-4764
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
> Attachments: int_16.parquet, int_8.parquet, uint_16.parquet, 
> uint_32.parquet, uint_8.parquet
>
>
> Create a Parquet file with the following schema:
> message int16Data { required int32 index; required int32 value (INT_16); }
> Store it as int_16.parquet in the local file system. Query it with:
> SELECT * from `local`.`root`.`int_16.parquet`;
> The result, in the web UI, is this error:
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> UnsupportedOperationException: unsupported type: INT32 INT_16 Fragment 0:0 
> [Error Id: c63f66b4-e5a9-4a35-9ceb-546b74645dd4 on 172.30.1.28:31010]
> The INT_16 logical (or "original") type simply tells consumers of the file 
> that the data is actually a 16-bit signed int. Presumably, this should tell 
> Drill to use the SmallIntVector (or NullableSmallIntVector) class for 
> storage. Without supporting this annotation, even 16-bit integers must be 
> stored as 32-bits within Drill.
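
A sketch of how such a file can be produced, assuming the parquet-mr example
writer API; useful for reproducing the error above:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.hadoop.example.GroupWriteSupport;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class WriteInt16Parquet {
  public static void main(String[] args) throws Exception {
    // Same schema as in the report: int32 storage with an INT_16 logical type.
    MessageType schema = MessageTypeParser.parseMessageType(
        "message int16Data { required int32 index; required int32 value (INT_16); }");
    Configuration conf = new Configuration();
    GroupWriteSupport.setSchema(schema, conf);
    try (ParquetWriter<Group> writer = ExampleParquetWriter
            .builder(new Path("int_16.parquet"))
            .withConf(conf)
            .withType(schema)
            .build()) {
      SimpleGroupFactory groups = new SimpleGroupFactory(schema);
      writer.write(groups.newGroup().append("index", 1).append("value", 42));
    }
  }
}
{code}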



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4301) OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4301:
-
Reviewer: Rahul Challapalli

> OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to 
> spill.
> ---
>
> Key: DRILL-4301
> URL: https://issues.apache.org/jira/browse/DRILL-4301
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Affects Versions: 1.5.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The query below, from Functional tests, fails due to OOM:
> {code}
> select * from dfs.`/drill/testdata/metadata_caching/fewtypes_boolpartition` 
> where bool_col = true;
> {code}
> Drill version : drill-1.5.0
> JAVA_VERSION=1.8.0
> {noformat}
> version   commit_id   commit_message  commit_time build_email 
> build_time
> 1.5.0-SNAPSHOT2f0e3f27e630d5ac15cdaef808564e01708c3c55
> DRILL-4190 Don't hold on to batches from left side of merge join.   
> 20.01.2016 @ 22:30:26 UTC   Unknown 20.01.2016 @ 23:48:33 UTC
> framework/framework/resources/Functional/metadata_caching/data/bool_partition1.q
>  (connection: 808078113)
> [#1378] Query failed: 
> oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: 
> One or more nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> batchGroups.size 0
> spilledBatchGroups.size 0
> allocated memory 48326272
> allocator limit 46684427
> Fragment 0:0
> [Error Id: 97d58ea3-8aff-48cf-a25e-32363b8e0ecd on drill-demod2:31010]
>   at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>   at 
> oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> 

[jira] [Updated] (DRILL-4280) Kerberos Authentication

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4280:
-
Reviewer: Chun Chang  (was: Chunhui Shi)

> Kerberos Authentication
> ---
>
> Key: DRILL-4280
> URL: https://issues.apache.org/jira/browse/DRILL-4280
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Keys Botzum
>Assignee: Sudheesh Katkam
>  Labels: security
> Fix For: 1.10.0
>
>
> Drill should support Kerberos based authentication from clients. This means 
> that both the ODBC and JDBC drivers as well as the web/REST interfaces should 
> support inbound Kerberos. For Web this would most likely be SPNEGO while for 
> ODBC and JDBC this will be more generic Kerberos.
> Since Hive and much of Hadoop supports Kerberos there is a potential for a 
> lot of reuse of ideas if not implementation.
> Note that this is related to but not the same as 
> https://issues.apache.org/jira/browse/DRILL-3584 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4217) Query parquet file treat INT_16 & INT_8 as INT32

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4217:
-
Reviewer: Rahul Challapalli

> Query parquet file treat INT_16 & INT_8 as INT32
> 
>
> Key: DRILL-4217
> URL: https://issues.apache.org/jira/browse/DRILL-4217
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Reporter: Low Chin Wei
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> Encountered this issue while trying to query a parquet file:
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> UnsupportedOperationException: unsupported type: INT32 INT_16 Fragment 1:1
> We can treat the following field types as INTEGER until support for Short &
> Byte is implemented:
> - INT32 INT_16
> - INT32 INT_8
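
A minimal sketch of the proposed widening, assuming parquet-format's
ConvertedType and Drill's TypeProtos.MinorType; the method is illustrative,
not the actual reader code:
{code}
import org.apache.drill.common.types.TypeProtos.MinorType;
import org.apache.parquet.format.ConvertedType;

// Minimal sketch of the proposed widening; not the actual reader code.
class ParquetTypeMapping {
  static MinorType toDrillType(ConvertedType type) {
    switch (type) {
      case INT_8:
      case INT_16:
        return MinorType.INT;  // widen to 32-bit until SMALLINT/TINYINT vectors land
      default:
        throw new UnsupportedOperationException("unsupported type: INT32 " + type);
    }
  }
}
{code}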



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3562:
-
Reviewer: Rahul Challapalli  (was: Arina Ielchiieva)

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> Drill query fails when using flatten and some records contain an empty array
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala reassigned DRILL-5316:


Assignee: Chun Chang
Reviewer: Chun Chang  (was: Sorabh Hamirwasia)

> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children 
> completed with ZOK
> 
>
> Key: DRILL-5316
> URL: https://issues.apache.org/jira/browse/DRILL-5316
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - C++
>Reporter: Rob Wu
>Assignee: Chun Chang
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> When connecting to a drillbit with Zookeeper, occasionally the C++ client
> would crash without any apparent reason.
> A further look into the code revealed that during the call
> rc = zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector);
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This causes drillbits to stay empty, and thus causes
> err = zook.getEndPoint(drillbits[drillbits.size() - 1], endpoint); to crash,
> since drillbits.size() - 1 indexes out of bounds when the vector is empty.
> A size check should be done to prevent this from happening.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-4763) Parquet file with DATE logical type produces wrong results for simple SELECT

2017-03-27 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-4763.


DRILL-4203, which is a duplicate of this issue, has been fixed and verified. 
Closing this one as well.

> Parquet file with DATE logical type produces wrong results for simple SELECT
> 
>
> Key: DRILL-4763
> URL: https://issues.apache.org/jira/browse/DRILL-4763
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Vitalii Diravka
> Fix For: 1.9.0
>
> Attachments: date.parquet, int_16.parquet
>
>
> Created a simple Parquet file with the following schema:
> message test { required int32 index; required int32 value (DATE); required 
> int32 raw; }
> That is, a file with an int32 storage type and a DATE logical type. Then, 
> created a number of test values:
> 0 (which should be interpreted as 1970-01-01), and
> (int) (System.currentTimeMillis() / (24*60*60*1000)), which should be
> interpreted as the number of days between 1970-01-01 and today.
> According to the Parquet spec 
> (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md), 
> Parquet dates are expressed as "the number of days from the Unix epoch, 1 
> January 1970."
> Java timestamps are expressed as "measured in milliseconds, between the 
> current time and midnight, January 1, 1970 UTC."
> There is ambiguity here: Parquet dates are presumably local times not 
> absolute times, so the math above will actually tell us the date in London 
> right now, but that's close enough.
> Generate the local file to date.parquet. Query it with:
> SELECT * from `local`.`root`.`date.parquet`;
> The results are incorrect:
> index value raw
> 1 -11395-10-18T00:00:00.000-07:52:58  0
> Here, we have a value of 0. The displayed date is decidedly not 
> 1970-01-01T00:00:00. We actually have many problems:
> 1. The date is far off.
> 2. The output shows time. But the Parquet DATE format explicitly does NOT
> include time, so it makes no sense to include it.
> 3. The output attempts to show a time zone, but a time zone of -07:52:58,
> while close to PST, is not right (there is no timezone that is off by
> 7:52:58 from UTC).
> 4. The data has no time zone; Parquet DATE is explicitly a local time, so it
> is impossible to know the relationship between that date and UTC.
> The correct output (in ISO format) would be: 1970-01-01
> The last line should be today's date, but instead is:
> 6 -11348-04-20T00:00:00.000-07:52:58  16986
> Expected:
> 2016-07-04
> Note that all the information needed to produce the right result is
> available to Drill:
> 1. The DATE annotation gives the meaning of the signed 32-bit integer.
> 2. Given the starting point and duration in days, the conversion to Drill's 
> own internal date format is unambiguous.
> 3. The DATE annotation says that the date is local, so Drill should not 
> attempt to convert to UTC. (That is, a Java Date object can't be used, 
> instead a Joda/Java 8 LocalDate is necessary.)
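
The unambiguous conversion is available in standard Java; this check shows
the expected values for the two test rows above:
{code}
import java.time.LocalDate;

public class ParquetDateCheck {
  public static void main(String[] args) {
    // Parquet DATE = days since the epoch, with no time or zone attached.
    System.out.println(LocalDate.ofEpochDay(0));      // 1970-01-01
    System.out.println(LocalDate.ofEpochDay(16986));  // 2016-07-04, "today" in the report
  }
}
{code}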



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-5380) Document the usage of drill's parquet "date auto correction" flag

2017-03-27 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-5380.


> Document the usage of drill's parquet "date auto correction" flag
> -
>
> Key: DRILL-5380
> URL: https://issues.apache.org/jira/browse/DRILL-5380
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation, Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Bridget Bevens
>
> Drill used a wrong format for storing dates in parquet before the 1.8.0
> release, and as a result it had compatibility issues with other parquet
> reader/writer tools. DRILL-4203 fixes that issue by providing an
> auto-correction capability in Drill's parquet reader. However, if someone
> really intends to use dates that Drill thinks are wrong, auto-correction can
> be disabled:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false));
> {code}
> This needs to be documented.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (DRILL-5388) Correct Parquet reader option name in documentation

2017-03-27 Thread Bridget Bevens (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens resolved DRILL-5388.
---
   Resolution: Fixed
Fix Version/s: 1.10.0

Changed the option name on the appropriate pages.

> Correct Parquet reader option name in documentation
> ---
>
> Key: DRILL-5388
> URL: https://issues.apache.org/jira/browse/DRILL-5388
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Bridget Bevens
>Assignee: Bridget Bevens
> Fix For: 1.10.0
>
>
> You need to update this link also.
> https://drill.apache.org/docs/parquet-format/#about-int96-support
> Yes, you are right. We need to update the documentation with 
> the correct option name.  Thanks for bringing it up.
> > According to this page, Drill can
> > implicitly interpret the INT96 timestamp data type in Parquet files after
> > setting the *store.parquet.int96_as_timestamp* option to *true*.
> > 
> > I believe the option name should be
> > *store.parquet.reader.int96_as_timestamp*



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5388) Correct Parquet reader option name in documentation

2017-03-27 Thread Bridget Bevens (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943855#comment-15943855
 ] 

Bridget Bevens commented on DRILL-5388:
---

Updated the option name on the following pages:

https://drill.apache.org/blog/2017/03/15/drill-1.10-released/
https://drill.apache.org/docs/parquet-format/#about-int96-support
https://drill.apache.org/docs/configuration-options-introduction/

You may need to refresh the pages to see the updates.

Setting status to "Resolved."
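
For reference, the corrected option can be sanity-checked in a session using
Drill's standard session-option syntax:

{code}
ALTER SESSION SET `store.parquet.reader.int96_as_timestamp` = true;
{code}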

> Correct Parquet reader option name in documentation
> ---
>
> Key: DRILL-5388
> URL: https://issues.apache.org/jira/browse/DRILL-5388
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Bridget Bevens
>Assignee: Bridget Bevens
> Fix For: 1.10.0
>
>
> You need to update this link also.
> https://drill.apache.org/docs/parquet-format/#about-int96-support
> Yes, you are right. We need to update the documentation with 
> the correct option name.  Thanks for bringing it up.
> > According to this page, Drill can
> > implicitly interpret the INT96 timestamp data type in Parquet files after
> > setting the *store.parquet.int96_as_timestamp* option to *true*.
> > 
> > I believe the option name should be
> > *store.parquet.reader.int96_as_timestamp*



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5354) Create CTTAS Documentation

2017-03-27 Thread Bridget Bevens (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943785#comment-15943785
 ] 

Bridget Bevens commented on DRILL-5354:
---

Fixed the issues noted by Arina. See 
https://drill.apache.org/docs/create-temporary-table-as-cttas/.
You may need to refresh the page to see the updates here: 
https://drill.apache.org/docs/create-temporary-table-as-cttas/
And here: 
https://drill.apache.org/docs/create-temporary-table-as-cttas/#selection-of-tables

Please review to verify that changes are correct.

Thanks!
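
As a quick illustration of the command the page documents (the schema, table,
and column names here are made up):

{code}
CREATE TEMPORARY TABLE temp_orders AS
SELECT order_id, order_total
FROM dfs.`/data/orders.parquet`;

-- Visible only to the session that created it; dropped automatically
-- when the session ends.
SELECT * FROM temp_orders;
{code}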

> Create CTTAS Documentation
> -
>
> Key: DRILL-5354
> URL: https://issues.apache.org/jira/browse/DRILL-5354
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Padma Heid
>Priority: Minor
> Fix For: 1.10.0
>
> Attachments: new_line.JPG, unnecessary_paragraph.JPG
>
>
> Work with Dev, QA, and PM to create user documentation for the CTTAS command. 
> When docs are posted, this link will be available: 
> https://drill.apache.org/docs/create-temporary-table-as-cttas/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5384) Sort cannot directly access map members, causes a data copy

2017-03-27 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943772#comment-15943772
 ] 

Jinfeng Ni commented on DRILL-5384:
---

I guess in the first case with the text file you are using "select * from file 
order by a, b, c"? The additional three columns are added because of the 
handling of "*", not because of the sort operator. But I have not seen your 
example, so there is no way to comment further.

As to the map vector: if the complex path does not have an "array" segment, my 
point is that there is no saving in terms of memory use when comparing the 
current approach with the new proposal. The project operator doing the vector 
transfer does exactly the same job the sort operator would have to do to 
access the vectors referred to in the complex path. I'm not clear how we could 
see "minimize memory use and optimize performance".



> Sort cannot directly access map members, causes a data copy
> ---
>
> Key: DRILL-5384
> URL: https://issues.apache.org/jira/browse/DRILL-5384
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Suppose we have a JSON structure for "orders" like this:
> {code}
> { customer: { id: 10, name: "fred" },
>   order: { id: 20, product: "Frammis 1000" } }
> {code}
> Suppose I want to sort by customer.id. Today, Drill will project customer.id 
> up to the top level as a temporary, hidden field. Drill will copy the data 
> from the customer.id vector to this new temporary field. Drill then sorts on 
> the temporary column, and uses another project to remove the columns.
> Clearly, this work, but it has a cost:
> * Extra two project operators.
> * Extra memory copy.
> * Sort must buffer both the original and copied data. This can double memory 
> use in the worst case.
> All of this is done simply to avoid having to reference "customer.id" in the 
> sort.
> But, as explained in DRILL-5376, maps are just nested tuples; there is no 
> need to copy the data, the data is already right there in a value vector. The 
> problem is that Drill's map implementation makes it hard for the generated 
> code to get at the "customer.id" vector.
> This ticket asks to allow the sort to work directly with nested scalars to 
> avoid the overhead explained above. To do this:
> 1. Fix nested scalar access to allow the generated code to easily access a 
> nested scalar.
> 2. Allow a sort key of the form "customer.id".
> 3. Modify the planner to generate such sort keys instead of the dual projects.
> The result will be a leaner, faster sort operation when sorting on scalars 
> within a map.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5384) Sort cannot directly access map members, causes a data copy

2017-03-27 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943737#comment-15943737
 ] 

Paul Rogers commented on DRILL-5384:


Hi [~jni]. I may be confusing two cases. Let's discuss arrays first. When a 
test case reads a text file with three columns, and uses those three columns as 
sort keys, I see an incoming row with six columns, three of which are from the 
project. Are you saying that the three projected copies are simply references 
to the three original columns, not copies? The code shows that we do, in fact, 
make a copy. The new "RecordBatchSizer" shows that the sum of all six columns 
equals the change in memory allocator "memory allocated" setting. If these are 
not copies, then the allocator is somehow being fooled into thinking they 
consume memory.

Now, back to the map. My assumption (which you suggest is wrong) is that map 
projects work the same way. I've not looked at that particular bit of code in 
detail, so I can't comment yet one way or the other.

For a design for how the sort might handle complex paths directly, look at the 
PR for the "RowSet" test tools. These provide a flattened schema that presents 
map columns as top-level columns (with dotted names) for ease of setting up and 
validating test cases. The thought is that the same approach used in that test 
code could be applied to the map code. Not a priority at the moment, but 
something to keep in mind when we want to minimize memory use and optimize 
performance.

> Sort cannot directly access map members, causes a data copy
> ---
>
> Key: DRILL-5384
> URL: https://issues.apache.org/jira/browse/DRILL-5384
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Suppose we have a JSON structure for "orders" like this:
> {code}
> { customer: { id: 10, name: "fred" },
>   order: { id: 20, product: "Frammis 1000" } }
> {code}
> Suppose I want to sort by customer.id. Today, Drill will project customer.id 
> up to the top level as a temporary, hidden field. Drill will copy the data 
> from the customer.id vector to this new temporary field. Drill then sorts on 
> the temporary column, and uses another project to remove the columns.
> Clearly, this work, but it has a cost:
> * Extra two project operators.
> * Extra memory copy.
> * Sort must buffer both the original and copied data. This can double memory 
> use in the worst case.
> All of this is done simply to avoid having to reference "customer.id" in the 
> sort.
> But, as explained in DRILL-5376, maps are just nested tuples; there is no 
> need to copy the data, the data is already right there in a value vector. The 
> problem is that Drill's map implementation makes it hard for the generated 
> code to get at the "customer.id" vector.
> This ticket asks to allow the sort to work directly with nested scalars to 
> avoid the overhead explained above. To do this:
> 1. Fix nested scalar access to allow the generated code to easily access a 
> nested scalar.
> 2. Allow a sort key of the form "customer.id".
> 3. Modify the planner to generate such sort keys instead of the dual projects.
> The result will be a leaner, faster sort operation when sorting on scalars 
> within a map.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5388) Correct Parquet reader option name in documentation

2017-03-27 Thread Bridget Bevens (JIRA)
Bridget Bevens created DRILL-5388:
-

 Summary: Correct Parquet reader option name in documentation
 Key: DRILL-5388
 URL: https://issues.apache.org/jira/browse/DRILL-5388
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Bridget Bevens
Assignee: Bridget Bevens


You need to update this link also.
https://drill.apache.org/docs/parquet-format/#about-int96-support
Parquet Format - Apache Drill
drill.apache.org
Configuring the Parquet Storage Format. To read or write Parquet data, you need 
to include the Parquet format in the storage plugin format definitions.

Yes, you are right. We need to update the documentation with 
the correct option name.  Thanks for bringing it up.


   |
Today, 1:57 AM
> According to this page
> , Drill can
> implicitly interprets the INT96 timestamp data type in Parquet files after
> setting the *store.parquet.int96_as_timestamp* option to *true*.
> 
> I believe the option name should be
> *store.parquet.reader.int96_as_timestamp*
>\




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (DRILL-5384) Sort cannot directly access map members, causes a data copy

2017-03-27 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943669#comment-15943669
 ] 

Jinfeng Ni edited comment on DRILL-5384 at 3/27/17 5:19 PM:


The reason Drill decides to put a project for "customer.id" in before the sort 
operator kicks in: it greatly simplifies the code for the sort operator, as it 
does not have to deal with a complex schema path, without incurring too much 
overhead (vector transfer happens at the batch level; it's merely a reference 
transfer, with no memory copy involved).

It would be great if you can make the sort handle a complex schema path 
directly. However, I have doubts about the proposal's performance benefit 
until real performance measurements prove my suspicion wrong. 
 

 


was (Author: jni):
The reason Drill decides to put a project for "customer.id" before sort 
operator kicks in : it greatly simplified the code for sort operator, as it 
does not have to deal with a complex schema path, without incurring too much 
overhead (vector transfer happens at batch level, not it's merely reference 
transfer; no memory copy involved).

It would be great if you can make sort to handle complex schema path directly. 
However, I have doubt about such proposal's performance benefit, until the real 
performance measurement prove my suspicion is wrong. 
 

 

> Sort cannot directly access map members, causes a data copy
> ---
>
> Key: DRILL-5384
> URL: https://issues.apache.org/jira/browse/DRILL-5384
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Suppose we have a JSON structure for "orders" like this:
> {code}
> { customer: { id: 10, name: "fred" },
>   order: { id: 20, product: "Frammis 1000" } }
> {code}
> Suppose I want to sort by customer.id. Today, Drill will project customer.id 
> up to the top level as a temporary, hidden field. Drill will copy the data 
> from the customer.id vector to this new temporary field. Drill then sorts on 
> the temporary column, and uses another project to remove the columns.
> Clearly, this work, but it has a cost:
> * Extra two project operators.
> * Extra memory copy.
> * Sort must buffer both the original and copied data. This can double memory 
> use in the worst case.
> All of this is done simply to avoid having to reference "customer.id" in the 
> sort.
> But, as explained in DRILL-5376, maps are just nested tuples; there is no 
> need to copy the data, the data is already right there in a value vector. The 
> problem is that Drill's map implementation makes it hard for the generated 
> code to get at the "customer.id" vector.
> This ticket asks to allow the sort to work directly with nested scalars to 
> avoid the overhead explained above. To do this:
> 1. Fix nested scalar access to allow the generated code to easily access a 
> nested scalar.
> 2. Allow a sort key of the form "customer.id".
> 3. Modify the planner to generate such sort keys instead of the dual projects.
> The result will be a leaner, faster sort operation when sorting on scalars 
> within a map.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5384) Sort cannot directly access map members, causes a data copy

2017-03-27 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943669#comment-15943669
 ] 

Jinfeng Ni commented on DRILL-5384:
---

The reason Drill decides to put a project for "customer.id" in before the sort 
operator kicks in: it greatly simplifies the code for the sort operator, as it 
does not have to deal with a complex schema path, without incurring too much 
overhead (vector transfer happens at the batch level; it's merely a reference 
transfer, with no memory copy involved).

It would be great if you can make the sort handle a complex schema path 
directly. However, I have doubts about the proposal's performance benefit 
until real performance measurements prove my suspicion wrong. 
 

 

> Sort cannot directly access map members, causes a data copy
> ---
>
> Key: DRILL-5384
> URL: https://issues.apache.org/jira/browse/DRILL-5384
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Suppose we have a JSON structure for "orders" like this:
> {code}
> { customer: { id: 10, name: "fred" },
>   order: { id: 20, product: "Frammis 1000" } }
> {code}
> Suppose I want to sort by customer.id. Today, Drill will project customer.id 
> up to the top level as a temporary, hidden field. Drill will copy the data 
> from the customer.id vector to this new temporary field. Drill then sorts on 
> the temporary column, and uses another project to remove the columns.
> Clearly, this work, but it has a cost:
> * Extra two project operators.
> * Extra memory copy.
> * Sort must buffer both the original and copied data. This can double memory 
> use in the worst case.
> All of this is done simply to avoid having to reference "customer.id" in the 
> sort.
> But, as explained in DRILL-5376, maps are just nested tuples; there is no 
> need to copy the data, the data is already right there in a value vector. The 
> problem is that Drill's map implementation makes it hard for the generated 
> code to get at the "customer.id" vector.
> This ticket asks to allow the sort to work directly with nested scalars to 
> avoid the overhead explained above. To do this:
> 1. Fix nested scalar access to allow the generated code to easily access a 
> nested scalar.
> 2. Allow a sort key of the form "customer.id".
> 3. Modify the planner to generate such sort keys instead of the dual projects.
> The result will be a leaner, faster sort operation when sorting on scalars 
> within a map.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5384) Sort cannot directly access map members, causes a data copy

2017-03-27 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943659#comment-15943659
 ] 

Jinfeng Ni commented on DRILL-5384:
---

Tried your example data:
{code}
select t.`order`.id from dfs.`/tmp/2.json` t order by t.customer.id;
00-00Screen
00-01  Project(EXPR$0=[$0])
00-02SelectionVectorRemover
00-03  Sort(sort0=[$1], dir0=[ASC])
00-04Project(EXPR$0=[ITEM($0, 'id')], EXPR$1=[ITEM($1, 'id')])
00-05  Scan(groupscan=[EasyGroupScan 
[selectionRoot=file:/tmp/2.json, numFiles=1, columns=[`order`.`id`, 
`customer`.`id`], files=[file:/tmp/2.json]]])
{code}

Here is the generated code for the project operator. We can see that the 
evaluation part is actually empty.

{code}
Compiling (source size=569 B):
1:  
2:  package org.apache.drill.exec.test.generated;
3:  
4:  import org.apache.drill.exec.exception.SchemaChangeException;
5:  import org.apache.drill.exec.ops.FragmentContext;
6:  import org.apache.drill.exec.record.RecordBatch;
7:  
8:  public class ProjectorGen0 {
9:  
10: 
11: public void doEval(int inIndex, int outIndex)
12: throws SchemaChangeException
13: {
14: }
15: 
16: public void doSetup(FragmentContext context, RecordBatch incoming, 
RecordBatch outgoing)
17: throws SchemaChangeException
18: {
19: }
20: 
21: public void __DRILL_INIT__()
22: throws SchemaChangeException
23: {
24: }
25: 
26: }
{code}


> Sort cannot directly access map members, causes a data copy
> ---
>
> Key: DRILL-5384
> URL: https://issues.apache.org/jira/browse/DRILL-5384
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Suppose we have a JSON structure for "orders" like this:
> {code}
> { customer: { id: 10, name: "fred" },
>   order: { id: 20, product: "Frammis 1000" } }
> {code}
> Suppose I want to sort by customer.id. Today, Drill will project customer.id 
> up to the top level as a temporary, hidden field. Drill will copy the data 
> from the customer.id vector to this new temporary field. Drill then sorts on 
> the temporary column, and uses another project to remove the columns.
> Clearly, this work, but it has a cost:
> * Extra two project operators.
> * Extra memory copy.
> * Sort must buffer both the original and copied data. This can double memory 
> use in the worst case.
> All of this is done simply to avoid having to reference "customer.id" in the 
> sort.
> But, as explained in DRILL-5376, maps are just nested tuples; there is no 
> need to copy the data, the data is already right there in a value vector. The 
> problem is that Drill's map implementation makes it hard for the generated 
> code to get at the "customer.id" vector.
> This ticket asks to allow the sort to work directly with nested scalars to 
> avoid the overhead explained above. To do this:
> 1. Fix nested scalar access to allow the generated code to easily access a 
> nested scalar.
> 2. Allow a sort key of the form "customer.id".
> 3. Modify the planner to generate such sort keys instead of the dual projects.
> The result will be a leaner, faster sort operation when sorting on scalars 
> within a map.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2017-03-27 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943658#comment-15943658
 ] 

Rahul Challapalli commented on DRILL-4203:
--

Thanks for your comments [~vitalii]. Those cases are covered in my tests and we 
found https://issues.apache.org/jira/browse/DRILL-5377 while testing that 
specifically

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read Parquet files produced by Drill with 
> Spark; all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found 
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}
> Implementation:
> After the fix, Drill can automatically detect date corruption in Parquet files 
> and convert the dates to correct values.
> For this reason, when the user wants to work with dates beyond the year 5000, 
> an option is included to turn off the auto-correction.
> Use of this option is assumed to be extremely unlikely, but it is included for
> completeness.
> To disable "auto correction" you should use the parquet config in the plugin 
> settings. Something like this:
> {code}
>   "formats": {
> "parquet": {
>   "type": "parquet",
>   "autoCorrectCorruptDates": false
> }
> {code}
> Or you can use a query like this:
> {code}
> select l_shipdate, l_commitdate from 
> table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem`
>  
> (type => 'parquet', autoCorrectCorruptDates => false)) limit 1;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5384) Sort cannot directly access map members, causes a data copy

2017-03-27 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943639#comment-15943639
 ] 

Jinfeng Ni commented on DRILL-5384:
---

It seems that the 1st argument is true; the remaining two arguments may be 
partially true. 

It's true that Drill adds a Project operator. However, it's not true that Drill 
has to copy the data out of the map vector for a path like "customer.id" in 
your example. If you look at the generated code for the project operator, you 
may see that it's merely doing a vector transfer. 

As a matter of fact, Drill does not differentiate between a top-level column 
reference like "col1" and a nested field in an n-level map, such as 
"col2.b.c.d". Only when a map is an element of an array (a repeated map) will 
Drill evaluate and copy the data, for instance for "col3.a.b[100].c.d[20].f". 
On the other hand, for such a schema path, I'm not clear how your proposed 
approach will make it work without any copy, until I see a 
design/implementation. 
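
A rough sketch of what such a vector transfer amounts to, written against
Drill's value-vector API; the exact signatures are from memory and should be
treated as assumptions rather than an authoritative reference:

{code}
import org.apache.drill.common.types.TypeProtos.MinorType;
import org.apache.drill.common.types.Types;
import org.apache.drill.exec.memory.BufferAllocator;
import org.apache.drill.exec.memory.RootAllocatorFactory;
import org.apache.drill.exec.record.MaterializedField;
import org.apache.drill.exec.record.TransferPair;
import org.apache.drill.exec.vector.IntVector;

public class TransferSketch {
  public static void main(String[] args) throws Exception {
    BufferAllocator allocator = RootAllocatorFactory.newRoot(1 << 20);
    MaterializedField field =
        MaterializedField.create("id", Types.required(MinorType.INT));
    IntVector source = new IntVector(field, allocator);
    source.allocateNew(4);
    source.getMutator().setSafe(0, 10);
    source.getMutator().setValueCount(1);

    // The transfer hands the underlying buffer to the target vector;
    // no per-value copy of the data takes place.
    TransferPair tp = source.getTransferPair(allocator);
    tp.transfer();
    IntVector target = (IntVector) tp.getTo();
    System.out.println(target.getAccessor().get(0)); // prints 10

    target.close();
    source.close();
    allocator.close();
  }
}
{code}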





> Sort cannot directly access map members, causes a data copy
> ---
>
> Key: DRILL-5384
> URL: https://issues.apache.org/jira/browse/DRILL-5384
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Suppose we have a JSON structure for "orders" like this:
> {code}
> { customer: { id: 10, name: "fred" },
>   order: { id: 20, product: "Frammis 1000" } }
> {code}
> Suppose I want to sort by customer.id. Today, Drill will project customer.id 
> up to the top level as a temporary, hidden field. Drill will copy the data 
> from the customer.id vector to this new temporary field. Drill then sorts on 
> the temporary column, and uses another project to remove the columns.
> Clearly, this work, but it has a cost:
> * Extra two project operators.
> * Extra memory copy.
> * Sort must buffer both the original and copied data. This can double memory 
> use in the worst case.
> All of this is done simply to avoid having to reference "customer.id" in the 
> sort.
> But, as explained in DRILL-5376, maps are just nested tuples; there is no 
> need to copy the data, the data is already right there in a value vector. The 
> problem is that Drill's map implementation makes it hard for the generated 
> code to get at the "customer.id" vector.
> This ticket asks to allow the sort to work directly with nested scalars to 
> avoid the overhead explained above. To do this:
> 1. Fix nested scalar access to allow the generated code to easily access a 
> nested scalar.
> 2. Allow a sort key of the form "customer.id".
> 3. Modify the planner to generate such sort keys instead of the dual projects.
> The result will be a leaner, faster sort operation when sorting on scalars 
> within a map.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943565#comment-15943565
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/794
  
A general comment about filter evaluation: In HashJoin and MergeJoin we do 
implicit casting of one of the join columns to process the join condition (see 
invocations of JoinUtils.addLeastRestrictiveCasts()).  Since you are adding 
filter evaluation to NestedLoopJoin, I would think we need this there as well.  
However, I am ok if you want to create a separate JIRA for doing that.  
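
For illustration, the kind of non-equi condition where one side would need an
implicit cast of a join column (hypothetical tables; t1.dt is a DATE while
t2.lo and t2.hi are VARCHAR):

{code}
SELECT *
FROM t1 LEFT JOIN t2
  ON t1.dt BETWEEN t2.lo AND t2.hi;
{code}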


> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943549#comment-15943549
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/794#discussion_r108213635
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ---
@@ -70,27 +70,65 @@
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillOptiq.class);
 
   /**
-   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax.
+   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax using one input.
+   *
+   * @param context parse context which contains planner settings
+   * @param input data input
+   * @param expr expression to be converted
+   * @return converted expression
*/
   public static LogicalExpression toDrill(DrillParseContext context, 
RelNode input, RexNode expr) {
-final RexToDrill visitor = new RexToDrill(context, input);
+return toDrill(context, Lists.newArrayList(input), expr);
+  }
+
+  /**
+   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax using multiple inputs.
+   *
+   * @param context parse context which contains planner settings
+   * @param inputs multiple data inputs
+   * @param expr expression to be converted
+   * @return converted expression
+   */
+  public static LogicalExpression toDrill(DrillParseContext context, 
List<RelNode> inputs, RexNode expr) {
+final RexToDrill visitor = new RexToDrill(context, inputs);
 return expr.accept(visitor);
   }
 
   private static class RexToDrill extends 
RexVisitorImpl<LogicalExpression> {
-private final RelNode input;
+private final List<RelNode> inputs;
 private final DrillParseContext context;
+private final List<RelDataTypeField> fieldList;
 
-RexToDrill(DrillParseContext context, RelNode input) {
+RexToDrill(DrillParseContext context, List<RelNode> inputs) {
   super(true);
   this.context = context;
-  this.input = input;
+  this.inputs = inputs;
+  this.fieldList = Lists.newArrayList();
+  /*
+ Fields are enumerated by their presence order in input. Details 
{@link org.apache.calcite.rex.RexInputRef}.
+ Thus we can merge field list from several inputs by adding them 
into the list in order of appearance.
+ Each field index in the list will match field index in the 
RexInputRef instance which will allow us
+ to retrieve field from field list by index in {@link 
#visitInputRef(RexInputRef)} method. Example:
+
+ Query: select t1.c1, t2.c1. t2.c2 from t1 inner join t2 on t1.c1 
between t2.c1 and t2.c2
+
+ Input 1: $0
+ Input 2: $1, $2
+
+ Result: $0, $1, $2
+   */
+  for (RelNode input : inputs) {
--- End diff --

I guess I am still unclear about the reason for merging the multiple lists 
into one. Later, you will have to determine which input each $n reference 
belongs to, won't you?


> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> 

[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943525#comment-15943525
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/794#discussion_r108210211
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/BatchReference.java ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.expr;
+
+import com.google.common.base.Preconditions;
+
+/**
+ * Holder class that contains the batch name, batch index and record index. Batch 
index is used when the batch is a hyper container.
+ * Used to distinguish batches in non-equi conditions during expression 
materialization.
+ * Mostly used for nested loop join, which allows non-equi joins.
+ *
+ * Example:
+ * BatchReference{batchName='leftBatch', batchIndex='leftIndex', 
recordIndex='leftIndex'}
+ * BatchReference{batchName='rightContainer', 
batchIndex='rightBatchIndex', recordIndex='rightRecordIndexWithinBatch'}
+ *
+ */
+public final class BatchReference {
--- End diff --

Can you add some comments about when the batch reference should be set? 
For every batch, or once per OK_NEW_SCHEMA status? 


> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943518#comment-15943518
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/794#discussion_r108208895
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java
 ---
@@ -40,132 +41,133 @@
   // Record count of the left batch currently being processed
   private int leftRecordCount = 0;
 
-  // List of record counts  per batch in the hyper container
+  // List of record counts per batch in the hyper container
  private List<Integer> rightCounts = null;
 
   // Output batch
   private NestedLoopJoinBatch outgoing = null;
 
-  // Next right batch to process
-  private int nextRightBatchToProcess = 0;
-
-  // Next record in the current right batch to process
-  private int nextRightRecordToProcess = 0;
-
-  // Next record in the left batch to process
-  private int nextLeftRecordToProcess = 0;
+  // Iteration status tracker
+  private IterationStatusTracker tracker = new IterationStatusTracker();
 
   /**
* Method initializes necessary state and invokes the doSetup() to set 
the
-   * input and output value vector references
+   * input and output value vector references.
+   *
* @param context Fragment context
* @param left Current left input batch being processed
* @param rightContainer Hyper container
+   * @param rightCounts Counts for each right container
* @param outgoing Output batch
*/
-  public void setupNestedLoopJoin(FragmentContext context, RecordBatch 
left,
+  public void setupNestedLoopJoin(FragmentContext context,
+  RecordBatch left,
   ExpandableHyperContainer rightContainer,
   LinkedList<Integer> rightCounts,
   NestedLoopJoinBatch outgoing) {
 this.left = left;
-leftRecordCount = left.getRecordCount();
+this.leftRecordCount = left.getRecordCount();
 this.rightCounts = rightCounts;
 this.outgoing = outgoing;
 
 doSetup(context, rightContainer, left, outgoing);
   }
 
   /**
-   * This method is the core of the nested loop join. For every record on 
the right we go over
-   * the left batch and produce the cross product output
+   * Main entry point for producing the output records. Thin wrapper 
around populateOutgoingBatch(), this method
+   * controls which left batch we are processing and fetches the next left 
input batch once we exhaust the current one.
+   *
+   * @param joinType join type (INNER or LEFT)
+   * @return the number of records produced in the output batch
+   */
+  public int outputRecords(JoinRelType joinType) {
+int outputIndex = 0;
+while (leftRecordCount != 0) {
+  outputIndex = populateOutgoingBatch(joinType, outputIndex);
+  if (outputIndex >= NestedLoopJoinBatch.MAX_BATCH_SIZE) {
+break;
+  }
+  // reset state and get next left batch
+  resetAndGetNextLeft();
+}
+return outputIndex;
+  }
+
+  /**
+   * This method is the core of the nested loop join. For each left batch 
record it looks for a matching record
+   * from the list of right batches. A match is checked by calling the {@link 
#doEval(int, int, int)} method.
+   * If a matching record is found, both left and right records are written 
into the output batch;
+   * otherwise, if the join type is LEFT, only the left record is written and 
the right batch record values will be null.
+   *
+   * @param joinType join type (INNER or LEFT)
* @param outputIndex index to start emitting records at
* @return final outputIndex after producing records in the output batch
*/
-  private int populateOutgoingBatch(int outputIndex) {
-
-// Total number of batches on the right side
-int totalRightBatches = rightCounts.size();
-
-// Total number of records on the left
-int localLeftRecordCount = leftRecordCount;
-
-/*
- * The below logic is the core of the NLJ. To have better performance 
we copy the instance members into local
- * method variables, once we are done with the loop we need to update 
the instance variables to reflect the new
- * state. To avoid code duplication of resetting the instance members 
at every exit point in the loop we are using
- * 'goto'
- */
-int localNextRightBatchToProcess = nextRightBatchToProcess;
-int localNextRightRecordToProcess = nextRightRecordToProcess;
- 

[jira] [Created] (DRILL-5387) TestBitBitKerberos and TestUserBitKerberos cause sporadic unit test failures

2017-03-27 Thread Sudheesh Katkam (JIRA)
Sudheesh Katkam created DRILL-5387:
--

 Summary: TestBitBitKerberos and TestUserBitKerberos cause sporadic 
unit test failures
 Key: DRILL-5387
 URL: https://issues.apache.org/jira/browse/DRILL-5387
 Project: Apache Drill
  Issue Type: Bug
Reporter: Sudheesh Katkam
Assignee: Sudheesh Katkam


TestOptionsAuthEnabled and TestInboundImpersonation sporadically fail. There is 
a [Java 
trick|https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/hadoop/security/UgiTestUtil.java#L29]
 to reset some static state in TestUserBitKerberos and TestBitBitKerberos to 
ensure the JVM is reusable for other tests, as done in the [Hadoop auth 
tests|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestUGIWithMiniKdc.java#L53],
 but this trick does not always work. So disable these tests. In the 
future, maybe the tests can be run separately through surefire but not as part 
of the default build?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2017-03-27 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943304#comment-15943304
 ] 

Vitalii Diravka commented on DRILL-4203:


[~rkins] You may also add extra cases for testing: 
1. Test the option of turning off the auto-correction, "autoCorrectCorruptDates 
=> false".
2. Dates beyond the year 5000 are displayed incorrectly in files not generated 
by Drill ("autoCorrectCorruptDates" should be used for this case).

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read Parquet files produced by Drill with 
> Spark; all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found 
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}
> Implementation:
> After the fix, Drill can automatically detect date corruption in Parquet files 
> and convert the dates to correct values.
> For this reason, when the user wants to work with dates beyond the year 5000, 
> an option is included to turn off the auto-correction.
> Use of this option is assumed to be extremely unlikely, but it is included for
> completeness.
> To disable "auto correction" you should use the parquet config in the plugin 
> settings. Something like this:
> {code}
>   "formats": {
> "parquet": {
>   "type": "parquet",
>   "autoCorrectCorruptDates": false
> }
> {code}
> Or you can use a query like this:
> {code}
> select l_shipdate, l_commitdate from 
> table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem`
>  
> (type => 'parquet', autoCorrectCorruptDates => false)) limit 1;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4818) Drill not pushing down joins to RDBS Storages

2017-03-27 Thread Marcus Rehm (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943130#comment-15943130
 ] 

Marcus Rehm commented on DRILL-4818:


Hi Muhammad, unfortunately no. We weren't able to evaluate it the way we wanted 
without this, so we stopped using Drill. 

> Drill not pushing down joins to RDBS Storages
> -
>
> Key: DRILL-4818
> URL: https://issues.apache.org/jira/browse/DRILL-4818
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.7.0
> Environment: Windows 7 and Linux Red Hat 6.1 server
>Reporter: Marcus Rehm
>Priority: Critical
> Attachments: drill pushdown rdbms.sql, Json Profile.txt, Physical 
> Plan.txt
>
>
> I'm trying to map our databases running on Oracle 11g. After trying some 
> queries I realized that the amount of time Drill takes to complete is bigger 
> than what a general SQL client takes. Looking at the execution plan I saw that 
> Drill is doing the join of the tables itself and is not pushing it down to 
> the database.
> My storage configuration is as:
> {
>   "type": "jdbc",
>   "driver": "oracle.jdbc.OracleDriver",
>   "url": "jdbc:oracle:thin:USER/PASS@server:1521/ORCL",
>   "username": null,
>   "password": null,
>   "enabled": true
> }
> I'm not able to reproduce the case with example tables so I'm sending the 
> query and the physical plan Drill is generating.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5099) support OFFSET values greater than one, for LEAD & LAG window functions

2017-03-27 Thread Nitin Pawar (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943019#comment-15943019
 ] 

Nitin Pawar commented on DRILL-5099:


[~khfaraaz] I am almost done with the changes for this JIRA.
I am currently stuck at the point where copyNext has an argument from the next 
partition. 

Can any dev from MapR spare 15 minutes to help me resolve the issue?

> support OFFSET values greater than one, for LEAD & LAG window functions
> ---
>
> Key: DRILL-5099
> URL: https://issues.apache.org/jira/browse/DRILL-5099
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Priority: Minor
>  Labels: window_function
>
> Provide support for OFFSET values greater than one, for LEAD & LAG window 
> functions.
> Adding [~adeneche] comments from the dev list here
> {noformat}
> Things that need to be done to make Lag (or Lead) support offsets other
> than 1:
> - WindowFunction.Lead should extract the offset value from its FunctionCall
> argument, you can look at WindowFunctionNtile.numTilesFromExpression() for
> an example of how to do that.
> - make sure calls to copyNext() and copyPrev() in NoFrameSupportTemplate
> use the offset and not the hard coded value (you already figured that out)
> - finally make sure you update UnsupportedOperatorsVisitor to no longer
> throw an exception when we pass an offset value other than 1 to Lead or
> Lag. Just search for DRILL-3596 in that class and you will find the if
> block that need to be removed
> I think this should be enough to get it to work in the general case.
> {noformat}
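
For illustration, the kind of query this change would enable (table and column 
names here are hypothetical):

{code}
SELECT emp_id,
       salary,
       LEAD(salary, 2) OVER (PARTITION BY dept ORDER BY emp_id) AS two_ahead,
       LAG(salary, 3)  OVER (PARTITION BY dept ORDER BY emp_id) AS three_back
FROM dfs.`/data/employees.parquet`;
{code}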



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-3966) Metadata Cache + Partition Pruning not happening when the partition column is of type boolean

2017-03-27 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi closed DRILL-3966.
--
Resolution: Duplicate

This issue duplicates DRILL-4139

> Metadata Cache + Partition Pruning not happening when the partition column is 
> of type boolean
> -
>
> Key: DRILL-3966
> URL: https://issues.apache.org/jira/browse/DRILL-3966
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata, Query Planning & Optimization
>Reporter: Rahul Challapalli
> Attachments: 0_0_1.parquet, 0_0_2.parquet
>
>
> git.commit.id.abbrev=19b4b79
> I have partitioned Parquet files whose partition column is of type boolean.
> The plan below suggests that pruning did not take place when the partition 
> column is of type boolean and metadata exists. However, if I get rid of 
> the metadata cache, partition pruning seems to work fine.
> Query :
> {code}
> explain plan for select * from fewtypes_boolpartition where bool_col = false;
> 00-00Screen
> 00-01  Project(*=[$0])
> 00-02Project(T11¦¦*=[$0])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($1, false)])
> 00-05  Project(T11¦¦*=[$0], bool_col=[$1])
> 00-06Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/metadata_caching/fewtypes_boolpartition/0_0_2.parquet],
>  ReadEntryWithPath 
> [path=maprfs:///drill/testdata/metadata_caching/fewtypes_boolpartition/0_0_1.parquet]],
>  selectionRoot=/drill/testdata/metadata_caching/fewtypes_boolpartition, 
> numFiles=2, usedMetadataFile=true, columns=[`*`]]])
> {code}
> Error from the log :
> {code}
> WARN  o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
> partition.
>  java.lang.UnsupportedOperationException: Unsupported type: BIT
>   at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:451)
>  ~[drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96)
>  ~[drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:212)
>  ~[drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  [calcite-core-1.4.0-drill-r6.jar:1.4.0-drill-r6]
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
>  [calcite-core-1.4.0-drill-r6.jar:1.4.0-drill-r6]
>   at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) 
> [calcite-core-1.4.0-drill-r6.jar:1.4.0-drill-r6]
>   at 
> org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) 
> [calcite-core-1.4.0-drill-r6.jar:1.4.0-drill-r6]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:178)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) 
> [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) 
> [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> {code}
> I attached the required data sets. Let me know if you need anything.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-5105) Query time increases exponentially with increasing nested levels

2017-03-27 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers closed DRILL-5105.
--

> Query time increases exponentially with increasing nested levels
> 
>
> Key: DRILL-5105
> URL: https://issues.apache.org/jira/browse/DRILL-5105
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.9.0
> Environment: 3 Node Cluster with default memory and configurations. 
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
>
> The time taken to query any JSON dataset depends on the number of nested 
> levels within the dataset. Also, increasing the complexity of the dataset 
> further impacts the execution time. 
> Tabulated below is cached query execution times for a simple select * query 
> over two simple forms of JSON datasets: 
> || No. Levels || Time (s) Dataset 1 || Time (s) Dataset 2 ||
> | 1  | 0.22    | 0.27           |
> | 2  | 0.23    | 0.25           |
> | 4  | 0.24    | 0.22           |
> | 8  | 0.22    | 0.23           |
> | 16 | 0.34    | 0.48           |
> | 24 | 25.76   | 72.51          |
> | 26 | 103.48  | 289.6          |
> | 28 | 336.12  | 1151.94        |
> | 30 | 1342.22 | 4586.79        |
> | 32 | 5360.2  | Expected: ~20k |
> The above table lists query times for 20 different JSON files, 10 belonging 
> to dataset 1 & 10 belonging to dataset 2. Each has 1 record, but the number 
> of nested levels within them varies as mentioned in the "No. Levels" column. 
> It appears that the query time almost doubles with the addition of a nested 
> level (note that in the table above, it translates to almost 4x between 
> levels starting at 24).
> The below two are the representative datasets, showcasing simple JSON 
> structures with nested levels.
> Structure of Dataset 1:
> {code}
> {
>   "level1": {
> "field1": "a",
> "level2": {
>   "field1"": "b",
>   ...
> }
>   }
> }
> {code}
> Structure of Dataset 2:
> {code}
> "{
>   "level1": {
> "field1": ""a",
> "field2": {
>   "nfield1": true,
>   "nfield2": 1.1
> },
> "level2": {
>   "field1": "b",
>   "field2": {
> "nfield1": false,
> "nfield2": 2.2
>   },
>   ...
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)