[jira] [Commented] (DRILL-4842) SELECT * on JSON data results in NumberFormatException

2016-08-10 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416616#comment-15416616
 ] 

Khurram Faraaz commented on DRILL-4842:
---

The failure is also seen on an older version, MapR Drill 1.7.0 (git commit ID: 
80ebc690).
store.json.all_text_mode was set to true in this case too.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * FROM `tooManyNulls.json` WHERE c1 IN 
('Hello World');
Error: SYSTEM ERROR: NumberFormatException: Hello World

Fragment 0:0

[Error Id: d4464e71-bb87-42b9-9189-3eaed389798c on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

> SELECT * on JSON data results in NumberFormatException
> --
>
> Key: DRILL-4842
> URL: https://issues.apache.org/jira/browse/DRILL-4842
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
> Attachments: tooManyNulls.json
>
>
> Note that SELECT c1 returns correct results; the failure is seen only when 
> we do SELECT star. store.json.all_text_mode was set to true.
> The JSON file tooManyNulls.json has 4096 records in which the key c1 is 
> null; in the 4097th record, c1 has the value "Hello World".
> git commit ID: aaf220ff
> MapR Drill 1.8.0 RPM
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> alter session set 
> `store.json.all_text_mode`=true;
> +-------+------------------------------------+
> |  ok   |              summary               |
> +-------+------------------------------------+
> | true  | store.json.all_text_mode updated.  |
> +-------+------------------------------------+
> 1 row selected (0.27 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT c1 FROM `tooManyNulls.json` WHERE c1 IN 
> ('Hello World');
> +--------------+
> |      c1      |
> +--------------+
> | Hello World  |
> +--------------+
> 1 row selected (0.243 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select * FROM `tooManyNulls.json` WHERE c1 IN 
> ('Hello World');
> Error: SYSTEM ERROR: NumberFormatException: Hello World
> Fragment 0:0
> [Error Id: 9cafb3f9-3d5c-478a-b55c-900602b8765e on centos-01.qa.lab:31010]
>  (java.lang.NumberFormatException) Hello World
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI():95
> 
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varTypesToInt():120
> org.apache.drill.exec.test.generated.FiltererGen1169.doSetup():45
> org.apache.drill.exec.test.generated.FiltererGen1169.setup():54
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():195
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp>
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> Caused by: java.lang.Num

[jira] [Updated] (DRILL-4842) SELECT * on JSON data results in NumberFormatException

2016-08-10 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-4842:
--
Attachment: tooManyNulls.json

> SELECT * on JSON data results in NumberFormatException
> --
>
> Key: DRILL-4842
> URL: https://issues.apache.org/jira/browse/DRILL-4842
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
> Attachments: tooManyNulls.json
>
>

[jira] [Commented] (DRILL-4842) SELECT * on JSON data results in NumberFormatException

2016-08-10 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416605#comment-15416605
 ] 

Khurram Faraaz commented on DRILL-4842:
---

This one seems related to DRILL-4479 - JsonReader should pick a less 
restrictive type when creating the default column
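The connection to DRILL-4479 can be illustrated with a toy simulation (this is
not Drill's actual code; the batch size, function names, and "default to int"
choice are assumptions for illustration): a reader that samples the first batch
of 4096 records, sees only nulls, and picks a restrictive numeric default will
fail as soon as a string value appears.

```python
# Toy model of restrictive default typing for an all-null column.
BATCH_SIZE = 4096

def infer_column_type(batch):
    """Infer a column type from one batch; all-null columns default to
    'int' (the restrictive choice DRILL-4479 argues against)."""
    non_null = [v for v in batch if v is not None]
    if not non_null:
        return "int"  # restrictive default for an all-null column
    return "varchar" if all(isinstance(v, str) for v in non_null) else "int"

def convert(value, col_type):
    """Convert one value to the inferred column type."""
    if value is None:
        return None
    return int(value) if col_type == "int" else str(value)

# Shape of tooManyNulls.json: 4096 nulls, then one string value.
records = [None] * BATCH_SIZE + ["Hello World"]
col_type = infer_column_type(records[:BATCH_SIZE])  # inferred from nulls only

try:
    for v in records:
        convert(v, col_type)
except ValueError as e:  # the Java analogue is NumberFormatException
    print("conversion failed:", e)
```

A less restrictive default (e.g. varchar) would let the 4097th record convert
cleanly, which is the behavior DRILL-4479 proposes.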

> SELECT * on JSON data results in NumberFormatException
> --
>
> Key: DRILL-4842
> URL: https://issues.apache.org/jira/browse/DRILL-4842
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
>

[jira] [Created] (DRILL-4842) SELECT * on JSON data results in NumberFormatException

2016-08-10 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4842:
-

 Summary: SELECT * on JSON data results in NumberFormatException
 Key: DRILL-4842
 URL: https://issues.apache.org/jira/browse/DRILL-4842
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.8.0
Reporter: Khurram Faraaz


Note that SELECT c1 returns correct results; the failure is seen only when we 
do SELECT star. store.json.all_text_mode was set to true.

The JSON file tooManyNulls.json has 4096 records in which the key c1 is null; 
in the 4097th record, c1 has the value "Hello World".

git commit ID: aaf220ff
MapR Drill 1.8.0 RPM
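The attached file can be regenerated with a short script (a sketch: the exact
attachment contents are assumed to be one JSON object per line, matching the
description above):

```python
import json

# Write 4096 records where c1 is null, then a 4097th where c1 is a string.
# 4096 matters because it matches the number of records sampled for the
# first record batch when the column type is inferred.
with open("tooManyNulls.json", "w") as f:
    for _ in range(4096):
        f.write(json.dumps({"c1": None}) + "\n")
    f.write(json.dumps({"c1": "Hello World"}) + "\n")
```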

{noformat}
0: jdbc:drill:schema=dfs.tmp> alter session set `store.json.all_text_mode`=true;
+-------+------------------------------------+
|  ok   |              summary               |
+-------+------------------------------------+
| true  | store.json.all_text_mode updated.  |
+-------+------------------------------------+
1 row selected (0.27 seconds)
0: jdbc:drill:schema=dfs.tmp> SELECT c1 FROM `tooManyNulls.json` WHERE c1 IN 
('Hello World');
+--------------+
|      c1      |
+--------------+
| Hello World  |
+--------------+
1 row selected (0.243 seconds)
0: jdbc:drill:schema=dfs.tmp> select * FROM `tooManyNulls.json` WHERE c1 IN 
('Hello World');
Error: SYSTEM ERROR: NumberFormatException: Hello World

Fragment 0:0

[Error Id: 9cafb3f9-3d5c-478a-b55c-900602b8765e on centos-01.qa.lab:31010]

 (java.lang.NumberFormatException) Hello World
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI():95
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varTypesToInt():120
org.apache.drill.exec.test.generated.FiltererGen1169.doSetup():45
org.apache.drill.exec.test.generated.FiltererGen1169.setup():54

org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():195

org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51

org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():415
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745 (state=,code=0)
0: jdbc:drill:schema=dfs.tmp>
{noformat}

Stack trace from drillbit.log

{noformat}
Caused by: java.lang.NumberFormatException: Hello World
at 
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI(StringFunctionHelpers.java:95)
 ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varTypesToInt(StringFunctionHelpers.java:120)
 ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.test.generated.FiltererGen1169.doSetup(FilterTemplate2.java:45)
 ~[na:na]
at 
org.apache.drill.exec.test.generated.FiltererGen1169.setup(FilterTemplate2.java:54)
 ~[na:na]
at 
org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer(FilterRecordBatch.java:195)
 ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema(FilterRecordBa

[jira] [Commented] (DRILL-4586) Create CLIENT ErrorType

2016-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416276#comment-15416276
 ] 

ASF GitHub Bot commented on DRILL-4586:
---

GitHub user sudheeshkatkam opened a pull request:

https://github.com/apache/drill/pull/567

DRILL-4586: Add ErrorType#CLIENT to UserException for client side errors

+ Resolve relevant TODOs

@parthchandra please review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sudheeshkatkam/drill DRILL-4586

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/567.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #567


commit dc2779f89effcab9669b0f160efcee44932aa83d
Author: Sudheesh Katkam 
Date:   2016-08-11T00:28:42Z

DRILL-4586: Add ErrorType#CLIENT to UserException for client side errors

+ Resolve relevant TODOs




> Create CLIENT ErrorType
> ---
>
> Key: DRILL-4586
> URL: https://issues.apache.org/jira/browse/DRILL-4586
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> To display client errors with nice messages, we currently use "system 
> error". However, system errors are not meant to be used when we want to 
> display a proper error message; they are meant for unexpected errors that 
> don't have a "nice" error message yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4586) Create CLIENT ErrorType

2016-08-10 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-4586:
---
Assignee: Parth Chandra  (was: Sudheesh Katkam)

> Create CLIENT ErrorType
> ---
>
> Key: DRILL-4586
> URL: https://issues.apache.org/jira/browse/DRILL-4586
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Sudheesh Katkam
>Assignee: Parth Chandra
>





[jira] [Assigned] (DRILL-4586) Create CLIENT ErrorType

2016-08-10 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam reassigned DRILL-4586:
--

Assignee: Sudheesh Katkam

> Create CLIENT ErrorType
> ---
>
> Key: DRILL-4586
> URL: https://issues.apache.org/jira/browse/DRILL-4586
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>





[jira] [Commented] (DRILL-4833) Union-All with a small cardinality input on one side does not get parallelized

2016-08-10 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416272#comment-15416272
 ] 

Aman Sinha commented on DRILL-4833:
---

[~jni] would you mind reviewing the new PR?  The rationale for the change is 
the following: 
Strictly speaking, union-all does not need re-distribution of data since it is 
not doing a real 'join'; but in Drill's execution model, the data distribution 
and parallelism operators are the same.  This PR adds a hash distribution 
operator on both sides of the union-all to allow parallelism to be determined 
independently for the parent and children.  A round-robin distribution would 
have sufficed, but Drill does not have one.  Also note that broadcasting the 
small input from the RHS is not a valid plan, because it would cause the same 
row to be unioned multiple times. 
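The point that broadcasting the small input would duplicate rows can be shown
with a toy model (hypothetical, not Drill's planner; fragment count and row
contents are made up): under broadcast every minor fragment holds a full copy
of the RHS, so a union-all over the fragments repeats each RHS row once per
fragment, while hash distribution sends each row to exactly one fragment.

```python
# Toy model: broadcast vs. hash distribution for the small UNION ALL input.
FRAGMENTS = 3
rhs = ["small_row"]  # the LIMIT 1 input

# Broadcast: every fragment receives a full copy of the RHS.
broadcast = [list(rhs) for _ in range(FRAGMENTS)]
union_broadcast = [row for frag in broadcast for row in frag]

# Hash distribution: each row lands on exactly one fragment.
hashed = [[] for _ in range(FRAGMENTS)]
for row in rhs:
    hashed[hash(row) % FRAGMENTS].append(row)
union_hashed = [row for frag in hashed for row in frag]

print(len(union_broadcast))  # one copy per fragment: wrong for UNION ALL
print(len(union_hashed))     # exactly one copy: correct multiplicity
```

This is why broadcast is fine for a join (each copy is only used for matching)
but changes the result of a union-all, which must emit each input row exactly
once.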

> Union-All with a small cardinality input on one side does not get parallelized
> --
>
> Key: DRILL-4833
> URL: https://issues.apache.org/jira/browse/DRILL-4833
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>
> When a Union-All has an input that is a LIMIT 1 (or some small value relative 
> to the slice_target), and that input is accessing Parquet files, Drill does 
> an optimization where a single Parquet file is read (based on the rowcount 
> statistics in the Parquet file, we determine that reading 1 file is 
> sufficient).  This also means that the max width for that major fragment is 
> set to 1 because only 1 minor fragment is needed to read 1 row-group. 
> The net effect is that the width of 1 is applied to the major fragment 
> which consists of union-all and its inputs.  This is sub-optimal because it 
> prevents parallelization of the other input and of the union-all operator 
> itself.  
> Here's an example query and plan that illustrates the issue: 
> {noformat}
> alter session set `planner.slice_target` = 1;
> explain plan for 
> (select c.c_nationkey, c.c_custkey, c.c_name
> from
> dfs.`/Users/asinha/data/tpchmulti/customer` c
> inner join
> dfs.`/Users/asinha/data/tpchmulti/nation`  n
> on c.c_nationkey = n.n_nationkey)
> union all
> (select c_nationkey, c_custkey, c_name
> from dfs.`/Users/asinha/data/tpchmulti/customer` c limit 1)
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-02        Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-03          UnionAll(all=[true])
> 00-05            Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-07              HashJoin(condition=[=($0, $3)], joinType=[inner])
> 00-10                Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-13                  HashToRandomExchange(dist0=[[$0]])
> 01-01                    UnorderedMuxExchange
> 03-01                      Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/customer]], selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
> 00-09              Project(n_nationkey=[$0])
> 00-12                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 04-01                    Project(n_nationkey=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 04-02                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/nation]], selectionRoot=file:/Users/asinha/data/tpchmulti/nation, numFiles=1, usedMetadataFile=false, columns=[`n_nationkey`]]])
> 00-04            Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-06              SelectionVectorRemover
> 00-08                Limit(fetch=[1])
> 00-11                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/Users/asinha/data/tpchmulti/customer/01.parquet]], selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
> {noformat}
> Note that Union-all and HashJoin are part of fragment 0 (single minor 
> fragment) even though they could have been parallelized.  This clearly 
> affects performance for larger data sets. 





[jira] [Commented] (DRILL-4833) Union-All with a small cardinality input on one side does not get parallelized

2016-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416260#comment-15416260
 ] 

ASF GitHub Bot commented on DRILL-4833:
---

GitHub user amansinha100 opened a pull request:

https://github.com/apache/drill/pull/566

DRILL-4833: Insert exchanges on the inputs of union-all such that the 
parent and children can be independently parallelized.

Add planner option to enable/disable distribution for union-all.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amansinha100/incubator-drill DRILL-4833-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/566.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #566


commit 2dea3bdbe300c21a71ae92b174a9cd3d1e1e8ec6
Author: Aman Sinha 
Date:   2016-08-10T15:38:25Z

DRILL-4833: Insert exchanges on the inputs of union-all such that the 
parent and children can be independently parallelized.

Add planner option to enable/disable distribution for union-all.




> Union-All with a small cardinality input on one side does not get parallelized
> --
>
> Key: DRILL-4833
> URL: https://issues.apache.org/jira/browse/DRILL-4833
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>

[jira] [Commented] (DRILL-4606) Create DrillClient.Builder class

2016-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416239#comment-15416239
 ] 

ASF GitHub Bot commented on DRILL-4606:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/480
  
+1


> Create DrillClient.Builder class
> 
>
> Key: DRILL-4606
> URL: https://issues.apache.org/jira/browse/DRILL-4606
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> + Create a helper class to build DrillClient instances, and deprecate 
> DrillClient constructors
> + Allow DrillClient to specify an event loop group (so user event loop can be 
> used for queries from Web API calls)
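The builder proposed above can be sketched as follows (a hypothetical
illustration in Python for brevity; the class, method, and parameter names are
assumptions, not Drill's actual Java API):

```python
# Hypothetical sketch of a DrillClient builder: chainable setters collect
# configuration, and build() produces the immutable client instance.
class DrillClient:
    def __init__(self, config, event_loop_group):
        self.config = config
        self.event_loop_group = event_loop_group

    class Builder:
        def __init__(self):
            self._config = {}
            self._event_loop_group = None

        def set_config(self, **config):
            self._config.update(config)
            return self  # chainable, in the style of a Java builder

        def set_event_loop_group(self, group):
            # Lets the caller supply its own event loop (e.g. the web
            # server's) instead of the client creating one internally.
            self._event_loop_group = group
            return self

        def build(self):
            return DrillClient(self._config, self._event_loop_group)

client = (DrillClient.Builder()
          .set_config(host="localhost")
          .set_event_loop_group("user-loop")
          .build())
```

Centralizing construction this way is what allows the existing constructors to
be deprecated without breaking callers.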





[jira] [Commented] (DRILL-4833) Union-All with a small cardinality input on one side does not get parallelized

2016-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416218#comment-15416218
 ] 

ASF GitHub Bot commented on DRILL-4833:
---

Github user amansinha100 closed the pull request at:

https://github.com/apache/drill/pull/562


> Union-All with a small cardinality input on one side does not get parallelized
> --
>
> Key: DRILL-4833
> URL: https://issues.apache.org/jira/browse/DRILL-4833
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>
> When a Union-All has an input that is a LIMIT 1 (or some small value relative 
> to the slice_target), and that input is accessing Parquet files, Drill does 
> an optimization where a single Parquet file is read (based on the rowcount 
> statistics in the Parquet file, we determine that reading 1 file is 
> sufficient).  This also means that the max width for that major fragment is 
> set to 1 because only 1 minor fragment is needed to read 1 row-group. 
> The net effect is that a width of 1 is applied to the major fragment 
> consisting of the union-all and its inputs.  This is sub-optimal because it 
> prevents parallelization of the other input and of the union-all operator 
> itself.  
> Here's an example query and plan that illustrates the issue: 
> {noformat}
> alter session set `planner.slice_target` = 1;
> explain plan for 
> (select c.c_nationkey, c.c_custkey, c.c_name
> from
> dfs.`/Users/asinha/data/tpchmulti/customer` c
> inner join
> dfs.`/Users/asinha/data/tpchmulti/nation`  n
> on c.c_nationkey = n.n_nationkey)
> union all
> (select c_nationkey, c_custkey, c_name
> from dfs.`/Users/asinha/data/tpchmulti/customer` c limit 1)
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-02Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-03  UnionAll(all=[true])
> 00-05Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-07  HashJoin(condition=[=($0, $3)], joinType=[inner])
> 00-10Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-13  HashToRandomExchange(dist0=[[$0]])
> 01-01UnorderedMuxExchange
> 03-01  Project(c_nationkey=[$0], c_custkey=[$1], 
> c_name=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=file:/Users/asinha/data/tpchmulti/customer]], 
> selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, 
> usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
> 00-09Project(n_nationkey=[$0])
> 00-12  HashToRandomExchange(dist0=[[$0]])
> 02-01UnorderedMuxExchange
> 04-01  Project(n_nationkey=[$0], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 04-02Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/nation]], 
> selectionRoot=file:/Users/asinha/data/tpchmulti/nation, numFiles=1, 
> usedMetadataFile=false, columns=[`n_nationkey`]]])
> 00-04Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-06  SelectionVectorRemover
> 00-08Limit(fetch=[1])
> 00-11  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=/Users/asinha/data/tpchmulti/customer/01.parquet]], 
> selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, 
> usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
> {noformat}
> Note that Union-all and HashJoin are part of fragment 0 (single minor 
> fragment) even though they could have been parallelized.  This clearly 
> affects performance for larger data sets. 





[jira] [Commented] (DRILL-4833) Union-All with a small cardinality input on one side does not get parallelized

2016-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416217#comment-15416217
 ] 

ASF GitHub Bot commented on DRILL-4833:
---

Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/562
  
Closing based on comments in DRILL-4833. 







[jira] [Commented] (DRILL-4833) Union-All with a small cardinality input on one side does not get parallelized

2016-08-10 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416214#comment-15416214
 ] 

Aman Sinha commented on DRILL-4833:
---

On further manual testing, we found that although my patch prevents the 
UnionExchange from being dropped during ExcessiveExchangeIdentifier, the final 
parallelism of the union-all major fragment (which includes the hash-join) is 
still 1.  The reason is that this major fragment contains the Scan node on the 
RHS of the union-all, so the parallelism of the entire major fragment is 
determined by that Scan node's parallelism, which in turn depends on the 
number of Parquet row-groups (1 in this case).  

I will close the existing PR and post a new PR with a different solution. 






[jira] [Assigned] (DRILL-4833) Union-All with a small cardinality input on one side does not get parallelized

2016-08-10 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha reassigned DRILL-4833:
-

Assignee: Aman Sinha  (was: Jinfeng Ni)






[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416175#comment-15416175
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

GitHub user sudheeshkatkam opened a pull request:

https://github.com/apache/drill/pull/565

DRILL-4841: Use server event loop for web clients

@parthchandra please review (just the third commit).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sudheeshkatkam/drill DRILL-4841

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #565


commit 0889aa65023abf94c865589b7e4f5aae5baa9c6a
Author: Sudheesh Katkam 
Date:   2016-03-09T18:34:26Z

DRILL-4606: HYGIENE

+ Merge DrillAutoCloseables and AuthCloseables
+ Remove unused imports
+ Expand * imports

commit 8cc6bc929b91a9f19f2ed8cbce293db8f86c1e48
Author: Sudheesh Katkam 
Date:   2016-04-15T05:25:02Z

DRILL-4606: CORE

+ Add DrillClient.Builder helper class to create DrillClient objects
+ Deprecate 8 constructors and DrillClientFactory
+ Reorganize and document DrillClient

commit b5ee0fb6e2fd99c5e700a60261d8990642350983
Author: Sudheesh Katkam 
Date:   2016-08-10T22:39:41Z

DRILL-4841: Use server event loop for web clients




> Use user server event loop group for web clients
> 
>
> Key: DRILL-4841
> URL: https://issues.apache.org/jira/browse/DRILL-4841
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>Priority: Minor
>
> Currently we spawn an event loop group for handling requests from clients. 
> This group should also be used to handle responses (from the server) for web 
> clients.





[jira] [Commented] (DRILL-4606) Create DrillClient.Builder class

2016-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416173#comment-15416173
 ] 

ASF GitHub Bot commented on DRILL-4606:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/480
  
@parthchandra please review.







[jira] [Commented] (DRILL-4606) Create DrillClient.Builder class

2016-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416172#comment-15416172
 ] 

ASF GitHub Bot commented on DRILL-4606:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/480#discussion_r74346473
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/BootStrapContext.java 
---
@@ -53,6 +54,8 @@ public BootStrapContext(DrillConfig config, ScanResult 
classpathScan) {
 this.loop2 = 
TransportCheck.createEventLoopGroup(config.getInt(ExecConstants.BIT_SERVER_RPC_THREADS),
 "BitClient-");
 // Note that metrics are stored in a static instance
 this.metrics = DrillMetrics.getRegistry();
+this.userLoopGroup = 
TransportCheck.createEventLoopGroup(config.getInt(ExecConstants.USER_SERVER_RPC_THREADS),
--- End diff --

Filed DRILL-4841.







[jira] [Created] (DRILL-4841) Use user server event loop group for web clients

2016-08-10 Thread Sudheesh Katkam (JIRA)
Sudheesh Katkam created DRILL-4841:
--

 Summary: Use user server event loop group for web clients
 Key: DRILL-4841
 URL: https://issues.apache.org/jira/browse/DRILL-4841
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - HTTP
Reporter: Sudheesh Katkam
Assignee: Sudheesh Katkam
Priority: Minor


Currently we spawn an event loop group for handling requests from clients. This 
group should also be used to handle responses (from the server) for web clients.
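The idea of reusing one event loop group rather than spawning a second can be sketched with a plain executor standing in for Netty's EventLoopGroup. Everything here (the method names, the pool size, the string payloads) is an illustrative assumption, not Drill's actual code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedLoopSketch {
  // One shared pool standing in for the user-server event loop group.
  static final ExecutorService userLoopGroup = Executors.newFixedThreadPool(2);

  // Hypothetical server-side request handling: submits to the shared group.
  static Future<String> serverHandle(String req) {
    return userLoopGroup.submit(() -> "handled:" + req);
  }

  // Hypothetical web-client response handling: reuses the SAME group
  // instead of creating its own, which is the point of this issue.
  static Future<String> webClientHandle(String resp) {
    return userLoopGroup.submit(() -> "web:" + resp);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(serverHandle("q1").get());     // handled:q1
    System.out.println(webClientHandle("r1").get());  // web:r1
    userLoopGroup.shutdown();
  }
}
```

Sharing one group avoids the extra threads and context switching of a second pool that exists only to shuttle responses for web clients.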





[jira] [Closed] (DRILL-4840) Sqlline prints log output to stdout on startup

2016-08-10 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra closed DRILL-4840.

Resolution: Not A Problem

Weirdly enough, logback does in fact log to the console if something is wrong 
when it is starting up.
http://stackoverflow.com/questions/3257154/how-to-prevent-logback-from-outputting-its-own-status-at-the-start-of-every-log
However, what I was seeing was the [INFO] messages from logback as it starts 
up. I updated my logback configuration to suppress those and the problem is gone.

Closing this as it may be specific to my setup.
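For reference, one common way to silence logback's own [INFO] status lines is to register a no-op status listener in logback.xml. This is an assumption about the kind of change the poster made, not a quote from their configuration:

```xml
<configuration>
  <!-- Suppress logback's internal status ([INFO]) messages on startup. -->
  <statusListener class="ch.qos.logback.core.status.NopStatusListener"/>
  <!-- ... existing appenders and loggers ... -->
</configuration>
```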

> Sqlline prints log output to stdout on startup
> --
>
> Key: DRILL-4840
> URL: https://issues.apache.org/jira/browse/DRILL-4840
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>Assignee: Paul Rogers
>
> With the refactoring of drill scripts, sqlline has now started printing 
> logging messages to stdout when it starts up. This messes up some users' 
> scripts that invoke sqlline.
> See also, DRILL-2798 which was logged because end users had scripts that 
> broke as a result of printing out additional information.





[jira] [Resolved] (DRILL-4658) cannot specify tab as a fieldDelimiter in table function

2016-08-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-4658.
-
   Resolution: Fixed
Fix Version/s: (was: 1.9.0)
   1.8.0

Merged with 5ca2340a0a83412aa8fc8b077b72eca5f55e4226

> cannot specify tab as a fieldDelimiter in table function
> 
>
> Key: DRILL-4658
> URL: https://issues.apache.org/jira/browse/DRILL-4658
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Affects Versions: 1.6.0
> Environment: Mac OS X, Java 8
>Reporter: Vince Gonzalez
>Assignee: Arina Ielchiieva
> Fix For: 1.8.0
>
>
> I can't specify a tab delimiter in the table function, possibly because it 
> counts the characters rather than interpreting the string as an escape sequence.
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint) as 
> b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter => 
> '\t', skipFirstLine => true));
> Error: PARSE ERROR: Expected single character but was String: \t
> table sample_cast.tsv
> parameter fieldDelimiter
> SQL Query null
> [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010] 
> (state=,code=0)
> {code}
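The error message suggests the parser sees `'\t'` as a two-character string instead of interpreting the escape. A minimal sketch of the unescaping step such a fix needs, illustrative only and not Drill's actual parser code:

```java
public class DelimiterSketch {
  /** Interpret common escape sequences so the string "\t" yields one tab char. */
  static char toDelimiter(String s) {
    if (s.length() == 1) {
      return s.charAt(0);           // already a single character
    }
    switch (s) {                    // map two-char escapes to the real char
      case "\\t": return '\t';
      case "\\n": return '\n';
      case "\\r": return '\r';
      default:
        throw new IllegalArgumentException(
            "Expected single character but was String: " + s);
    }
  }

  public static void main(String[] args) {
    // The two-character input "\t" resolves to the tab character (code 9).
    System.out.println((int) DelimiterSketch.toDelimiter("\\t")); // 9
  }
}
```

With this interpretation in place, `fieldDelimiter => '\t'` in the table function resolves to an actual tab rather than failing the single-character check.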





[jira] [Resolved] (DRILL-3726) Drill is not properly interpreting CRLF (0d0a). CR gets read as content.

2016-08-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-3726.
-
   Resolution: Fixed
Fix Version/s: (was: 1.9.0)
   1.8.0

Merged with 5ca2340a0a83412aa8fc8b077b72eca5f55e4226

> Drill is not properly interpreting CRLF (0d0a). CR gets read as content.
> 
>
> Key: DRILL-3726
> URL: https://issues.apache.org/jira/browse/DRILL-3726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.1.0
> Environment: Linux RHEL 6.6, OSX 10.9
>Reporter: Edmon Begoli
>Assignee: Arina Ielchiieva
> Fix For: 1.8.0
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
>   When we query the last attribute of a text file, we get missing characters. 
>  Looking at the row through Drill, a \r is included at the end of the last 
> attribute.  
> Looking in a text editor, it's not embedded into that attribute.
> I'm thinking that Drill is not interpreting CRLF (0d0a) as a new line, only 
> the LF, resulting in the CR becoming part of the last attribute.
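The behavior described, splitting on LF only and leaving the CR attached to the last field, can be fixed by stripping a trailing `\r` after the split. A self-contained sketch of that idea (not Drill's actual text reader):

```java
public class CrlfSketch {
  /** Split on LF and drop a trailing CR, so CRLF-terminated lines parse cleanly. */
  static String[] splitLines(String text) {
    // limit = -1 keeps trailing empty strings, matching line-by-line readers.
    String[] lines = text.split("\n", -1);
    for (int i = 0; i < lines.length; i++) {
      if (lines[i].endsWith("\r")) {
        // Without this step, the CR becomes part of the last attribute.
        lines[i] = lines[i].substring(0, lines[i].length() - 1);
      }
    }
    return lines;
  }

  public static void main(String[] args) {
    String[] out = CrlfSketch.splitLines("a,b\r\nc,d\r\n");
    System.out.println(out[0] + "|" + out[1]); // a,b|c,d
  }
}
```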





[jira] [Commented] (DRILL-4746) Verification Failures (Decimal values) in drill's regression tests

2016-08-10 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414953#comment-15414953
 ] 

Arina Ielchiieva commented on DRILL-4746:
-

Merged with 5ca2340a0a83412aa8fc8b077b72eca5f55e4226

> Verification Failures (Decimal values) in drill's regression tests
> --
>
> Key: DRILL-4746
> URL: https://issues.apache.org/jira/browse/DRILL-4746
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - Text & CSV
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.8.0
>
>
> We started seeing the below 4 functional test failures in drill's extended 
> tests [1]. The data for the below tests can be downloaded from [2]
> {code}
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate28.q
> framework/resources/Functional/tpcds/impala/text/q43.q
> framework/resources/Functional/tpcds/variants/text/q6_1.sql
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate29.q
> {code}
> The failures started showing up from the commit [3]
> [1] https://github.com/mapr/drill-test-framework
> [2] http://apache-drill.s3.amazonaws.com/files/tpcds-sf1-text.tgz
> [3] 
> https://github.com/apache/drill/commit/223507b76ff6c2227e667ae4a53f743c92edd295
> Let me know if more information is needed to reproduce this issue.


