[jira] [Commented] (DRILL-4553) Joins using views are not returning results.

2016-03-29 Thread Anton Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217296#comment-15217296
 ] 

Anton Fernando commented on DRILL-4553:
---

The query returns data if I remove the filter on the first view (where 
username = upper(user)), but the whole point of this exercise is to secure the 
data by using metadata about who can view what inside the JSON files.
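
The intent described above, restricting rows by the current user before joining them to the data, can be sketched outside Drill as follows. This is a minimal in-memory illustration in Java, not Drill code. The names `RowLevelSecuritySketch`, `SecurityRow`, `Encounter`, `allowedFacilities` and `visibleEncounters` are hypothetical, and the column layout (a username and a facility identifier per security row) is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class RowLevelSecuritySketch {
    /** Hypothetical security metadata row: which user may see which facility. */
    static final class SecurityRow {
        final String username;
        final String facilityId;

        SecurityRow(String username, String facilityId) {
            this.username = username;
            this.facilityId = facilityId;
        }
    }

    /** Hypothetical data row, standing in for a record read from the JSON files. */
    static final class Encounter {
        final String encounterId;
        final String facilityId;

        Encounter(String encounterId, String facilityId) {
            this.encounterId = encounterId;
            this.facilityId = facilityId;
        }
    }

    /** Plays the role of the first view: keep rows where username = UPPER(user). */
    static Set<String> allowedFacilities(String user, List<SecurityRow> security) {
        String upperUser = user.toUpperCase(Locale.ROOT);
        Set<String> allowed = new HashSet<>();
        for (SecurityRow row : security) {
            if (row.username.equals(upperUser)) {
                allowed.add(row.facilityId);
            }
        }
        return allowed;
    }

    /** Plays the role of a secured view: join the data to the filtered security rows. */
    static List<String> visibleEncounters(String user,
                                          List<SecurityRow> security,
                                          List<Encounter> encounters) {
        Set<String> allowed = allowedFacilities(user, security);
        List<String> visible = new ArrayList<>();
        for (Encounter encounter : encounters) {
            if (allowed.contains(encounter.facilityId)) {
                visible.add(encounter.encounterId);
            }
        }
        return visible;
    }
}
```

With this shape, a user with no matching security rows sees nothing, which is the desired behavior; the bug reported here is that rows also disappear for users who do have matching security rows.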

> Joins using views are not returning results.
> 
>
> Key: DRILL-4553
> URL: https://issues.apache.org/jira/browse/DRILL-4553
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Anton Fernando
>Priority: Critical
>
> I have the following three views:
> create view view1 as select . from  where username=user;
> create view view2 as select . from view1 as a,  as b where a.col1 
> = b.col1;
> create view view3 as select . from view1 as a,  as b where a.col1 
> = b.col1;
> A select * from each of these views works fine and returns the expected 
> results.  A self join on view2 and view3 also works fine.  However when view2 
> and view3 are joined on common keys there are no rows returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4553) Joins using views are not returning results.

2016-03-29 Thread Anton Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217288#comment-15217288
 ] 

Anton Fernando commented on DRILL-4553:
---

This is over JSON and CSV. In this scenario the security metadata is in CSV, and 
the first view is created over it.  Views 2 and 3 are used to secure data in 
JSON with the security metadata in CSV.  We are currently evaluating Drill to 
see if it is a good fit for analyzing healthcare data, and we have run into this 
issue.  The explain plan for the query that is not returning data is as follows:

0: jdbc:drill:zk=localhost:2181> explain plan for select a.facilityidentifier, 
a.encounteridentifier from dischargedetail a, dischargephysn b where 
a.encounteridentifier=b.encounteridentifier and 
a.facilityidentifier=b.facilityidentifier;
+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(facilityidentifier=[$0], encounteridentifier=[$1])
00-02        Project(facilityidentifier=[$1], encounteridentifier=[$0])
00-03          Project(EncounterIdentifier=[$2], FacilityIdentifier=[$3], EncounterIdentifier0=[$0], FacilityIdentifier0=[$1])
00-04            HashJoin(condition=[AND(=($2, $0), =($3, $1))], joinType=[inner])
00-06              Project(EncounterIdentifier=[$0], FacilityIdentifier=[$1])
00-08                HashJoin(condition=[AND(=($1, $13), =($2, $14))], joinType=[inner])
00-11                  Project(EncounterIdentifier=[$0], FacilityIdentifier=[$1], SettingOfCare=[$2], ITEM=[ITEM($3, 'MedicalProfessionalIdentifierRaw')], ITEM4=[ITEM($3, 'MedicalProfessionalRoleCodeRaw')], ITEM5=[ITEM($3, 'MedicalProfessionalRoleCode')], ITEM6=[ITEM($3, 'FirstNameRaw')], ITEM7=[ITEM($3, 'LastNameRaw')], ITEM8=[ITEM($3, 'MiddleNameRaw')], ITEM9=[ITEM($3, 'MedicalProfessionalPrimarySpecialtyRaw')], ITEM10=[ITEM($3, 'MedicalProfessionalSecondarySpecialtyRaw')], ITEM11=[ITEM($3, 'NationalProviderIdentifierRaw')], ITEM12=[ITEM($3, 'UniformProviderIdentifierRaw')])
00-14                    Flatten(flattenField=[$3])
00-17                      Project(EncounterIdentifier=[$0], FacilityIdentifier=[ITEM($1, 'FacilityIdentifier')], SettingOfCare=[$2], MedicalProfessionals=[$3])
00-21                        Scan(groupscan=[EasyGroupScan [selectionRoot=hdfs://sandbox.hortonworks.com:8020/tmp/json, numFiles=3, columns=[`EncounterIdentifier`, `Facility`.`FacilityIdentifier`, `SettingOfCare`, `MedicalProfessionals`], files=[hdfs://sandbox.hortonworks.com:8020/tmp/json/403Encounters.json, hdfs://sandbox.hortonworks.com:8020/tmp/json/404Encounters.json, hdfs://sandbox.hortonworks.com:8020/tmp/json/405Encounters.json]]])
00-10                  Project(FacilityIdentifier0=[$0], SettingOfCare0=[$1])
00-13                    Project(FacilityIdentifier=[$1], SettingOfCare=[$2])
00-16                      SelectionVectorRemover
00-20                        Filter(condition=[=($0, UPPER(USER))])
00-24                          Project(username=[ITEM($0, 0)], FacilityIdentifier=[ITEM($0, 1)], SettingOfCare=[ITEM($0, 2)])
00-26                            Scan(groupscan=[EasyGroupScan [selectionRoot=hdfs://sandbox.hortonworks.com:8020/tmp/security, numFiles=1, columns=[`columns`[0], `columns`[1], `columns`[2]], files=[hdfs://sandbox.hortonworks.com:8020/tmp/security/lake_data_security.csv]]])
00-05              Project(EncounterIdentifier0=[$0], FacilityIdentifier0=[$1])
00-07                Project(EncounterIdentifier=[$1], FacilityIdentifier=[$2])
00-09                  SelectionVectorRemover
00-12                    Filter(condition=[=($2, $3)])
00-15                      HashJoin(condition=[=($0, $4)], joinType=[inner])
00-19                        Project(SettingOfCare=[$0], EncounterIdentifier=[$1], ITEM=[ITEM($2, 'FacilityIdentifier')])
00-23                          Scan(groupscan=[EasyGroupScan [selectionRoot=hdfs://sandbox.hortonworks.com:8020/tmp/json, numFiles=3, columns=[`SettingOfCare`, `EncounterIdentifier`, `Facility`.`FacilityIdentifier`], files=[hdfs://sandbox.hortonworks.com:8020/tmp/json/403Encounters.json, hdfs://sandbox.hortonworks.com:8020/tmp/json/404Encounters.json, hdfs://sandbox.hortonworks.com:8020/tmp/json/405Encounters.json]]])
00-18                        SelectionVectorRemover
00-22                          Filter(condition=[=($0, UPPER(USER))])
00-25                            Project(username=[ITEM($0, 0)], FacilityIdentifier=[ITEM($0, 1)])
00-27                              Scan(groupscan=[EasyGroupScan [selectionRoot=hdfs://sandbox.hortonworks.com:8020/tmp/security, numFiles=1, columns=[`columns`[0], `columns`[1]], files=[hdfs://sandbox.hortonworks.com:8020/tmp/security/lake_data_security.csv]]])
 | {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ ],
"queue" : 0,
"resultMode" : "EXEC"
 

[jira] [Commented] (DRILL-4550) Add support more time units in extract function

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217191#comment-15217191
 ] 

ASF GitHub Bot commented on DRILL-4550:
---

GitHub user vkorukanti opened a pull request:

https://github.com/apache/drill/pull/453

DRILL-4550: Add support more time units in extract function

Calcite changes are pending in CALCITE-1177

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vkorukanti/drill DRILL-4550

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/453.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #453


commit ca223227cc44052b55e13a4b7525262ec4ec40f8
Author: vkorukanti 
Date:   2016-03-30T00:08:57Z

DRILL-4550: Add support more time units in extract function




> Add support more time units in extract function
> ---
>
> Key: DRILL-4550
> URL: https://issues.apache.org/jira/browse/DRILL-4550
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.6.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 1.7.0
>
>
> Currently the {{extract}} function supports the following units: {{YEAR, MONTH, 
> DAY, HOUR, MINUTE, SECOND}}. Add support for more units: {{CENTURY, DECADE, DOW, 
> DOY, EPOCH, MILLENNIUM, QUARTER, WEEK}}.
> We also need changes in the SQL parser. Currently the parser only allows 
> {{YEAR, MONTH, DAY, HOUR, MINUTE, SECOND}} as units.
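
As an illustration of what the additional units in the description above could compute, here is a minimal Java sketch built on `java.time`. It is not the code from the pull request. The class name `ExtractUnitsSketch` is hypothetical, and the semantics are assumptions borrowed from PostgreSQL's conventions (Sunday is day 0 for DOW, WEEK is the ISO week number, CENTURY and MILLENNIUM start at year 1, EPOCH is seconds since 1970-01-01 UTC). The semantics Drill finally adopts may differ, and the arithmetic below only covers positive years.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.IsoFields;

final class ExtractUnitsSketch {
    private ExtractUnitsSketch() {}

    // Years 1901..2000 are century 20, 2001..2100 are century 21 (positive years only).
    static int century(LocalDateTime t) { return (t.getYear() - 1) / 100 + 1; }

    // The year divided by ten, e.g. 2016 -> 201.
    static int decade(LocalDateTime t) { return t.getYear() / 10; }

    // Day of week with Sunday = 0 ... Saturday = 6 (java.time uses Monday = 1 ... Sunday = 7).
    static int dow(LocalDateTime t) { return t.getDayOfWeek().getValue() % 7; }

    // Day of year, 1..366.
    static int doy(LocalDateTime t) { return t.getDayOfYear(); }

    // Seconds since 1970-01-01T00:00:00, treating the value as UTC.
    static long epoch(LocalDateTime t) { return t.toEpochSecond(ZoneOffset.UTC); }

    // Years 1001..2000 are millennium 2, 2001..3000 are millennium 3 (positive years only).
    static int millennium(LocalDateTime t) { return (t.getYear() - 1) / 1000 + 1; }

    // Quarter of the year, 1..4.
    static int quarter(LocalDateTime t) { return t.get(IsoFields.QUARTER_OF_YEAR); }

    // ISO-8601 week number, 1..53.
    static int week(LocalDateTime t) { return t.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR); }
}
```

Under these assumptions, `2016-03-29T00:08:57` falls in century 21, decade 201, quarter 1 and ISO week 13.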





[jira] [Created] (DRILL-4557) Make complex writer handle also scalars

2016-03-29 Thread Julien Le Dem (JIRA)
Julien Le Dem created DRILL-4557:


 Summary: Make complex writer handle also scalars
 Key: DRILL-4557
 URL: https://issues.apache.org/jira/browse/DRILL-4557
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Julien Le Dem


Currently the complex writer can be used to write an array or a map, but not a scalar.





[jira] [Updated] (DRILL-4557) Make complex writer handle also scalars

2016-03-29 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated DRILL-4557:
-
Issue Type: Sub-task  (was: Improvement)
Parent: DRILL-4538

> Make complex writer handle also scalars
> ---
>
> Key: DRILL-4557
> URL: https://issues.apache.org/jira/browse/DRILL-4557
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Julien Le Dem
>
> Currently the complex writer can be used to write an array or a map, but not a scalar.





[jira] [Created] (DRILL-4556) UDF with FieldReader parameter reading union type fails compilation

2016-03-29 Thread Julien Le Dem (JIRA)
Julien Le Dem created DRILL-4556:


 Summary: UDF with FieldReader parameter reading union type fails 
compilation
 Key: DRILL-4556
 URL: https://issues.apache.org/jira/browse/DRILL-4556
 Project: Apache Drill
  Issue Type: Bug
Reporter: Julien Le Dem


select foo(a) from mixed
where a is a union vector (say mixed is a JSON file where a is a string or an 
int).
foo is a UDF that has one parameter defined as a FieldReader.
The operator compilation fails because the field is produced as a UnionHolder 
instead of a FieldReader.





[jira] [Updated] (DRILL-4556) UDF with FieldReader parameter reading union type fails compilation

2016-03-29 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated DRILL-4556:
-
Issue Type: Sub-task  (was: Bug)
Parent: DRILL-4538

> UDF with FieldReader parameter reading union type fails compilation
> ---
>
> Key: DRILL-4556
> URL: https://issues.apache.org/jira/browse/DRILL-4556
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Julien Le Dem
>
> select foo(a) from mixed
> where a is a union vector (say mixed is a JSON file where a is a string or an 
> int).
> foo is a UDF that has one parameter defined as a FieldReader.
> The operator compilation fails because the field is produced as a UnionHolder 
> instead of a FieldReader.





[jira] [Commented] (DRILL-3743) query hangs on sqlline once Drillbit on foreman node is killed

2016-03-29 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217180#comment-15217180
 ] 

Sudheesh Katkam commented on DRILL-3743:


The close listener is [removed too 
early|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/QueryResultHandler.java#L323].
 The listener should be removed after the final query state is received 
(through #resultArrived).
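
The ordering proposed above can be sketched as follows. This is a simplified, self-contained Java model, not Drill's `QueryResultHandler`. The classes `Connection` and `QueryTracker` and their methods are hypothetical, and it is an assumption of this sketch that the premature removal happens when the submission is acknowledged (modeled here as `queryIdArrived`).

```java
import java.util.ArrayList;
import java.util.List;

final class CloseListenerSketch {
    /** Hypothetical stand-in for a connection that notifies listeners when it closes. */
    static final class Connection {
        private final List<Runnable> closeListeners = new ArrayList<>();

        void addCloseListener(Runnable listener) { closeListeners.add(listener); }

        void removeCloseListener(Runnable listener) { closeListeners.remove(listener); }

        void close() {
            // Iterate over a copy so listeners may unregister themselves while notified.
            for (Runnable listener : new ArrayList<>(closeListeners)) {
                listener.run();
            }
        }
    }

    /** Hypothetical per-query tracker on the client side. */
    static final class QueryTracker {
        private final Connection connection;
        private final Runnable onClose;
        private String finalState; // null until a terminal state is known

        QueryTracker(Connection connection) {
            this.connection = connection;
            this.onClose = this::connectionClosed;
            connection.addCloseListener(onClose);
        }

        /** Submission acknowledged: results are still pending, so keep the listener. */
        void queryIdArrived() {
            // Intentionally no removeCloseListener(...) call here.
        }

        /** Final query state received: only now stop watching the connection. */
        void resultArrived(String state) {
            finalState = state;
            connection.removeCloseListener(onClose);
        }

        /** Connection dropped before a final state arrived: fail the query. */
        private void connectionClosed() {
            if (finalState == null) {
                finalState = "FAILED";
                connection.removeCloseListener(onClose);
            }
        }

        String finalState() { return finalState; }
    }
}
```

In this model, a connection that closes after the query is acknowledged but before the final state arrives still moves the query to `FAILED` instead of leaving the client waiting indefinitely.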

> query hangs on sqlline once Drillbit on foreman node is killed
> --
>
> Key: DRILL-3743
> URL: https://issues.apache.org/jira/browse/DRILL-3743
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sudheesh Katkam
>Priority: Critical
> Fix For: Future
>
>
> sqlline/query hangs once Drillbit (on Foreman node) is killed. (kill -9 )
> query was issued from the Foreman node. The query returns many records, and 
> it is a long running query.
> Steps to reproduce the problem.
> set planner.slice_target=1
> 1.  clush -g khurram service mapr-warden stop
> 2.  clush -g khurram service mapr-warden start
> 3.  ./sqlline -u "jdbc:drill:schema=dfs.tmp"
> 0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 200;
> 4.  Immediately from another console do a jps and kill the Drillbit process 
> (in this case foreman) while the query is being run on sqlline. You will 
> notice that sqlline just hangs, we do not see any exceptions or errors being 
> reported on sqlline prompt or in drillbit.log or drillbit.out
> I do see this Exception in sqlline.log on the node from where sqlline was 
> started
> {code}
> 2015-09-04 18:45:12,069 [Client-1] INFO  o.a.d.e.rpc.user.QueryResultHandler 
> - User Error Occurred
> org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: 
> Connection /10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) 
> closed unexpectedly.
> [Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-09-04 18:45:12,069 [Client-1] INFO  
> o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#7] Query failed:
> org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: 
> Connection /10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) 
> closed unexpectedly.
> [Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
> 

[jira] [Commented] (DRILL-4531) Query with filter and aggregate hangs in planning phase

2016-03-29 Thread Chun Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217169#comment-15217169
 ] 

Chun Chang commented on DRILL-4531:
---

Added a test in automation. Verified the fix.

> Query with filter and aggregate hangs in planning phase
> ---
>
> Key: DRILL-4531
> URL: https://issues.apache.org/jira/browse/DRILL-4531
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.7.0
>
>
> For the following query,
> {code}
> SELECT  cust.custAddress, 
>lineitem.provider 
> FROM ( 
>   SELECT cast(c_custkey AS bigint) AS custkey, 
>  c_address AS custAddress 
>   FROM   cp.`tpch/customer.parquet` ) cust 
> LEFT JOIN 
>   ( 
> SELECT DISTINCT l_linenumber, 
>CASE 
>  WHEN l_partkey IN (1, 2) THEN 'Store1'
>  WHEN l_partkey IN (5, 6) THEN 'Store2'
>END AS provider 
> FROM  cp.`tpch/lineitem.parquet` 
> WHERE ( l_orderkey >=20160101 AND l_partkey <=20160301) 
>   AND   l_partkey IN (1,2, 5, 6) ) lineitem
> ON        cust.custkey = lineitem.l_linenumber 
> WHERE provider IS NOT NULL 
> GROUP BY  cust.custAddress, 
>   lineitem.provider 
> ORDER BY  cust.custAddress, 
>   lineitem.provider;
> {code}
> When run on today's master branch (commit 
> 79a3c164c1df7a5d7a0b82574316b4a0b1c7593e), the query hangs in the 
> planning phase.
> The log shows that it is stuck in the Drill_Logical planning phase. 
>  
>  





[jira] [Commented] (DRILL-4553) Joins using views are not returning results.

2016-03-29 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217163#comment-15217163
 ] 

Khurram Faraaz commented on DRILL-4553:
---

Is this over Parquet, JSON, or CSV data? Can you please share the query plan 
for the case where the equi-join over two views returns no results?

> Joins using views are not returning results.
> 
>
> Key: DRILL-4553
> URL: https://issues.apache.org/jira/browse/DRILL-4553
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Anton Fernando
>Priority: Critical
>
> I have the following three views:
> create view view1 as select . from  where username=user;
> create view view2 as select . from view1 as a,  as b where a.col1 
> = b.col1;
> create view view3 as select . from view1 as a,  as b where a.col1 
> = b.col1;
> A select * from each of these views works fine and returns the expected 
> results.  A self join on view2 and view3 also works fine.  However when view2 
> and view3 are joined on common keys there are no rows returned.





[jira] [Updated] (DRILL-4553) Joins using views are not returning results.

2016-03-29 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-4553:
--
Priority: Critical  (was: Major)

> Joins using views are not returning results.
> 
>
> Key: DRILL-4553
> URL: https://issues.apache.org/jira/browse/DRILL-4553
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Anton Fernando
>Priority: Critical
>
> I have the following three views:
> create view view1 as select . from  where username=user;
> create view view2 as select . from view1 as a,  as b where a.col1 
> = b.col1;
> create view view3 as select . from view1 as a,  as b where a.col1 
> = b.col1;
> A select * from each of these views works fine and returns the expected 
> results.  A self join on view2 and view3 also works fine.  However when view2 
> and view3 are joined on common keys there are no rows returned.





[jira] [Assigned] (DRILL-3743) query hangs on sqlline once Drillbit on foreman node is killed

2016-03-29 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam reassigned DRILL-3743:
--

Assignee: Sudheesh Katkam

> query hangs on sqlline once Drillbit on foreman node is killed
> --
>
> Key: DRILL-3743
> URL: https://issues.apache.org/jira/browse/DRILL-3743
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sudheesh Katkam
>Priority: Critical
> Fix For: Future
>
>
> sqlline/query hangs once Drillbit (on Foreman node) is killed. (kill -9 )
> query was issued from the Foreman node. The query returns many records, and 
> it is a long running query.
> Steps to reproduce the problem.
> set planner.slice_target=1
> 1.  clush -g khurram service mapr-warden stop
> 2.  clush -g khurram service mapr-warden start
> 3.  ./sqlline -u "jdbc:drill:schema=dfs.tmp"
> 0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 200;
> 4.  Immediately from another console do a jps and kill the Drillbit process 
> (in this case foreman) while the query is being run on sqlline. You will 
> notice that sqlline just hangs, we do not see any exceptions or errors being 
> reported on sqlline prompt or in drillbit.log or drillbit.out
> I do see this Exception in sqlline.log on the node from where sqlline was 
> started
> {code}
> 2015-09-04 18:45:12,069 [Client-1] INFO  o.a.d.e.rpc.user.QueryResultHandler 
> - User Error Occurred
> org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: 
> Connection /10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) 
> closed unexpectedly.
> [Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-09-04 18:45:12,069 [Client-1] INFO  
> o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#7] Query failed:
> org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: 
> Connection /10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) 
> closed unexpectedly.
> [Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:744) 

[jira] [Created] (DRILL-4555) JsonReader does not support nulls in lists

2016-03-29 Thread Julien Le Dem (JIRA)
Julien Le Dem created DRILL-4555:


 Summary: JsonReader does not support nulls in lists
 Key: DRILL-4555
 URL: https://issues.apache.org/jira/browse/DRILL-4555
 Project: Apache Drill
  Issue Type: Bug
Reporter: Julien Le Dem


{noformat}
  case VALUE_NULL:
throw UserException.unsupportedError()
  .message("Null values are not supported in lists by default. " +
"Please set `store.json.all_text_mode` to true to read lists 
containing nulls. " +
"Be advised that this will treat JSON null values as a string 
containing the word 'null'.")
  .build(logger);
{noformat}
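
The behavior that the error message above describes can be modeled with a short Java sketch. This is not Drill's `JsonReader`. The class `NullInListSketch` and the method `readList` are hypothetical, the input is an already-parsed `List<Object>` rather than a JSON token stream, and the `allTextMode` flag stands in for the `store.json.all_text_mode` option.

```java
import java.util.ArrayList;
import java.util.List;

final class NullInListSketch {
    private NullInListSketch() {}

    /**
     * Copies a parsed JSON list. In the default mode a null element is rejected;
     * in all-text mode every element, including null, becomes a string.
     */
    static List<Object> readList(List<Object> parsed, boolean allTextMode) {
        List<Object> out = new ArrayList<>();
        for (Object value : parsed) {
            if (value == null) {
                if (!allTextMode) {
                    throw new UnsupportedOperationException(
                        "Null values are not supported in lists by default.");
                }
                // All-text mode: JSON null becomes the literal string "null".
                out.add("null");
            } else {
                out.add(allTextMode ? String.valueOf(value) : value);
            }
        }
        return out;
    }
}
```

The sketch makes the trade-off visible: the only way to read such a list is to give up typed values, and a JSON null can then no longer be told apart from the string "null".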





[jira] [Resolved] (DRILL-4472) Pushing Filter past Union All fails: DRILL-3257 regressed DRILL-2746 but unit test update break test goal

2016-03-29 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-4472.
--
   Resolution: Fixed
Fix Version/s: 1.7.0

It was resolved as part of DRILL-4476.

> Pushing Filter past Union All fails: DRILL-3257 regressed DRILL-2746 but unit 
> test update break test goal
> -
>
> Key: DRILL-4472
> URL: https://issues.apache.org/jira/browse/DRILL-4472
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jacques Nadeau
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> While reviewing DRILL-4467, I discovered this test: 
> https://github.com/apache/drill/blame/master/exec/java-exec/src/test/java/org/apache/drill/TestUnionAll.java#L560
> As its name indicates, the test is meant to confirm that the filter is 
> pushed below union all. However, as you can see, the expected result was 
> updated in DRILL-3257 to a plan which doesn't push the in clause below the 
> filter. I'm disabling the test since 4467 happens to remove what becomes a 
> trivial project. However, we really should fix the core problem (a regression 
> of DRILL-2746).





[jira] [Commented] (DRILL-4551) Add some missing functions that are generated by Tableau (cot, regex_matches, split_part, isdate)

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217083#comment-15217083
 ] 

ASF GitHub Bot commented on DRILL-4551:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/452#discussion_r57819554
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctionHelpers.java
 ---
@@ -213,11 +213,39 @@ public static long getDate(DrillBuf buf, int start, 
int end){
 if (BoundsChecking.BOUNDS_CHECKING_ENABLED) {
   buf.checkBytes(start, end);
 }
-return memGetDate(buf.memoryAddress(), start, end);
+int[] dateFields = memGetDate(buf.memoryAddress(), start, end);
+return CHRONOLOGY.getDateTimeMillis(dateFields[0], dateFields[1], 
dateFields[2], 0);
   }
 
+  /**
+   * Takes a string value, specified as a buffer with a start and end and
+   * returns true if the value can be read as a date.
+   *
+   * @param buf
+   * @param start
+   * @param end
+   * @return true iff the string value can be read as a date
+   */
+  public static boolean isReadableAsDate(DrillBuf buf, int start, int end){
+if (BoundsChecking.BOUNDS_CHECKING_ENABLED) {
+  buf.checkBytes(start, end);
+}
+int[] dateFields = memGetDate(buf.memoryAddress(), start, end);
--- End diff --

Can we call getDate() directly here, and wrap with a try/catch block? The 
code seems identical to getDate(), except for the try/catch block. 
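
The refactoring suggested above, having the boolean check delegate to the parsing method and catch its failure, could look roughly like this. It is a simplified Java sketch and not the code under review: it takes a `String` instead of a `DrillBuf` range, parses ISO `yyyy-MM-dd` dates with `java.time` instead of `memGetDate`, and assumes that parsing signals failure with an exception. The class name `IsDateSketch` is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

final class IsDateSketch {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    private IsDateSketch() {}

    /** Parses the text as a date and returns milliseconds since the epoch (UTC midnight). */
    static long getDate(String text) {
        return LocalDate.parse(text).toEpochDay() * MILLIS_PER_DAY;
    }

    /** Reuses getDate() so the two methods cannot drift apart. */
    static boolean isReadableAsDate(String text) {
        try {
            getDate(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

The design point is that the check accepts exactly what the parser accepts, at the cost of using an exception for control flow on invalid input.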
 


> Add some missing functions that are generated by Tableau (cot, regex_matches, 
> split_part, isdate)
> -
>
> Key: DRILL-4551
> URL: https://issues.apache.org/jira/browse/DRILL-4551
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
>
> Several of these functions do not appear to be standard SQL functions, but 
> they are available in several other popular databases like SQL Server, Oracle 
> and Postgres.





[jira] [Commented] (DRILL-4551) Add some missing functions that are generated by Tableau (cot, regex_matches, split_part, isdate)

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217072#comment-15217072
 ] 

ASF GitHub Bot commented on DRILL-4551:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/452#discussion_r57818552
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DateTypeFunctions.java
 ---
@@ -40,6 +41,36 @@
 
 public class DateTypeFunctions {
 
+/**
+ * Function to check if a varchar value can be cast to a date.
+ *
+ * At the time of writing this function, several other databases were 
checked
+ * for behavior compatibility. There was not a consensus between 
oracle and
+ * Sql server about the expected behavior of this function, and 
Postgres
+ * lacks it completely.
+ *
+ * Sql Server appears to have both a DATEFORMAT and language locale 
setting
+ * that can change the values accepted by this function. Oracle 
appears to
+ * support several formats, some of which are not mentioned in the Sql
+ * Server docs. With the lack of standardization, we decided to 
implement
+ * this function so that it would only consider date strings that 
would be
+ * accepted by the cast function as valid.
+ */
+@SuppressWarnings("unused")
+@FunctionTemplate(name = "isdate", scope = 
FunctionTemplate.FunctionScope.SIMPLE, nulls=NullHandling.NULL_IF_NULL,
--- End diff --

Have you checked whether isdate() returns null for null input in other systems 
like Oracle? I thought it would return either true or false. 
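
The two null-handling choices in question can be contrasted in a small Java sketch. It is illustrative only: `IsDateNullHandlingSketch` and its methods are hypothetical, the date check is a plain ISO `yyyy-MM-dd` parse rather than Drill's cast logic, and nothing here states what Oracle or SQL Server actually return.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

final class IsDateNullHandlingSketch {
    private IsDateNullHandlingSketch() {}

    private static boolean parses(String text) {
        try {
            LocalDate.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    /** NULL_IF_NULL style: a null input yields a null (unknown) result. */
    static Boolean isDateNullIfNull(String text) {
        return text == null ? null : Boolean.valueOf(parses(text));
    }

    /** Alternative: the function always answers true or false, and null is "not a date". */
    static boolean isDateNullAsFalse(String text) {
        return text != null && parses(text);
    }
}
```

The difference matters in filters: with the first variant a predicate such as `NOT isdate(col)` is unknown for null rows, while with the second it is true for them.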
 


> Add some missing functions that are generated by Tableau (cot, regex_matches, 
> split_part, isdate)
> -
>
> Key: DRILL-4551
> URL: https://issues.apache.org/jira/browse/DRILL-4551
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
>
> Several of these functions do not appear to be standard SQL functions, but 
> they are available in several other popular databases like SQL Server, Oracle 
> and Postgres.





[jira] [Updated] (DRILL-4531) Query with filter and aggregate hangs in planning phase

2016-03-29 Thread Chun Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang updated DRILL-4531:
--
Reviewer: Chun Chang

> Query with filter and aggregate hangs in planning phase
> ---
>
> Key: DRILL-4531
> URL: https://issues.apache.org/jira/browse/DRILL-4531
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.7.0
>
>
> For the following query,
> {code}
> SELECT  cust.custAddress, 
>lineitem.provider 
> FROM ( 
>   SELECT cast(c_custkey AS bigint) AS custkey, 
>  c_address AS custAddress 
>   FROM   cp.`tpch/customer.parquet` ) cust 
> LEFT JOIN 
>   ( 
> SELECT DISTINCT l_linenumber, 
>CASE 
>  WHEN l_partkey IN (1, 2) THEN 'Store1'
>  WHEN l_partkey IN (5, 6) THEN 'Store2'
>END AS provider 
> FROM  cp.`tpch/lineitem.parquet` 
> WHERE ( l_orderkey >=20160101 AND l_partkey <=20160301) 
>   AND   l_partkey IN (1,2, 5, 6) ) lineitem
> ON        cust.custkey = lineitem.l_linenumber 
> WHERE provider IS NOT NULL 
> GROUP BY  cust.custAddress, 
>   lineitem.provider 
> ORDER BY  cust.custAddress, 
>   lineitem.provider;
> {code}
> When run on today's master branch (commit 
> 79a3c164c1df7a5d7a0b82574316b4a0b1c7593e), the query hangs in the 
> planning phase.
> The log shows that it is stuck in the Drill_Logical planning phase. 
>  
>  





[jira] [Created] (DRILL-4554) Data type mismatch for union all with timestamp and date

2016-03-29 Thread Krystal (JIRA)
Krystal created DRILL-4554:
--

 Summary: Data type mismatch for union all with timestamp and date
 Key: DRILL-4554
 URL: https://issues.apache.org/jira/browse/DRILL-4554
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Krystal
Assignee: Sean Hsuan-Yi Chu


Calcite and Drill apply different implicit casts when a UNION ALL query 
contains timestamp and date columns on both the left and right sides but in a 
different order.

select col_tmstmp,col_date, col_boln from `prqUnAll_0_v` union all select 
col_date, col_tmstmp, col_boln from `prqUnAll_1_v`

limit 0: select * from (select col_tmstmp,col_date, col_boln from 
`prqUnAll_0_v` union all select col_date, col_tmstmp, col_boln from 
`prqUnAll_1_v`) t limit 0
limit 0: [col_tmstmp, col_date, col_boln]
regular: [col_tmstmp, col_date, col_boln]

limit 0: [DATE, DATE, BOOLEAN]
regular: [TIMESTAMP, TIMESTAMP, BOOLEAN]

limit 0: [columnNullable, columnNullable, columnNullable]
regular: [columnNullable, columnNullable, columnNullable]
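
The mismatch above comes down to whether the common type of a DATE/TIMESTAMP pair depends on which side of the UNION ALL each column appears. A minimal Python sketch of an order-independent rule follows. The `RANK` table and the function names are illustrative assumptions, not Drill or Calcite APIs.

```python
# Hypothetical widening order for illustration only:
# a higher rank can represent every value of a lower rank.
RANK = {"DATE": 1, "TIMESTAMP": 2}


def common_type(left, right):
    """Return the wider of two column types, independent of operand order."""
    if left == right:
        return left
    if left in RANK and right in RANK:
        return left if RANK[left] > RANK[right] else right
    raise TypeError(f"no implicit cast between {left} and {right}")


def union_all_types(left_cols, right_cols):
    """Resolve the output column types of a UNION ALL, column by column."""
    if len(left_cols) != len(right_cols):
        raise ValueError("UNION ALL inputs must have the same number of columns")
    return [common_type(l, r) for l, r in zip(left_cols, right_cols)]
```

Under this assumed rule, both sides of the report's query resolve to the same types as the "regular" run, whichever branch lists the timestamp column first.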





[jira] [Created] (DRILL-4553) Joins using views are not returning results.

2016-03-29 Thread Anton Fernando (JIRA)
Anton Fernando created DRILL-4553:
-

 Summary: Joins using views are not returning results.
 Key: DRILL-4553
 URL: https://issues.apache.org/jira/browse/DRILL-4553
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.6.0, 1.5.0
Reporter: Anton Fernando


I have the following three views:

create view view1 as select . from  where username=user;

create view view2 as select . from view1 as a,  as b where a.col1 = 
b.col1;

create view view3 as select . from view1 as a,  as b where a.col1 = 
b.col1;

A select * from each of these views works fine and returns the expected 
results.  A self join on view2 and view3 also works fine.  However, when view2 
and view3 are joined on common keys, no rows are returned.








[jira] [Created] (DRILL-4552) Treat decimal literals as Double when type inference is taking place

2016-03-29 Thread Sean Hsuan-Yi Chu (JIRA)
Sean Hsuan-Yi Chu created DRILL-4552:


 Summary: Treat decimal literals as Double when type inference is 
taking place
 Key: DRILL-4552
 URL: https://issues.apache.org/jira/browse/DRILL-4552
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Sean Hsuan-Yi Chu


In the SQL standard, decimal literals (e.g., 1.2, 2.5) are of decimal type. 
However, Drill currently always converts them to Double in DrillOptiq.

Since they will be converted to Double at execution anyway, we can treat them 
as Double during type inference to help determine the return types. 

(The current behavior is "not to do any inference if the operand is Decimal 
type").
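
The proposal can be illustrated with a toy inference routine. This is only a sketch under stated assumptions: the `NUMERIC_RANK` table, the type names, and `infer_return_type` are hypothetical and do not mirror Drill's actual inference code.

```python
# Hypothetical numeric widening order used only for this illustration.
NUMERIC_RANK = {"INT": 1, "BIGINT": 2, "DOUBLE": 3}


def infer_return_type(operand_types, decimal_as_double=True):
    """Infer the result type of an arithmetic operator from its operand types.

    Returns None when no inference is attempted, which models the current
    behavior described in the issue for Decimal operands.
    """
    if "DECIMAL" in operand_types:
        if not decimal_as_double:
            return None  # current behavior: skip inference entirely
        # Proposed behavior: the literal becomes Double at execution anyway.
        operand_types = ["DOUBLE" if t == "DECIMAL" else t for t in operand_types]
    return max(operand_types, key=NUMERIC_RANK.__getitem__)
```

With `decimal_as_double=False` the routine gives up as soon as a Decimal operand appears. With the default it reports the type the issue says the value will have at execution.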







[jira] [Commented] (DRILL-4551) Add some missing functions that are generated by Tableau (cot, regex_matches, split_part, isdate)

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216933#comment-15216933
 ] 

ASF GitHub Bot commented on DRILL-4551:
---

GitHub user jaltekruse opened a pull request:

https://github.com/apache/drill/pull/452

DRILL-4551: Implement new functions (cot, regex_matches, split_part, isdate)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaltekruse/incubator-drill 4551-new-functions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/452.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #452


commit 0166aab070aa7175175b4a35162fc2502ea3cb90
Author: Jason Altekruse 
Date:   2016-03-28T18:55:11Z

DRILL-4551: Implement new functions (cot, regex_matches, split_part, isdate)






[jira] [Created] (DRILL-4551) Add some missing functions that are generated by Tableau (cot, regex_matches, split_part, isdate)

2016-03-29 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4551:
--

 Summary: Add some missing functions that are generated by Tableau 
(cot, regex_matches, split_part, isdate)
 Key: DRILL-4551
 URL: https://issues.apache.org/jira/browse/DRILL-4551
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse


Several of these functions do not appear to be standard SQL functions, but they 
are available in several other popular databases like SQL Server, Oracle and 
Postgres.
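
For readers unfamiliar with two of these functions, here is a rough Python sketch of their usual semantics. `split_part` is modeled on the PostgreSQL function of the same name (1-based field positions, empty string when the position is past the last field); the Drill implementation in the pull request may differ, and only positive positions are handled here.

```python
import math


def cot(x):
    """Cotangent: cos(x) / sin(x); undefined where sin(x) is zero."""
    s = math.sin(x)
    if s == 0:
        raise ZeroDivisionError("cot is undefined where sin(x) == 0")
    return math.cos(x) / s


def split_part(text, delimiter, n):
    """Return the n-th field (1-based) of text split on delimiter.

    Returns an empty string when n is larger than the number of fields.
    """
    if n < 1:
        raise ValueError("field position must be greater than zero")
    parts = text.split(delimiter)
    return parts[n - 1] if n <= len(parts) else ""
```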





[jira] [Commented] (DRILL-4543) Advertise Drill-bit ports, status, capabilities in ZooKeeper

2016-03-29 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216900#comment-15216900
 ] 

John Omernik commented on DRILL-4543:
-

Paul, should we open another JIRA for the data port issue? I am guessing that 
you will want that for YARN as much as I want it for Mesos. The kicker is that 
I don't have the dev background or team to be able to implement it.  (The 
control port + 1 is going to be an issue if a node running a bit dynamically 
allocates a control port, but control port + 1 isn't available.) Let me know if 
you want to open a JIRA or want me to.

> Advertise Drill-bit ports, status, capabilities in ZooKeeper
> 
>
> Key: DRILL-4543
> URL: https://issues.apache.org/jira/browse/DRILL-4543
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Reporter: Paul Rogers
> Fix For: 2.0.0
>
>
> Today Drill uses ZooKeeper (ZK) to advertise the existence of a Drill-bit, 
> providing just the host name/IP address of the Drill-bit. All other 
> information (ports, status, capabilities) is assumed to be the same across 
> all Drill-bits in the cluster as specified in the Drill config file.
> Moving forward, as Drill becomes more sophisticated, Drill should advertise 
> the specifics of each Drill-bit so that one Drill bit can differ from another.
> For example, when running on YARN, we need a way for Drill to gracefully shut 
> down. Advertising a status of Ready or Unavailable will help. Ready is the 
> normal state. Unavailable means the Drill-bit will finish in-flight queries, 
> but won't accept new ones. (The actual status is a separate enhancement.)
> In a YARN cluster, Drill should take advantage of machines with more memory, 
> but live with machines with less. (Perhaps some are newer, some are older or 
> more heavily loaded.) Drill should use ZK to identify its available memory 
> and CPUs so that the planner can use them. (Use of the info is a separate 
> enhancement.)
> There may be times when two drill bits run on a single machine. If so, they 
> must use separate ports. So, each Drill-bit should advertise its ports in ZK.
> For backward compatibility, the information is optional; if not present, the 
> receiver should assume the information defaults to that in the config file.
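
The backward-compatibility rule in the description (fields that are not advertised fall back to the config file) could be sketched as follows. The field names, the port values, and both helper functions are assumptions made for illustration; the issue does not define a concrete schema.

```python
# Illustrative cluster-wide defaults standing in for the Drill config file.
CONFIG_DEFAULTS = {
    "user_port": 31010,
    "control_port": 31011,
    "data_port": 31012,
    "status": "Ready",
}


def resolve_endpoint(advertised, defaults=CONFIG_DEFAULTS):
    """Fill in any field a Drill-bit did not advertise from the config defaults."""
    resolved = dict(defaults)
    resolved.update({k: v for k, v in advertised.items() if v is not None})
    return resolved


def accepts_new_queries(endpoint):
    """Ready accepts new queries; Unavailable only finishes in-flight ones."""
    return endpoint["status"] == "Ready"
```

An older Drill-bit that advertises only its host name resolves to the defaults, while a newer one can override individual ports or report itself as Unavailable.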





[jira] [Commented] (DRILL-4543) Advertise Drill-bit ports, status, capabilities in ZooKeeper

2016-03-29 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216875#comment-15216875
 ] 

Paul Rogers commented on DRILL-4543:


Thanks for the clarification. Might be a good idea for the docs on the Apache 
Drill site to point to the HOCON docs so folks know about this system.

For YARN, when testing gets far enough, we'll try out the system property 
override for the YARN-relevant properties.



[jira] [Commented] (DRILL-4543) Advertise Drill-bit ports, status, capabilities in ZooKeeper

2016-03-29 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216669#comment-15216669
 ] 

John Omernik commented on DRILL-4543:
-

Interesting.  That would keep them all in the drill-env though, so that's good. 
The only downside is that you can be making changes just in drill-env, and 
someone else, or a previous change in drill-override, could create a situation 
where the drill-override is stomping on what is set in system properties/env, 
right? Or am I looking at that wrong? I guess that's such an edge case it 
shouldn't matter.  I just like being overtly explicit. This way, if I look at 
drill-override later, I am always reminded that my ports are set in drill-env.  
It's a tiny issue, and one I can see being a personal preference thing. 




[jira] [Commented] (DRILL-4543) Advertise Drill-bit ports, status, capabilities in ZooKeeper

2016-03-29 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216654#comment-15216654
 ] 

Jacques Nadeau commented on DRILL-4543:
---

In general, I would recommend that you set the value using system properties 
(this overrides the drill-override.conf file). If you want to use environment 
variables, I'd pass them with system properties in drill-env as opposed to the 
drill-override.conf file.  But that is probably just personal preference.



[jira] [Commented] (DRILL-4543) Advertise Drill-bit ports, status, capabilities in ZooKeeper

2016-03-29 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216644#comment-15216644
 ] 

John Omernik commented on DRILL-4543:
-

I think one difference is that with HOCON we'd have to explicitly set the 
drill-override to have the ENV variable as its value. I guess this shouldn't 
hurt. We'd just have to make two changes: define what we want as ENV variables, 
and then update the drill-override on each drill bit to have the setting use 
the ENV variable instead of a hard-coded port.  I guess from a simplicity point 
of view, I always saw the drill-override as a cluster-wide settings file, as it 
makes it harder to read/grok if there are lots of variables. On the other hand, 
it's not more difficult than doing things implicitly with defaults. Therefore, 
as long as Drill is ok with the different settings, all we have to do is define 
which environment variables we want to use and then set them, while also 
updating the drill-override.

Then yes, the only other thing is the data port, getting the ability to 
explicitly set that.





[jira] [Commented] (DRILL-4543) Advertise Drill-bit ports, status, capabilities in ZooKeeper

2016-03-29 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216617#comment-15216617
 ] 

Jacques Nadeau commented on DRILL-4543:
---

The configuration system already works this way (see the HOCON documentation).

The only ask I see here from John is being able to configure the data port 
(rather than control + 1)
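
To make the request concrete, here is a small Python sketch of the difference between a derived data port and an explicitly configured one. The `explicit_data_port` parameter is hypothetical; it stands for the setting being asked for and does not name an existing Drill option.

```python
def data_port(control_port, explicit_data_port=None):
    """Return the data port: an explicit setting if given, else control port + 1."""
    if explicit_data_port is not None:
        return explicit_data_port
    return control_port + 1


def conflicts(port, ports_in_use):
    """True if the chosen port is already taken on the node."""
    return port in ports_in_use
```

When a scheduler hands out the control port dynamically, the derived value can land on a port that is already in use, whereas an explicit value can be allocated by the scheduler as well.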



[jira] [Commented] (DRILL-4543) Advertise Drill-bit ports, status, capabilities in ZooKeeper

2016-03-29 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216552#comment-15216552
 ] 

John Omernik commented on DRILL-4543:
-

Make sure the second part of my comment is also realized. I think if we don't 
fix that, both YARN and MESOS face challenges.  Not sure if that should be a 
separate JIRA or not. 

John



[jira] [Commented] (DRILL-4543) Advertise Drill-bit ports, status, capabilities in ZooKeeper

2016-03-29 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216522#comment-15216522
 ] 

Paul Rogers commented on DRILL-4543:


Great idea! Perhaps generalize this to a configuration stack:

* System properties (-DpropName=value)
* Env var: DRILL_PROP_NAME=value
* Drill site config: "propName": "value"
* Defaults

The idea is that items higher in the order take precedence over items lower in 
the order. Mesos can override values with env vars. YARN can use either env 
vars or command-line args (-D...).

Then, when the values are needed by other Drill-bits, this particular Drill-bit 
uses ZK to advertise its actual values as computed using the override rules. 
YARN (or Mesos) can adjust ports and resources, and other Drill-bits can learn 
of those customizations.
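
The configuration stack described above can be sketched as a simple layered lookup. This is an illustration of the proposed precedence only: the property name `user.port`, the `env_var_name` mapping, and the `resolve` helper are assumptions, not Drill or HOCON APIs.

```python
def env_var_name(prop_name):
    """Hypothetical mapping from a property name to the DRILL_PROP_NAME style."""
    return "DRILL_" + prop_name.replace(".", "_").replace("-", "_").upper()


def resolve(prop_name, system_props, env, site_config, defaults):
    """Return (value, source) from the highest-precedence layer defining the property."""
    layers = [
        ("system property", system_props, prop_name),
        ("environment", env, env_var_name(prop_name)),
        ("site config", site_config, prop_name),
        ("default", defaults, prop_name),
    ]
    for source, mapping, key in layers:
        if key in mapping:
            return mapping[key], source
    raise KeyError(prop_name)
```

Returning the source alongside the value makes it easy to log where an effective setting came from before the Drill-bit advertises it.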



[jira] [Commented] (DRILL-4543) Advertise Drill-bit ports, status, capabilities in ZooKeeper

2016-03-29 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216514#comment-15216514
 ] 

Paul Rogers commented on DRILL-4543:


Thanks, Jacques. I see my confusion. When using the ZK client, I see the host 
name, but ports appear as "noise". They are encoded as a Protobuf block and are 
thus visible only to Drill code (or other code that knows how to decode the 
Protobuf format.)

So, let me rephrase the port suggestion: store ports in an easily readable 
format such as plain text:

drill://host-name:123:456:789

Or, if ZK allows it (can auto remove a subtree as well as a single node), as 
children of the znode:


- host-name: my-host
- user-port: 123
- data-port: 456
- ...
- memory-mb: 128
- cores: 5




[jira] [Commented] (DRILL-1170) YARN support for Drill

2016-03-29 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216504#comment-15216504
 ] 

Paul Rogers commented on DRILL-1170:


We have considered Slider. Several factors nudged us in the direction of 
writing an AM directly on YARN:

1. Slider has much documentation, but it is incomplete and out-of-date in 
important places.
2. We could make up for the documentation gaps by reading the source code. 
However, Slider is composed of a large amount of Python code. Our team are 
mostly Java developers. If we have to learn a bunch of code, we might as well 
learn YARN directly.
3. Drill needs certain features that Slider does not (yet) provide, such as 
monitoring ZooKeeper to track Drill-bit health, perhaps offering a connection 
proxy, etc.
4. Slider is a general-purpose tool with many cool features. As it turns out, 
many are not needed for Drill. This means that Slider introduces a bit of 
unnecessary complexity for Drill admins.
5. Slider adds its own level of configuration files on top of those that we'd 
need for Drill. Not a big issue, but it is just additional complexity for Drill 
admins to learn and manage.

In balance, we like where Slider is going. Those Drill users who want to 
roll-their-own YARN integration should certainly give Slider a try as a 
short-term solution. This is particularly true for shops that already use 
Slider for other apps.

On balance, however, Drill has a number of specialized needs that would seem to 
justify the cost of a custom AM. We will, of course, continue to revisit the 
issue as analysis proceeds.

> YARN support for Drill
> --
>
> Key: DRILL-1170
> URL: https://issues.apache.org/jira/browse/DRILL-1170
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Neeraja
>Assignee: Paul Rogers
> Fix For: Future
>
>
> This is a tracking item to make Drill work with YARN.
> Below are a few requirements/needs to consider.
> - Drill should run as a YARN-based application, side by side with other 
> YARN-enabled applications (on the same nodes or different nodes). Both memory 
> and CPU resources of Drill should be controlled through this mechanism.
> - As a YARN-enabled application, Drill's resource consumption should be 
> adaptive to the load on the cluster. For example: when there is no load on 
> Drill, Drill should consume no resources on the cluster.  As the load on 
> Drill increases, resources permitting, usage should grow proportionally.
> - Low latency is a key requirement for Apache Drill, along with support for 
> multiple users (concurrency in 100s-1000s). This should be supported when run 
> as a YARN application as well.





[jira] [Commented] (DRILL-3178) csv reader should allow newlines inside quotes

2016-03-29 Thread Daniel Reznick (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216384#comment-15216384
 ] 

Daniel Reznick commented on DRILL-3178:
---

As Drill is meant for working with data in place, having to pre-process files 
prior to use with Drill is counter-productive.  Drill should work hard to read 
data as is when possible, and, as noted, many other tools both read and write 
delimited content with newlines in quoted fields.
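
As one example of such a tool, Python's standard `csv` module accepts newlines inside quoted fields. The sample data and the `read_rows` helper below are made up for illustration.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text, keeping newlines that appear inside quoted fields."""
    # newline="" leaves line endings untouched so the csv module can
    # tell record separators apart from newlines embedded in quotes.
    return list(csv.reader(io.StringIO(text, newline="")))


sample = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'
rows = read_rows(sample)
```

The sample contains four physical lines but parses into three records, with the embedded newline preserved in the second field of the middle record.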

> csv reader should allow newlines inside quotes 
> ---
>
> Key: DRILL-3178
> URL: https://issues.apache.org/jira/browse/DRILL-3178
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0
> Environment: Ubuntu Trusty 14.04.2 LTS
>Reporter: Neal McBurnett
> Fix For: Future
>
>
> When reading a csv file which contains newlines within quoted strings, e.g. 
> via
> select * from dfs.`/tmp/q.csv`;
> Drill 1.0 says:
> Error: SYSTEM ERROR: com.univocity.parsers.common.TextParsingException:  
> Error processing input: Cannot use newline character within quoted string
> But many tools produce csv files with newlines in quoted strings.  Drill 
> should be able to handle them.
> Workaround: the csvquote program (https://github.com/dbro/csvquote) can 
> encode embedded commas and newlines, and even decode them later if desired.
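
For comparison with the behavior requested above, here is a minimal sketch of a reader that accepts newlines inside quoted fields. It uses Python's standard csv module, not Drill's text reader; the sample data and the read_rows helper are made up for illustration.

```python
import csv
import io

# Hypothetical sample: the second field of the first data row spans two lines.
SAMPLE = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'


def read_rows(text):
    # newline="" leaves line endings untouched so the csv module can decide
    # whether a newline ends a record or belongs to a quoted field.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = read_rows(SAMPLE)
for row in rows:
    print(row)
```

The embedded newline stays inside the second field of the first data row instead of starting a new record.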





[jira] [Commented] (DRILL-4544) Improve error messages for REFRESH TABLE METADATA command

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216357#comment-15216357
 ] 

ASF GitHub Bot commented on DRILL-4544:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/448#issuecomment-203005531
  
Let's open a follow-up bug to move this to Calcite and get it into Drill for now.


> Improve error messages for REFRESH TABLE METADATA command
> -
>
> Key: DRILL-4544
> URL: https://issues.apache.org/jira/browse/DRILL-4544
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.7.0
>
>
> Improve the error messages thrown by REFRESH TABLE METADATA command:
> In the first case below, the error is that maprfs.abc doesn't exist. It should 
> throw an "object not found" or "workspace not found" error. It is currently 
> throwing an unhelpful message:
> 0: jdbc:drill:> refresh table metadata maprfs.abc.`my_table`;
> +-------+-------------+
> | ok    | summary     |
> +-------+-------------+
> | false | Error: null |
> +-------+-------------+
> 1 row selected (0.355 seconds)
> In the second case below, it says refresh table metadata is supported only 
> for single-directory based Parquet tables. But the command works for nested 
> multi-directory Parquet files.
> 0: jdbc:drill:> refresh table metadata maprfs.vnaranammalpuram.`rfm_sales_vw`;
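> +-------+------------------------------------------------------------------------------------------------------------------------------+
> | ok    | summary                                                                                                                      |
> +-------+------------------------------------------------------------------------------------------------------------------------------+
> | false | Table rfm_sales_vw does not support metadata refresh. Support is currently limited to single-directory-based Parquet tables. |
> +-------+------------------------------------------------------------------------------------------------------------------------------+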
> 1 row selected (0.418 seconds)
> 0: jdbc:drill:>





[jira] [Updated] (DRILL-3993) Rebase Drill on Calcite master branch

2016-03-29 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3993:
--
Summary: Rebase Drill on Calcite master branch  (was: Rebase Drill on 
Calcite 1.7.0 release)

> Rebase Drill on Calcite master branch
> -
>
> Key: DRILL-3993
> URL: https://issues.apache.org/jira/browse/DRILL-3993
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Sudheesh Katkam
>Assignee: Jacques Nadeau
>
> Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure 
> there are no regressions.
> Also, how do we resolve this 'catching up' issue in the long term?





[jira] [Updated] (DRILL-4403) AssertionError: Internal error: Conversion to relational algebra failed to preserve datatypes

2016-03-29 Thread Serge Harnyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Harnyk updated DRILL-4403:

Assignee: Taras Supyk  (was: Serge Harnyk)

>  AssertionError: Internal error: Conversion to relational algebra failed to 
> preserve datatypes
> --
>
> Key: DRILL-4403
> URL: https://issues.apache.org/jira/browse/DRILL-4403
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>Assignee: Taras Supyk
>
> select rnum, c1, c2, c3, stddev_pop( c3 ) over(partition by c1) from 
> postgres.public.tolap
> Error: SYSTEM ERROR: AssertionError: Internal error: Conversion to relational 
> algebra failed to preserve datatypes:
> validated type:
> RecordType(INTEGER NOT NULL rnum, CHAR(3) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c1, CHAR(2) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c2, INTEGER c3, INTEGER EXPR$4) NOT NULL
> converted type:
> RecordType(INTEGER NOT NULL rnum, CHAR(3) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c1, CHAR(2) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c2, INTEGER c3, DOUBLE EXPR$4) NOT NULL
> rel:
> LogicalProject(rnum=[$0], c1=[$1], c2=[$2], c3=[$3], 
> EXPR$4=[POWER(/(CastHigh(-(SUM(*(CastHigh($3), CastHigh($3))) OVER (PARTITION 
> BY $1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), 
> /(*(SUM(CastHigh($3)) OVER (PARTITION BY $1 RANGE BETWEEN UNBOUNDED PRECEDING 
> AND UNBOUNDED FOLLOWING), SUM(CastHigh($3)) OVER (PARTITION BY $1 RANGE 
> BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)), COUNT(CastHigh($3)) 
> OVER (PARTITION BY $1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
> FOLLOWING, COUNT(CastHigh($3)) OVER (PARTITION BY $1 RANGE BETWEEN 
> UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)), 0.5)])
>   LogicalTableScan(table=[[postgres, public, tolap]])
> [Error Id: 61be4aa1-6486-4118-a82b-86c22b551bb5 on centos1:31010]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: Internal error: Conversion to relational 
> algebra failed to preserve datatypes:
> validated type:
> RecordType(INTEGER NOT NULL rnum, CHAR(3) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c1, CHAR(2) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c2, INTEGER c3, INTEGER EXPR$4) NOT NULL
> converted type:
> RecordType(INTEGER NOT NULL rnum, CHAR(3) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c1, CHAR(2) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c2, INTEGER c3, DOUBLE EXPR$4) NOT NULL
> rel:
> LogicalProject(rnum=[$0], c1=[$1], c2=[$2], c3=[$3], 
> EXPR$4=[POWER(/(CastHigh(-(SUM(*(CastHigh($3), CastHigh($3))) OVER (PARTITION 
> BY $1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), 
> /(*(SUM(CastHigh($3)) OVER (PARTITION BY $1 RANGE BETWEEN UNBOUNDED PRECEDING 
> AND UNBOUNDED FOLLOWING), SUM(CastHigh($3)) OVER (PARTITION BY $1 RANGE 
> BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)), COUNT(CastHigh($3)) 
> OVER (PARTITION BY $1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
> FOLLOWING, COUNT(CastHigh($3)) OVER (PARTITION BY $1 RANGE BETWEEN 
> UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)), 0.5)])
>   LogicalTableScan(table=[[postgres, public, tolap]])
> org.apache.drill.exec.work.foreman.Foreman.run():261
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.AssertionError) Internal error: Conversion to 
> relational algebra failed to preserve datatypes:
> validated type:
> RecordType(INTEGER NOT NULL rnum, CHAR(3) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c1, CHAR(2) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c2, INTEGER c3, INTEGER EXPR$4) NOT NULL
> converted type:
> RecordType(INTEGER NOT NULL rnum, CHAR(3) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c1, CHAR(2) CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary" c2, INTEGER c3, DOUBLE EXPR$4) NOT NULL
> rel:
> LogicalProject(rnum=[$0], c1=[$1], c2=[$2], c3=[$3], 
> EXPR$4=[POWER(/(CastHigh(-(SUM(*(CastHigh($3), CastHigh($3))) OVER (PARTITION 
> BY $1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), 
> /(*(SUM(CastHigh($3)) OVER (PARTITION BY $1 RANGE BETWEEN UNBOUNDED PRECEDING 
> AND UNBOUNDED FOLLOWING), SUM(CastHigh($3)) OVER (PARTITION BY $1 RANGE 
> BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)), COUNT(CastHigh($3)) 
> OVER (PARTITION BY $1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
> FOLLOWING, 
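
The rel expression in the error above expands STDDEV_POP into POWER((SUM(x*x) - SUM(x)*SUM(x)/COUNT(x)) / COUNT(x), 0.5). The sketch below evaluates that same formula in plain Python (not Drill or Calcite code; the function name and sample values are illustrative) to show why the result is naturally a floating-point value even when the input column is INTEGER, which matches the DOUBLE in the converted type rather than the INTEGER in the validated type.

```python
import math


def stddev_pop(values):
    # Same shape as the expanded expression in the plan:
    # POWER((SUM(x*x) - SUM(x)*SUM(x)/COUNT(x)) / COUNT(x), 0.5)
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return math.pow((total_sq - total * total / n) / n, 0.5)


# Illustrative integer input; the result is still a float.
result = stddev_pop([2, 4, 4, 4, 5, 5, 7, 9])
print(result, type(result).__name__)
```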

[jira] [Commented] (DRILL-4405) invalid Postgres SQL generated for CONCAT (literal, literal)

2016-03-29 Thread Serge Harnyk (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216175#comment-15216175
 ] 

Serge Harnyk commented on DRILL-4405:
-

Doesn't reproduce in Drill 1.7. Solved in DRILL-4372

> invalid Postgres SQL generated for CONCAT (literal, literal) 
> -
>
> Key: DRILL-4405
> URL: https://issues.apache.org/jira/browse/DRILL-4405
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>Assignee: Serge Harnyk
>
> select concat( 'FF' , 'FF' )  from postgres.public.tversion
> Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the 
> SQL query. 
> sql SELECT CAST('' AS ANY) AS "EXPR$0"
> FROM "public"."tversion"
> plugin postgres
> Fragment 0:0
> [Error Id: c3f24106-8d75-4a57-a638-ac5f0aca0769 on centos1:31010]
>   (org.postgresql.util.PSQLException) ERROR: syntax error at or near "ANY"
>   Position: 23
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse():2182
> org.postgresql.core.v3.QueryExecutorImpl.processResults():1911
> org.postgresql.core.v3.QueryExecutorImpl.execute():173
> org.postgresql.jdbc.PgStatement.execute():622
> org.postgresql.jdbc.PgStatement.executeWithFlags():458
> org.postgresql.jdbc.PgStatement.executeQuery():374
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup():177
> org.apache.drill.exec.physical.impl.ScanBatch.<init>():108
> org.apache.drill.exec.physical.impl.ScanBatch.<init>():136
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():40
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():33
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():147
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():101
> org.apache.drill.exec.physical.impl.ImplCreator.getExec():79
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():230
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> SQLState:  null
> ErrorCode: 0





[jira] [Commented] (DRILL-4409) projecting literal will result in an empty resultset

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216143#comment-15216143
 ] 

ASF GitHub Bot commented on DRILL-4409:
---

GitHub user Serge-Harnyk opened a pull request:

https://github.com/apache/drill-site/pull/1

DRILL-4409 - Add notice about Postgres typing of literals

All in comments here
https://issues.apache.org/jira/browse/DRILL-4409


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Serge-Harnyk/drill-site asf-site

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill-site/pull/1.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1






> projecting literal will result in an empty resultset
> 
>
> Key: DRILL-4409
> URL: https://issues.apache.org/jira/browse/DRILL-4409
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>Assignee: Serge Harnyk
>
> A query which projects a literal as shown against a Postgres table will 
> result in an empty result set being returned. 
> select 'BB' from postgres.public.tversion





[jira] [Commented] (DRILL-3878) Support XML Querying (selects/projections, no writing)

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216103#comment-15216103
 ] 

ASF GitHub Bot commented on DRILL-3878:
---

GitHub user magpierre opened a pull request:

https://github.com/apache/drill/pull/451

Drill 3878

Please review my fix for JIRA DRILL-3878, which provides XML support for 
Apache Drill.
The fix utilizes the existing support for JSON by converting XML to JSON 
using a simple SAX parser built for the purpose.
The parser tries to produce acceptable JSON documents that are then fed 
into the JSONRecordReader for further processing.

To add XML support to Apache Drill, please add the built package to the 
3rdparty folder of the built Apache Drill environment, and start Drill.
Add:

"xml": {
  "type": "xml",
  "extensions": [
"xml"
  ],
  "keepPrefix": true
}

to the type section in dfs.
(keepPrefix = false removes the namespace prefix from tags in Apache Drill, 
since namespaces can be named differently between documents and are not really 
part of the tag name.)

The parser tries to be nice to Drill / the JSON reader by avoiding mixed 
types, arranging recurring values in arrays, and removing empty elements. 
This is done to minimize the number of JSON errors caused by the different 
nature of XML and Drill.

Convention in JSON
Attributes are named using the convention @ followed by the attribute name, 
and store simple values.
All other objects are stored as objects with a #value field.
This somewhat conforms to Apache Spark XML, but I need to store all values 
in objects in order to avoid as many map-of-different-type problems as 
possible.
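
The conventions described above can be sketched roughly as follows. This is not the code from the pull request: it is a small illustration using Python's xml.sax, and the class name, the keep_prefix parameter (standing in for keepPrefix), the decision to skip xmlns declarations, and the rule that a repeated tag becomes an array on its second occurrence are all assumptions made for the example.

```python
import json
import xml.sax


class XmlToJsonHandler(xml.sax.ContentHandler):
    """Illustrative SAX handler: attributes become "@name" keys, element
    text goes into "#value", repeated tags are collected into arrays and
    empty elements are dropped."""

    def __init__(self, keep_prefix=True):
        super().__init__()
        self.keep_prefix = keep_prefix
        self.root = {}
        self.stack = [self.root]  # dicts for the currently open elements
        self.text = []            # one text buffer per open element

    def _name(self, qname):
        # keep_prefix=False strips a namespace prefix such as "ns:".
        return qname if self.keep_prefix else qname.split(":")[-1]

    def startElement(self, name, attrs):
        node = {}
        for key, value in attrs.items():
            if key == "xmlns" or key.startswith("xmlns:"):
                continue  # assumption: namespace declarations are not data
            node["@" + self._name(key)] = value
        self.stack.append(node)
        self.text.append([])

    def characters(self, content):
        if self.text:
            self.text[-1].append(content)

    def endElement(self, name):
        node = self.stack.pop()
        value = "".join(self.text.pop()).strip()
        if value:
            node["#value"] = value
        if not node:
            return  # empty element: nothing to store
        parent = self.stack[-1]
        key = self._name(name)
        if key not in parent:
            parent[key] = node
        elif isinstance(parent[key], list):
            parent[key].append(node)
        else:
            parent[key] = [parent[key], node]  # second occurrence: make array


def xml_to_json(xml_text, keep_prefix=True):
    handler = XmlToJsonHandler(keep_prefix)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root


doc = '<order id="7"><item>a</item><item>b</item><note/></order>'
converted = xml_to_json(doc)
print(json.dumps(converted, indent=2))
```

In this sketch a tag that occurs once is stored as an object and a tag that occurs several times is stored as an array, so the same field can change type between documents; a reader that needs uniform types may prefer to always emit arrays.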

Current limitations:
DTD tags are currently not liked. 
The schema is not validated against XSDs.

Also: Since I am not a Drill developer, I might have broken every possible 
rule of syntax, format, layout, and test frameworks, as well as how to submit 
pull requests. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/magpierre/drill DRILL-3878

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/451.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #451


commit 844f34a16e75719535ff94c54d5337746ea18c20
Author: MPierre 
Date:   2015-11-05T14:42:06Z

Initial commit

XML support in Apache Drill

commit 592b3af06c2ff45198136577561f2ec1f7caaee0
Author: MPierre 
Date:   2015-11-05T21:21:42Z

Fixed some minor outstanding bugs

EasyRecordReader have a new field userName, and I forgot to change
jsonProcessor to protected from private.

commit 8fad811edab43d3499b41bb66cb419248d11208f
Author: MPierre 
Date:   2015-11-09T08:59:08Z

Merge remote-tracking branch 'apache/master' into DRILL-3878

commit 38f4884fe9b8456c1cde5de44c1e54177301a974
Author: MPierre 
Date:   2016-03-16T11:33:15Z

Syncing to latest release of drill

commit 909c5dec8bdb01bfe0ed358ebc64c959785738df
Author: MPierre 
Date:   2016-03-16T11:34:10Z

syncing to latest release of drill

commit 597d9657d613fa35df2c10dff23681545b13e531
Author: MPierre 
Date:   2016-03-18T08:55:51Z

Cleaned up deliver

Cleaned up the output generated by the SAX Parser, and removed all
unnecessary code.

commit 0cfaa31ab9af89833417288a290d21d0ce88c4ac
Author: MPierre 
Date:   2016-03-18T10:29:51Z

Merge remote-tracking branch 'apache/master' into DRILL-3878

commit aaaff05eb921125ad64854c89c179292c4441fb7
Author: MPierre 
Date:   2016-03-24T13:05:53Z

Adjusted output from Parser to fit Drill better

I have adjusted the SAX parser to produce JSON that Drill likes. Among
the things corrected is to remove empty objects from the tree built.
And to consolidate repeating values in arrays.

commit ba19a356d850224c01b9e807183377b46cf7e545
Author: MPierre 
Date:   2016-03-24T13:10:57Z

Fixed small typo

commit 8ba6705be42c7847d469611ab070b869e0c76d8c
Author: MPierre 
Date:   2016-03-24T21:17:30Z

Further enhancements of the output format to fit Drill

commit e2273f13b8e0136a33c1576c4667f16e23e1631c
Author: MPierre 
Date:   2016-03-24T21:22:41Z

Removed comment

commit c1b6ff8375a7e3c8161167d1a5f2b34ba165e750
Author: MPierre 
Date:   2016-03-29T12:48:53Z

Merge remote-tracking branch 'apache/master' into DRILL-3878




> Support XML Querying (selects/projections, no writing)

[jira] [Updated] (DRILL-4458) JDBC plugin case sensitive table names

2016-03-29 Thread Serge Harnyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Harnyk updated DRILL-4458:

External issue ID: DRILL-3993

> JDBC plugin case sensitive table names
> --
>
> Key: DRILL-4458
> URL: https://issues.apache.org/jira/browse/DRILL-4458
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
> Environment: Drill embedded mode on OSX, connecting to MS SQLServer
>Reporter: Paul Mogren
>Assignee: Serge Harnyk
>Priority: Minor
>
> I just tried Drill with MS SQL Server and I found that Drill treats table
> names case-sensitively, contrary to
> https://drill.apache.org/docs/lexical-structure/ which indicates that
> table names are "case-insensitive unless enclosed in double quotation
> marks". This presents a problem for users and existing SQL scripts that
> expect table names to be case-insensitive.
> This works: select * from mysandbox.dbo.AD_Role
> This does not work: select * from mysandbox.dbo.ad_role
> Mailing list reference including stack trace: 
> http://mail-archives.apache.org/mod_mbox/drill-user/201603.mbox/%3ccajrw0otv8n5ybmvu6w_efe4npgenrdk5grmh9jtbxu9xnni...@mail.gmail.com%3e
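
The rule quoted above (table names are case-insensitive unless enclosed in double quotation marks) can be illustrated with a small lookup. This is only a sketch of the expected behavior, not the JDBC plugin's code; resolve_table and the sample catalog are hypothetical.

```python
def resolve_table(identifier, catalog):
    """Return the catalog's spelling of a table name, or None if no match."""
    # A quoted identifier must match exactly, including case.
    if len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"':
        exact = identifier[1:-1]
        return exact if exact in catalog else None
    # An unquoted identifier matches regardless of case.
    matches = [t for t in catalog if t.lower() == identifier.lower()]
    return matches[0] if len(matches) == 1 else None


catalog = ["AD_Role"]  # hypothetical table list reported by the database
print(resolve_table("ad_role", catalog))
print(resolve_table('"ad_role"', catalog))
```

Under this rule both spellings in the report, AD_Role and ad_role, would resolve to the same table when unquoted.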





[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-03-29 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215507#comment-15215507
 ] 

Deneche A. Hakim commented on DRILL-3714:
-

[~jacq...@dremio.com] assuming this is indeed the test case requested, here is 
my understanding:

The fix proposed for this JIRA helps any message waiting in the 
CoordinationQueue for an ACK. The UserClient only really waits for the 
handshake and for the queryId of every submitted query; other than that, the 
CoordinationQueue should be empty most of the time.
The fix I proposed here may help the UserClient fail quickly if it's waiting 
for a queryId, but otherwise it will still hang for any query whose terminal 
state it hasn't received yet (DRILL-3743). That problem is worth fixing, but I 
think it should be done separately.
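
To make the idea of messages waiting for an ACK concrete, here is a rough sketch of a pending-ACK table whose outstanding entries can all be failed at once, for example when the connection is closed. It is not Drill's CoordinationQueue; the class, its method names, and the choice of failing on connection close are assumptions for illustration only.

```python
from concurrent.futures import Future


class PendingAcks:
    """Hypothetical table of requests that are still waiting for an ACK."""

    def __init__(self):
        self._pending = {}
        self._next_id = 0

    def register(self):
        # Called when a message is sent; the caller waits on the future.
        coord_id = self._next_id
        self._next_id += 1
        future = Future()
        self._pending[coord_id] = future
        return coord_id, future

    def ack(self, coord_id, payload=None):
        # Called when the remote side acknowledges the message.
        future = self._pending.pop(coord_id, None)
        if future is not None:
            future.set_result(payload)

    def fail_all(self, error):
        # Called on failure so that waiters fail quickly instead of hanging.
        pending, self._pending = self._pending, {}
        for future in pending.values():
            future.set_exception(error)


acks = PendingAcks()
handshake_id, handshake = acks.register()
query_id, query = acks.register()
acks.ack(handshake_id, "handshake ok")
acks.fail_all(ConnectionError("channel closed"))
print(handshake.result(), "|", repr(query.exception()))
```

A client that has already received its ACK is unaffected by fail_all, which mirrors the point above: a mechanism of this kind only helps requests still waiting for an ACK, not queries waiting for their terminal state.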

> Query runs out of memory and remains in CANCELLATION_REQUESTED state until 
> drillbit is restarted
> 
>
> Key: DRILL-3714
> URL: https://issues.apache.org/jira/browse/DRILL-3714
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log, 
> jstack.txt, query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>
>
> This is a variation of DRILL-3705; the difference is in Drill's behavior when 
> hitting an OOM condition.
> Query runs out of memory during execution and remains in 
> "CANCELLATION_REQUESTED" state until drillbit is bounced.
> Client (sqlline in this case) never gets a response from the server.
> Reproduction details:
> Single node drillbit installation.
> DRILL_MAX_DIRECT_MEMORY="8G"
> DRILL_HEAP="4G"
> Run this query on TPCDS SF100 data set
> {code}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS 
> TotalSpend FROM store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 
> LIMIT 10;
> {code}
> drillbit.log
> {code}
> 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO 
>  o.a.d.e.w.f.FragmentStatusReporter - 
> 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING
> 2015-08-26 16:55:50,498 [BitServer-5] WARN  
> o.a.drill.exec.rpc.data.DataServer - Message of mode REQUEST of rpc type 3 
> took longer than 500ms.  Actual duration was 2569ms.
> 2015-08-26 16:56:31,086 [BitServer-5] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct 
> buffer memory
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
>  ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
>  [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at