[jira] [Comment Edited] (DRILL-4116) Inconsistent results with datetime functions on different machines

2017-06-13 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031397#comment-16031397
 ] 

Vitalii Diravka edited comment on DRILL-4116 at 6/13/17 10:50 AM:
--

[~rkins] It looks like the output relies on the local timezone (but should not, 
the result should be the same for any timezone for this function). Could you 
double-check it by querying (for both machines):
{code}
select datediff(date '1996-03-01', timestamp '1997-02-10 17:32:00.0'), 
timeofday() from cp.`tpch/lineitem.parquet` limit 1;
{code}
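One way a one-day discrepancy of this kind can arise is sketched below. This is an illustration only, not a claim about how Drill's datediff is implemented: it assumes a hypothetical implementation that treats the date as UTC midnight, interprets the timestamp in the machine's zone, and truncates the millisecond gap to whole days. The class and method names are made up for the example; only java.time calls are used.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

class DateDiffSketch {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Timezone-independent: compares calendar dates only.
    static long calendarDayDiff(LocalDate a, LocalDateTime b) {
        return ChronoUnit.DAYS.between(b.toLocalDate(), a);
    }

    // Zone-sensitive (hypothetical buggy variant): the date is taken as UTC
    // midnight, the timestamp is interpreted in `zone`, and the millisecond
    // gap is truncated toward zero to whole days.
    static long mixedZoneDayDiff(LocalDate a, LocalDateTime b, ZoneId zone) {
        long aMillis = a.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long bMillis = b.atZone(zone).toInstant().toEpochMilli();
        return (aMillis - bMillis) / MILLIS_PER_DAY;
    }
}
```

With the literals from the query, the calendar-based variant gives -346 in any zone, while the mixed variant gives -346 for UTC and -347 for a fixed UTC-8 offset, which matches the pair of values reported on the two systems.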




was (Author: vitalii):
[~rkins] It looks like the output relies on the local timezone (but should not, 
the result should be the same for any timezone for this function). Could you 
double-check it by querying (for both machines):
{code}
select datediff(date '1996-03-01', timestamp '1997-02-10 17:32:00.0'), 
tiemofday() from cp.`tpch/lineitem.parquet` limit 1;
{code}



> Inconsistent results with datetime functions on different machines
> --
>
> Key: DRILL-4116
> URL: https://issues.apache.org/jira/browse/DRILL-4116
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Critical
>
> git.commit.id.abbrev=a6a0fc3
> The below query yields different results on different machines
> System 1 :
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select datediff(date '1996-03-01', 
> timestamp '1997-02-10 17:32:00.0') from cp.`tpch/lineitem.parquet` limit 1;
> +-+
> | EXPR$0  |
> +-+
> | -346|
> +-+
> 1 row selected (1.57 seconds)
> {code}
> System 2 :
> {code}
> 0: jdbc:drill:drillbit=10.10.88.193> select datediff(date '1996-03-01', 
> timestamp '1997-02-10 17:32:00.0') from cp.`tpch/lineitem.parquet` limit 1;
> +-+
> | EXPR$0  |
> +-+
> | -347|
> +-+
> 1 row selected (1.239 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5583) Literal expression not handled

2017-06-13 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5583:
---

 Summary: Literal expression not handled
 Key: DRILL-5583
 URL: https://issues.apache.org/jira/browse/DRILL-5583
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Affects Versions: 1.9.0
Reporter: Muhammad Gelbana


The following query
{code:sql}
SELECT ((UNIX_TIMESTAMP(Calcs.`date0`, '-MM-dd') / (60 * 60 * 24)) + (365 * 
70 + 17)) `TEMP(Test)(64617177)(0)` FROM `dfs`.`path_to_parquet` Calcs GROUP BY 
((UNIX_TIMESTAMP(Calcs.`date0`, '-MM-dd') / (60 * 60 * 24)) + (365 * 70 + 
17))
{code}
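As a side note on the constants in the query above, they can be checked directly. The sketch below assumes the intent is to turn Unix seconds into a day count and shift it to an epoch of 1900-01-01; the issue does not state this intent, and the class name is hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class EpochOffsetSketch {
    // Seconds in one day, as written in the query: 60 * 60 * 24.
    static final long SECONDS_PER_DAY = 60L * 60 * 24;

    // Day offset as written in the query: 365 * 70 + 17
    // (70 years plus the 17 leap days between 1900 and 1970).
    static final long DAY_OFFSET = 365L * 70 + 17;

    // Days between 1900-01-01 and the Unix epoch, computed from the calendar.
    static long daysFrom1900ToUnixEpoch() {
        return ChronoUnit.DAYS.between(LocalDate.of(1900, 1, 1), LocalDate.of(1970, 1, 1));
    }
}
```

Both constant subexpressions reduce to plain integers (86400 and 25567), so the arithmetic itself is well defined; the failure reported here concerns how the expression is handled during planning, not its value.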

Throws the following exception
{noformat}
[Error Id: 5ee33c0f-9edc-43a0-8125-3e6499e72410 on mgelbana:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError: 
Internal error: invalid literal: 60 + 2


[Error Id: 5ee33c0f-9edc-43a0-8125-3e6499e72410 on mgelbana:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
 ~[drill-common-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:825)
 [drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:935) 
[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) 
[drill-java-exec-1.9.0.jar:1.9.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_131]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
exception during fragment initialization: Internal error: invalid literal: 60 + 
2
... 4 common frames omitted
Caused by: java.lang.AssertionError: Internal error: invalid literal: 60 + 2
at org.apache.calcite.util.Util.newInternal(Util.java:777) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlLiteral.value(SqlLiteral.java:329) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlCallBinding.getOperandLiteralValue(SqlCallBinding.java:219)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlBinaryOperator.getMonotonicity(SqlBinaryOperator.java:188)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.drill.exec.planner.sql.DrillCalciteSqlOperatorWrapper.getMonotonicity(DrillCalciteSqlOperatorWrapper.java:107)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:175) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlCallBinding.getOperandMonotonicity(SqlCallBinding.java:193)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.fun.SqlMonotonicBinaryOperator.getMonotonicity(SqlMonotonicBinaryOperator.java:59)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.drill.exec.planner.sql.DrillCalciteSqlOperatorWrapper.getMonotonicity(DrillCalciteSqlOperatorWrapper.java:107)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:175) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SelectScope.getMonotonicity(SelectScope.java:154)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:2476)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:2374)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:603)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:564)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2769)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:518)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.drill.exec.planner.sql.SqlConverter.toRel(SqlConverter.java:263) 
~[drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:626)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:195)
 ~[drill-java-exec-1.9.0.jar:1.9.0]

[jira] [Updated] (DRILL-5583) Literal expression not handled

2017-06-13 Thread Muhammad Gelbana (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muhammad Gelbana updated DRILL-5583:

Description: 
The following query
{code:sql}
SELECT ((UNIX_TIMESTAMP(Calcs.`date0`, '-MM-dd') / (60 * 60 * 24)) + (365 * 
70 + 17)) `TEMP(Test)(64617177)(0)` FROM `dfs`.`path_to_parquet` Calcs GROUP BY 
((UNIX_TIMESTAMP(Calcs.`date0`, '-MM-dd') / (60 * 60 * 24)) + (365 * 70 + 
17))
{code}

Throws the following exception
{noformat}
[Error Id: 5ee33c0f-9edc-43a0-8125-3e6499e72410 on mgelbana:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError: 
Internal error: invalid literal: 60 * 60 * 24


[Error Id: 5ee33c0f-9edc-43a0-8125-3e6499e72410 on mgelbana:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
 ~[drill-common-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:825)
 [drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:935) 
[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) 
[drill-java-exec-1.9.0.jar:1.9.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_131]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
exception during fragment initialization: Internal error: invalid literal: 60 + 
2
... 4 common frames omitted
Caused by: java.lang.AssertionError: Internal error: invalid literal: 60 + 2
at org.apache.calcite.util.Util.newInternal(Util.java:777) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlLiteral.value(SqlLiteral.java:329) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlCallBinding.getOperandLiteralValue(SqlCallBinding.java:219)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlBinaryOperator.getMonotonicity(SqlBinaryOperator.java:188)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.drill.exec.planner.sql.DrillCalciteSqlOperatorWrapper.getMonotonicity(DrillCalciteSqlOperatorWrapper.java:107)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:175) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlCallBinding.getOperandMonotonicity(SqlCallBinding.java:193)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.fun.SqlMonotonicBinaryOperator.getMonotonicity(SqlMonotonicBinaryOperator.java:59)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.drill.exec.planner.sql.DrillCalciteSqlOperatorWrapper.getMonotonicity(DrillCalciteSqlOperatorWrapper.java:107)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:175) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SelectScope.getMonotonicity(SelectScope.java:154)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:2476)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:2374)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:603)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:564)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2769)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:518)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.drill.exec.planner.sql.SqlConverter.toRel(SqlConverter.java:263) 
~[drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:626)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:195)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at 

[jira] [Created] (DRILL-5584) When Compiling Apache Drill C++ Client, versioning information are not present in the binary

2017-06-13 Thread Rob Wu (JIRA)
Rob Wu created DRILL-5584:
-

 Summary: When Compiling Apache Drill C++ Client, versioning 
information are not present in the binary
 Key: DRILL-5584
 URL: https://issues.apache.org/jira/browse/DRILL-5584
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - C++
Affects Versions: 1.10.0
Reporter: Rob Wu
Priority: Minor


We should add support for generating an RC file containing the versioning 
information so this manual task can be automated.

Current workaround:
Compile the C++ Client DLL.
Open the DLL and manually add a Version Resource with the following information:
FILEVERSION       1,10,0,0
PRODUCTVERSION    1,10,0,0
CompanyName
FileDescription   Apache Drill C++ Client
FileVersion       1.10.0.0
InternalName      drillClient.dll
LegalCopyright    Copyright (c) 2013-2017 The Apache Software Foundation
OriginalFilename  drillClient.dll
ProductName       Apache Drill C++ Client
ProductVersion    1.10.0.0






[jira] [Commented] (DRILL-5503) Disabling exchanges results in "Unable to allocate sv2 buffer" error within the managed external sort code

2017-06-13 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048454#comment-16048454
 ] 

Paul Rogers commented on DRILL-5503:


Related issue: in another test case, we observed that disabling exchanges 
somehow resets the sort's memory limit to the default of 10GB. Not sure if it 
is also happening in this case, but it is worth a look.

> Disabling exchanges results in "Unable to allocate sv2 buffer" error within 
> the managed external sort code
> --
>
> Key: DRILL-5503
> URL: https://issues.apache.org/jira/browse/DRILL-5503
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Attachments: drill5503.log, failure.sys.drill, success.sys.drill
>
>
> Setup :
> {code}
> git.commit.id.abbrev=1e0a14c
> No of drillbits : 1
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> {code}
> The below successfully completes
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.memory.max_query_memory_per_node` = 6260;
> alter session set `planner.width.max_per_query` = 17;
> select count(*) from (select * from 
> dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by 
> columns[0]) d where d.columns[0] = '4041054511';
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> 1 row selected (814.104 seconds)
> {code}
> However if I disable exchanges, I get the following error
> {code}
> alter session set `planner.disable_exchanges` = true;
> select count(*) from (select * from 
> dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by 
> columns[0]) d where d.columns[0] = '4041054511';
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> 1 row selected (814.104 seconds)
> {code}
> I attached the profile and the log file. The data set used is too large to 
> attach here. 





[jira] [Commented] (DRILL-5587) Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option

2017-06-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048718#comment-16048718
 ] 

ASF GitHub Bot commented on DRILL-5587:
---

GitHub user ppadma opened a pull request:

https://github.com/apache/drill/pull/852

DRILL-5587: Validate Parquet blockSize and pageSize configured with S…

…YSTEM/SESSION option

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ppadma/drill DRILL-5587

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/852.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #852


commit 8e0be0b283583be5ccfe32dc6b0f805424880fb4
Author: Padma Penumarthy 
Date:   2017-06-14T00:23:17Z

DRILL-5587: Validate Parquet blockSize and pageSize configured with 
SYSTEM/SESSION option




> Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option
> -
>
> Key: DRILL-5587
> URL: https://issues.apache.org/jira/browse/DRILL-5587
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.11.0
>
>
> We can set Parquet blockSize, pageSize and dictionary pageSize to any value. 
> These options use LongValidator, which does not actually validate the value. 
> All of these sizes are used as int in the code, so although a user can set 
> them to any value (including values greater than MAXINT or negative values), 
> parsing the value as int later in the code can throw an error. Instead, 
> restrict the values that can be set to at most MAXINT. 
> There is a bug open for validating system/session options in general. 
> https://issues.apache.org/jira/browse/DRILL-2478





[jira] [Created] (DRILL-5587) Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option

2017-06-13 Thread Padma Penumarthy (JIRA)
Padma Penumarthy created DRILL-5587:
---

 Summary: Validate Parquet blockSize and pageSize configured with 
SYSTEM/SESSION option
 Key: DRILL-5587
 URL: https://issues.apache.org/jira/browse/DRILL-5587
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.10.0
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy
 Fix For: 1.11.0


We can set Parquet blockSize, pageSize and dictionary pageSize to any value. 
These options use LongValidator, which does not actually validate the value. 
All of these sizes are used as int in the code, so although a user can set them 
to any value (including values greater than MAXINT or negative values), parsing 
the value as int later in the code can throw an error. Instead, restrict the 
values that can be set to at most MAXINT. 
There is a bug open for validating system/session options in general. 
https://issues.apache.org/jira/browse/DRILL-2478
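The restriction described above can be sketched as a range check applied when the option is set, before the value is ever narrowed to int. This is a minimal illustration under stated assumptions: the class name, constructor, and error messages are hypothetical and are not Drill's validator API, and the lower bound is left to the caller because the issue only names MAXINT as the upper limit.

```java
class IntRangeOptionValidator {
    private final String optionName;
    private final long min;
    private final long max;

    // Bounds must themselves fit in an int so that validate() can narrow safely.
    IntRangeOptionValidator(String optionName, long min, long max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        if (min < Integer.MIN_VALUE || max > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("bounds must fit in an int");
        }
        this.optionName = optionName;
        this.min = min;
        this.max = max;
    }

    // Returns the value as an int, or rejects it at set time if out of range.
    int validate(long value) {
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                "Option " + optionName + " must be between " + min + " and " + max
                    + ", got " + value);
        }
        return Math.toIntExact(value);
    }
}
```

Rejecting the value when the option is set means the user sees the error immediately, rather than a failure later when the writer parses the value as an int.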






[jira] [Created] (DRILL-5586) UnionAll operator does more than necessary value vector allocation and copy

2017-06-13 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-5586:
-

 Summary: UnionAll operator does more than necessary value vector 
allocation and copy
 Key: DRILL-5586
 URL: https://issues.apache.org/jira/browse/DRILL-5586
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jinfeng Ni


When the inputs to a UnionAll operator are simple field references, rather than 
expressions involving a function that requires evaluation, the operator should 
leverage the value vector's transfer API. A transfer avoids allocating a buffer 
for the value vector in the outgoing batch, as well as the overhead of copying 
the data from the incoming batch to the outgoing batch. 

For example, in the following query:
{code}
select l_orderkey from cp.`tpch/lineitem.parquet` l union all select 
n_nationkey from cp.`tpch/nation.parquet`
{code}

Both the left and right sides of the UnionAll operator are simple field 
references, and Drill should call the transfer API. However, the current code 
does buffer allocation and copy for both sides. Such processing significantly 
slows the UnionAll operator, and ultimately slows down query evaluation.

DRILL-5521 reverts a change made in DRILL-5419 to the logic that decides whether 
to apply transfer, based on a SchemaPath equality comparison. Even if we fix 
that problem, a SchemaPath equality comparison is not sufficient as the 
criterion for whether transfer should be used. Ideally, even when the output 
field and the incoming field have different names, the UnionAll operator should 
do {{transfer}} instead of {{copy}}, as long as the expression is a simple field 
reference. 

{code}
select l_orderkey as Key1 from cp.`tpch/lineitem.parquet` l union all select 
n_nationkey as Key2 from cp.`tpch/nation.parquet`
{code}
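The difference between the two strategies can be sketched with a toy vector type. This is an illustration only: `IntVectorSketch` and its methods are hypothetical and much simpler than Drill's value vectors, but they show why a transfer needs no new allocation and no per-value copy, and why it works regardless of what the output field is named.

```java
import java.util.Arrays;

class IntVectorSketch {
    private static final int[] EMPTY = new int[0];

    private int[] buffer;

    IntVectorSketch(int[] buffer) {
        this.buffer = buffer;
    }

    int[] buffer() {
        return buffer;
    }

    // Copy: allocates a new buffer for the target and duplicates every value.
    void copyTo(IntVectorSketch target) {
        target.buffer = Arrays.copyOf(buffer, buffer.length);
    }

    // Transfer: hands the existing buffer to the target and leaves this
    // vector empty. No allocation and no per-value work is needed.
    void transferTo(IntVectorSketch target) {
        target.buffer = buffer;
        buffer = EMPTY;
    }
}
```

After `transferTo`, the target holds the very same array the source held, while after `copyTo` it holds an equal but distinct array; the cost of the copy grows with the batch size, the cost of the transfer does not.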






[jira] [Assigned] (DRILL-5586) UnionAll operator does more than necessary value vector allocation and copy

2017-06-13 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni reassigned DRILL-5586:
-

Assignee: Jinfeng Ni

> UnionAll operator does more than necessary value vector allocation and copy
> ---
>
> Key: DRILL-5586
> URL: https://issues.apache.org/jira/browse/DRILL-5586
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> When the inputs to a UnionAll operator are simple field references, rather 
> than expressions involving a function that requires evaluation, the operator 
> should leverage the value vector's transfer API. A transfer avoids allocating 
> a buffer for the value vector in the outgoing batch, as well as the overhead 
> of copying the data from the incoming batch to the outgoing batch. 
> For example, in the following query:
> {code}
> select l_orderkey from cp.`tpch/lineitem.parquet` l union all select 
> n_nationkey from cp.`tpch/nation.parquet`
> {code}
> Both the left and right sides of the UnionAll operator are simple field 
> references, and Drill should call the transfer API. However, the current code 
> does buffer allocation and copy for both sides. Such processing significantly 
> slows the UnionAll operator, and ultimately slows down query evaluation.
> DRILL-5521 reverts a change made in DRILL-5419 to the logic that decides 
> whether to apply transfer, based on a SchemaPath equality comparison. Even if 
> we fix that problem, a SchemaPath equality comparison is not sufficient as the 
> criterion for whether transfer should be used. Ideally, even when the output 
> field and the incoming field have different names, the UnionAll operator 
> should do {{transfer}} instead of {{copy}}, as long as the expression is a 
> simple field reference. 
> {code}
> select l_orderkey as Key1 from cp.`tpch/lineitem.parquet` l union all select 
> n_nationkey as Key2 from cp.`tpch/nation.parquet`
> {code}





[jira] [Created] (DRILL-5585) UnionAll operator generates run-time code for every incoming batch

2017-06-13 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-5585:
-

 Summary: UnionAll operator generates run-time code for every 
incoming batch
 Key: DRILL-5585
 URL: https://issues.apache.org/jira/browse/DRILL-5585
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jinfeng Ni
Assignee: Jinfeng Ni


In Drill's execution framework, each operator may generate run-time code for 
various purposes. Code generation and compilation should only happen when 
there is a new schema from the incoming batch ({{OK_NEW_SCHEMA}}). For any 
follow-up batch with the same schema ({{OK}}), the operator should not generate 
the run-time code again, since it is already available. 

However, in the current implementation of UnionAll, regardless of whether the 
incoming batch returns {{OK_NEW_SCHEMA}} or {{OK}}, it always calls doWork(), 
which essentially will 1) generate code and possibly compile it, 2) doSetup, 
3) doEvaluation. The code generation is unnecessary, and doing it for each 
batch significantly impacts the operator's performance and slows down query 
execution. 

{code}
case OK_NEW_SCHEMA:
  outputFields = unionAllInput.getOutputFields();
case OK:
  IterOutcome workOutcome = doWork();
{code}
On repeated run-time generation, code compilation can be skipped unless there 
is a miss in the code cache. However, the current logic is still problematic, 
since it has to use {{ClassGenerator}} to generate the run-time source code. 
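The intended behaviour can be sketched as follows: generate the run-time code only when a new schema arrives, and reuse it for every later batch with the same schema. This is a simplified illustration, not the UnionAll implementation; the class, the `Outcome` enum, and the counters are hypothetical, and a `Runnable` stands in for the generated code.

```java
class UnionAllCodegenSketch {
    enum Outcome { OK_NEW_SCHEMA, OK }

    private int generationCount = 0;
    private int processedBatches = 0;
    private Runnable projector; // stands in for the generated run-time code

    // Hypothetical stand-in for code generation plus compilation.
    private Runnable generateProjector() {
        generationCount++;
        return () -> processedBatches++;
    }

    void onBatch(Outcome outcome) {
        switch (outcome) {
            case OK_NEW_SCHEMA:
                // Schema changed: this is the only place code is generated.
                projector = generateProjector();
                // fall through: the new-schema batch also carries data
            case OK:
                if (projector == null) {
                    throw new IllegalStateException("OK received before OK_NEW_SCHEMA");
                }
                projector.run(); // reuse the existing code for this batch
                break;
            default:
                throw new IllegalArgumentException("Unexpected outcome: " + outcome);
        }
    }

    int generationCount() {
        return generationCount;
    }

    int processedBatches() {
        return processedBatches;
    }
}
```

For a sequence of OK_NEW_SCHEMA, OK, OK, OK_NEW_SCHEMA, this sketch generates code twice while processing four batches, instead of once per batch.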






[jira] [Created] (DRILL-5582) [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit

2017-06-13 Thread Rob Wu (JIRA)
Rob Wu created DRILL-5582:
-

 Summary: [Threat Modeling] Drillbit may be spoofed by an attacker 
and this may lead to data being written to the attacker's target instead of 
Drillbit
 Key: DRILL-5582
 URL: https://issues.apache.org/jira/browse/DRILL-5582
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Rob Wu
Priority: Minor


Consider the scenario:
Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication 
enabled containing important data. Bob, the attacker, attempts to spoof the 
connection and redirect it to his own drillbit (fake.drillbit.co) with no 
authentication setup. 

When Alice is under attack and attempts to connect to her secure drillbit, she 
is actually authenticating against Bob's drillbit. At this point, the 
connection should have failed due to unmatched configuration. However, the 
current implementation will return SUCCESS as long as the (spoofing) drillbit 
has no authentication requirement set.

Currently, the drillbit <-to-> drill client connection accepts the lowest 
authentication configuration set on the server. This leaves the user vulnerable 
to spoofing. 
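One possible client-side mitigation for the downgrade described above is to let the client state which authentication mechanisms it insists on and to fail the connection when the server offers none of them. The sketch below is an assumption for illustration, not the Drill client API: the class, method, and mechanism strings are hypothetical. It only prevents a silent downgrade; proving the server's identity still depends on the mechanism itself (for example, mutual authentication).

```java
import java.util.Set;

class ClientAuthPolicySketch {
    // Returns true only if the connection may proceed under the client's policy.
    static boolean acceptServer(Set<String> clientRequired, Set<String> serverOffered) {
        if (clientRequired.isEmpty()) {
            // The client did not ask for authentication, so nothing can be downgraded.
            return true;
        }
        for (String mechanism : clientRequired) {
            if (serverOffered.contains(mechanism)) {
                return true;
            }
        }
        // The client expected authentication but the server offers none of the
        // expected mechanisms (for example, a spoofing drillbit with no auth).
        return false;
    }
}
```

In the scenario above, Alice's client would require, say, Kerberos; Bob's drillbit advertises no mechanisms, so the check returns false and the connection fails instead of reporting success.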





[jira] [Updated] (DRILL-5582) [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit

2017-06-13 Thread Rob Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Wu updated DRILL-5582:
--
Description: 
*Consider the scenario:*
Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication 
enabled containing important data. Bob, the attacker, attempts to spoof the 
connection and redirect it to his own drillbit (fake.drillbit.co) with no 
authentication setup. 

When Alice is under attack and attempts to connect to her secure drillbit, she 
is actually authenticating against Bob's drillbit. At this point, the 
connection should have failed due to unmatched configuration. However, the 
current implementation will return SUCCESS as long as the (spoofing) drillbit 
has no authentication requirement set.

Currently, the drillbit <-  to  -> drill client connection accepts the lowest 
authentication configuration set on the server. This leaves unsuspecting user 
vulnerable to spoofing. 

  was:
Consider the scenario:
Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication 
enabled containing important data. Bob, the attacker, attempts to spoof the 
connection and redirect it to his own drillbit (fake.drillbit.co) with no 
authentication setup. 

When Alice is under attack and attempts to connect to her secure drillbit, she 
is actually authenticating against Bob's drillbit. At this point, the 
connection should have failed due to unmatched configuration. However, the 
current implementation will return SUCCESS as long as the (spoofing) drillbit 
has no authentication requirement set.

Currently, the drillbit <-  to  -> drill client connection accepts the lowest 
authentication configuration set on the server. This leaves unsuspecting user 
vulnerable to spoofing. 


> [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to 
> data being written to the attacker's target instead of Drillbit
> -
>
> Key: DRILL-5582
> URL: https://issues.apache.org/jira/browse/DRILL-5582
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Rob Wu
>Priority: Minor
>
> *Consider the scenario:*
> Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication 
> enabled containing important data. Bob, the attacker, attempts to spoof the 
> connection and redirect it to his own drillbit (fake.drillbit.co) with no 
> authentication setup. 
> When Alice is under attack and attempts to connect to her secure drillbit, 
> she is actually authenticating against Bob's drillbit. At this point, the 
> connection should have failed due to unmatched configuration. However, the 
> current implementation will return SUCCESS as long as the (spoofing) drillbit 
> has no authentication requirement set.
> Currently, the drillbit <-  to  -> drill client connection accepts the lowest 
> authentication configuration set on the server. This leaves unsuspecting user 
> vulnerable to spoofing. 





[jira] [Updated] (DRILL-5582) [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit

2017-06-13 Thread Rob Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Wu updated DRILL-5582:
--
Description: 
Consider the scenario:
Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication 
enabled containing important data. Bob, the attacker, attempts to spoof the 
connection and redirect it to his own drillbit (fake.drillbit.co) with no 
authentication setup. 

When Alice is under attack and attempts to connect to her secure drillbit, she 
is actually authenticating against Bob's drillbit. At this point, the 
connection should have failed due to unmatched configuration. However, the 
current implementation will return SUCCESS as long as the (spoofing) drillbit 
has no authentication requirement set.

Currently, the drillbit <-  to  -> drill client connection accepts the lowest 
authentication configuration set on the server. This leaves unsuspecting user 
vulnerable to spoofing. 

  was:
Consider the scenario:
Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication 
enabled containing important data. Bob, the attacker, attempts to spoof the 
connection and redirect it to his own drillbit (fake.drillbit.co) with no 
authentication setup. 

When Alice is under attack and attempts to connect to her secure drillbit, she 
is actually authenticating against Bob's drillbit. At this point, the 
connection should have failed due to unmatched configuration. However, the 
current implementation will return SUCCESS as long as the (spoofing) drillbit 
has no authentication requirement set.

Currently, the drillbit <-  to  -> drill client connection accepts the lowest 
authentication configuration set on the server. This leaves the user vulnerable 
to spoofing. 


> [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to 
> data being written to the attacker's target instead of Drillbit
> -
>
> Key: DRILL-5582
> URL: https://issues.apache.org/jira/browse/DRILL-5582
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Rob Wu
>Priority: Minor
>
> Consider the scenario:
> Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication 
> enabled containing important data. Bob, the attacker, attempts to spoof the 
> connection and redirect it to his own drillbit (fake.drillbit.co) with no 
> authentication setup. 
> When Alice is under attack and attempts to connect to her secure drillbit, 
> she is actually authenticating against Bob's drillbit. At this point, the 
> connection should have failed due to unmatched configuration. However, the 
> current implementation will return SUCCESS as long as the (spoofing) drillbit 
> has no authentication requirement set.
> Currently, the drillbit <-  to  -> drill client connection accepts the lowest 
> authentication configuration set on the server. This leaves unsuspecting user 
> vulnerable to spoofing. 





[jira] [Updated] (DRILL-5582) [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit

2017-06-13 Thread Rob Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Wu updated DRILL-5582:
--
Description: 
Consider the scenario:
Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication 
enabled containing important data. Bob, the attacker, attempts to spoof the 
connection and redirect it to his own drillbit (fake.drillbit.co) with no 
authentication setup. 

When Alice is under attack and attempts to connect to her secure drillbit, she 
is actually authenticating against Bob's drillbit. At this point, the 
connection should have failed due to unmatched configuration. However, the 
current implementation will return SUCCESS as long as the (spoofing) drillbit 
has no authentication requirement set.

Currently, the drillbit <-  to  -> drill client connection accepts the lowest 
authentication configuration set on the server. This leaves the user vulnerable 
to spoofing. 

  was:
Consider the scenario:
Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication 
enabled containing important data. Bob, the attacker, attempts to spoof the 
connection and redirect it to his own drillbit (fake.drillbit.co) with no 
authentication setup. 

When Alice is under attack and attempts to connect to her secure drillbit, she 
is actually authenticating against Bob's drillbit. At this point, the 
connection should have failed due to unmatched configuration. However, the 
current implementation will return SUCCESS as long as the (spoofing) drillbit 
has no authentication requirement set.

Currently, the drillbit <-to-> drill client connection accepts the lowest 
authentication configuration set on the server. This leaves the user vulnerable 
to spoofing. 


> [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to 
> data being written to the attacker's target instead of Drillbit
> -
>
> Key: DRILL-5582
> URL: https://issues.apache.org/jira/browse/DRILL-5582
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Rob Wu
> Priority: Minor
>
> Consider the scenario:
> Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication 
> enabled containing important data. Bob, the attacker, attempts to spoof the 
> connection and redirect it to his own drillbit (fake.drillbit.co) with no 
> authentication setup. 
> When Alice is under attack and attempts to connect to her secure drillbit, 
> she is actually authenticating against Bob's drillbit. At this point, the 
> connection should have failed due to unmatched configuration. However, the 
> current implementation will return SUCCESS as long as the (spoofing) drillbit 
> has no authentication requirement set.
> Currently, the drillbit <-  to  -> drill client connection accepts the lowest 
> authentication configuration set on the server. This leaves the user 
> vulnerable to spoofing. 
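
The downgrade described in this issue can be illustrated with a minimal sketch. This is not Drill's client code: the function names, the `AUTH_REQUIRED` and `AUTH_MISMATCH` statuses, and the idea of a client-side list of expected mechanisms are all assumptions made for illustration. Only the `SUCCESS` outcome against a server with no authentication requirement comes from the report.

```python
from enum import Enum


class HandshakeStatus(Enum):
    SUCCESS = "SUCCESS"              # named in the report
    AUTH_REQUIRED = "AUTH_REQUIRED"  # hypothetical: client must now authenticate
    AUTH_MISMATCH = "AUTH_MISMATCH"  # hypothetical: client refuses the server


def lenient_handshake(server_mechanisms, client_expected):
    """Behaviour described in the report: only the server's settings matter.

    A server that advertises no mechanisms is accepted even if the client
    expected Kerberos or plain authentication.
    """
    if not server_mechanisms:
        return HandshakeStatus.SUCCESS
    return HandshakeStatus.AUTH_REQUIRED


def strict_handshake(server_mechanisms, client_expected):
    """Hypothetical mitigation: the client enforces its own expectation.

    If the client expects authentication, the server must offer at least
    one of the expected mechanisms, otherwise the connection is refused.
    """
    if client_expected:
        if not set(server_mechanisms) & set(client_expected):
            return HandshakeStatus.AUTH_MISMATCH
        return HandshakeStatus.AUTH_REQUIRED
    return lenient_handshake(server_mechanisms, client_expected)


# Alice expects her drillbit to require kerberos or plain authentication.
alice_expects = ["kerberos", "plain"]
# Bob's spoofing drillbit advertises no authentication at all.
bob_offers = []

print(lenient_handshake(bob_offers, alice_expects).name)
print(strict_handshake(bob_offers, alice_expects).name)
```

In this sketch the lenient path accepts Bob's drillbit, while the strict path refuses it because none of the mechanisms Alice expects are offered.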





[jira] [Updated] (DRILL-5581) Query with CASE statement returns wrong results

2017-06-13 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-5581:
--
Description: 
A query that uses a CASE statement returns wrong results.

{noformat}
Apache Drill 1.11.0-SNAPSHOT, commit id: 874bf629

[test@centos-101 ~]# cat order_sample.csv
202634342,2101,20160301

apache drill 1.11.0-SNAPSHOT
"this isn't your grandfather's sql"
0: jdbc:drill:schema=dfs.tmp> ALTER SESSION SET `store.format`='csv';
+-------+------------------------+
|  ok   |        summary         |
+-------+------------------------+
| true  | store.format updated.  |
+-------+------------------------+
1 row selected (0.245 seconds)
0: jdbc:drill:schema=dfs.tmp> CREATE VIEW  `vw_order_sample_csv` as
. . . . . . . . . . . . . . > SELECT
. . . . . . . . . . . . . . > `columns`[0] AS `ND`,
. . . . . . . . . . . . . . > CAST(`columns`[1] AS BIGINT) AS `col1`,
. . . . . . . . . . . . . . > CAST(`columns`[2] AS BIGINT) AS `col2`
. . . . . . . . . . . . . . > FROM `order_sample.csv`;
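+-------+----------------------------------------------------------------------+
|  ok   |                               summary                                |
+-------+----------------------------------------------------------------------+
| true  | View 'vw_order_sample_csv' created successfully in 'dfs.tmp' schema  |
+-------+----------------------------------------------------------------------+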
1 row selected (0.253 seconds)
0: jdbc:drill:schema=dfs.tmp> select
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 > col2 then col1
. . . . . . . . . . . . . . > else col2
. . . . . . . . . . . . . . > end as temp_col,
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 = 2101 and (20170302 - col2) > 1 then 'D'
. . . . . . . . . . . . . . > when col2 = 2101 then 'P'
. . . . . . . . . . . . . . > when col1 - col2 > 1 then '0'
. . . . . . . . . . . . . . > else 'A'
. . . . . . . . . . . . . . > end as status
. . . . . . . . . . . . . . > from  `vw_order_sample_csv`;
+-----------+---------+
| temp_col  | status  |
+-----------+---------+
| 20160301  | A       |
+-----------+---------+
1 row selected (0.318 seconds)

0: jdbc:drill:schema=dfs.tmp> explain plan for
. . . . . . . . . . . . . . > select
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 > col2 then col1
. . . . . . . . . . . . . . > else col2
. . . . . . . . . . . . . . > end as temp_col,
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 = 2101 and (20170302 - col2) > 1 then 'D'
. . . . . . . . . . . . . . > when col2 = 2101 then 'P'
. . . . . . . . . . . . . . > when col1 - col2 > 1 then '0'
. . . . . . . . . . . . . . > else 'A'
. . . . . . . . . . . . . . > end as status
. . . . . . . . . . . . . . > from  `vw_order_sample_csv`;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(temp_col=[CASE(>(CAST(ITEM($0, 1)):BIGINT, CAST(ITEM($0, 2)):BIGINT), CAST(ITEM($0, 1)):BIGINT, CAST(ITEM($0, 2)):BIGINT)], status=[CASE(AND(=(CAST(ITEM($0, 1)):BIGINT, 2101), >(-(20170302, CAST(ITEM($0, 2)):BIGINT), 1)), 'D', =(CAST(ITEM($0, 2)):BIGINT, 2101), 'P', >(-(CAST(ITEM($0, 1)):BIGINT, CAST(ITEM($0, 2)):BIGINT), 1), '0', 'A')])
00-02        Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/order_sample.csv, numFiles=1, columns=[`columns`[1], `columns`[2]], files=[maprfs:///tmp/order_sample.csv]]])

// Details of Java compiler from sys.options
0: jdbc:drill:schema=dfs.tmp> select name, status from sys.options where name like '%java_compiler%';
+----------------------------------------+----------+
|                  name                  |  status  |
+----------------------------------------+----------+
| exec.java.compiler.exp_in_method_size  | DEFAULT  |
| exec.java_compiler                     | DEFAULT  |
| exec.java_compiler_debug               | DEFAULT  |
| exec.java_compiler_janino_maxsize      | DEFAULT  |
+----------------------------------------+----------+
4 rows selected (0.21 seconds)

{noformat}
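
For reference, the expected value of the status column can be checked independently of Drill. With col1 = 2101 and col2 = 20160301, the first branch of the second CASE holds (20170302 - 20160301 = 10001, which is greater than 1), so status should be 'D', not 'A'. The sketch below runs the same CASE expressions in SQLite through Python's standard `sqlite3` module. SQLite is swapped in purely as a neutral SQL engine, and the table definition is an assumption standing in for the CSV-backed view.

```python
import sqlite3

# In-memory database standing in for the CSV file and the view (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_sample (nd TEXT, col1 INTEGER, col2 INTEGER)")
conn.execute("INSERT INTO order_sample VALUES ('202634342', 2101, 20160301)")

# Same CASE expressions as in the Drill query above.
row = conn.execute(
    """
    SELECT
      CASE
        WHEN col1 > col2 THEN col1
        ELSE col2
      END AS temp_col,
      CASE
        WHEN col1 = 2101 AND (20170302 - col2) > 1 THEN 'D'
        WHEN col2 = 2101 THEN 'P'
        WHEN col1 - col2 > 1 THEN '0'
        ELSE 'A'
      END AS status
    FROM order_sample
    """
).fetchone()

print(row)  # (20160301, 'D')
conn.close()
```

The temp_col value matches Drill's output; only the status column differs.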

Results from Postgres 9.3 for the same query; note the difference in results.

{noformat}
postgres=# create table order_sample(c1 varchar(50), c2 bigint, c3 bigint);
CREATE TABLE
postgres=# insert into order_sample values('202634342',2101,20160301);
INSERT 0 1
postgres=# select * from order_sample;
    c1     |  c2  |    c3
-----------+------+----------
 202634342 | 2101 | 20160301
(1 row)

postgres=# create view vw_order_sample_csv as 
select 
  c1 as ND,  
  CAST(c2 AS BIGINT) AS col1,  
  CAST(c3 AS BIGINT) AS col2   
FROM order_sample;
CREATE VIEW
postgres=# select
postgres-# case
postgres-# when col1 > col2 then col1
postgres-# else col2
postgres-# end as temp_col,
postgres-# case
postgres-# when col1 =