[jira] [Comment Edited] (DRILL-4116) Inconsistent results with datetime functions on different machines
[ https://issues.apache.org/jira/browse/DRILL-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031397#comment-16031397 ] Vitalii Diravka edited comment on DRILL-4116 at 6/13/17 10:50 AM: -- [~rkins] It looks like the output relies on the local timezone (but should not, the result should be the same for any timezone for this function). Could you double-check it by querying (for both machines): {code} select datediff(date '1996-03-01', timestamp '1997-02-10 17:32:00.0'), timeofday() from cp.`tpch/lineitem.parquet` limit 1; {code} was (Author: vitalii): [~rkins] It looks like the output relies on the local timezone (but should not, the result should be the same for any timezone for this function). Could you double-check it by querying (for both machines): {code} select datediff(date '1996-03-01', timestamp '1997-02-10 17:32:00.0'), tiemofday() from cp.`tpch/lineitem.parquet` limit 1; {code} > Inconsistent results with datetime functions on different machines > -- > > Key: DRILL-4116 > URL: https://issues.apache.org/jira/browse/DRILL-4116 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.3.0 >Reporter: Rahul Challapalli >Assignee: Vitalii Diravka >Priority: Critical > > git.commit.id.abbrev=a6a0fc3 > The below query yields different results on different machines > System 1 : > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> select datediff(date '1996-03-01', > timestamp '1997-02-10 17:32:00.0') from cp.`tpch/lineitem.parquet` limit 1; > +-+ > | EXPR$0 | > +-+ > | -346| > +-+ > 1 row selected (1.57 seconds) > {code} > System 2 : > {code} > 0: jdbc:drill:drillbit=10.10.88.193> select datediff(date '1996-03-01', > timestamp '1997-02-10 17:32:00.0') from cp.`tpch/lineitem.parquet` limit 1; > +-+ > | EXPR$0 | > +-+ > | -347| > +-+ > 1 row selected (1.239 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
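For readers following the comment above, here is a minimal Python sketch of one way a day difference can end up depending on the machine's timezone. It assumes, purely for illustration, that the DATE literal is turned into an instant at local midnight while the TIMESTAMP literal is read as UTC, and that the millisecond difference is then truncated to whole days. This is a hypothetical mechanism, not a confirmed diagnosis of DRILL-4116; `naive_datediff`, `calendar_datediff` and the offsets used are invented names and values.

```python
from datetime import datetime, timedelta, timezone

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def naive_datediff(date_str, ts_str, local_offset_hours):
    """Hypothetical: date read at local midnight, timestamp read as UTC."""
    local_tz = timezone(timedelta(hours=local_offset_hours))
    d = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=local_tz)
    ts = datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
    diff_ms = int((d - ts).total_seconds() * 1000)
    # Truncate toward zero, the way integer division behaves in Java.
    return int(diff_ms / MILLIS_PER_DAY)


def calendar_datediff(date_str, ts_str):
    """Timezone-independent variant: compare calendar dates only."""
    d = datetime.strptime(date_str, "%Y-%m-%d").date()
    ts = datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S.%f").date()
    return (d - ts).days


if __name__ == "__main__":
    args = ("1996-03-01", "1997-02-10 17:32:00.0")
    for offset in (0, 7):
        print(offset, naive_datediff(*args, offset))
    print("calendar", calendar_datediff(*args))
```

Under these assumptions a machine at UTC and a machine at UTC+7 disagree by one day, while the calendar-based variant gives the same answer everywhere.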
[jira] [Created] (DRILL-5583) Literal expression not handled
Muhammad Gelbana created DRILL-5583: --- Summary: Literal expression not handled Key: DRILL-5583 URL: https://issues.apache.org/jira/browse/DRILL-5583 Project: Apache Drill Issue Type: Bug Components: SQL Parser Affects Versions: 1.9.0 Reporter: Muhammad Gelbana The following query {code:sql} SELECT ((UNIX_TIMESTAMP(Calcs.`date0`, '-MM-dd') / (60 * 60 * 24)) + (365 * 70 + 17)) `TEMP(Test)(64617177)(0)` FROM `dfs`.`path_to_parquet` Calcs GROUP BY ((UNIX_TIMESTAMP(Calcs.`date0`, '-MM-dd') / (60 * 60 * 24)) + (365 * 70 + 17)) {code} Throws the following exception {noformat} [Error Id: 5ee33c0f-9edc-43a0-8125-3e6499e72410 on mgelbana:31010] org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError: Internal error: invalid literal: 60 + 2 [Error Id: 5ee33c0f-9edc-43a0-8125-3e6499e72410 on mgelbana:31010] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.9.0.jar:1.9.0] at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:825) [drill-java-exec-1.9.0.jar:1.9.0] at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:935) [drill-java-exec-1.9.0.jar:1.9.0] at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) [drill-java-exec-1.9.0.jar:1.9.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131] Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Internal error: invalid literal: 60 + 2 ... 
4 common frames omitted Caused by: java.lang.AssertionError: Internal error: invalid literal: 60 + 2 at org.apache.calcite.util.Util.newInternal(Util.java:777) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.SqlLiteral.value(SqlLiteral.java:329) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.SqlCallBinding.getOperandLiteralValue(SqlCallBinding.java:219) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.SqlBinaryOperator.getMonotonicity(SqlBinaryOperator.java:188) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.drill.exec.planner.sql.DrillCalciteSqlOperatorWrapper.getMonotonicity(DrillCalciteSqlOperatorWrapper.java:107) ~[drill-java-exec-1.9.0.jar:1.9.0] at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:175) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.SqlCallBinding.getOperandMonotonicity(SqlCallBinding.java:193) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.fun.SqlMonotonicBinaryOperator.getMonotonicity(SqlMonotonicBinaryOperator.java:59) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.drill.exec.planner.sql.DrillCalciteSqlOperatorWrapper.getMonotonicity(DrillCalciteSqlOperatorWrapper.java:107) ~[drill-java-exec-1.9.0.jar:1.9.0] at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:175) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.validate.SelectScope.getMonotonicity(SelectScope.java:154) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:2476) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:2374) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:603) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:564) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2769) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:518) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.drill.exec.planner.sql.SqlConverter.toRel(SqlConverter.java:263) ~[drill-java-exec-1.9.0.jar:1.9.0] at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:626) ~[drill-java-exec-1.9.0.jar:1.9.0] at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:195) ~[drill-java-exec-1.9.0.jar:1.9.0]
[jira] [Updated] (DRILL-5583) Literal expression not handled
[ https://issues.apache.org/jira/browse/DRILL-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Muhammad Gelbana updated DRILL-5583: Description: The following query {code:sql} SELECT ((UNIX_TIMESTAMP(Calcs.`date0`, '-MM-dd') / (60 * 60 * 24)) + (365 * 70 + 17)) `TEMP(Test)(64617177)(0)` FROM `dfs`.`path_to_parquet` Calcs GROUP BY ((UNIX_TIMESTAMP(Calcs.`date0`, '-MM-dd') / (60 * 60 * 24)) + (365 * 70 + 17)) {code} Throws the following exception {noformat} [Error Id: 5ee33c0f-9edc-43a0-8125-3e6499e72410 on mgelbana:31010] org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError: Internal error: invalid literal: 60 * 60 * 24 [Error Id: 5ee33c0f-9edc-43a0-8125-3e6499e72410 on mgelbana:31010] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.9.0.jar:1.9.0] at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:825) [drill-java-exec-1.9.0.jar:1.9.0] at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:935) [drill-java-exec-1.9.0.jar:1.9.0] at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) [drill-java-exec-1.9.0.jar:1.9.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131] Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Internal error: invalid literal: 60 + 2 ... 
4 common frames omitted Caused by: java.lang.AssertionError: Internal error: invalid literal: 60 + 2 at org.apache.calcite.util.Util.newInternal(Util.java:777) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.SqlLiteral.value(SqlLiteral.java:329) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.SqlCallBinding.getOperandLiteralValue(SqlCallBinding.java:219) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.SqlBinaryOperator.getMonotonicity(SqlBinaryOperator.java:188) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.drill.exec.planner.sql.DrillCalciteSqlOperatorWrapper.getMonotonicity(DrillCalciteSqlOperatorWrapper.java:107) ~[drill-java-exec-1.9.0.jar:1.9.0] at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:175) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.SqlCallBinding.getOperandMonotonicity(SqlCallBinding.java:193) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.fun.SqlMonotonicBinaryOperator.getMonotonicity(SqlMonotonicBinaryOperator.java:59) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.drill.exec.planner.sql.DrillCalciteSqlOperatorWrapper.getMonotonicity(DrillCalciteSqlOperatorWrapper.java:107) ~[drill-java-exec-1.9.0.jar:1.9.0] at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:175) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql.validate.SelectScope.getMonotonicity(SelectScope.java:154) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:2476) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:2374) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:603) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:564) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2769) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:518) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19] at org.apache.drill.exec.planner.sql.SqlConverter.toRel(SqlConverter.java:263) ~[drill-java-exec-1.9.0.jar:1.9.0] at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:626) ~[drill-java-exec-1.9.0.jar:1.9.0] at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:195) ~[drill-java-exec-1.9.0.jar:1.9.0] at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164) ~[drill-java-exec-1.9.0.jar:1.9.0] at
[jira] [Created] (DRILL-5584) When Compiling Apache Drill C++ Client, versioning information are not present in the binary
Rob Wu created DRILL-5584:
-
Summary: When Compiling Apache Drill C++ Client, versioning information are not present in the binary
Key: DRILL-5584
URL: https://issues.apache.org/jira/browse/DRILL-5584
Project: Apache Drill
Issue Type: Improvement
Components: Client - C++
Affects Versions: 1.10.0
Reporter: Rob Wu
Priority: Minor

We should add support for generating an RC file containing the versioning information so this manual task can be automated.

Current workaround: Compile the C++ Client DLL. Open the DLL and manually add a Version Resource with the following information:

FILEVERSION 1,10,0,0
PRODUCTVERSION 1,10,0,0
CompanyName
FileDescription Apache Drill C++ Client
FileVersion 1.10.0.0
InternalName drillClient.dll
LegalCopyright Copyright (c) 2013-2017 The Apache Software Foundation
OriginalFilename drillClient.dll
ProductName Apache Drill C++ Client
ProductVersion 1.10.0.0

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
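As a rough illustration of the automation requested above, the following Python sketch renders the listed fields into Windows resource-script (RC) text. The function name `render_version_rc`, the numeric resource ID `1`, and the language block `040904B0` (US English, Unicode) are assumptions made for the example and are not taken from the issue; how the output would be wired into the Drill C++ client build is left open.

```python
def render_version_rc(version: str) -> str:
    """Render a VERSIONINFO resource for a version such as '1.10.0'."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (4 - len(parts))  # pad to four fields, e.g. 1.10.0 -> 1.10.0.0
    comma = ",".join(str(p) for p in parts)
    dotted = ".".join(str(p) for p in parts)

    # Field values copied from the manual workaround in the issue;
    # CompanyName is left empty because the issue gives no value.
    strings = {
        "CompanyName": "",
        "FileDescription": "Apache Drill C++ Client",
        "FileVersion": dotted,
        "InternalName": "drillClient.dll",
        "LegalCopyright": "Copyright (c) 2013-2017 The Apache Software Foundation",
        "OriginalFilename": "drillClient.dll",
        "ProductName": "Apache Drill C++ Client",
        "ProductVersion": dotted,
    }
    values = "\n".join(f'      VALUE "{k}", "{v}"' for k, v in strings.items())

    return (
        "1 VERSIONINFO\n"
        f" FILEVERSION {comma}\n"
        f" PRODUCTVERSION {comma}\n"
        "BEGIN\n"
        '  BLOCK "StringFileInfo"\n'
        "  BEGIN\n"
        '    BLOCK "040904B0"\n'  # assumed language/codepage block
        "    BEGIN\n"
        f"{values}\n"
        "    END\n"
        "  END\n"
        '  BLOCK "VarFileInfo"\n'
        "  BEGIN\n"
        '    VALUE "Translation", 0x409, 1200\n'
        "  END\n"
        "END\n"
    )


if __name__ == "__main__":
    print(render_version_rc("1.10.0"))
```

A build step could write this text to a `.rc` file and hand it to the resource compiler, so the version only needs to be stated once.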
[jira] [Commented] (DRILL-5503) Disabling exchanges results in "Unable to allocate sv2 buffer" error within the managed external sort code
[ https://issues.apache.org/jira/browse/DRILL-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048454#comment-16048454 ] Paul Rogers commented on DRILL-5503: Related issue: in another test case, we observed that disabling exchanges somehow resets the sort's memory limit to the default of 10GB. Not sure if it is also happening in this case, but it is worth a look. > Disabling exchanges results in "Unable to allocate sv2 buffer" error within > the managed external sort code > -- > > Key: DRILL-5503 > URL: https://issues.apache.org/jira/browse/DRILL-5503 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Attachments: drill5503.log, failure.sys.drill, success.sys.drill > > > Setup : > {code} > git.commit.id.abbrev=1e0a14c > No of drillbits : 1 > DRILL_MAX_DIRECT_MEMORY="32G" > DRILL_MAX_HEAP="4G" > {code} > The below successfully completes > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 6260; > alter session set `planner.width.max_per_query` = 17; > select count(*) from (select * from > dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by > columns[0]) d where d.columns[0] = '4041054511'; > +-+ > | EXPR$0 | > +-+ > | 0 | > +-+ > 1 row selected (814.104 seconds) > {code} > However if I disable exchanges, I get the following error > {code} > alter session set `planner.disable_exchanges` = true; > select count(*) from (select * from > dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by > columns[0]) d where d.columns[0] = '4041054511'; > +-+ > | EXPR$0 | > +-+ > | 0 | > +-+ > 1 row selected (814.104 seconds) > {code} > I attached the profile and the log file. The data set used is too large to > attach here. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5587) Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option
[ https://issues.apache.org/jira/browse/DRILL-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048718#comment-16048718 ] ASF GitHub Bot commented on DRILL-5587: --- GitHub user ppadma opened a pull request: https://github.com/apache/drill/pull/852 DRILL-5587: Validate Parquet blockSize and pageSize configured with S… …YSTEM/SESSION option You can merge this pull request into a Git repository by running: $ git pull https://github.com/ppadma/drill DRILL-5587 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/852.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #852 commit 8e0be0b283583be5ccfe32dc6b0f805424880fb4 Author: Padma Penumarthy Date: 2017-06-14T00:23:17Z DRILL-5587: Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option > Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option > - > > Key: DRILL-5587 > URL: https://issues.apache.org/jira/browse/DRILL-5587 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.10.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy > Fix For: 1.11.0 > > > We can set Parquet blockSize, pageSize and dictionary pageSize to any value. > It uses LongValidator which is not exactly validating the value. Since all > these sizes are used as int in the code, even though user is able to set them > to any value (could be greater than MAXINT and/or negative), parsing the > value later in the code as int can throw an error. Instead, restrict the > value that can be set to MAXINT. > There is a bug open for validating system/session options in general. > https://issues.apache.org/jira/browse/DRILL-2478 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (DRILL-5587) Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option
Padma Penumarthy created DRILL-5587: --- Summary: Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option Key: DRILL-5587 URL: https://issues.apache.org/jira/browse/DRILL-5587 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet Affects Versions: 1.10.0 Reporter: Padma Penumarthy Assignee: Padma Penumarthy Fix For: 1.11.0 We can set Parquet blockSize, pageSize and dictionary pageSize to any value. It uses LongValidator which is not exactly validating the value. Since all these sizes are used as int in the code, even though user is able to set them to any value (could be greater than MAXINT and/or negative), parsing the value later in the code as int can throw an error. Instead, restrict the value that can be set to MAXINT. There is a bug open for validating system/session options in general. https://issues.apache.org/jira/browse/DRILL-2478 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
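The restriction proposed above can be sketched as a range check applied when the option is set, rather than at the point where the value is later narrowed to an int. The Python below is illustrative only: `RangeLongValidator` is a hypothetical class, the option name `store.parquet.block-size` and the lower bound of 1 are assumptions that do not appear in the issue, and none of this is Drill's actual validator code.

```python
INT_MAX = 2**31 - 1  # largest value a Java int can hold


class RangeLongValidator:
    """Hypothetical validator: accept only integers within [min_value, max_value]."""

    def __init__(self, option_name, min_value, max_value):
        self.option_name = option_name
        self.min_value = min_value
        self.max_value = max_value

    def validate(self, value):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{self.option_name} must be an integer")
        if not (self.min_value <= value <= self.max_value):
            raise ValueError(
                f"{self.option_name} must be between {self.min_value} "
                f"and {self.max_value}, got {value}"
            )
        return value


if __name__ == "__main__":
    block_size = RangeLongValidator("store.parquet.block-size", 1, INT_MAX)
    print(block_size.validate(512 * 1024 * 1024))
```

Rejecting the value at set time gives the user an immediate, named error instead of a failure later in the write path.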
[jira] [Created] (DRILL-5586) UnionAll operator does more than necessary value vector allocation and copy
Jinfeng Ni created DRILL-5586: - Summary: UnionAll operator does more than necessary value vector allocation and copy Key: DRILL-5586 URL: https://issues.apache.org/jira/browse/DRILL-5586 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni When the inputs to a UnionAll operator are simple field references, instead of expressions involving a function that requires evaluation, the operator should leverage the value vector's transfer API. Doing a transfer would avoid allocating a buffer for the value vector in the outgoing batch, plus the overhead of copying the data from the incoming batch to the outgoing batch. For example, in the following query: {code} select l_orderkey from cp.`tpch/lineitem.parquet` l union all select n_nationkey from cp.`tpch/nation.parquet` {code} Both the left and right sides of the UnionAll operator are simple field references, and Drill should call the transfer API. However, the current code does buffer allocation & copy for both left and right. Such processing would significantly slow the UnionAll operator's performance, and eventually slow down query evaluation. DRILL-5521 reverts a change, made in DRILL-5419, to the logic that decides whether to apply transfer, which was based on SchemaPath equal comparison. Even if we fix that problem, it's not enough to use SchemaPath equal comparison as the criterion for whether transfer should be used. Ideally, even if the output field and incoming field have different names, the UnionAll operator should do {{transfer}}, instead of {{copy}}, as long as the expression is a simple field reference. {code} select l_orderkey as Key1 from cp.`tpch/lineitem.parquet` l union all select n_nationkey as Key2 from cp.`tpch/nation.parquet` {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
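The distinction drawn above between {{transfer}} and {{copy}} can be illustrated with a toy model. In the Python sketch below, `FieldRef`, `FunctionCall`, `Vector` and `move_column` are invented stand-ins, not Drill classes; the point is only that the decision depends on the shape of the expression and not on whether the output name matches the input name.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FieldRef:
    """A plain column reference such as l_orderkey."""
    name: str


@dataclass
class FunctionCall:
    """An expression that needs evaluation, e.g. abs(l_orderkey)."""
    name: str
    args: Tuple


@dataclass
class Vector:
    buffer: Optional[List[int]] = None

    def transfer_to(self, target: "Vector") -> None:
        # Hand the existing buffer over; nothing is allocated or copied.
        target.buffer, self.buffer = self.buffer, None

    def copy_to(self, target: "Vector") -> None:
        # Allocate a new buffer and copy every value into it.
        target.buffer = list(self.buffer)


def move_column(expr, output_name: str, incoming: Vector, outgoing: Vector) -> str:
    """Pick transfer for simple field references, whatever the output name is."""
    if isinstance(expr, FieldRef):
        incoming.transfer_to(outgoing)
        return "transfer"
    # Stand-in for real expression evaluation, which needs its own output buffer.
    incoming.copy_to(outgoing)
    return "copy"


if __name__ == "__main__":
    src, dst = Vector([1, 2, 3]), Vector()
    print(move_column(FieldRef("l_orderkey"), "Key1", src, dst), dst.buffer)
```

Here `l_orderkey as Key1` still takes the transfer path, matching the behavior the issue asks for when only the name differs.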
[jira] [Assigned] (DRILL-5586) UnionAll operator does more than necessary value vector allocation and copy
[ https://issues.apache.org/jira/browse/DRILL-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-5586: - Assignee: Jinfeng Ni > UnionAll operator does more than necessary value vector allocation and copy > --- > > Key: DRILL-5586 > URL: https://issues.apache.org/jira/browse/DRILL-5586 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > > When the inputs to a UnionAll operator are simple field references, instead > of expressions involving a function that requires evaluation, the operator > should leverage the value vector's transfer API. Doing a transfer would avoid > allocating a buffer for the value vector in the outgoing batch, plus the > overhead of copying the data from the incoming batch to the outgoing batch. > For example, in the following query: > {code} > select l_orderkey from cp.`tpch/lineitem.parquet` l union all select > n_nationkey from cp.`tpch/nation.parquet` > {code} > Both the left and right sides of the UnionAll operator are simple field > references, and Drill should call the transfer API. However, the current code > does buffer allocation & copy for both left and right. Such processing would > significantly slow the UnionAll operator's performance, and eventually slow > down query evaluation. > DRILL-5521 reverts a change, made in DRILL-5419, to the logic that decides > whether to apply transfer, which was based on SchemaPath equal comparison. > Even if we fix that problem, it's not enough to use SchemaPath equal > comparison as the criterion for whether transfer should be used. Ideally, > even if the output field and incoming field have different names, the > UnionAll operator should do {{transfer}}, instead of {{copy}}, as long as the > expression is a simple field reference. > {code} > select l_orderkey as Key1 from cp.`tpch/lineitem.parquet` l union all select > n_nationkey as Key2 from cp.`tpch/nation.parquet` > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (DRILL-5585) UnionAll operator generates run-time code for every incoming batch
Jinfeng Ni created DRILL-5585: - Summary: UnionAll operator generates run-time code for every incoming batch Key: DRILL-5585 URL: https://issues.apache.org/jira/browse/DRILL-5585 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni Assignee: Jinfeng Ni In Drill's execution framework, each operator may generate run-time code for various purposes. The code generation & compilation should only happen when there is a new schema from the incoming batch ({{OK_NEW_SCHEMA}}). For any follow-up batch ({{OK}}), the operator should not generate the run-time code, since it is already available. However, in the current implementation of UnionAll, regardless of whether the incoming batch returns {{OK_NEW_SCHEMA}} or {{OK}}, it will always call doWork(), which essentially would 1) generate code and possibly compile code, 2) doSetup, 3) doEvaluation. The code generation is not necessary, and doing it for each batch would significantly impact the operator's performance and slow down query execution. {code} case OK_NEW_SCHEMA: outputFields = unionAllInput.getOutputFields(); case OK: IterOutcome workOutcome = doWork(); {code} For the repeated run-time generation, code compilation could be skipped, unless there is a miss in the code cache. However, the current code logic is still problematic, since it has to use {{ClassGenerator}} to generate the run-time source code. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
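The control flow the report argues for, generating code once per schema and reusing it for subsequent batches, can be sketched as follows. This is a simplified Python model with invented names (`UnionAllSketch`, `_generate_projector`, `codegen_calls`); it does not reproduce Drill's operator classes or its code generator.

```python
class UnionAllSketch:
    """Toy operator that regenerates its projector only when the schema changes."""

    def __init__(self):
        self.projector = None
        self.codegen_calls = 0  # counts how often "code generation" ran

    def _generate_projector(self, schema):
        # Stand-in for run-time code generation and compilation.
        self.codegen_calls += 1
        return lambda batch: [dict(zip(schema, row)) for row in batch]

    def next(self, outcome, schema, batch):
        if outcome == "OK_NEW_SCHEMA":
            # New schema: generate (and set up) the projector once.
            self.projector = self._generate_projector(schema)
        elif outcome != "OK":
            raise ValueError(f"unexpected outcome: {outcome}")
        if self.projector is None:
            raise RuntimeError("received OK before any OK_NEW_SCHEMA")
        # Same schema as before: only evaluation runs.
        return self.projector(batch)


if __name__ == "__main__":
    op = UnionAllSketch()
    op.next("OK_NEW_SCHEMA", ["l_orderkey"], [(1,), (2,)])
    op.next("OK", ["l_orderkey"], [(3,)])
    print(op.codegen_calls)
```

Separating the two cases this way keeps evaluation on the per-batch path and moves generation to the per-schema path.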
[jira] [Created] (DRILL-5582) [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit
Rob Wu created DRILL-5582: - Summary: [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit Key: DRILL-5582 URL: https://issues.apache.org/jira/browse/DRILL-5582 Project: Apache Drill Issue Type: Bug Affects Versions: 1.10.0 Reporter: Rob Wu Priority: Minor Consider the scenario: Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication enabled containing important data. Bob, the attacker, attempts to spoof the connection and redirect it to his own drillbit (fake.drillbit.co) with no authentication setup. When Alice is under attack and attempts to connect to her secure drillbit, she is actually authenticating against Bob's drillbit. At this point, the connection should have failed due to unmatched configuration. However, the current implementation will return SUCCESS as long as the (spoofing) drillbit has no authentication requirement set. Currently, the drillbit <-to-> drill client connection accepts the lowest authentication configuration set on the server. This leaves the user vulnerable to spoofing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
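The failure that the report says "should have" occurred can be expressed as a client-side check: if the client is configured to expect authentication, a server that offers none is rejected instead of accepted. The Python below is a hypothetical sketch; `choose_mechanism`, `AuthMismatchError` and the mechanism strings are illustrative names, and it does not model Drill's actual handshake.

```python
class AuthMismatchError(Exception):
    """Raised when the server's offer does not meet the client's expectation."""


def choose_mechanism(client_expected, server_offered):
    """Return the mechanism to use, or None if the client expects no authentication.

    client_expected: mechanisms the client is configured to require, in
                     preference order (empty means no authentication expected).
    server_offered:  mechanisms advertised by the server (empty means none).
    """
    if not client_expected:
        return None
    for mechanism in client_expected:
        if mechanism in server_offered:
            return mechanism
    # The client expected authentication but the server offers nothing usable:
    # fail instead of silently falling back to an unauthenticated connection.
    raise AuthMismatchError(
        f"client expects one of {client_expected}, server offers {server_offered}"
    )


if __name__ == "__main__":
    print(choose_mechanism(["KERBEROS", "PLAIN"], ["PLAIN", "KERBEROS"]))
    try:
        choose_mechanism(["KERBEROS", "PLAIN"], [])
    except AuthMismatchError as err:
        print("rejected:", err)
```

In the scenario above, Alice's client would carry the expectation of plain or kerberos authentication, so the connection to fake.drillbit.co would be refused under this check.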
[jira] [Updated] (DRILL-5582) [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit
[ https://issues.apache.org/jira/browse/DRILL-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Wu updated DRILL-5582: -- Description: *Consider the scenario:* Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication enabled containing important data. Bob, the attacker, attempts to spoof the connection and redirect it to his own drillbit (fake.drillbit.co) with no authentication setup. When Alice is under attack and attempts to connect to her secure drillbit, she is actually authenticating against Bob's drillbit. At this point, the connection should have failed due to unmatched configuration. However, the current implementation will return SUCCESS as long as the (spoofing) drillbit has no authentication requirement set. Currently, the drillbit <- to -> drill client connection accepts the lowest authentication configuration set on the server. This leaves unsuspecting user vulnerable to spoofing. was: Consider the scenario: Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication enabled containing important data. Bob, the attacker, attempts to spoof the connection and redirect it to his own drillbit (fake.drillbit.co) with no authentication setup. When Alice is under attack and attempts to connect to her secure drillbit, she is actually authenticating against Bob's drillbit. At this point, the connection should have failed due to unmatched configuration. However, the current implementation will return SUCCESS as long as the (spoofing) drillbit has no authentication requirement set. Currently, the drillbit <- to -> drill client connection accepts the lowest authentication configuration set on the server. This leaves unsuspecting user vulnerable to spoofing. 
> [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to > data being written to the attacker's target instead of Drillbit > - > > Key: DRILL-5582 > URL: https://issues.apache.org/jira/browse/DRILL-5582 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Rob Wu >Priority: Minor > > *Consider the scenario:* > Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication > enabled containing important data. Bob, the attacker, attempts to spoof the > connection and redirect it to his own drillbit (fake.drillbit.co) with no > authentication setup. > When Alice is under attack and attempts to connect to her secure drillbit, > she is actually authenticating against Bob's drillbit. At this point, the > connection should have failed due to unmatched configuration. However, the > current implementation will return SUCCESS as long as the (spoofing) drillbit > has no authentication requirement set. > Currently, the drillbit <- to -> drill client connection accepts the lowest > authentication configuration set on the server. This leaves unsuspecting user > vulnerable to spoofing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5582) [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit
[ https://issues.apache.org/jira/browse/DRILL-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Wu updated DRILL-5582: -- Description: Consider the scenario: Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication enabled containing important data. Bob, the attacker, attempts to spoof the connection and redirect it to his own drillbit (fake.drillbit.co) with no authentication setup. When Alice is under attack and attempts to connect to her secure drillbit, she is actually authenticating against Bob's drillbit. At this point, the connection should have failed due to unmatched configuration. However, the current implementation will return SUCCESS as long as the (spoofing) drillbit has no authentication requirement set. Currently, the drillbit <- to -> drill client connection accepts the lowest authentication configuration set on the server. This leaves unsuspecting user vulnerable to spoofing. was: Consider the scenario: Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication enabled containing important data. Bob, the attacker, attempts to spoof the connection and redirect it to his own drillbit (fake.drillbit.co) with no authentication setup. When Alice is under attack and attempts to connect to her secure drillbit, she is actually authenticating against Bob's drillbit. At this point, the connection should have failed due to unmatched configuration. However, the current implementation will return SUCCESS as long as the (spoofing) drillbit has no authentication requirement set. Currently, the drillbit <- to -> drill client connection accepts the lowest authentication configuration set on the server. This leaves the user vulnerable to spoofing. 
> [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to > data being written to the attacker's target instead of Drillbit > - > > Key: DRILL-5582 > URL: https://issues.apache.org/jira/browse/DRILL-5582 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Rob Wu >Priority: Minor > > Consider the scenario: > Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication > enabled containing important data. Bob, the attacker, attempts to spoof the > connection and redirect it to his own drillbit (fake.drillbit.co) with no > authentication setup. > When Alice is under attack and attempts to connect to her secure drillbit, > she is actually authenticating against Bob's drillbit. At this point, the > connection should have failed due to unmatched configuration. However, the > current implementation will return SUCCESS as long as the (spoofing) drillbit > has no authentication requirement set. > Currently, the drillbit <- to -> drill client connection accepts the lowest > authentication configuration set on the server. This leaves unsuspecting user > vulnerable to spoofing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5582) [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit
[ https://issues.apache.org/jira/browse/DRILL-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Wu updated DRILL-5582: -- Description: Consider the scenario: Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication enabled containing important data. Bob, the attacker, attempts to spoof the connection and redirect it to his own drillbit (fake.drillbit.co) with no authentication setup. When Alice is under attack and attempts to connect to her secure drillbit, she is actually authenticating against Bob's drillbit. At this point, the connection should have failed due to unmatched configuration. However, the current implementation will return SUCCESS as long as the (spoofing) drillbit has no authentication requirement set. Currently, the drillbit <- to -> drill client connection accepts the lowest authentication configuration set on the server. This leaves the user vulnerable to spoofing. was: Consider the scenario: Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication enabled containing important data. Bob, the attacker, attempts to spoof the connection and redirect it to his own drillbit (fake.drillbit.co) with no authentication setup. When Alice is under attack and attempts to connect to her secure drillbit, she is actually authenticating against Bob's drillbit. At this point, the connection should have failed due to unmatched configuration. However, the current implementation will return SUCCESS as long as the (spoofing) drillbit has no authentication requirement set. Currently, the drillbit <-to-> drill client connection accepts the lowest authentication configuration set on the server. This leaves the user vulnerable to spoofing. 
> [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to
> data being written to the attacker's target instead of Drillbit
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-5582
>                 URL: https://issues.apache.org/jira/browse/DRILL-5582
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Rob Wu
>            Priority: Minor
>
> Consider the scenario:
> Alice has a drillbit (my.drillbit.co) with plain and kerberos authentication
> enabled containing important data. Bob, the attacker, attempts to spoof the
> connection and redirect it to his own drillbit (fake.drillbit.co) with no
> authentication setup.
> When Alice is under attack and attempts to connect to her secure drillbit,
> she is actually authenticating against Bob's drillbit. At this point, the
> connection should have failed due to unmatched configuration. However, the
> current implementation will return SUCCESS as long as the (spoofing) drillbit
> has no authentication requirement set.
> Currently, the drillbit <- to -> drill client connection accepts the lowest
> authentication configuration set on the server. This leaves the user
> vulnerable to spoofing.
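The downgrade described in this issue could in principle be caught on the client side by refusing any server whose advertised authentication is weaker than what the client was configured to expect. The following is a minimal Python sketch of that check, not Drill's actual handshake code: `ServerHandshake`, `AuthDowngradeError`, and `choose_mechanism` are hypothetical names introduced here for illustration, and the mechanism strings are placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


class AuthDowngradeError(Exception):
    """Raised when the server offers weaker authentication than the client expects."""


@dataclass(frozen=True)
class ServerHandshake:
    """Hypothetical summary of what a server advertises during the handshake."""
    auth_required: bool
    mechanisms: FrozenSet[str]


def choose_mechanism(client_required: FrozenSet[str],
                     server: ServerHandshake) -> Optional[str]:
    """Pick a mechanism both sides accept, or fail instead of downgrading.

    client_required: mechanisms the client insists on; an empty set means
    the client is willing to connect without authentication.
    """
    if not client_required:
        # The client has no expectation, so there is nothing to downgrade.
        return None
    if not server.auth_required:
        # The case from the issue: the client expects authentication but the
        # server asks for none. Treat this as a failure, not as SUCCESS.
        raise AuthDowngradeError("server does not require authentication")
    common = client_required & server.mechanisms
    if not common:
        raise AuthDowngradeError("no mutually acceptable mechanism")
    # Sort so the choice is deterministic for a given pair of sets.
    return sorted(common)[0]


if __name__ == "__main__":
    alice_server = ServerHandshake(True, frozenset({"PLAIN", "KERBEROS"}))
    bob_server = ServerHandshake(False, frozenset())
    print(choose_mechanism(frozenset({"KERBEROS"}), alice_server))
    try:
        choose_mechanism(frozenset({"KERBEROS"}), bob_server)
    except AuthDowngradeError as err:
        print("rejected:", err)
```

Under these assumptions, a connection to the spoofing drillbit in the scenario above fails on the client rather than returning SUCCESS. A check like this does not by itself prove the server's identity; that still depends on the mechanism that ends up being used.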
[jira] [Updated] (DRILL-5581) Query with CASE statement returns wrong results
[ https://issues.apache.org/jira/browse/DRILL-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz updated DRILL-5581:
----------------------------------
    Description:
A query that uses a CASE statement returns wrong results.

{noformat}
Apache Drill 1.11.0-SNAPSHOT, commit id: 874bf629

[test@centos-101 ~]# cat order_sample.csv
202634342,2101,20160301

apache drill 1.11.0-SNAPSHOT
"this isn't your grandfather's sql"
0: jdbc:drill:schema=dfs.tmp> ALTER SESSION SET `store.format`='csv';
+------+-----------------------+
| ok   | summary               |
+------+-----------------------+
| true | store.format updated. |
+------+-----------------------+
1 row selected (0.245 seconds)

0: jdbc:drill:schema=dfs.tmp> CREATE VIEW `vw_order_sample_csv` as
. . . . . . . . . . . . . . > SELECT
. . . . . . . . . . . . . . > `columns`[0] AS `ND`,
. . . . . . . . . . . . . . > CAST(`columns`[1] AS BIGINT) AS `col1`,
. . . . . . . . . . . . . . > CAST(`columns`[2] AS BIGINT) AS `col2`
. . . . . . . . . . . . . . > FROM `order_sample.csv`;
+------+---------------------------------------------------------------------+
| ok   | summary                                                             |
+------+---------------------------------------------------------------------+
| true | View 'vw_order_sample_csv' created successfully in 'dfs.tmp' schema |
+------+---------------------------------------------------------------------+
1 row selected (0.253 seconds)

0: jdbc:drill:schema=dfs.tmp> select
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 > col2 then col1
. . . . . . . . . . . . . . > else col2
. . . . . . . . . . . . . . > end as temp_col,
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 = 2101 and (20170302 - col2) > 1 then 'D'
. . . . . . . . . . . . . . > when col2 = 2101 then 'P'
. . . . . . . . . . . . . . > when col1 - col2 > 1 then '0'
. . . . . . . . . . . . . . > else 'A'
. . . . . . . . . . . . . . > end as status
. . . . . . . . . . . . . . > from `vw_order_sample_csv`;
+----------+--------+
| temp_col | status |
+----------+--------+
| 20160301 | A      |
+----------+--------+
1 row selected (0.318 seconds)

0: jdbc:drill:schema=dfs.tmp> explain plan for
. . . . . . . . . . . . . . > select
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 > col2 then col1
. . . . . . . . . . . . . . > else col2
. . . . . . . . . . . . . . > end as temp_col,
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 = 2101 and (20170302 - col2) > 1 then 'D'
. . . . . . . . . . . . . . > when col2 = 2101 then 'P'
. . . . . . . . . . . . . . > when col1 - col2 > 1 then '0'
. . . . . . . . . . . . . . > else 'A'
. . . . . . . . . . . . . . > end as status
. . . . . . . . . . . . . . > from `vw_order_sample_csv`;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(temp_col=[CASE(>(CAST(ITEM($0, 1)):BIGINT, CAST(ITEM($0, 2)):BIGINT), CAST(ITEM($0, 1)):BIGINT, CAST(ITEM($0, 2)):BIGINT)], status=[CASE(AND(=(CAST(ITEM($0, 1)):BIGINT, 2101), >(-(20170302, CAST(ITEM($0, 2)):BIGINT), 1)), 'D', =(CAST(ITEM($0, 2)):BIGINT, 2101), 'P', >(-(CAST(ITEM($0, 1)):BIGINT, CAST(ITEM($0, 2)):BIGINT), 1), '0', 'A')])
00-02        Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/order_sample.csv, numFiles=1, columns=[`columns`[1], `columns`[2]], files=[maprfs:///tmp/order_sample.csv]]])

// Details of Java compiler from sys.options
0: jdbc:drill:schema=dfs.tmp> select name, status from sys.options where name like '%java_compiler%';
+---------------------------------------+---------+
| name                                  | status  |
+---------------------------------------+---------+
| exec.java.compiler.exp_in_method_size | DEFAULT |
| exec.java_compiler                    | DEFAULT |
| exec.java_compiler_debug              | DEFAULT |
| exec.java_compiler_janino_maxsize     | DEFAULT |
+---------------------------------------+---------+
4 rows selected (0.21 seconds)
{noformat}

Results from Postgres 9.3 for the same query; note the difference in results:

{noformat}
postgres=# create table order_sample(c1 varchar(50), c2 bigint, c3 bigint);
CREATE TABLE
postgres=# insert into order_sample values('202634342',2101,20160301);
INSERT 0 1
postgres=# select * from order_sample;
    c1     |  c2  |    c3    
-----------+------+----------
 202634342 | 2101 | 20160301
(1 row)

postgres=# create view vw_order_sample_csv as select c1 as ND, CAST(c2 AS BIGINT) AS col1, CAST(c3 AS BIGINT) AS col2 FROM order_sample;
CREATE VIEW
postgres=# select
postgres-# case
postgres-# when col1 > col2 then col1
postgres-# else col2
postgres-# end as temp_col,
postgres-# case
postgres-# when col1 =