[jira] [Commented] (DRILL-4525) Query with BETWEEN clause on Date and Timestamp values fails with Validation Error

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205951#comment-15205951
 ] 

ASF GitHub Bot commented on DRILL-4525:
---

GitHub user hsuanyi opened a pull request:

https://github.com/apache/drill/pull/438

DRILL-4525: Allow SqlBetweenOperator to accept LOWER_OPERAND and UPPE…

…R_OPERAND with different types

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hsuanyi/incubator-drill DRILL-4525

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/438.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #438


commit 5884e9b923057f1e9ed77ecf0ba8531e4d2f03b7
Author: Hsuan-Yi Chu 
Date:   2016-03-21T21:43:54Z

DRILL-4525: Allow SqlBetweenOperator to accept LOWER_OPERAND and 
UPPER_OPERAND with different types
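
For anyone hitting this before the fix lands, a workaround sketch (untested; it uses the date_dim table from the report below) is to cast the upper bound back to DATE so both BETWEEN operands have the same type:

{code}
SELECT *
FROM   date_dim
WHERE  d_date BETWEEN CAST('1999-03-06' AS DATE)
                  AND CAST(CAST('1999-03-06' AS DATE) + INTERVAL '60' DAY AS DATE)
LIMIT 10;
{code}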




> Query with BETWEEN clause on Date and Timestamp values fails with Validation 
> Error
> --
>
> Key: DRILL-4525
> URL: https://issues.apache.org/jira/browse/DRILL-4525
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Abhishek Girish
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
>
> Query: (simplified variant of TPC-DS Query37)
> {code}
> SELECT
>*
> FROM   
>date_dim
> WHERE   
>d_date BETWEEN Cast('1999-03-06' AS DATE) AND  (
>   Cast('1999-03-06' AS DATE) + INTERVAL '60' day)
> LIMIT 10;
> {code}
> Error:
> {code}
> Error: VALIDATION ERROR: From line 6, column 8 to line 7, column 64: Cannot 
> apply 'BETWEEN ASYMMETRIC' to arguments of type '<DATE> BETWEEN ASYMMETRIC 
> <DATE> AND <TIMESTAMP(0)>'. Supported form(s): '<COMPARABLE_TYPE> BETWEEN 
> <COMPARABLE_TYPE> AND <COMPARABLE_TYPE>'
> SQL Query null
> [Error Id: 223fb37c-f561-4a37-9283-871dc6f4d6d0 on abhi2:31010] 
> (state=,code=0)
> {code}
> This is a regression from 1.6.0. 





[jira] [Commented] (DRILL-4037) No schemas available using MongoDB 3.0.6 with authentication + WiredTiger

2016-03-22 Thread W. (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205971#comment-15205971
 ] 

W. commented on DRILL-4037:
---

Hello,
Drill 1.6 is deployed with 7 drillbits.
MongoDB is 3.0.6 in master/slave mode with no sharding enabled.
Drill Explorer is version 1.2.1.0 (32-bit).



> No schemas available using MongoDB 3.0.6 with authentication + WiredTiger
> --
>
> Key: DRILL-4037
> URL: https://issues.apache.org/jira/browse/DRILL-4037
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - MongoDB
>Affects Versions: 1.2.0
> Environment: Windows 7 - 64 bits
>Reporter: W.
>Priority: Minor
>
> From the Drill Explorer "Browse schemas" tab, I am unable to view anything about MongoDB.
> MongoDB 3.0.6 is configured as Master/Replica with wiredTiger and 
> authentication enabled.





[jira] [Commented] (DRILL-4517) Reading empty Parquet file fails with java.lang.IllegalArgumentException

2016-03-22 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206061#comment-15206061
 ] 

Deneche A. Hakim commented on DRILL-4517:
-

No, this issue is not fixed yet. In DRILL-2223 we fixed CTAS to no longer 
create empty Parquet files. We still need to fix Drill, if possible, to 
properly handle such files.

Can you share this particular Parquet file? Thanks.
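
For context, the IllegalArgumentException in the quoted stack trace is raised by a Guava precondition guarding fragment assignment; it has roughly this shape (a sketch with illustrative variable names, not the exact Drill code):

{code}
// With zero row groups in the file, no read entries get assigned to minor
// fragment 0, so this guard in ParquetGroupScan.getSpecificScan() fires:
com.google.common.base.Preconditions.checkArgument(
    !rowGroupReadEntries.isEmpty(),
    "MinorFragmentId %s has no read entries assigned", minorFragmentId);
{code}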

> Reading empty Parquet file fails with java.lang.IllegalArgumentException
> -
>
> Key: DRILL-4517
> URL: https://issues.apache.org/jira/browse/DRILL-4517
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: Tobias
>
> When querying a Parquet file that has a schema but no rows, the Drill server 
> will fail with the error below.
> This looks similar to DRILL-3557.
> {noformat}
> {{ParquetMetaData{FileMetaData{schema: message TRANSACTION_REPORT {
>   required int64 MEMBER_ACCOUNT_ID;
>   required int64 TIMESTAMP_IN_HOUR;
>   optional int64 APPLICATION_ID;
> }
> , metadata: {}}}, blocks: []}
> {noformat}
> {noformat}
> Caused by: java.lang.IllegalArgumentException: MinorFragmentId 0 has no read 
> entries assigned
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) 
> ~[guava-14.0.1.jar:na]
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:707)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:105)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:68)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractGroupScan.accept(AbstractGroupScan.java:60)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:102)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProject(AbstractPhysicalVisitor.java:77)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.config.Project.accept(Project.java:51) 
> ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:82)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:195)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.config.Screen.accept(Screen.java:97) 
> ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.SimpleParallelizer.generateWorkUnit(SimpleParallelizer.java:355)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.SimpleParallelizer.getFragments(SimpleParallelizer.java:134)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.getQueryWorkUnit(Foreman.java:518) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:405) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:926) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> {noformat}





[jira] [Commented] (DRILL-4517) Reading empty Parquet file fails with java.lang.IllegalArgumentException

2016-03-22 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206068#comment-15206068
 ] 

Deneche A. Hakim commented on DRILL-4517:
-

More precisely, DRILL-2223 was a duplicate of DRILL-3635, and we fixed 
DRILL-3635 to no longer produce such empty Parquet files.

> Reading empty Parquet file fails with java.lang.IllegalArgumentException
> -
>
> Key: DRILL-4517
> URL: https://issues.apache.org/jira/browse/DRILL-4517
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: Tobias
>
> When querying a Parquet file that has a schema but no rows, the Drill server 
> will fail with the error below.
> This looks similar to DRILL-3557.
> {noformat}
> {{ParquetMetaData{FileMetaData{schema: message TRANSACTION_REPORT {
>   required int64 MEMBER_ACCOUNT_ID;
>   required int64 TIMESTAMP_IN_HOUR;
>   optional int64 APPLICATION_ID;
> }
> , metadata: {}}}, blocks: []}
> {noformat}
> {noformat}
> Caused by: java.lang.IllegalArgumentException: MinorFragmentId 0 has no read 
> entries assigned
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) 
> ~[guava-14.0.1.jar:na]
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:707)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:105)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:68)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractGroupScan.accept(AbstractGroupScan.java:60)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:102)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProject(AbstractPhysicalVisitor.java:77)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.config.Project.accept(Project.java:51) 
> ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:82)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:195)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.config.Screen.accept(Screen.java:97) 
> ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.SimpleParallelizer.generateWorkUnit(SimpleParallelizer.java:355)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.SimpleParallelizer.getFragments(SimpleParallelizer.java:134)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.getQueryWorkUnit(Foreman.java:518) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:405) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:926) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> {noformat}





[jira] [Commented] (DRILL-4514) Add describe schema command

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206067#comment-15206067
 ] 

ASF GitHub Bot commented on DRILL-4514:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/436#discussion_r56957490
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DescribeSchemaCommandResult.java
 ---
@@ -0,0 +1,30 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql.handlers;
+
+public class DescribeSchemaCommandResult {
+
+  public String name;
--- End diff --

Done.


> Add describe schema command
> -
>
> Key: DRILL-4514
> URL: https://issues.apache.org/jira/browse/DRILL-4514
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: Future
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>
> Add describe database <db_name> command which will return the directory 
> associated with a database on the fly.
> Syntax:
> describe database <db_name>
> describe schema <schema_name>
> Output:
> {noformat}
>  DESCRIBE SCHEMA xdf.proc;
> +-----------+-----------------------------+
> | name      | location                    |
> +-----------+-----------------------------+
> | xdf.proc  | maprfs://dl.data/processed  |
> +-----------+-----------------------------+
> {noformat}
> Current implementation covers only dfs schema.
> For all other "" will be returned.





[jira] [Commented] (DRILL-4514) Add describe schema command

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206070#comment-15206070
 ] 

ASF GitHub Bot commented on DRILL-4514:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/436#discussion_r56957772
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlDescribeSchema.java
 ---
@@ -0,0 +1,82 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql.parser;
+
+import org.apache.calcite.sql.SqlCall;
+import org.apache.calcite.sql.SqlIdentifier;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlLiteral;
+import org.apache.calcite.sql.SqlNode;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.SqlSpecialOperator;
+import org.apache.calcite.sql.SqlWriter;
+import org.apache.calcite.sql.parser.SqlParserPos;
+import org.apache.drill.exec.planner.sql.handlers.AbstractSqlHandler;
+import org.apache.drill.exec.planner.sql.handlers.DescribeSchemaHandler;
+import org.apache.drill.exec.planner.sql.handlers.SqlHandlerConfig;
+
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * Sql parse tree node to represent statement:
+ * SHOW FILES [{FROM | IN} db_name] [LIKE 'pattern' | WHERE expr]
+ */
+public class SqlDescribeSchema extends DrillSqlCall {
+
+  private final SqlIdentifier schema;
+
+  public static final SqlSpecialOperator OPERATOR =
+  new SqlSpecialOperator("DESCRIBE_SCHEMA", SqlKind.OTHER) {
+@Override
+public SqlCall createCall(SqlLiteral functionQualifier, 
SqlParserPos pos, SqlNode... operands) {
+  return new SqlDescribeSchema(pos, (SqlIdentifier) operands[0]);
+}
+  };
+
+  public SqlDescribeSchema(SqlParserPos pos, SqlIdentifier schema) {
+super(pos);
+this.schema = schema;
+assert schema != null;
--- End diff --

Agree, Preconditions is better. But I have checked that schema can't come 
in as null, so I have removed the not-null check altogether.
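
For reference, the Preconditions variant discussed above would have looked roughly like this (a sketch only, since the check ended up being removed):

{code}
public SqlDescribeSchema(SqlParserPos pos, SqlIdentifier schema) {
  super(pos);
  // Fail fast with a clear NullPointerException instead of relying on
  // JVM assertions (-ea), which are usually disabled in production.
  this.schema = com.google.common.base.Preconditions.checkNotNull(schema, "schema");
}
{code}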


> Add describe schema command
> -
>
> Key: DRILL-4514
> URL: https://issues.apache.org/jira/browse/DRILL-4514
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: Future
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>
> Add describe database <db_name> command which will return the directory 
> associated with a database on the fly.
> Syntax:
> describe database <db_name>
> describe schema <schema_name>
> Output:
> {noformat}
>  DESCRIBE SCHEMA xdf.proc;
> +-----------+-----------------------------+
> | name      | location                    |
> +-----------+-----------------------------+
> | xdf.proc  | maprfs://dl.data/processed  |
> +-----------+-----------------------------+
> {noformat}
> Current implementation covers only dfs schema.
> For all other "" will be returned.





[jira] [Commented] (DRILL-4525) Query with BETWEEN clause on Date and Timestamp values fails with Validation Error

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206470#comment-15206470
 ] 

ASF GitHub Bot commented on DRILL-4525:
---

Github user hsuanyi closed the pull request at:

https://github.com/apache/drill/pull/438


> Query with BETWEEN clause on Date and Timestamp values fails with Validation 
> Error
> --
>
> Key: DRILL-4525
> URL: https://issues.apache.org/jira/browse/DRILL-4525
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Abhishek Girish
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
>
> Query: (simplified variant of TPC-DS Query37)
> {code}
> SELECT
>*
> FROM   
>date_dim
> WHERE   
>d_date BETWEEN Cast('1999-03-06' AS DATE) AND  (
>   Cast('1999-03-06' AS DATE) + INTERVAL '60' day)
> LIMIT 10;
> {code}
> Error:
> {code}
> Error: VALIDATION ERROR: From line 6, column 8 to line 7, column 64: Cannot 
> apply 'BETWEEN ASYMMETRIC' to arguments of type '<DATE> BETWEEN ASYMMETRIC 
> <DATE> AND <TIMESTAMP(0)>'. Supported form(s): '<COMPARABLE_TYPE> BETWEEN 
> <COMPARABLE_TYPE> AND <COMPARABLE_TYPE>'
> SQL Query null
> [Error Id: 223fb37c-f561-4a37-9283-871dc6f4d6d0 on abhi2:31010] 
> (state=,code=0)
> {code}
> This is a regression from 1.6.0. 





[jira] [Closed] (DRILL-1706) date_sub function does not accept string as input in Drill

2016-03-22 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-1706.
---
   Resolution: Fixed
Fix Version/s: (was: Future)
   1.2.0

> date_sub function does not accept string as input in Drill
> --
>
> Key: DRILL-1706
> URL: https://issues.apache.org/jira/browse/DRILL-1706
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill, Functions - Hive
>Affects Versions: 0.7.0
>Reporter: Hao Zhu
>Assignee: Arina Ielchiieva
> Fix For: 1.2.0
>
>
> "date_sub" function does not accept string as input in Drill, however it does 
> in Hive.
> This different behavior of the function will make customer re-write their 
> query to use "cast as date".
> Minimal reproduction:
> {code}
> 0: jdbc:drill:zk=local> select date_sub('2014-11-12 16:45:22',15) from 
> dfs.tmp.`drilltest/test.csv` ;
> Query failed: Failure while running fragment., Invalid format: "2014-11-12 
> 16:45:22" is malformed at "14-11-12 16:45:22" [ 
> 9a6f18da-eb1e-4d91-879a-8d9d528efd59 on 10.250.0.115:31010 ]
>   (java.lang.IllegalArgumentException) Invalid format: "2014-11-12 16:45:22" 
> is malformed at "14-11-12 16:45:22"
> org.joda.time.format.DateTimeFormatter.parseDateTime():873
> org.apache.drill.exec.test.generated.ProjectorGen23.doSetup():63
> org.apache.drill.exec.test.generated.ProjectorGen23.setup():97
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():427
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.buildSchema():270
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema():80
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.buildSchema():95
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():111
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run():249
> ...():0
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> For comparison, Hive handles this correctly:
> {code}
> 0: jdbc:hive2://n1a:1/default> select date_sub('2014-11-12 16:45:22',15) 
> from passwords limit 1 ; 
> +-------------+
> |     _c0     |
> +-------------+
> | 2014-10-28  |
> +-------------+
> 1 row selected (6.568 seconds)
> {code}
> Workaround in Drill:
> {code}
> 0: jdbc:drill:zk=local> select date_sub(cast('2014-11-12 16:45:22' as 
> date),15) from dfs.tmp.`drilltest/test.csv` ;
> +------------+
> |   EXPR$0   |
> +------------+
> | 2014-10-28 |
> +------------+
> 1 row selected (0.082 seconds)
> {code}





[jira] [Assigned] (DRILL-1706) date_sub function does not accept string as input in Drill

2016-03-22 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-1706:
---

Assignee: Arina Ielchiieva

> date_sub function does not accept string as input in Drill
> --
>
> Key: DRILL-1706
> URL: https://issues.apache.org/jira/browse/DRILL-1706
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill, Functions - Hive
>Affects Versions: 0.7.0
>Reporter: Hao Zhu
>Assignee: Arina Ielchiieva
> Fix For: 1.2.0
>
>
> "date_sub" function does not accept string as input in Drill, however it does 
> in Hive.
> This different behavior of the function will make customer re-write their 
> query to use "cast as date".
> Minimal reproduction:
> {code}
> 0: jdbc:drill:zk=local> select date_sub('2014-11-12 16:45:22',15) from 
> dfs.tmp.`drilltest/test.csv` ;
> Query failed: Failure while running fragment., Invalid format: "2014-11-12 
> 16:45:22" is malformed at "14-11-12 16:45:22" [ 
> 9a6f18da-eb1e-4d91-879a-8d9d528efd59 on 10.250.0.115:31010 ]
>   (java.lang.IllegalArgumentException) Invalid format: "2014-11-12 16:45:22" 
> is malformed at "14-11-12 16:45:22"
> org.joda.time.format.DateTimeFormatter.parseDateTime():873
> org.apache.drill.exec.test.generated.ProjectorGen23.doSetup():63
> org.apache.drill.exec.test.generated.ProjectorGen23.setup():97
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():427
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.buildSchema():270
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema():80
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.buildSchema():95
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():111
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run():249
> ...():0
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> For comparison, Hive handles this correctly:
> {code}
> 0: jdbc:hive2://n1a:1/default> select date_sub('2014-11-12 16:45:22',15) 
> from passwords limit 1 ; 
> +-------------+
> |     _c0     |
> +-------------+
> | 2014-10-28  |
> +-------------+
> 1 row selected (6.568 seconds)
> {code}
> Workaround in Drill:
> {code}
> 0: jdbc:drill:zk=local> select date_sub(cast('2014-11-12 16:45:22' as 
> date),15) from dfs.tmp.`drilltest/test.csv` ;
> +------------+
> |   EXPR$0   |
> +------------+
> | 2014-10-28 |
> +------------+
> 1 row selected (0.082 seconds)
> {code}





[jira] [Commented] (DRILL-1706) date_sub function does not accept string as input in Drill

2016-03-22 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206567#comment-15206567
 ] 

Arina Ielchiieva commented on DRILL-1706:
-

Reproduces in Drill 1.1.0.
Not reproducible from 1.2.0 through 1.6.0.

> date_sub function does not accept string as input in Drill
> --
>
> Key: DRILL-1706
> URL: https://issues.apache.org/jira/browse/DRILL-1706
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill, Functions - Hive
>Affects Versions: 0.7.0
>Reporter: Hao Zhu
>Assignee: Arina Ielchiieva
> Fix For: 1.2.0
>
>
> "date_sub" function does not accept string as input in Drill, however it does 
> in Hive.
> This different behavior of the function will make customer re-write their 
> query to use "cast as date".
> Minimal reproduction:
> {code}
> 0: jdbc:drill:zk=local> select date_sub('2014-11-12 16:45:22',15) from 
> dfs.tmp.`drilltest/test.csv` ;
> Query failed: Failure while running fragment., Invalid format: "2014-11-12 
> 16:45:22" is malformed at "14-11-12 16:45:22" [ 
> 9a6f18da-eb1e-4d91-879a-8d9d528efd59 on 10.250.0.115:31010 ]
>   (java.lang.IllegalArgumentException) Invalid format: "2014-11-12 16:45:22" 
> is malformed at "14-11-12 16:45:22"
> org.joda.time.format.DateTimeFormatter.parseDateTime():873
> org.apache.drill.exec.test.generated.ProjectorGen23.doSetup():63
> org.apache.drill.exec.test.generated.ProjectorGen23.setup():97
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():427
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.buildSchema():270
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema():80
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.buildSchema():95
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():111
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run():249
> ...():0
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> For comparison, Hive handles this correctly:
> {code}
> 0: jdbc:hive2://n1a:1/default> select date_sub('2014-11-12 16:45:22',15) 
> from passwords limit 1 ; 
> +-------------+
> |     _c0     |
> +-------------+
> | 2014-10-28  |
> +-------------+
> 1 row selected (6.568 seconds)
> {code}
> Workaround in Drill:
> {code}
> 0: jdbc:drill:zk=local> select date_sub(cast('2014-11-12 16:45:22' as 
> date),15) from dfs.tmp.`drilltest/test.csv` ;
> +------------+
> |   EXPR$0   |
> +------------+
> | 2014-10-28 |
> +------------+
> 1 row selected (0.082 seconds)
> {code}





[jira] [Commented] (DRILL-1328) Support table statistics

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206668#comment-15206668
 ] 

ASF GitHub Bot commented on DRILL-1328:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/425#discussion_r57015590
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StatisticsAggrFunctions.java
 ---
@@ -0,0 +1,295 @@

+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ **/
+
+/*
+ * This class is automatically generated from AggrTypeFunctions2.tdd using FreeMarker.
+ */
+
+package org.apache.drill.exec.expr.fn.impl;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillAggFunc;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.expr.holders.NullableBigIntHolder;
+import org.apache.drill.exec.expr.holders.NullableVarBinaryHolder;
+import org.apache.drill.exec.expr.holders.ObjectHolder;
+import org.apache.drill.exec.vector.complex.reader.FieldReader;
+
+import javax.inject.Inject;
+
+@SuppressWarnings("unused")
+public class StatisticsAggrFunctions {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StatisticsAggrFunctions.class);
+
+  @FunctionTemplate(name = "statcount", scope = 
FunctionTemplate.FunctionScope.POINT_AGGREGATE)
+  public static class StatCount implements DrillAggFunc {
+@Param
+FieldReader in;
+@Workspace
+BigIntHolder count;
+@Output
+NullableBigIntHolder out;
+
+@Override
+public void setup() {
+  count = new BigIntHolder();
+}
+
+@Override
+public void add() {
+  count.value++;
+}
+
+@Override
+public void output() {
+  out.isSet = 1;
+  out.value = count.value;
+}
+
+@Override
+public void reset() {
+  count.value = 0;
+}
+  }
+
+  @FunctionTemplate(name = "nonnullstatcount", scope = 
FunctionTemplate.FunctionScope.POINT_AGGREGATE)
+  public static class NonNullStatCount implements DrillAggFunc {
+@Param
+FieldReader in;
+@Workspace
+BigIntHolder count;
+@Output
+NullableBigIntHolder out;
+
+@Override
+public void setup() {
+  count = new BigIntHolder();
+}
+
+@Override
+public void add() {
+  if (in.isSet()) {
+count.value++;
+  }
+}
+
+@Override
+public void output() {
+  out.isSet = 1;
+  out.value = count.value;
+}
+
+@Override
+public void reset() {
+  count.value = 0;
+}
+  }
+
+  @FunctionTemplate(name = "hll", scope = 
FunctionTemplate.FunctionScope.POINT_AGGREGATE)
+  public static class HllFieldReader implements DrillAggFunc {
+@Param
+FieldReader in;
+@Workspace
+ObjectHolder work;
+@Output
+NullableVarBinaryHolder out;
+@Inject
+DrillBuf buffer;
+
+@Override
+public void setup() {
+  work = new ObjectHolder();
+  work.obj = new com.clearspring.analytics.stream.cardinality.HyperLogLog(10);
+}
+
+@Overrid

[jira] [Commented] (DRILL-1328) Support table statistics

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206794#comment-15206794
 ] 

ASF GitHub Bot commented on DRILL-1328:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/425#discussion_r57026410
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StatisticsAggrFunctions.java
 ---
@@ -0,0 +1,295 @@

+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ **/
+
+/*
+ * This class is automatically generated from AggrTypeFunctions2.tdd using FreeMarker.
+ */
+
+package org.apache.drill.exec.expr.fn.impl;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillAggFunc;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.expr.holders.NullableBigIntHolder;
+import org.apache.drill.exec.expr.holders.NullableVarBinaryHolder;
+import org.apache.drill.exec.expr.holders.ObjectHolder;
+import org.apache.drill.exec.vector.complex.reader.FieldReader;
+
+import javax.inject.Inject;
+
+@SuppressWarnings("unused")
+public class StatisticsAggrFunctions {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StatisticsAggrFunctions.class);
+
+  @FunctionTemplate(name = "statcount", scope = 
FunctionTemplate.FunctionScope.POINT_AGGREGATE)
+  public static class StatCount implements DrillAggFunc {
+@Param
+FieldReader in;
+@Workspace
+BigIntHolder count;
+@Output
+NullableBigIntHolder out;
+
+@Override
+public void setup() {
+  count = new BigIntHolder();
+}
+
+@Override
+public void add() {
+  count.value++;
+}
+
+@Override
+public void output() {
+  out.isSet = 1;
+  out.value = count.value;
+}
+
+@Override
+public void reset() {
+  count.value = 0;
+}
+  }
+
+  @FunctionTemplate(name = "nonnullstatcount", scope = 
FunctionTemplate.FunctionScope.POINT_AGGREGATE)
+  public static class NonNullStatCount implements DrillAggFunc {
+@Param
+FieldReader in;
+@Workspace
+BigIntHolder count;
+@Output
+NullableBigIntHolder out;
+
+@Override
+public void setup() {
+  count = new BigIntHolder();
+}
+
+@Override
+public void add() {
+  if (in.isSet()) {
+count.value++;
+  }
+}
+
+@Override
+public void output() {
+  out.isSet = 1;
+  out.value = count.value;
+}
+
+@Override
+public void reset() {
+  count.value = 0;
+}
+  }
+
+  @FunctionTemplate(name = "hll", scope = 
FunctionTemplate.FunctionScope.POINT_AGGREGATE)
+  public static class HllFieldReader implements DrillAggFunc {
+@Param
+FieldReader in;
+@Workspace
+ObjectHolder work;
+@Output
+NullableVarBinaryHolder out;
+@Inject
+DrillBuf buffer;
+
+@Override
+public void setup() {
+  work = new ObjectHolder();
+  work.obj = new com.clearspring.analytics.stream.cardinality.HyperLogLog(10);
+}
+
+@Override

[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206798#comment-15206798
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/405#issuecomment-199910572
  
Sudheesh, could you respond to the comments made by Jin Feng? If you have 
already discussed it with him in person, could you post a summary here?

I am in favor of merging this PR, but just the last commit. I think the 
second commit should be separate, and should be done as an inserted operator 
(similar to IteratorValidator), rather than modifying the constructor for 
Screen.


> Limit 0 should avoid execution when querying a known schema
> ---
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine.
> The Hive table is about 6 GB, with 330 Parquet files using Snappy compression.
> Data types are int, bigint, string and double.
> Querying the directory of Parquet files through the DFS plugin works fine:
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Commented] (DRILL-4525) Query with BETWEEN clause on Date and Timestamp values fails with Validation Error

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206889#comment-15206889
 ] 

ASF GitHub Bot commented on DRILL-4525:
---

Github user hsuanyi commented on the pull request:

https://github.com/apache/drill/pull/439#issuecomment-199936054
  
@jinfengni can you help review? Thanks.


> Query with BETWEEN clause on Date and Timestamp values fails with Validation 
> Error
> --
>
> Key: DRILL-4525
> URL: https://issues.apache.org/jira/browse/DRILL-4525
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Abhishek Girish
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
>
> Query: (simplified variant of TPC-DS Query37)
> {code}
> SELECT
>*
> FROM   
>date_dim
> WHERE   
>d_date BETWEEN Cast('1999-03-06' AS DATE) AND  (
>   Cast('1999-03-06' AS DATE) + INTERVAL '60' day)
> LIMIT 10;
> {code}
> Error:
> {code}
> Error: VALIDATION ERROR: From line 6, column 8 to line 7, column 64: Cannot 
> apply 'BETWEEN ASYMMETRIC' to arguments of type '<DATE> BETWEEN ASYMMETRIC 
> <DATE> AND <TIMESTAMP(0)>'. Supported form(s): '<COMPARABLE_TYPE> BETWEEN 
> <COMPARABLE_TYPE> AND <COMPARABLE_TYPE>'
> SQL Query null
> [Error Id: 223fb37c-f561-4a37-9283-871dc6f4d6d0 on abhi2:31010] 
> (state=,code=0)
> {code}
> This is a regression from 1.6.0. 





[jira] [Commented] (DRILL-4525) Query with BETWEEN clause on Date and Timestamp values fails with Validation Error

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206887#comment-15206887
 ] 

ASF GitHub Bot commented on DRILL-4525:
---

GitHub user hsuanyi opened a pull request:

https://github.com/apache/drill/pull/439

DRILL-4525: Allow SqlBetweenOperator to accept LOWER_OPERAND and UPPE…

…R_OPERAND with different types

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hsuanyi/incubator-drill DRILL-4525

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #439


commit 0f6bd0a398c017bdf40c01d2baab4050f8f00a2a
Author: Hsuan-Yi Chu 
Date:   2016-03-21T21:43:54Z

DRILL-4525: Allow SqlBetweenOperator to accept LOWER_OPERAND and 
UPPER_OPERAND with different types




> Query with BETWEEN clause on Date and Timestamp values fails with Validation 
> Error
> --
>
> Key: DRILL-4525
> URL: https://issues.apache.org/jira/browse/DRILL-4525
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Abhishek Girish
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
>
> Query: (simplified variant of TPC-DS Query37)
> {code}
> SELECT
>*
> FROM   
>date_dim
> WHERE   
>d_date BETWEEN Cast('1999-03-06' AS DATE) AND  (
>   Cast('1999-03-06' AS DATE) + INTERVAL '60' day)
> LIMIT 10;
> {code}
> Error:
> {code}
> Error: VALIDATION ERROR: From line 6, column 8 to line 7, column 64: Cannot 
> apply 'BETWEEN ASYMMETRIC' to arguments of type '<DATE> BETWEEN ASYMMETRIC 
> <DATE> AND <TIMESTAMP(0)>'. Supported form(s): '<COMPARABLE_TYPE> BETWEEN 
> <COMPARABLE_TYPE> AND <COMPARABLE_TYPE>'
> SQL Query null
> [Error Id: 223fb37c-f561-4a37-9283-871dc6f4d6d0 on abhi2:31010] 
> (state=,code=0)
> {code}
> This is a regression from 1.6.0. 





[jira] [Created] (DRILL-4527) Remove unnecessary code: DrillAvgVarianceConvertlet.java

2016-03-22 Thread MinJi Kim (JIRA)
MinJi Kim created DRILL-4527:


 Summary: Remove unnecessary code: DrillAvgVarianceConvertlet.java
 Key: DRILL-4527
 URL: https://issues.apache.org/jira/browse/DRILL-4527
 Project: Apache Drill
  Issue Type: Bug
  Components:  Server
Reporter: MinJi Kim
Assignee: MinJi Kim


DrillConvertletTable is used as a way to supply custom function conversions. For example, 
for EXTRACT(), DrillConvertletTable.get() returns DrillExtractConvertlet, which 
builds a custom RexNode for the extract function (a dispatch sketch follows below).
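
A minimal sketch of that dispatch (hypothetical wiring using Calcite's SqlRexConvertletTable interface, not Drill's actual class):

{code}
import org.apache.calcite.sql.SqlCall;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql2rel.SqlRexConvertlet;
import org.apache.calcite.sql2rel.SqlRexConvertletTable;
import org.apache.calcite.sql2rel.StandardConvertletTable;

public class ExampleConvertletTable implements SqlRexConvertletTable {
  @Override
  public SqlRexConvertlet get(SqlCall call) {
    // Intercept EXTRACT() and hand back the custom convertlet that builds
    // the desired RexNode; everything else falls through to Calcite's defaults.
    if (call.getOperator() == SqlStdOperatorTable.EXTRACT) {
      return new DrillExtractConvertlet(); // assumed constructible here; Drill may expose a singleton
    }
    return StandardConvertletTable.INSTANCE.get(call);
  }
}
{code}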

On the other hand, DrillAvgVarianceConvertlet is never used.  
stddev/avg/variance functions are handled by DrillAggregateRule and 
DrillReduceAggregatesRule.





[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207042#comment-15207042
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/405#issuecomment-199962117
  
I have addressed @jinfengni's and @hsuanyi's comments 
[here](https://github.com/sudheeshkatkam/drill/commit/e4cfdfa9b0562d52ac07f6d80860a82fa8baba40).
[I force-pushed to this branch, and somehow their comments are no longer referenced 
in this PR.]


> Limit 0 should avoid execution when querying a known schema
> ---
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine.
> The Hive table is about 6 GB, with 330 Parquet files using Snappy compression.
> Data types are int, bigint, string and double.
> Querying the directory of Parquet files through the DFS plugin works fine:
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207044#comment-15207044
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/405#discussion_r57047786
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/limit/TestLimit0.java
 ---
@@ -0,0 +1,677 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.limit;
+
+import com.google.common.collect.Lists;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.BaseTestQuery;
+import org.apache.drill.PlanTestBase;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.joda.time.DateTime;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Ignore;
+import org.junit.Test;
+
+import java.util.List;
+
+public class TestLimit0 extends BaseTestQuery {
+
+  private static final String viewName = "limitZeroEmployeeView";
+
+  private static String wrapLimit0(final String query) {
+return "SELECT * FROM (" + query + ") LZT LIMIT 0";
+  }
+
+  @BeforeClass
+  public static void createView() throws Exception {
+test("USE dfs_test.tmp");
+test(String.format("CREATE OR REPLACE VIEW %s AS SELECT " +
+"CAST(employee_id AS INT) AS employee_id, " +
+"CAST(full_name AS VARCHAR(25)) AS full_name, " +
+"CAST(position_id AS INTEGER) AS position_id, " +
+"CAST(department_id AS BIGINT) AS department_id," +
+"CAST(birth_date AS DATE) AS birth_date, " +
+"CAST(hire_date AS TIMESTAMP) AS hire_date, " +
+"CAST(salary AS DOUBLE) AS salary, " +
+"CAST(salary AS FLOAT) AS fsalary, " +
+"CAST((CASE WHEN marital_status = 'S' THEN true ELSE false END) AS 
BOOLEAN) AS single, " +
+"CAST(education_level AS VARCHAR(60)) AS education_level," +
+"CAST(gender AS CHAR) AS gender " +
+"FROM cp.`employee.json` " +
+"ORDER BY employee_id " +
+"LIMIT 1;", viewName));
+// { "employee_id":1,"full_name":"Sheri 
Nowmer","first_name":"Sheri","last_name":"Nowmer","position_id":1,
+// 
"position_title":"President","store_id":0,"department_id":1,"birth_date":"1961-08-26",
+// "hire_date":"1994-12-01 
00:00:00.0","end_date":null,"salary":8.,"supervisor_id":0,
+// "education_level":"Graduate 
Degree","marital_status":"S","gender":"F","management_role":"Senior Management" 
}
+  }
+
+  @AfterClass
+  public static void tearDownView() throws Exception {
+test("DROP VIEW " + viewName + ";");
+  }
+
+  //  SIMPLE QUERIES 
+
+  @Test
+  public void infoSchema() throws Exception {
+testBuilder()
+.sqlQuery(String.format("DESCRIBE %s", viewName))
+.unOrdered()
+.baselineColumns("COLUMN_NAME", "DATA_TYPE", "IS_NULLABLE")
+.baselineValues("employee_id", "INTEGER", "YES")
+.baselineValues("full_name", "CHARACTER VARYING", "YES")
+.baselineValues("position_id", "INTEGER", "YES")
+.baselineValues("department_id", "BIGINT", "YES")
+.baselineValues("birth_date", "DATE", "YES")
+.baselineValues("hire_date", "TIMESTAMP", "YES")
+.baselineValues("salary", "DOUBLE", "YES")
+.baselineValues("fsalary", "FLOAT", "YES")
+.baselineValues("single", "BOOLEAN", "NO")
+.baselineValues("education_level", "CHARACTER VARYING", "YES")
+.baselineValues("gender", "CHARACTER", "YES")
+.go();
+  }
+
+  @Test
+  @Ignore("DateTime timezone error needs to be f

[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207053#comment-15207053
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/405#issuecomment-199964093
  
+1


> Limit 0 should avoid execution when querying a known schema
> ---
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine.
> The Hive table is about 6 GB, with 330 Parquet files using Snappy compression.
> Data types are int, bigint, string and double.
> Querying the directory of Parquet files through the DFS plugin works fine:
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Created] (DRILL-4528) AVG() window function is not optimized for limit 0 queries

2016-03-22 Thread Krystal (JIRA)
Krystal created DRILL-4528:
--

 Summary: AVG() window function is not optimized for limit 0 queries
 Key: DRILL-4528
 URL: https://issues.apache.org/jira/browse/DRILL-4528
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Krystal
Assignee: Sean Hsuan-Yi Chu


git.commit.id.abbrev=cee5317

The following sample query contains the avg() window function, which is not 
optimized when wrapped with limit 0:

select * from (
SELECT AVG(cast( col1 as BIGINT )) OVER(PARTITION BY cast( col4 as TIMESTAMP) 
ORDER BY cast( col5 as DATE )) FROM `fewRowsAllData_v`) t limit 0

Physical Plan:
{code}
00-00Screen : rowType = RecordType(ANY EXPR$0): rowcount = 1.0, cumulative 
cost = {469.1 rows, 5717.190984570043 cpu, 0.0 io, 0.0 network, 1872.0 memory}, 
id = 5762023
00-01  Project(EXPR$0=[$0]) : rowType = RecordType(ANY EXPR$0): rowcount = 
1.0, cumulative cost = {469.0 rows, 5717.090984570043 cpu, 0.0 io, 0.0 network, 
1872.0 memory}, id = 5762022
00-02SelectionVectorRemover : rowType = RecordType(ANY EXPR$0): 
rowcount = 1.0, cumulative cost = {469.0 rows, 5717.090984570043 cpu, 0.0 io, 
0.0 network, 1872.0 memory}, id = 5762021
00-03  Limit(fetch=[0]) : rowType = RecordType(ANY EXPR$0): rowcount = 
1.0, cumulative cost = {468.0 rows, 5716.090984570043 cpu, 0.0 io, 0.0 network, 
1872.0 memory}, id = 5762020
00-04Project(EXPR$0=[/(CastHigh($3), $4)]) : rowType = 
RecordType(ANY EXPR$0): rowcount = 78.0, cumulative cost = {468.0 rows, 
5716.090984570043 cpu, 0.0 io, 0.0 network, 1872.0 memory}, id = 5762019
00-05  Window(window#0=[window(partition {2} order by [1] range 
between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($0), COUNT($0)])]) : 
rowType = RecordType(BIGINT $0, DATE $1, TIMESTAMP(0) $2, BIGINT w0$o0, BIGINT 
w0$o1): rowcount = 78.0, cumulative cost = {390.0 rows, 5404.090984570043 cpu, 
0.0 io, 0.0 network, 1872.0 memory}, id = 5762018
00-06SelectionVectorRemover : rowType = RecordType(BIGINT $0, 
DATE $1, TIMESTAMP(0) $2): rowcount = 78.0, cumulative cost = {312.0 rows, 
5170.090984570043 cpu, 0.0 io, 0.0 network, 1872.0 memory}, id = 5762017
00-07  Sort(sort0=[$2], sort1=[$1], dir0=[ASC], dir1=[ASC]) : 
rowType = RecordType(BIGINT $0, DATE $1, TIMESTAMP(0) $2): rowcount = 78.0, 
cumulative cost = {234.0 rows, 5092.090984570043 cpu, 0.0 io, 0.0 network, 
1872.0 memory}, id = 5762016
00-08Project($0=[CAST(CAST($0):BIGINT):BIGINT], 
$1=[CAST(CAST($1):DATE):DATE], $2=[CAST(CAST($2):TIMESTAMP(0)):TIMESTAMP(0)]) : 
rowType = RecordType(BIGINT $0, DATE $1, TIMESTAMP(0) $2): rowcount = 78.0, 
cumulative cost = {156.0 rows, 1170.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
= 5762015
00-09  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath 
[path=maprfs:///drill/testdata/window_functions/fewRowsAllData.parquet]], 
selectionRoot=maprfs:/drill/testdata/window_functions/fewRowsAllData.parquet, 
numFiles=1, usedMetadataFile=false, columns=[`col1`, `col5`, `col4`]]]) : 
rowType = RecordType(ANY col1, ANY col5, ANY col4): rowcount = 78.0, cumulative 
cost = {78.0 rows, 234.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 5762014
{code}
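
Note the Project(EXPR$0=[/(CastHigh($3), $4)]) sitting on top of the Window operator: the planner has already rewritten AVG into SUM/COUNT. An equivalent hand-written form of the inner query, for reference only (the planner additionally applies CastHigh to avoid integer division):

{code}
SELECT SUM(CAST(col1 AS BIGINT)) OVER (PARTITION BY CAST(col4 AS TIMESTAMP) ORDER BY CAST(col5 AS DATE))
     / COUNT(CAST(col1 AS BIGINT)) OVER (PARTITION BY CAST(col4 AS TIMESTAMP) ORDER BY CAST(col5 AS DATE))
FROM `fewRowsAllData_v`
{code}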





[jira] [Updated] (DRILL-4528) AVG() window function is not optimized for limit 0 queries

2016-03-22 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal updated DRILL-4528:
---
Labels: limit0  (was: )

> AVG() window function is not optimized for limit 0 queries
> ---
>
> Key: DRILL-4528
> URL: https://issues.apache.org/jira/browse/DRILL-4528
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Sean Hsuan-Yi Chu
>  Labels: limit0
>
> git.commit.id.abbrev=cee5317
> The following sample query contains the avg() window function, which is not 
> optimized when wrapped with limit 0:
> select * from (
> SELECT AVG(cast( col1 as BIGINT )) OVER(PARTITION BY cast( col4 as TIMESTAMP) 
> ORDER BY cast( col5 as DATE )) FROM `fewRowsAllData_v`) t limit 0
> Physical Plan:
> {code}
> 00-00Screen : rowType = RecordType(ANY EXPR$0): rowcount = 1.0, 
> cumulative cost = {469.1 rows, 5717.190984570043 cpu, 0.0 io, 0.0 network, 
> 1872.0 memory}, id = 5762023
> 00-01  Project(EXPR$0=[$0]) : rowType = RecordType(ANY EXPR$0): rowcount 
> = 1.0, cumulative cost = {469.0 rows, 5717.090984570043 cpu, 0.0 io, 0.0 
> network, 1872.0 memory}, id = 5762022
> 00-02SelectionVectorRemover : rowType = RecordType(ANY EXPR$0): 
> rowcount = 1.0, cumulative cost = {469.0 rows, 5717.090984570043 cpu, 0.0 io, 
> 0.0 network, 1872.0 memory}, id = 5762021
> 00-03  Limit(fetch=[0]) : rowType = RecordType(ANY EXPR$0): rowcount 
> = 1.0, cumulative cost = {468.0 rows, 5716.090984570043 cpu, 0.0 io, 0.0 
> network, 1872.0 memory}, id = 5762020
> 00-04Project(EXPR$0=[/(CastHigh($3), $4)]) : rowType = 
> RecordType(ANY EXPR$0): rowcount = 78.0, cumulative cost = {468.0 rows, 
> 5716.090984570043 cpu, 0.0 io, 0.0 network, 1872.0 memory}, id = 5762019
> 00-05  Window(window#0=[window(partition {2} order by [1] range 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($0), COUNT($0)])]) : 
> rowType = RecordType(BIGINT $0, DATE $1, TIMESTAMP(0) $2, BIGINT w0$o0, 
> BIGINT w0$o1): rowcount = 78.0, cumulative cost = {390.0 rows, 
> 5404.090984570043 cpu, 0.0 io, 0.0 network, 1872.0 memory}, id = 5762018
> 00-06SelectionVectorRemover : rowType = RecordType(BIGINT $0, 
> DATE $1, TIMESTAMP(0) $2): rowcount = 78.0, cumulative cost = {312.0 rows, 
> 5170.090984570043 cpu, 0.0 io, 0.0 network, 1872.0 memory}, id = 5762017
> 00-07  Sort(sort0=[$2], sort1=[$1], dir0=[ASC], dir1=[ASC]) : 
> rowType = RecordType(BIGINT $0, DATE $1, TIMESTAMP(0) $2): rowcount = 78.0, 
> cumulative cost = {234.0 rows, 5092.090984570043 cpu, 0.0 io, 0.0 network, 
> 1872.0 memory}, id = 5762016
> 00-08Project($0=[CAST(CAST($0):BIGINT):BIGINT], 
> $1=[CAST(CAST($1):DATE):DATE], $2=[CAST(CAST($2):TIMESTAMP(0)):TIMESTAMP(0)]) 
> : rowType = RecordType(BIGINT $0, DATE $1, TIMESTAMP(0) $2): rowcount = 78.0, 
> cumulative cost = {156.0 rows, 1170.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, 
> id = 5762015
> 00-09  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/window_functions/fewRowsAllData.parquet]], 
> selectionRoot=maprfs:/drill/testdata/window_functions/fewRowsAllData.parquet, 
> numFiles=1, usedMetadataFile=false, columns=[`col1`, `col5`, `col4`]]]) : 
> rowType = RecordType(ANY col1, ANY col5, ANY col4): rowcount = 78.0, 
> cumulative cost = {78.0 rows, 234.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 5762014
> {code}





[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-03-22 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207167#comment-15207167
 ] 

Deneche A. Hakim commented on DRILL-3714:
-

After further investigation, here is what I found:

- RpcBus uses a CoordinationQueue to keep track of every batch that was sent 
and is still waiting for an Ack (it actually maps every batch to its 
corresponding coordination queue)
- when Netty closes the data channel, RpcBus.ChannelClosedHandler is called. On 
the client side it actually avoids calling CoordinationQueue.channelClosed()
- the logs show that some of those batches reached the server and some didn't, 
but the client side never received any Ack for any of them
- ReconnectingConnection replaces the closed connection with a new one (with a 
new CoordinationQueue)

After that, some of those batches are failed because of the closed connection, 
but most of them remain in this state: neither sent nor failed (a sketch of the 
bookkeeping follows below).
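
To make the failure mode concrete, a simplified sketch (hypothetical, not Drill's actual RpcBus/CoordinationQueue code) of the pending-ack bookkeeping described above:

{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Each outbound batch is registered under a coordinationId and must be
// either acked or failed when the channel closes.
public class PendingAckQueue {
  private final ConcurrentMap<Integer, CompletableFuture<Void>> pending =
      new ConcurrentHashMap<>();

  public CompletableFuture<Void> register(int coordinationId) {
    CompletableFuture<Void> future = new CompletableFuture<>();
    pending.put(coordinationId, future);
    return future;
  }

  public void ack(int coordinationId) {
    CompletableFuture<Void> future = pending.remove(coordinationId);
    if (future != null) {
      future.complete(null);
    }
  }

  // If this is never invoked on channel close (the behavior described above),
  // unacked batches are neither sent nor failed, and senders wait forever.
  public void channelClosed(Throwable cause) {
    pending.values().forEach(f -> f.completeExceptionally(cause));
    pending.clear();
  }
}
{code}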

> Query runs out of memory and remains in CANCELLATION_REQUESTED state until 
> drillbit is restarted
> 
>
> Key: DRILL-3714
> URL: https://issues.apache.org/jira/browse/DRILL-3714
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log, 
> jstack.txt, query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>
>
> This is a variation of DRILL-3705, differing in Drill's behavior when 
> hitting the OOM condition.
> Query runs out of memory during execution and remains in 
> "CANCELLATION_REQUESTED" state until drillbit is bounced.
> Client (sqlline in this case) never gets a response from the server.
> Reproduction details:
> Single node drillbit installation.
> DRILL_MAX_DIRECT_MEMORY="8G"
> DRILL_HEAP="4G"
> Run this query on TPCDS SF100 data set
> {code}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS 
> TotalSpend FROM store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 
> LIMIT 10;
> {code}
> drillbit.log
> {code}
> 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO 
>  o.a.d.e.w.f.FragmentStatusReporter - 
> 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING
> 2015-08-26 16:55:50,498 [BitServer-5] WARN  
> o.a.drill.exec.rpc.data.DataServer - Message of mode REQUEST of rpc type 3 
> took longer than 500ms.  Actual duration was 2569ms.
> 2015-08-26 16:56:31,086 [BitServer-5] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct 
> buffer memory
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
>  ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
>  [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.ja

[jira] [Comment Edited] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-03-22 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207167#comment-15207167
 ] 

Deneche A. Hakim edited comment on DRILL-3714 at 3/22/16 8:04 PM:
--

After further investigation, here is what I found:

- RpcBus uses a CoordinationQueue to keep track of every batch that was sent 
and is still waiting for an Ack (it actually maps every batch to its 
corresponding coordinationId)
- when Netty closes the data channel, RpcBus.ChannelClosedHandler is called. On 
the client side it actually avoids calling CoordinationQueue.channelClosed()
- the logs show that some of those batches reached the server and some didn't, 
but the client side never received an Ack for any of them
- ReconnectingConnection replaces the closed connection with a new one (with a 
new CoordinationQueue)

After that, some of those batches fail because of the closed connection, but 
most of them remain in limbo: neither sent nor failed.


was (Author: adeneche):
After further investigation, here is what I found:

- RpcBus uses a CoordinationQueue to keep track of every batch that was sent 
and is still waiting for an Ack (it actually maps every batch to its 
corresponding coordination queue)
- when Netty closes the data channel, RpcBus.ChannelClosedHandler is called. On 
the client side it actually avoids calling CoordinationQueue.channelClosed()
- the logs show that some of those batches reached the server and some didn't, 
but the client side never received an Ack for any of them
- ReconnectingConnection replaces the closed connection with a new one (with a 
new CoordinationQueue)

After that, some of those batches fail because of the closed connection, but 
most of them remain in limbo: neither sent nor failed.

> Query runs out of memory and remains in CANCELLATION_REQUESTED state until 
> drillbit is restarted
> 
>
> Key: DRILL-3714
> URL: https://issues.apache.org/jira/browse/DRILL-3714
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log, 
> jstack.txt, query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>
>
> This is a variation of DRILL-3705 with the difference of drill behavior when 
> hitting OOM condition.
> Query runs out of memory during execution and remains in 
> "CANCELLATION_REQUESTED" state until drillbit is bounced.
> Client (sqlline in this case) never gets a response from the server.
> Reproduction details:
> Single node drillbit installation.
> DRILL_MAX_DIRECT_MEMORY="8G"
> DRILL_HEAP="4G"
> Run this query on TPCDS SF100 data set
> {code}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS 
> TotalSpend FROM store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 
> LIMIT 10;
> {code}
> drillbit.log
> {code}
> 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO 
>  o.a.d.e.w.f.FragmentStatusReporter - 
> 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING
> 2015-08-26 16:55:50,498 [BitServer-5] WARN  
> o.a.drill.exec.rpc.data.DataServer - Message of mode REQUEST of rpc type 3 
> took longer than 500ms.  Actual duration was 2569ms.
> 2015-08-26 16:56:31,086 [BitServer-5] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct 
> buffer memory
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
>  ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io

[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207194#comment-15207194
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/405#issuecomment-20859
  
LGTM.

+1 




> Limit 0 should avoid execution when querying a known schema
> ---
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4514) Add describe schema command

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207274#comment-15207274
 ] 

ASF GitHub Bot commented on DRILL-4514:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/436#discussion_r57066817
  
--- Diff: exec/java-exec/src/main/codegen/includes/parserImpls.ftl ---
@@ -278,3 +278,19 @@ SqlNode SqlRefreshMetadata() :
 }
 }
 
+/**
--- End diff --

I would say it's rather specific syntax.


> Add describe schema <schema_name> command
> -
>
> Key: DRILL-4514
> URL: https://issues.apache.org/jira/browse/DRILL-4514
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: Future
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>
> Add a describe database <schema_name> command which will return the directory 
> associated with a database on the fly.
> Syntax:
> describe database <schema_name>
> describe schema <schema_name>
> Output:
> {noformat}
>  DESCRIBE SCHEMA xdf.proc;
> +-----------+-----------------------------+
> |   name    |          location           |
> +-----------+-----------------------------+
> | xdf.proc  | maprfs://dl.data/processed  |
> +-----------+-----------------------------+
> {noformat}
> The current implementation covers only the dfs schema.
> For all other schemas "" will be returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207311#comment-15207311
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/405#issuecomment-200026538
  
Thank you for the reviews.

All regression tests passed; I am running unit tests right now.

Note that the `planner.enable_limit0_optimization` option is disabled by 
default. To summarize (and document) the limitations:

If, during validation, the planner is able to resolve the types of the 
columns (i.e. the types are not late-binding), the shorter execution path is 
taken. Some types are excluded:
+ DECIMAL type is not fully supported in general.
+ VARBINARY is not fully tested.
+ MAP, ARRAY are currently not exposed to the planner.
+ TINYINT, SMALLINT are defined in the Drill type system but have been 
turned off for now.
+ SYMBOL, MULTISET, DISTINCT, STRUCTURED, ROW, OTHER, CURSOR, COLUMN_LIST 
are Calcite types currently not supported by Drill, nor defined in the Drill 
type list.

Three scenarios when the planner can do type resolution during validation:
+ Queries on Hive tables
+ Queries with explicit casts on table columns, example: `SELECT CAST(col1 
AS BIGINT), ABS(CAST(col2 AS INTEGER)) FROM table;`
+ Queries on views with casts on table columns

In the latter two cases, the schema of the query with LIMIT 0 clause has 
relaxed nullability compared to the query without the LIMIT 0 clause. Example:
Say the schema definition of the Parquet file (`numbers.parquet`) is:
```
message Numbers {
  required int32 col1;
  optional int32 col2;
}
```

Since the view definition does not specify the nullability of its columns, and 
the schema of a Parquet file is not yet leveraged by Drill's planner:
```
CREATE VIEW dfs.tmp.mynumbers AS SELECT CAST(col1 AS INTEGER) as col1, 
CAST(col2 AS INTEGER) AS col2 FROM dfs.tmp.`numbers.parquet`;
```
(1) For the query with the LIMIT 0 clause, since the file/metadata is not 
read, Drill assumes the nullability of both columns is 
[`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
```
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 0;
```

(2) For the query without the LIMIT 0 clause, since the file is read, Drill 
knows the nullability of `col1` is 
[`columnNoNulls`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNoNulls)
 and that of `col2` is 
[`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
```
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 1;
```
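
As a quick way to observe the difference described above, here is a hedged 
JDBC sketch (not part of this change; `jdbc:drill:zk=local` is the stock 
embedded-mode URL, and the view name is taken from the example):

```
import java.sql.*;

public class NullabilityCheck {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement stmt = conn.createStatement()) {
      // (1) LIMIT 0: both columns should report columnNullable (1)
      printNullability(stmt, "SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 0");
      // (2) LIMIT 1: col1 should report columnNoNulls (0)
      printNullability(stmt, "SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 1");
    }
  }

  private static void printNullability(Statement stmt, String sql) throws SQLException {
    try (ResultSet rs = stmt.executeQuery(sql)) {
      ResultSetMetaData md = rs.getMetaData();
      for (int i = 1; i <= md.getColumnCount(); i++) {
        // isNullable() returns columnNoNulls (0), columnNullable (1)
        // or columnNullableUnknown (2)
        System.out.println(md.getColumnName(i) + " -> " + md.isNullable(i));
      }
    }
  }
}
```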


> Limit 0 should avoid execution when querying a known schema
> ---
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4143) REFRESH TABLE METADATA - Permission Issues with metadata files

2016-03-22 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4143:

Assignee: Chunhui Shi

> REFRESH TABLE METADATA - Permission Issues with metadata files
> --
>
> Key: DRILL-4143
> URL: https://issues.apache.org/jira/browse/DRILL-4143
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0, 1.4.0
>Reporter: John Omernik
>Assignee: Chunhui Shi
>  Labels: Metadata, Parquet, Permissions
> Fix For: Future
>
>
> Summary of Refresh Metadata Issues confirmed by two different users on Drill 
> User Mailing list. (Title: REFRESH TABLE METADATA - Access Denied)
> This issue pertains to table METADATA and revolves around user 
> authentication. 
> Basically, when the drill bits are running as one user, and the data is owned 
> by another user, there can be access denied issues on subsequent queries 
> after issuing a REFRESH TABLE METADATA command. 
> To troubleshoot what is actually happening, I turned on MapR auditing (a 
> handy feature) and found that when I run a query that gives me access denied 
> (my query is select count(1) from testtable), per MapR the user I am logged 
> in as (dataowner) is trying to do a create operation on the 
> .drill.parquet_metadata file, and it is failing with status 17. Per Keys at 
> MapR, "status 17 means errno 17 which means EEXIST. Looks like Drill is 
> trying to create a file that already exists." This seems to indicate that 
> Drill is perhaps trying to create the .drill.parquet_metadata on each select 
> as the dataowner user, but the permissions (as seen below) don't allow it. 
> Here are the steps to reproduce:
> Enable Authentication. 
> Run all drill bits in the cluster as "drillbituser", then have the files 
> owned by "dataowner". Note the root of the table permissions are drwxrwxr-x 
> but as Drill loads each partition it loads them as drwxr-xr-x (all with 
> dataowner:dataowner ownership). That may be something too, the default 
> permissions when creating a table?  Another note, in my setup, drillbituser 
> is in the group for dataowner.  Thus, they should always have read access. 
> # Authenticated as dataowner (this should have full permissions to all the 
> data)
> Enter username for jdbc:drill:zk=zknode1:5181: dataowner
> Enter password for jdbc:drill:zk=zknode1:5181: **
> 0: jdbc:drill:zk=zknode1> use dfs.dev;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | Default schema changed to [dfs.dev]  |
> +-------+--------------------------------------+
> 1 row selected (0.307 seconds)
> # The query works fine with no table metadata
> 0: jdbc:drill:zk=zknode1> select count(1) from `testtable`;
> +-----------+
> |  EXPR$0   |
> +-----------+
> | 24565203  |
> +-----------+
> 1 row selected (3.392 seconds)
> # Refresh of metadata works under with no errors
> 0: jdbc:drill:zk=zknode1> refresh table metadata `testtable`;
> +-------+-----------------------------------------------------+
> |  ok   |                       summary                       |
> +-------+-----------------------------------------------------+
> | true  | Successfully updated metadata for table testtable.  |
> +-------+-----------------------------------------------------+
> 1 row selected (5.767 seconds)
>  
> # Trying to run the same query, it returns a access denied issue. 
> 0: jdbc:drill:zk=zknode1> select count(1) from `testtable`;
> Error: SYSTEM ERROR: IOException: 2127.7646.2950962 
> /data/dev/testtable/2015-11-12/.drill.parquet_metadata (Permission denied)
>  
>  
> [Error Id: 7bfce2e7-f78d-4fba-b047-f4c85b471de4 on node1:31010] 
> (state=,code=0)
>  
>  
> # Note how all the files are owned by the drillbituser. Per discussion on 
> list, this is normal 
>  
> $ find ./ -type f -name ".drill.parquet_metadata" -exec ls -ls {} \;
> 726 -rwxr-xr-x 1 drillbituser drillbituser 742837 Nov 30 14:27 
> ./2015-11-12/.drill.parquet_metadata
> 583 -rwxr-xr-x 1 drillbituser drillbituser 596146 Nov 30 14:27 
> ./2015-11-29/.drill.parquet_metadata
> 756 -rwxr-xr-x 1 drillbituser drillbituser 773811 Nov 30 14:27 
> ./2015-11-11/.drill.parquet_metadata
> 763 -rwxr-xr-x 1 drillbituser drillbituser 780829 Nov 30 14:27 
> ./2015-11-04/.drill.parquet_metadata
> 632 -rwxr-xr-x 1 drillbituser drillbituser 646851 Nov 30 14:27 
> ./2015-11-08/.drill.parquet_metadata
> 845 -rwxr-xr-x 1 drillbituser drillbituser 864421 Nov 30 14:27 
> ./2015-11-05/.drill.parquet_metadata
> 771 -rwxr-xr-x 1 drillbituser drillbituser 788823 Nov 30 14:27 
> ./2015-11-28/.drill.parquet_metadata
> 1273 -rwxr-xr-x 1 drillbituser drillbituser 1303168 N

[jira] [Created] (DRILL-4529) SUM() with window function results in mismatched nullability

2016-03-22 Thread Krystal (JIRA)
Krystal created DRILL-4529:
--

 Summary: SUM() with window function results in mismatched nullability
 Key: DRILL-4529
 URL: https://issues.apache.org/jira/browse/DRILL-4529
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Krystal
Assignee: Sean Hsuan-Yi Chu


git.commit.id.abbrev=cee5317

select 
  sum(1)  over w sum1, 
  sum(5)  over w sum5,
  sum(10) over w sum10
from 
  j1_v
where 
  c_date is not null
window w as (partition by c_date);

Output from test:
limit 0: [columnNoNulls, columnNoNulls, columnNoNulls]
regular: [columnNullable, columnNullable, columnNullable]




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4529) SUM() with window function results in mismatched nullability

2016-03-22 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal updated DRILL-4529:
---
Labels: limit0  (was: )

> SUM() with window function results in mismatched nullability
> --
>
> Key: DRILL-4529
> URL: https://issues.apache.org/jira/browse/DRILL-4529
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Sean Hsuan-Yi Chu
>  Labels: limit0
>
> git.commit.id.abbrev=cee5317
> select 
>   sum(1)  over w sum1, 
>   sum(5)  over w sum5,
>   sum(10) over w sum10
> from 
>   j1_v
> where 
>   c_date is not null
> window w as (partition by c_date);
> Output from test:
> limit 0: [columnNoNulls, columnNoNulls, columnNoNulls]
> regular: [columnNullable, columnNullable, columnNullable]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4523) Add the IP address if we enable debug for org.apache.drill.exec.coord.zk

2016-03-22 Thread Parth Chandra (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207467#comment-15207467
 ] 

Parth Chandra commented on DRILL-4523:
--

If the hostname is localhost.localdomain, then the IP address will likely be 
127.0.0.1, which is just as useless.  As [~mandoskippy] mentioned, other than 
a single-node cluster there is no case where we need to allow localhost (or 
the loopback address) as a legal value for a drillbit endpoint. For the one 
special case of a single-node cluster, we can provide a config variable to 
allow the localhost name.
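
A hedged sketch of that idea (illustrative names only; the boolean parameter 
stands in for the proposed config variable, which does not exist yet):

{code}
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;

public class EndpointAddressSketch {
  static InetAddress pickRegistrationAddress(boolean allowLoopback) throws Exception {
    InetAddress local = InetAddress.getLocalHost();
    if (!local.isLoopbackAddress()) {
      return local;
    }
    // The hostname resolved to a loopback address (e.g. localhost.localdomain
    // -> 127.0.0.1): scan the interfaces for a usable non-loopback address.
    for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
      for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
        if (!addr.isLoopbackAddress() && !addr.isLinkLocalAddress()) {
          return addr;
        }
      }
    }
    if (allowLoopback) {
      return local;  // the single-node-cluster escape hatch
    }
    throw new IllegalStateException(
        "No non-loopback address found; refusing to register drillbit endpoint");
  }
}
{code}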

> Add the IP address if we enable debug for org.apache.drill.exec.coord.zk
> 
>
> Key: DRILL-4523
> URL: https://issues.apache.org/jira/browse/DRILL-4523
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
>
> If we enable debug for org.apache.drill.exec.coord.zk in logback.xml, we only 
> get the hostname and ports information. For example:
> {code}
> 2015-11-04 19:47:02,927 [ServiceCache-0] DEBUG 
> o.a.d.e.c.zk.ZKClusterCoordinator - Cache changed, updating.
> 2015-11-04 19:47:02,932 [ServiceCache-0] DEBUG 
> o.a.d.e.c.zk.ZKClusterCoordinator - Active drillbit set changed.  Now 
> includes 2 total bits.  New active drillbits:
>  h3.poc.com:31010:31011:31012
>  h2.poc.com:31010:31011:31012
> {code}
> We need to know the IP address of each hostname to do further troubleshooting.
> If any drillbit registers itself as "localhost.localdomain" in ZooKeeper, we 
> will never know where it came from. Enabling IP address tracking can help in 
> this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

2016-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207525#comment-15207525
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/405


> Limit 0 should avoid execution when querying a known schema
> ---
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

2016-03-22 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam resolved DRILL-3623.

Resolution: Fixed

Fixed in 
[5dbaafb|https://github.com/apache/drill/commit/5dbaafbe6651b0a284fef69d5c952d82ce506e20].

> Limit 0 should avoid execution when querying a known schema
> ---
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1170) YARN support for Drill

2016-03-22 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207566#comment-15207566
 ] 

Paul Rogers commented on DRILL-1170:


A brief "starter set" of requirements:

* Configuration file to gather the cluster configuration (memory, cores, number 
of nodes, and so on)
* Launcher to start/stop Drill within YARN
* Drill-specific Application Master (AM)
* AM requests the YARN Node Manager (NM) to launch drill-bits.
* Use the YARN localization feature to deploy Drill files to each node.
* Add nodes (drill-bits) to a running Drill cluster
* Remove nodes from a running Drill cluster (see DRILL-2656)
* Detect and restart failed drill-bits
* Status/statistics about the cluster as a whole (number of active nodes, 
number of restarts, etc.)
* Allow existing users to run "unmanaged" Drill clusters (YARN is optional)
* Possibly allow multiple "Drill clusters" (independent clusters of drill-bits) 
on the same YARN-managed physical cluster.


> YARN support for Drill
> --
>
> Key: DRILL-1170
> URL: https://issues.apache.org/jira/browse/DRILL-1170
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Neeraja
>Assignee: Paul Rogers
> Fix For: Future
>
>
> This is a tracking item to make Drill work with YARN.
> Below are few requirements/needs to consider.
> - Drill should run as a YARN-based application, side by side with other 
> YARN-enabled applications (on the same nodes or different nodes). Both memory 
> and CPU resources of Drill should be controlled through this mechanism.
> - As a YARN-enabled application, Drill's resource consumption should be 
> adaptive to the load on the cluster. For example: when there is no load on 
> Drill, Drill should consume no resources on the cluster.  As the load on 
> Drill increases, resources permitting, usage should grow proportionally.
> - Low latency is a key requirement for Apache Drill, along with support for 
> multiple users (concurrency in the 100s-1000s). This should be supported when 
> run as a YARN application as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4530) Improve metadata cache performance for queries with single partition

2016-03-22 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-4530:
-

 Summary: Improve metadata cache performance for queries with 
single partition 
 Key: DRILL-4530
 URL: https://issues.apache.org/jira/browse/DRILL-4530
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization
Affects Versions: 1.6.0
Reporter: Aman Sinha
Assignee: Aman Sinha


Consider two types of queries which are run with Parquet metadata caching: 
{noformat}
query 1:
SELECT col FROM  `A/B/C`;

query 2:
SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
{noformat}

For a certain dataset, the query1 elapsed time is 1 sec whereas the query2 
elapsed time is 9 sec even though both are accessing the same amount of data.  
The user 
expectation is that they should perform roughly the same.  The main difference 
comes from reading the bigger metadata cache file at the root level 'A' for 
query2 and then applying the partitioning filter.  query1 reads a much smaller 
metadata cache file at the subdirectory level. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition

2016-03-22 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207648#comment-15207648
 ] 

Aman Sinha commented on DRILL-4530:
---

On a closer look, we can optimize this type of query, which has a single 
partition, by changing the selection root after the partition pruning rule is 
applied.  While populating the partition vectors in ParquetPartitionDescriptor, 
we can keep track of the dir0, dir1, etc. values that have been encountered, 
and if these are unique, we can change the selection root to 'A/B/C'.  This 
will subsequently be used by ParquetGroupScan to read the metadata cache file 
from the subdirectory instead of the top level.  
Note that this type of optimization would apply only when we know that a single 
partition is being selected.  For the general case (e.g. an IN filter or a 
range predicate on the directory column) other solutions need to be designed.   
However, for the general case it is unlikely that a user will compare a query's 
performance with an equivalent query written without the partition filter (it 
is not easy to express an arbitrary WHERE condition in the FROM clause). 
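
As a rough sketch of the bookkeeping (illustrative names only, not the actual 
planner classes): collect the pruned dir0, dir1, ... values per level, and 
narrow the selection root only while each level stays unique:

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SelectionRootSketch {
  // dirValuesPerLevel.get(0) holds the dir0 value seen for each selected file,
  // get(1) the dir1 values, and so on.
  static String narrowSelectionRoot(String root, List<List<String>> dirValuesPerLevel) {
    StringBuilder narrowed = new StringBuilder(root);
    for (List<String> values : dirValuesPerLevel) {
      Set<String> distinct = new HashSet<>(values);
      if (distinct.size() != 1) {
        break;  // more than one partition survives at this level: stop narrowing
      }
      narrowed.append('/').append(distinct.iterator().next());
    }
    return narrowed.toString();
  }

  public static void main(String[] args) {
    // dir0 = 'B' and dir1 = 'C' are the only values left after pruning:
    String newRoot = narrowSelectionRoot("A",
        Arrays.asList(Arrays.asList("B", "B"), Arrays.asList("C", "C")));
    System.out.println(newRoot);  // prints A/B/C
  }
}
{code}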

> Improve metadata cache performance for queries with single partition 
> -
>
> Key: DRILL-4530
> URL: https://issues.apache.org/jira/browse/DRILL-4530
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>
> Consider two types of queries which are run with Parquet metadata caching: 
> {noformat}
> query 1:
> SELECT col FROM  `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas the query2 
> elapsed time is 9 sec even though both are accessing the same amount of data. 
>  The user expectation is that they should perform roughly the same.  The main 
> difference comes from reading the bigger metadata cache file at the root 
> level 'A' for query2 and then applying the partitioning filter.  query1 reads 
> a much smaller metadata cache file at the subdirectory level. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4530) Improve metadata cache performance for queries with single partition

2016-03-22 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-4530:
--
Description: 
Consider two types of queries which are run with Parquet metadata caching: 
{noformat}
query 1:
SELECT col FROM  `A/B/C`;

query 2:
SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
{noformat}

For a certain dataset, the query1 elapsed time is 1 sec whereas query2 elapsed 
time is 9 sec even though both are accessing the same amount of data.  The user 
expectation is that they should perform roughly the same.  The main difference 
comes from reading the bigger metadata cache file at the root level 'A' for 
query2 and then applying the partitioning filter.  query1 reads a much smaller 
metadata cache file at the subdirectory level. 


  was:
Consider two types of queries which are run with Parquet metadata caching: 
{noformat}
query 1:
SELECT col FROM  `A/B/C`;

query 2:
SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
{noformat}

For a certain dataset, the query1 elapsed time is 1 sec whereas query1 elapsed 
time is 9 sec even though both are accessing the same amount of data.  The user 
expectation is that they should perform roughly the same.  The main difference 
comes from reading the bigger metadata cache file at the root level 'A' for 
query2 and then applying the partitioning filter.  query1 reads a much smaller 
metadata cache file at the subdirectory level. 



> Improve metadata cache performance for queries with single partition 
> -
>
> Key: DRILL-4530
> URL: https://issues.apache.org/jira/browse/DRILL-4530
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>
> Consider two types of queries which are run with Parquet metadata caching: 
> {noformat}
> query 1:
> SELECT col FROM  `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2 
> elapsed time is 9 sec even though both are accessing the same amount of data. 
>  The user expectation is that they should perform roughly the same.  The main 
> difference comes from reading the bigger metadata cache file at the root 
> level 'A' for query2 and then applying the partitioning filter.  query1 reads 
> a much smaller metadata cache file at the subdirectory level. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)