[jira] [Commented] (DRILL-7011) Allow hybrid model in the Row set-based scan framework

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804556#comment-16804556
 ] 

ASF GitHub Bot commented on DRILL-7011:
---

paul-rogers commented on issue #1711: DRILL-7011: Support schema in scan 
framework
URL: https://github.com/apache/drill/pull/1711#issuecomment-477853689
 
 
   Final cleanup to ensure all Checkstyle warnings are addressed and unit 
tests pass.
   As part of this, removed a bunch of unused "suppress warnings" annotations 
that are no longer needed in Java 8.
   
   This PR should be ready to go. Since this has gotten rather large (sorry!) 
we'll tackle enforcing "strict" column schema in the next one.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allow hybrid model in the Row set-based scan framework
> --
>
> Key: DRILL-7011
> URL: https://issues.apache.org/jira/browse/DRILL-7011
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> As part of the schema provisioning project, we want to allow a hybrid model 
> for the Row set-based scan framework, namely, to allow passing custom schema 
> metadata, which can be partial.
> Currently, schema provisioning has a SchemaContainer class that contains the 
> following information (which can be obtained from the metastore, a schema 
> file, or a table function):
> 1. schema, represented by org.apache.drill.exec.record.metadata.TupleMetadata
> 2. properties, represented by Map, which can indicate whether the schema is 
> strict or partial (default is partial), etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7139) Date_add() can produce incorrect results when adding to a timestamp

2019-03-28 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-7139:


Assignee: Arina Ielchiieva  (was: Pritesh Maker)

> Date_add() can produce incorrect results when adding to a timestamp
> ---
>
> Key: DRILL-7139
> URL: https://issues.apache.org/jira/browse/DRILL-7139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
>
> I am using date_add() to create a sequence of timestamps:
> {noformat}
> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') 
> as interval minute)) timestamp_id from (values(1));
> +--------------------------+
> |       timestamp_id       |
> +--------------------------+
> | 1970-01-25 20:31:12.704  |
> +--------------------------+
> 1 row selected (0.121 seconds)
> {noformat}
> When I add one more, I get an older timestamp:
> {noformat}
> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107375,'M') 
> as interval minute)) timestamp_id from (values(1));
> +--------------------------+
> |       timestamp_id       |
> +--------------------------+
> | 1969-12-07 03:29:25.408  |
> +--------------------------+
> 1 row selected (0.126 seconds)
> {noformat}
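
A possible reading of the two results above (an assumption based only on the numbers shown, not a confirmed diagnosis of Drill's code): both values match what one gets if the minute count is converted to milliseconds and the result is truncated to a signed 32-bit integer. 107374 minutes is 6,442,440,000 ms, which wraps to 2,147,472,704 ms (1970-01-25 20:31:12.704). 107375 minutes wraps past the sign bit to -2,147,434,592 ms (1969-12-07 03:29:25.408). Under this reading the first result is already wrong; the untruncated value would be 1970-03-16 13:34:00. The sketch below reproduces the arithmetic in plain Java. The class and method names are hypothetical, and timestamps are treated as UTC.

```java
import java.time.Instant;

public class DateAddOverflowSketch {

  // Hypothetical helper, not taken from Drill's source: converts minutes to
  // milliseconds and truncates to 32 bits, mimicking an int overflow.
  static int minutesToMillisTruncated(long minutes) {
    return (int) (minutes * 60_000L);
  }

  // The same conversion kept in 64-bit arithmetic, which does not wrap here.
  static long minutesToMillis(long minutes) {
    return minutes * 60_000L;
  }

  public static void main(String[] args) {
    for (long minutes : new long[] {107374L, 107375L}) {
      Instant wrapped = Instant.ofEpochMilli(minutesToMillisTruncated(minutes));
      Instant exact = Instant.ofEpochMilli(minutesToMillis(minutes));
      System.out.println(minutes + " min: truncated=" + wrapped + ", exact=" + exact);
    }
  }
}
```

If this is indeed the cause, keeping the interval in a 64-bit value from the cast through the addition would avoid the wrap.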





[jira] [Commented] (DRILL-7076) NPE is logged when querying postgres tables

2019-03-28 Thread Gautam Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804510#comment-16804510
 ] 

Gautam Parai commented on DRILL-7076:
-

[~vvysotskyi] Could you please let me know how to get or create the data 
source, i.e., the Postgres table?

> NPE is logged when querying postgres tables
> ---
>
> Key: DRILL-7076
> URL: https://issues.apache.org/jira/browse/DRILL-7076
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> NPE is seen in logs when querying Postgres table:
> {code:sql}
> select 1 from postgres.public.tdt
> {code}
> Stack trace from {{sqlline.log}}:
> {noformat}
> 2019-03-05 13:49:19,395 [23819dc0-abf8-24f3-ea81-6ced1b6e11af:foreman] WARN  
> o.a.d.e.p.common.DrillStatsTable - Failed to materialize the stats. 
> Continuing without stats.
> java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.planner.common.DrillStatsTable$StatsMaterializationVisitor.visit(DrillStatsTable.java:189)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at org.apache.calcite.rel.SingleRel.childrenAccept(SingleRel.java:72) 
> [calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at org.apache.calcite.rel.RelVisitor.visit(RelVisitor.java:44) 
> [calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at 
> org.apache.drill.exec.planner.common.DrillStatsTable$StatsMaterializationVisitor.visit(DrillStatsTable.java:202)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61) 
> [calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at 
> org.apache.drill.exec.planner.common.DrillStatsTable$StatsMaterializationVisitor.materialize(DrillStatsTable.java:177)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:235)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:331)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:178)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:204)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:114)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:80)
>  [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:584) 
> [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:272) 
> [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_191]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_191]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
> {noformat}
> But query runs and returns the correct result.





[jira] [Updated] (DRILL-7139) Date_add() can produce incorrect results when adding to a timestamp

2019-03-28 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7139:
--
Description: 
I am using date_add() to create a sequence of timestamps:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') 
as interval minute)) timestamp_id from (values(1));
+--------------------------+
|       timestamp_id       |
+--------------------------+
| 1970-01-25 20:31:12.704  |
+--------------------------+
1 row selected (0.121 seconds)
{noformat}

When I add one more, I get an older timestamp:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107375,'M') 
as interval minute)) timestamp_id from (values(1));
+--------------------------+
|       timestamp_id       |
+--------------------------+
| 1969-12-07 03:29:25.408  |
+--------------------------+
1 row selected (0.126 seconds)
{noformat}

  was:
I am using date_add() to create a sequence of timestamps:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') 
as interval minute)) timestamp_id from (values(1));
+--------------------------+
|       timestamp_id       |
+--------------------------+
| 1970-01-25 20:31:12.704  |
+--------------------------+
1 row selected (0.121 seconds)
{noformat}

When I add one more, I get an older timestamp:
{noformat}
0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
00:00:00', cast(concat('PT',107375,'M') as interval minute)) timestamp_id from 
(values(1));
+--------------------------+
|       timestamp_id       |
+--------------------------+
| 1969-12-07 03:29:25.408  |
+--------------------------+
1 row selected (0.126 seconds)
{noformat}


> Date_add() can produce incorrect results when adding to a timestamp
> ---
>
> Key: DRILL-7139
> URL: https://issues.apache.org/jira/browse/DRILL-7139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Pritesh Maker
>Priority: Major
>
> I am using date_add() to create a sequence of timestamps:
> {noformat}
> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') 
> as interval minute)) timestamp_id from (values(1));
> +--------------------------+
> |       timestamp_id       |
> +--------------------------+
> | 1970-01-25 20:31:12.704  |
> +--------------------------+
> 1 row selected (0.121 seconds)
> {noformat}
> When I add one more, I get an older timestamp:
> {noformat}
> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107375,'M') 
> as interval minute)) timestamp_id from (values(1));
> +--+
> |   timestamp_id   |
> +--+
> | 1969-12-07 03:29:25.408  |
> +--+
> 1 row selected (0.126 seconds)
> {noformat}





[jira] [Updated] (DRILL-7139) Date_add() can produce incorrect results when adding to a timestamp

2019-03-28 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7139:
--
Description: 
I am using date_add() to create a sequence of timestamps:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') 
as interval minute)) timestamp_id from (values(1));
+--------------------------+
|       timestamp_id       |
+--------------------------+
| 1970-01-25 20:31:12.704  |
+--------------------------+
1 row selected (0.121 seconds)
{noformat}

When I add one more, I get an older timestamp:
{noformat}
0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
00:00:00', cast(concat('PT',107375,'M') as interval minute)) timestamp_id from 
(values(1));
+--------------------------+
|       timestamp_id       |
+--------------------------+
| 1969-12-07 03:29:25.408  |
+--------------------------+
1 row selected (0.126 seconds)
{noformat}

  was:
I am using date_add() to create a sequence of timestamps:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', 
cast(concat('PT',{color:#f79232}107374{color},'M') as interval minute)) 
timestamp_id from (values(1));
+--------------------------+
|       timestamp_id       |
+--------------------------+
| 1970-01-25 20:31:12.704  |
+--------------------------+
1 row selected (0.121 seconds)
{noformat}

When I add one more, I get an older timestamp:
{noformat}
0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
00:00:00', cast(concat('PT',{color:#f79232}107375{color},'M') as interval 
minute)) timestamp_id from (values(1));
+--------------------------+
|       timestamp_id       |
+--------------------------+
| 1969-12-07 03:29:25.408  |
+--------------------------+
1 row selected (0.126 seconds)
{noformat}


> Date_add() can produce incorrect results when adding to a timestamp
> ---
>
> Key: DRILL-7139
> URL: https://issues.apache.org/jira/browse/DRILL-7139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Pritesh Maker
>Priority: Major
>
> I am using date_add() to create a sequence of timestamps:
> {noformat}
> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') 
> as interval minute)) timestamp_id from (values(1));
> +--------------------------+
> |       timestamp_id       |
> +--------------------------+
> | 1970-01-25 20:31:12.704  |
> +--------------------------+
> 1 row selected (0.121 seconds)
> {noformat}
> When I add one more, I get an older timestamp:
> {noformat}
> 0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
> 00:00:00', cast(concat('PT',107375,'M') as interval minute)) timestamp_id 
> from (values(1));
> +--------------------------+
> |       timestamp_id       |
> +--------------------------+
> | 1969-12-07 03:29:25.408  |
> +--------------------------+
> 1 row selected (0.126 seconds)
> {noformat}





[jira] [Updated] (DRILL-7139) Date_add() can produce incorrect results when adding to a timestamp

2019-03-28 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7139:
--
Summary: Date_add() can produce incorrect results when adding to a 
timestamp  (was: Date)add produces Incorrect results when adding to a timestamp)

> Date_add() can produce incorrect results when adding to a timestamp
> ---
>
> Key: DRILL-7139
> URL: https://issues.apache.org/jira/browse/DRILL-7139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Pritesh Maker
>Priority: Major
>
> I am using date_add() to create a sequence of timestamps:
> {noformat}
> select date_add(timestamp '1970-01-01 00:00:00', 
> cast(concat('PT',{color:#f79232}107374{color},'M') as interval minute)) 
> timestamp_id from (values(1));
> +--------------------------+
> |       timestamp_id       |
> +--------------------------+
> | 1970-01-25 20:31:12.704  |
> +--------------------------+
> 1 row selected (0.121 seconds)
> {noformat}
> When I add one more, I get an older timestamp:
> {noformat}
> 0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
> 00:00:00', cast(concat('PT',{color:#f79232}107375{color},'M') as interval 
> minute)) timestamp_id from (values(1));
> +--+
> |   timestamp_id   |
> +--+
> | 1969-12-07 03:29:25.408  |
> +--+
> 1 row selected (0.126 seconds)
> {noformat}





[jira] [Created] (DRILL-7139) Date)add produces Incorrect results when adding to a timestamp

2019-03-28 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7139:
-

 Summary: Date)add produces Incorrect results when adding to a 
timestamp
 Key: DRILL-7139
 URL: https://issues.apache.org/jira/browse/DRILL-7139
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.15.0
Reporter: Robert Hou
Assignee: Pritesh Maker


I am using date_add() to create a sequence of timestamps:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', 
cast(concat('PT',{color:#f79232}107374{color},'M') as interval minute)) 
timestamp_id from (values(1));
+--------------------------+
|       timestamp_id       |
+--------------------------+
| 1970-01-25 20:31:12.704  |
+--------------------------+
1 row selected (0.121 seconds)
{noformat}

When I add one more, I get an older timestamp:
{noformat}
0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
00:00:00', cast(concat('PT',{color:#f79232}107375{color},'M') as interval 
minute)) timestamp_id from (values(1));
+--------------------------+
|       timestamp_id       |
+--------------------------+
| 1969-12-07 03:29:25.408  |
+--------------------------+
1 row selected (0.126 seconds)
{noformat}





[jira] [Commented] (DRILL-7117) Support creation of histograms for numeric data types (except Decimal) and date/time/timestamp

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804477#comment-16804477
 ] 

ASF GitHub Bot commented on DRILL-7117:
---

amansinha100 commented on pull request #1715: DRILL-7117: Support creation of 
equi-depth histogram for selected dat…
URL: https://github.com/apache/drill/pull/1715#discussion_r270244337
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/StringEquiDepthHistogram.java
 ##
 @@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.common;
+
+
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.calcite.rex.RexNode;
+
+/**
+ * A column specific histogram which is meant for string columns
+ */
+@JsonTypeName("string-equi-depth")
+public class StringEquiDepthHistogram implements Histogram {
 
 Review comment:
   Ok, will remove.
 



> Support creation of histograms for numeric data types (except Decimal) and 
> date/time/timestamp
> --
>
> Key: DRILL-7117
> URL: https://issues.apache.org/jira/browse/DRILL-7117
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> This JIRA is specific to creating histograms for numeric data types: INT, 
> BIGINT, FLOAT4, FLOAT8  and their corresponding nullable/non-nullable 
> versions.  Additionally, since DATE/TIME/TIMESTAMP are internally stored as 
> longs, we should allow the same numeric type histogram creation for these 
> data types as well. 





[jira] [Commented] (DRILL-7117) Support creation of histograms for numeric data types (except Decimal) and date/time/timestamp

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804473#comment-16804473
 ] 

ASF GitHub Bot commented on DRILL-7117:
---

amansinha100 commented on pull request #1715: DRILL-7117: Support creation of 
equi-depth histogram for selected dat…
URL: https://github.com/apache/drill/pull/1715#discussion_r270235751
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -62,10 +62,15 @@
   public enum STATS_VERSION {V0, V1};
   // The current version
   public static final STATS_VERSION CURRENT_VERSION = STATS_VERSION.V1;
+  // 10 histogram buckets (TODO: can make this configurable later)
+  public static final int NUM_HISTOGRAM_BUCKETS = 10;
 
 Review comment:
   I picked 10 buckets because that's what Postgres uses. I will try and check 
any others (let me know if you know of other defaults).  Also, in the JSON 
stats file we want to keep the amount of serialized output small enough.  
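
For readers unfamiliar with the term, here is a minimal sketch of how equi-depth bucket start points can be derived when the column values are already sorted in memory. This is an illustration only: the PR under review derives quantiles from a t-digest rather than from a fully sorted array, and the class and method names below are hypothetical.

```java
public class EquiDepthSketch {

  // Returns the start point of each bucket so that every bucket holds
  // roughly sorted.length / numBuckets rows. Assumes `sorted` is ascending.
  static double[] bucketStarts(double[] sorted, int numBuckets) {
    if (sorted.length == 0 || numBuckets <= 0) {
      throw new IllegalArgumentException("need at least one value and one bucket");
    }
    double[] starts = new double[numBuckets];
    for (int i = 0; i < numBuckets; i++) {
      // Index of the first row of bucket i when rows are split evenly.
      int idx = (int) ((long) i * sorted.length / numBuckets);
      starts[i] = sorted[idx];
    }
    return starts;
  }

  public static void main(String[] args) {
    double[] values = new double[100];
    for (int i = 0; i < values.length; i++) {
      values[i] = i + 1;
    }
    // With 10 buckets over 1..100, each bucket covers 10 rows.
    System.out.println(java.util.Arrays.toString(bucketStarts(values, 10)));
  }
}
```

With a fixed bucket count, the serialized histogram stays the same size regardless of table size, which fits the concern about keeping the JSON stats file small.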
 



[jira] [Commented] (DRILL-7117) Support creation of histograms for numeric data types (except Decimal) and date/time/timestamp

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804475#comment-16804475
 ] 

ASF GitHub Bot commented on DRILL-7117:
---

amansinha100 commented on pull request #1715: DRILL-7117: Support creation of 
equi-depth histogram for selected dat…
URL: https://github.com/apache/drill/pull/1715#discussion_r270244682
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/statistics/TDigestMergedStatistic.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.statistics;
+
+// Library implementing TDigest algorithm to derive approximate quantiles. Please refer to:
+// 'Computing Extremely Accurate Quantiles using t-Digests' by Ted Dunning and Otmar Ertl
+
+import com.clearspring.analytics.stream.quantile.TDigest;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.vector.NullableVarBinaryVector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.MapVector;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.nio.ByteBuffer;
+
+public class TDigestMergedStatistic extends AbstractMergedStatistic {
+  private Map<String, TDigest> tdigestHolder;
+  private int compression;
+
+  public TDigestMergedStatistic() {
+    this.tdigestHolder = new HashMap<>();
+    state = State.INIT;
+  }
+
+  @Override
+  public void initialize(String inputName, double samplePercent) {
+    super.initialize(Statistic.TDIGEST_MERGE, inputName, samplePercent);
+    state = State.CONFIG;
+  }
+
+  @Override
+  public String getName() {
+    return name;
+  }
+
+  @Override
+  public String getInput() {
+    return inputName;
+  }
+
+  @Override
+  public void merge(MapVector input) {
+    // Check the input is a Map Vector
+    assert (input.getField().getType().getMinorType() == TypeProtos.MinorType.MAP);
+    for (ValueVector vv : input) {
+      String colName = vv.getField().getName();
+      TDigest colTdigestHolder = null;
+      if (tdigestHolder.get(colName) != null) {
+        colTdigestHolder = tdigestHolder.get(colName);
+      }
+      NullableVarBinaryVector tdigestVector = (NullableVarBinaryVector) vv;
+      NullableVarBinaryVector.Accessor accessor = tdigestVector.getAccessor();
+
+      try {
+        if (!accessor.isNull(0)) {
+          TDigest other = TDigest.fromBytes(ByteBuffer.wrap(accessor.get(0)));
+          if (colTdigestHolder != null) {
+            colTdigestHolder.add(other);
+            tdigestHolder.put(colName, colTdigestHolder);
+          } else {
+            tdigestHolder.put(colName, other);
+          }
+        }
+      } catch (Exception ex) {
+        //TODO: Catch IOException/CardinalityMergeException
 
 Review comment:
   I will add logging for this. 
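
As a side note on the hunk above, the per-column "look up, then add or put" logic can also be written with `Map.merge`. The sketch below is only an illustration of that pattern: `Summary` is a hypothetical stand-in for a mergeable structure such as a t-digest, and none of these names come from the PR.

```java
import java.util.HashMap;
import java.util.Map;

public class ColumnMergeSketch {

  // Hypothetical stand-in for a mergeable summary such as a t-digest.
  static final class Summary {
    long count;

    Summary(long count) {
      this.count = count;
    }

    // Folds another summary into this one and returns this instance.
    Summary add(Summary other) {
      this.count += other.count;
      return this;
    }
  }

  private final Map<String, Summary> holder = new HashMap<>();

  // Stores the incoming summary if the column is new; otherwise merges it
  // into the summary already held for that column.
  void merge(String colName, Summary incoming) {
    holder.merge(colName, incoming, Summary::add);
  }

  long countFor(String colName) {
    Summary s = holder.get(colName);
    return s == null ? 0 : s.count;
  }
}
```

As in the original hunk, the first summary seen for a column is stored by reference and later summaries are folded into it.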
 


[jira] [Commented] (DRILL-7117) Support creation of histograms for numeric data types (except Decimal) and date/time/timestamp

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804474#comment-16804474
 ] 

ASF GitHub Bot commented on DRILL-7117:
---

amansinha100 commented on pull request #1715: DRILL-7117: Support creation of 
equi-depth histogram for selected dat…
URL: https://github.com/apache/drill/pull/1715#discussion_r270236731
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/NumericEquiDepthHistogram.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.common;
+
+
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+
+import org.apache.calcite.rex.RexNode;
+import com.clearspring.analytics.stream.quantile.TDigest;
+
+/**
+ * A column specific equi-depth histogram which is meant for numeric data types
+ */
+@JsonTypeName("numeric-equi-depth")
+public class NumericEquiDepthHistogram implements Histogram {
+
+  // For equi-depth, all buckets will have same (approx) number of rows
+  @JsonProperty("numRowsPerBucket")
+  private long numRowsPerBucket;
+
+  // An array of buckets arranged in increasing order of their start boundaries
+  // Note that the buckets only maintain the start point of the bucket range.
+  // End point is assumed to be the same as the start point of next bucket.
 
 Review comment:
   What I meant here is that we are not keeping a <start, end> pair for each 
bucket.  Instead, we are just maintaining the start.  But while *interpreting* 
the ranges, we will do what you are saying, i.e., the interval is evaluated 
such that the start point is included while the end point is excluded.
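
The interpretation described in this comment can be sketched as a lookup over an array of start points, where bucket i covers the half-open range from its start up to, but not including, the next start. This is a hedged illustration, not the PR's code: the names are hypothetical, the explicit `end` parameter for the last bucket is an assumption, and the start points are assumed to be strictly increasing.

```java
import java.util.Arrays;

public class BucketLookupSketch {

  // Returns the index of the bucket containing `value`, or -1 if the value
  // lies outside [starts[0], end). Bucket i covers [starts[i], starts[i + 1]);
  // the last bucket covers [starts[n - 1], end).
  static int findBucket(double[] starts, double end, double value) {
    if (starts.length == 0 || value < starts[0] || value >= end) {
      return -1;
    }
    int pos = Arrays.binarySearch(starts, value);
    // An exact hit on a start point belongs to that bucket (start inclusive).
    // Otherwise binarySearch returns -(insertionPoint) - 1, and the containing
    // bucket is the one just before the insertion point.
    return pos >= 0 ? pos : -pos - 2;
  }

  public static void main(String[] args) {
    double[] starts = {0.0, 10.0, 20.0};
    System.out.println(findBucket(starts, 30.0, 10.0));
    System.out.println(findBucket(starts, 30.0, 9.9));
    System.out.println(findBucket(starts, 30.0, 30.0));
  }
}
```

Storing only the start points halves the serialized size compared with explicit pairs, at the cost of needing one extra boundary for the last bucket.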
 



[jira] [Commented] (DRILL-7117) Support creation of histograms for numeric data types (except Decimal) and date/time/timestamp

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804476#comment-16804476
 ] 

ASF GitHub Bot commented on DRILL-7117:
---

amansinha100 commented on pull request #1715: DRILL-7117: Support creation of 
equi-depth histogram for selected dat…
URL: https://github.com/apache/drill/pull/1715#discussion_r270243654
 
 

 ##
 File path: exec/jdbc-all/pom.xml
 ##
 @@ -452,6 +452,7 @@
               <exclude>org/apache/drill/shaded/guava/com/google/common/graph/**</exclude>
               <exclude>org/apache/drill/shaded/guava/com/google/common/collect/Tree*</exclude>
               <exclude>org/apache/drill/shaded/guava/com/google/common/collect/Standard*</exclude>
+              <exclude>org/apache/drill/shaded/guava/com/google/common/io/BaseEncoding*</exclude>
 
 Review comment:
   The JDBC jar file was exceeding size limits.  We have made this type of 
exclusion in the past as well. 
 



[jira] [Resolved] (DRILL-7123) TPCDS query 83 runs slower when Statistics is disabled

2019-03-28 Thread Gautam Parai (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Parai resolved DRILL-7123.
-
Resolution: Fixed

> TPCDS query 83 runs slower when Statistics is disabled
> --
>
> Key: DRILL-7123
> URL: https://issues.apache.org/jira/browse/DRILL-7123
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Query is TPCDS 83 with sf 100:
> {noformat}
> WITH sr_items 
>  AS (SELECT i_item_id   item_id, 
> Sum(sr_return_quantity) sr_item_qty 
>  FROM   store_returns, 
> item, 
> date_dim 
>  WHERE  sr_item_sk = i_item_sk 
> AND d_date IN (SELECT d_date 
>FROM   date_dim 
>WHERE  d_week_seq IN (SELECT d_week_seq 
>  FROM   date_dim 
>  WHERE 
>   d_date IN ( '1999-06-30', 
>   '1999-08-28', 
>   '1999-11-18' 
> ))) 
> AND sr_returned_date_sk = d_date_sk 
>  GROUP  BY i_item_id), 
>  cr_items 
>  AS (SELECT i_item_id   item_id, 
> Sum(cr_return_quantity) cr_item_qty 
>  FROM   catalog_returns, 
> item, 
> date_dim 
>  WHERE  cr_item_sk = i_item_sk 
> AND d_date IN (SELECT d_date 
>FROM   date_dim 
>WHERE  d_week_seq IN (SELECT d_week_seq 
>  FROM   date_dim 
>  WHERE 
>   d_date IN ( '1999-06-30', 
>   '1999-08-28', 
>   '1999-11-18' 
> ))) 
> AND cr_returned_date_sk = d_date_sk 
>  GROUP  BY i_item_id), 
>  wr_items 
>  AS (SELECT i_item_id   item_id, 
> Sum(wr_return_quantity) wr_item_qty 
>  FROM   web_returns, 
> item, 
> date_dim 
>  WHERE  wr_item_sk = i_item_sk 
> AND d_date IN (SELECT d_date 
>FROM   date_dim 
>WHERE  d_week_seq IN (SELECT d_week_seq 
>  FROM   date_dim 
>  WHERE 
>   d_date IN ( '1999-06-30', 
>   '1999-08-28', 
>   '1999-11-18' 
> ))) 
> AND wr_returned_date_sk = d_date_sk 
>  GROUP  BY i_item_id) 
> SELECT sr_items.item_id,
>        sr_item_qty,
>        sr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 * 100 sr_dev,
>        cr_item_qty,
>        cr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 * 100 cr_dev,
>        wr_item_qty,
>        wr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 * 100 wr_dev,
>        ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 average
> FROM   sr_items,
>        cr_items,
>        wr_items
> WHERE  sr_items.item_id = cr_items.item_id
>        AND sr_items.item_id = wr_items.item_id
> ORDER  BY sr_items.item_id,
>           sr_item_qty
> LIMIT 100;
> {noformat}
> The number of threads for major fragments 1 and 2 has changed when Statistics 
> is disabled.  The number of minor fragments has been reduced from 10 and 15 
> fragments down to 3 fragments.  Rowcount has changed for major fragment 2 
> from 1439754.0 down to 287950.8.





[jira] [Resolved] (DRILL-7120) Query fails with ChannelClosedException when Statistics is disabled

2019-03-28 Thread Gautam Parai (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Parai resolved DRILL-7120.
-
Resolution: Fixed

> Query fails with ChannelClosedException when Statistics is disabled
> ---
>
> Key: DRILL-7120
> URL: https://issues.apache.org/jira/browse/DRILL-7120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> TPCH query 5 fails at sf100 when Statistics is disabled.  Here is the query:
> {noformat}
> select
>   n.n_name,
>   sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
> from
>   customer c,
>   orders o,
>   lineitem l,
>   supplier s,
>   nation n,
>   region r
> where
>   c.c_custkey = o.o_custkey
>   and l.l_orderkey = o.o_orderkey
>   and l.l_suppkey = s.s_suppkey
>   and c.c_nationkey = s.s_nationkey
>   and s.s_nationkey = n.n_nationkey
>   and n.n_regionkey = r.r_regionkey
>   and r.r_name = 'EUROPE'
>   and o.o_orderdate >= date '1997-01-01'
>   and o.o_orderdate < date '1997-01-01' + interval '1' year
> group by
>   n.n_name
> order by
>   revenue desc;
> {noformat}
> This is the error from drillbit.log:
> {noformat}
> 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  o.a.d.e.w.fragment.FragmentExecutor - 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> FINISHED
> 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  o.a.d.e.w.f.FragmentStatusReporter - 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State to report: FINISHED
> 2019-03-04 18:17:51,454 [BitServer-13] WARN  o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming stream due to memory limits.  Current Allocation: 262144.
> 2019-03-04 18:17:51,454 [BitServer-13] ERROR o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer.
> 2019-03-04 18:17:51,463 [BitServer-13] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  Connection: /10.10.120.104:31012 <--> /10.10.120.106:53048 (data server).  Closing connection.
> io.netty.handler.codec.DecoderException: org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating buffer.
> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271) ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  

[jira] [Resolved] (DRILL-7122) TPCDS queries 29 25 17 are slower when Statistics is disabled.

2019-03-28 Thread Gautam Parai (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Parai resolved DRILL-7122.
-
Resolution: Fixed

> TPCDS queries 29 25 17 are slower when Statistics is disabled.
> --
>
> Key: DRILL-7122
> URL: https://issues.apache.org/jira/browse/DRILL-7122
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Here is query 29 with sf 100:
> {noformat}
> SELECT i_item_id, 
>i_item_desc, 
>s_store_id, 
>s_store_name, 
>Avg(ss_quantity) AS store_sales_quantity, 
>Avg(sr_return_quantity) AS store_returns_quantity, 
>Avg(cs_quantity) AS catalog_sales_quantity 
> FROM   store_sales, 
>store_returns, 
>catalog_sales, 
>date_dim d1, 
>date_dim d2, 
>date_dim d3, 
>store, 
>item 
> WHERE  d1.d_moy = 4 
>AND d1.d_year = 1998 
>AND d1.d_date_sk = ss_sold_date_sk 
>AND i_item_sk = ss_item_sk 
>AND s_store_sk = ss_store_sk 
>AND ss_customer_sk = sr_customer_sk 
>AND ss_item_sk = sr_item_sk 
>AND ss_ticket_number = sr_ticket_number 
>AND sr_returned_date_sk = d2.d_date_sk 
>AND d2.d_moy BETWEEN 4 AND 4 + 3 
>AND d2.d_year = 1998 
>AND sr_customer_sk = cs_bill_customer_sk 
>AND sr_item_sk = cs_item_sk 
>AND cs_sold_date_sk = d3.d_date_sk 
>AND d3.d_year IN ( 1998, 1998 + 1, 1998 + 2 ) 
> GROUP  BY i_item_id, 
>   i_item_desc, 
>   s_store_id, 
>   s_store_name 
> ORDER  BY i_item_id, 
>   i_item_desc, 
>   s_store_id, 
>   s_store_name
> LIMIT 100; 
> {noformat}
> The hash join order has changed.  As a result, one of the hash joins does not 
> seem to reduce the number of rows significantly.





[jira] [Commented] (DRILL-7117) Support creation of histograms for numeric data types (except Decimal) and date/time/timestamp

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804429#comment-16804429
 ] 

ASF GitHub Bot commented on DRILL-7117:
---

amansinha100 commented on pull request #1715: DRILL-7117: Support creation of 
equi-depth histogram for selected dat…
URL: https://github.com/apache/drill/pull/1715#discussion_r270234955
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/TDigestFunctions.java
 ##
 @@ -0,0 +1,1126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.expr.fn.impl;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillAggFunc;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.expr.holders.NullableBigIntHolder;
+import org.apache.drill.exec.expr.holders.BitHolder;
+import org.apache.drill.exec.expr.holders.NullableBitHolder;
+import org.apache.drill.exec.expr.holders.NullableIntHolder;
+import org.apache.drill.exec.expr.holders.NullableFloat8Holder;
+import org.apache.drill.exec.expr.holders.NullableFloat4Holder;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.Float4Holder;
+import org.apache.drill.exec.expr.holders.DateHolder;
+import org.apache.drill.exec.expr.holders.TimeHolder;
+import org.apache.drill.exec.expr.holders.TimeStampHolder;
+import org.apache.drill.exec.expr.holders.NullableDateHolder;
+import org.apache.drill.exec.expr.holders.NullableTimeHolder;
+import org.apache.drill.exec.expr.holders.NullableTimeStampHolder;
+import org.apache.drill.exec.expr.holders.ObjectHolder;
+import org.apache.drill.exec.expr.holders.IntHolder;
+import org.apache.drill.exec.expr.holders.NullableVarBinaryHolder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.expr.holders.VarBinaryHolder;
+import org.apache.drill.exec.expr.holders.NullableVarCharHolder;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.RepeatedFloat8Vector;
+
+import javax.inject.Inject;
+
+@SuppressWarnings("unused")
+public class TDigestFunctions {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TDigestFunctions.class);
+
+  private TDigestFunctions(){}
+
+  @FunctionTemplate(name = "tdigest", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE)
+  public static class BigIntTDigestFunction implements DrillAggFunc {
+    @Param BigIntHolder in;
+    @Workspace ObjectHolder work;
+    @Output NullableVarBinaryHolder out;
+    @Inject DrillBuf buffer;
+    @Inject OptionManager options;
+    @Workspace IntHolder compression;
+
+    @Override
+    public void setup() {
+      work = new ObjectHolder();
+      compression.value = (int) options.getLong(org.apache.drill.exec.ExecConstants.TDIGEST_COMPRESSION);
+      work.obj = new com.clearspring.analytics.stream.quantile.TDigest(compression.value);
+    }
+
+    @Override
+    public void add() {
+      if (work.obj != null) {
+        com.clearspring.analytics.stream.quantile.TDigest tdigest = (com.clearspring.analytics.stream.quantile.TDigest) work.obj;
+        tdigest.add(in.value);
+      }
+    }
+
+    @Override
+    public void output() {
+      if (work.obj != null) {
+        com.clearspring.analytics.stream.quantile.TDigest tdigest = (com.clearspring.analytics.stream.quantile.TDigest) work.obj;
+        try {
+          int size = tdigest.smallByteSize();
+          java.nio.ByteBuffer byteBuf = java.nio.ByteBuffer.allocate(size);
+          tdigest.asSmallBytes(byteBuf);
+          out.buffer = 

[jira] [Commented] (DRILL-7117) Support creation of histograms for numeric data types (except Decimal) and date/time/timestamp

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804413#comment-16804413
 ] 

ASF GitHub Bot commented on DRILL-7117:
---

amansinha100 commented on pull request #1715: DRILL-7117: Support creation of 
equi-depth histogram for selected dat…
URL: https://github.com/apache/drill/pull/1715#discussion_r270232703
 
 


[jira] [Commented] (DRILL-7121) TPCH 4 takes longer when Statistics is disabled.

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804415#comment-16804415
 ] 

ASF GitHub Bot commented on DRILL-7121:
---

gparai commented on pull request #1718: DRILL-7121: Use correct ndv when 
statistics is disabled
URL: https://github.com/apache/drill/pull/1718
 
 
   
 



> TPCH 4 takes longer when Statistics is disabled.
> 
>
> Key: DRILL-7121
> URL: https://issues.apache.org/jira/browse/DRILL-7121
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Here is TPCH 4 with sf 100:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> The plan has changed when Statistics is disabled.   A Hash Agg and a 
> Broadcast Exchange have been added.  These two operators expand the number of 
> rows from the lineitem table from 137M to 9B rows.   This forces the hash 
> join to use 6GB of memory instead of 30 MB.





[jira] [Commented] (DRILL-7117) Support creation of histograms for numeric data types (except Decimal) and date/time/timestamp

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804400#comment-16804400
 ] 

ASF GitHub Bot commented on DRILL-7117:
---

amansinha100 commented on pull request #1715: DRILL-7117: Support creation of 
equi-depth histogram for selected dat…
URL: https://github.com/apache/drill/pull/1715#discussion_r270230184
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/TDigestFunctions.java
 ##
 @@ -0,0 +1,1126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.expr.fn.impl;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillAggFunc;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.expr.holders.NullableBigIntHolder;
+import org.apache.drill.exec.expr.holders.BitHolder;
+import org.apache.drill.exec.expr.holders.NullableBitHolder;
+import org.apache.drill.exec.expr.holders.NullableIntHolder;
+import org.apache.drill.exec.expr.holders.NullableFloat8Holder;
+import org.apache.drill.exec.expr.holders.NullableFloat4Holder;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.Float4Holder;
+import org.apache.drill.exec.expr.holders.DateHolder;
+import org.apache.drill.exec.expr.holders.TimeHolder;
+import org.apache.drill.exec.expr.holders.TimeStampHolder;
+import org.apache.drill.exec.expr.holders.NullableDateHolder;
+import org.apache.drill.exec.expr.holders.NullableTimeHolder;
+import org.apache.drill.exec.expr.holders.NullableTimeStampHolder;
+import org.apache.drill.exec.expr.holders.ObjectHolder;
+import org.apache.drill.exec.expr.holders.IntHolder;
+import org.apache.drill.exec.expr.holders.NullableVarBinaryHolder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.expr.holders.VarBinaryHolder;
+import org.apache.drill.exec.expr.holders.NullableVarCharHolder;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.RepeatedFloat8Vector;
+
+import javax.inject.Inject;
+
+@SuppressWarnings("unused")
+public class TDigestFunctions {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TDigestFunctions.class);
+
+  private TDigestFunctions(){}
+
+  @FunctionTemplate(name = "tdigest", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE)
+  public static class BigIntTDigestFunction implements DrillAggFunc {
+    @Param BigIntHolder in;
+    @Workspace ObjectHolder work;
+    @Output NullableVarBinaryHolder out;
+    @Inject DrillBuf buffer;
 
 Review comment:
   The `buffer` is being used in the output() method below. 
 



> Support creation of histograms for numeric data types (except Decimal) and 
> date/time/timestamp
> --
>
> Key: DRILL-7117
> URL: https://issues.apache.org/jira/browse/DRILL-7117
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> This JIRA is specific to creating histograms for numeric data types: INT, 
> BIGINT, FLOAT4, 

[jira] [Commented] (DRILL-7138) Implement command to describe schema for table

2019-03-28 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804197#comment-16804197
 ] 

Khurram Faraaz commented on DRILL-7138:
---

[~arina] Will this also be supported for views?

> Implement command to describe schema for table
> --
>
> Key: DRILL-7138
> URL: https://issues.apache.org/jira/browse/DRILL-7138
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> As part of the schema provisioning project, it would be handy for the user to 
> see whether a table has a schema and what it contains. 
> Command syntax:
> *describe schema for table dfs.tmp.`text_table`*
> By default, the schema is output in JSON format (the format in which the 
> schema is stored in the .drill.schema file):
> {noformat}
> {
>   "table" : "dfs.tmp.`text_table`",
>   "schema" : {
>     "columns" : [
>       {
>         "name" : "id",
>         "type" : "INT",
>         "mode" : "OPTIONAL"
>       }
>     ]
>   },
>   "version" : 1
> }
> {noformat}
> The JSON format can also be requested explicitly:
> *describe schema for table dfs.tmp.`text_table` as json*
> Other formats:
> STATEMENT - the schema is output in the form of a CREATE SCHEMA statement.
> *describe schema for table dfs.tmp.`text_table` as statement*
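
The three command variants quoted above differ only in the optional trailing format clause. A minimal sketch of a client-side helper that assembles them is given below; the class, enum, and method names are hypothetical and not part of Drill, and the helper only builds the command text. How the text is submitted (for example through a JDBC statement) is left out.

```java
import java.util.Locale;

// Hypothetical helper, not part of Drill: assembles the DESCRIBE SCHEMA
// command text in the forms quoted in the issue description.
final class DescribeSchemaCommand {

  // Output formats named in the issue description.
  enum Format { JSON, STATEMENT }

  // Default form: no format clause, so JSON output is expected.
  static String forTable(String table) {
    return "describe schema for table " + table;
  }

  // Explicit form: appends "as json" or "as statement".
  static String forTable(String table, Format format) {
    return forTable(table) + " as " + format.name().toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    String table = "dfs.tmp.`text_table`";
    System.out.println(forTable(table));
    System.out.println(forTable(table, Format.JSON));
    System.out.println(forTable(table, Format.STATEMENT));
  }
}
```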





[jira] [Commented] (DRILL-7121) TPCH 4 takes longer when Statistics is disabled.

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804195#comment-16804195
 ] 

ASF GitHub Bot commented on DRILL-7121:
---

gparai commented on pull request #1718: DRILL-7121: Use correct ndv when 
statistics is disabled
URL: https://github.com/apache/drill/pull/1718#discussion_r270152862
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
 ##
 @@ -75,8 +75,25 @@ public Double getDistinctRowCount(Join rel, 
RelMetadataQuery mq,
 
   @Override
   public Double getDistinctRowCount(RelNode rel, RelMetadataQuery mq, ImmutableBitSet groupKey, RexNode predicate) {
-    if (rel instanceof TableScan && !DrillRelOptUtil.guessRows(rel)) {
-      return getDistinctRowCount((TableScan) rel, mq, groupKey, predicate);
+    if (rel instanceof DrillScanRelBase) {
+      DrillTable table = rel.getTable().unwrap(DrillTable.class);
+      if (table == null) {
+        if (rel.getTable().unwrap(DrillTranslatableTable.class) != null) {
+          table = rel.getTable().unwrap(DrillTranslatableTable.class).getDrillTable();
+        }
+      }
+      if (table != null && table.getStatsTable() != null && !DrillRelOptUtil.guessRows(rel)) {
+        return getDistinctRowCount(((DrillScanRelBase)rel), mq, table, groupKey, rel.getRowType(), predicate);
+      } else {
+        // If guessing, return NDV as 0.1 * rowCount
+        /* If there is no table or metadata (stats) table associated with scan, estimate the
+         * distinct row count. Consistent with the estimation of Aggregate row count in
+         * RelMdRowCount: distinctRowCount = rowCount * 10%.
+         */
+        if (rel instanceof DrillScanRel) {
 
 Review comment:
   Done
 



> TPCH 4 takes longer when Statistics is disabled.
> 
>
> Key: DRILL-7121
> URL: https://issues.apache.org/jira/browse/DRILL-7121
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Here is TPCH 4 with sf 100:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> The plan has changed when Statistics is disabled.   A Hash Agg and a 
> Broadcast Exchange have been added.  These two operators expand the number of 
> rows from the lineitem table from 137M to 9B rows.   This forces the hash 
> join to use 6GB of memory instead of 30 MB.
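
The fallback described in the diff comment of this message (use the statistics when a stats table exists, otherwise guess the number of distinct values as 10% of the row count) can be sketched in isolation as follows. This is a simplified illustration, not the planner code: the class and method names are hypothetical, and a null argument stands in for "no stats table is associated with the scan".

```java
// Hypothetical, simplified illustration of the NDV fallback; not Drill planner code.
final class NdvEstimateSketch {

  // Fraction of the row count used as the guess, per the diff comment
  // "distinctRowCount = rowCount * 10%".
  static final double GUESS_FRACTION = 0.1;

  // statsNdv == null models a scan with no statistics available.
  static double distinctRowCount(Double statsNdv, double rowCount) {
    if (statsNdv != null) {
      return statsNdv;                  // statistics available: trust them
    }
    return rowCount * GUESS_FRACTION;   // guessing: 10% of the row count
  }

  public static void main(String[] args) {
    System.out.println(distinctRowCount(null, 1_000_000.0));
    System.out.println(distinctRowCount(42.0, 1_000_000.0));
  }
}
```

Keeping the guess consistent with the aggregate row-count estimate matters because operators above the scan are costed from these numbers, and an inconsistent NDV can steer the planner toward a different join plan, as reported in this issue.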





[jira] [Commented] (DRILL-7121) TPCH 4 takes longer when Statistics is disabled.

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804193#comment-16804193
 ] 

ASF GitHub Bot commented on DRILL-7121:
---

gparai commented on pull request #1718: DRILL-7121: Use correct ndv when 
statistics is disabled
URL: https://github.com/apache/drill/pull/1718#discussion_r270152804
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
 ##
 @@ -75,8 +75,25 @@ public Double getDistinctRowCount(Join rel, 
RelMetadataQuery mq,
 
   @Override
   public Double getDistinctRowCount(RelNode rel, RelMetadataQuery mq, ImmutableBitSet groupKey, RexNode predicate) {
-    if (rel instanceof TableScan && !DrillRelOptUtil.guessRows(rel)) {
-      return getDistinctRowCount((TableScan) rel, mq, groupKey, predicate);
+    if (rel instanceof DrillScanRelBase) {
+      DrillTable table = rel.getTable().unwrap(DrillTable.class);
+      if (table == null) {
+        if (rel.getTable().unwrap(DrillTranslatableTable.class) != null) {
 
 Review comment:
   Done
 



> TPCH 4 takes longer when Statistics is disabled.
> 
>
> Key: DRILL-7121
> URL: https://issues.apache.org/jira/browse/DRILL-7121
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Here is TPCH 4 with sf 100:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> The plan has changed when Statistics is disabled.   A Hash Agg and a 
> Broadcast Exchange have been added.  These two operators expand the number of 
> rows from the lineitem table from 137M to 9B rows.   This forces the hash 
> join to use 6GB of memory instead of 30 MB.





[jira] [Commented] (DRILL-7121) TPCH 4 takes longer when Statistics is disabled.

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804194#comment-16804194
 ] 

ASF GitHub Bot commented on DRILL-7121:
---

gparai commented on pull request #1718: DRILL-7121: Use correct ndv when 
statistics is disabled
URL: https://github.com/apache/drill/pull/1718#discussion_r270152839
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
 ##
 @@ -75,8 +75,25 @@ public Double getDistinctRowCount(Join rel, RelMetadataQuery mq,
 
   @Override
   public Double getDistinctRowCount(RelNode rel, RelMetadataQuery mq, ImmutableBitSet groupKey, RexNode predicate) {
-    if (rel instanceof TableScan && !DrillRelOptUtil.guessRows(rel)) {
-      return getDistinctRowCount((TableScan) rel, mq, groupKey, predicate);
+    if (rel instanceof DrillScanRelBase) {
+      DrillTable table = rel.getTable().unwrap(DrillTable.class);
+      if (table == null) {
+        if (rel.getTable().unwrap(DrillTranslatableTable.class) != null) {
+          table = rel.getTable().unwrap(DrillTranslatableTable.class).getDrillTable();
+        }
+      }
+      if (table != null && table.getStatsTable() != null && !DrillRelOptUtil.guessRows(rel)) {
+        return getDistinctRowCount(((DrillScanRelBase)rel), mq, table, groupKey, rel.getRowType(), predicate);
+      } else {
+        // If guessing, return NDV as 0.1 * rowCount
 
 Review comment:
   Done
 



[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804156#comment-16804156
 ] 

ASF GitHub Bot commented on DRILL-7048:
---

vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC 
Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r270130908
 
 

 ##
 File path: 
exec/jdbc/src/test/java/org/apache/drill/jdbc/PreparedStatementTest.java
 ##
 @@ -410,6 +417,155 @@ public void testNonTriggeredQueryTimeout() throws SQLException {
     }
   }
 
+
+  // Query maxRows methods:
+
+  /**
+   * Test for reading of default max rows
+   */
+  @Test
+  public void testDefaultGetMaxRows() throws SQLException {
+    try (PreparedStatement pStmt = connection.prepareStatement(SYS_OPTIONS_SQL)) {
+      int maxRowsValue = pStmt.getMaxRows();
+      assertEquals(0, maxRowsValue);
+    }
+  }
+
+  /**
+   * Test Invalid parameter by giving negative maxRows value
+   */
+  @Test
+  public void testInvalidSetMaxRows() throws SQLException {
+    try (PreparedStatement pStmt = connection.prepareStatement(SYS_OPTIONS_SQL)) {
+      //Setting negative value
+      int valueToSet = -10;
+      try {
+        pStmt.setMaxRows(valueToSet);
+      } catch (final SQLException e) {
+        assertThat(e.getMessage(), containsString("illegal maxRows value: " + valueToSet));
 
 Review comment:
   It would be good also to check the value of `pStmt.getMaxRows()` after the 
failure.
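
A minimal sketch of that extra check, assuming a hypothetical in-memory stand-in (`FakeStatement`) in place of Drill's JDBC driver so that it runs without a Drillbit. The stand-in's validation and its error message are assumptions modeled on the test above, not Drill's actual implementation. The sketch also fails when no exception is thrown, a case the quoted hunk does not visibly guard against.

```java
import java.sql.SQLException;

final class MaxRowsAfterFailure {

  /** Hypothetical stand-in for a JDBC statement; not Drill's implementation. */
  static final class FakeStatement {
    private int maxRows = 0; // 0 means "no limit", as in java.sql.Statement

    int getMaxRows() {
      return maxRows;
    }

    void setMaxRows(int max) throws SQLException {
      if (max < 0) {
        // Reject the value before touching any state.
        throw new SQLException("illegal maxRows value: " + max);
      }
      maxRows = max;
    }
  }

  /** Returns the maxRows value observed after a rejected setMaxRows() call. */
  static int maxRowsAfterInvalidSet(FakeStatement stmt, int invalidValue) {
    try {
      stmt.setMaxRows(invalidValue);
      // Reaching this line means the invalid value was accepted.
      throw new AssertionError("expected SQLException for " + invalidValue);
    } catch (SQLException e) {
      // Expected path: report what the statement holds after the failure.
      return stmt.getMaxRows();
    }
  }

  public static void main(String[] args) {
    FakeStatement stmt = new FakeStatement();
    System.out.println(maxRowsAfterInvalidSet(stmt, -10));
  }
}
```

Running `main` prints `0`, since the rejected call leaves the default limit untouched.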
 



> Implement JDBC Statement.setMaxRows() with System Option
> 
>
> Key: DRILL-7048
> URL: https://issues.apache.org/jira/browse/DRILL-7048
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC, Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> With DRILL-6960, the webUI will get an auto-limit on the number of results 
> fetched.
> Since more of the plumbing is already there, it makes sense to provide the 
> same for the JDBC client.
> In addition, it would be nice if the Server can have a pre-defined value as 
> well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a 
> max limit on the resultset size as well.





[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804146#comment-16804146
 ] 

ASF GitHub Bot commented on DRILL-7048:
---

vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC 
Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r270129243
 
 

 ##
 File path: 
exec/jdbc/src/test/java/org/apache/drill/jdbc/PreparedStatementTest.java
 ##
 @@ -462,4 +618,25 @@ public void testParamSettingWhenUnsupportedTypeSaysUnsupported() throws SQLExcep
     }
   }
 
+
+  // Sets the SystemMaxRows option
+  private void setSystemMaxRows(int sysValueToSet) throws SQLException {
 
 Review comment:
   `PreparedStatementTest` contains a `testSetQueryTimeoutAsZero()` test, which 
verifies that the result has 3 rows. That test runs without taking the 
`Semaphore` lock. So if one of your tests changes the value of autoLimit and 
this test executes before the value is reset, it will return fewer rows than 
it should.
   
   So adding these tests may introduce random failures in other tests.
 



[jira] [Commented] (DRILL-7138) Implement command to describe schema for table

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804126#comment-16804126
 ] 

ASF GitHub Bot commented on DRILL-7138:
---

arina-ielchiieva commented on issue #1719: DRILL-7138: Implement command to 
describe schema for table
URL: https://github.com/apache/drill/pull/1719#issuecomment-477692710
 
 
   @vvysotskyi please review.
 



> Implement command to describe schema for table
> --
>
> Key: DRILL-7138
> URL: https://issues.apache.org/jira/browse/DRILL-7138
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> As part of schema provisioning project, it will be handy for the user to see 
> if table has schema and its content. 
> Command syntax:
> *describe schema for table dfs.tmp.`text_table`*
> By default schema will be output in JSON format (format schema is stored in 
> .drill.schema file):
> {noformat}
> {
>   "table" : "dfs.tmp.`text_table`",
>   "schema" : {
>     "columns" : [
>       {
>         "name" : "id",
>         "type" : "INT",
>         "mode" : "OPTIONAL"
>       }
>     ]
>   },
>   "version" : 1
> }
> {noformat}
> JSON format can be indicated explicitly:
> *describe schema for table dfs.tmp.`text_table` as json*
> Other formats:
> STATEMENT - schema will be output in the form of CREATE SCHEMA statement.
> *describe schema for table dfs.tmp.`text_table` as statement*





[jira] [Commented] (DRILL-7138) Implement command to describe schema for table

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804125#comment-16804125
 ] 

ASF GitHub Bot commented on DRILL-7138:
---

arina-ielchiieva commented on pull request #1719: DRILL-7138: Implement command 
to describe schema for table
URL: https://github.com/apache/drill/pull/1719
 
 
   Details in [DRILL-7138](https://issues.apache.org/jira/browse/DRILL-7138).
 



[jira] [Updated] (DRILL-7051) Upgrade to Jetty 9.3

2019-03-28 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7051:
-
Labels: ready-to-commit  (was: )

> Upgrade to Jetty 9.3 
> -
>
> Key: DRILL-7051
> URL: https://issues.apache.org/jira/browse/DRILL-7051
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.15.0
>Reporter: Veera Naranammalpuram
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Is Drill using a version of the Jetty web server that's really old? The jars 
> suggest it's using Jetty 9.1, which was built sometime in 2014. 
> {noformat}
> -rw-r--r-- 1 veeranaranammalpuram staff 15988 Nov 20 2017 jetty-continuation-9.1.1.v20140108.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 103288 Nov 20 2017 jetty-http-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 101519 Nov 20 2017 jetty-io-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 95906 Nov 20 2017 jetty-security-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 401593 Nov 20 2017 jetty-server-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 110992 Nov 20 2017 jetty-servlet-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 119215 Nov 20 2017 jetty-servlets-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 341683 Nov 20 2017 jetty-util-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 38707 Dec 21 15:42 jetty-util-ajax-9.3.19.v20170502.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 111466 Nov 20 2017 jetty-webapp-9.1.1.v20140108.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 41763 Nov 20 2017 jetty-xml-9.1.1.v20140108.jar
> {noformat}
> This version is shown as deprecated: 
> [https://www.eclipse.org/jetty/documentation/current/what-jetty-version.html#d0e203]
> Opening this to upgrade Jetty to the latest stable supported version. 





[jira] [Updated] (DRILL-7138) Implement command to describe schema for table

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7138:

Reviewer: Volodymyr Vysotskyi



[jira] [Updated] (DRILL-7138) Implement command to describe schema for table

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7138:

Labels: doc-impacting  (was: )



[jira] [Updated] (DRILL-7138) Implement command to describe schema for table

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7138:

Description: 
As part of schema provisioning project, it will be handy for the user to see if 
table has schema and its content. 
Command syntax:

*describe schema for table dfs.tmp.`text_table`*

By default schema will be output in JSON format (format schema is stored in 
.drill.schema file):

{noformat}
{
  "table" : "dfs.tmp.`text_table`",
  "schema" : {
    "columns" : [
      {
        "name" : "id",
        "type" : "INT",
        "mode" : "OPTIONAL"
      }
    ]
  },
  "version" : 1
}
{noformat}

JSON format can be indicated explicitly:

*describe schema for table dfs.tmp.`text_table` as json*

Other formats:
STATEMENT - schema will be output in the form of CREATE SCHEMA statement.
*describe schema for table dfs.tmp.`text_table` as statement*


  was:
As part of schema provisioning project, it will be handy for the user to see if 
table has schema and its content. 
Command syntax:

*describe schema for table dfs.tmp.`text_table`*

By default schema will be output in JSON format (format schema is stored in 
.drill.schema file):

{noformat}
{
  "table" : "dfs.tmp.`text_table`",
  "schema" : {
    "columns" : [
      {
        "name" : "id",
        "type" : "INT",
        "mode" : "OPTIONAL"
      }
    ]
  },
  "version" : 1
}
{noformat}

JSON format can be indicated explicitly:

*describe schema for table dfs.tmp.`text_table` as json*

Other formats:
STATEMENT - schema will be output in the form of CREATE SCHEMA statement.



> Implement command to describe schema for table
> --
>
> Key: DRILL-7138
> URL: https://issues.apache.org/jira/browse/DRILL-7138
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> As part of schema provisioning project, it will be handy for the user to see 
> if table has schema and its content. 
> Command syntax:
> *describe schema for table dfs.tmp.`text_table`*
> By default schema will be output in JSON format (format schema is stored in 
> .drill.schema file):
> {noformat}
> {
>   "table" : "dfs.tmp.`text_table`",
>   "schema" : {
>     "columns" : [
>       {
>         "name" : "id",
>         "type" : "INT",
>         "mode" : "OPTIONAL"
>       }
>     ]
>   },
>   "version" : 1
> }
> {noformat}
> JSON format can be indicated explicitly:
> *describe schema for table dfs.tmp.`text_table` as json*
> Other formats:
> STATEMENT - schema will be output in the form of CREATE SCHEMA statement.
> *describe schema for table dfs.tmp.`text_table` as statement*





[jira] [Updated] (DRILL-7138) Implement command to describe schema for table

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7138:

Summary: Implement command to describe schema for table  (was: Implement 
command to show schema for table)

> Implement command to describe schema for table
> --
>
> Key: DRILL-7138
> URL: https://issues.apache.org/jira/browse/DRILL-7138
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> As part of schema provisioning project, it will be handy for the user to see 
> if table has schema and its content. 
> Command syntax:
> *describe schema for table dfs.tmp.`text_table`*
> By default schema will be output in JSON format (format schema is stored in 
> .drill.schema file):
> {noformat}
> {
>   "table" : "dfs.tmp.`text_table`",
>   "schema" : {
>     "columns" : [
>       {
>         "name" : "id",
>         "type" : "INT",
>         "mode" : "OPTIONAL"
>       }
>     ]
>   },
>   "version" : 1
> }
> {noformat}
> JSON format can be indicated explicitly:
> *describe schema for table dfs.tmp.`text_table` as json*
> Other formats:
> STATEMENT - schema will be output in the form of CREATE SCHEMA statement.





[jira] [Commented] (DRILL-7121) TPCH 4 takes longer when Statistics is disabled.

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804028#comment-16804028
 ] 

ASF GitHub Bot commented on DRILL-7121:
---

amansinha100 commented on pull request #1718: DRILL-7121: Use correct ndv when 
statistics is disabled
URL: https://github.com/apache/drill/pull/1718#discussion_r270055924
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
 ##
 @@ -75,8 +75,25 @@ public Double getDistinctRowCount(Join rel, RelMetadataQuery mq,
 
   @Override
   public Double getDistinctRowCount(RelNode rel, RelMetadataQuery mq, ImmutableBitSet groupKey, RexNode predicate) {
-    if (rel instanceof TableScan && !DrillRelOptUtil.guessRows(rel)) {
-      return getDistinctRowCount((TableScan) rel, mq, groupKey, predicate);
+    if (rel instanceof DrillScanRelBase) {
+      DrillTable table = rel.getTable().unwrap(DrillTable.class);
+      if (table == null) {
+        if (rel.getTable().unwrap(DrillTranslatableTable.class) != null) {
 
 Review comment:
   This code block seems to be used in multiple places. Why not make it a 
utility method?
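
One way the suggested utility method could look, as a sketch only: `RelOptTableLike`, `DrillTableLike`, `TranslatableTableLike` and `findDrillTable` are hypothetical stand-ins introduced here so that the example compiles on its own. They are not Drill or Calcite classes, and the PR may factor this differently.

```java
final class TableLookupSketch {

  /** Hypothetical stand-in for the planner table handle seen in the diff. */
  interface RelOptTableLike {
    /** Returns the wrapped object as the given type, or null if unavailable. */
    <T> T unwrap(Class<T> type);
  }

  /** Hypothetical stand-in for DrillTable. */
  static final class DrillTableLike {
    final String name;

    DrillTableLike(String name) {
      this.name = name;
    }
  }

  /** Hypothetical stand-in for DrillTranslatableTable. */
  static final class TranslatableTableLike {
    private final DrillTableLike drillTable;

    TranslatableTableLike(DrillTableLike drillTable) {
      this.drillTable = drillTable;
    }

    DrillTableLike getDrillTable() {
      return drillTable;
    }
  }

  /** Single home for the two-step unwrap that the diff repeats inline. */
  static DrillTableLike findDrillTable(RelOptTableLike table) {
    if (table == null) {
      return null;
    }
    DrillTableLike direct = table.unwrap(DrillTableLike.class);
    if (direct != null) {
      return direct;
    }
    // Fall back to the translatable wrapper, as the diff does.
    TranslatableTableLike translatable = table.unwrap(TranslatableTableLike.class);
    return translatable == null ? null : translatable.getDrillTable();
  }

  /** Demo helper: a table handle that unwraps to exactly one wrapped object. */
  static RelOptTableLike wrapping(final Object wrapped) {
    return new RelOptTableLike() {
      @Override
      public <T> T unwrap(Class<T> type) {
        return type.isInstance(wrapped) ? type.cast(wrapped) : null;
      }
    };
  }

  public static void main(String[] args) {
    DrillTableLike orders = new DrillTableLike("orders");
    System.out.println(findDrillTable(wrapping(orders)).name);
    System.out.println(findDrillTable(wrapping(new TranslatableTableLike(orders))).name);
    System.out.println(findDrillTable(wrapping("not a table")));
  }
}
```

Callers would then only need a null check on the returned table instead of repeating the nested `unwrap` calls.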
 



[jira] [Commented] (DRILL-7121) TPCH 4 takes longer when Statistics is disabled.

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804027#comment-16804027
 ] 

ASF GitHub Bot commented on DRILL-7121:
---

amansinha100 commented on pull request #1718: DRILL-7121: Use correct ndv when 
statistics is disabled
URL: https://github.com/apache/drill/pull/1718#discussion_r270047377
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
 ##
 @@ -75,8 +75,25 @@ public Double getDistinctRowCount(Join rel, RelMetadataQuery mq,
 
   @Override
   public Double getDistinctRowCount(RelNode rel, RelMetadataQuery mq, ImmutableBitSet groupKey, RexNode predicate) {
-    if (rel instanceof TableScan && !DrillRelOptUtil.guessRows(rel)) {
-      return getDistinctRowCount((TableScan) rel, mq, groupKey, predicate);
+    if (rel instanceof DrillScanRelBase) {
+      DrillTable table = rel.getTable().unwrap(DrillTable.class);
+      if (table == null) {
+        if (rel.getTable().unwrap(DrillTranslatableTable.class) != null) {
+          table = rel.getTable().unwrap(DrillTranslatableTable.class).getDrillTable();
+        }
+      }
+      if (table != null && table.getStatsTable() != null && !DrillRelOptUtil.guessRows(rel)) {
+        return getDistinctRowCount(((DrillScanRelBase)rel), mq, table, groupKey, rel.getRowType(), predicate);
+      } else {
+        // If guessing, return NDV as 0.1 * rowCount
 
 Review comment:
   This line seems redundant since the same comment is below. 
 



[jira] [Commented] (DRILL-7121) TPCH 4 takes longer when Statistics is disabled.

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804029#comment-16804029
 ] 

ASF GitHub Bot commented on DRILL-7121:
---

amansinha100 commented on pull request #1718: DRILL-7121: Use correct ndv when 
statistics is disabled
URL: https://github.com/apache/drill/pull/1718#discussion_r270054867
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
 ##
 @@ -75,8 +75,25 @@ public Double getDistinctRowCount(Join rel, RelMetadataQuery mq,
 
   @Override
   public Double getDistinctRowCount(RelNode rel, RelMetadataQuery mq, ImmutableBitSet groupKey, RexNode predicate) {
-    if (rel instanceof TableScan && !DrillRelOptUtil.guessRows(rel)) {
-      return getDistinctRowCount((TableScan) rel, mq, groupKey, predicate);
+    if (rel instanceof DrillScanRelBase) {
+      DrillTable table = rel.getTable().unwrap(DrillTable.class);
+      if (table == null) {
+        if (rel.getTable().unwrap(DrillTranslatableTable.class) != null) {
+          table = rel.getTable().unwrap(DrillTranslatableTable.class).getDrillTable();
+        }
+      }
+      if (table != null && table.getStatsTable() != null && !DrillRelOptUtil.guessRows(rel)) {
+        return getDistinctRowCount(((DrillScanRelBase)rel), mq, table, groupKey, rel.getRowType(), predicate);
+      } else {
+        // If guessing, return NDV as 0.1 * rowCount
+        /* If there is no table or metadata (stats) table associated with scan, estimate the
+         * distinct row count. Consistent with the estimation of Aggregate row count in
+         * RelMdRowCount: distinctRowCount = rowCount * 10%.
+         */
+        if (rel instanceof DrillScanRel) {
 
 Review comment:
   It would be good to add some comment here why the earlier check is for 
DrillScanRelBase and this one is for DrillScanRel. 
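
The fallback described in the diff's comment amounts to simple arithmetic. A minimal sketch, assuming a hypothetical helper `distinctRowCount` that receives the statistics-based NDV when one exists (`null` otherwise); the class, method and parameter names are illustrative and not taken from the PR:

```java
final class NdvGuessSketch {

  /** Fraction from the diff's comment: distinctRowCount = rowCount * 10%. */
  static final double GUESS_FRACTION = 0.1;

  /**
   * Returns the statistics-based NDV when available, otherwise the guess.
   * statsNdv is null when no statistics exist (hypothetical convention).
   */
  static double distinctRowCount(Double statsNdv, double rowCount) {
    if (statsNdv != null) {
      return statsNdv;
    }
    // No statistics: fall back to 10% of the estimated row count.
    return rowCount * GUESS_FRACTION;
  }

  public static void main(String[] args) {
    System.out.println(distinctRowCount(null, 1000.0));
    System.out.println(distinctRowCount(42.0, 1000.0));
  }
}
```

For a scan estimated at 1,000 rows with no statistics, the guess is roughly 100 distinct rows; when a statistics-based value is supplied, it is returned unchanged.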
 



[jira] [Updated] (DRILL-7138) Implement command to show schema for table

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7138:

Description: 
As part of schema provisioning project, it will be handy for the user to see if 
table has schema and its content. 
Command syntax:

*describe schema for table dfs.tmp.`text_table`*

By default schema will be output in JSON format (format schema is stored in 
.drill.schema file):

{noformat}
{
  "table" : "dfs.tmp.`text_table`",
  "schema" : {
    "columns" : [
      {
        "name" : "id",
        "type" : "INT",
        "mode" : "OPTIONAL"
      }
    ]
  },
  "version" : 1
}
{noformat}

JSON format can be indicated explicitly:

*describe schema for table dfs.tmp.`text_table` as json*

Other formats:
STATEMENT - schema will be output in the form of CREATE SCHEMA statement.


  was:
As part of schema provisioning project, it will be handy for the user to see if 
table has schema and its content. 
Command syntax:

*describe schema for table dfs.tmp.`text_table`*

By default schema will be output in JSON format (format schema is stored in 
.drill.schema file):

{noformat}
{
  "table" : "dfs.tmp.`text_table`",
  "schema" : {
    "columns" : [
      {
        "name" : "id",
        "type" : "INT",
        "mode" : "OPTIONAL"
      }
    ]
  },
  "version" : 1
}
{noformat}

JSON format can be indicated explicitly:

*describe schema for table dfs.tmp.`text_table` JSON*

Other formats:
STATEMENT - schema will be output in the form of CREATE SCHEMA statement.



> Implement command to show schema for table
> --
>
> Key: DRILL-7138
> URL: https://issues.apache.org/jira/browse/DRILL-7138
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> As part of schema provisioning project, it will be handy for the user to see 
> if table has schema and its content. 
> Command syntax:
> *describe schema for table dfs.tmp.`text_table`*
> By default schema will be output in JSON format (format schema is stored in 
> .drill.schema file):
> {noformat}
> {
>   "table" : "dfs.tmp.`text_table`",
>   "schema" : {
>     "columns" : [
>       {
>         "name" : "id",
>         "type" : "INT",
>         "mode" : "OPTIONAL"
>       }
>     ]
>   },
>   "version" : 1
> }
> {noformat}
> JSON format can be indicated explicitly:
> *describe schema for table dfs.tmp.`text_table` as json*
> Other formats:
> STATEMENT - schema will be output in the form of CREATE SCHEMA statement.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7138) Implement command to show schema for table

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7138:

Description: 
As part of schema provisioning project, it will be handy for the user to see if 
table has schema and its content. 
Command syntax:

*describe schema for table dfs.tmp.`text_table`*

By default schema will be output in JSON format (format schema is stored in 
.drill.schema file):

{noformat}
{
  "table" : "dfs.tmp.`text_table`",
  "schema" : {
    "columns" : [
      {
        "name" : "id",
        "type" : "INT",
        "mode" : "OPTIONAL"
      }
    ]
  },
  "version" : 1
}
{noformat}

JSON format can be indicated explicitly:

*describe schema for table dfs.tmp.`text_table` JSON*

Other formats:
STATEMENT - schema will be output in the form of CREATE SCHEMA statement.


  was:
As part of schema provisioning project, it will be handy for the user to see if 
table has schema and its content. 
Command syntax:

*describe schema for table dfs.tmp.`text_table`*

By default schema will be output in JSON format (format schema is stored in 
.drill.schema file):

{noformat}
{
  "table" : "dfs.tmp.`text_table`",
  "schema" : {
    "columns" : [
      {
        "name" : "id",
        "type" : "INT",
        "mode" : "OPTIONAL"
      }
    ]
  },
  "version" : 1
}
{noformat}

JSON format can be indicated explicitly:

*describe schema for table dfs.tmp.`text_table` as JSON*

Other formats:
STATEMENT - schema will be output in the form of CREATE SCHEMA statement.



> Implement command to show schema for table
> --
>
> Key: DRILL-7138
> URL: https://issues.apache.org/jira/browse/DRILL-7138
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> As part of schema provisioning project, it will be handy for the user to see 
> if table has schema and its content. 
> Command syntax:
> *describe schema for table dfs.tmp.`text_table`*
> By default schema will be output in JSON format (format schema is stored in 
> .drill.schema file):
> {noformat}
> {
>   "table" : "dfs.tmp.`text_table`",
>   "schema" : {
>     "columns" : [
>       {
>         "name" : "id",
>         "type" : "INT",
>         "mode" : "OPTIONAL"
>       }
>     ]
>   },
>   "version" : 1
> }
> {noformat}
> JSON format can be indicated explicitly:
> *describe schema for table dfs.tmp.`text_table` JSON*
> Other formats:
> STATEMENT - schema will be output in the form of CREATE SCHEMA statement.





[jira] [Updated] (DRILL-6835) Schema Provision using File / Table Function

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6835:

Description: 
Schema Provision using File / Table Function design document:

https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit?usp=sharing

Phase 1 design document - 
https://docs.google.com/document/d/1ExVgx2FDqxAz5GTqyWt-_1-UqwRSTGLGEYuc8gsESG8/edit?usp=sharing

  was:
Schema Provision using File / Table Function design document:

https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit?usp=sharing




> Schema Provision using File / Table Function
> 
>
> Key: DRILL-6835
> URL: https://issues.apache.org/jira/browse/DRILL-6835
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Schema Provision using File / Table Function design document:
> https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit?usp=sharing
> Phase 1 design document - 
> https://docs.google.com/document/d/1ExVgx2FDqxAz5GTqyWt-_1-UqwRSTGLGEYuc8gsESG8/edit?usp=sharing





[jira] [Commented] (DRILL-7011) Allow hybrid model in the Row set-based scan framework

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803917#comment-16803917
 ] 

ASF GitHub Bot commented on DRILL-7011:
---

arina-ielchiieva commented on issue #1711: DRILL-7011: Support schema in scan 
framework
URL: https://github.com/apache/drill/pull/1711#issuecomment-477592578
 
 
   It's an impressive PR and it's good that we did code review interactively. 
Overall, looks good. Please squash the commits and it will be good to go.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allow hybrid model in the Row set-based scan framework
> --
>
> Key: DRILL-7011
> URL: https://issues.apache.org/jira/browse/DRILL-7011
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> As part of schema provisioning project we want to allow hybrid model for Row 
> set-based scan framework, namely to allow to pass custom schema metadata 
> which can be partial.
> Currently schema provisioning has SchemaContainer class that contains the 
> following information (can be obtained from metastore, schema file, table 
> function):
> 1. schema represented by org.apache.drill.exec.record.metadata.TupleMetadata
> 2. properties represented by Map, can contain information if 
> schema is strict or partial (default is partial) etc.
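
The partial-versus-strict distinction described above can be pictured with a small sketch. This is only an illustration, not the scan framework's code: the function name `resolve_columns` and the dictionary representation are hypothetical, and the assumption that a strict schema drops columns it does not list is not stated in the issue, which only says the properties can mark a schema as strict or partial.

```python
def resolve_columns(provided, discovered, strict=False):
    """Combine a provided (possibly partial) schema with discovered columns.

    Both arguments map column name -> type name. Hypothetical helper,
    not part of Drill.
    """
    # Provided columns come first and their declared types win.
    resolved = dict(provided)
    if not strict:
        # Partial schema (the default): columns the reader discovers
        # on its own are kept as well.
        for name, col_type in discovered.items():
            resolved.setdefault(name, col_type)
    # Assumption: with strict=True, undeclared columns are left out.
    return resolved


provided = {"id": "INT"}
discovered = {"id": "VARCHAR", "name": "VARCHAR"}
print(resolve_columns(provided, discovered))
print(resolve_columns(provided, discovered, strict=True))
```

In the partial case the declared type of `id` overrides the discovered one while `name` is still returned; in the assumed strict case only `id` survives.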





[jira] [Assigned] (DRILL-3090) sqlline : save SQL to script file and replay from script, results in error

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-3090:
---

Assignee: Arina Ielchiieva

> sqlline : save SQL to script file and replay from script, results in error
> --
>
> Key: DRILL-3090
> URL: https://issues.apache.org/jira/browse/DRILL-3090
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 1.0.0
> Environment: ffbb9c7adc6360744bee186e1f69d47dc743f73e
>Reporter: Khurram Faraaz
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.16.0
>
>
> Saving a SQL query to a script file and replaying it from that file with 
> !run at the sqlline prompt throws an error. We should not see the error when 
> we replay the SQL from the script file.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> !script file3
> Saving command script to "/opt/mapr/drill/drill-1.0.0/bin/file3". Enter 
> "script" with no arguments to stop it.
> 0: jdbc:drill:schema=dfs.tmp> select * from sys.drillbits;
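> +------------------+------------+--------------+------------+------------+
> |     hostname     | user_port  | control_port | data_port  |  current   |
> +------------------+------------+--------------+------------+------------+
> | centos-04.qa.lab | 31010      | 31011        | 31012      | false      |
> | centos-02.qa.lab | 31010      | 31011        | 31012      | false      |
> | centos-01.qa.lab | 31010      | 31011        | 31012      | false      |
> | centos-03.qa.lab | 31010      | 31011        | 31012      | true       |
> +------------------+------------+--------------+------------+------------+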
> 4 rows selected (0.176 seconds)
> 0: jdbc:drill:schema=dfs.tmp> !script
> Script closed. Enter "run /opt/mapr/drill/drill-1.0.0/bin/file3" to replay it.
> 0: jdbc:drill:schema=dfs.tmp> !run /opt/mapr/drill/drill-1.0.0/bin/file3
> 1/2  select * from sys.drillbits;
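> +------------+------------+--------------+------------+------------+
> |  hostname  | user_port  | control_port | data_port  |  current   |
> +------------+------------+--------------+------------+------------+
> | centos-04  | 31010      | 31011        | 31012      | false      |
> | centos-02  | 31010      | 31011        | 31012      | false      |
> | centos-01  | 31010      | 31011        | 31012      | false      |
> | centos-03  | 31010      | 31011        | 31012      | true       |
> +------------+------------+--------------+------------+------------+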
> 4 rows selected (0.178 seconds)
> 2/2  !script
> Usage: script 
> Aborting command set because "force" is false and command failed: "!script"
> {code}
> I looked at the contents of file3 under /opt/mapr/drill/drill-1.0.0/bin
> There seems to be an additional/extra "!script" in the file.
> {code}
> [root@centos-01 bin]# cat file3
> select * from sys.drillbits;
> !script
> [root@centos-01 bin]# 
> {code}
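
The behaviour the reporter expects can be sketched with a toy recorder. This is a hypothetical model written for illustration, not sqlline's actual code: the class name `ScriptRecorder` and its `handle` method are invented here. It shows a recorder that treats the bare `!script` command as a control command and does not write it into the saved script.

```python
class ScriptRecorder:
    """Toy model of command recording; not sqlline's implementation."""

    def __init__(self):
        self.recording = False
        self.lines = []

    def handle(self, command):
        parts = command.split()
        if parts and parts[0] == "!script":
            # "!script <file>" starts recording; a bare "!script" stops it.
            # Neither control command is written to the recorded script.
            self.recording = len(parts) > 1
            return
        if self.recording:
            self.lines.append(command)


recorder = ScriptRecorder()
for cmd in ["!script file3", "select * from sys.drillbits;", "!script"]:
    recorder.handle(cmd)
print(recorder.lines)
```

With this behaviour the recorded file would hold only the SQL statement, so replaying it would not reach a stray `!script` line.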





[jira] [Assigned] (DRILL-1506) Current Schema Not Shown In the sqlline Prompt

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-1506:
---

Assignee: Arina Ielchiieva

> Current Schema Not Shown In the sqlline Prompt
> --
>
> Key: DRILL-1506
> URL: https://issues.apache.org/jira/browse/DRILL-1506
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - CLI
>Affects Versions: 0.5.0
>Reporter: MUFEED USMAN
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.16.0
>
>
> The prompt isn't what I'd call user-friendly (doesn't display the schema I'm 
> connected to).
> [root@n69 bin]# ./sqlline -u 
> "jdbc:drill:zk=n69:5181,n72:5181,n73:5181;schema=sys" -n admin -p admin
> sqlline version 1.1.6
> 0: jdbc:drill:zk=n69:5181,n72:5181,n73:5181> show tables;
> +--------------+------------+
> | TABLE_SCHEMA | TABLE_NAME |
> +--------------+------------+
> | sys          | drillbits  |
> | sys          | options    |
> +--------------+------------+
> 2 rows selected (0.263 seconds)
> 0: jdbc:drill:zk=n69:5181,n72:5181,n73:5181> select * from drillbits;
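> +------------+------------+--------------+------------+
> |    host    | user_port  | control_port | data_port  |
> +------------+------------+--------------+------------+
> | n69        | 31010      | 31011        | 31012      |
> | n72        | 31010      | 31011        | 31012      |
> +------------+------------+--------------+------------+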
> 2 rows selected (0.077 seconds)
> 0: jdbc:drill:zk=n69:5181,n72:5181,n73:5181> 





[jira] [Assigned] (DRILL-1669) Custom sqlline Prompt

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-1669:
---

Assignee: Arina Ielchiieva

> Custom sqlline Prompt
> -
>
> Key: DRILL-1669
> URL: https://issues.apache.org/jira/browse/DRILL-1669
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - CLI
>Affects Versions: 0.7.0
>Reporter: MUFEED USMAN
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.16.0
>
>
> It'd be nice to have a way to set/define a custom sqlline prompt in Drill to 
> display, say, the current workspace it is connected to.
> For example:
> In Hive one could do,
> set hive.cli.print.current.db=true
> And in MySQL in /etc/my.cnf,
> [mysql]
> prompt=\u@\h:[\d]>\_





[jira] [Assigned] (DRILL-808) Sqlline use schema does not change the displayed schema

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-808:
--

Assignee: Arina Ielchiieva

> Sqlline use schema does not change the displayed schema
> ---
>
> Key: DRILL-808
> URL: https://issues.apache.org/jira/browse/DRILL-808
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.16.0
>
>
> 0: jdbc:drill:schema=dfs.drillTestDirExchange> use dfs.drillTestDir
> . . . . . . . . . . . . . . . . . . . . . . .> ;
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | Default schema changed to 'dfs.drillTestDir' |
> +------------+------------+
> 1 row selected (0.246 seconds)
> 0: jdbc:drill:schema=dfs.drillTestDirExchange> 





[jira] [Assigned] (DRILL-6980) --run regression - error when empty line or command at end of the file

2019-03-28 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6980:
---

Assignee: Arina Ielchiieva

> --run regression - error when empty line or command at end of the file
> --
>
> Key: DRILL-6980
> URL: https://issues.apache.org/jira/browse/DRILL-6980
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Affects Versions: 1.15.0
>Reporter: benj
>Assignee: Arina Ielchiieva
>Priority: Minor
>
> When using --run like
> {code:java}
> bin/drill-embedded --run="myfile.req"
> {code}
> If "myfile.req" contains extra lines (empty or comment) after the last SQL 
> DRILL request, an error appears.
> {code:java}
> Error: PARSE ERROR: Encountered "" at line 1, column 4.
> {code}
> Note that empty lines or comment lines in the middle of the file (i.e. between 
> 2 DRILL requests or at the beginning of the file) don't cause any problem.
> This problem appeared in 1.15.0 and did not exist in 1.14.0.
>  
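
Until the regression is fixed, one possible workaround might be to trim trailing blank and comment lines from the file before passing it to --run. The sketch below is an assumption-laden illustration: the function name `strip_trailing_noise` is hypothetical, comments are assumed to start with `--`, and it has not been verified against Drill.

```python
def strip_trailing_noise(text, comment_prefixes=("--",)):
    """Drop blank and comment-only lines that follow the last statement.

    Lines in the middle or at the beginning of the file are left alone,
    since the report says those cause no problem.
    """
    prefixes = tuple(comment_prefixes)
    lines = text.splitlines()
    while lines and (
        not lines[-1].strip() or lines[-1].lstrip().startswith(prefixes)
    ):
        lines.pop()
    return "\n".join(lines)


script = "-- first query\nselect 1;\n\nselect 2;\n\n-- done\n"
print(strip_trailing_noise(script))
```

The cleaned text could then be written to a temporary file and used in place of the original.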





[jira] [Issue Comment Deleted] (DRILL-3637) Elasticsearch storage plugin

2019-03-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/DRILL-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Gülzau updated DRILL-3637:
--
Comment: was deleted

(was: I am out of the office until 26 May and have no access to email.
In urgent cases, please contact supp...@novomind.com

Best regards,

Kai Gülzau

--

novomind AG
)

> Elasticsearch storage plugin
> 
>
> Key: DRILL-3637
> URL: https://issues.apache.org/jira/browse/DRILL-3637
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - ElasticSearch
>Reporter: Andrew
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: Future
>
>
> Create a storage plugin for elasticsearch





[jira] [Created] (DRILL-7138) Implement command to show schema for table

2019-03-28 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-7138:
---

 Summary: Implement command to show schema for table
 Key: DRILL-7138
 URL: https://issues.apache.org/jira/browse/DRILL-7138
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.16.0


As part of schema provisioning project, it will be handy for the user to see if 
table has schema and its content. 
Command syntax:

*describe schema for table dfs.tmp.`text_table`*

By default schema will be output in JSON format (format schema is stored in 
.drill.schema file):

{noformat}
{
  "table" : "dfs.tmp.`text_table`",
  "schema" : {
    "columns" : [
      {
        "name" : "id",
        "type" : "INT",
        "mode" : "OPTIONAL"
      }
    ]
  },
  "version" : 1
}
{noformat}

JSON format can be indicated explicitly:

*describe schema for table dfs.tmp.`text_table` as JSON*

Other formats:
STATEMENT - schema will be output in the form of CREATE SCHEMA statement.
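
As a rough illustration of how a client could consume the JSON output described above, the sketch below parses it with Python's standard json module. The sample string copies the example from this description; the helper name `column_summary` is hypothetical and not part of Drill.

```python
import json

# Sample output copied from the example in this issue description.
SAMPLE = """
{
  "table" : "dfs.tmp.`text_table`",
  "schema" : {
    "columns" : [
      {
        "name" : "id",
        "type" : "INT",
        "mode" : "OPTIONAL"
      }
    ]
  },
  "version" : 1
}
"""


def column_summary(describe_output):
    """Return a (name, type, mode) tuple for each column in the schema JSON."""
    doc = json.loads(describe_output)
    return [(c["name"], c["type"], c["mode"]) for c in doc["schema"]["columns"]]


print(column_summary(SAMPLE))
```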






[jira] [Commented] (DRILL-7038) Queries on partitioned columns scan the entire datasets

2019-03-28 Thread Bohdan Kazydub (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803689#comment-16803689
 ] 

Bohdan Kazydub commented on DRILL-7038:
---

Hi, [~bbevens]. No, it's not like that. Those {{dir0}}, {{dir1}}, ... columns 
refer to directory levels below the root directory (see [Querying 
Directories|https://drill.apache.org/docs/querying-directories/]). For 
example, if {{table1}} had the following directory structure:
{code}
/table1/2016/Q1
/table1/2016/Q2
...
{code}
and when querying
{code}
select distinct dir0[, dir1[,...]] from dfs.`/table1`;
select dir0[, dir1[,...]] from dfs.`/table1` group by dir0;
{code}
{{dir0}} references the first-level directories under `table1` (which is the root), i.e. 
the '2016' directory, {{dir1}} references the second-level directories 'Q1' and 'Q2', 
and so on.

Before, Drill was scanning all the *files* in all directories. With this 
optimization, file scanning is skipped and the Scan operator is replaced with a 
Values operator containing literal values, which are collected 
from the directory metadata cache file (if it exists) or from the scan's file selection.
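
To make the idea concrete, the sketch below derives the distinct {{dir0}} and {{dir1}} values from file paths alone, without opening any file. It is only an illustration of the principle, not Drill's implementation: the function name `partition_values` and the file names `a.parquet` and `b.parquet` are made up for the example.

```python
from pathlib import PurePosixPath


def partition_values(root, file_paths, level):
    """Collect distinct dirN values (N = level) from paths alone."""
    values = set()
    for path in file_paths:
        # Directory components between the table root and the file itself.
        dirs = PurePosixPath(path).relative_to(root).parts[:-1]
        if level < len(dirs):
            values.add(dirs[level])
    return sorted(values)


# Hypothetical file selection for the /table1 layout described above.
files = [
    "/table1/2016/Q1/a.parquet",
    "/table1/2016/Q2/b.parquet",
]
print(partition_values("/table1", files, 0))  # values for dir0
print(partition_values("/table1", files, 1))  # values for dir1
```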

> Queries on partitioned columns scan the entire datasets
> ---
>
> Key: DRILL-7038
> URL: https://issues.apache.org/jira/browse/DRILL-7038
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> For tables with hive-style partitions like
> {code}
> /table/2018/Q1
> /table/2018/Q2
> /table/2019/Q1
> etc.
> {code}
> if any of the following queries is run:
> {code}
> select distinct dir0 from dfs.`/table`
> {code}
> {code}
> select dir0 from dfs.`/table` group by dir0
> {code}
> it will actually scan every single record in the table rather than just 
> getting a list of directories at the dir0 level. This applies even when 
> cached metadata is available. This is a big penalty especially as the 
> datasets grow.
> To avoid such situations, a logical prune rule can be used to collect 
> partition columns (`dir0`), either from metadata cache (if available) or 
> group scan, and drop unnecessary files from being read. The rule will be 
> applied under the following conditions:
> 1) all queried columns are partition columns, and
> 2) either {{DISTINCT}} or {{GROUP BY}} operations are performed.
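
The two conditions above can be expressed as a small predicate. This is a sketch under stated assumptions, not the planner rule itself: the function name `prune_rule_applies` is hypothetical, and partition columns are assumed to be exactly those named `dir0`, `dir1`, and so on.

```python
import re

# Assumption: partition columns follow the dirN naming used in this issue.
DIR_COLUMN = re.compile(r"^dir\d+$")


def prune_rule_applies(columns, has_distinct, has_group_by):
    """Check the two conditions listed in the issue description."""
    # 1) all queried columns are partition columns
    all_partition = bool(columns) and all(DIR_COLUMN.match(c) for c in columns)
    # 2) either DISTINCT or GROUP BY is performed
    return all_partition and (has_distinct or has_group_by)


print(prune_rule_applies(["dir0"], has_distinct=True, has_group_by=False))
print(prune_rule_applies(["dir0", "id"], has_distinct=True, has_group_by=False))
```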





[jira] [Commented] (DRILL-7011) Allow hybrid model in the Row set-based scan framework

2019-03-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803672#comment-16803672
 ] 

ASF GitHub Bot commented on DRILL-7011:
---

paul-rogers commented on issue #1711: DRILL-7011: Support schema in scan 
framework
URL: https://github.com/apache/drill/pull/1711#issuecomment-477490790
 
 
   Addressed review comments. Added more implicit type conversions.
 



> Allow hybrid model in the Row set-based scan framework
> --
>
> Key: DRILL-7011
> URL: https://issues.apache.org/jira/browse/DRILL-7011
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> As part of schema provisioning project we want to allow hybrid model for Row 
> set-based scan framework, namely to allow to pass custom schema metadata 
> which can be partial.
> Currently schema provisioning has SchemaContainer class that contains the 
> following information (can be obtained from metastore, schema file, table 
> function):
> 1. schema represented by org.apache.drill.exec.record.metadata.TupleMetadata
> 2. properties represented by Map, can contain information if 
> schema is strict or partial (default is partial) etc.


