[jira] [Updated] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-04-29 Thread Aman Sinha (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-7225:
--
 Reviewer: Aman Sinha
Fix Version/s: 1.17.0

Technically this is a regression from 1.15, but since Case 1 is the most common
case and that one is working, I am marking the fix version as 1.17.

> Merging of columnTypeInfo for file with different schema throws 
> NullPointerException during refresh metadata
> 
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.17.0
>
>
> Merging of columnTypeInfo from two files with different schemas throws a
> NullPointerException. For example, if a directory Orders has two files:
>  * orders.parquet (with columns order_id, order_name, order_date)
>  * orders_with_address.parquet (with columns order_id, order_name, address)
> When refresh table metadata is triggered, metadata such as total_null_count
> for columns in both files is aggregated and updated in the ColumnTypeInfo.
> Initially, ColumnTypeInfo is initialized with the first file's ColumnTypeInfo
> (i.e., order_id, order_name, order_date). While aggregating, the existing
> ColumnTypeInfo is looked up for columns in the second file, and since some of
> them don't exist in the ColumnTypeInfo, an NPE is thrown. This can be fixed
> by initializing ColumnTypeInfo for columns that are not yet present.
>  
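A minimal sketch of the proposed fix, assuming the aggregation walks a map keyed by column name; the class shape, field, and method names below are illustrative stand-ins, not the actual Drill metadata classes:

{noformat}
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the real per-column metadata class.
class ColumnTypeInfo {
  long totalNullCount;
  ColumnTypeInfo(long totalNullCount) { this.totalNullCount = totalNullCount; }
}

class MergeSketch {
  // Seed an entry for any column the accumulated map has not seen yet, so the
  // subsequent lookup can never return null and throw an NPE.
  static void merge(Map<String, ColumnTypeInfo> acc, Map<String, ColumnTypeInfo> next) {
    for (Map.Entry<String, ColumnTypeInfo> e : next.entrySet()) {
      ColumnTypeInfo existing =
          acc.computeIfAbsent(e.getKey(), k -> new ColumnTypeInfo(0)); // initialize if absent
      existing.totalNullCount += e.getValue().totalNullCount;          // aggregate total_null_count
    }
  }
}
{noformat}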



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7223) Make the timeout in TimedCallable a configurable boot time parameter

2019-04-29 Thread Boaz Ben-Zvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi reassigned DRILL-7223:
---

Assignee: Boaz Ben-Zvi

> Make the timeout in TimedCallable a configurable boot time parameter
> 
>
> Key: DRILL-7223
> URL: https://issues.apache.org/jira/browse/DRILL-7223
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Aman Sinha
>Assignee: Boaz Ben-Zvi
>Priority: Minor
> Fix For: 1.17.0
>
>
> The 
> [TimedCallable.TIMEOUT_PER_RUNNABLE_IN_MSECS|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java#L52]
>  is currently an internal Drill constant set to 15 seconds, and it has been
> there since the class was introduced. Drill's TimedCallable implements Java's
> Callable interface to create timed threads. It is used by the REFRESH METADATA
> command, which creates multiple threads on the Foreman node to gather Parquet
> metadata to build the metadata cache.
> Depending on the load on the system, or with a very large number of Parquet
> files (millions), it is possible to exceed this timeout. While the exact root
> cause of exceeding the timeout is being investigated, it makes sense to make
> this timeout a configurable parameter to aid with large-scale testing. This
> JIRA is to make it a configurable bootstrapping option in drill-override.conf.
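A minimal sketch of what such a boot-time option could look like, read through DrillConfig with the current constant as the fallback; the config key name here is an assumption for illustration, not the committed option:

{noformat}
import org.apache.drill.common.config.DrillConfig;

class TimedCallableTimeoutSketch {
  // Current hard-coded default of 15 secs.
  static final long DEFAULT_TIMEOUT_MS = 15_000;
  // Hypothetical key name; the real option would be settable in drill-override.conf.
  static final String TIMEOUT_KEY = "drill.exec.store.timed_callable.timeout_ms";

  static long timeoutPerRunnableMs(DrillConfig config) {
    // Fall back to the existing constant when drill-override.conf does not set the key.
    return config.hasPath(TIMEOUT_KEY) ? config.getLong(TIMEOUT_KEY) : DEFAULT_TIMEOUT_MS;
  }
}
{noformat}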



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-04-29 Thread Venkata Jyothsna Donapati (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829859#comment-16829859
 ] 

Venkata Jyothsna Donapati commented on DRILL-7225:
--

[~amansinha100] Case 2 results in an NPE. Case 1 works fine, which means there
won't be any problem when two sub-directories under a directory have different
schemas.

> Merging of columnTypeInfo for file with different schema throws 
> NullPointerException during refresh metadata
> 
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> Merging of columnTypeInfo from two files with different schemas throws 
> nullpointerexception. For example if a directory Orders has two files:
>  * orders.parquet (with columns order_id, order_name, order_date)
>  * orders_with_address.parquet (with columns order_id, order_name, address)
> When refresh table metadata is triggered, metadata such as total_null_count 
> for columns in both the files is aggregated and updated in the 
> ColumnTypeInfo. Initially ColumnTypeInfo is initialized with the first file's 
> ColumnTypeInfo (i.e., order_id, order_name, order_date). While aggregating, 
> the existing ColumnTypeInfo is looked up for columns in the second file and 
> since some of them don't exist in the ColumnTypeInfo, a npe is thrown. This 
> can be fixed by initializing ColumnTypeInfo for columns that are not yet 
> present.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7187) Improve selectivity estimates for range predicates when using histogram

2019-04-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829826#comment-16829826
 ] 

ASF GitHub Bot commented on DRILL-7187:
---

gparai commented on pull request #1772: DRILL-7187: Improve selectivity 
estimation of BETWEEN predicates and …
URL: https://github.com/apache/drill/pull/1772#discussion_r279571671
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestAnalyze.java
 ##
 @@ -480,6 +490,24 @@ public void testHistogramWithColumnsWithAllNulls() throws Exception {
     }
   }
 
+
+  @Test
+  public void testHistogramWithBetweenPredicate() throws Exception {
+    try {
+      test("ALTER SESSION SET `planner.slice_target` = 1");
+      test("ALTER SESSION SET `store.format` = 'parquet'");
+      test("create table dfs.tmp.orders2 as select * from cp.`tpch/orders.parquet`");
+      test("analyze table dfs.tmp.orders2 compute statistics");
+      test("alter session set `planner.statistics.use` = true");
+
+      String query = "select 1 from dfs.tmp.orders2 o where o.o_orderdate >= date '1996-10-01' and o.o_orderdate < date '1996-10-01' + interval '3' month";
 
 Review comment:
   I do not see the `BETWEEN` clause in this query.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve selectivity estimates for range predicates when using histogram
> ---
>
> Key: DRILL-7187
> URL: https://issues.apache.org/jira/browse/DRILL-7187
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.17.0
>
>
> Two types of selectivity estimation improvements need to be made:
> 1. For range predicates on the same column, we need to collect all such
> predicates into one group and do a histogram lookup for them together.
> For instance:
> {noformat}
>  WHERE a > 10 AND b < 20 AND c = 100 AND a <= 50 AND b < 50
> {noformat}
> Currently, the Drill behavior is to treat each of the conjuncts independently
> and multiply the individual selectivities. However, that will not give
> accurate estimates. Here, we want to group the predicates on 'a' together and
> do a single lookup, and similarly for 'b'.
> 2. NULLs are not maintained by the histogram, but when doing selectivity
> calculations, the histogram should use the totalRowCount as the denominator
> rather than the non-null count.
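Grouping the conjuncts in item 1 amounts to intersecting the value ranges of all predicates on the same column before consulting the histogram. A standalone sketch of that idea using Guava's Range (an illustration of the technique only, not the Drill implementation):

{noformat}
import com.google.common.collect.Range;

class RangeGroupingSketch {
  public static void main(String[] args) {
    // Conjuncts on column 'a': a > 10 AND a <= 50.
    Range<Double> gt10 = Range.greaterThan(10.0); // a > 10
    Range<Double> le50 = Range.atMost(50.0);      // a <= 50
    // The combined range (10.0, 50.0] yields a single histogram lookup,
    // instead of multiplying two independently estimated selectivities.
    Range<Double> combined = gt10.intersection(le50);
    System.out.println(combined); // (10.0..50.0]
  }
}
{noformat}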



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7187) Improve selectivity estimates for range predicates when using histogram

2019-04-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829827#comment-16829827
 ] 

ASF GitHub Bot commented on DRILL-7187:
---

gparai commented on pull request #1772: DRILL-7187: Improve selectivity 
estimation of BETWEEN predicates and …
URL: https://github.com/apache/drill/pull/1772#discussion_r279574279
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/NumericEquiDepthHistogram.java
 ##
 @@ -76,129 +73,188 @@ public NumericEquiDepthHistogram(int numBuckets) {
     numRowsPerBucket = -1;
   }
 
-  public long getNumRowsPerBucket() {
+  public double getNumRowsPerBucket() {
     return numRowsPerBucket;
   }
 
-  public void setNumRowsPerBucket(long numRows) {
+  public void setNumRowsPerBucket(double numRows) {
     this.numRowsPerBucket = numRows;
   }
 
   public Double[] getBuckets() {
     return buckets;
   }
 
+  /**
+   * Get the number of buckets in the histogram;
+   * the number of buckets is 1 less than the total # entries in the buckets array, since the last
+   * entry is the end point of the last bucket.
+   */
+  public int getNumBuckets() {
+    return buckets.length - 1;
+  }
+
+  /**
+   * Estimate the selectivity of a filter which may contain several range predicates and in the general case is of
+   * type: col op value1 AND col op value2 AND col op value3 ...
+   *
+   *   e.g. a > 10 AND a < 50 AND a >= 20 AND a <= 70 ...
+   *
+   * Even though in most cases it will have either 1 or 2 range conditions, we still have to handle the general case.
+   * For each conjunct, we will find the histogram bucket ranges and intersect them, taking into account that the
+   * first and last bucket may be partially covered and all other buckets in the middle are fully covered.
+   */
   @Override
-  public Double estimatedSelectivity(final RexNode filter) {
-    if (numRowsPerBucket >= 0) {
-      // at a minimum, the histogram should have a start and end point of 1 bucket, so at least 2 entries
-      Preconditions.checkArgument(buckets.length >= 2, "Histogram has invalid number of entries");
-      final int first = 0;
-      final int last = buckets.length - 1;
-
-      // number of buckets is 1 less than the total # entries in the buckets array since last
-      // entry is the end point of the last bucket
-      final int numBuckets = buckets.length - 1;
-      final long totalRows = numBuckets * numRowsPerBucket;
+  public Double estimatedSelectivity(final RexNode columnFilter, final long totalRowCount) {
+    if (numRowsPerBucket == 0) {
+      return null;
+    }
+
+    // at a minimum, the histogram should have a start and end point of 1 bucket, so at least 2 entries
+    Preconditions.checkArgument(buckets.length >= 2, "Histogram has invalid number of entries");
+
+    List<RexNode> filterList = RelOptUtil.conjunctions(columnFilter);
+
+    Range<Double> fullRange = Range.all();
+    List<RexNode> unknownFilterList = new ArrayList<>();
+
+    Range<Double> valuesRange = getValuesRange(filterList, fullRange, unknownFilterList);
+
+    long numSelectedRows;
+    // 'unknown' counts the filter predicates whose bucket ranges cannot be
+    // determined from the histogram; this may happen for instance when there is an expression or
+    // function involved, e.g. col > CAST('10' as INT)
+    int unknown = unknownFilterList.size();
+
+    if (valuesRange.hasLowerBound() || valuesRange.hasUpperBound()) {
+      numSelectedRows = getSelectedRows(valuesRange);
+    } else {
+      numSelectedRows = 0;
+    }
+
+    if (numSelectedRows <= 0) {
+      return SMALL_SELECTIVITY;
+    } else {
+      // for each 'unknown' range filter selectivity, use a default of 0.5 (matches Calcite)
+      double scaleFactor = Math.pow(0.5, unknown);
+      return ((double) numSelectedRows / totalRowCount) * scaleFactor;
+    }
+  }
+
+  private Range<Double> getValuesRange(List<RexNode> filterList, Range<Double> fullRange, List<RexNode> unknownFilterList) {
+    Range<Double> currentRange = fullRange;
+    for (RexNode filter : filterList) {
       if (filter instanceof RexCall) {
-        // get the operator
-        SqlOperator op = ((RexCall) filter).getOperator();
-        if (op.getKind() == SqlKind.GREATER_THAN ||
-            op.getKind() == SqlKind.GREATER_THAN_OR_EQUAL) {
-          Double value = getLiteralValue(filter);
-          if (value != null) {
-
-            // *** Handle the boundary conditions first ***
-
-            // if value is less than or equal to the first bucket's start point then all rows qualify
-            int result = value.compareTo(buckets[first]);
-            if (result <= 0) {
-              return LARGE_SELECTIVITY;
-            }
-            // if value is greater than the end point of the last bucket, then none of the rows qualify
-            result = value.compareTo(buckets[last]);
-            if (result > 0) {
-              return SMALL_SELECTIVITY;

[jira] [Commented] (DRILL-7187) Improve selectivity estimates for range predicates when using histogram

2019-04-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829825#comment-16829825
 ] 

ASF GitHub Bot commented on DRILL-7187:
---

gparai commented on pull request #1772: DRILL-7187: Improve selectivity 
estimation of BETWEEN predicates and …
URL: https://github.com/apache/drill/pull/1772#discussion_r279571290
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdSelectivity.java
 ##
 @@ -356,8 +426,8 @@ private boolean isMultiColumnPredicate(final RexNode node) {
     return findAllRexInputRefs(node).size() > 1;
   }
 
-  private static List<RexInputRef> findAllRexInputRefs(final RexNode node) {
-    List<RexInputRef> rexRefs = new ArrayList<>();
+  private static Set<RexInputRef> findAllRexInputRefs(final RexNode node) {
 
 Review comment:
  Would this not break the existing logic? For a predicate like $0=$0, using
the `Set` would cause the `isMultiColumnPredicate` function to return false.
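To make the concern concrete, a toy illustration (plain strings stand in for RexInputRef; this is not the Drill code itself):

{noformat}
import java.util.*;

class SetVsListSketch {
  public static void main(String[] args) {
    // Input refs collected from the predicate $0 = $0: the same column twice.
    List<String> asList = Arrays.asList("$0", "$0");
    Set<String> asSet = new HashSet<>(asList);
    System.out.println(asList.size() > 1); // true  -> classified as multi-column today
    System.out.println(asSet.size() > 1);  // false -> with a Set it no longer would be
  }
}
{noformat}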
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve selectivity estimates for range predicates when using histogram
> ---
>
> Key: DRILL-7187
> URL: https://issues.apache.org/jira/browse/DRILL-7187
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.17.0
>
>
> Two types of selectivity estimation improvements need to be made:
> 1. For range predicates on the same column, we need to collect all such
> predicates into one group and do a histogram lookup for them together.
> For instance:
> {noformat}
>  WHERE a > 10 AND b < 20 AND c = 100 AND a <= 50 AND b < 50
> {noformat}
> Currently, the Drill behavior is to treat each of the conjuncts independently
> and multiply the individual selectivities. However, that will not give
> accurate estimates. Here, we want to group the predicates on 'a' together and
> do a single lookup, and similarly for 'b'.
> 2. NULLs are not maintained by the histogram, but when doing selectivity
> calculations, the histogram should use the totalRowCount as the denominator
> rather than the non-null count.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-04-29 Thread Aman Sinha (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829814#comment-16829814
 ] 

Aman Sinha commented on DRILL-7225:
---

[~vdonapati], let's clarify this further: 
{noformat}
Case 1:
   /a/b contains files with Schema A
   /a/c contains files with Schema B
Case 2:
   /a/b  contains files with Schema A and Schema B
{noformat}

Does the existing code work correctly for Case 1 and fail for Case 2? Case 1
is the more common scenario; Case 2 is much less common.

> Merging of columnTypeInfo for file with different schema throws 
> NullPointerException during refresh metadata
> 
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> Merging of columnTypeInfo from two files with different schemas throws a
> NullPointerException. For example, if a directory Orders has two files:
>  * orders.parquet (with columns order_id, order_name, order_date)
>  * orders_with_address.parquet (with columns order_id, order_name, address)
> When refresh table metadata is triggered, metadata such as total_null_count
> for columns in both files is aggregated and updated in the ColumnTypeInfo.
> Initially, ColumnTypeInfo is initialized with the first file's ColumnTypeInfo
> (i.e., order_id, order_name, order_date). While aggregating, the existing
> ColumnTypeInfo is looked up for columns in the second file, and since some of
> them don't exist in the ColumnTypeInfo, an NPE is thrown. This can be fixed
> by initializing ColumnTypeInfo for columns that are not yet present.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-04-29 Thread Venkata Jyothsna Donapati (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Jyothsna Donapati updated DRILL-7225:
-
Description: 
Merging of columnTypeInfo from two files with different schemas throws a
NullPointerException. For example, if a directory Orders has two files:
 * orders.parquet (with columns order_id, order_name, order_date)
 * orders_with_address.parquet (with columns order_id, order_name, address)

When refresh table metadata is triggered, metadata such as total_null_count for
columns in both files is aggregated and updated in the ColumnTypeInfo.
Initially, ColumnTypeInfo is initialized with the first file's ColumnTypeInfo
(i.e., order_id, order_name, order_date). While aggregating, the existing
ColumnTypeInfo is looked up for columns in the second file, and since some of
them don't exist in the ColumnTypeInfo, an NPE is thrown. This can be fixed by
initializing ColumnTypeInfo for columns that are not yet present.

 

  was:
Merging of columnTypeInfo from two files with different schemas throws a
NullPointerException. For example, if a directory Orders has two files:
 * orders.parquet (with columns order_id, order_name, order_date)
 * orders_with_address.parquet (with columns order_id, order_name, address)

When refresh table metadata is triggered, metadata such as total_null_count for
columns in both files is aggregated and updated in the ColumnTypeInfo.
Initially, ColumnTypeInfo is initialized with the first file's ColumnTypeInfo.
While aggregating, the existing ColumnTypeInfo is looked up for columns in the
second file, and since some of them don't exist in the ColumnTypeInfo, an NPE
is thrown. This can be fixed by initializing ColumnTypeInfo for columns that
are not yet present.

 


> Merging of columnTypeInfo for file with different schema throws 
> NullPointerException during refresh metadata
> 
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> Merging of columnTypeInfo from two files with different schemas throws a
> NullPointerException. For example, if a directory Orders has two files:
>  * orders.parquet (with columns order_id, order_name, order_date)
>  * orders_with_address.parquet (with columns order_id, order_name, address)
> When refresh table metadata is triggered, metadata such as total_null_count
> for columns in both files is aggregated and updated in the ColumnTypeInfo.
> Initially, ColumnTypeInfo is initialized with the first file's ColumnTypeInfo
> (i.e., order_id, order_name, order_date). While aggregating, the existing
> ColumnTypeInfo is looked up for columns in the second file, and since some of
> them don't exist in the ColumnTypeInfo, an NPE is thrown. This can be fixed
> by initializing ColumnTypeInfo for columns that are not yet present.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-04-29 Thread Venkata Jyothsna Donapati (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Jyothsna Donapati updated DRILL-7225:
-
Description: 
Merging of columnTypeInfo from two files with different schemas throws 
nullpointerexception. For example if a directory Orders has two files:
 * orders.parquet (with columns order_id, order_name, order_date)
 * orders_with_address.parquet (with columns order_id, order_name, address)

When refresh table metadata is triggered, metadata such as total_null_count for 
columns in both the files is aggregated and updated in the ColumnTypeInfo. 
Initially ColumnTypeInfo is initialized with the first file's ColumnTypeInfo. 
While aggregating, the existing ColumnTypeInfo is looked up for columns in the 
second file and since some of them don't exist in the ColumnTypeInfo, a npe is 
thrown. This can be fixed by initializing ColumnTypeInfo for columns that are 
not yet present.

 

  was:Merging of columnTypeInfo from two files with different schemas throws a
NullPointerException


> Merging of columnTypeInfo for file with different schema throws 
> NullPointerException during refresh metadata
> 
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> Merging of columnTypeInfo from two files with different schemas throws a
> NullPointerException. For example, if a directory Orders has two files:
>  * orders.parquet (with columns order_id, order_name, order_date)
>  * orders_with_address.parquet (with columns order_id, order_name, address)
> When refresh table metadata is triggered, metadata such as total_null_count
> for columns in both files is aggregated and updated in the ColumnTypeInfo.
> Initially, ColumnTypeInfo is initialized with the first file's ColumnTypeInfo.
> While aggregating, the existing ColumnTypeInfo is looked up for columns in
> the second file, and since some of them don't exist in the ColumnTypeInfo,
> an NPE is thrown. This can be fixed by initializing ColumnTypeInfo for
> columns that are not yet present.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed

2019-04-29 Thread Gautam Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829718#comment-16829718
 ] 

Gautam Parai commented on DRILL-7148:
-

The PR link is present in the Issue Links section.

> TPCH query 17 increases execution time with Statistics enabled because join 
> order is changed
> 
>
> Key: DRILL-7148
> URL: https://issues.apache.org/jira/browse/DRILL-7148
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Gautam Parai
>Assignee: Gautam Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> TPCH query 17 at scale factor (sf) 1000 runs 45% slower. One issue is that
> the join order has flipped the build side and the probe side in Major
> Fragment 01.
> Here is the query:
> select
>  sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>  lineitem l,
>  part p
> where
>  p.p_partkey = l.l_partkey
>  and p.p_brand = 'Brand#13'
>  and p.p_container = 'JUMBO CAN'
>  and l.l_quantity < (
>  select
>  0.2 * avg(l2.l_quantity)
>  from
>  lineitem l2
>  where
>  l2.l_partkey = p.p_partkey
>  );
> Here is the original plan:
> {noformat}
> 00-00 Screen : rowType = RecordType(ANY avg_yearly): rowcount = 1.0, 
> cumulative cost = \{7.853786601428E10 rows, 6.6179786770537E11 cpu, 
> 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 
> memory}, id = 489493
> 00-01 Project(avg_yearly=[/($0, 7.0)]) : rowType = RecordType(ANY 
> avg_yearly): rowcount = 1.0, cumulative cost = \{7.853786601418E10 rows, 
> 6.6179786770527E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 
> 1.17294998955024E11 memory}, id = 489492
> 00-02 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): 
> rowcount = 1.0, cumulative cost = \{7.853786601318E10 rows, 
> 6.6179786770127E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 
> 1.17294998955024E11 memory}, id = 489491
> 00-03 UnionExchange : rowType = RecordType(ANY $f0): rowcount = 1.0, 
> cumulative cost = \{7.853786601218E10 rows, 6.6179786768927E11 cpu, 
> 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 
> memory}, id = 489490
> 01-01 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): 
> rowcount = 1.0, cumulative cost = \{7.853786601118E10 rows, 
> 6.6179786768127E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489489
> 01-02 Project(l_extendedprice=[$1]) : rowType = RecordType(ANY 
> l_extendedprice): rowcount = 2.948545E9, cumulative cost = 
> \{7.553787115668E10 rows, 6.2579792942727E11 cpu, 3.0599948545E10 io, 
> 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489488
> 01-03 SelectionVectorRemover : rowType = RecordType(ANY l_quantity, ANY 
> l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = 
> 2.948545E9, cumulative cost = \{7.253787630218E10 rows, 
> 6.2279793457277E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489487
> 01-04 Filter(condition=[<($0, *(0.2, $4))]) : rowType = RecordType(ANY 
> l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): 
> rowcount = 2.948545E9, cumulative cost = \{6.953788144768E10 rows, 
> 6.1979793971827E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489486
> 01-05 HashJoin(condition=[=($2, $3)], joinType=[inner], semi-join: =[false]) 
> : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, 
> ANY l_partkey, ANY $f1): rowcount = 5.89709E9, cumulative cost = 
> \{6.353789173867999E10 rows, 5.8379800146427E11 cpu, 3.0599948545E10 io, 
> 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489485
> 01-07 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2]) : 
> rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey): 
> rowcount = 5.89709E9, cumulative cost = \{4.2417927963E10 rows, 
> 2.71618536905E11 cpu, 1.8599969127E10 io, 9.8471562592256E13 network, 7.92E7 
> memory}, id = 489476
> 01-09 HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY 
> l_quantity, ANY l_extendedprice, ANY p_partkey, ANY 
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.89709E9, cumulative cost = 
> \{3.6417938254E10 rows, 2.53618567778E11 cpu, 1.8599969127E10 io, 
> 9.8471562592256E13 network, 7.92E7 memory}, id = 489475
> 02-01 UnorderedMuxExchange : rowType = RecordType(ANY l_quantity, ANY 
> l_extendedprice, ANY p_partkey, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 
> 5.89709E9, cumulative cost = \{3.0417948545E10 rows, 1.57618732434E11 
> cpu, 1.8599969127E10 io, 1.677312E11 network, 7.92E7 memory}, id = 489474
> 04-01 Project(l_quantity=[$0], l_extendedprice=[$1], 

[jira] [Commented] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed

2019-04-29 Thread Gautam Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829717#comment-16829717
 ] 

Gautam Parai commented on DRILL-7148:
-

[https://github.com/apache/drill/pull/1744]

> TPCH query 17 increases execution time with Statistics enabled because join 
> order is changed
> 
>
> Key: DRILL-7148
> URL: https://issues.apache.org/jira/browse/DRILL-7148
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Gautam Parai
>Assignee: Gautam Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> TPCH query 17 at scale factor (sf) 1000 runs 45% slower. One issue is that
> the join order has flipped the build side and the probe side in Major
> Fragment 01.
> Here is the query:
> select
>  sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>  lineitem l,
>  part p
> where
>  p.p_partkey = l.l_partkey
>  and p.p_brand = 'Brand#13'
>  and p.p_container = 'JUMBO CAN'
>  and l.l_quantity < (
>  select
>  0.2 * avg(l2.l_quantity)
>  from
>  lineitem l2
>  where
>  l2.l_partkey = p.p_partkey
>  );
> Here is the original plan:
> {noformat}
> 00-00 Screen : rowType = RecordType(ANY avg_yearly): rowcount = 1.0, 
> cumulative cost = \{7.853786601428E10 rows, 6.6179786770537E11 cpu, 
> 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 
> memory}, id = 489493
> 00-01 Project(avg_yearly=[/($0, 7.0)]) : rowType = RecordType(ANY 
> avg_yearly): rowcount = 1.0, cumulative cost = \{7.853786601418E10 rows, 
> 6.6179786770527E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 
> 1.17294998955024E11 memory}, id = 489492
> 00-02 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): 
> rowcount = 1.0, cumulative cost = \{7.853786601318E10 rows, 
> 6.6179786770127E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 
> 1.17294998955024E11 memory}, id = 489491
> 00-03 UnionExchange : rowType = RecordType(ANY $f0): rowcount = 1.0, 
> cumulative cost = \{7.853786601218E10 rows, 6.6179786768927E11 cpu, 
> 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 
> memory}, id = 489490
> 01-01 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): 
> rowcount = 1.0, cumulative cost = \{7.853786601118E10 rows, 
> 6.6179786768127E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489489
> 01-02 Project(l_extendedprice=[$1]) : rowType = RecordType(ANY 
> l_extendedprice): rowcount = 2.948545E9, cumulative cost = 
> \{7.553787115668E10 rows, 6.2579792942727E11 cpu, 3.0599948545E10 io, 
> 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489488
> 01-03 SelectionVectorRemover : rowType = RecordType(ANY l_quantity, ANY 
> l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = 
> 2.948545E9, cumulative cost = \{7.253787630218E10 rows, 
> 6.2279793457277E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489487
> 01-04 Filter(condition=[<($0, *(0.2, $4))]) : rowType = RecordType(ANY 
> l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): 
> rowcount = 2.948545E9, cumulative cost = \{6.953788144768E10 rows, 
> 6.1979793971827E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489486
> 01-05 HashJoin(condition=[=($2, $3)], joinType=[inner], semi-join: =[false]) 
> : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, 
> ANY l_partkey, ANY $f1): rowcount = 5.89709E9, cumulative cost = 
> \{6.353789173867999E10 rows, 5.8379800146427E11 cpu, 3.0599948545E10 io, 
> 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489485
> 01-07 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2]) : 
> rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey): 
> rowcount = 5.89709E9, cumulative cost = \{4.2417927963E10 rows, 
> 2.71618536905E11 cpu, 1.8599969127E10 io, 9.8471562592256E13 network, 7.92E7 
> memory}, id = 489476
> 01-09 HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY 
> l_quantity, ANY l_extendedprice, ANY p_partkey, ANY 
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.89709E9, cumulative cost = 
> \{3.6417938254E10 rows, 2.53618567778E11 cpu, 1.8599969127E10 io, 
> 9.8471562592256E13 network, 7.92E7 memory}, id = 489475
> 02-01 UnorderedMuxExchange : rowType = RecordType(ANY l_quantity, ANY 
> l_extendedprice, ANY p_partkey, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 
> 5.89709E9, cumulative cost = \{3.0417948545E10 rows, 1.57618732434E11 
> cpu, 1.8599969127E10 io, 1.677312E11 network, 7.92E7 memory}, id = 489474
> 04-01 Project(l_quantity=[$0], l_extendedprice=[$1], 

[jira] [Issue Comment Deleted] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed

2019-04-29 Thread Gautam Parai (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Parai updated DRILL-7148:

Comment: was deleted

(was: [https://github.com/apache/drill/pull/1744])

> TPCH query 17 increases execution time with Statistics enabled because join 
> order is changed
> 
>
> Key: DRILL-7148
> URL: https://issues.apache.org/jira/browse/DRILL-7148
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Gautam Parai
>Assignee: Gautam Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> TPCH query 17 at scale factor (sf) 1000 runs 45% slower. One issue is that
> the join order has flipped the build side and the probe side in Major
> Fragment 01.
> Here is the query:
> select
>  sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>  lineitem l,
>  part p
> where
>  p.p_partkey = l.l_partkey
>  and p.p_brand = 'Brand#13'
>  and p.p_container = 'JUMBO CAN'
>  and l.l_quantity < (
>  select
>  0.2 * avg(l2.l_quantity)
>  from
>  lineitem l2
>  where
>  l2.l_partkey = p.p_partkey
>  );
> Here is the original plan:
> {noformat}
> 00-00 Screen : rowType = RecordType(ANY avg_yearly): rowcount = 1.0, 
> cumulative cost = \{7.853786601428E10 rows, 6.6179786770537E11 cpu, 
> 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 
> memory}, id = 489493
> 00-01 Project(avg_yearly=[/($0, 7.0)]) : rowType = RecordType(ANY 
> avg_yearly): rowcount = 1.0, cumulative cost = \{7.853786601418E10 rows, 
> 6.6179786770527E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 
> 1.17294998955024E11 memory}, id = 489492
> 00-02 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): 
> rowcount = 1.0, cumulative cost = \{7.853786601318E10 rows, 
> 6.6179786770127E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 
> 1.17294998955024E11 memory}, id = 489491
> 00-03 UnionExchange : rowType = RecordType(ANY $f0): rowcount = 1.0, 
> cumulative cost = \{7.853786601218E10 rows, 6.6179786768927E11 cpu, 
> 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 
> memory}, id = 489490
> 01-01 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): 
> rowcount = 1.0, cumulative cost = \{7.853786601118E10 rows, 
> 6.6179786768127E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489489
> 01-02 Project(l_extendedprice=[$1]) : rowType = RecordType(ANY 
> l_extendedprice): rowcount = 2.948545E9, cumulative cost = 
> \{7.553787115668E10 rows, 6.2579792942727E11 cpu, 3.0599948545E10 io, 
> 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489488
> 01-03 SelectionVectorRemover : rowType = RecordType(ANY l_quantity, ANY 
> l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = 
> 2.948545E9, cumulative cost = \{7.253787630218E10 rows, 
> 6.2279793457277E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489487
> 01-04 Filter(condition=[<($0, *(0.2, $4))]) : rowType = RecordType(ANY 
> l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): 
> rowcount = 2.948545E9, cumulative cost = \{6.953788144768E10 rows, 
> 6.1979793971827E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489486
> 01-05 HashJoin(condition=[=($2, $3)], joinType=[inner], semi-join: =[false]) 
> : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, 
> ANY l_partkey, ANY $f1): rowcount = 5.89709E9, cumulative cost = 
> \{6.353789173867999E10 rows, 5.8379800146427E11 cpu, 3.0599948545E10 io, 
> 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489485
> 01-07 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2]) : 
> rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey): 
> rowcount = 5.89709E9, cumulative cost = \{4.2417927963E10 rows, 
> 2.71618536905E11 cpu, 1.8599969127E10 io, 9.8471562592256E13 network, 7.92E7 
> memory}, id = 489476
> 01-09 HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY 
> l_quantity, ANY l_extendedprice, ANY p_partkey, ANY 
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.89709E9, cumulative cost = 
> \{3.6417938254E10 rows, 2.53618567778E11 cpu, 1.8599969127E10 io, 
> 9.8471562592256E13 network, 7.92E7 memory}, id = 489475
> 02-01 UnorderedMuxExchange : rowType = RecordType(ANY l_quantity, ANY 
> l_extendedprice, ANY p_partkey, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 
> 5.89709E9, cumulative cost = \{3.0417948545E10 rows, 1.57618732434E11 
> cpu, 1.8599969127E10 io, 1.677312E11 network, 7.92E7 memory}, id = 489474
> 04-01 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2], 

[jira] [Updated] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed

2019-04-29 Thread Gautam Parai (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Parai updated DRILL-7148:

Labels: ready-to-commit  (was: )

> TPCH query 17 increases execution time with Statistics enabled because join 
> order is changed
> 
>
> Key: DRILL-7148
> URL: https://issues.apache.org/jira/browse/DRILL-7148
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Gautam Parai
>Assignee: Gautam Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> TPCH query 17 at scale factor (sf) 1000 runs 45% slower. One issue is that
> the join order has flipped the build side and the probe side in Major
> Fragment 01.
> Here is the query:
> select
>  sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>  lineitem l,
>  part p
> where
>  p.p_partkey = l.l_partkey
>  and p.p_brand = 'Brand#13'
>  and p.p_container = 'JUMBO CAN'
>  and l.l_quantity < (
>  select
>  0.2 * avg(l2.l_quantity)
>  from
>  lineitem l2
>  where
>  l2.l_partkey = p.p_partkey
>  );
> Here is the original plan:
> {noformat}
> 00-00 Screen : rowType = RecordType(ANY avg_yearly): rowcount = 1.0, 
> cumulative cost = \{7.853786601428E10 rows, 6.6179786770537E11 cpu, 
> 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 
> memory}, id = 489493
> 00-01 Project(avg_yearly=[/($0, 7.0)]) : rowType = RecordType(ANY 
> avg_yearly): rowcount = 1.0, cumulative cost = \{7.853786601418E10 rows, 
> 6.6179786770527E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 
> 1.17294998955024E11 memory}, id = 489492
> 00-02 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): 
> rowcount = 1.0, cumulative cost = \{7.853786601318E10 rows, 
> 6.6179786770127E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 
> 1.17294998955024E11 memory}, id = 489491
> 00-03 UnionExchange : rowType = RecordType(ANY $f0): rowcount = 1.0, 
> cumulative cost = \{7.853786601218E10 rows, 6.6179786768927E11 cpu, 
> 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 
> memory}, id = 489490
> 01-01 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): 
> rowcount = 1.0, cumulative cost = \{7.853786601118E10 rows, 
> 6.6179786768127E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489489
> 01-02 Project(l_extendedprice=[$1]) : rowType = RecordType(ANY 
> l_extendedprice): rowcount = 2.948545E9, cumulative cost = 
> \{7.553787115668E10 rows, 6.2579792942727E11 cpu, 3.0599948545E10 io, 
> 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489488
> 01-03 SelectionVectorRemover : rowType = RecordType(ANY l_quantity, ANY 
> l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = 
> 2.948545E9, cumulative cost = \{7.253787630218E10 rows, 
> 6.2279793457277E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489487
> 01-04 Filter(condition=[<($0, *(0.2, $4))]) : rowType = RecordType(ANY 
> l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): 
> rowcount = 2.948545E9, cumulative cost = \{6.953788144768E10 rows, 
> 6.1979793971827E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489486
> 01-05 HashJoin(condition=[=($2, $3)], joinType=[inner], semi-join: =[false]) 
> : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, 
> ANY l_partkey, ANY $f1): rowcount = 5.89709E9, cumulative cost = 
> \{6.353789173867999E10 rows, 5.8379800146427E11 cpu, 3.0599948545E10 io, 
> 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489485
> 01-07 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2]) : 
> rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey): 
> rowcount = 5.89709E9, cumulative cost = \{4.2417927963E10 rows, 
> 2.71618536905E11 cpu, 1.8599969127E10 io, 9.8471562592256E13 network, 7.92E7 
> memory}, id = 489476
> 01-09 HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY 
> l_quantity, ANY l_extendedprice, ANY p_partkey, ANY 
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.89709E9, cumulative cost = 
> \{3.6417938254E10 rows, 2.53618567778E11 cpu, 1.8599969127E10 io, 
> 9.8471562592256E13 network, 7.92E7 memory}, id = 489475
> 02-01 UnorderedMuxExchange : rowType = RecordType(ANY l_quantity, ANY 
> l_extendedprice, ANY p_partkey, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 
> 5.89709E9, cumulative cost = \{3.0417948545E10 rows, 1.57618732434E11 
> cpu, 1.8599969127E10 io, 1.677312E11 network, 7.92E7 memory}, id = 489474
> 04-01 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2], 
> 

[jira] [Updated] (DRILL-6964) Implement CREATE / DROP TABLE SCHEMA commands

2019-04-29 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-6964:
--
Labels: doc-complete ready-to-commit  (was: ready-to-commit)

> Implement CREATE / DROP TABLE SCHEMA commands
> -
>
> Key: DRILL-6964
> URL: https://issues.apache.org/jira/browse/DRILL-6964
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
>
> Design doc - 
> https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6964) Implement CREATE / DROP TABLE SCHEMA commands

2019-04-29 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829698#comment-16829698
 ] 

Bridget Bevens commented on DRILL-6964:
---

doc posted here: 
https://drill.apache.org/docs/create-or-replace-schema/


 

> Implement CREATE / DROP TABLE SCHEMA commands
> -
>
> Key: DRILL-6964
> URL: https://issues.apache.org/jira/browse/DRILL-6964
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
>
> Design doc - 
> https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-04-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829693#comment-16829693
 ] 

ASF GitHub Bot commented on DRILL-7225:
---

dvjyothsna commented on pull request #1773: DRILL-7225: Fixed merging 
ColumnTypeInfo for files with different schemas
URL: https://github.com/apache/drill/pull/1773
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merging of columnTypeInfo for file with different schema throws 
> NullPointerException during refresh metadata
> 
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> Merging of columnTypeInfo from two files with different schemas throws a
> NullPointerException



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-04-29 Thread Venkata Jyothsna Donapati (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Jyothsna Donapati updated DRILL-7225:
-
Description: Merging of columnTypeInfo from two files with different
schemas throws a NullPointerException

> Merging of columnTypeInfo for file with different schema throws 
> NullPointerException during refresh metadata
> 
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> Merging of columnTypeInfo from two files with different schemas throws a
> NullPointerException



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-04-29 Thread Venkata Jyothsna Donapati (JIRA)
Venkata Jyothsna Donapati created DRILL-7225:


 Summary: Merging of columnTypeInfo for file with different schema 
throws NullPointerException during refresh metadata
 Key: DRILL-7225
 URL: https://issues.apache.org/jira/browse/DRILL-7225
 Project: Apache Drill
  Issue Type: Bug
Reporter: Venkata Jyothsna Donapati
Assignee: Venkata Jyothsna Donapati






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7187) Improve selectivity estimates for range predicates when using histogram

2019-04-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829620#comment-16829620
 ] 

ASF GitHub Bot commented on DRILL-7187:
---

amansinha100 commented on pull request #1772: DRILL-7187: Improve selectivity 
estimation of BETWEEN predicates and …
URL: https://github.com/apache/drill/pull/1772
 
 
   …arbitrary combination of range predicates.
   
   - Also, propagate the totalRowCount to the histogram selectivity estimation 
and use it instead of the nonNullCount. 
   - Before and after estimates for the following predicate: 
   `where o.o_orderdate >= date '1996-10-01' and o.o_orderdate < date 
'1996-10-01' + interval '3' month`
   BEFORE this PR:  estimated filter row count = **3206**
   AFTER this PR: estimated filter row count = **601**
   ACTUAL row count   = **561**
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve selectivity estimates for range predicates when using histogram
> ---
>
> Key: DRILL-7187
> URL: https://issues.apache.org/jira/browse/DRILL-7187
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.17.0
>
>
> Two types of selectivity estimation improvements need to be made:
> 1. For range predicates on the same column, we need to collect all such
> predicates into one group and do a histogram lookup for them together.
> For instance:
> {noformat}
>  WHERE a > 10 AND b < 20 AND c = 100 AND a <= 50 AND b < 50
> {noformat}
> Currently, the Drill behavior is to treat each of the conjuncts independently
> and multiply the individual selectivities. However, that will not give
> accurate estimates. Here, we want to group the predicates on 'a' together and
> do a single lookup, and similarly for 'b'.
> 2. NULLs are not maintained by the histogram, but when doing selectivity
> calculations, the histogram should use the totalRowCount as the denominator
> rather than the non-null count.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7099) Resource Management in Exchange Operators

2019-04-29 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-7099:
-
Fix Version/s: 1.17.0

> Resource Management in Exchange Operators
> -
>
> Key: DRILL-7099
> URL: https://issues.apache.org/jira/browse/DRILL-7099
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.17.0
>
>
> This Jira will be used to track the changes required for implementing 
> Resource Management in Exchange operators.
> The design can be found here: 
> https://docs.google.com/document/d/1N9OXfCWcp68jsxYVmSt9tPgnZRV_zk8rwwFh0BxXZeE/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7099) Resource Management in Exchange Operators

2019-04-29 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-7099:
-
Issue Type: Improvement  (was: Bug)

> Resource Management in Exchange Operators
> -
>
> Key: DRILL-7099
> URL: https://issues.apache.org/jira/browse/DRILL-7099
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.17.0
>
>
> This Jira will be used to track the changes required for implementing 
> Resource Management in Exchange operators.
> The design can be found here: 
> https://docs.google.com/document/d/1N9OXfCWcp68jsxYVmSt9tPgnZRV_zk8rwwFh0BxXZeE/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)