[jira] [Closed] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-05-23 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-7225.
-

> Merging of columnTypeInfo for file with different schema throws 
> NullPointerException during refresh metadata
> 
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.16.0
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Merging of columnTypeInfo from two files with different schemas throws a 
> NullPointerException. For example, if a directory Orders has two files:
>  * orders.parquet (with columns order_id, order_name, order_date)
>  * orders_with_address.parquet (with columns order_id, order_name, address)
> When refresh table metadata is triggered, metadata such as total_null_count 
> for columns in both files is aggregated and updated in the ColumnTypeInfo. 
> Initially, ColumnTypeInfo is initialized with the first file's ColumnTypeInfo 
> (i.e., order_id, order_name, order_date). While aggregating, the existing 
> ColumnTypeInfo is looked up for the columns in the second file, and since 
> some of them don't exist in the ColumnTypeInfo, an NPE is thrown. This can 
> be fixed by initializing ColumnTypeInfo entries for columns that are not yet 
> present.
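The fix described above can be sketched in a few lines, independent of Drill's actual Java classes. This is an illustrative model only: the dict-of-dicts and the function name below stand in for Drill's ColumnTypeInfo and are not Drill's API.

```python
# Sketch of merging per-file column metadata into a combined map.
# The dict-of-dicts stands in for Drill's ColumnTypeInfo; all names
# here are illustrative, not Drill's actual classes or fields.

def merge_column_type_info(combined, file_columns):
    """Aggregate per-column stats (e.g. total_null_count) from one file.

    Without the setdefault, looking up a column that first appears in a
    later file (like `address` below) would fail -- the analogue of the
    NullPointerException described in DRILL-7225.
    """
    for name, stats in file_columns.items():
        # Initialize an entry for columns not yet present, then aggregate.
        entry = combined.setdefault(name, {"total_null_count": 0})
        entry["total_null_count"] += stats["total_null_count"]
    return combined

# orders.parquet: order_id, order_name, order_date
file1 = {"order_id":   {"total_null_count": 0},
         "order_name": {"total_null_count": 2},
         "order_date": {"total_null_count": 1}}
# orders_with_address.parquet: order_id, order_name, address
file2 = {"order_id":   {"total_null_count": 1},
         "order_name": {"total_null_count": 0},
         "address":    {"total_null_count": 5}}

combined = merge_column_type_info({}, file1)
combined = merge_column_type_info(combined, file2)
```

With the lazy initialization, the merged map covers the union of both schemas, and columns present in only one file simply keep the counts from that file.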



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-05-23 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847139#comment-16847139
 ] 

Robert Hou commented on DRILL-7225:
---

I have not been able to reproduce the original problem.

I created three files in one directory.  Files 1 and 2 have the same column 
names, but one of those columns has a different data type in each file.  
Files 1 and 3 differ in the name of one column.  I could not get an error 
with either the new code or the old code (before the commit).  Either way, 
the new code is working.






[jira] [Comment Edited] (DRILL-7227) TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100

2019-05-01 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830888#comment-16830888
 ] 

Robert Hou edited comment on DRILL-7227 at 5/1/19 6:32 AM:
---

Query 59 is fixed.  Queries 47 and 57 are still failing.  Here is query 47:
{noformat}
WITH v1 
 AS (SELECT i_category, 
i_brand, 
s_store_name, 
s_company_name, 
d_year, 
d_moy, 
Sum(ss_sales_price) sum_sales, 
Avg(Sum(ss_sales_price)) 
  OVER ( 
partition BY i_category, i_brand, s_store_name, 
  s_company_name, 
  d_year) 
avg_monthly_sales, 
Rank() 
  OVER ( 
partition BY i_category, i_brand, s_store_name, 
  s_company_name 
ORDER BY d_year, d_moy) rn 
 FROM   item, 
store_sales, 
date_dim, 
store 
 WHERE  ss_item_sk = i_item_sk 
AND ss_sold_date_sk = d_date_sk 
AND ss_store_sk = s_store_sk 
AND ( d_year = 1999 
   OR ( d_year = 1999 - 1 
AND d_moy = 12 ) 
   OR ( d_year = 1999 + 1 
AND d_moy = 1 ) ) 
 GROUP  BY i_category, 
   i_brand, 
   s_store_name, 
   s_company_name, 
   d_year, 
   d_moy), 
 v2 
 AS (SELECT v1.i_category, 
v1.d_year, 
v1.d_moy, 
v1.avg_monthly_sales, 
v1.sum_sales, 
v1_lag.sum_sales  psum, 
v1_lead.sum_sales nsum 
 FROM   v1, 
v1 v1_lag, 
v1 v1_lead 
 WHERE  v1.i_category = v1_lag.i_category 
AND v1.i_category = v1_lead.i_category 
AND v1.i_brand = v1_lag.i_brand 
AND v1.i_brand = v1_lead.i_brand 
AND v1.s_store_name = v1_lag.s_store_name 
AND v1.s_store_name = v1_lead.s_store_name 
AND v1.s_company_name = v1_lag.s_company_name 
AND v1.s_company_name = v1_lead.s_company_name 
AND v1.rn = v1_lag.rn + 1 
AND v1.rn = v1_lead.rn - 1) 
SELECT * 
FROM   v2 
WHERE  d_year = 1999 
   AND avg_monthly_sales > 0 
   AND CASE 
 WHEN avg_monthly_sales > 0 THEN Abs(sum_sales - avg_monthly_sales) 
 / 
 avg_monthly_sales 
 ELSE NULL 
   END > 0.1 
ORDER  BY sum_sales - avg_monthly_sales, 
  3
LIMIT 100; 
r...@ucs-node1.perf.lab :~/perftests/BENCHMARKS/TPCDS/Queries> cat TPCDS_57.sql 
WITH v1 
 AS (SELECT i_category, 
i_brand, 
cc_name, 
d_year, 
d_moy, 
Sum(cs_sales_price) sum_sales, 
Avg(Sum(cs_sales_price)) 
  OVER ( 
partition BY i_category, i_brand, cc_name, d_year) 
avg_monthly_sales, 
Rank() 
  OVER ( 
partition BY i_category, i_brand, cc_name 
ORDER BY d_year, d_moy) rn 
 FROM   item, 
catalog_sales, 
date_dim, 
call_center 
 WHERE  cs_item_sk = i_item_sk 
AND cs_sold_date_sk = d_date_sk 
AND cc_call_center_sk = cs_call_center_sk 
AND ( d_year = 2000 
   OR ( d_year = 2000 - 1 
AND d_moy = 12 ) 
   OR ( d_year = 2000 + 1 
AND d_moy = 1 ) ) 
 GROUP  BY i_category, 
   i_brand, 
   cc_name, 
   d_year, 
   d_moy), 
 v2 
 AS (SELECT v1.i_brand, 
v1.d_year, 
v1.avg_monthly_sales, 
v1.sum_sales, 
v1_lag.sum_sales  psum, 
v1_lead.sum_sales nsum 
 FROM   v1, 
v1 v1_lag, 
v1 v1_lead 
 WHERE  v1.i_category = v1_lag.i_category 
AND v1.i_category = v1_lead.i_category 
AND v1.i_brand = v1_lag.i_brand 
AND v1.i_brand = v1_lead.i_brand 
AND v1.cc_name = v1_lag.cc_name 
AND v1.cc_name = v1_lead.cc_name 
AND v1.rn = v1_lag.rn + 1 
AND v1.rn = v1_lead.rn - 

[jira] [Commented] (DRILL-7227) TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100

2019-05-01 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830888#comment-16830888
 ] 

Robert Hou commented on DRILL-7227:
---

Query 59 is fixed.  Queries 47 and 57 are still failing.  Here is the stack 
trace for query 47:
{noformat}
2019-04-30 22:42:31,964 [2336ce39-d449-3a2b-2ab4-c890128bde02:foreman] INFO  
o.a.d.e.store.mock.MockStorageEngine - Took 21 ms to read statistics from 
parquet format plugin
2019-04-30 22:42:32,041 [2336ce39-d449-3a2b-2ab4-c890128bde02:foreman] INFO  
o.a.d.e.store.mock.MockStorageEngine - Took 12 ms to read statistics from 
parquet format plugin
2019-04-30 22:42:32,154 [2336ce39-d449-3a2b-2ab4-c890128bde02:foreman] INFO  
o.a.d.e.store.mock.MockStorageEngine - Took 9 ms to read statistics from 
parquet format plugin
2019-04-30 22:42:32,183 [2336ce39-d449-3a2b-2ab4-c890128bde02:foreman] INFO  
o.a.d.e.store.mock.MockStorageEngine - Took 9 ms to read statistics from 
parquet format plugin
2019-04-30 22:42:33,705 [2336ce39-d449-3a2b-2ab4-c890128bde02:foreman] ERROR 
o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: NullPointerException


Please, refer to logs for more information.

[Error Id: 6d98ee46-91d5-4ad9-84a2-2a0903e8d977 on ucs-node3.perf.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
NullPointerException


Please, refer to logs for more information.

[Error Id: 6d98ee46-91d5-4ad9-84a2-2a0903e8d977 on ucs-node3.perf.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630)
 ~[drill-common-1.16.0.0-mapr.jar:1.16.0.0-mapr]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:789)
 [drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.checkCommonStates(QueryStateProcessor.java:325)
 [drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.planning(QueryStateProcessor.java:221)
 [drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:83)
 [drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:304) 
[drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_112]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
exception during fragment initialization: Error while applying rule 
DrillPushProjectIntoScanRule:enumerable, args 
[rel#12063:LogicalProject.NONE.ANY([]).[](input=rel#12062:Subset#2.ENUMERABLE.ANY([]).[],ss_sold_date_sk=$1,ss_sold_time_sk=$2,ss_item_sk=$3,ss_customer_sk=$4,ss_cdemo_sk=$5,ss_hdemo_sk=$6,ss_addr_sk=$7,ss_store_sk=$8,ss_promo_sk=$9,ss_ticket_number=$10,ss_quantity=$11,ss_wholesale_cost=$12,ss_list_price=$13,ss_sales_price=$14,ss_ext_discount_amt=$15,ss_ext_sales_price=$16,ss_ext_wholesale_cost=$17,ss_ext_list_price=$18,ss_ext_tax=$19,ss_coupon_amt=$20,ss_net_paid=$21,ss_net_paid_inc_tax=$22,ss_net_profit=$23),
 rel#11499:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, 
/tpcdsParquet10/SF100/store_sales])]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:305) 
[drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
... 3 common frames omitted
Caused by: java.lang.RuntimeException: Error while applying rule 
DrillPushProjectIntoScanRule:enumerable, args 
[rel#12063:LogicalProject.NONE.ANY([]).[](input=rel#12062:Subset#2.ENUMERABLE.ANY([]).[],ss_sold_date_sk=$1,ss_sold_time_sk=$2,ss_item_sk=$3,ss_customer_sk=$4,ss_cdemo_sk=$5,ss_hdemo_sk=$6,ss_addr_sk=$7,ss_store_sk=$8,ss_promo_sk=$9,ss_ticket_number=$10,ss_quantity=$11,ss_wholesale_cost=$12,ss_list_price=$13,ss_sales_price=$14,ss_ext_discount_amt=$15,ss_ext_sales_price=$16,ss_ext_wholesale_cost=$17,ss_ext_list_price=$18,ss_ext_tax=$19,ss_coupon_amt=$20,ss_net_paid=$21,ss_net_paid_inc_tax=$22,ss_net_profit=$23),
 rel#11499:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, 
/tpcdsParquet10/SF100/store_sales])]
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:236)
 ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:643)
 ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:339) 
~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:430)
 

[jira] [Created] (DRILL-7227) TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100

2019-04-30 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7227:
-

 Summary: TPCDS queries 47, 57, 59 fail to run with Statistics 
enabled at sf100
 Key: DRILL-7227
 URL: https://issues.apache.org/jira/browse/DRILL-7227
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.17.0
 Attachments: 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.sys.drill, 
2338ae93-155b-356d-382e-0da949c6f439.sys.drill

Here is query 78:
{noformat}
WITH ws 
 AS (SELECT d_year AS ws_sold_year, 
ws_item_sk, 
ws_bill_customer_sk ws_customer_sk, 
Sum(ws_quantity)   ws_qty, 
Sum(ws_wholesale_cost) ws_wc, 
Sum(ws_sales_price)ws_sp 
 FROM   web_sales 
LEFT JOIN web_returns 
   ON wr_order_number = ws_order_number 
  AND ws_item_sk = wr_item_sk 
JOIN date_dim 
  ON ws_sold_date_sk = d_date_sk 
 WHERE  wr_order_number IS NULL 
 GROUP  BY d_year, 
   ws_item_sk, 
   ws_bill_customer_sk), 
 cs 
 AS (SELECT d_year AS cs_sold_year, 
cs_item_sk, 
cs_bill_customer_sk cs_customer_sk, 
Sum(cs_quantity)   cs_qty, 
Sum(cs_wholesale_cost) cs_wc, 
Sum(cs_sales_price)cs_sp 
 FROM   catalog_sales 
LEFT JOIN catalog_returns 
   ON cr_order_number = cs_order_number 
  AND cs_item_sk = cr_item_sk 
JOIN date_dim 
  ON cs_sold_date_sk = d_date_sk 
 WHERE  cr_order_number IS NULL 
 GROUP  BY d_year, 
   cs_item_sk, 
   cs_bill_customer_sk), 
 ss 
 AS (SELECT d_year AS ss_sold_year, 
ss_item_sk, 
ss_customer_sk, 
Sum(ss_quantity)   ss_qty, 
Sum(ss_wholesale_cost) ss_wc, 
Sum(ss_sales_price)ss_sp 
 FROM   store_sales 
LEFT JOIN store_returns 
   ON sr_ticket_number = ss_ticket_number 
  AND ss_item_sk = sr_item_sk 
JOIN date_dim 
  ON ss_sold_date_sk = d_date_sk 
 WHERE  sr_ticket_number IS NULL 
 GROUP  BY d_year, 
   ss_item_sk, 
   ss_customer_sk) 
SELECT ss_item_sk, 
   Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2) ratio, 
   ss_qty  store_qty, 
   ss_wc 
   store_wholesale_cost, 
   ss_sp 
   store_sales_price, 
   COALESCE(ws_qty, 0) + COALESCE(cs_qty, 0) 
   other_chan_qty, 
   COALESCE(ws_wc, 0) + COALESCE(cs_wc, 0) 
   other_chan_wholesale_cost, 
   COALESCE(ws_sp, 0) + COALESCE(cs_sp, 0) 
   other_chan_sales_price 
FROM   ss 
   LEFT JOIN ws 
  ON ( ws_sold_year = ss_sold_year 
   AND ws_item_sk = ss_item_sk 
   AND ws_customer_sk = ss_customer_sk ) 
   LEFT JOIN cs 
  ON ( cs_sold_year = ss_sold_year 
   AND cs_item_sk = cs_item_sk 
   AND cs_customer_sk = ss_customer_sk ) 
WHERE  COALESCE(ws_qty, 0) > 0 
   AND COALESCE(cs_qty, 0) > 0 
   AND ss_sold_year = 1999 
ORDER  BY ss_item_sk, 
  ss_qty DESC, 
  ss_wc DESC, 
  ss_sp DESC, 
  other_chan_qty, 
  other_chan_wholesale_cost, 
  other_chan_sales_price, 
  Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2)
LIMIT 100; 
{noformat}

The profile for the new plan is 2338ae93-155b-356d-382e-0da949c6f439.  Hash 
partition sender operator (10-00) takes 10-15 minutes.  I am not sure why it 
takes so long.  It has 10 minor fragments sending to receiver (06-05), which 
has 62 minor fragments.  But hash partition sender (16-00) has 10 minor 
fragments sending to receiver (12-06), which has 220 minor fragments, and there 
is no performance issue.
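As a rough sanity check on the fan-out comparison above (pure arithmetic on the fragment counts reported in the profiles, not a model of Drill's runtime), each sending minor fragment partitions its rows to every receiving minor fragment, so the stream count is senders times receivers. The slow exchange actually sets up fewer streams than the fast one, which is why the fan-out alone does not explain the slowdown:

```python
# Fan-out arithmetic for the two exchanges described above.
# Each sending minor fragment opens one partition stream per
# receiving minor fragment.

def stream_count(senders, receivers):
    return senders * receivers

slow = stream_count(10, 62)    # sender 10-00 -> receiver 06-05 (10-15 min)
fast = stream_count(10, 220)   # sender 16-00 -> receiver 12-06 (no issue)
```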

The profile for the old plan is 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.  Both 
plans use the same commit.  The old plan is created by disabling statistics.

I have not included the plans in this Jira because Jira comments have a 32K 
size limit.





[jira] [Created] (DRILL-7183) TPCDS query 10, 35, 69 take longer with sf 1000 when Statistics are disabled

2019-04-17 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7183:
-

 Summary: TPCDS query 10, 35, 69 take longer with sf 1000 when 
Statistics are disabled
 Key: DRILL-7183
 URL: https://issues.apache.org/jira/browse/DRILL-7183
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Hanumath Rao Maduri
 Fix For: 1.16.0


Query 69 runs 150% slower when Statistics is disabled.  Here is the query:
{noformat}
SELECT
  cd_gender,
  cd_marital_status,
  cd_education_status,
  count(*) cnt1,
  cd_purchase_estimate,
  count(*) cnt2,
  cd_credit_rating,
  count(*) cnt3
FROM
  customer c, customer_address ca, customer_demographics
WHERE
  c.c_current_addr_sk = ca.ca_address_sk AND
ca_state IN ('KY', 'GA', 'NM') AND
cd_demo_sk = c.c_current_cdemo_sk AND
exists(SELECT *
   FROM store_sales, date_dim
   WHERE c.c_customer_sk = ss_customer_sk AND
 ss_sold_date_sk = d_date_sk AND
 d_year = 2001 AND
 d_moy BETWEEN 4 AND 4 + 2) AND
(NOT exists(SELECT *
FROM web_sales, date_dim
WHERE c.c_customer_sk = ws_bill_customer_sk AND
  ws_sold_date_sk = d_date_sk AND
  d_year = 2001 AND
  d_moy BETWEEN 4 AND 4 + 2) AND
  NOT exists(SELECT *
 FROM catalog_sales, date_dim
 WHERE c.c_customer_sk = cs_ship_customer_sk AND
   cs_sold_date_sk = d_date_sk AND
   d_year = 2001 AND
   d_moy BETWEEN 4 AND 4 + 2))
GROUP BY cd_gender, cd_marital_status, cd_education_status,
  cd_purchase_estimate, cd_credit_rating
ORDER BY cd_gender, cd_marital_status, cd_education_status,
  cd_purchase_estimate, cd_credit_rating
LIMIT 100;
{noformat}

This regression is caused by commit 982e98061e029a39f1c593f695c0d93ec7079f0d.  
This commit should be reverted for now.





[jira] [Comment Edited] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-05 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810456#comment-16810456
 ] 

Robert Hou edited comment on DRILL-7154 at 4/6/19 1:01 AM:
---

Sorabh gave me a private branch where he reverted the RM commit on Apache 
master.  With this private branch, the memory used in the profile was restored 
to the original amount.


was (Author: rhou):
Sorabh gave me a private branch where he reverted the RM commit on Apache 
master.  With this private branch, the memory allocation was restored to the 
original amount.

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Hanumath Rao Maduri
>Priority: Blocker
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log, 
> hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log, 
> hashagg.stats.disabled.foreman.log
>
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
> operator in the new plan is taking longer.  One possible reason is that the 
> Hash Agg operator in the new plan is not using as many buckets as the old 
> plan did.  The Hash Agg operator in the new plan also uses less memory than 
> in the old plan.
> Here is the old plan:
> {noformat}
> 00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
> order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
> rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 
> network, 2.2631985057468002E10 memory}, id = 5645
> 00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
> 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 
> memory}, id = 5644
> 00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
> 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType 
> = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 
> memory}, id = 5639
> 02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
> cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5638
> 03-01HashAgg(group=[{0}], order_count=[COUNT()]) : 
> rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 
> 3.75E7, cumulative cost = 

[jira] [Commented] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-04 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810456#comment-16810456
 ] 

Robert Hou commented on DRILL-7154:
---

Sorabh gave me a private branch where he reverted the RM commit on Apache 
master.  With this private branch, the memory allocation was restored to the 
original amount.

[jira] [Assigned] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-04 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou reassigned DRILL-7154:
-

Assignee: Hanumath Rao Maduri  (was: Boaz Ben-Zvi)


[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-04 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7154:
--
Priority: Blocker  (was: Critical)

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Blocker
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log, 
> hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log, 
> hashagg.stats.disabled.foreman.log
>
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
> operator in the new plan takes longer.  One possible reason is that the 
> Hash Agg operator in the new plan does not use as many buckets as the old 
> plan did.  The Hash Agg operator in the new plan also uses less memory than 
> it does in the old plan.
> Here is the old plan:
> {noformat}
> 00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
> order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
> rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 
> network, 2.2631985057468002E10 memory}, id = 5645
> 00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
> 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 
> memory}, id = 5644
> 00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
> 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType 
> = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 
> memory}, id = 5639
> 02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
> cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5638
> 03-01HashAgg(group=[{0}], order_count=[COUNT()]) : 
> rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 
> 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 
> cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5637
> 03-02  Project(o_orderpriority=[$1]) : rowType = 
> RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = 
> {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 
> 3.25631968386048E12 network, 1.5311985057468002E10 memory}, id = 5636
> 
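The bucket hypothesis in the description above can be illustrated with a toy sketch. This is not Drill code; the class, key set, and bucket counts are all made up for illustration. With a fixed number of group keys, fewer hash-table buckets mean longer collision chains, so each aggregation probe does more comparisons — one way an operator can get slower while using less memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HashAggBucketSketch {
    // Toy model of a hash aggregation table: count how many group keys land
    // in each bucket and return the longest collision chain.  Longer chains
    // mean more CPU per probe.
    static int longestChain(List<String> keys, int buckets) {
        int[] chainLengths = new int[buckets];
        for (String key : keys) {
            chainLengths[Math.floorMod(key.hashCode(), buckets)]++;
        }
        return Arrays.stream(chainLengths).max().orElse(0);
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            keys.add("order-" + (i % 5_000));  // 5,000 distinct group keys
        }
        // Fewer buckets -> longer chains -> more comparisons per lookup,
        // but a smaller table and therefore less memory.
        System.out.println("64 buckets, longest chain: "
            + longestChain(keys, 64));
        System.out.println("8192 buckets, longest chain: "
            + longestChain(keys, 8_192));
    }
}
```

Under this toy model, the 64-bucket table's longest chain is roughly two orders of magnitude longer than the 8192-bucket table's, which is the kind of effect the description speculates about.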

[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-04 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7154:
--
Attachment: hashagg.nostats.data.log
hashagg.stats.disabled.data.log

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log, 
> hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log, 
> hashagg.stats.disabled.foreman.log
>
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
> operator in the new plan takes longer.  One possible reason is that the 
> Hash Agg operator in the new plan does not use as many buckets as the old 
> plan did.  The Hash Agg operator in the new plan also uses less memory than 
> it does in the old plan.
> Here is the old plan:
> {noformat}
> 00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
> order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
> rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 
> network, 2.2631985057468002E10 memory}, id = 5645
> 00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
> 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 
> memory}, id = 5644
> 00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
> 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType 
> = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 
> memory}, id = 5639
> 02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
> cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5638
> 03-01HashAgg(group=[{0}], order_count=[COUNT()]) : 
> rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 
> 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 
> cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5637
> 03-02  Project(o_orderpriority=[$1]) : rowType = 
> RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = 
> {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 
> 3.25631968386048E12 

[jira] [Commented] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-04 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810306#comment-16810306
 ] 

Robert Hou commented on DRILL-7154:
---

Attached logs from the foreman, since the planner most likely determined the 
memory budget for these queries there.

Renamed the previous logs to use a data.log suffix.

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log, 
> hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log, 
> hashagg.stats.disabled.foreman.log
>
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
> operator in the new plan takes longer.  One possible reason is that the 
> Hash Agg operator in the new plan does not use as many buckets as the old 
> plan did.  The Hash Agg operator in the new plan also uses less memory than 
> it does in the old plan.
> Here is the old plan:
> {noformat}
> 00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
> order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
> rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 
> network, 2.2631985057468002E10 memory}, id = 5645
> 00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
> 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 
> memory}, id = 5644
> 00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
> 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType 
> = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 
> memory}, id = 5639
> 02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
> cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5638
> 03-01HashAgg(group=[{0}], order_count=[COUNT()]) : 
> rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 
> 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 
> cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5637
> 03-02  Project(o_orderpriority=[$1]) : rowType = 
> RecordType(ANY o_orderpriority): rowcount = 3.75E8, 

[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-04 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7154:
--
Attachment: (was: hashagg.nostats.log)

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log, 
> hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log, 
> hashagg.stats.disabled.foreman.log
>
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
> operator in the new plan takes longer.  One possible reason is that the 
> Hash Agg operator in the new plan does not use as many buckets as the old 
> plan did.  The Hash Agg operator in the new plan also uses less memory than 
> it does in the old plan.
> Here is the old plan:
> {noformat}
> 00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
> order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
> rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 
> network, 2.2631985057468002E10 memory}, id = 5645
> 00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
> 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 
> memory}, id = 5644
> 00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
> 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType 
> = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 
> memory}, id = 5639
> 02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
> cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5638
> 03-01HashAgg(group=[{0}], order_count=[COUNT()]) : 
> rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 
> 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 
> cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5637
> 03-02  Project(o_orderpriority=[$1]) : rowType = 
> RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = 
> {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 
> 3.25631968386048E12 network, 1.5311985057468002E10 memory}, id 

[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-04 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7154:
--
Attachment: (was: hashagg.stats.disabled.log)

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log, 
> hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log, 
> hashagg.stats.disabled.foreman.log
>
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
> operator in the new plan takes longer.  One possible reason is that the 
> Hash Agg operator in the new plan does not use as many buckets as the old 
> plan did.  The Hash Agg operator in the new plan also uses less memory than 
> it does in the old plan.
> Here is the old plan:
> {noformat}
> 00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
> order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
> rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 
> network, 2.2631985057468002E10 memory}, id = 5645
> 00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
> 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 
> memory}, id = 5644
> 00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
> 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType 
> = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 
> memory}, id = 5639
> 02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
> cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5638
> 03-01HashAgg(group=[{0}], order_count=[COUNT()]) : 
> rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 
> 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 
> cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5637
> 03-02  Project(o_orderpriority=[$1]) : rowType = 
> RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = 
> {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 
> 3.25631968386048E12 network, 1.5311985057468002E10 

[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-04 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7154:
--
Attachment: hashagg.nostats.foreman.log
hashagg.stats.disabled.foreman.log

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.foreman.log, 
> hashagg.nostats.log, hashagg.stats.disabled.foreman.log, 
> hashagg.stats.disabled.log
>
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
> operator in the new plan takes longer.  One possible reason is that the 
> Hash Agg operator in the new plan does not use as many buckets as the old 
> plan did.  The Hash Agg operator in the new plan also uses less memory than 
> it does in the old plan.
> Here is the old plan:
> {noformat}
> 00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
> order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
> rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 
> network, 2.2631985057468002E10 memory}, id = 5645
> 00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
> 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 
> memory}, id = 5644
> 00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
> 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType 
> = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 
> memory}, id = 5639
> 02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
> cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5638
> 03-01HashAgg(group=[{0}], order_count=[COUNT()]) : 
> rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 
> 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 
> cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5637
> 03-02  Project(o_orderpriority=[$1]) : rowType = 
> RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = 
> {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 
> 3.25631968386048E12 

[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-04 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7154:
--
Summary: TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics 
are disabled  (was: TPCH query 4 and 17 take longer with sf 1000 when 
Statistics are disabled)

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.log, 
> hashagg.stats.disabled.log
>
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
> operator in the new plan takes longer.  One possible reason is that the 
> Hash Agg operator in the new plan does not use as many buckets as the old 
> plan did.  The Hash Agg operator in the new plan also uses less memory than 
> it does in the old plan.
> Here is the old plan:
> {noformat}
> 00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
> order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
> rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 
> network, 2.2631985057468002E10 memory}, id = 5645
> 00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
> 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 
> memory}, id = 5644
> 00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
> 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType 
> = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 
> memory}, id = 5639
> 02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
> cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5638
> 03-01HashAgg(group=[{0}], order_count=[COUNT()]) : 
> rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 
> 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 
> cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5637
> 03-02  Project(o_orderpriority=[$1]) : rowType = 
> RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = 
> {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 
> 

[jira] [Created] (DRILL-7155) Create a standard logging message for batch sizes generated by individual operators

2019-04-04 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7155:
-

 Summary: Create a standard logging message for batch sizes 
generated by individual operators
 Key: DRILL-7155
 URL: https://issues.apache.org/jira/browse/DRILL-7155
 Project: Apache Drill
  Issue Type: Task
  Components: Execution - Relational Operators
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Robert Hou


QA reads log messages in drillbit.log to verify the sizes of data batches 
generated by individual operators.  These log messages need to be standardized 
so that each operator creates the same message.  This allows the QA test 
framework to verify the information in each message.
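As a sketch of what such standardization could enable: neither the message format nor the field names below are defined by Drill here -- this is a hypothetical example of a single per-operator batch-size line and how a QA harness could parse it.

```python
# Illustrative only: "BATCH_STATS" and every field name below are
# hypothetical, not an existing Drill log format.
import re

# A hypothetical standardized message an operator might emit:
LINE = ("BATCH_STATS operator=HASH_AGGREGATE batches=813 "
        "avg_batch_bytes=582653 records=26316456")

PATTERN = re.compile(
    r"BATCH_STATS operator=(?P<op>\S+) batches=(?P<batches>\d+) "
    r"avg_batch_bytes=(?P<avg_bytes>\d+) records=(?P<records>\d+)"
)

m = PATTERN.match(LINE)
# Convert numeric fields to int, leave the operator name as a string.
stats = {k: int(v) if v.isdigit() else v for k, v in m.groupdict().items()}
print(stats["op"], stats["avg_bytes"])  # HASH_AGGREGATE 582653
```

Because every operator would emit the same shape of line, one parser like this could verify batch sizes for all of them.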



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7154) TPCH query 4 and 17 take longer with sf 1000 when Statistics are disabled

2019-04-04 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7154:
-

 Summary: TPCH query 4 and 17 take longer with sf 1000 when 
Statistics are disabled
 Key: DRILL-7154
 URL: https://issues.apache.org/jira/browse/DRILL-7154
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Boaz Ben-Zvi
 Fix For: 1.16.0
 Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.log, 
hashagg.stats.disabled.log

Here is TPCH 04 with sf 1000:
{noformat}
select
  o.o_orderpriority,
  count(*) as order_count
from
  orders o

where
  o.o_orderdate >= date '1996-10-01'
  and o.o_orderdate < date '1996-10-01' + interval '3' month
  and 
  exists (
select
  *
from
  lineitem l
where
  l.l_orderkey = o.o_orderkey
  and l.l_commitdate < l.l_receiptdate
  )
group by
  o.o_orderpriority
order by
  o.o_orderpriority;
{noformat}

TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
operator in the new plan takes longer.  One possible reason is that it is not 
using as many buckets as the old plan did; the Hash Agg operator in the new 
plan also uses less memory than the old one.

Here is the old plan:
{noformat}
00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 network, 
2.2631985057468002E10 memory}, id = 5645
00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 memory}, 
id = 5644
00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
{1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
{1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
02-01SelectionVectorRemover : rowType = RecordType(ANY 
o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
{1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
{1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType = 
RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 memory}, 
id = 5639
02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 memory}, 
id = 5638
03-01HashAgg(group=[{0}], order_count=[COUNT()]) : rowType 
= RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 cpu, 
2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 memory}, 
id = 5637
03-02  Project(o_orderpriority=[$1]) : rowType = 
RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = 
{1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 
3.25631968386048E12 network, 1.5311985057468002E10 memory}, id = 5636
03-03Project(o_orderkey=[$1], o_orderpriority=[$2], 
l_orderkey=[$0]) : rowType = RecordType(ANY o_orderkey, ANY o_orderpriority, 
ANY l_orderkey): rowcount = 3.75E8, cumulative cost = {1.8319476940441746E10 
rows, 8.108390595055101E10 cpu, 2.2499969127E10 io, 3.25631968386048E12 
network, 1.5311985057468002E10 memory}, id = 5635
03-04  HashJoin(condition=[=($1, $0)], 
joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY l_orderkey, 
ANY 

[jira] [Closed] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types

2019-04-03 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-7132.
-

> Metadata cache does not have correct min/max values for varchar and interval 
> data types
> ---
>
> Key: DRILL-7132
> URL: https://issues.apache.org/jira/browse/DRILL-7132
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: 0_0_10.parquet
>
>
> The parquet metadata cache does not have correct min/max values for varchar 
> and interval data types.
> I have attached a parquet file.  Here is what parquet tools shows for varchar:
> [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 
> average: 67 total: 67 (raw data: 65 saving -3%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 65 max: 65 average: 65 total: 65
>   column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "varchar_col" ],
> "minValue" : "aW9lZ2pOSkt2bmtk",
> "maxValue" : "aW9lZ2pOSkt2bmtk",
> "nulls" : 0
> Here is what parquet tools shows for interval:
> [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 
> average: 52 total: 52 (raw data: 50 saving -4%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 50 max: 50 average: 50 total: 50
>   column values statistics: min: P18582D, max: P18582D, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "interval_col" ],
> "minValue" : "UDE4NTgyRA==",
> "maxValue" : "UDE4NTgyRA==",
> "nulls" : 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7139) Date_add() can produce incorrect results when adding to a timestamp

2019-03-28 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7139:
--
Description: 
I am using date_add() to create a sequence of timestamps:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') 
as interval minute)) timestamp_id from (values(1));
+--+
|   timestamp_id   |
+--+
| 1970-01-25 20:31:12.704  |
+--+
1 row selected (0.121 seconds)
{noformat}

When I add one more, I get an older timestamp:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107375,'M') 
as interval minute)) timestamp_id from (values(1));
+--+
|   timestamp_id   |
+--+
| 1969-12-07 03:29:25.408  |
+--+
1 row selected (0.126 seconds)
{noformat}

  was:
I am using date_add() to create a sequence of timestamps:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') 
as interval minute)) timestamp_id from (values(1));
+--+
|   timestamp_id   |
+--+
| 1970-01-25 20:31:12.704  |
+--+
1 row selected (0.121 seconds)
{noformat}

When I add one more, I get an older timestamp:
{noformat}
0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
00:00:00', cast(concat('PT',107375,'M') as interval minute)) timestamp_id from 
(values(1));
+--+
|   timestamp_id   |
+--+
| 1969-12-07 03:29:25.408  |
+--+
1 row selected (0.126 seconds)
{noformat}


> Date_add() can produce incorrect results when adding to a timestamp
> ---
>
> Key: DRILL-7139
> URL: https://issues.apache.org/jira/browse/DRILL-7139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Pritesh Maker
>Priority: Major
>
> I am using date_add() to create a sequence of timestamps:
> {noformat}
> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') 
> as interval minute)) timestamp_id from (values(1));
> +--+
> |   timestamp_id   |
> +--+
> | 1970-01-25 20:31:12.704  |
> +--+
> 1 row selected (0.121 seconds)
> {noformat}
> When I add one more, I get an older timestamp:
> {noformat}
> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107375,'M') 
> as interval minute)) timestamp_id from (values(1));
> +--+
> |   timestamp_id   |
> +--+
> | 1969-12-07 03:29:25.408  |
> +--+
> 1 row selected (0.126 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7139) Date_add() can produce incorrect results when adding to a timestamp

2019-03-28 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7139:
--
Description: 
I am using date_add() to create a sequence of timestamps:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') 
as interval minute)) timestamp_id from (values(1));
+--+
|   timestamp_id   |
+--+
| 1970-01-25 20:31:12.704  |
+--+
1 row selected (0.121 seconds)
{noformat}

When I add one more, I get an older timestamp:
{noformat}
0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
00:00:00', cast(concat('PT',107375,'M') as interval minute)) timestamp_id from 
(values(1));
+--+
|   timestamp_id   |
+--+
| 1969-12-07 03:29:25.408  |
+--+
1 row selected (0.126 seconds)
{noformat}

  was:
I am using date_add() to create a sequence of timestamps:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', 
cast(concat('PT',{color:#f79232}107374{color},'M') as interval minute)) 
timestamp_id from (values(1));
+--+
|   timestamp_id   |
+--+
| 1970-01-25 20:31:12.704  |
+--+
1 row selected (0.121 seconds)
{noformat}

When I add one more, I get an older timestamp:
{noformat}
0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
00:00:00', cast(concat('PT',{color:#f79232}107375{color},'M') as interval 
minute)) timestamp_id from (values(1));
+--+
|   timestamp_id   |
+--+
| 1969-12-07 03:29:25.408  |
+--+
1 row selected (0.126 seconds)
{noformat}


> Date_add() can produce incorrect results when adding to a timestamp
> ---
>
> Key: DRILL-7139
> URL: https://issues.apache.org/jira/browse/DRILL-7139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Pritesh Maker
>Priority: Major
>
> I am using date_add() to create a sequence of timestamps:
> {noformat}
> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') 
> as interval minute)) timestamp_id from (values(1));
> +--+
> |   timestamp_id   |
> +--+
> | 1970-01-25 20:31:12.704  |
> +--+
> 1 row selected (0.121 seconds)
> {noformat}
> When I add one more, I get an older timestamp:
> {noformat}
> 0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
> 00:00:00', cast(concat('PT',107375,'M') as interval minute)) timestamp_id 
> from (values(1));
> +--+
> |   timestamp_id   |
> +--+
> | 1969-12-07 03:29:25.408  |
> +--+
> 1 row selected (0.126 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7139) Date_add() can produce incorrect results when adding to a timestamp

2019-03-28 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7139:
--
Summary: Date_add() can produce incorrect results when adding to a 
timestamp  (was: Date)add produces Incorrect results when adding to a timestamp)

> Date_add() can produce incorrect results when adding to a timestamp
> ---
>
> Key: DRILL-7139
> URL: https://issues.apache.org/jira/browse/DRILL-7139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Pritesh Maker
>Priority: Major
>
> I am using date_add() to create a sequence of timestamps:
> {noformat}
> select date_add(timestamp '1970-01-01 00:00:00', 
> cast(concat('PT',{color:#f79232}107374{color},'M') as interval minute)) 
> timestamp_id from (values(1));
> +--+
> |   timestamp_id   |
> +--+
> | 1970-01-25 20:31:12.704  |
> +--+
> 1 row selected (0.121 seconds)
> {noformat}
> When I add one more, I get an older timestamp:
> {noformat}
> 0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
> 00:00:00', cast(concat('PT',{color:#f79232}107375{color},'M') as interval 
> minute)) timestamp_id from (values(1));
> +--+
> |   timestamp_id   |
> +--+
> | 1969-12-07 03:29:25.408  |
> +--+
> 1 row selected (0.126 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7139) Date)add produces Incorrect results when adding to a timestamp

2019-03-28 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7139:
-

 Summary: Date)add produces Incorrect results when adding to a 
timestamp
 Key: DRILL-7139
 URL: https://issues.apache.org/jira/browse/DRILL-7139
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.15.0
Reporter: Robert Hou
Assignee: Pritesh Maker


I am using date_add() to create a sequence of timestamps:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', 
cast(concat('PT',{color:#f79232}107374{color},'M') as interval minute)) 
timestamp_id from (values(1));
+--+
|   timestamp_id   |
+--+
| 1970-01-25 20:31:12.704  |
+--+
1 row selected (0.121 seconds)
{noformat}

When I add one more, I get an older timestamp:
{noformat}
0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
00:00:00', cast(concat('PT',{color:#f79232}107375{color},'M') as interval 
minute)) timestamp_id from (values(1));
+--+
|   timestamp_id   |
+--+
| 1969-12-07 03:29:25.408  |
+--+
1 row selected (0.126 seconds)
{noformat}
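The jump backward between 107374 and 107375 minutes is consistent with the interval's total milliseconds overflowing a signed 32-bit integer (107375 min * 60000 ms/min = 6,442,500,000 ms, well past 2^31 - 1). That cause is an assumption, not a confirmed diagnosis, but a model of it reproduces both reported timestamps exactly:

```python
# Hedged sketch: model date_add() with the interval's total milliseconds
# wrapped into a signed 32-bit range, as an int overflow would do.
# This models a hypothesized bug, not Drill's actual implementation.
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def date_add_minutes_int32(minutes):
    """Add `minutes` to the epoch, truncating total ms to signed 32 bits."""
    millis = minutes * 60_000
    # Wrap into the signed 32-bit range [-2**31, 2**31).
    millis = (millis + 2**31) % 2**32 - 2**31
    return EPOCH + timedelta(milliseconds=millis)

print(date_add_minutes_int32(107374))  # 1970-01-25 20:31:12.704000
print(date_add_minutes_int32(107375))  # 1969-12-07 03:29:25.408000
```

Both outputs match the query results above, including the pre-epoch timestamp for 107375 minutes, where the wrapped value goes negative.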



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate

2019-03-27 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7136:
--
Description: 
I ran TPCH query 17 with sf 1000.  Here is the query:
{noformat}
select
  sum(l.l_extendedprice) / 7.0 as avg_yearly
from
  lineitem l,
  part p
where
  p.p_partkey = l.l_partkey
  and p.p_brand = 'Brand#13'
  and p.p_container = 'JUMBO CAN'
  and l.l_quantity < (
select
  0.2 * avg(l2.l_quantity)
from
  lineitem l2
where
  l2.l_partkey = p.p_partkey
  );
{noformat}

One of the hash agg operators has resized 6 times.  It should have 4M buckets.  
But the profile shows it has 64K buckets.



I have attached a sample profile.  In this profile, the hash agg operator is 
(04-02).
{noformat}
Operator Metrics
Minor Fragment  NUM_BUCKETS NUM_ENTRIES NUM_RESIZING
RESIZING_TIME_MSNUM_PARTITIONS  SPILLED_PARTITIONS  SPILL_MB
SPILL_CYCLE INPUT_BATCH_COUNT   AVG_INPUT_BATCH_BYTES   
AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT  OUTPUT_BATCH_COUNT  
AVG_OUTPUT_BATCH_BYTES  AVG_OUTPUT_ROW_BYTESOUTPUT_RECORD_COUNT
04-00-0265,536 748,746  6   364 1   
582 0   813 582,653 18  26,316,456  401 1,631,943   
25  26,176,350
{noformat}



  was:
I ran TPCH query 17 with sf 1000.  Here is the query:
{noformat}
select
  sum(l.l_extendedprice) / 7.0 as avg_yearly
from
  lineitem l,
  part p
where
  p.p_partkey = l.l_partkey
  and p.p_brand = 'Brand#13'
  and p.p_container = 'JUMBO CAN'
  and l.l_quantity < (
select
  0.2 * avg(l2.l_quantity)
from
  lineitem l2
where
  l2.l_partkey = p.p_partkey
  );
{noformat}

One of the hash agg operators has resized 6 times.  It should have 4M buckets.  
But the profile shows it has 64K buckets.



I have attached a sample profile.  In this profile, the hash agg operator is 
(04-02).
{noformat}
Operator Metrics
Minor Fragment  NUM_BUCKETS NUM_ENTRIES NUM_RESIZING
RESIZING_TIME_MSNUM_PARTITIONS  SPILLED_PARTITIONS  SPILL_MB
SPILL_CYCLE INPUT_BATCH_COUNT   AVG_INPUT_BATCH_BYTES   
AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT  OUTPUT_BATCH_COUNT  
AVG_OUTPUT_BATCH_BYTES  AVG_OUTPUT_ROW_BYTESOUTPUT_RECORD_COUNT
04-00-0265,536  748,746 6   364 1   582 0   
813 582,653 18  26,316,456  401 1,631,943   25  
26,176,350
{noformat}




> Num_buckets for HashAgg in profile may be inaccurate
> 
>
> Key: DRILL-7136
> URL: https://issues.apache.org/jira/browse/DRILL-7136
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Pritesh Maker
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill
>
>
> I ran TPCH query 17 with sf 1000.  Here is the query:
> {noformat}
> select
>   sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>   lineitem l,
>   part p
> where
>   p.p_partkey = l.l_partkey
>   and p.p_brand = 'Brand#13'
>   and p.p_container = 'JUMBO CAN'
>   and l.l_quantity < (
> select
>   0.2 * avg(l2.l_quantity)
> from
>   lineitem l2
> where
>   l2.l_partkey = p.p_partkey
>   );
> {noformat}
> One of the hash agg operators has resized 6 times.  It should have 4M 
> buckets.  But the profile shows it has 64K buckets.
> I have attached a sample profile.  In this profile, the hash agg operator is 
> (04-02).
> {noformat}
> Operator Metrics
> Minor FragmentNUM_BUCKETS NUM_ENTRIES NUM_RESIZING
> RESIZING_TIME_MSNUM_PARTITIONS  SPILLED_PARTITIONS  SPILL_MB  
>   SPILL_CYCLE INPUT_BATCH_COUNT   AVG_INPUT_BATCH_BYTES   
> AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT  OUTPUT_BATCH_COUNT  
> AVG_OUTPUT_BATCH_BYTES  AVG_OUTPUT_ROW_BYTESOUTPUT_RECORD_COUNT
> 04-00-02  65,536 748,746  6   364 1   
> 582 0   813 582,653 18  26,316,456  401 1,631,943 
>   25  26,176,350
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate

2019-03-27 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7136:
-

 Summary: Num_buckets for HashAgg in profile may be inaccurate
 Key: DRILL-7136
 URL: https://issues.apache.org/jira/browse/DRILL-7136
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.16.0
 Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill

I ran TPCH query 17 with sf 1000.  Here is the query:
{noformat}
select
  sum(l.l_extendedprice) / 7.0 as avg_yearly
from
  lineitem l,
  part p
where
  p.p_partkey = l.l_partkey
  and p.p_brand = 'Brand#13'
  and p.p_container = 'JUMBO CAN'
  and l.l_quantity < (
select
  0.2 * avg(l2.l_quantity)
from
  lineitem l2
where
  l2.l_partkey = p.p_partkey
  );
{noformat}

One of the hash agg operators has resized 6 times.  It should have 4M buckets.  
But the profile shows it has 64K buckets.



I have attached a sample profile.  In this profile, the hash agg operator is 
(04-02).
{noformat}
Operator Metrics
Minor Fragment  NUM_BUCKETS NUM_ENTRIES NUM_RESIZING
RESIZING_TIME_MSNUM_PARTITIONS  SPILLED_PARTITIONS  SPILL_MB
SPILL_CYCLE INPUT_BATCH_COUNT   AVG_INPUT_BATCH_BYTES   
AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT  OUTPUT_BATCH_COUNT  
AVG_OUTPUT_BATCH_BYTES  AVG_OUTPUT_ROW_BYTESOUTPUT_RECORD_COUNT
04-00-0265,536  748,746 6   364 1   582 0   
813 582,653 18  26,316,456  401 1,631,943   25  
26,176,350
{noformat}
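The expected 4M figure follows from doubling on each resize (assuming the table starts at 64K buckets and doubles per resizing, which is an assumption about HashAgg's growth policy):

```python
# After NUM_RESIZING = 6 doublings from an assumed 64K initial table,
# the bucket count should be 64K * 2**6 = 4M, not the 65,536 the
# profile reports.
initial_buckets = 65_536
num_resizing = 6
expected = initial_buckets * 2 ** num_resizing
print(expected)  # 4194304
```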





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7108) With statistics enabled TPCH 16 has two additional exchange operators

2019-03-25 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801040#comment-16801040
 ] 

Robert Hou commented on DRILL-7108:
---

I have verified this fix.

> With statistics enabled TPCH 16 has two additional exchange operators
> -
>
> Key: DRILL-7108
> URL: https://issues.apache.org/jira/browse/DRILL-7108
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> TPCH 16 with sf 100 runs 14% slower.  Here is the query:
> {noformat}
> select
>   p.p_brand,
>   p.p_type,
>   p.p_size,
>   count(distinct ps.ps_suppkey) as supplier_cnt
> from
>   partsupp ps,
>   part p
> where
>   p.p_partkey = ps.ps_partkey
>   and p.p_brand <> 'Brand#21'
>   and p.p_type not like 'MEDIUM PLATED%'
>   and p.p_size in (38, 2, 8, 31, 44, 5, 14, 24)
>   and ps.ps_suppkey not in (
> select
>   s.s_suppkey
> from
>   supplier s
> where
>   s.s_comment like '%Customer%Complaints%'
>   )
> group by
>   p.p_brand,
>   p.p_type,
>   p.p_size
> order by
>   supplier_cnt desc,
>   p.p_brand,
>   p.p_type,
>   p.p_size;
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-7108) With statistics enabled TPCH 16 has two additional exchange operators

2019-03-25 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-7108.
-

> With statistics enabled TPCH 16 has two additional exchange operators
> -
>
> Key: DRILL-7108
> URL: https://issues.apache.org/jira/browse/DRILL-7108
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> TPCH 16 with sf 100 runs 14% slower.  Here is the query:
> {noformat}
> select
>   p.p_brand,
>   p.p_type,
>   p.p_size,
>   count(distinct ps.ps_suppkey) as supplier_cnt
> from
>   partsupp ps,
>   part p
> where
>   p.p_partkey = ps.ps_partkey
>   and p.p_brand <> 'Brand#21'
>   and p.p_type not like 'MEDIUM PLATED%'
>   and p.p_size in (38, 2, 8, 31, 44, 5, 14, 24)
>   and ps.ps_suppkey not in (
> select
>   s.s_suppkey
> from
>   supplier s
> where
>   s.s_comment like '%Customer%Complaints%'
>   )
> group by
>   p.p_brand,
>   p.p_type,
>   p.p_size
> order by
>   supplier_cnt desc,
>   p.p_brand,
>   p.p_type,
>   p.p_size;
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types

2019-03-22 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-7132.
---
Resolution: Not A Problem

> Metadata cache does not have correct min/max values for varchar and interval 
> data types
> ---
>
> Key: DRILL-7132
> URL: https://issues.apache.org/jira/browse/DRILL-7132
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: 0_0_10.parquet
>
>
> The parquet metadata cache does not have correct min/max values for varchar 
> and interval data types.
> I have attached a parquet file.  Here is what parquet tools shows for varchar:
> [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 
> average: 67 total: 67 (raw data: 65 saving -3%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 65 max: 65 average: 65 total: 65
>   column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "varchar_col" ],
> "minValue" : "aW9lZ2pOSkt2bmtk",
> "maxValue" : "aW9lZ2pOSkt2bmtk",
> "nulls" : 0
> Here is what parquet tools shows for interval:
> [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 
> average: 52 total: 52 (raw data: 50 saving -4%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 50 max: 50 average: 50 total: 50
>   column values statistics: min: P18582D, max: P18582D, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "interval_col" ],
> "minValue" : "UDE4NTgyRA==",
> "maxValue" : "UDE4NTgyRA==",
> "nulls" : 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types

2019-03-22 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799355#comment-16799355
 ] 

Robert Hou commented on DRILL-7132:
---

While I agree that there is no requirement to store data in human-readable 
format, there are advantages when it comes to support and debugging customer 
issues.  But I assume you considered this and decided the pros of using a 
different format were more important.

> Metadata cache does not have correct min/max values for varchar and interval 
> data types
> ---
>
> Key: DRILL-7132
> URL: https://issues.apache.org/jira/browse/DRILL-7132
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: 0_0_10.parquet
>
>
> The parquet metadata cache does not have correct min/max values for varchar 
> and interval data types.
> I have attached a parquet file.  Here is what parquet tools shows for varchar:
> [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 
> average: 67 total: 67 (raw data: 65 saving -3%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 65 max: 65 average: 65 total: 65
>   column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "varchar_col" ],
> "minValue" : "aW9lZ2pOSkt2bmtk",
> "maxValue" : "aW9lZ2pOSkt2bmtk",
> "nulls" : 0
> Here is what parquet tools shows for interval:
> [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 
> average: 52 total: 52 (raw data: 50 saving -4%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 50 max: 50 average: 50 total: 50
>   column values statistics: min: P18582D, max: P18582D, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "interval_col" ],
> "minValue" : "UDE4NTgyRA==",
> "maxValue" : "UDE4NTgyRA==",
> "nulls" : 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types

2019-03-22 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799353#comment-16799353
 ] 

Robert Hou commented on DRILL-7132:
---

The online decoder works.

Thanks.

--Robert

> Metadata cache does not have correct min/max values for varchar and interval 
> data types
> ---
>
> Key: DRILL-7132
> URL: https://issues.apache.org/jira/browse/DRILL-7132
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: 0_0_10.parquet
>
>
> The parquet metadata cache does not have correct min/max values for varchar 
> and interval data types.
> I have attached a parquet file.  Here is what parquet tools shows for varchar:
> [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 
> average: 67 total: 67 (raw data: 65 saving -3%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 65 max: 65 average: 65 total: 65
>   column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "varchar_col" ],
> "minValue" : "aW9lZ2pOSkt2bmtk",
> "maxValue" : "aW9lZ2pOSkt2bmtk",
> "nulls" : 0
> Here is what parquet tools shows for interval:
> [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 
> average: 52 total: 52 (raw data: 50 saving -4%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 50 max: 50 average: 50 total: 50
>   column values statistics: min: P18582D, max: P18582D, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "interval_col" ],
> "minValue" : "UDE4NTgyRA==",
> "maxValue" : "UDE4NTgyRA==",
> "nulls" : 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types

2019-03-22 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799343#comment-16799343
 ] 

Robert Hou commented on DRILL-7132:
---

[~vvysotskyi] Sounds good.

How does QA verify that the values are correct?  We have some metadata cache 
tests that are failing, and they should be re-verified with the new base64 
values.  And I'm about to add some new ones for an enhancement to the metadata 
cache feature.
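One way a test could verify the encoded values: the cache stores min/max as Base64-encoded bytes, so decoding them should recover the human-readable values parquet-tools reports.

```python
# Decode the Base64 min/max strings from the metadata cache file and
# compare against the parquet-tools statistics quoted above.
import base64

assert base64.b64decode("aW9lZ2pOSkt2bmtk") == b"ioegjNJKvnkd"  # varchar_col
assert base64.b64decode("UDE4NTgyRA==") == b"P18582D"           # interval_col
print("metadata cache min/max values decode correctly")
```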

> Metadata cache does not have correct min/max values for varchar and interval 
> data types
> ---
>
> Key: DRILL-7132
> URL: https://issues.apache.org/jira/browse/DRILL-7132
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: 0_0_10.parquet
>
>
> The parquet metadata cache does not have correct min/max values for varchar 
> and interval data types.
> I have attached a parquet file.  Here is what parquet tools shows for varchar:
> [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 
> average: 67 total: 67 (raw data: 65 saving -3%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 65 max: 65 average: 65 total: 65
>   column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "varchar_col" ],
> "minValue" : "aW9lZ2pOSkt2bmtk",
> "maxValue" : "aW9lZ2pOSkt2bmtk",
> "nulls" : 0
> Here is what parquet tools shows for interval:
> [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 
> average: 52 total: 52 (raw data: 50 saving -4%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 50 max: 50 average: 50 total: 50
>   column values statistics: min: P18582D, max: P18582D, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "interval_col" ],
> "minValue" : "UDE4NTgyRA==",
> "maxValue" : "UDE4NTgyRA==",
> "nulls" : 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types

2019-03-22 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7132:
-

 Summary: Metadata cache does not have correct min/max values for 
varchar and interval data types
 Key: DRILL-7132
 URL: https://issues.apache.org/jira/browse/DRILL-7132
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Affects Versions: 1.14.0
Reporter: Robert Hou
 Fix For: 1.17.0
 Attachments: 0_0_10.parquet

The parquet metadata cache does not have correct min/max values for varchar and 
interval data types.

I have attached a parquet file.  Here is what parquet tools shows for varchar:

[varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 
average: 67 total: 67 (raw data: 65 saving -3%)
  values: min: 1 max: 1 average: 1 total: 1
  uncompressed: min: 65 max: 65 average: 65 total: 65
  column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0

Here is what the metadata cache file shows:

"name" : [ "varchar_col" ],
"minValue" : "aW9lZ2pOSkt2bmtk",
"maxValue" : "aW9lZ2pOSkt2bmtk",
"nulls" : 0

Here is what parquet tools shows for interval:

[interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 
average: 52 total: 52 (raw data: 50 saving -4%)
  values: min: 1 max: 1 average: 1 total: 1
  uncompressed: min: 50 max: 50 average: 50 total: 50
  column values statistics: min: P18582D, max: P18582D, num_nulls: 0

Here is what the metadata cache file shows:

"name" : [ "interval_col" ],
"minValue" : "UDE4NTgyRA==",
"maxValue" : "UDE4NTgyRA==",
"nulls" : 0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7121) TPCH 4 takes longer

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7121:
-

 Summary: TPCH 4 takes longer
 Key: DRILL-7121
 URL: https://issues.apache.org/jira/browse/DRILL-7121
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


Here is TPCH 4 with sf 100:
{noformat}
select
  o.o_orderpriority,
  count(*) as order_count
from
  orders o

where
  o.o_orderdate >= date '1996-10-01'
  and o.o_orderdate < date '1996-10-01' + interval '3' month
  and 
  exists (
select
  *
from
  lineitem l
where
  l.l_orderkey = o.o_orderkey
  and l.l_commitdate < l.l_receiptdate
  )
group by
  o.o_orderpriority
order by
  o.o_orderpriority;
{noformat}

The plan has changed when Statistics is disabled.   A Hash Agg and a Broadcast 
Exchange have been added.  These two operators expand the number of rows from 
the lineitem table from 137M to 9B rows.   This forces the hash join to use 6GB 
of memory instead of 30 MB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7120) Query fails with ChannelClosedException when Statistics is disabled.

2019-03-19 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7120:
--
Summary: Query fails with ChannelClosedException when Statistics is 
disabled.  (was: Query fails with ChannelClosedException)

> Query fails with ChannelClosedException when Statistics is disabled.
> 
>
> Key: DRILL-7120
> URL: https://issues.apache.org/jira/browse/DRILL-7120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> TPCH query 5 fails at sf100 when Statistics is disabled.  Here is the query:
> {noformat}
> select
>   n.n_name,
>   sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
> from
>   customer c,
>   orders o,
>   lineitem l,
>   supplier s,
>   nation n,
>   region r
> where
>   c.c_custkey = o.o_custkey
>   and l.l_orderkey = o.o_orderkey
>   and l.l_suppkey = s.s_suppkey
>   and c.c_nationkey = s.s_nationkey
>   and s.s_nationkey = n.n_nationkey
>   and n.n_regionkey = r.r_regionkey
>   and r.r_name = 'EUROPE'
>   and o.o_orderdate >= date '1997-01-01'
>   and o.o_orderdate < date '1997-01-01' + interval '1' year
> group by
>   n.n_name
> order by
>   revenue desc;
> {noformat}
> This is the error from drillbit.log:
> {noformat}
> 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> 
> FINISHED
> 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO 
>  o.a.d.e.w.f.FragmentStatusReporter - 
> 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State to report: FINISHED
> 2019-03-04 18:17:51,454 [BitServer-13] WARN  
> o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
> stream due to memory limits.  Current Allocation: 262144.
> 2019-03-04 18:17:51,454 [BitServer-13] ERROR 
> o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer.
> 2019-03-04 18:17:51,463 [BitServer-13] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.120.104:31012 <--> /10.10.120.106:53048 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: 
> org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating 
> buffer.
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271)
>  ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) 
> [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> 

[jira] [Created] (DRILL-7123) TPCDS query 83 runs slower when Statistics is disabled

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7123:
-

 Summary: TPCDS query 83 runs slower when Statistics is disabled
 Key: DRILL-7123
 URL: https://issues.apache.org/jira/browse/DRILL-7123
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


Query is TPCDS 83 with sf 100:
{noformat}
WITH sr_items 
 AS (SELECT i_item_id   item_id, 
Sum(sr_return_quantity) sr_item_qty 
 FROM   store_returns, 
item, 
date_dim 
 WHERE  sr_item_sk = i_item_sk 
AND d_date IN (SELECT d_date 
   FROM   date_dim 
   WHERE  d_week_seq IN (SELECT d_week_seq 
 FROM   date_dim 
 WHERE 
  d_date IN ( '1999-06-30', 
  '1999-08-28', 
  '1999-11-18' 
))) 
AND sr_returned_date_sk = d_date_sk 
 GROUP  BY i_item_id), 
 cr_items 
 AS (SELECT i_item_id   item_id, 
Sum(cr_return_quantity) cr_item_qty 
 FROM   catalog_returns, 
item, 
date_dim 
 WHERE  cr_item_sk = i_item_sk 
AND d_date IN (SELECT d_date 
   FROM   date_dim 
   WHERE  d_week_seq IN (SELECT d_week_seq 
 FROM   date_dim 
 WHERE 
  d_date IN ( '1999-06-30', 
  '1999-08-28', 
  '1999-11-18' 
))) 
AND cr_returned_date_sk = d_date_sk 
 GROUP  BY i_item_id), 
 wr_items 
 AS (SELECT i_item_id   item_id, 
Sum(wr_return_quantity) wr_item_qty 
 FROM   web_returns, 
item, 
date_dim 
 WHERE  wr_item_sk = i_item_sk 
AND d_date IN (SELECT d_date 
   FROM   date_dim 
   WHERE  d_week_seq IN (SELECT d_week_seq 
 FROM   date_dim 
 WHERE 
  d_date IN ( '1999-06-30', 
  '1999-08-28', 
  '1999-11-18' 
))) 
AND wr_returned_date_sk = d_date_sk 
 GROUP  BY i_item_id) 
SELECT sr_items.item_id, 
   sr_item_qty, 
   sr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
* 
   100 sr_dev, 
   cr_item_qty, 
   cr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
* 
   100 cr_dev, 
   wr_item_qty, 
   wr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
* 
   100 wr_dev, 
   ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
   average 
FROM   sr_items, 
   cr_items, 
   wr_items 
WHERE  sr_items.item_id = cr_items.item_id 
   AND sr_items.item_id = wr_items.item_id 
ORDER  BY sr_items.item_id, 
  sr_item_qty
LIMIT 100; 
{noformat}

The parallelism of major fragments 1 and 2 changes when Statistics is disabled: 
the number of minor fragments drops from 10 and 15, respectively, down to 3.  
The estimated rowcount for major fragment 2 also drops, from 1439754.0 to 
287950.8.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7120) Query fails with ChannelClosedException when Statistics is disabled

2019-03-19 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7120:
--
Summary: Query fails with ChannelClosedException when Statistics is 
disabled  (was: Query fails with ChannelClosedException when Statistics is 
disabled.)

> Query fails with ChannelClosedException when Statistics is disabled
> ---
>
> Key: DRILL-7120
> URL: https://issues.apache.org/jira/browse/DRILL-7120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> TPCH query 5 fails at sf100 when Statistics is disabled.  Here is the query:
> {noformat}
> select
>   n.n_name,
>   sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
> from
>   customer c,
>   orders o,
>   lineitem l,
>   supplier s,
>   nation n,
>   region r
> where
>   c.c_custkey = o.o_custkey
>   and l.l_orderkey = o.o_orderkey
>   and l.l_suppkey = s.s_suppkey
>   and c.c_nationkey = s.s_nationkey
>   and s.s_nationkey = n.n_nationkey
>   and n.n_regionkey = r.r_regionkey
>   and r.r_name = 'EUROPE'
>   and o.o_orderdate >= date '1997-01-01'
>   and o.o_orderdate < date '1997-01-01' + interval '1' year
> group by
>   n.n_name
> order by
>   revenue desc;
> {noformat}
> This is the error from drillbit.log:
> {noformat}
> 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> 
> FINISHED
> 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO 
>  o.a.d.e.w.f.FragmentStatusReporter - 
> 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State to report: FINISHED
> 2019-03-04 18:17:51,454 [BitServer-13] WARN  
> o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
> stream due to memory limits.  Current Allocation: 262144.
> 2019-03-04 18:17:51,454 [BitServer-13] ERROR 
> o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer.
> 2019-03-04 18:17:51,463 [BitServer-13] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.120.104:31012 <--> /10.10.120.106:53048 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: 
> org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating 
> buffer.
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271)
>  ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>  [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) 
> [netty-transport-4.0.48.Final.jar:4.0.48.Final]

[jira] [Updated] (DRILL-7121) TPCH 4 takes longer when Statistics is disabled.

2019-03-19 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7121:
--
Summary: TPCH 4 takes longer when Statistics is disabled.  (was: TPCH 4 
takes longer)

> TPCH 4 takes longer when Statistics is disabled.
> 
>
> Key: DRILL-7121
> URL: https://issues.apache.org/jira/browse/DRILL-7121
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Here is TPCH 4 with sf 100:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> The plan has changed when Statistics is disabled.   A Hash Agg and a 
> Broadcast Exchange have been added.  These two operators expand the number of 
> rows from the lineitem table from 137M to 9B rows.   This forces the hash 
> join to use 6GB of memory instead of 30 MB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7122) TPCDS queries 29 25 17 are slower when Statistics is disabled.

2019-03-19 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou reassigned DRILL-7122:
-

 Assignee: Gautam Parai
Affects Version/s: 1.16.0
 Priority: Blocker  (was: Major)
Fix Version/s: 1.16.0
  Description: 
Here is query 29 with sf 100:
{noformat}
SELECT i_item_id, 
   i_item_desc, 
   s_store_id, 
   s_store_name, 
   Avg(ss_quantity)AS store_sales_quantity, 
   Avg(sr_return_quantity) AS store_returns_quantity, 
   Avg(cs_quantity)AS catalog_sales_quantity 
FROM   store_sales, 
   store_returns, 
   catalog_sales, 
   date_dim d1, 
   date_dim d2, 
   date_dim d3, 
   store, 
   item 
WHERE  d1.d_moy = 4 
   AND d1.d_year = 1998 
   AND d1.d_date_sk = ss_sold_date_sk 
   AND i_item_sk = ss_item_sk 
   AND s_store_sk = ss_store_sk 
   AND ss_customer_sk = sr_customer_sk 
   AND ss_item_sk = sr_item_sk 
   AND ss_ticket_number = sr_ticket_number 
   AND sr_returned_date_sk = d2.d_date_sk 
   AND d2.d_moy BETWEEN 4 AND 4 + 3 
   AND d2.d_year = 1998 
   AND sr_customer_sk = cs_bill_customer_sk 
   AND sr_item_sk = cs_item_sk 
   AND cs_sold_date_sk = d3.d_date_sk 
   AND d3.d_year IN ( 1998, 1998 + 1, 1998 + 2 ) 
GROUP  BY i_item_id, 
  i_item_desc, 
  s_store_id, 
  s_store_name 
ORDER  BY i_item_id, 
  i_item_desc, 
  s_store_id, 
  s_store_name
LIMIT 100; 
{noformat}

The hash join order has changed.  As a result, one of the hash joins does not 
seem to reduce the number of rows significantly.
  Component/s: Query Planning & Optimization

> TPCDS queries 29 25 17 are slower when Statistics is disabled.
> --
>
> Key: DRILL-7122
> URL: https://issues.apache.org/jira/browse/DRILL-7122
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Here is query 29 with sf 100:
> {noformat}
> SELECT i_item_id, 
>i_item_desc, 
>s_store_id, 
>s_store_name, 
>Avg(ss_quantity)AS store_sales_quantity, 
>Avg(sr_return_quantity) AS store_returns_quantity, 
>Avg(cs_quantity)AS catalog_sales_quantity 
> FROM   store_sales, 
>store_returns, 
>catalog_sales, 
>date_dim d1, 
>date_dim d2, 
>date_dim d3, 
>store, 
>item 
> WHERE  d1.d_moy = 4 
>AND d1.d_year = 1998 
>AND d1.d_date_sk = ss_sold_date_sk 
>AND i_item_sk = ss_item_sk 
>AND s_store_sk = ss_store_sk 
>AND ss_customer_sk = sr_customer_sk 
>AND ss_item_sk = sr_item_sk 
>AND ss_ticket_number = sr_ticket_number 
>AND sr_returned_date_sk = d2.d_date_sk 
>AND d2.d_moy BETWEEN 4 AND 4 + 3 
>AND d2.d_year = 1998 
>AND sr_customer_sk = cs_bill_customer_sk 
>AND sr_item_sk = cs_item_sk 
>AND cs_sold_date_sk = d3.d_date_sk 
>AND d3.d_year IN ( 1998, 1998 + 1, 1998 + 2 ) 
> GROUP  BY i_item_id, 
>   i_item_desc, 
>   s_store_id, 
>   s_store_name 
> ORDER  BY i_item_id, 
>   i_item_desc, 
>   s_store_id, 
>   s_store_name
> LIMIT 100; 
> {noformat}
> The hash join order has changed.  As a result, one of the hash joins does not 
> seem to reduce the number of rows significantly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7122) TPCDS queries 29 25 17 are slower when Statistics is disabled.

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7122:
-

 Summary: TPCDS queries 29 25 17 are slower when Statistics is 
disabled.
 Key: DRILL-7122
 URL: https://issues.apache.org/jira/browse/DRILL-7122
 Project: Apache Drill
  Issue Type: Bug
Reporter: Robert Hou






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7120) Query fails with ChannelClosedException

2019-03-19 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-7120:
--
Description: 
TPCH query 5 fails at sf100 when Statistics is disabled.  Here is the query:
{noformat}
select
  n.n_name,
  sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
from
  customer c,
  orders o,
  lineitem l,
  supplier s,
  nation n,
  region r
where
  c.c_custkey = o.o_custkey
  and l.l_orderkey = o.o_orderkey
  and l.l_suppkey = s.s_suppkey
  and c.c_nationkey = s.s_nationkey
  and s.s_nationkey = n.n_nationkey
  and n.n_regionkey = r.r_regionkey
  and r.r_name = 'EUROPE'
  and o.o_orderdate >= date '1997-01-01'
  and o.o_orderdate < date '1997-01-01' + interval '1' year
group by
  n.n_name
order by
  revenue desc;
{noformat}

This is the error from drillbit.log:
{noformat}
2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 
23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> 
FINISHED
2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: 
State to report: FINISHED
2019-03-04 18:17:51,454 [BitServer-13] WARN  
o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
stream due to memory limits.  Current Allocation: 262144.
2019-03-04 18:17:51,454 [BitServer-13] ERROR o.a.drill.exec.rpc.data.DataServer 
- Out of memory in RPC layer.
2019-03-04 18:17:51,463 [BitServer-13] ERROR o.a.d.exec.rpc.RpcExceptionHandler 
- Exception in RPC communication.  Connection: /10.10.120.104:31012 <--> 
/10.10.120.106:53048 (data server).  Closing connection.
io.netty.handler.codec.DecoderException: 
org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating buffer.
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271)
 ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
 [netty-common-4.0.48.Final.jar:4.0.48.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
allocating buffer.
at 
io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:67)
 

[jira] [Created] (DRILL-7120) Query fails with ChannelClosedException

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7120:
-

 Summary: Query fails with ChannelClosedException
 Key: DRILL-7120
 URL: https://issues.apache.org/jira/browse/DRILL-7120
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


TPCH query 5 fails at sf100.  Here is the query:
{noformat}
select
  n.n_name,
  sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
from
  customer c,
  orders o,
  lineitem l,
  supplier s,
  nation n,
  region r
where
  c.c_custkey = o.o_custkey
  and l.l_orderkey = o.o_orderkey
  and l.l_suppkey = s.s_suppkey
  and c.c_nationkey = s.s_nationkey
  and s.s_nationkey = n.n_nationkey
  and n.n_regionkey = r.r_regionkey
  and r.r_name = 'EUROPE'
  and o.o_orderdate >= date '1997-01-01'
  and o.o_orderdate < date '1997-01-01' + interval '1' year
group by
  n.n_name
order by
  revenue desc;
{noformat}

This is the error from drillbit.log:
{noformat}
2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 
23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> 
FINISHED
2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: 
State to report: FINISHED
2019-03-04 18:17:51,454 [BitServer-13] WARN  
o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
stream due to memory limits.  Current Allocation: 262144.
2019-03-04 18:17:51,454 [BitServer-13] ERROR o.a.drill.exec.rpc.data.DataServer 
- Out of memory in RPC layer.
2019-03-04 18:17:51,463 [BitServer-13] ERROR o.a.d.exec.rpc.RpcExceptionHandler 
- Exception in RPC communication.  Connection: /10.10.120.104:31012 <--> 
/10.10.120.106:53048 (data server).  Closing connection.
io.netty.handler.codec.DecoderException: 
org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating buffer.
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271)
 ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
 [netty-common-4.0.48.Final.jar:4.0.48.Final]
at java.lang.Thread.run(Thread.java:745) 

[jira] [Created] (DRILL-7109) Statistics adds external sort, which spills to disk

2019-03-15 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7109:
-

 Summary: Statistics adds external sort, which spills to disk
 Key: DRILL-7109
 URL: https://issues.apache.org/jira/browse/DRILL-7109
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


TPCH query 4 with sf 100 runs many times slower.  One issue is that an extra 
external sort has been added, and both external sorts spill to disk.

Also, the hash join sees 100x more data.

Here is the query:
{noformat}
select
  o.o_orderpriority,
  count(*) as order_count
from
  orders o

where
  o.o_orderdate >= date '1996-10-01'
  and o.o_orderdate < date '1996-10-01' + interval '3' month
  and 
  exists (
select
  *
from
  lineitem l
where
  l.l_orderkey = o.o_orderkey
  and l.l_commitdate < l.l_receiptdate
  )
group by
  o.o_orderpriority
order by
  o.o_orderpriority;
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7108) Statistics adds two exchange operators

2019-03-15 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7108:
-

 Summary: Statistics adds two exchange operators
 Key: DRILL-7108
 URL: https://issues.apache.org/jira/browse/DRILL-7108
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


TPCH 16 with sf 100 runs 14% slower.  Here is the query:
{noformat}
select
  p.p_brand,
  p.p_type,
  p.p_size,
  count(distinct ps.ps_suppkey) as supplier_cnt
from
  partsupp ps,
  part p
where
  p.p_partkey = ps.ps_partkey
  and p.p_brand <> 'Brand#21'
  and p.p_type not like 'MEDIUM PLATED%'
  and p.p_size in (38, 2, 8, 31, 44, 5, 14, 24)
  and ps.ps_suppkey not in (
select
  s.s_suppkey
from
  supplier s
where
  s.s_comment like '%Customer%Complaints%'
  )
group by
  p.p_brand,
  p.p_type,
  p.p_size
order by
  supplier_cnt desc,
  p.p_brand,
  p.p_type,
  p.p_size;
{noformat}





[jira] [Commented] (DRILL-6755) HashJoin should not build hash tables when probe side is empty.

2019-01-31 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757752#comment-16757752
 ] 

Robert Hou commented on DRILL-6755:
---

Boaz suggested verifying this by joining with an empty file.
{noformat}
select count(*) from dfs.`/empty.json` E where E.l_orderkey in (select 
L.l_orderkey from lineitem L);
{noformat}

I tested this with Drill 1.15.  I had to turn off semijoins to get the desired 
plan because if a semijoin is used, then the join is re-ordered so that the 
empty file is on the build side (may be a bug).

I was able to verify that the hash join operator does not build a hash table 
for this query.
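
For reference, semi-joins can be disabled at the session level with the planner option introduced in Drill 1.15 (option name as I recall it; verify on your build):
{noformat}
alter session set `planner.enable_semijoin` = false;
{noformat}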

> HashJoin should not build hash tables when probe side is empty.
> ---
>
> Key: DRILL-6755
> URL: https://issues.apache.org/jira/browse/DRILL-6755
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Boaz Ben-Zvi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Currently when doing an Inner or a Right join we still build hashtables when 
> the probe side is empty. A performance optimization would be to not build 
> them.





[jira] [Closed] (DRILL-6755) HashJoin should not build hash tables when probe side is empty.

2019-01-31 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-6755.
-

> HashJoin should not build hash tables when probe side is empty.
> ---
>
> Key: DRILL-6755
> URL: https://issues.apache.org/jira/browse/DRILL-6755
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Boaz Ben-Zvi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Currently when doing an Inner or a Right join we still build hashtables when 
> the probe side is empty. A performance optimization would be to not build 
> them.





[jira] [Closed] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2019-01-30 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-6517.
-

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:242)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  

[jira] [Comment Edited] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2019-01-30 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756773#comment-16756773
 ] 

Robert Hou edited comment on DRILL-6517 at 1/31/19 1:38 AM:


I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 
12 hours, and then successfully canceled the query. I spoke with Khurram and 
added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set 
`drill.exec.hashjoin.fallback.enabled` = true;" because the query was running 
out of memory. I am able to cancel the query. I am running Drill 1.14, commit 
35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested 
EAND scenario.".  The query can be canceled with Drill 1.15.


was (Author: rhou):
I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 
12 hours, and then successfully canceled the query. I spoke with Khurram and 
added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set 
`drill.exec.hashjoin.fallback.enabled` = true;" because the query was running 
out of memory. I am able to cancel the query. I am running Drill 1.14, commit 
35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested 
EAND scenario.".  The query can be canceled with Drill 1.15.

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2019-01-30 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756773#comment-16756773
 ] 

Robert Hou commented on DRILL-6517:
---

I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 
12 hours, and then successfully canceled the query. I spoke with Khurram and 
added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set 
`drill.exec.hashjoin.fallback.enabled` = true;" because the query was running 
out of memory. I am able to cancel the query. I am running Drill 1.14, commit 
35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested 
EAND scenario.".

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Comment Edited] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2019-01-30 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756773#comment-16756773
 ] 

Robert Hou edited comment on DRILL-6517 at 1/31/19 1:38 AM:


I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 
12 hours, and then successfully canceled the query. I spoke with Khurram and 
added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set 
`drill.exec.hashjoin.fallback.enabled` = true;" because the query was running 
out of memory. I am able to cancel the query. I am running Drill 1.14, commit 
35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested 
EAND scenario.".  The query can be canceled with Drill 1.15.


was (Author: rhou):
I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 
12 hours, and then successfully canceled the query. I spoke with Khurram and 
added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set 
`drill.exec.hashjoin.fallback.enabled` = true;" because the query was running 
out of memory. I am able to cancel the query. I am running Drill 1.14, commit 
35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested 
EAND scenario.".

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  

[jira] [Closed] (DRILL-6726) Drill fails to query views created before DRILL-6492 when impersonation is enabled

2019-01-30 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-6726.
-

> Drill fails to query views created before DRILL-6492 when impersonation is 
> enabled
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: student
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before DRILL-6492 was committed, and this view 
> references a file that includes a schema which has upper case letters, the 
> view needs to be rebuilt.  There may be variations on this issue that I have 
> not seen.
> To reproduce this problem, create a dfs workspace like this:
> {noformat}
> "drillTestDirP1": {
>   "location": "/drill/testdata/p1tests",
>   "writable": true,
>   "defaultInputFormat": "parquet",
>   "allowAccessOutsideWorkspace": false
> },
> {noformat}
> Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
> command:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_test_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute 
> this query:
> {noformat}
> select * from student_test_v;
> {noformat}
> Drill will return an exception:
> {noformat}
> Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
> schema drillTestDirP1 not available in schema dfs.
> View Context dfs, drillTestDirP1
> View SQL SELECT *
> FROM `dfs.drillTestDirP1`.`student`
> [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> I have attached the student parquet file I used.
> This is what the .view.drill file looks like:
> {noformat}
> {
>   "name" : "student_test_v",
>   "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
>   "fields" : [ {
> "name" : "**",
> "type" : "DYNAMIC_STAR",
> "isNullable" : true
>   } ],
>   "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
> }
> {noformat}
> This means that users may not be able to access views that they have created 
> using previous versions of Drill.  We should maintain backwards 
> compatibility where possible.
> As a workaround, these views can be re-created.  It would be helpful to users 
> if the error message explains that these views need to be re-created.
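
As a sketch of that workaround, re-creating the view on the current Drill version (view and schema names taken from the .view.drill file in the report above) would be:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_test_v as
select * from `dfs.drillTestDirP1`.student;
{noformat}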





[jira] [Commented] (DRILL-6726) Drill fails to query views created before DRILL-6492 when impersonation is enabled

2019-01-30 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756501#comment-16756501
 ] 

Robert Hou commented on DRILL-6726:
---

I have encountered another problem related to this one.  If I run Drill 1.15, 
and then I run Drill 1.14, Drill 1.14 cannot access schemas whose names are 
mixed-case (contain upper-case letters); it can access a schema whose name is all 
lower-case.  For example, if the schema used to be called "drillTestDir", Drill 
1.14 must use "drilltestdir" in order to use it.  This means that scripts that 
use "drillTestDir" can break.

This may not be a major issue now, but sometimes users can try a new version of 
Drill, and if they run into problems, they can revert to the older version of 
Drill.  We know one user who tried Drill 1.14 and encountered some problems and 
went back to Drill 1.13.  We should keep this in mind in future releases.

> Drill fails to query views created before DRILL-6492 when impersonation is 
> enabled
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: student
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before DRILL-6492 was committed, and this view 
> references a file that includes a schema which has upper case letters, the 
> view needs to be rebuilt.  There may be variations on this issue that I have 
> not seen.
> To reproduce this problem, create a dfs workspace like this:
> {noformat}
> "drillTestDirP1": {
>   "location": "/drill/testdata/p1tests",
>   "writable": true,
>   "defaultInputFormat": "parquet",
>   "allowAccessOutsideWorkspace": false
> },
> {noformat}
> Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
> command:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_test_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute 
> this query:
> {noformat}
> select * from student_test_v;
> {noformat}
> Drill will return an exception:
> {noformat}
> Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
> schema drillTestDirP1 not available in schema dfs.
> View Context dfs, drillTestDirP1
> View SQL SELECT *
> FROM `dfs.drillTestDirP1`.`student`
> [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> I have attached the student parquet file I used.
> This is what the .view.drill file looks like:
> {noformat}
> {
>   "name" : "student_test_v",
>   "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
>   "fields" : [ {
> "name" : "**",
> "type" : "DYNAMIC_STAR",
> "isNullable" : true
>   } ],
>   "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
> }
> {noformat}
> This means that users may not be able to access views that they have created 
> using previous versions of Drill.  We should maintain backwards 
> compatibility where possible.
> As a workaround, these views can be re-created.  It would be helpful to users 
> if the error message explains that these views need to be re-created.





[jira] [Closed] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2019-01-30 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-6709.
-

> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.





[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2019-01-30 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756486#comment-16756486
 ] 

Robert Hou commented on DRILL-6709:
---

I have verified this.

> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.





[jira] [Closed] (DRILL-6880) Hash-Join: Many null keys on the build side form a long linked chain in the Hash Table

2019-01-14 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-6880.
-

> Hash-Join: Many null keys on the build side form a long linked chain in the 
> Hash Table
> --
>
> Key: DRILL-6880
> URL: https://issues.apache.org/jira/browse/DRILL-6880
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.16.0
>
>
> When building the Hash Table for the Hash-Join, each new key is matched with 
> an existing key (same bucket) by calling the generated method 
> `isKeyMatchInternalBuild`, which compares the two. However when both keys are 
> null, the method returns *false* (meaning not-equal, i.e. it is a new key), 
> so the new key is appended to the list after the old key. When a third 
> null key is found, it is matched against the prior two and appended as well, 
> and so on.
> This way N null values perform on the order of N^2 / 2 key comparisons.
> _Suggested improvement_: The generated code should return a third result, 
> meaning "two null keys". Then in case of Inner or Left joins all the 
> duplicate nulls can be discarded.
> Below is a simple example, note the time difference between non-null and the 
> all-nulls tables (also instrumentation showed that for nulls, the method 
> above was called 1249975000 times!!)
> {code:java}
> 0: jdbc:drill:zk=local> use dfs.tmp;
> 0: jdbc:drill:zk=local> create table testNull as (select cast(null as int) 
> mycol from 
>  dfs.`/data/test128M.tbl` limit 5);
> 0: jdbc:drill:zk=local> create table test1 as (select cast(1 as int) mycol1 
> from 
>  dfs.`/data/test128M.tbl` limit 6);
> 0: jdbc:drill:zk=local> create table test2 as (select cast(2 as int) mycol2 
> from dfs.`/data/test128M.tbl` limit 5);
> 0: jdbc:drill:zk=local> select count(*) from test1 join test2 on test1.mycol1 
> = test2.mycol2;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> 1 row selected (0.443 seconds)
> 0: jdbc:drill:zk=local> select count(*) from test1 join testNull on 
> test1.mycol1 = testNull.mycol;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> 1 row selected (140.098 seconds)
> {code}
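
To make the cost concrete, here is a minimal, self-contained sketch (a hypothetical model of the behavior described above, not Drill's generated `isKeyMatchInternalBuild` code) that counts key comparisons while inserting n all-null keys into a single hash-table bucket.

```java
// Counts key comparisons while inserting n identical (all-null) keys into
// one hash-table bucket. Hypothetical model of the behavior described above,
// not Drill's actual generated code.
public class NullChainDemo {

    /**
     * If nullsMatch is false (current behavior), each new null key is compared
     * against the whole chain, never matches, and is appended, so the chain
     * grows and the total work is 1 + 2 + ... + (n-1) = n*(n-1)/2.
     * If nullsMatch is true (the suggested three-state fix, applicable to
     * inner/left joins), each new null matches in a single comparison.
     */
    static long countComparisons(int n, boolean nullsMatch) {
        long comparisons = 0;
        long chainLength = 0;
        for (int i = 0; i < n; i++) {
            if (chainLength == 0) {
                chainLength = 1;             // first key: nothing to compare against
            } else if (nullsMatch) {
                comparisons += 1;            // duplicate null detected immediately
            } else {
                comparisons += chainLength;  // walk the whole chain, no match
                chainLength += 1;            // appended as a "new" key
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(countComparisons(50_000, false)); // 1249975000
        System.out.println(countComparisons(50_000, true));  // 49999
    }
}
```

Note that 50,000 * 49,999 / 2 = 1,249,975,000, which matches the instrumented call count quoted in the description and is consistent with a 50,000-row all-null build side.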





[jira] [Commented] (DRILL-6880) Hash-Join: Many null keys on the build side form a long linked chain in the Hash Table

2019-01-14 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742671#comment-16742671
 ] 

Robert Hou commented on DRILL-6880:
---

I have verified this fix.

> Hash-Join: Many null keys on the build side form a long linked chain in the 
> Hash Table
> --
>
> Key: DRILL-6880
> URL: https://issues.apache.org/jira/browse/DRILL-6880
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.16.0
>
>
> When building the Hash Table for the Hash-Join, each new key is matched with 
> an existing key (same bucket) by calling the generated method 
> `isKeyMatchInternalBuild`, which compares the two. However when both keys are 
> null, the method returns *false* (meaning not-equal, i.e. it is a new key), 
> so the new key is appended to the list after the old key. When a third 
> null key is found, it is matched against the prior two and appended as well, 
> and so on.
> This way N null values perform on the order of N^2 / 2 key comparisons.
> _Suggested improvement_: The generated code should return a third result, 
> meaning "two null keys". Then in case of Inner or Left joins all the 
> duplicate nulls can be discarded.
> Below is a simple example, note the time difference between non-null and the 
> all-nulls tables (also instrumentation showed that for nulls, the method 
> above was called 1249975000 times!!)
> {code:java}
> 0: jdbc:drill:zk=local> use dfs.tmp;
> 0: jdbc:drill:zk=local> create table testNull as (select cast(null as int) 
> mycol from 
>  dfs.`/data/test128M.tbl` limit 5);
> 0: jdbc:drill:zk=local> create table test1 as (select cast(1 as int) mycol1 
> from 
>  dfs.`/data/test128M.tbl` limit 6);
> 0: jdbc:drill:zk=local> create table test2 as (select cast(2 as int) mycol2 
> from dfs.`/data/test128M.tbl` limit 5);
> 0: jdbc:drill:zk=local> select count(*) from test1 join test2 on test1.mycol1 
> = test2.mycol2;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> 1 row selected (0.443 seconds)
> 0: jdbc:drill:zk=local> select count(*) from test1 join testNull on 
> test1.mycol1 = testNull.mycol;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> 1 row selected (140.098 seconds)
> {code}





[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2019-01-08 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737743#comment-16737743
 ] 

Robert Hou commented on DRILL-5796:
---

Found our documentation on this: the default limit is 10K rowgroups, which 
means we are limited to 10K files.
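
The 10K cap discussed here corresponds to a planner option; the option name below is an assumption based on the Drill 1.15 option set and should be verified against `sys.options` on your build. A sketch of raising it for a session:

```sql
-- Assumed option name; verify on your build with:
--   SELECT * FROM sys.options WHERE name LIKE '%rowgroup%';
ALTER SESSION SET `planner.store.parquet.rowgroup.filter.pushdown.threshold` = 20000;
```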

> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, the filter can instead be pruned per rowgroup, making the 
> rowgroup, not the file, the unit of work.





[jira] [Created] (DRILL-6957) Parquet rowgroup filtering can have incorrect file count

2019-01-08 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6957:
-

 Summary: Parquet rowgroup filtering can have incorrect file count
 Key: DRILL-6957
 URL: https://issues.apache.org/jira/browse/DRILL-6957
 Project: Apache Drill
  Issue Type: Bug
Reporter: Robert Hou
Assignee: Jean-Blas IMBERT


If a query accesses all the files, the Scan operator indicates that one file is 
accessed.  The number of rowgroups is correct.

Here is an example query:
{noformat}
select count(*) from 
dfs.`/custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120` 
where cur_tot_bal_amt < 100
{noformat}

Here is the plan:
{noformat}
Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = 
{9.8376721446E9 rows, 4.35668337906E10 cpu, 2.810763469E9 io, 4096.0 network, 
0.0 memory}, id = 4477
00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount 
= 1.0, cumulative cost = {9.8376721445E9 rows, 4.35668337905E10 cpu, 
2.810763469E9 io, 4096.0 network, 0.0 memory}, id = 4476
00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = 
RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {9.8376721435E9 
rows, 4.35668337895E10 cpu, 2.810763469E9 io, 4096.0 network, 0.0 memory}, id = 
4475
00-03  UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 
1.0, cumulative cost = {9.8376721425E9 rows, 4.35668337775E10 cpu, 
2.810763469E9 io, 4096.0 network, 0.0 memory}, id = 4474
01-01StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = 
RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {9.8376721415E9 
rows, 4.35668337695E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 
4473
01-02  Project($f0=[0]) : rowType = RecordType(INTEGER $f0): 
rowcount = 1.4053817345E9, cumulative cost = {8.432290407E9 rows, 
2.67022529555E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 4472
01-03SelectionVectorRemover : rowType = RecordType(ANY 
cur_tot_bal_amt): rowcount = 1.4053817345E9, cumulative cost = {7.0269086725E9 
rows, 2.10807260175E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 
4471
01-04  Filter(condition=[<($0, 100)]) : rowType = 
RecordType(ANY cur_tot_bal_amt): rowcount = 1.4053817345E9, cumulative cost = 
{5.621526938E9 rows, 1.9675344283E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 
memory}, id = 4470
01-05Scan(table=[[dfs, 
/custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:///custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120]],
 
selectionRoot=maprfs:/custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120,
 numFiles=1, numRowGroups=1007, usedMetadataFile=false, 
columns=[`cur_tot_bal_amt`]]]) : rowType = RecordType(ANY cur_tot_bal_amt): 
rowcount = 2.810763469E9, cumulative cost = {2.810763469E9 rows, 2.810763469E9 
cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 4469
{noformat}

numFiles is set to 1 when it should be set to 21.

All the files are in one directory.  If I add a level of directories (i.e. a 
directory with multiple directories, each with files), then I get the correct 
file count.





[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2019-01-08 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737694#comment-16737694
 ] 

Robert Hou commented on DRILL-5796:
---

It looks like pushdown is performed if there are up to 10K rowgroups.  If there 
are more than 10K rowgroups, I cannot tell if pushdown is being performed.  The 
explain plan suggests it is not being performed.

> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, the filter can instead be pruned per rowgroup, making the 
> rowgroup, not the file, the unit of work.





[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2019-01-07 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736835#comment-16736835
 ] 

Robert Hou commented on DRILL-5796:
---

Are there any limits for this feature?  I am testing it with roughly 250 files 
organized in roughly 20 directories.  There should only be one file that 
matches the query.  But the Scan operator shows that all 250 files in 20 
directories need to be scanned.  Perhaps the optimizer decides not to scan row 
group stats after some threshold?

> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, the filter can instead be pruned per rowgroup, making the 
> rowgroup, not the file, the unit of work.





[jira] [Comment Edited] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2019-01-07 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736835#comment-16736835
 ] 

Robert Hou edited comment on DRILL-5796 at 1/8/19 7:37 AM:
---

Are there any limits for this feature?  I am testing it with roughly 250 files 
organized in roughly 20 directories.  There should only be one file that 
matches the query.  But the Scan operator shows that all 250 files in 20 
directories need to be scanned.  Perhaps the optimizer decides not to scan row 
group stats after some threshold?


was (Author: rhou):
Is there any limits for this feature?  I am testing it with roughly 250 files 
organized in roughly 20 directories.  There should only be one file that 
matches the query.  But the Scan operator shows that all 250 files in 20 
directories need to be scanned.  Perhaps the optimizer decides not to scan row 
group stats after some threshold?

> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, the filter can instead be pruned per rowgroup, making the 
> rowgroup, not the file, the unit of work.





[jira] [Updated] (DRILL-6906) File permissions are not being honored

2018-12-15 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6906:
--
Description: 
I ran sqlline with user "kuser1".
{noformat}
/opt/mapr/drill/drill-1.15.0.apache/bin/sqlline -u 
"jdbc:drill:drillbit=10.10.30.206" -n kuser1 -p mapr
{noformat}

I tried to access a file that is only accessible by root:
{noformat}
[root@perfnode206 drill-test-framework_krystal]# hf -ls 
/drill/testdata/impersonation/neg_tc5/student
-rwx--   3 root root  64612 2018-06-19 10:30 
/drill/testdata/impersonation/neg_tc5/student
{noformat}

I am able to read the table, which should not be possible.  I used this commit 
for Drill 1.15.
{noformat}
git.commit.id=bf2b414ac62cfc515fdd77f2688bb110073d764d
git.commit.message.full=DRILL-6866\: Upgrade to SqlLine 1.6.0\n\n1. Changed 
SqlLine version to 1.6.0.\n2. Overridden new getVersion method in 
DrillSqlLineApplication.\n3. Set maxColumnWidth to 80 to avoid issue described 
in DRILL-6769.\n4. Changed colorScheme to obsidian.\n5. Output null value for 
varchar / char / boolean types as null instead of empty string.\n6. Changed 
access modifier from package default to public for JDBC classes that implement 
external interfaces to avoid issues when calling methods from these classes 
using reflection.\n\ncloses \#1556
{noformat}

This is from drillbit.log.  It shows that user is kuser1.
{noformat}
2018-12-15 05:00:52,516 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] DEBUG 
o.a.d.e.w.f.QueryStateProcessor - 23eb04fb-1701-bea7-dd97-ecda58795b3b: State 
change requested PREPARING --> PLANNING
2018-12-15 05:00:52,531 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
23eb04fb-1701-bea7-dd97-ecda58795b3b issued by kuser1: select * from 
dfs.`/drill/testdata/impersonation/neg_tc5/student`
{noformat}

It is not clear to me if this is a Drill problem or a file system problem.  I 
tested MFS by logging in as kuser1 and trying to copy the file using "hadoop fs 
-copyToLocal /drill/testdata/impersonation/neg_tc5/student" and got an error, 
and was not able to copy the file.  So I think MFS permissions are working.

I also tried with Drill 1.14, and I get the expected error:
{noformat}
0: jdbc:drill:drillbit=10.10.30.206> select * from 
dfs.`/drill/testdata/impersonation/neg_tc5/student` limit 1;
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Object 
'/drill/testdata/impersonation/neg_tc5/student' not found within 'dfs'

[Error Id: cdf18c2a-b005-4f92-b819-d4324e8807d9 on perfnode206.perf.lab:31010] 
(state=,code=0)
{noformat}

The commit for Drill 1.14 is:
{noformat}
git.commit.message.full=[maven-release-plugin] prepare release drill-1.14.0\n
git.commit.id=0508a128853ce796ca7e99e13008e49442f83147
{noformat}

This problem exists with both Apache JDBC and Simba ODBC.

Here is drill-distrib.conf.  drill-override.conf is empty.  It is the same for 
both 1.14 and 1.15.
{noformat}
drill.exec: {
  cluster-id: "secure206-drillbits",
  zk.connect: 
"perfnode206.perf.lab:5181,perfnode207.perf.lab:5181,perfnode208.perf.lab:5181",
  rpc.user.client.threads: "4",
  options.store.parquet.block-size: "268435456",
  sys.store.provider.zk.blobroot: "maprfs:///apps/drill",
  spill.directories: [ "/tmp/drill/spill" ],
  spill.fs: "maprfs:///",
  storage.action_on_plugins_override_file: "rename"

  zk.apply_secure_acl: true,

  impersonation.enabled: true,
  impersonation.max_chained_user_hops: 3,
  options.exec.impersonation.inbound_policies: 
"[{proxy_principals:{users:[\"mapr\"]},target_principals:{users:[\"*\"]}}]",

  security.auth.mechanisms: ["PLAIN", "KERBEROS"],
  security.auth.principal : "mapr/maprs...@qa.lab",
  security.auth.keytab : "/etc/drill/mapr_maprsasl.keytab",
  security.user.auth.enabled: true,
  security.user.auth.packages += "org.apache.drill.exec.rpc.user.security",
  security.user.auth.impl: "pam4j",
  security.user.auth.pam_profiles: ["sudo", "login"],

  http.ssl_enabled: true,
  ssl.useHadoopConfig: true,
  http.auth.mechanisms: ["FORM", "SPNEGO"],
  http.auth.spnego.principal: "HTTP/perfnode206.perf@qa.lab",
  http.auth.spnego.keytab: "/etc/drill_spnego/perfnode206.keytab"
}
{noformat}

  was:
I ran sqlline with user "kuser1".
{noformat}
/opt/mapr/drill/drill-1.15.0.apache/bin/sqlline -u 
"jdbc:drill:drillbit=10.10.30.206" -n kuser1 -p mapr
{noformat}

I tried to access a file that is only accessible by root:
{noformat}
[root@perfnode206 drill-test-framework_krystal]# hf -ls 
/drill/testdata/impersonation/neg_tc5/student
-rwx--   3 root root  64612 2018-06-19 10:30 
/drill/testdata/impersonation/neg_tc5/student
{noformat}

I am able to read the table, which should not be possible.  I used this commit 
for Drill 1.15.
{noformat}
git.commit.id=bf2b414ac62cfc515fdd77f2688bb110073d764d
git.commit.message.full=DRILL-6866\: Upgrade to SqlLine 1.6.0\n\n1. 

[jira] [Updated] (DRILL-6906) File permissions are not being honored

2018-12-15 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6906:
--
Description: 
I ran sqlline with user "kuser1".
{noformat}
/opt/mapr/drill/drill-1.15.0.apache/bin/sqlline -u 
"jdbc:drill:drillbit=10.10.30.206" -n kuser1 -p mapr
{noformat}

I tried to access a file that is only accessible by root:
{noformat}
[root@perfnode206 drill-test-framework_krystal]# hf -ls 
/drill/testdata/impersonation/neg_tc5/student
-rwx--   3 root root  64612 2018-06-19 10:30 
/drill/testdata/impersonation/neg_tc5/student
{noformat}

I am able to read the table, which should not be possible.  I used this commit 
for Drill 1.15.
{noformat}
git.commit.id=bf2b414ac62cfc515fdd77f2688bb110073d764d
git.commit.message.full=DRILL-6866\: Upgrade to SqlLine 1.6.0\n\n1. Changed 
SqlLine version to 1.6.0.\n2. Overridden new getVersion method in 
DrillSqlLineApplication.\n3. Set maxColumnWidth to 80 to avoid issue described 
in DRILL-6769.\n4. Changed colorScheme to obsidian.\n5. Output null value for 
varchar / char / boolean types as null instead of empty string.\n6. Changed 
access modifier from package default to public for JDBC classes that implement 
external interfaces to avoid issues when calling methods from these classes 
using reflection.\n\ncloses \#1556
{noformat}

This is from drillbit.log.  It shows that user is kuser1.
{noformat}
2018-12-15 05:00:52,516 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] DEBUG 
o.a.d.e.w.f.QueryStateProcessor - 23eb04fb-1701-bea7-dd97-ecda58795b3b: State 
change requested PREPARING --> PLANNING
2018-12-15 05:00:52,531 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
23eb04fb-1701-bea7-dd97-ecda58795b3b issued by kuser1: select * from 
dfs.`/drill/testdata/impersonation/neg_tc5/student`
{noformat}

It is not clear to me if this is a Drill problem or a file system problem.  I 
tested MFS by logging in as kuser1 and trying to copy the file using "hadoop fs 
-copyToLocal /drill/testdata/impersonation/neg_tc5/student" and got an error, 
and was not able to copy the file.  So I think MFS permissions are working.

I also tried with Drill 1.14, and I get the expected error:
{noformat}
0: jdbc:drill:drillbit=10.10.30.206> select * from 
dfs.`/drill/testdata/impersonation/neg_tc5/student` limit 1;
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Object 
'/drill/testdata/impersonation/neg_tc5/student' not found within 'dfs'

[Error Id: cdf18c2a-b005-4f92-b819-d4324e8807d9 on perfnode206.perf.lab:31010] 
(state=,code=0)
{noformat}

The commit for Drill 1.14 is:
{noformat}
git.commit.message.full=[maven-release-plugin] prepare release drill-1.14.0\n
git.commit.id=0508a128853ce796ca7e99e13008e49442f83147
{noformat}

This problem exists with both Apache JDBC and Simba ODBC.

Here is drill-distrib.conf.  drill-override.conf is empty.  It is the same for 
both 1.14 and 1.15.
{noformat}
drill.exec: {
  cluster-id: "secure206-drillbits",
  zk.connect: 
"perfnode206.perf.lab:5181,perfnode207.perf.lab:5181,perfnode208.perf.lab:5181",
  rpc.user.client.threads: "4",
  options.store.parquet.block-size: "268435456",
  sys.store.provider.zk.blobroot: "maprfs:///apps/drill",
  spill.directories: [ "/tmp/drill/spill" ],
  spill.fs: "maprfs:///",
  storage.action_on_plugins_override_file: "rename"

  zk.apply_secure_acl: true,

  impersonation.enabled: true,
  impersonation.max_chained_user_hops: 3,
  options.exec.impersonation.inbound_policies: 
"[{proxy_principals:{users:[\"mapr\"]},target_principals:{users:[\"*\"]}}]",

  # security.auth.mechanisms: ["MAPRSASL", "PLAIN", "KERBEROS"],
  security.auth.mechanisms: ["PLAIN", "KERBEROS"],
  security.auth.principal : "mapr/maprs...@qa.lab",
  security.auth.keytab : "/etc/drill/mapr_maprsasl.keytab",
  security.user.auth.enabled: true,
  security.user.auth.packages += "org.apache.drill.exec.rpc.user.security",
  security.user.auth.impl: "pam4j",
  security.user.auth.pam_profiles: ["sudo", "login"],

  http.ssl_enabled: true,
  ssl.useHadoopConfig: true,
  http.auth.mechanisms: ["FORM", "SPNEGO"],
  http.auth.spnego.principal: "HTTP/perfnode206.perf@qa.lab",
  http.auth.spnego.keytab: "/etc/drill_spnego/perfnode206.keytab"
}
{noformat}

  was:
I ran sqlline with user "kuser1".
{noformat}
/opt/mapr/drill/drill-1.15.0.apache/bin/sqlline -u 
"jdbc:drill:drillbit=10.10.30.206" -n kuser1 -p mapr
{noformat}

I tried to access a file that is only accessible by root:
{noformat}
[root@perfnode206 drill-test-framework_krystal]# hf -ls 
/drill/testdata/impersonation/neg_tc5/student
-rwx--   3 root root  64612 2018-06-19 10:30 
/drill/testdata/impersonation/neg_tc5/student
{noformat}

I am able to read the table, which should not be possible.  I used this commit 
for Drill 1.15.
{noformat}
git.commit.id=bf2b414ac62cfc515fdd77f2688bb110073d764d

[jira] [Created] (DRILL-6906) File permissions are not being honored

2018-12-15 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6906:
-

 Summary: File permissions are not being honored
 Key: DRILL-6906
 URL: https://issues.apache.org/jira/browse/DRILL-6906
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC, Client - ODBC
Affects Versions: 1.15.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.15.0


I ran sqlline with user "kuser1".
{noformat}
/opt/mapr/drill/drill-1.15.0.apache/bin/sqlline -u 
"jdbc:drill:drillbit=10.10.30.206" -n kuser1 -p mapr
{noformat}

I tried to access a file that is only accessible by root:
{noformat}
[root@perfnode206 drill-test-framework_krystal]# hf -ls 
/drill/testdata/impersonation/neg_tc5/student
-rwx--   3 root root  64612 2018-06-19 10:30 
/drill/testdata/impersonation/neg_tc5/student
{noformat}

I am able to read the table, which should not be possible.  I used this commit 
for Drill 1.15.
{noformat}
git.commit.id=bf2b414ac62cfc515fdd77f2688bb110073d764d
git.commit.message.full=DRILL-6866\: Upgrade to SqlLine 1.6.0\n\n1. Changed 
SqlLine version to 1.6.0.\n2. Overridden new getVersion method in 
DrillSqlLineApplication.\n3. Set maxColumnWidth to 80 to avoid issue described 
in DRILL-6769.\n4. Changed colorScheme to obsidian.\n5. Output null value for 
varchar / char / boolean types as null instead of empty string.\n6. Changed 
access modifier from package default to public for JDBC classes that implement 
external interfaces to avoid issues when calling methods from these classes 
using reflection.\n\ncloses \#1556
{noformat}

This is from drillbit.log.  It shows that user is kuser1.
{noformat}
2018-12-15 05:00:52,516 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] DEBUG 
o.a.d.e.w.f.QueryStateProcessor - 23eb04fb-1701-bea7-dd97-ecda58795b3b: State 
change requested PREPARING --> PLANNING
2018-12-15 05:00:52,531 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
23eb04fb-1701-bea7-dd97-ecda58795b3b issued by kuser1: select * from 
dfs.`/drill/testdata/impersonation/neg_tc5/student`
{noformat}

It is not clear to me if this is a Drill problem or a file system problem.  I 
tested MFS by logging in as kuser1 and trying to copy the file using "hadoop fs 
-copyToLocal /drill/testdata/impersonation/neg_tc5/student" and got an error, 
and was not able to copy the file.  So I think MFS permissions are working.

I also tried with Drill 1.14, and I get the expected error:
{noformat}
0: jdbc:drill:drillbit=10.10.30.206> select * from 
dfs.`/drill/testdata/impersonation/neg_tc5/student` limit 1;
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Object 
'/drill/testdata/impersonation/neg_tc5/student' not found within 'dfs'

[Error Id: cdf18c2a-b005-4f92-b819-d4324e8807d9 on perfnode206.perf.lab:31010] 
(state=,code=0)
{noformat}

The commit for Drill 1.14 is:
{noformat}
git.commit.message.full=[maven-release-plugin] prepare release drill-1.14.0\n
git.commit.id=0508a128853ce796ca7e99e13008e49442f83147
{noformat}

This problem exists with both Apache JDBC and Simba ODBC.





[jira] [Created] (DRILL-6902) Extra limit operator is not needed

2018-12-12 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6902:
-

 Summary: Extra limit operator is not needed
 Key: DRILL-6902
 URL: https://issues.apache.org/jira/browse/DRILL-6902
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Robert Hou
Assignee: Pritesh Maker


For TPCDS query 49, there is an extra limit operator that is not needed.

Here is the query:
{noformat}
SELECT 'web' AS channel, 
   web.item, 
   web.return_ratio, 
   web.return_rank, 
   web.currency_rank 
FROM   (SELECT item, 
   return_ratio, 
   currency_ratio, 
   Rank() 
 OVER ( 
   ORDER BY return_ratio)   AS return_rank, 
   Rank() 
 OVER ( 
   ORDER BY currency_ratio) AS currency_rank 
FROM   (SELECT ws.ws_item_sk   AS 
   item, 
   ( Cast(Sum(COALESCE(wr.wr_return_quantity, 0)) AS 
DEC(15, 
  4)) / 
 Cast( 
 Sum(COALESCE(ws.ws_quantity, 0)) AS DEC(15, 4)) ) AS 
   return_ratio, 
   ( Cast(Sum(COALESCE(wr.wr_return_amt, 0)) AS DEC(15, 4)) 
 / Cast( 
 Sum( 
 COALESCE(ws.ws_net_paid, 0)) AS DEC(15, 
 4)) ) AS 
   currency_ratio 
FROM   web_sales ws 
   LEFT OUTER JOIN web_returns wr 
ON ( ws.ws_order_number = 
wr.wr_order_number 
 AND ws.ws_item_sk = wr.wr_item_sk ), 
   date_dim 
WHERE  wr.wr_return_amt > 1 
   AND ws.ws_net_profit > 1 
   AND ws.ws_net_paid > 0 
   AND ws.ws_quantity > 0 
   AND ws_sold_date_sk = d_date_sk 
   AND d_year = 1999 
   AND d_moy = 12 
GROUP  BY ws.ws_item_sk) in_web) web 
WHERE  ( web.return_rank <= 10 
  OR web.currency_rank <= 10 ) 
UNION 
SELECT 'catalog' AS channel, 
   catalog.item, 
   catalog.return_ratio, 
   catalog.return_rank, 
   catalog.currency_rank 
FROM   (SELECT item, 
   return_ratio, 
   currency_ratio, 
   Rank() 
 OVER ( 
   ORDER BY return_ratio)   AS return_rank, 
   Rank() 
 OVER ( 
   ORDER BY currency_ratio) AS currency_rank 
FROM   (SELECT cs.cs_item_sk   AS 
   item, 
   ( Cast(Sum(COALESCE(cr.cr_return_quantity, 0)) AS 
DEC(15, 
  4)) / 
 Cast( 
 Sum(COALESCE(cs.cs_quantity, 0)) AS DEC(15, 4)) ) AS 
   return_ratio, 
   ( Cast(Sum(COALESCE(cr.cr_return_amount, 0)) AS DEC(15, 
4 
  )) / 
 Cast(Sum( 
 COALESCE(cs.cs_net_paid, 0)) AS DEC( 
 15, 4)) ) AS 
   currency_ratio 
FROM   catalog_sales cs 
   LEFT OUTER JOIN catalog_returns cr 
ON ( cs.cs_order_number = 
cr.cr_order_number 
 AND cs.cs_item_sk = cr.cr_item_sk ), 
   date_dim 
WHERE  cr.cr_return_amount > 1 
   AND cs.cs_net_profit > 1 
   AND cs.cs_net_paid > 0 
   AND cs.cs_quantity > 0 
   AND cs_sold_date_sk = d_date_sk 
   AND d_year = 1999 
   AND d_moy = 12 
GROUP  BY cs.cs_item_sk) in_cat) catalog 
WHERE  ( catalog.return_rank <= 10 
  OR catalog.currency_rank <= 10 ) 
UNION 
SELECT 'store' AS channel, 
   store.item, 
   store.return_ratio, 
   store.return_rank, 
   store.currency_rank 
FROM   (SELECT item, 
   return_ratio, 
   currency_ratio, 
   Rank() 
 OVER ( 
   ORDER BY return_ratio)   AS return_rank, 
   Rank() 
 OVER ( 
   ORDER BY currency_ratio) AS currency_rank 
FROM   (SELECT sts.ss_item_sk   AS 
   item, 
   ( Cast(Sum(COALESCE(sr.sr_return_quantity, 

[jira] [Created] (DRILL-6897) TPCH 13 has regressed

2018-12-11 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6897:
-

 Summary: TPCH 13 has regressed
 Key: DRILL-6897
 URL: https://issues.apache.org/jira/browse/DRILL-6897
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.15.0
Reporter: Robert Hou
Assignee: Karthikeyan Manivannan
 Attachments: 240099ed-ef2a-a23a-4559-f1b2e0809e72.sys.drill, 
2400be84-c024-cb92-8743-3211589e0247.sys.drill

I ran TPCH query 13 at both scale factor 100 and scale factor 1000, running 
each 3x to get a warm start and twice more to verify the regression. It 
regresses between 26% and 33%.

Here is the query:
{noformat}
select
  c_count,
  count(*) as custdist
from
  (
select
  c.c_custkey,
  count(o.o_orderkey)
from
  customer c 
  left outer join orders o 
on c.c_custkey = o.o_custkey
and o.o_comment not like '%special%requests%'
group by
  c.c_custkey
  ) as orders (c_custkey, c_count)
group by
  c_count
order by
  custdist desc,
  c_count desc;
{noformat}

I have attached two profiles. 240099ed-ef2a-a23a-4559-f1b2e0809e72 is for Drill 
1.15. 2400be84-c024-cb92-8743-3211589e0247 is for Drill 1.14. The commit for 
Drill 1.15 is 596227bbbecfb19bdb55dd8ea58159890f83bc9c. The commit for Drill 
1.14 is 0508a128853ce796ca7e99e13008e49442f83147.

The two plans are nearly the same. One difference is that Drill 1.15 uses four 
times more memory in operator 07-01 Unordered Mux Exchange. I think the problem 
may be in operator 09-01 Project: Drill 1.15 projects the comment field while 
Drill 1.14 does not.

Another issue is that the Drill 1.15 takes more processing time to filter the 
order table. Filter operator 09-03 takes an average of 19.3s. For Drill 1.14, 
filter operator 09-04 takes an average of 15.6s. They process the same number 
of rows, and have the same number of minor fragments.







[jira] [Resolved] (DRILL-6828) Hit UnrecognizedPropertyException when run tpch queries

2018-11-26 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-6828.
---
Resolution: Cannot Reproduce

> Hit UnrecognizedPropertyException when run tpch queries
> ---
>
> Key: DRILL-6828
> URL: https://issues.apache.org/jira/browse/DRILL-6828
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
> Environment: RHEL 7,   Apache Drill commit id: 
> 18e09a1b1c801f2691a05ae7db543bf71874cfea
>Reporter: Dechang Gu
>Assignee: Robert Hou
>Priority: Blocker
> Fix For: 1.15.0
>
>
> Installed Apache Drill 1.15.0 commit id: 
> 18e09a1b1c801f2691a05ae7db543bf71874cfea DRILL-6763: Codegen optimization of 
> SQL functions with constant values(\#1481)
> Hit the following errors:
> {code}
> java.sql.SQLException: SYSTEM ERROR: UnrecognizedPropertyException: 
> Unrecognized field "outgoingBatchSize" (class 
> org.apache.drill.exec.physical.config.HashPartitionSender), not marked as 
> ignorable (9 known properties: "receiver-major-fragment", 
> "initialAllocation", "expr", "userName", "@id", "child", "cost", 
> "destinations", "maxAllocation"])
>  at [Source: (StringReader); line: 1000, column: 29] (through reference 
> chain: 
> org.apache.drill.exec.physical.config.HashPartitionSender["outgoingBatchSize"])
> Fragment 3:175
> Please, refer to logs for more information.
> [Error Id: cc023cdb-9a46-4edd-ad0b-6da1e9085291 on ucs-node6.perf.lab:31010]
> at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528)
> at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:600)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1288)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
> at 
> org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667)
> at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1109)
> at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1120)
> at 
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
> at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:196)
> at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
> at 
> org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:227)
> at PipSQueak.executeQuery(PipSQueak.java:289)
> at PipSQueak.runTest(PipSQueak.java:104)
> at PipSQueak.main(PipSQueak.java:477)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR: UnrecognizedPropertyException: Unrecognized field "outgoingBatchSize" 
> (class org.apache.drill.exec.physical.config.HashPartitionSender), not marked 
> as ignorable (9 known properties: "receiver-major-fragment", 
> "initialAllocation", "expr", "userName", "@id", "child", "cost", 
> "destinations", "maxAllocation"])
>  at [Source: (StringReader); line: 1000, column: 29] (through reference 
> chain: 
> org.apache.drill.exec.physical.config.HashPartitionSender["outgoingBatchSize"])
> {code}





[jira] [Closed] (DRILL-6828) Hit UnrecognizedPropertyException when run tpch queries

2018-11-26 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-6828.
-






[jira] [Commented] (DRILL-6828) Hit UnrecognizedPropertyException when run tpch queries

2018-11-26 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699725#comment-16699725
 ] 

Robert Hou commented on DRILL-6828:
---

I think this was a problem with how the build was distributed to the nodes.  I 
will close this for now, and re-open if we hit it again.
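
One way to confirm that every node is running the same build is to query the cluster's system tables. This is a sketch; `sys.version` reports the commit of the connected drillbit, and the `version` column on `sys.drillbits` is an assumption based on recent Drill releases:

```sql
-- Commit id of the drillbit the client is connected to.
SELECT commit_id, commit_message FROM sys.version;

-- One row per drillbit in the cluster; a mismatched version here would
-- indicate the build was not distributed to every node.
SELECT hostname, version FROM sys.drillbits;
```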






[jira] [Closed] (DRILL-6567) Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.

2018-11-20 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-6567.
-

> Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: 
> java.lang.reflect.UndeclaredThrowableException.
> ---
>
> Key: DRILL-6567
> URL: https://issues.apache.org/jira/browse/DRILL-6567
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 93.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query93.sql
> SELECT ss_customer_sk,
> Sum(act_sales) sumsales
> FROM   (SELECT ss_item_sk,
> ss_ticket_number,
> ss_customer_sk,
> CASE
> WHEN sr_return_quantity IS NOT NULL THEN
> ( ss_quantity - sr_return_quantity ) * ss_sales_price
> ELSE ( ss_quantity * ss_sales_price )
> END act_sales
> FROM   store_sales
> LEFT OUTER JOIN store_returns
> ON ( sr_item_sk = ss_item_sk
> AND sr_ticket_number = ss_ticket_number ),
> reason
> WHERE  sr_reason_sk = r_reason_sk
> AND r_reason_desc = 'reason 38') t
> GROUP  BY ss_customer_sk
> ORDER  BY sumsales,
> ss_customer_sk
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: 
> java.lang.reflect.UndeclaredThrowableException
> Setup failed for null
> Fragment 4:56
> [Error Id: 3c72c14d-9362-4a9b-affb-5cf937bed89e on atsqa6c82.qa.lab:31010]
>   (org.apache.drill.common.exceptions.ExecutionSetupException) 
> java.lang.reflect.UndeclaredThrowableException
> 
> org.apache.drill.common.exceptions.ExecutionSetupException.fromThrowable():30
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():327
> org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245
> org.apache.drill.exec.physical.impl.ScanBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():118
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():103
> 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():93
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
>   Caused By (java.util.concurrent.ExecutionException) 
> 

[jira] [Resolved] (DRILL-6567) Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.

2018-11-20 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-6567.
---
Resolution: Fixed
  Assignee: Vitalii Diravka  (was: Robert Hou)

> Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: 
> java.lang.reflect.UndeclaredThrowableException.
> ---
>
> Key: DRILL-6567
> URL: https://issues.apache.org/jira/browse/DRILL-6567
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 93.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query93.sql
> SELECT ss_customer_sk,
> Sum(act_sales) sumsales
> FROM   (SELECT ss_item_sk,
> ss_ticket_number,
> ss_customer_sk,
> CASE
> WHEN sr_return_quantity IS NOT NULL THEN
> ( ss_quantity - sr_return_quantity ) * ss_sales_price
> ELSE ( ss_quantity * ss_sales_price )
> END act_sales
> FROM   store_sales
> LEFT OUTER JOIN store_returns
> ON ( sr_item_sk = ss_item_sk
> AND sr_ticket_number = ss_ticket_number ),
> reason
> WHERE  sr_reason_sk = r_reason_sk
> AND r_reason_desc = 'reason 38') t
> GROUP  BY ss_customer_sk
> ORDER  BY sumsales,
> ss_customer_sk
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: 
> java.lang.reflect.UndeclaredThrowableException
> Setup failed for null
> Fragment 4:56
> [Error Id: 3c72c14d-9362-4a9b-affb-5cf937bed89e on atsqa6c82.qa.lab:31010]
>   (org.apache.drill.common.exceptions.ExecutionSetupException) 
> java.lang.reflect.UndeclaredThrowableException
> 
> org.apache.drill.common.exceptions.ExecutionSetupException.fromThrowable():30
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():327
> org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245
> org.apache.drill.exec.physical.impl.ScanBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():118
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():103
> 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():93
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
>   Caused By 

[jira] [Closed] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_1

2018-11-20 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-6569.
-

> Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not 
> read value at 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> --
>
> Key: DRILL-6569
> URL: https://issues.apache.org/jira/browse/DRILL-6569
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Storage - Hive
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 19.
> I am able to scan the parquet file using:
>select * from 
> dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet`
> and I get 3,349,279 rows selected.
> There are roughly 15 similar failures in the Advanced nightly run, out of 37 
> failures.  So this issue accounts for about half the failures.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql
> SELECT i_brand_id  brand_id,
> i_brand brand,
> i_manufact_id,
> i_manufact,
> Sum(ss_ext_sales_price) ext_price
> FROM   date_dim,
> store_sales,
> item,
> customer,
> customer_address,
> store
> WHERE  d_date_sk = ss_sold_date_sk
> AND ss_item_sk = i_item_sk
> AND i_manager_id = 38
> AND d_moy = 12
> AND d_year = 1998
> AND ss_customer_sk = c_customer_sk
> AND c_current_addr_sk = ca_address_sk
> AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5)
> AND ss_store_sk = s_store_sk
> GROUP  BY i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> ORDER  BY ext_price DESC,
> i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 
> 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> Fragment 4:26
> [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010]
>   (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 
> 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> 
> hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243
> hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57
> 
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417
> org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
> org.apache.drill.exec.physical.impl.ScanBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> 

[jira] [Commented] (DRILL-6567) Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.

2018-11-20 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693961#comment-16693961
 ] 

Robert Hou commented on DRILL-6567:
---

Enabling "store.hive.optimize_scan_with_native_readers" allows the test to pass.

> Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: 
> java.lang.reflect.UndeclaredThrowableException.
> ---
>
> Key: DRILL-6567
> URL: https://issues.apache.org/jira/browse/DRILL-6567
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Robert Hou
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 93.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query93.sql
> SELECT ss_customer_sk,
> Sum(act_sales) sumsales
> FROM   (SELECT ss_item_sk,
> ss_ticket_number,
> ss_customer_sk,
> CASE
> WHEN sr_return_quantity IS NOT NULL THEN
> ( ss_quantity - sr_return_quantity ) * ss_sales_price
> ELSE ( ss_quantity * ss_sales_price )
> END act_sales
> FROM   store_sales
> LEFT OUTER JOIN store_returns
> ON ( sr_item_sk = ss_item_sk
> AND sr_ticket_number = ss_ticket_number ),
> reason
> WHERE  sr_reason_sk = r_reason_sk
> AND r_reason_desc = 'reason 38') t
> GROUP  BY ss_customer_sk
> ORDER  BY sumsales,
> ss_customer_sk
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: 
> java.lang.reflect.UndeclaredThrowableException
> Setup failed for null
> Fragment 4:56
> [Error Id: 3c72c14d-9362-4a9b-affb-5cf937bed89e on atsqa6c82.qa.lab:31010]
>   (org.apache.drill.common.exceptions.ExecutionSetupException) 
> java.lang.reflect.UndeclaredThrowableException
> 
> org.apache.drill.common.exceptions.ExecutionSetupException.fromThrowable():30
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():327
> org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245
> org.apache.drill.exec.physical.impl.ScanBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():118
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():103
> 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():93
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> 

[jira] [Commented] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/

2018-11-20 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693962#comment-16693962
 ] 

Robert Hou commented on DRILL-6569:
---

Enabling "store.hive.optimize_scan_with_native_readers" allows the test to pass.

> Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not 
> read value at 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> --
>
> Key: DRILL-6569
> URL: https://issues.apache.org/jira/browse/DRILL-6569
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Storage - Hive
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 19.
> I am able to scan the parquet file using:
>select * from 
> dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet`
> and I get 3,349,279 rows selected.
> There are roughly 15 similar failures in the Advanced nightly run, out of 37 
> failures.  So this issue accounts for about half the failures.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql
> SELECT i_brand_id  brand_id,
> i_brand brand,
> i_manufact_id,
> i_manufact,
> Sum(ss_ext_sales_price) ext_price
> FROM   date_dim,
> store_sales,
> item,
> customer,
> customer_address,
> store
> WHERE  d_date_sk = ss_sold_date_sk
> AND ss_item_sk = i_item_sk
> AND i_manager_id = 38
> AND d_moy = 12
> AND d_year = 1998
> AND ss_customer_sk = c_customer_sk
> AND c_current_addr_sk = ca_address_sk
> AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5)
> AND ss_store_sk = s_store_sk
> GROUP  BY i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> ORDER  BY ext_price DESC,
> i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 
> 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> Fragment 4:26
> [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010]
>   (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 
> 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> 
> hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243
> hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57
> 
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417
> org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
> org.apache.drill.exec.physical.impl.ScanBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> 
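
One way to sanity-check the file outside of Drill and Hive is with parquet-tools, assuming it is installed and the file is first copied out of maprfs (an editor's sketch, not part of the original report):
{noformat}
hadoop fs -copyToLocal maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet .
parquet-tools meta 1_13_1.parquet        # print schema and row-group metadata
parquet-tools cat 1_13_1.parquet | head  # decode the first rows; a decoding problem should surface here
{noformat}
If parquet-tools decodes the file cleanly, that points at the Hive Parquet reader path used by Drill rather than at file corruption.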

[jira] [Updated] (DRILL-6787) Update Spnego webpage

2018-10-10 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6787:
--
Description: 
A few things should be updated on this webpage:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

When configuring drillbits in drill-override.conf, the principal and keytab 
should be corrected.  There are two places where this should be corrected.
{noformat}
drill.exec.http: {
  auth.spnego.principal:"HTTP/hostname@realm",
  auth.spnego.keytab:"path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
{noformat}
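
For reference, one way to confirm that the principal and keytab line up before restarting the drillbit is with the standard Kerberos tools (the path and principal below are placeholders, not values from this issue):
{noformat}
# List the principals stored in the keytab; HTTP/hostname@realm should appear.
klist -kt /path/to/keytab

# Confirm the keytab can obtain a ticket for the service principal.
kinit -kt /path/to/keytab HTTP/hostname@realm
{noformat}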

For the section on Chrome, we should change "hostname/domain" to "domain".  
Also, the two blanks around the "=" should be removed.
{noformat}
google-chrome --auth-server-whitelist="domain"
example: google-chrome --auth-server-whitelist="machine.example.com"
example: google-chrome --auth-server-whitelist="*.example.com"

The IP address can also be used
example: google-chrome --auth-server-whitelist="10.10.100.101"

The URL given to Chrome to access the Web UI should match the domain specified 
in auth-server-whitelist.  If the domain is used in auth-server-whitelist, then 
the domain should be used with Chrome.  If the IP address is used in 
auth-server-whitelist, then the IP address should be used with Chrome.
{noformat}

Also, Linux and Mac should be treated in separate paragraphs.  These should be 
the directions for Mac:
{noformat}
cd /Applications/Google Chrome.app/Contents/MacOS
./"Google Chrome" --auth-server-whitelist="*.example.com"
{noformat}

  was:
A few things should be updated on this webpage:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

When configuring drillbits in drill-override.conf, the principal and keytab 
should be corrected.  There are two places where this should be corrected.
{noformat}
drill.exec.http: {
  auth.spnego.principal:"HTTP/hostname@realm",
  auth.spnego.keytab:"path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
{noformat}
For the section on Chrome, we should change "hostname/domain" to "domain".  
Also, the two blanks around the "=" should be removed.
{noformat}
google-chrome --auth-server-whitelist="domain"
example: google-chrome --auth-server-whitelist="machine.example.com"
example: google-chrome --auth-server-whitelist="*.example.com"

The IP address can also be used
example: google-chrome --auth-server-whitelist="10.10.100.101"
The URL given to Chrome to access the Web UI should match the domain specified 
in auth-server-whitelist.  If the domain is used in auth-server-whitelist, then 
the domain should be used with Chrome.  If the IP address is used in 
auth-server-whitelist, then the IP address should be used with Chrome.
{noformat}

Also, Linux and Mac should be treated in separate paragraphs.  These should be 
the directions for Mac:
{noformat}
cd /Applications/Google Chrome.app/Contents/MacOS
./"Google Chrome" --auth-server-whitelist="*.example.com"
{noformat}


> Update Spnego webpage
> -
>
> Key: DRILL-6787
> URL: https://issues.apache.org/jira/browse/DRILL-6787
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
> Fix For: 1.15.0
>
>
> A few things should be updated on this webpage:
> https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/
> When configuring drillbits in drill-override.conf, the principal and keytab 
> should be corrected.  There are two places where this should be corrected.
> {noformat}
> drill.exec.http: {
>   auth.spnego.principal:"HTTP/hostname@realm",
>   auth.spnego.keytab:"path/to/keytab",
>   auth.mechanisms: ["SPNEGO"]
> }
> {noformat}
> For the section on Chrome, we should change "hostname/domain" to "domain".  
> Also, the two blanks around the "=" should be removed.
> {noformat}
> google-chrome --auth-server-whitelist="domain"
> example: google-chrome --auth-server-whitelist="machine.example.com"
> example: google-chrome --auth-server-whitelist="*.example.com"
> The IP address can also be used
> example: google-chrome --auth-server-whitelist="10.10.100.101"
> The URL given to Chrome to access the Web UI should match the domain 
> specified in auth-server-whitelist.  If the domain is used in 
> auth-server-whitelist, then the domain should be used with Chrome.  If the IP 
> address is used in auth-server-whitelist, then the IP address should be used 
> with Chrome.
> {noformat}
> Also, Linux and Mac should be treated in separate paragraphs.  These should 
> be the directions for Mac:
> {noformat}
> cd /Applications/Google Chrome.app/Contents/MacOS
> ./"Google Chrome" 

[jira] [Updated] (DRILL-6787) Update Spnego webpage

2018-10-10 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6787:
--
Description: 
A few things should be updated on this webpage:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

When configuring drillbits in drill-override.conf, the principal and keytab 
should be corrected.  There are two places where this should be corrected.
{noformat}
drill.exec.http: {
  auth.spnego.principal:"HTTP/hostname@realm",
  auth.spnego.keytab:"path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
{noformat}
For the section on Chrome, we should change "hostname/domain" to "domain".  
Also, the two blanks around the "=" should be removed.
{noformat}
google-chrome --auth-server-whitelist="domain"
example: google-chrome --auth-server-whitelist="machine.example.com"
example: google-chrome --auth-server-whitelist="*.example.com"

The IP address can also be used
example: google-chrome --auth-server-whitelist="10.10.100.101"
{noformat}
The URL given to Chrome to access the Web UI should match the domain specified 
in auth-server-whitelist.  If the domain is used in auth-server-whitelist, then 
the domain should be used with Chrome.  If the IP address is used in 
auth-server-whitelist, then the IP address should be used with Chrome.

Also, Linux and Mac should be treated in separate paragraphs.  These should be 
the directions for Mac:
{noformat}
cd /Applications/Google Chrome.app/Contents/MacOS
./"Google Chrome" --auth-server-whitelist="*.example.com"
{noformat}

  was:
A few things should be updated on this webpage:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

When configuring drillbits in drill-override.conf, the principal and keytab 
should be corrected.  There are two places where this should be corrected.
{noformat}
drill.exec.http: {
  auth.spnego.principal:"HTTP/hostname@realm",
  auth.spnego.keytab:"path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
{noformat}
For the section on Chrome, we should change "hostname/domain" to "domain".  
Also, the two blanks around the "=" should be removed.
{noformat}
google-chrome --auth-server-whitelist="domain"
example: google-chrome --auth-server-whitelist="machine.example.com"
example: google-chrome --auth-server-whitelist="*.example.com"

The IP address can also be used
example: google-chrome --auth-server-whitelist="10.10.100.101"

The URL given to Chrome to access the Web UI should match the domain specified 
in auth-server-whitelist.  If the domain is used in auth-server-whitelist, then 
the domain should be used with Chrome.  If the IP address is used in 
auth-server-whitelist, then the IP address should be used with Chrome.
{noformat}

Also, Linux and Mac should be treated in separate paragraphs.  These should be 
the directions for Mac:
{noformat}
cd /Applications/Google Chrome.app/Contents/MacOS
./"Google Chrome" --auth-server-whitelist="*.example.com"
{noformat}


> Update Spnego webpage
> -
>
> Key: DRILL-6787
> URL: https://issues.apache.org/jira/browse/DRILL-6787
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
> Fix For: 1.15.0
>
>
> A few things should be updated on this webpage:
> https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/
> When configuring drillbits in drill-override.conf, the principal and keytab 
> should be corrected.  There are two places where this should be corrected.
> {noformat}
> drill.exec.http: {
>   auth.spnego.principal:"HTTP/hostname@realm",
>   auth.spnego.keytab:"path/to/keytab",
>   auth.mechanisms: ["SPNEGO"]
> }
> {noformat}
> For the section on Chrome, we should change "hostname/domain" to "domain".  
> Also, the two blanks around the "=" should be removed.
> {noformat}
> google-chrome --auth-server-whitelist="domain"
> example: google-chrome --auth-server-whitelist="machine.example.com"
> example: google-chrome --auth-server-whitelist="*.example.com"
> The IP address can also be used
> example: google-chrome --auth-server-whitelist="10.10.100.101"
> {noformat}
> The URL given to Chrome to access the Web UI should match the domain 
> specified in auth-server-whitelist.  If the domain is used in 
> auth-server-whitelist, then the domain should be used with Chrome.  If the IP 
> address is used in auth-server-whitelist, then the IP address should be used 
> with Chrome.
> Also, Linux and Mac should be treated in separate paragraphs.  These should 
> be the directions for Mac:
> {noformat}
> cd /Applications/Google Chrome.app/Contents/MacOS
> ./"Google Chrome" 

[jira] [Updated] (DRILL-6787) Update Spnego webpage

2018-10-10 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6787:
--
Description: 
A few things should be updated on this webpage:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

When configuring drillbits in drill-override.conf, the principal and keytab 
should be corrected.  There are two places where this should be corrected.
{noformat}
drill.exec.http: {
  auth.spnego.principal:"HTTP/hostname@realm",
  auth.spnego.keytab:"path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
{noformat}
For the section on Chrome, we should change "hostname/domain" to "domain".  
Also, the two blanks around the "=" should be removed.
{noformat}
google-chrome --auth-server-whitelist="domain"
example: google-chrome --auth-server-whitelist="machine.example.com"
example: google-chrome --auth-server-whitelist="*.example.com"

The IP address can also be used
example: google-chrome --auth-server-whitelist="10.10.100.101"
The URL given to Chrome to access the Web UI should match the domain specified 
in auth-server-whitelist.  If the domain is used in auth-server-whitelist, then 
the domain should be used with Chrome.  If the IP address is used in 
auth-server-whitelist, then the IP address should be used with Chrome.
{noformat}

Also, Linux and Mac should be treated in separate paragraphs.  These should be 
the directions for Mac:
{noformat}
cd /Applications/Google Chrome.app/Contents/MacOS
./"Google Chrome" --auth-server-whitelist="*.example.com"
{noformat}

  was:
A few things should be updated on this webpage:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

When configuring drillbits in drill-override.conf, the principal and keytab 
should be corrected.  There are two places where this should be corrected.
{noformat}
drill.exec.http: {
  auth.spnego.principal:"HTTP/hostname@realm",
  auth.spnego.keytab:"path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
{noformat}
For the section on Chrome, we should change "hostname/domain" to "domain".  
Also, the two blanks around the "=" should be removed.
{noformat}
google-chrome --auth-server-whitelist="domain"
example: google-chrome --auth-server-whitelist="machine.example.com"
example: google-chrome --auth-server-whitelist="*.example.com"

The IP address can also be used
example: google-chrome --auth-server-whitelist="10.10.100.101"
{noformat}
The URL given to Chrome to access the Web UI should match the domain specified 
in auth-server-whitelist.  If the domain is used in auth-server-whitelist, then 
the domain should be used with Chrome.  If the IP address is used in 
auth-server-whitelist, then the IP address should be used with Chrome.

Also, Linux and Mac should be treated in separate paragraphs.  These should be 
the directions for Mac:
{noformat}
cd /Applications/Google Chrome.app/Contents/MacOS
./"Google Chrome" --auth-server-whitelist="*.example.com"
{noformat}


> Update Spnego webpage
> -
>
> Key: DRILL-6787
> URL: https://issues.apache.org/jira/browse/DRILL-6787
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
> Fix For: 1.15.0
>
>
> A few things should be updated on this webpage:
> https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/
> When configuring drillbits in drill-override.conf, the principal and keytab 
> should be corrected.  There are two places where this should be corrected.
> {noformat}
> drill.exec.http: {
>   auth.spnego.principal:"HTTP/hostname@realm",
>   auth.spnego.keytab:"path/to/keytab",
>   auth.mechanisms: ["SPNEGO"]
> }
> {noformat}
> For the section on Chrome, we should change "hostname/domain" to "domain".  
> Also, the two blanks around the "=" should be removed.
> {noformat}
> google-chrome --auth-server-whitelist="domain"
> example: google-chrome --auth-server-whitelist="machine.example.com"
> example: google-chrome --auth-server-whitelist="*.example.com"
> The IP address can also be used
> example: google-chrome --auth-server-whitelist="10.10.100.101"
> The URL given to Chrome to access the Web UI should match the domain 
> specified in auth-server-whitelist.  If the domain is used in 
> auth-server-whitelist, then the domain should be used with Chrome.  If the IP 
> address is used in auth-server-whitelist, then the IP address should be used 
> with Chrome.
> {noformat}
> Also, Linux and Mac should be treated in separate paragraphs.  These should 
> be the directions for Mac:
> {noformat}
> cd /Applications/Google Chrome.app/Contents/MacOS
> ./"Google Chrome" 

[jira] [Updated] (DRILL-6787) Update Spnego webpage

2018-10-10 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6787:
--
Description: 
A few things should be updated on this webpage:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

When configuring drillbits in drill-override.conf, the principal and keytab 
should be corrected.  There are two places where this should be corrected.
{noformat}
drill.exec.http: {
  auth.spnego.principal:"HTTP/hostname@realm",
  auth.spnego.keytab:"path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
{noformat}
For the section on Chrome, we should change "hostname/domain" to "domain".  
Also, the two blanks around the "=" should be removed.
{noformat}
google-chrome --auth-server-whitelist="domain"
example: google-chrome --auth-server-whitelist="machine.example.com"
example: google-chrome --auth-server-whitelist="*.example.com"

The IP address can also be used
example: google-chrome --auth-server-whitelist="10.10.100.101"

The URL given to Chrome to access the Web UI should match the domain specified 
in auth-server-whitelist.  If the domain is used in auth-server-whitelist, then 
the domain should be used with Chrome.  If the IP address is used in 
auth-server-whitelist, then the IP address should be used with Chrome.
{noformat}

Also, Linux and Mac should be treated in separate paragraphs.  These should be 
the directions for Mac:
{noformat}
cd /Applications/Google Chrome.app/Contents/MacOS
./"Google Chrome" --auth-server-whitelist="*.example.com"
{noformat}

  was:
A few things should be updated on this webpage:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

When configuring drillbits in drill-override.conf, the principal and keytab 
should be corrected.  There are two places where this should be corrected.
{noformat}
drill.exec.http: {
  auth.spnego.principal:"HTTP/hostname@realm",
  auth.spnego.keytab:"path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
{noformat}
For the section on Chrome, we should change "hostname/domain" to "domain".  Or 
"hostname@domain".  Also, the two blanks around the "=" should be removed.
{noformat}
google-chrome --auth-server-whitelist="hostname/domain"
{noformat}
Also, for the section on Chrome, the "domain" should match the URL given to 
Chrome to access the Web UI.

Also, Linux and Mac should be treated in separate paragraphs.  These should be 
the directions for Mac:
{noformat}
cd /Applications/Google Chrome.app/Contents/MacOS
./"Google Chrome" --auth-server-whitelist="example.com"
{noformat}


> Update Spnego webpage
> -
>
> Key: DRILL-6787
> URL: https://issues.apache.org/jira/browse/DRILL-6787
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
> Fix For: 1.15.0
>
>
> A few things should be updated on this webpage:
> https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/
> When configuring drillbits in drill-override.conf, the principal and keytab 
> should be corrected.  There are two places where this should be corrected.
> {noformat}
> drill.exec.http: {
>   auth.spnego.principal:"HTTP/hostname@realm",
>   auth.spnego.keytab:"path/to/keytab",
>   auth.mechanisms: ["SPNEGO"]
> }
> {noformat}
> For the section on Chrome, we should change "hostname/domain" to "domain".  
> Also, the two blanks around the "=" should be removed.
> {noformat}
> google-chrome --auth-server-whitelist="domain"
> example: google-chrome --auth-server-whitelist="machine.example.com"
> example: google-chrome --auth-server-whitelist="*.example.com"
> The IP address can also be used
> example: google-chrome --auth-server-whitelist="10.10.100.101"
> The URL given to Chrome to access the Web UI should match the domain 
> specified in auth-server-whitelist.  If the domain is used in 
> auth-server-whitelist, then the domain should be used with Chrome.  If the IP 
> address is used in auth-server-whitelist, then the IP address should be used 
> with Chrome.
> {noformat}
> Also, Linux and Mac should be treated in separate paragraphs.  These should 
> be the directions for Mac:
> {noformat}
> cd /Applications/Google Chrome.app/Contents/MacOS
> ./"Google Chrome" --auth-server-whitelist="*.example.com"
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6787) Update Spnego webpage

2018-10-09 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6787:
-

 Summary: Update Spnego webpage
 Key: DRILL-6787
 URL: https://issues.apache.org/jira/browse/DRILL-6787
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Bridget Bevens
 Fix For: 1.15.0


A few things should be updated on this webpage:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

When configuring drillbits in drill-override.conf, the principal and keytab 
should be corrected.  There are two places where this should be corrected.
{noformat}
drill.exec.http: {
  auth.spnego.principal:"HTTP/hostname@realm",
  auth.spnego.keytab:"path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
{noformat}
For the section on Chrome, we should change "hostname/domain" to "domain".  Or 
"hostname@domain".  Also, the two blanks around the "=" should be removed.
{noformat}
google-chrome --auth-server-whitelist="hostname/domain"
{noformat}
Also, for the section on Chrome, the "domain" should match the URL given to 
Chrome to access the Web UI.

Also, Linux and Mac should be treated in separate paragraphs.  These should be 
the directions for Mac:
{noformat}
cd /Applications/Google Chrome.app/Contents/MacOS
./"Google Chrome" --auth-server-whitelist="example.com"
{noformat}





[jira] [Commented] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-04 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603787#comment-16603787
 ] 

Robert Hou commented on DRILL-6726:
---

[~arina] I have verified the fix.  Thanks!

Yes, we test with impersonation enabled most of the time.

> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: student
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before (DRILL-6492) was committed, and this view 
> references a file that includes a schema which has upper case letters, the 
> view needs to be rebuilt.  There may be variations on this issue that I have 
> not seen.
> To reproduce this problem, create a dfs workspace like this:
> {noformat}
> "drillTestDirP1": {
>   "location": "/drill/testdata/p1tests",
>   "writable": true,
>   "defaultInputFormat": "parquet",
>   "allowAccessOutsideWorkspace": false
> },
> {noformat}
> Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
> command:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_test_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute 
> this query:
> {noformat}
> select * from student_test_v;
> {noformat}
> Drill will return an exception:
> {noformat}
> Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
> schema drillTestDirP1 not available in schema dfs.
> View Context dfs, drillTestDirP1
> View SQL SELECT *
> FROM `dfs.drillTestDirP1`.`student`
> [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> I have attached the student parquet file I used.
> This is what the .view.drill file looks like:
> {noformat}
> {
>   "name" : "student_test_v",
>   "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
>   "fields" : [ {
> "name" : "**",
> "type" : "DYNAMIC_STAR",
> "isNullable" : true
>   } ],
>   "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
> }
> {noformat}
> This means that users may not be able to access views that they have created 
> using previous versions of Drill.  We should maintain backwards 
> compatibility where possible.
> As a workaround, these views can be re-created.  It would be helpful to users if 
> if the error message explains that these views need to be re-created.





[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-03 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6726:
--
Description: 
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
file that includes a schema which has upper case letters, the view needs to be 
rebuilt.  There may be variations on this issue that I have not seen.

To reproduce this problem, create a dfs workspace like this:
{noformat}
"drillTestDirP1": {
  "location": "/drill/testdata/p1tests",
  "writable": true,
  "defaultInputFormat": "parquet",
  "allowAccessOutsideWorkspace": false
},
{noformat}

Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
command:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_test_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
query:
{noformat}
select * from student_test_v;
{noformat}
Drill will return an exception:
{noformat}
Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
schema drillTestDirP1 not available in schema dfs.

View Context dfs, drillTestDirP1
View SQL SELECT *
FROM `dfs.drillTestDirP1`.`student`

[Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
(state=,code=0)
{noformat}

I have attached the student parquet file I used.

This is what the .view.drill file looks like:
{noformat}
{
  "name" : "student_test_v",
  "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
  "fields" : [ {
"name" : "**",
"type" : "DYNAMIC_STAR",
"isNullable" : true
  } ],
  "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
}
{noformat}

This means that users may not be able to access views that they have created 
using previous versions of Drill.  We should maintain backwards compatibility 
where possible.

As a workaround, these views can be re-created.  It would be helpful to users if 
the error message explains that these views need to be re-created.
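
Concretely, rebuilding such a view on a drillbit that already includes DRILL-6492 would use the same command as the repro above (view and table names as in this report):
{noformat}
create or replace view `dfs.drillTestDirP1`.student_test_v as
  select * from `dfs.drillTestDirP1`.student;
{noformat}
A view re-created this way stores the workspace schema path in the form the case-insensitive lookup expects, so the query in the repro succeeds afterwards.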

  was:
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt.  There may 
be variations on this issue that I have not seen.

To reproduce this problem, create a dfs workspace like this:
{noformat}
"drillTestDirP1": {
  "location": "/drill/testdata/p1tests",
  "writable": true,
  "defaultInputFormat": "parquet",
  "allowAccessOutsideWorkspace": false
},
{noformat}

Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
command:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_test_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
query:
{noformat}
select * from student_test_v;
{noformat}
Drill will return an exception:
{noformat}
Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
schema drillTestDirP1 not available in schema dfs.

View Context dfs, drillTestDirP1
View SQL SELECT *
FROM `dfs.drillTestDirP1`.`student`

[Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
(state=,code=0)
{noformat}

I have attached the student parquet file I used.

This is what the .view.drill file looks like:
{noformat}
{
  "name" : "student_test_v",
  "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
  "fields" : [ {
"name" : "**",
"type" : "DYNAMIC_STAR",
"isNullable" : true
  } ],
  "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
}
{noformat}

This means that users may not be able to access views that they have created 
using previous versions of Drill.  We should maintain backwards compatibility 
where possible.

As a workaround, these views can be re-created.  It would be helpful to users if 
the error message explains that these views need to be re-created.


> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: student
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before (DRILL-6492) was committed, and this view 
> references a file that includes a 

[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-03 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6726:
--
Description: 
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt.  There may 
be variations on this issue that I have not seen.

To reproduce this problem, create a dfs workspace like this:
{noformat}
"drillTestDirP1": {
  "location": "/drill/testdata/p1tests",
  "writable": true,
  "defaultInputFormat": "parquet",
  "allowAccessOutsideWorkspace": false
},
{noformat}

Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
command:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_test_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
query:
{noformat}
select * from student_test_v;
{noformat}
Drill will return an exception:
{noformat}
Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
schema drillTestDirP1 not available in schema dfs.

View Context dfs, drillTestDirP1
View SQL SELECT *
FROM `dfs.drillTestDirP1`.`student`

[Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
(state=,code=0)
{noformat}

I have attached the student parquet file I used.

This is what the .view.drill file looks like:
{noformat}
{
  "name" : "student_test_v",
  "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
  "fields" : [ {
"name" : "**",
"type" : "DYNAMIC_STAR",
"isNullable" : true
  } ],
  "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
}
{noformat}

This means that users may not be able to access views that they have created 
using previous versions of Drill.  We should maintain backwards compatibility 
where possible.

As a workaround, these views can be re-created.  It would be helpful to users if 
the error message explains that these views need to be re-created.
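The workaround can be scripted: a .view.drill file records both the view's SQL and its workspace schema path, so the CREATE OR REPLACE VIEW statement can be regenerated from it and replayed on a newer Drill build. A minimal sketch, assuming the field names ("name", "sql", "workspaceSchemaPath") shown in the .view.drill example above; this is an illustrative helper, not a Drill utility:

```python
import json

def recreate_statement(view_drill_json: str) -> str:
    """Rebuild a CREATE OR REPLACE VIEW statement from .view.drill contents."""
    view = json.loads(view_drill_json)
    # The workspace path is stored as a list, e.g. ["dfs", "drillTestDirP1"].
    schema = ".".join(view["workspaceSchemaPath"])
    # Flatten the stored SQL, which may contain embedded newlines.
    sql = " ".join(view["sql"].split())
    return f"create or replace view `{schema}`.{view['name']} as {sql}"

# Sample .view.drill contents, matching the example in this report.
view_file = '''{
  "name" : "student_test_v",
  "sql" : "SELECT *\\nFROM `dfs.drillTestDirP1`.`student`",
  "fields" : [ { "name" : "**", "type" : "DYNAMIC_STAR", "isNullable" : true } ],
  "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
}'''

print(recreate_statement(view_file))
# create or replace view `dfs.drillTestDirP1`.student_test_v as SELECT * FROM `dfs.drillTestDirP1`.`student`
```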

  was:
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt.  There may 
be variations on this issue that I have not seen.

To reproduce this problem, create a workspace with this configuration:
{noformat}
"drillTestDirP1": {
  "location": "/drill/testdata/p1tests",
  "writable": true,
  "defaultInputFormat": "parquet",
  "allowAccessOutsideWorkspace": false
},
{noformat}


use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
command:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
query:
{noformat}
select * from student_test_v;
{noformat}
Drill will return an exception:
{noformat}
Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
schema drillTestDirP1 not available in schema dfs.

View Context dfs, drillTestDirP1
View SQL SELECT *
FROM `dfs.drillTestDirP1`.`student`

[Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
(state=,code=0)
{noformat}

I have attached the student parquet file I used.



This is what the .view.drill file looks like:
{noformat}
{
  "name" : "student_test_v",
  "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
  "fields" : [ {
"name" : "**",
"type" : "DYNAMIC_STAR",
"isNullable" : true
  } ],
  "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
}
{noformat}

This means that users may not be able to access views that they have created 
using previous versions of Drill.  We should maintain backwards compatibility 
where possible.

As a workaround, these views can be re-created.  It would be helpful to users if 
the error message explains that these views need to be re-created.


> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: student
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before (DRILL-6492) was committed, and this view 
> references 

[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-03 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6726:
--
Description: 
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt.  There may 
be variations on this issue that I have not seen.

To reproduce this problem, create a workspace with this configuration:
{noformat}
"drillTestDirP1": {
  "location": "/drill/testdata/p1tests",
  "writable": true,
  "defaultInputFormat": "parquet",
  "allowAccessOutsideWorkspace": false
},
{noformat}


use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
command:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
query:
{noformat}
select * from student_test_v;
{noformat}
Drill will return an exception:
{noformat}
Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
schema drillTestDirP1 not available in schema dfs.

View Context dfs, drillTestDirP1
View SQL SELECT *
FROM `dfs.drillTestDirP1`.`student`

[Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
(state=,code=0)
{noformat}

I have attached the student parquet file I used.



This is what the .view.drill file looks like:
{noformat}
{
  "name" : "student_test_v",
  "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
  "fields" : [ {
"name" : "**",
"type" : "DYNAMIC_STAR",
"isNullable" : true
  } ],
  "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
}
{noformat}

This means that users may not be able to access views that they have created 
using previous versions of Drill.  We should maintain backwards compatibility 
where possible.

As a workaround, these views can be re-created.  It would be helpful to users if 
the error message explains that these views need to be re-created.

  was:
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt.  There may 
be variations on this issue that I have not seen.

To reproduce this problem, use Drill commit 
ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
query:
{noformat}
select * from student_test_v;
{noformat}
Drill will return an exception:
{noformat}
Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
schema drillTestDirP1 not available in schema dfs.

View Context dfs, drillTestDirP1
View SQL SELECT *
FROM `dfs.drillTestDirP1`.`student`

[Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
(state=,code=0)
{noformat}

I have attached the student parquet file I used.

This is the workspace configuration I used:
{noformat}
"drillTestDirP1": {
  "location": "/drill/testdata/p1tests",
  "writable": true,
  "defaultInputFormat": "parquet",
  "allowAccessOutsideWorkspace": false
},
{noformat}

This is what the .view.drill file looks like:
{noformat}
{
  "name" : "student_test_v",
  "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
  "fields" : [ {
"name" : "**",
"type" : "DYNAMIC_STAR",
"isNullable" : true
  } ],
  "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
}
{noformat}

This means that users may not be able to access views that they have created 
using previous versions of Drill.  We should maintain backwards compatibility 
where possible.

As a workaround, these views can be re-created.  It would be helpful to users if 
the error message explains that these views need to be re-created.


> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: student
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before (DRILL-6492) was committed, and this view 
> 

[jira] [Commented] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-03 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602488#comment-16602488
 ] 

Robert Hou commented on DRILL-6726:
---

[~arina] I updated the description and attached the parquet file that I used.

> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: student
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before (DRILL-6492) was committed, and this view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.  There may be variations on this issue that I have not seen.
> To reproduce this problem, use Drill commit 
> ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute 
> this query:
> {noformat}
> select * from student_test_v;
> {noformat}
> Drill will return an exception:
> {noformat}
> Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
> schema drillTestDirP1 not available in schema dfs.
> View Context dfs, drillTestDirP1
> View SQL SELECT *
> FROM `dfs.drillTestDirP1`.`student`
> [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> I have attached the student parquet file I used.
> This is the workspace configuration I used:
> {noformat}
> "drillTestDirP1": {
>   "location": "/drill/testdata/p1tests",
>   "writable": true,
>   "defaultInputFormat": "parquet",
>   "allowAccessOutsideWorkspace": false
> },
> {noformat}
> This is what the .view.drill file looks like:
> {noformat}
> {
>   "name" : "student_test_v",
>   "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
>   "fields" : [ {
> "name" : "**",
> "type" : "DYNAMIC_STAR",
> "isNullable" : true
>   } ],
>   "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
> }
> {noformat}
> This means that users may not be able to access views that they have created 
> using previous versions of Drill.  We should maintain backwards 
> compatibility where possible.
> As a workaround, these views can be re-created.  It would be helpful to users 
> if the error message explains that these views need to be re-created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-03 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6726:
--
Attachment: student

> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: student
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before (DRILL-6492) was committed, and this view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.  There may be variations on this issue that I have not seen.
> To reproduce this problem, use Drill commit 
> ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute 
> this query:
> {noformat}
> select * from student_test_v;
> {noformat}
> Drill will return an exception:
> {noformat}
> Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
> schema drillTestDirP1 not available in schema dfs.
> View Context dfs, drillTestDirP1
> View SQL SELECT *
> FROM `dfs.drillTestDirP1`.`student`
> [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> I have attached the student parquet file I used.
> This is the workspace configuration I used:
> {noformat}
> "drillTestDirP1": {
>   "location": "/drill/testdata/p1tests",
>   "writable": true,
>   "defaultInputFormat": "parquet",
>   "allowAccessOutsideWorkspace": false
> },
> {noformat}
> This is what the .view.drill file looks like:
> {noformat}
> {
>   "name" : "student_test_v",
>   "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
>   "fields" : [ {
> "name" : "**",
> "type" : "DYNAMIC_STAR",
> "isNullable" : true
>   } ],
>   "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
> }
> {noformat}
> This means that users may not be able to access views that they have created 
> using previous versions of Drill.  We should maintain backwards 
> compatibility where possible.
> As a workaround, these views can be re-created.  It would be helpful to users 
> if the error message explains that these views need to be re-created.





[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-03 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6726:
--
Description: 
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt.  There may 
be variations on this issue that I have not seen.

To reproduce this problem, use Drill commit 
ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
query:
{noformat}
select * from student_test_v;
{noformat}
Drill will return an exception:
{noformat}
Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
schema drillTestDirP1 not available in schema dfs.

View Context dfs, drillTestDirP1
View SQL SELECT *
FROM `dfs.drillTestDirP1`.`student`

[Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
(state=,code=0)
{noformat}

I have attached the student parquet file I used.

This is the workspace configuration I used:
{noformat}
"drillTestDirP1": {
  "location": "/drill/testdata/p1tests",
  "writable": true,
  "defaultInputFormat": "parquet",
  "allowAccessOutsideWorkspace": false
},
{noformat}

This is what the .view.drill file looks like:
{noformat}
{
  "name" : "student_test_v",
  "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
  "fields" : [ {
"name" : "**",
"type" : "DYNAMIC_STAR",
"isNullable" : true
  } ],
  "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
}
{noformat}

This means that users may not be able to access views that they have created 
using previous versions of Drill.  We should maintain backwards compatibility 
where possible.

As a workaround, these views can be re-created.  It would be helpful to users if 
the error message explains that these views need to be re-created.

  was:
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt.

To reproduce this problem, use Drill commit 
ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
The use Drill commit 
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
It would be helpful to users if the error message explains that these views 
need to be re-created.


> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before (DRILL-6492) was committed, and this view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.  There may be variations on this issue that I have not seen.
> To reproduce this problem, use Drill commit 
> ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute 
> this query:
> {noformat}
> select * from student_test_v;
> {noformat}
> Drill will return an exception:
> {noformat}
> Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
> schema drillTestDirP1 not available in schema dfs.
> View Context dfs, drillTestDirP1
> View SQL SELECT *
> FROM `dfs.drillTestDirP1`.`student`
> [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> I have attached the student parquet file I used.
> This is the workspace configuration I used:
> {noformat}
> "drillTestDirP1": {
>   "location": "/drill/testdata/p1tests",
>   "writable": true,
>   "defaultInputFormat": "parquet",
>   

[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-03 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6726:
--
Description: 
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt.

To reproduce this problem, use Drill commit 
ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
The use Drill commit 
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
It would be helpful to users if the error message explains that these views 
need to be re-created.

  was:
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt.

To reproduce this problem, use Drill commit 
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
It would be helpful to users if the error message explains that these views 
need to be re-created.


> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before (DRILL-6492) was committed, and this view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.
> To reproduce this problem, use Drill commit 
> ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> The use Drill commit 
> If a query references this schema, Drill will return an exception:
> {noformat}
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> {noformat}
> It would be helpful to users if the error message explains that these views 
> need to be re-created.





[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-03 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6726:
--
Description: 
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt.

To reproduce this problem, use Drill commit 
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
It would be helpful to users if the error message explains that these views 
need to be re-created.

  was:
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt. For example:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
It would be helpful to users if the error message explains that these views 
need to be re-created.


> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before (DRILL-6492) was committed, and this view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.
> To reproduce this problem, use Drill commit 
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> If a query references this schema, Drill will return an exception:
> {noformat}
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> {noformat}
> It would be helpful to users if the error message explains that these views 
> need to be re-created.





[jira] [Commented] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-02 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601712#comment-16601712
 ] 

Robert Hou commented on DRILL-6726:
---

[~arina] I clarified the issue in the description.

Use commit ddb35ce71837376c7caef28c25327ba556bb32f2 to create a view (this is 
the commit prior to DRILL-6492).

Then try to access the view using commit 
8bcb103a0e3bcc5f85a03cbed3c6c0cea254ec4e , which is the commit for DRILL-6492.

> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before (DRILL-6492) was committed, and this view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt. For example:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> If a query references this schema, Drill will return an exception:
> {noformat}
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> {noformat}
> It would be helpful to users if the error message explains that these views 
> need to be re-created.





[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-02 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6726:
--
Description: 
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing 
view was created before (DRILL-6492) was committed, and this view references a 
schema which has upper case letters, the view needs to be rebuilt. For example:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
It would be helpful to users if the error message explains that these views 
need to be re-created.

  was:
Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view 
references a schema which has upper case letters, the view needs to be rebuilt. 
For example:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
It would be helpful to users if the error message explains that these views 
need to be re-created.


> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before (DRILL-6492) was committed, and this view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt. For example:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> If a query references this schema, Drill will return an exception:
> {noformat}
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> {noformat}
> It would be helpful to users if the error message explains that these views 
> need to be re-created.





[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema

2018-09-02 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6726:
--
Summary: Drill should return a better error message when an existing view 
uses a table that has a mixed case schema  (was: Drill should return a better 
error message when a view uses a table that has a mixed case schema)

> Drill should return a better error message when an existing view uses a table 
> that has a mixed case schema
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt. For example:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> If a query references this schema, Drill will return an exception:
> {noformat}
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> {noformat}
> It would be helpful to users if the error message explains that these views 
> need to be re-created.





[jira] [Updated] (DRILL-6726) Drill should return a better error message when a view uses a table that has a mixed case schema

2018-08-31 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6726:
--
Affects Version/s: 1.15.0  (was: 1.14.0)

> Drill should return a better error message when a view uses a table that has 
> a mixed case schema
> 
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.15.0
>
>
> Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt. For example:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> If a query references this schema, Drill will return an exception:
> {noformat}
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> {noformat}
> It would be helpful to users if the error message explains that these views 
> need to be re-created.





[jira] [Updated] (DRILL-6725) Drill 1.15 cannot use existing views if they reference tables with mixed case schemas

2018-08-31 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6725:
--
Affects Version/s: (was: 1.14.0)
   1.15.0

> Drill 1.15 cannot use existing views if they reference tables with mixed case 
> schemas
> -
>
> Key: DRILL-6725
> URL: https://issues.apache.org/jira/browse/DRILL-6725
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
> Fix For: 1.15.0
>
>
> Drill 1.15 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.  For example:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> If a query references this schema, Drill will return an exception:
> {noformat}
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> {noformat}
> Do we have release notes?  If so, this should be documented.





[jira] [Updated] (DRILL-6725) Drill 1.14 cannot use existing views if they reference tables with mixed case schemas

2018-08-31 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6725:
--
Fix Version/s: (was: 1.14.0)
   1.15.0

> Drill 1.14 cannot use existing views if they reference tables with mixed case 
> schemas
> -
>
> Key: DRILL-6725
> URL: https://issues.apache.org/jira/browse/DRILL-6725
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
> Fix For: 1.15.0
>
>
> Drill 1.15 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.  For example:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> If a query references this schema, Drill will return an exception:
> {noformat}
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> {noformat}
> Do we have release notes?  If so, this should be documented.





[jira] [Updated] (DRILL-6725) Drill 1.14 cannot use existing views if they reference tables with mixed case schemas

2018-08-31 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6725:
--
Description: 
Drill 1.15 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
references a schema which has upper case letters, the view needs to be rebuilt. 
 For example:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
Do we have release notes?  If so, this should be documented.

  was:
Drill 1.14 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
references a schema which has upper case letters, the view needs to be rebuilt. 
 For example:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
Do we have release notes?  If so, this should be documented.


> Drill 1.14 cannot use existing views if they reference tables with mixed case 
> schemas
> -
>
> Key: DRILL-6725
> URL: https://issues.apache.org/jira/browse/DRILL-6725
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
> Fix For: 1.15.0
>
>
> Drill 1.15 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.  For example:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> If a query references this schema, Drill will return an exception:
> {noformat}
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> {noformat}
> Do we have release notes?  If so, this should be documented.





[jira] [Updated] (DRILL-6725) Drill 1.15 cannot use existing views if they reference tables with mixed case schemas

2018-08-31 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6725:
--
Summary: Drill 1.15 cannot use existing views if they reference tables with 
mixed case schemas  (was: Drill 1.14 cannot use existing views if they 
reference tables with mixed case schemas)

> Drill 1.15 cannot use existing views if they reference tables with mixed case 
> schemas
> -
>
> Key: DRILL-6725
> URL: https://issues.apache.org/jira/browse/DRILL-6725
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
> Fix For: 1.15.0
>
>
> Drill 1.15 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.  For example:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> If a query references this schema, Drill will return an exception:
> {noformat}
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> {noformat}
> Do we have release notes?  If so, this should be documented.





[jira] [Created] (DRILL-6726) Drill should return a better error message when a view uses a table that has a mixed case schema

2018-08-31 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6726:
-

 Summary: Drill should return a better error message when a view 
uses a table that has a mixed case schema
 Key: DRILL-6726
 URL: https://issues.apache.org/jira/browse/DRILL-6726
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Arina Ielchiieva
 Fix For: 1.15.0


Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view 
references a schema which has upper case letters, the view needs to be rebuilt. 
For example:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
It would be helpful to users if the error message explains that these views 
need to be re-created.





[jira] [Updated] (DRILL-6725) Drill 1.14 cannot use existing views if they reference tables with mixed case schemas

2018-08-31 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6725:
--
Description: 
Drill 1.14 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
references a schema which has upper case letters, the view needs to be rebuilt. 
 For example:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
Do we have release notes?  If so, this should be documented.

  was:
Drill 1.14 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
references a schema which has upper case letters, the view needs to be rebuilt. 
 For example:

create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;

If a query references this schema, Drill will return an exception:

java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.

Do we have release notes?  If so, this should be documented.


> Drill 1.14 cannot use existing views if they reference tables with mixed case 
> schemas
> -
>
> Key: DRILL-6725
> URL: https://issues.apache.org/jira/browse/DRILL-6725
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
> Fix For: 1.14.0
>
>
> Drill 1.14 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.  For example:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> If a query references this schema, Drill will return an exception:
> {noformat}
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> {noformat}
> Do we have release notes?  If so, this should be documented.





[jira] [Updated] (DRILL-6725) Drill 1.14 cannot use existing views if they reference tables with mixed case schemas

2018-08-31 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6725:
--
Description: 
Drill 1.14 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
references a schema which has upper case letters, the view needs to be rebuilt. 
 For example:

create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;

If a query references this schema, Drill will return an exception:

java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.

Do we have release notes?  If so, this should be documented.

  was:
Drill 1.14 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
references a schema which has upper case letters, the view needs to be rebuilt. 
 For example:

create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;

Do we have release notes?  If so, this should be documented.


> Drill 1.14 cannot use existing views if they reference tables with mixed case 
> schemas
> -
>
> Key: DRILL-6725
> URL: https://issues.apache.org/jira/browse/DRILL-6725
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
> Fix For: 1.14.0
>
>
> Drill 1.14 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.  For example:
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> If a query references this schema, Drill will return an exception:
> java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
> view. Requested schema drillTestDirP1 not available in schema dfs.
> Do we have release notes?  If so, this should be documented.





[jira] [Updated] (DRILL-6725) Drill 1.14 cannot use existing views if they reference tables with mixed case schemas

2018-08-31 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-6725:
--
Summary: Drill 1.14 cannot use existing views if they reference tables with 
mixed case schemas  (was: Views cannot use tables with mixed case schemas)

> Drill 1.14 cannot use existing views if they reference tables with mixed case 
> schemas
> -
>
> Key: DRILL-6725
> URL: https://issues.apache.org/jira/browse/DRILL-6725
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
> Fix For: 1.14.0
>
>
> Drill 1.14 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
> references a schema which has upper case letters, the view needs to be 
> rebuilt.  For example:
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> Do we have release notes?  If so, this should be documented.





[jira] [Created] (DRILL-6725) Views cannot use tables with mixed case schemas

2018-08-31 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6725:
-

 Summary: Views cannot use tables with mixed case schemas
 Key: DRILL-6725
 URL: https://issues.apache.org/jira/browse/DRILL-6725
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Bridget Bevens
 Fix For: 1.14.0


Drill 1.14 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
references a schema which has upper case letters, the view needs to be rebuilt. 
 For example:

create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;

Do we have release notes?  If so, this should be documented.




