[jira] [Closed] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata
[ https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou closed DRILL-7225.
-----------------------------

> Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata
> ------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata
> Affects Versions: 1.16.0
> Reporter: Venkata Jyothsna Donapati
> Assignee: Venkata Jyothsna Donapati
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.17.0
>
> Merging of columnTypeInfo from two files with different schemas throws a
> NullPointerException. For example, if a directory Orders has two files:
> * orders.parquet (with columns order_id, order_name, order_date)
> * orders_with_address.parquet (with columns order_id, order_name, address)
>
> When refresh table metadata is triggered, metadata such as total_null_count
> for the columns in both files is aggregated and updated in the ColumnTypeInfo.
> Initially, the ColumnTypeInfo is populated from the first file's schema
> (i.e., order_id, order_name, order_date). While aggregating, the existing
> ColumnTypeInfo is looked up for columns in the second file, and since some of
> them don't exist in the ColumnTypeInfo yet, an NPE is thrown. This can be
> fixed by initializing ColumnTypeInfo entries for columns that are not yet
> present.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
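The failure mode described above can be sketched as follows. This is an illustrative, simplified model, not Drill's actual ColumnTypeInfo API: the class and method names (ColumnTypeInfoMergeSketch, mergeUnsafe, mergeSafe) are hypothetical. It shows why looking up a column that was absent from the first file's schema dereferences null, and how creating the missing entry on first sight (e.g. via computeIfAbsent) avoids it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the per-column statistics merge described in the
// issue; not Drill's real implementation.
public class ColumnTypeInfoMergeSketch {

    static class ColumnStats {
        long totalNullCount;  // one of the aggregated metadata fields
    }

    private final Map<String, ColumnStats> columnTypeInfo = new HashMap<>();

    // The map is initially seeded from the first file's schema only.
    public void initFromFirstFile(String... columns) {
        for (String c : columns) {
            columnTypeInfo.put(c, new ColumnStats());
        }
    }

    // Buggy merge: assumes every column already has an entry. A column that
    // appears only in a later file (e.g. "address") returns null from get(),
    // and the field access throws NullPointerException.
    public void mergeUnsafe(String column, long nullCount) {
        columnTypeInfo.get(column).totalNullCount += nullCount;
    }

    // Fixed merge, per the suggestion in the report: initialize an entry for
    // columns not yet present, then aggregate.
    public void mergeSafe(String column, long nullCount) {
        columnTypeInfo.computeIfAbsent(column, c -> new ColumnStats())
                      .totalNullCount += nullCount;
    }

    public long nullCount(String column) {
        ColumnStats s = columnTypeInfo.get(column);
        return s == null ? 0L : s.totalNullCount;
    }

    public static void main(String[] args) {
        ColumnTypeInfoMergeSketch agg = new ColumnTypeInfoMergeSketch();
        agg.initFromFirstFile("order_id", "order_name", "order_date");
        // The second file contributes "address", absent from the initial map;
        // mergeUnsafe would throw here, mergeSafe does not.
        agg.mergeSafe("order_id", 1L);
        agg.mergeSafe("address", 3L);
        agg.mergeSafe("address", 2L);
        System.out.println(agg.nullCount("address")); // prints 5
    }
}
```

The fix mirrors the one-line suggestion in the description: the lookup-then-update becomes create-if-absent-then-update, so files with divergent schemas simply extend the aggregate map.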
[jira] [Commented] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata
[ https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847139#comment-16847139 ]

Robert Hou commented on DRILL-7225:
-----------------------------------

I have not been able to reproduce the original problem. I created three files in one directory. File 1 and file 2 have the same column names, but one of those columns has a different data type in each file. Also, file 1 and file 3 have one column whose names differ. I could not get an error with either the new code or the old code (before the commit). Either way, the new code is working.
[jira] [Comment Edited] (DRILL-7227) TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100
[ https://issues.apache.org/jira/browse/DRILL-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830888#comment-16830888 ]

Robert Hou edited comment on DRILL-7227 at 5/1/19 6:32 AM:
-----------------------------------------------------------

Query 59 is fixed. Queries 47 and 57 are still failing. Here is query 47:
{noformat}
WITH v1 AS (
  SELECT i_category, i_brand, s_store_name, s_company_name, d_year, d_moy,
         Sum(ss_sales_price) sum_sales,
         Avg(Sum(ss_sales_price)) OVER (partition BY i_category, i_brand, s_store_name, s_company_name, d_year) avg_monthly_sales,
         Rank() OVER (partition BY i_category, i_brand, s_store_name, s_company_name ORDER BY d_year, d_moy) rn
  FROM item, store_sales, date_dim, store
  WHERE ss_item_sk = i_item_sk
    AND ss_sold_date_sk = d_date_sk
    AND ss_store_sk = s_store_sk
    AND ( d_year = 1999
          OR ( d_year = 1999 - 1 AND d_moy = 12 )
          OR ( d_year = 1999 + 1 AND d_moy = 1 ) )
  GROUP BY i_category, i_brand, s_store_name, s_company_name, d_year, d_moy),
v2 AS (
  SELECT v1.i_category, v1.d_year, v1.d_moy, v1.avg_monthly_sales, v1.sum_sales,
         v1_lag.sum_sales psum, v1_lead.sum_sales nsum
  FROM v1, v1 v1_lag, v1 v1_lead
  WHERE v1.i_category = v1_lag.i_category AND v1.i_category = v1_lead.i_category
    AND v1.i_brand = v1_lag.i_brand AND v1.i_brand = v1_lead.i_brand
    AND v1.s_store_name = v1_lag.s_store_name AND v1.s_store_name = v1_lead.s_store_name
    AND v1.s_company_name = v1_lag.s_company_name AND v1.s_company_name = v1_lead.s_company_name
    AND v1.rn = v1_lag.rn + 1 AND v1.rn = v1_lead.rn - 1)
SELECT *
FROM v2
WHERE d_year = 1999
  AND avg_monthly_sales > 0
  AND CASE WHEN avg_monthly_sales > 0
           THEN Abs(sum_sales - avg_monthly_sales) / avg_monthly_sales
           ELSE NULL END > 0.1
ORDER BY sum_sales - avg_monthly_sales, 3
LIMIT 100;
{noformat}

Here is query 57:
{noformat}
WITH v1 AS (
  SELECT i_category, i_brand, cc_name, d_year, d_moy,
         Sum(cs_sales_price) sum_sales,
         Avg(Sum(cs_sales_price)) OVER (partition BY i_category, i_brand, cc_name, d_year) avg_monthly_sales,
         Rank() OVER (partition BY i_category, i_brand, cc_name ORDER BY d_year, d_moy) rn
  FROM item, catalog_sales, date_dim, call_center
  WHERE cs_item_sk = i_item_sk
    AND cs_sold_date_sk = d_date_sk
    AND cc_call_center_sk = cs_call_center_sk
    AND ( d_year = 2000
          OR ( d_year = 2000 - 1 AND d_moy = 12 )
          OR ( d_year = 2000 + 1 AND d_moy = 1 ) )
  GROUP BY i_category, i_brand, cc_name, d_year, d_moy),
v2 AS (
  SELECT v1.i_brand, v1.d_year, v1.avg_monthly_sales, v1.sum_sales,
         v1_lag.sum_sales psum, v1_lead.sum_sales nsum
  FROM v1, v1 v1_lag, v1 v1_lead
  WHERE v1.i_category = v1_lag.i_category AND v1.i_category = v1_lead.i_category
    AND v1.i_brand = v1_lag.i_brand AND v1.i_brand = v1_lead.i_brand
    AND v1.cc_name = v1_lag.cc_name AND v1.cc_name = v1_lead.cc_name
    AND v1.rn = v1_lag.rn + 1 AND v1.rn = v1_lead.rn -
{noformat}
[jira] [Commented] (DRILL-7227) TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100
[ https://issues.apache.org/jira/browse/DRILL-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830888#comment-16830888 ]

Robert Hou commented on DRILL-7227:
-----------------------------------

Query 59 is fixed. Queries 47 and 57 are still failing. Here is the stack trace for query 47:
{noformat}
2019-04-30 22:42:31,964 [2336ce39-d449-3a2b-2ab4-c890128bde02:foreman] INFO  o.a.d.e.store.mock.MockStorageEngine - Took 21 ms to read statistics from parquet format plugin
2019-04-30 22:42:32,041 [2336ce39-d449-3a2b-2ab4-c890128bde02:foreman] INFO  o.a.d.e.store.mock.MockStorageEngine - Took 12 ms to read statistics from parquet format plugin
2019-04-30 22:42:32,154 [2336ce39-d449-3a2b-2ab4-c890128bde02:foreman] INFO  o.a.d.e.store.mock.MockStorageEngine - Took 9 ms to read statistics from parquet format plugin
2019-04-30 22:42:32,183 [2336ce39-d449-3a2b-2ab4-c890128bde02:foreman] INFO  o.a.d.e.store.mock.MockStorageEngine - Took 9 ms to read statistics from parquet format plugin
2019-04-30 22:42:33,705 [2336ce39-d449-3a2b-2ab4-c890128bde02:foreman] ERROR o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: NullPointerException

Please, refer to logs for more information.

[Error Id: 6d98ee46-91d5-4ad9-84a2-2a0903e8d977 on ucs-node3.perf.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: NullPointerException

Please, refer to logs for more information.
[Error Id: 6d98ee46-91d5-4ad9-84a2-2a0903e8d977 on ucs-node3.perf.lab:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630) ~[drill-common-1.16.0.0-mapr.jar:1.16.0.0-mapr]
    at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:789) [drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.checkCommonStates(QueryStateProcessor.java:325) [drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.planning(QueryStateProcessor.java:221) [drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:83) [drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:304) [drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_112]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_112]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Error while applying rule DrillPushProjectIntoScanRule:enumerable, args [rel#12063:LogicalProject.NONE.ANY([]).[](input=rel#12062:Subset#2.ENUMERABLE.ANY([]).[],ss_sold_date_sk=$1,ss_sold_time_sk=$2,ss_item_sk=$3,ss_customer_sk=$4,ss_cdemo_sk=$5,ss_hdemo_sk=$6,ss_addr_sk=$7,ss_store_sk=$8,ss_promo_sk=$9,ss_ticket_number=$10,ss_quantity=$11,ss_wholesale_cost=$12,ss_list_price=$13,ss_sales_price=$14,ss_ext_discount_amt=$15,ss_ext_sales_price=$16,ss_ext_wholesale_cost=$17,ss_ext_list_price=$18,ss_ext_tax=$19,ss_coupon_amt=$20,ss_net_paid=$21,ss_net_paid_inc_tax=$22,ss_net_profit=$23), rel#11499:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, /tpcdsParquet10/SF100/store_sales])]
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:305) [drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
    ... 3 common frames omitted
Caused by: java.lang.RuntimeException: Error while applying rule DrillPushProjectIntoScanRule:enumerable, args [rel#12063:LogicalProject.NONE.ANY([]).[](input=rel#12062:Subset#2.ENUMERABLE.ANY([]).[],ss_sold_date_sk=$1,ss_sold_time_sk=$2,ss_item_sk=$3,ss_customer_sk=$4,ss_cdemo_sk=$5,ss_hdemo_sk=$6,ss_addr_sk=$7,ss_store_sk=$8,ss_promo_sk=$9,ss_ticket_number=$10,ss_quantity=$11,ss_wholesale_cost=$12,ss_list_price=$13,ss_sales_price=$14,ss_ext_discount_amt=$15,ss_ext_sales_price=$16,ss_ext_wholesale_cost=$17,ss_ext_list_price=$18,ss_ext_tax=$19,ss_coupon_amt=$20,ss_net_paid=$21,ss_net_paid_inc_tax=$22,ss_net_profit=$23), rel#11499:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, /tpcdsParquet10/SF100/store_sales])]
    at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:236) ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
    at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:643) ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
    at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:339) ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:430)
{noformat}
[jira] [Created] (DRILL-7227) TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100
Robert Hou created DRILL-7227:
------------------------------

Summary: TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100
Key: DRILL-7227
URL: https://issues.apache.org/jira/browse/DRILL-7227
Project: Apache Drill
Issue Type: Bug
Components: Metadata
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
Fix For: 1.17.0
Attachments: 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.sys.drill, 2338ae93-155b-356d-382e-0da949c6f439.sys.drill

Here is query 78:
{noformat}
WITH ws AS (
  SELECT d_year AS ws_sold_year, ws_item_sk,
         ws_bill_customer_sk ws_customer_sk,
         Sum(ws_quantity) ws_qty,
         Sum(ws_wholesale_cost) ws_wc,
         Sum(ws_sales_price) ws_sp
  FROM web_sales
  LEFT JOIN web_returns ON wr_order_number = ws_order_number AND ws_item_sk = wr_item_sk
  JOIN date_dim ON ws_sold_date_sk = d_date_sk
  WHERE wr_order_number IS NULL
  GROUP BY d_year, ws_item_sk, ws_bill_customer_sk),
cs AS (
  SELECT d_year AS cs_sold_year, cs_item_sk,
         cs_bill_customer_sk cs_customer_sk,
         Sum(cs_quantity) cs_qty,
         Sum(cs_wholesale_cost) cs_wc,
         Sum(cs_sales_price) cs_sp
  FROM catalog_sales
  LEFT JOIN catalog_returns ON cr_order_number = cs_order_number AND cs_item_sk = cr_item_sk
  JOIN date_dim ON cs_sold_date_sk = d_date_sk
  WHERE cr_order_number IS NULL
  GROUP BY d_year, cs_item_sk, cs_bill_customer_sk),
ss AS (
  SELECT d_year AS ss_sold_year, ss_item_sk, ss_customer_sk,
         Sum(ss_quantity) ss_qty,
         Sum(ss_wholesale_cost) ss_wc,
         Sum(ss_sales_price) ss_sp
  FROM store_sales
  LEFT JOIN store_returns ON sr_ticket_number = ss_ticket_number AND ss_item_sk = sr_item_sk
  JOIN date_dim ON ss_sold_date_sk = d_date_sk
  WHERE sr_ticket_number IS NULL
  GROUP BY d_year, ss_item_sk, ss_customer_sk)
SELECT ss_item_sk,
       Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2) ratio,
       ss_qty store_qty,
       ss_wc  store_wholesale_cost,
       ss_sp  store_sales_price,
       COALESCE(ws_qty, 0) + COALESCE(cs_qty, 0) other_chan_qty,
       COALESCE(ws_wc, 0) + COALESCE(cs_wc, 0) other_chan_wholesale_cost,
       COALESCE(ws_sp, 0) + COALESCE(cs_sp, 0) other_chan_sales_price
FROM ss
LEFT JOIN ws ON ( ws_sold_year = ss_sold_year AND ws_item_sk = ss_item_sk AND ws_customer_sk = ss_customer_sk )
LEFT JOIN cs ON ( cs_sold_year = ss_sold_year AND cs_item_sk = cs_item_sk AND cs_customer_sk = ss_customer_sk )
WHERE COALESCE(ws_qty, 0) > 0
  AND COALESCE(cs_qty, 0) > 0
  AND ss_sold_year = 1999
ORDER BY ss_item_sk, ss_qty DESC, ss_wc DESC, ss_sp DESC,
         other_chan_qty, other_chan_wholesale_cost, other_chan_sales_price,
         Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2)
LIMIT 100;
{noformat}

The profile for the new plan is 2338ae93-155b-356d-382e-0da949c6f439. The hash partition sender operator (10-00) takes 10-15 minutes; I am not sure why it takes so long. It has 10 minor fragments sending to receiver (06-05), which has 62 minor fragments. By contrast, hash partition sender (16-00) also has 10 minor fragments sending to receiver (12-06), which has 220 minor fragments, and there is no performance issue there.

The profile for the old plan is 23387ab0-cb1c-cd5e-449a-c9bcefc901c1. Both plans use the same commit; the old plan was created by disabling statistics. I have not included the plans in the Jira because Jira has a 32K maximum.
[jira] [Created] (DRILL-7183) TPCDS query 10, 35, 69 take longer with sf 1000 when Statistics are disabled
Robert Hou created DRILL-7183:
------------------------------

Summary: TPCDS query 10, 35, 69 take longer with sf 1000 when Statistics are disabled
Key: DRILL-7183
URL: https://issues.apache.org/jira/browse/DRILL-7183
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Hanumath Rao Maduri
Fix For: 1.16.0

Query 69 runs 150% slower when Statistics are disabled. Here is the query:
{noformat}
SELECT cd_gender, cd_marital_status, cd_education_status,
       count(*) cnt1, cd_purchase_estimate, count(*) cnt2,
       cd_credit_rating, count(*) cnt3
FROM customer c, customer_address ca, customer_demographics
WHERE c.c_current_addr_sk = ca.ca_address_sk
  AND ca_state IN ('KY', 'GA', 'NM')
  AND cd_demo_sk = c.c_current_cdemo_sk
  AND exists(SELECT *
             FROM store_sales, date_dim
             WHERE c.c_customer_sk = ss_customer_sk
               AND ss_sold_date_sk = d_date_sk
               AND d_year = 2001
               AND d_moy BETWEEN 4 AND 4 + 2)
  AND (NOT exists(SELECT *
                  FROM web_sales, date_dim
                  WHERE c.c_customer_sk = ws_bill_customer_sk
                    AND ws_sold_date_sk = d_date_sk
                    AND d_year = 2001
                    AND d_moy BETWEEN 4 AND 4 + 2)
       AND NOT exists(SELECT *
                      FROM catalog_sales, date_dim
                      WHERE c.c_customer_sk = cs_ship_customer_sk
                        AND cs_sold_date_sk = d_date_sk
                        AND d_year = 2001
                        AND d_moy BETWEEN 4 AND 4 + 2))
GROUP BY cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating
ORDER BY cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating
LIMIT 100;
{noformat}

This regression is caused by commit 982e98061e029a39f1c593f695c0d93ec7079f0d. This commit should be reverted for now.
[jira] [Comment Edited] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810456#comment-16810456 ]

Robert Hou edited comment on DRILL-7154 at 4/6/19 1:01 AM:
-----------------------------------------------------------

Sorabh gave me a private branch where he reverted the RM commit on Apache master. With this private branch, the memory used in the profile was restored to the original amount.

was (Author: rhou):
Sorabh gave me a private branch where he reverted the RM commit on Apache master. With this private branch, the memory allocation was restored to the original amount.

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -----------------------------------------------------------------------------
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.16.0
> Reporter: Robert Hou
> Assignee: Hanumath Rao Maduri
> Priority: Blocker
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill,
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log,
> hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log,
> hashagg.stats.disabled.foreman.log
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>     o.o_orderpriority,
>     count(*) as order_count
> from
>     orders o
> where
>     o.o_orderdate >= date '1996-10-01'
>     and o.o_orderdate < date '1996-10-01' + interval '3' month
>     and exists (
>         select *
>         from lineitem l
>         where l.l_orderkey = o.o_orderkey
>           and l.l_commitdate < l.l_receiptdate
>     )
> group by
>     o.o_orderpriority
> order by
>     o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer. The plan is the same, but the Hash Agg
> operator in the new plan is taking longer. One possible reason is that the
> Hash Agg operator in the new plan is not using as many buckets as the old
> plan did. The Hash Agg operator in the new plan is also using less memory
> than it did in the old plan.
> Here is the old plan:
> {noformat}
> 00-00 Screen : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5645
> 00-01   Project(o_orderpriority=[$0], order_count=[$1]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5644
> 00-02     SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01       OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01         SelectionVectorRemover : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02           Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03             HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 memory}, id = 5639
> 02-04               HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 memory}, id = 5638
> 03-01                 HashAgg(group=[{0}], order_count=[COUNT()]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, cumulative cost =
[jira] [Commented] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810456#comment-16810456 ]

Robert Hou commented on DRILL-7154:
-----------------------------------

Sorabh gave me a private branch where he reverted the RM commit on Apache master. With this private branch, the memory allocation was restored to the original amount.
[jira] [Assigned] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou reassigned DRILL-7154:
---------------------------------

Assignee: Hanumath Rao Maduri  (was: Boaz Ben-Zvi)
[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-7154:
------------------------------

Priority: Blocker  (was: Critical)
[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-7154:
------------------------------
    Attachment: hashagg.nostats.data.log
                hashagg.stats.disabled.data.log
[jira] [Commented] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810306#comment-16810306 ]

Robert Hou commented on DRILL-7154:
-----------------------------------

Attached logs from the foreman, since the planner likely determined the memory budget for the queries. Renamed the previous logs to have a data.log suffix.
[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-7154:
------------------------------
    Attachment: (was: hashagg.nostats.log)
[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-7154:
------------------------------
    Attachment: (was: hashagg.stats.disabled.log)
[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-7154:
------------------------------
    Attachment: hashagg.nostats.foreman.log
                hashagg.stats.disabled.foreman.log
[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-7154:
------------------------------
    Summary: TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled  (was: TPCH query 4 and 17 take longer with sf 1000 when Statistics are disabled)
[jira] [Created] (DRILL-7155) Create a standard logging message for batch sizes generated by individual operators
Robert Hou created DRILL-7155:
---------------------------------

             Summary: Create a standard logging message for batch sizes generated by individual operators
                 Key: DRILL-7155
                 URL: https://issues.apache.org/jira/browse/DRILL-7155
             Project: Apache Drill
          Issue Type: Task
          Components: Execution - Relational Operators
    Affects Versions: 1.16.0
            Reporter: Robert Hou
            Assignee: Robert Hou


QA reads log messages in drillbit.log to verify the sizes of data batches generated by individual operators. These log messages need to be standardized so that each operator emits the same message format. This allows the QA test framework to verify the information in each message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
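To make the requirement concrete, here is a sketch of what a standardized batch-size log line and a test-framework parser for it might look like. The line format, field names, and operator name below are illustrative assumptions, not the format Drill actually adopted:

```python
import re

# Hypothetical standardized line, e.g.:
#   "BATCH_STATS: operator=HASH_AGGREGATE, records=4096, batchSize=16777216"
# Because every operator would emit the same shape, one regex covers them all.
LINE_RE = re.compile(
    r"BATCH_STATS: operator=(?P<op>\w+), "
    r"records=(?P<records>\d+), batchSize=(?P<bytes>\d+)"
)

def parse_batch_stats(line):
    """Return (operator, record count, batch bytes), or None if no match."""
    m = LINE_RE.search(line)
    if not m:
        return None
    return m.group("op"), int(m.group("records")), int(m.group("bytes"))

sample = "BATCH_STATS: operator=HASH_AGGREGATE, records=4096, batchSize=16777216"
print(parse_batch_stats(sample))  # ('HASH_AGGREGATE', 4096, 16777216)
```

A single shared format like this is what lets the QA framework verify batch sizes for every operator with one parser instead of per-operator scraping logic.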
[jira] [Created] (DRILL-7154) TPCH query 4 and 17 take longer with sf 1000 when Statistics are disabled
Robert Hou created DRILL-7154:
---------------------------------

             Summary: TPCH query 4 and 17 take longer with sf 1000 when Statistics are disabled
                 Key: DRILL-7154
                 URL: https://issues.apache.org/jira/browse/DRILL-7154
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.16.0
            Reporter: Robert Hou
            Assignee: Boaz Ben-Zvi
             Fix For: 1.16.0
         Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.log, hashagg.stats.disabled.log


Here is TPCH 04 with sf 1000:
{noformat}
select
  o.o_orderpriority,
  count(*) as order_count
from
  orders o
where
  o.o_orderdate >= date '1996-10-01'
  and o.o_orderdate < date '1996-10-01' + interval '3' month
  and
  exists (
    select
      *
    from
      lineitem l
    where
      l.l_orderkey = o.o_orderkey
      and l.l_commitdate < l.l_receiptdate
  )
group by
  o.o_orderpriority
order by
  o.o_orderpriority;
{noformat}
TPCH query 4 takes 30% longer. The plan is the same, but the Hash Agg operator in the new plan takes longer. One possible reason is that the Hash Agg operator in the new plan is not using as many buckets as the old plan did. The Hash Agg operator in the new plan also uses less memory than in the old plan.
Here is the old plan: {noformat} 00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5645 00-01 Project(o_orderpriority=[$0], order_count=[$1]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5644 00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643 01-01 OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642 02-01SelectionVectorRemover : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641 02-02 Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640 02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 memory}, id = 5639 
02-04 HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 memory}, id = 5638 03-01HashAgg(group=[{0}], order_count=[COUNT()]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 memory}, id = 5637 03-02 Project(o_orderpriority=[$1]) : rowType = RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 1.5311985057468002E10 memory}, id = 5636 03-03Project(o_orderkey=[$1], o_orderpriority=[$2], l_orderkey=[$0]) : rowType = RecordType(ANY o_orderkey, ANY o_orderpriority, ANY l_orderkey): rowcount = 3.75E8, cumulative cost = {1.8319476940441746E10 rows, 8.108390595055101E10 cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 1.5311985057468002E10 memory}, id = 5635 03-04 HashJoin(condition=[=($1, $0)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY l_orderkey, ANY
[jira] [Closed] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types
[ https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-7132. - > Metadata cache does not have correct min/max values for varchar and interval > data types > --- > > Key: DRILL-7132 > URL: https://issues.apache.org/jira/browse/DRILL-7132 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.14.0 >Reporter: Robert Hou >Priority: Major > Fix For: 1.17.0 > > Attachments: 0_0_10.parquet > > > The parquet metadata cache does not have correct min/max values for varchar > and interval data types. > I have attached a parquet file. Here is what parquet tools shows for varchar: > [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 > average: 67 total: 67 (raw data: 65 saving -3%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 65 max: 65 average: 65 total: 65 > column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "varchar_col" ], > "minValue" : "aW9lZ2pOSkt2bmtk", > "maxValue" : "aW9lZ2pOSkt2bmtk", > "nulls" : 0 > Here is what parquet tools shows for interval: > [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 > average: 52 total: 52 (raw data: 50 saving -4%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 50 max: 50 average: 50 total: 50 > column values statistics: min: P18582D, max: P18582D, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "interval_col" ], > "minValue" : "UDE4NTgyRA==", > "maxValue" : "UDE4NTgyRA==", > "nulls" : 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7139) Date_add() can produce incorrect results when adding to a timestamp
[ https://issues.apache.org/jira/browse/DRILL-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-7139: -- Description: I am using date_add() to create a sequence of timestamps: {noformat} select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') as interval minute)) timestamp_id from (values(1)); +--+ | timestamp_id | +--+ | 1970-01-25 20:31:12.704 | +--+ 1 row selected (0.121 seconds) {noformat} When I add one more, I get an older timestamp: {noformat} select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107375,'M') as interval minute)) timestamp_id from (values(1)); +--+ | timestamp_id | +--+ | 1969-12-07 03:29:25.408 | +--+ 1 row selected (0.126 seconds) {noformat} was: I am using date_add() to create a sequence of timestamps: {noformat} select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') as interval minute)) timestamp_id from (values(1)); +--+ | timestamp_id | +--+ | 1970-01-25 20:31:12.704 | +--+ 1 row selected (0.121 seconds) {noformat} When I add one more, I get an older timestamp: {noformat} 0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107375,'M') as interval minute)) timestamp_id from (values(1)); +--+ | timestamp_id | +--+ | 1969-12-07 03:29:25.408 | +--+ 1 row selected (0.126 seconds) {noformat} > Date_add() can produce incorrect results when adding to a timestamp > --- > > Key: DRILL-7139 > URL: https://issues.apache.org/jira/browse/DRILL-7139 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Pritesh Maker >Priority: Major > > I am using date_add() to create a sequence of timestamps: > {noformat} > select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') > as interval minute)) timestamp_id from (values(1)); > +--+ > | timestamp_id | > +--+ > | 1970-01-25 20:31:12.704 | > +--+ > 1 row selected 
(0.121 seconds) > {noformat} > When I add one more, I get an older timestamp: > {noformat} > select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107375,'M') > as interval minute)) timestamp_id from (values(1)); > +--+ > | timestamp_id | > +--+ > | 1969-12-07 03:29:25.408 | > +--+ > 1 row selected (0.126 seconds) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7139) Date_add() can produce incorrect results when adding to a timestamp
[ https://issues.apache.org/jira/browse/DRILL-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-7139: -- Description: I am using date_add() to create a sequence of timestamps: {noformat} select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') as interval minute)) timestamp_id from (values(1)); +--+ | timestamp_id | +--+ | 1970-01-25 20:31:12.704 | +--+ 1 row selected (0.121 seconds) {noformat} When I add one more, I get an older timestamp: {noformat} 0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107375,'M') as interval minute)) timestamp_id from (values(1)); +--+ | timestamp_id | +--+ | 1969-12-07 03:29:25.408 | +--+ 1 row selected (0.126 seconds) {noformat} was: I am using date_add() to create a sequence of timestamps: {noformat} select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',{color:#f79232}107374{color},'M') as interval minute)) timestamp_id from (values(1)); +--+ | timestamp_id | +--+ | 1970-01-25 20:31:12.704 | +--+ 1 row selected (0.121 seconds) {noformat} When I add one more, I get an older timestamp: {noformat} 0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',{color:#f79232}107375{color},'M') as interval minute)) timestamp_id from (values(1)); +--+ | timestamp_id | +--+ | 1969-12-07 03:29:25.408 | +--+ 1 row selected (0.126 seconds) {noformat} > Date_add() can produce incorrect results when adding to a timestamp > --- > > Key: DRILL-7139 > URL: https://issues.apache.org/jira/browse/DRILL-7139 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Pritesh Maker >Priority: Major > > I am using date_add() to create a sequence of timestamps: > {noformat} > select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',107374,'M') > as interval minute)) timestamp_id from (values(1)); > +--+ 
> | timestamp_id | > +--+ > | 1970-01-25 20:31:12.704 | > +--+ > 1 row selected (0.121 seconds) > {noformat} > When I add one more, I get an older timestamp: > {noformat} > 0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 > 00:00:00', cast(concat('PT',107375,'M') as interval minute)) timestamp_id > from (values(1)); > +--+ > | timestamp_id | > +--+ > | 1969-12-07 03:29:25.408 | > +--+ > 1 row selected (0.126 seconds) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7139) Date_add() can produce incorrect results when adding to a timestamp
[ https://issues.apache.org/jira/browse/DRILL-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-7139: -- Summary: Date_add() can produce incorrect results when adding to a timestamp (was: Date)add produces Incorrect results when adding to a timestamp) > Date_add() can produce incorrect results when adding to a timestamp > --- > > Key: DRILL-7139 > URL: https://issues.apache.org/jira/browse/DRILL-7139 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Pritesh Maker >Priority: Major > > I am using date_add() to create a sequence of timestamps: > {noformat} > select date_add(timestamp '1970-01-01 00:00:00', > cast(concat('PT',{color:#f79232}107374{color},'M') as interval minute)) > timestamp_id from (values(1)); > +--+ > | timestamp_id | > +--+ > | 1970-01-25 20:31:12.704 | > +--+ > 1 row selected (0.121 seconds) > {noformat} > When I add one more, I get an older timestamp: > {noformat} > 0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 > 00:00:00', cast(concat('PT',{color:#f79232}107375{color},'M') as interval > minute)) timestamp_id from (values(1)); > +--+ > | timestamp_id | > +--+ > | 1969-12-07 03:29:25.408 | > +--+ > 1 row selected (0.126 seconds) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7139) Date)add produces Incorrect results when adding to a timestamp
Robert Hou created DRILL-7139: - Summary: Date)add produces Incorrect results when adding to a timestamp Key: DRILL-7139 URL: https://issues.apache.org/jira/browse/DRILL-7139 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.15.0 Reporter: Robert Hou Assignee: Pritesh Maker I am using date_add() to create a sequence of timestamps: {noformat} select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',{color:#f79232}107374{color},'M') as interval minute)) timestamp_id from (values(1)); +--+ | timestamp_id | +--+ | 1970-01-25 20:31:12.704 | +--+ 1 row selected (0.121 seconds) {noformat} When I add one more, I get an older timestamp: {noformat} 0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 00:00:00', cast(concat('PT',{color:#f79232}107375{color},'M') as interval minute)) timestamp_id from (values(1)); +--+ | timestamp_id | +--+ | 1969-12-07 03:29:25.408 | +--+ 1 row selected (0.126 seconds) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
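The pair of results above is consistent with a 32-bit overflow of the interval's millisecond count: 107,375 minutes is about 6.44 billion ms, well past the signed-int maximum of 2,147,483,647 ms (~24.86 days). The following Python sketch models that hypothesis; the wrap-to-int32 step is an assumption about the cause, not Drill's confirmed code path, but it reproduces both reported timestamps exactly:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def wrap_int32(n):
    """Wrap an integer into signed 32-bit range, as a Java int would overflow."""
    return (n + 2**31) % 2**32 - 2**31

def buggy_date_add(minutes):
    # Hypothetical model of the bug: the interval is converted to
    # milliseconds and truncated to a 32-bit int before being added.
    ms = wrap_int32(minutes * 60_000)
    return EPOCH + timedelta(milliseconds=ms)

print(buggy_date_add(107374))  # 1970-01-25 20:31:12.704000
print(buggy_date_add(107375))  # 1969-12-07 03:29:25.408000
```

Under this model the first query is already silently wrapped (107,374 minutes is ~74.6 days, not 24.8), and the second wraps past the sign bit, producing a pre-epoch timestamp.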
[jira] [Updated] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate
[ https://issues.apache.org/jira/browse/DRILL-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-7136: -- Description: I ran TPCH query 17 with sf 1000. Here is the query: {noformat} select sum(l.l_extendedprice) / 7.0 as avg_yearly from lineitem l, part p where p.p_partkey = l.l_partkey and p.p_brand = 'Brand#13' and p.p_container = 'JUMBO CAN' and l.l_quantity < ( select 0.2 * avg(l2.l_quantity) from lineitem l2 where l2.l_partkey = p.p_partkey ); {noformat} One of the hash agg operators has resized 6 times. It should have 4M buckets. But the profile shows it has 64K buckets. I have attached a sample profile. In this profile, the hash agg operator is (04-02). {noformat} Operator Metrics Minor Fragment NUM_BUCKETS NUM_ENTRIES NUM_RESIZING RESIZING_TIME_MSNUM_PARTITIONS SPILLED_PARTITIONS SPILL_MB SPILL_CYCLE INPUT_BATCH_COUNT AVG_INPUT_BATCH_BYTES AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT OUTPUT_BATCH_COUNT AVG_OUTPUT_BATCH_BYTES AVG_OUTPUT_ROW_BYTESOUTPUT_RECORD_COUNT 04-00-0265,536 748,746 6 364 1 582 0 813 582,653 18 26,316,456 401 1,631,943 25 26,176,350 {noformat} was: I ran TPCH query 17 with sf 1000. Here is the query: {noformat} select sum(l.l_extendedprice) / 7.0 as avg_yearly from lineitem l, part p where p.p_partkey = l.l_partkey and p.p_brand = 'Brand#13' and p.p_container = 'JUMBO CAN' and l.l_quantity < ( select 0.2 * avg(l2.l_quantity) from lineitem l2 where l2.l_partkey = p.p_partkey ); {noformat} One of the hash agg operators has resized 6 times. It should have 4M buckets. But the profile shows it has 64K buckets. I have attached a sample profile. In this profile, the hash agg operator is (04-02). 
{noformat} Operator Metrics Minor Fragment NUM_BUCKETS NUM_ENTRIES NUM_RESIZING RESIZING_TIME_MSNUM_PARTITIONS SPILLED_PARTITIONS SPILL_MB SPILL_CYCLE INPUT_BATCH_COUNT AVG_INPUT_BATCH_BYTES AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT OUTPUT_BATCH_COUNT AVG_OUTPUT_BATCH_BYTES AVG_OUTPUT_ROW_BYTESOUTPUT_RECORD_COUNT 04-00-0265,536 748,746 6 364 1 582 0 813 582,653 18 26,316,456 401 1,631,943 25 26,176,350 {noformat} > Num_buckets for HashAgg in profile may be inaccurate > > > Key: DRILL-7136 > URL: https://issues.apache.org/jira/browse/DRILL-7136 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build Test >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Pritesh Maker >Priority: Major > Fix For: 1.16.0 > > Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill > > > I ran TPCH query 17 with sf 1000. Here is the query: > {noformat} > select > sum(l.l_extendedprice) / 7.0 as avg_yearly > from > lineitem l, > part p > where > p.p_partkey = l.l_partkey > and p.p_brand = 'Brand#13' > and p.p_container = 'JUMBO CAN' > and l.l_quantity < ( > select > 0.2 * avg(l2.l_quantity) > from > lineitem l2 > where > l2.l_partkey = p.p_partkey > ); > {noformat} > One of the hash agg operators has resized 6 times. It should have 4M > buckets. But the profile shows it has 64K buckets. > I have attached a sample profile. In this profile, the hash agg operator is > (04-02). > {noformat} > Operator Metrics > Minor FragmentNUM_BUCKETS NUM_ENTRIES NUM_RESIZING > RESIZING_TIME_MSNUM_PARTITIONS SPILLED_PARTITIONS SPILL_MB > SPILL_CYCLE INPUT_BATCH_COUNT AVG_INPUT_BATCH_BYTES > AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT OUTPUT_BATCH_COUNT > AVG_OUTPUT_BATCH_BYTES AVG_OUTPUT_ROW_BYTESOUTPUT_RECORD_COUNT > 04-00-02 65,536 748,746 6 364 1 > 582 0 813 582,653 18 26,316,456 401 1,631,943 > 25 26,176,350 > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate
Robert Hou created DRILL-7136: - Summary: Num_buckets for HashAgg in profile may be inaccurate Key: DRILL-7136 URL: https://issues.apache.org/jira/browse/DRILL-7136 Project: Apache Drill Issue Type: Bug Components: Tools, Build Test Affects Versions: 1.16.0 Reporter: Robert Hou Assignee: Pritesh Maker Fix For: 1.16.0 Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill I ran TPCH query 17 with sf 1000. Here is the query: {noformat} select sum(l.l_extendedprice) / 7.0 as avg_yearly from lineitem l, part p where p.p_partkey = l.l_partkey and p.p_brand = 'Brand#13' and p.p_container = 'JUMBO CAN' and l.l_quantity < ( select 0.2 * avg(l2.l_quantity) from lineitem l2 where l2.l_partkey = p.p_partkey ); {noformat} One of the hash agg operators has resized 6 times. It should have 4M buckets. But the profile shows it has 64K buckets. I have attached a sample profile. In this profile, the hash agg operator is (04-02). {noformat} Operator Metrics Minor Fragment NUM_BUCKETS NUM_ENTRIES NUM_RESIZING RESIZING_TIME_MS NUM_PARTITIONS SPILLED_PARTITIONS SPILL_MB SPILL_CYCLE INPUT_BATCH_COUNT AVG_INPUT_BATCH_BYTES AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT OUTPUT_BATCH_COUNT AVG_OUTPUT_BATCH_BYTES AVG_OUTPUT_ROW_BYTES OUTPUT_RECORD_COUNT 04-00-02 65,536 748,746 6 364 1 582 0 813 582,653 18 26,316,456 401 1,631,943 25 26,176,350 {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
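As a sanity check on the expected bucket count: hash tables conventionally double their bucket array on each resize, so six resizings from the 64K default would end at 4M buckets, suggesting the profile reports the initial rather than the final value. A quick sketch (the doubling-per-resize assumption is mine, not confirmed against the HashAgg implementation):

```python
# NUM_BUCKETS shown in the profile, and NUM_RESIZING from the same metrics row.
initial_buckets = 65_536   # 64K, the value the profile displays
num_resizing = 6

# Assuming each resize doubles the bucket array (typical power-of-two growth),
# the final table size after six resizes is:
final_buckets = initial_buckets << num_resizing
print(f"{final_buckets:,}")  # 4,194,304 -- i.e. the expected ~4M buckets
```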
[jira] [Commented] (DRILL-7108) With statistics enabled TPCH 16 has two additional exchange operators
[ https://issues.apache.org/jira/browse/DRILL-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801040#comment-16801040 ] Robert Hou commented on DRILL-7108: --- I have verified this fix. > With statistics enabled TPCH 16 has two additional exchange operators > - > > Key: DRILL-7108 > URL: https://issues.apache.org/jira/browse/DRILL-7108 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > TPCH 16 with sf 100 runs 14% slower. Here is the query: > {noformat} > select > p.p_brand, > p.p_type, > p.p_size, > count(distinct ps.ps_suppkey) as supplier_cnt > from > partsupp ps, > part p > where > p.p_partkey = ps.ps_partkey > and p.p_brand <> 'Brand#21' > and p.p_type not like 'MEDIUM PLATED%' > and p.p_size in (38, 2, 8, 31, 44, 5, 14, 24) > and ps.ps_suppkey not in ( > select > s.s_suppkey > from > supplier s > where > s.s_comment like '%Customer%Complaints%' > ) > group by > p.p_brand, > p.p_type, > p.p_size > order by > supplier_cnt desc, > p.p_brand, > p.p_type, > p.p_size; > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-7108) With statistics enabled TPCH 16 has two additional exchange operators
[ https://issues.apache.org/jira/browse/DRILL-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-7108. - > With statistics enabled TPCH 16 has two additional exchange operators > - > > Key: DRILL-7108 > URL: https://issues.apache.org/jira/browse/DRILL-7108 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > TPCH 16 with sf 100 runs 14% slower. Here is the query: > {noformat} > select > p.p_brand, > p.p_type, > p.p_size, > count(distinct ps.ps_suppkey) as supplier_cnt > from > partsupp ps, > part p > where > p.p_partkey = ps.ps_partkey > and p.p_brand <> 'Brand#21' > and p.p_type not like 'MEDIUM PLATED%' > and p.p_size in (38, 2, 8, 31, 44, 5, 14, 24) > and ps.ps_suppkey not in ( > select > s.s_suppkey > from > supplier s > where > s.s_comment like '%Customer%Complaints%' > ) > group by > p.p_brand, > p.p_type, > p.p_size > order by > supplier_cnt desc, > p.p_brand, > p.p_type, > p.p_size; > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types
[ https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou resolved DRILL-7132. --- Resolution: Not A Problem > Metadata cache does not have correct min/max values for varchar and interval > data types > --- > > Key: DRILL-7132 > URL: https://issues.apache.org/jira/browse/DRILL-7132 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.14.0 >Reporter: Robert Hou >Priority: Major > Fix For: 1.17.0 > > Attachments: 0_0_10.parquet > > > The parquet metadata cache does not have correct min/max values for varchar > and interval data types. > I have attached a parquet file. Here is what parquet tools shows for varchar: > [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 > average: 67 total: 67 (raw data: 65 saving -3%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 65 max: 65 average: 65 total: 65 > column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "varchar_col" ], > "minValue" : "aW9lZ2pOSkt2bmtk", > "maxValue" : "aW9lZ2pOSkt2bmtk", > "nulls" : 0 > Here is what parquet tools shows for interval: > [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 > average: 52 total: 52 (raw data: 50 saving -4%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 50 max: 50 average: 50 total: 50 > column values statistics: min: P18582D, max: P18582D, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "interval_col" ], > "minValue" : "UDE4NTgyRA==", > "maxValue" : "UDE4NTgyRA==", > "nulls" : 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types
[ https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799355#comment-16799355 ] Robert Hou commented on DRILL-7132: --- While I agree that there is no requirement to store data in human-readable format, there are advantages when it comes to support and debugging customer issues. But I assume you considered this and decided the pros of using a different format were more important. > Metadata cache does not have correct min/max values for varchar and interval > data types > --- > > Key: DRILL-7132 > URL: https://issues.apache.org/jira/browse/DRILL-7132 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.14.0 >Reporter: Robert Hou >Priority: Major > Fix For: 1.17.0 > > Attachments: 0_0_10.parquet > > > The parquet metadata cache does not have correct min/max values for varchar > and interval data types. > I have attached a parquet file. Here is what parquet tools shows for varchar: > [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 > average: 67 total: 67 (raw data: 65 saving -3%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 65 max: 65 average: 65 total: 65 > column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "varchar_col" ], > "minValue" : "aW9lZ2pOSkt2bmtk", > "maxValue" : "aW9lZ2pOSkt2bmtk", > "nulls" : 0 > Here is what parquet tools shows for interval: > [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 > average: 52 total: 52 (raw data: 50 saving -4%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 50 max: 50 average: 50 total: 50 > column values statistics: min: P18582D, max: P18582D, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "interval_col" ], > "minValue" : "UDE4NTgyRA==", > "maxValue" : "UDE4NTgyRA==", > "nulls" : 0 -- This message was sent by Atlassian 
JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types
[ https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799353#comment-16799353 ] Robert Hou commented on DRILL-7132: --- The online decoder works. Thanks. --Robert > Metadata cache does not have correct min/max values for varchar and interval > data types > --- > > Key: DRILL-7132 > URL: https://issues.apache.org/jira/browse/DRILL-7132 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.14.0 >Reporter: Robert Hou >Priority: Major > Fix For: 1.17.0 > > Attachments: 0_0_10.parquet > > > The parquet metadata cache does not have correct min/max values for varchar > and interval data types. > I have attached a parquet file. Here is what parquet tools shows for varchar: > [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 > average: 67 total: 67 (raw data: 65 saving -3%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 65 max: 65 average: 65 total: 65 > column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "varchar_col" ], > "minValue" : "aW9lZ2pOSkt2bmtk", > "maxValue" : "aW9lZ2pOSkt2bmtk", > "nulls" : 0 > Here is what parquet tools shows for interval: > [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 > average: 52 total: 52 (raw data: 50 saving -4%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 50 max: 50 average: 50 total: 50 > column values statistics: min: P18582D, max: P18582D, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "interval_col" ], > "minValue" : "UDE4NTgyRA==", > "maxValue" : "UDE4NTgyRA==", > "nulls" : 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types
[ https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799343#comment-16799343 ] Robert Hou commented on DRILL-7132: --- [~vvysotskyi] Sounds good. How does QA verify that the values are correct? We have some metadata cache tests that are failing, and they should be re-verified with the new base64 values. And I'm about to add some new ones for an enhancement to the metadata cache feature. > Metadata cache does not have correct min/max values for varchar and interval > data types > --- > > Key: DRILL-7132 > URL: https://issues.apache.org/jira/browse/DRILL-7132 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.14.0 >Reporter: Robert Hou >Priority: Major > Fix For: 1.17.0 > > Attachments: 0_0_10.parquet > > > The parquet metadata cache does not have correct min/max values for varchar > and interval data types. > I have attached a parquet file. Here is what parquet tools shows for varchar: > [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 > average: 67 total: 67 (raw data: 65 saving -3%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 65 max: 65 average: 65 total: 65 > column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "varchar_col" ], > "minValue" : "aW9lZ2pOSkt2bmtk", > "maxValue" : "aW9lZ2pOSkt2bmtk", > "nulls" : 0 > Here is what parquet tools shows for interval: > [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 > average: 52 total: 52 (raw data: 50 saving -4%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 50 max: 50 average: 50 total: 50 > column values statistics: min: P18582D, max: P18582D, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "interval_col" ], > "minValue" : "UDE4NTgyRA==", > "maxValue" : "UDE4NTgyRA==", > "nulls" : 0 -- This message was sent by 
Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types
Robert Hou created DRILL-7132: - Summary: Metadata cache does not have correct min/max values for varchar and interval data types Key: DRILL-7132 URL: https://issues.apache.org/jira/browse/DRILL-7132 Project: Apache Drill Issue Type: Bug Components: Metadata Affects Versions: 1.14.0 Reporter: Robert Hou Fix For: 1.17.0 Attachments: 0_0_10.parquet The parquet metadata cache does not have correct min/max values for varchar and interval data types. I have attached a parquet file. Here is what parquet tools shows for varchar: [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 average: 67 total: 67 (raw data: 65 saving -3%) values: min: 1 max: 1 average: 1 total: 1 uncompressed: min: 65 max: 65 average: 65 total: 65 column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0 Here is what the metadata cache file shows: "name" : [ "varchar_col" ], "minValue" : "aW9lZ2pOSkt2bmtk", "maxValue" : "aW9lZ2pOSkt2bmtk", "nulls" : 0 Here is what parquet tools shows for interval: [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 average: 52 total: 52 (raw data: 50 saving -4%) values: min: 1 max: 1 average: 1 total: 1 uncompressed: min: 50 max: 50 average: 50 total: 50 column values statistics: min: P18582D, max: P18582D, num_nulls: 0 Here is what the metadata cache file shows: "name" : [ "interval_col" ], "minValue" : "UDE4NTgyRA==", "maxValue" : "UDE4NTgyRA==", "nulls" : 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
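The cached min/max strings are simply base64-encoded bytes, so decoding them recovers exactly the values parquet-tools reports, which is why the issue was resolved as Not A Problem. For example:

```python
import base64

# Decode the min/max strings from the metadata cache file; they match the
# human-readable statistics that parquet-tools prints for the same columns.
print(base64.b64decode("aW9lZ2pOSkt2bmtk").decode())  # ioegjNJKvnkd (varchar_col)
print(base64.b64decode("UDE4NTgyRA==").decode())      # P18582D (interval_col)
```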
[jira] [Created] (DRILL-7121) TPCH 4 takes longer
Robert Hou created DRILL-7121: - Summary: TPCH 4 takes longer Key: DRILL-7121 URL: https://issues.apache.org/jira/browse/DRILL-7121 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.16.0 Reporter: Robert Hou Assignee: Gautam Parai Fix For: 1.16.0 Here is TPCH 4 with sf 100: {noformat} select o.o_orderpriority, count(*) as order_count from orders o where o.o_orderdate >= date '1996-10-01' and o.o_orderdate < date '1996-10-01' + interval '3' month and exists ( select * from lineitem l where l.l_orderkey = o.o_orderkey and l.l_commitdate < l.l_receiptdate ) group by o.o_orderpriority order by o.o_orderpriority; {noformat} The plan has changed when Statistics is disabled. A Hash Agg and a Broadcast Exchange have been added. These two operators expand the number of rows from the lineitem table from 137M to 9B rows. This forces the hash join to use 6GB of memory instead of 30 MB. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
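A back-of-the-envelope check on the reported row blow-up: 9B divided by 137M is roughly 66, which would be consistent with the added Broadcast Exchange sending a full copy of the lineitem rows to every receiving fragment (the ~66 fragment count is inferred from the ratio here, not read from a profile):

```python
# Rows entering the join from lineitem before and after the plan change,
# as reported in the bug. The ratio estimates how many copies the
# Broadcast Exchange produced -- one per receiving fragment, if the
# broadcast hypothesis is right.
rows_single_copy = 137_000_000
rows_after_broadcast = 9_000_000_000
copies = rows_after_broadcast / rows_single_copy
print(round(copies))  # 66
```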
[jira] [Updated] (DRILL-7120) Query fails with ChannelClosedException when Statistics is disabled.
[ https://issues.apache.org/jira/browse/DRILL-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-7120: -- Summary: Query fails with ChannelClosedException when Statistics is disabled. (was: Query fails with ChannelClosedException) > Query fails with ChannelClosedException when Statistics is disabled. > > > Key: DRILL-7120 > URL: https://issues.apache.org/jira/browse/DRILL-7120 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.16.0 > > > TPCH query 5 fails at sf100 when Statistics is disabled. Here is the query: > {noformat} > select > n.n_name, > sum(l.l_extendedprice * (1 - l.l_discount)) as revenue > from > customer c, > orders o, > lineitem l, > supplier s, > nation n, > region r > where > c.c_custkey = o.o_custkey > and l.l_orderkey = o.o_orderkey > and l.l_suppkey = s.s_suppkey > and c.c_nationkey = s.s_nationkey > and s.s_nationkey = n.n_nationkey > and n.n_regionkey = r.r_regionkey > and r.r_name = 'EUROPE' > and o.o_orderdate >= date '1997-01-01' > and o.o_orderdate < date '1997-01-01' + interval '1' year > group by > n.n_name > order by > revenue desc; > {noformat} > This is the error from drillbit.log: > {noformat} > 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> > FINISHED > 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State to report: FINISHED > 2019-03-04 18:17:51,454 [BitServer-13] WARN > o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming > stream due to memory limits. Current Allocation: 262144. 
> 2019-03-04 18:17:51,454 [BitServer-13] ERROR > o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer. > 2019-03-04 18:17:51,463 [BitServer-13] ERROR > o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. > Connection: /10.10.120.104:31012 <--> /10.10.120.106:53048 (data server). > Closing connection. > io.netty.handler.codec.DecoderException: > org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating > buffer. > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271) > ~[netty-codec-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > 
[netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at >
[jira] [Created] (DRILL-7123) TPCDS query 83 runs slower when Statistics is disabled
Robert Hou created DRILL-7123: - Summary: TPCDS query 83 runs slower when Statistics is disabled Key: DRILL-7123 URL: https://issues.apache.org/jira/browse/DRILL-7123 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.16.0 Reporter: Robert Hou Assignee: Gautam Parai Fix For: 1.16.0 Query is TPCDS 83 with sf 100: {noformat} WITH sr_items AS (SELECT i_item_id item_id, Sum(sr_return_quantity) sr_item_qty FROM store_returns, item, date_dim WHERE sr_item_sk = i_item_sk AND d_date IN (SELECT d_date FROM date_dim WHERE d_week_seq IN (SELECT d_week_seq FROM date_dim WHERE d_date IN ( '1999-06-30', '1999-08-28', '1999-11-18' ))) AND sr_returned_date_sk = d_date_sk GROUP BY i_item_id), cr_items AS (SELECT i_item_id item_id, Sum(cr_return_quantity) cr_item_qty FROM catalog_returns, item, date_dim WHERE cr_item_sk = i_item_sk AND d_date IN (SELECT d_date FROM date_dim WHERE d_week_seq IN (SELECT d_week_seq FROM date_dim WHERE d_date IN ( '1999-06-30', '1999-08-28', '1999-11-18' ))) AND cr_returned_date_sk = d_date_sk GROUP BY i_item_id), wr_items AS (SELECT i_item_id item_id, Sum(wr_return_quantity) wr_item_qty FROM web_returns, item, date_dim WHERE wr_item_sk = i_item_sk AND d_date IN (SELECT d_date FROM date_dim WHERE d_week_seq IN (SELECT d_week_seq FROM date_dim WHERE d_date IN ( '1999-06-30', '1999-08-28', '1999-11-18' ))) AND wr_returned_date_sk = d_date_sk GROUP BY i_item_id) SELECT sr_items.item_id, sr_item_qty, sr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 * 100 sr_dev, cr_item_qty, cr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 * 100 cr_dev, wr_item_qty, wr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 * 100 wr_dev, ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 average FROM sr_items, cr_items, wr_items WHERE sr_items.item_id = cr_items.item_id AND sr_items.item_id = wr_items.item_id ORDER BY sr_items.item_id, sr_item_qty LIMIT 100; {noformat} The number of threads for 
major fragments 1 and 2 has changed when Statistics is disabled. The number of minor fragments has been reduced from 10 and 15, respectively, down to 3. The rowcount estimate for major fragment 2 has dropped from 1439754.0 to 287950.8. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
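The fragment-width change above is consistent with how Drill sizes parallelism: the planner divides its rowcount estimate by `planner.slice_target` (default 100000) to choose the number of minor fragments, so a smaller estimate yields fewer fragments. A hedged way to inspect the relevant knob and the per-operator estimates (exact `sys.options` columns vary by Drill version):

{noformat}
-- Check the slice target that drives minor-fragment parallelism
SELECT * FROM sys.options WHERE name = 'planner.slice_target';

-- Show per-operator rowcount estimates for the plan of the TPCDS 83
-- query above (compare with Statistics enabled vs disabled)
EXPLAIN PLAN INCLUDING ALL ATTRIBUTES FOR
SELECT ... ;  -- the query quoted above
{noformat}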
[jira] [Updated] (DRILL-7120) Query fails with ChannelClosedException when Statistics is disabled
[ https://issues.apache.org/jira/browse/DRILL-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-7120: -- Summary: Query fails with ChannelClosedException when Statistics is disabled (was: Query fails with ChannelClosedException when Statistics is disabled.) > Query fails with ChannelClosedException when Statistics is disabled > --- > > Key: DRILL-7120 > URL: https://issues.apache.org/jira/browse/DRILL-7120 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.16.0 > > > TPCH query 5 fails at sf100 when Statistics is disabled. Here is the query: > {noformat} > select > n.n_name, > sum(l.l_extendedprice * (1 - l.l_discount)) as revenue > from > customer c, > orders o, > lineitem l, > supplier s, > nation n, > region r > where > c.c_custkey = o.o_custkey > and l.l_orderkey = o.o_orderkey > and l.l_suppkey = s.s_suppkey > and c.c_nationkey = s.s_nationkey > and s.s_nationkey = n.n_nationkey > and n.n_regionkey = r.r_regionkey > and r.r_name = 'EUROPE' > and o.o_orderdate >= date '1997-01-01' > and o.o_orderdate < date '1997-01-01' + interval '1' year > group by > n.n_name > order by > revenue desc; > {noformat} > This is the error from drillbit.log: > {noformat} > 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> > FINISHED > 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State to report: FINISHED > 2019-03-04 18:17:51,454 [BitServer-13] WARN > o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming > stream due to memory limits. Current Allocation: 262144. 
> 2019-03-04 18:17:51,454 [BitServer-13] ERROR > o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer. > 2019-03-04 18:17:51,463 [BitServer-13] ERROR > o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. > Connection: /10.10.120.104:31012 <--> /10.10.120.106:53048 (data server). > Closing connection. > io.netty.handler.codec.DecoderException: > org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating > buffer. > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271) > ~[netty-codec-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > 
[netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > [netty-transport-4.0.48.Final.jar:4.0.48.Final]
[jira] [Updated] (DRILL-7121) TPCH 4 takes longer when Statistics is disabled.
[ https://issues.apache.org/jira/browse/DRILL-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-7121: -- Summary: TPCH 4 takes longer when Statistics is disabled. (was: TPCH 4 takes longer) > TPCH 4 takes longer when Statistics is disabled. > > > Key: DRILL-7121 > URL: https://issues.apache.org/jira/browse/DRILL-7121 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.16.0 > > > Here is TPCH 4 with sf 100: > {noformat} > select > o.o_orderpriority, > count(*) as order_count > from > orders o > where > o.o_orderdate >= date '1996-10-01' > and o.o_orderdate < date '1996-10-01' + interval '3' month > and > exists ( > select > * > from > lineitem l > where > l.l_orderkey = o.o_orderkey > and l.l_commitdate < l.l_receiptdate > ) > group by > o.o_orderpriority > order by > o.o_orderpriority; > {noformat} > The plan has changed when Statistics is disabled. A Hash Agg and a > Broadcast Exchange have been added. These two operators expand the number of > rows from the lineitem table from 137M to 9B rows. This forces the hash > join to use 6GB of memory instead of 30 MB. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
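The plan regression described above can be confirmed by toggling the statistics option and diffing the two plans. A hedged sketch (option name `planner.statistics.use` as documented for Drill 1.16; verify on your build):

{noformat}
ALTER SESSION SET `planner.statistics.use` = false;
EXPLAIN PLAN FOR
select o.o_orderpriority, count(*) as order_count
from orders o
where o.o_orderdate >= date '1996-10-01'
  and o.o_orderdate < date '1996-10-01' + interval '3' month
  and exists (select * from lineitem l
              where l.l_orderkey = o.o_orderkey
                and l.l_commitdate < l.l_receiptdate)
group by o.o_orderpriority
order by o.o_orderpriority;
-- repeat with the option set to true and compare the plans for the
-- extra HashAgg and BroadcastExchange operators noted above
{noformat}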
[jira] [Assigned] (DRILL-7122) TPCDS queries 29 25 17 are slower when Statistics is disabled.
[ https://issues.apache.org/jira/browse/DRILL-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou reassigned DRILL-7122: - Assignee: Gautam Parai Affects Version/s: 1.16.0 Priority: Blocker (was: Major) Fix Version/s: 1.16.0 Description: Here is query 29 with sf 100: {noformat} SELECT i_item_id, i_item_desc, s_store_id, s_store_name, Avg(ss_quantity)AS store_sales_quantity, Avg(sr_return_quantity) AS store_returns_quantity, Avg(cs_quantity)AS catalog_sales_quantity FROM store_sales, store_returns, catalog_sales, date_dim d1, date_dim d2, date_dim d3, store, item WHERE d1.d_moy = 4 AND d1.d_year = 1998 AND d1.d_date_sk = ss_sold_date_sk AND i_item_sk = ss_item_sk AND s_store_sk = ss_store_sk AND ss_customer_sk = sr_customer_sk AND ss_item_sk = sr_item_sk AND ss_ticket_number = sr_ticket_number AND sr_returned_date_sk = d2.d_date_sk AND d2.d_moy BETWEEN 4 AND 4 + 3 AND d2.d_year = 1998 AND sr_customer_sk = cs_bill_customer_sk AND sr_item_sk = cs_item_sk AND cs_sold_date_sk = d3.d_date_sk AND d3.d_year IN ( 1998, 1998 + 1, 1998 + 2 ) GROUP BY i_item_id, i_item_desc, s_store_id, s_store_name ORDER BY i_item_id, i_item_desc, s_store_id, s_store_name LIMIT 100; {noformat} The hash join order has changed. As a result, one of the hash joins does not seem to reduce the number of rows significantly. Component/s: Query Planning & Optimization > TPCDS queries 29 25 17 are slower when Statistics is disabled. 
> -- > > Key: DRILL-7122 > URL: https://issues.apache.org/jira/browse/DRILL-7122 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.16.0 > > > Here is query 29 with sf 100: > {noformat} > SELECT i_item_id, >i_item_desc, >s_store_id, >s_store_name, >Avg(ss_quantity)AS store_sales_quantity, >Avg(sr_return_quantity) AS store_returns_quantity, >Avg(cs_quantity)AS catalog_sales_quantity > FROM store_sales, >store_returns, >catalog_sales, >date_dim d1, >date_dim d2, >date_dim d3, >store, >item > WHERE d1.d_moy = 4 >AND d1.d_year = 1998 >AND d1.d_date_sk = ss_sold_date_sk >AND i_item_sk = ss_item_sk >AND s_store_sk = ss_store_sk >AND ss_customer_sk = sr_customer_sk >AND ss_item_sk = sr_item_sk >AND ss_ticket_number = sr_ticket_number >AND sr_returned_date_sk = d2.d_date_sk >AND d2.d_moy BETWEEN 4 AND 4 + 3 >AND d2.d_year = 1998 >AND sr_customer_sk = cs_bill_customer_sk >AND sr_item_sk = cs_item_sk >AND cs_sold_date_sk = d3.d_date_sk >AND d3.d_year IN ( 1998, 1998 + 1, 1998 + 2 ) > GROUP BY i_item_id, > i_item_desc, > s_store_id, > s_store_name > ORDER BY i_item_id, > i_item_desc, > s_store_id, > s_store_name > LIMIT 100; > {noformat} > The hash join order has changed. As a result, one of the hash joins does not > seem to reduce the number of rows significantly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7122) TPCDS queries 29 25 17 are slower when Statistics is disabled.
Robert Hou created DRILL-7122: - Summary: TPCDS queries 29 25 17 are slower when Statistics is disabled. Key: DRILL-7122 URL: https://issues.apache.org/jira/browse/DRILL-7122 Project: Apache Drill Issue Type: Bug Reporter: Robert Hou -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7120) Query fails with ChannelClosedException
[ https://issues.apache.org/jira/browse/DRILL-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-7120: -- Description: TPCH query 5 fails at sf100 when Statistics is disabled. Here is the query: {noformat} select n.n_name, sum(l.l_extendedprice * (1 - l.l_discount)) as revenue from customer c, orders o, lineitem l, supplier s, nation n, region r where c.c_custkey = o.o_custkey and l.l_orderkey = o.o_orderkey and l.l_suppkey = s.s_suppkey and c.c_nationkey = s.s_nationkey and s.s_nationkey = n.n_nationkey and n.n_regionkey = r.r_regionkey and r.r_name = 'EUROPE' and o.o_orderdate >= date '1997-01-01' and o.o_orderdate < date '1997-01-01' + interval '1' year group by n.n_name order by revenue desc; {noformat} This is the error from drillbit.log: {noformat} 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO o.a.d.e.w.fragment.FragmentExecutor - 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> FINISHED 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO o.a.d.e.w.f.FragmentStatusReporter - 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State to report: FINISHED 2019-03-04 18:17:51,454 [BitServer-13] WARN o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming stream due to memory limits. Current Allocation: 262144. 2019-03-04 18:17:51,454 [BitServer-13] ERROR o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer. 2019-03-04 18:17:51,463 [BitServer-13] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.120.104:31012 <--> /10.10.120.106:53048 (data server). Closing connection. io.netty.handler.codec.DecoderException: org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating buffer. 
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271) ~[netty-codec-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) [netty-common-4.0.48.Final.jar:4.0.48.Final] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112] Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating buffer. at io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:67)
[jira] [Created] (DRILL-7120) Query fails with ChannelClosedException
Robert Hou created DRILL-7120: - Summary: Query fails with ChannelClosedException Key: DRILL-7120 URL: https://issues.apache.org/jira/browse/DRILL-7120 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.16.0 Reporter: Robert Hou Assignee: Gautam Parai Fix For: 1.16.0 TPCH query 5 fails at sf100. Here is the query: {noformat} select n.n_name, sum(l.l_extendedprice * (1 - l.l_discount)) as revenue from customer c, orders o, lineitem l, supplier s, nation n, region r where c.c_custkey = o.o_custkey and l.l_orderkey = o.o_orderkey and l.l_suppkey = s.s_suppkey and c.c_nationkey = s.s_nationkey and s.s_nationkey = n.n_nationkey and n.n_regionkey = r.r_regionkey and r.r_name = 'EUROPE' and o.o_orderdate >= date '1997-01-01' and o.o_orderdate < date '1997-01-01' + interval '1' year group by n.n_name order by revenue desc; {noformat} This is the error from drillbit.log: {noformat} 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO o.a.d.e.w.fragment.FragmentExecutor - 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> FINISHED 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO o.a.d.e.w.f.FragmentStatusReporter - 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State to report: FINISHED 2019-03-04 18:17:51,454 [BitServer-13] WARN o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming stream due to memory limits. Current Allocation: 262144. 2019-03-04 18:17:51,454 [BitServer-13] ERROR o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer. 2019-03-04 18:17:51,463 [BitServer-13] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.120.104:31012 <--> /10.10.120.106:53048 (data server). Closing connection. io.netty.handler.codec.DecoderException: org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating buffer. 
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271) ~[netty-codec-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.0.48.Final.jar:4.0.48.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) [netty-common-4.0.48.Final.jar:4.0.48.Final] at java.lang.Thread.run(Thread.java:745)
[jira] [Created] (DRILL-7109) Statistics adds external sort, which spills to disk
Robert Hou created DRILL-7109: - Summary: Statistics adds external sort, which spills to disk Key: DRILL-7109 URL: https://issues.apache.org/jira/browse/DRILL-7109 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.16.0 Reporter: Robert Hou Assignee: Gautam Parai Fix For: 1.16.0 TPCH query 4 with sf 100 runs many times slower. One issue is that an extra external sort has been added, and both external sorts spill to disk. Also, the hash join sees 100x more data. Here is the query: {noformat} select o.o_orderpriority, count(*) as order_count from orders o where o.o_orderdate >= date '1996-10-01' and o.o_orderdate < date '1996-10-01' + interval '3' month and exists ( select * from lineitem l where l.l_orderkey = o.o_orderkey and l.l_commitdate < l.l_receiptdate ) group by o.o_orderpriority order by o.o_orderpriority; {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7108) Statistics adds two exchange operators
Robert Hou created DRILL-7108: - Summary: Statistics adds two exchange operators Key: DRILL-7108 URL: https://issues.apache.org/jira/browse/DRILL-7108 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.16.0 Reporter: Robert Hou Assignee: Gautam Parai Fix For: 1.16.0 TPCH 16 with sf 100 runs 14% slower. Here is the query: {noformat} select p.p_brand, p.p_type, p.p_size, count(distinct ps.ps_suppkey) as supplier_cnt from partsupp ps, part p where p.p_partkey = ps.ps_partkey and p.p_brand <> 'Brand#21' and p.p_type not like 'MEDIUM PLATED%' and p.p_size in (38, 2, 8, 31, 44, 5, 14, 24) and ps.ps_suppkey not in ( select s.s_suppkey from supplier s where s.s_comment like '%Customer%Complaints%' ) group by p.p_brand, p.p_type, p.p_size order by supplier_cnt desc, p.p_brand, p.p_type, p.p_size; {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6755) HashJoin should not build hash tables when probe side is empty.
[ https://issues.apache.org/jira/browse/DRILL-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757752#comment-16757752 ] Robert Hou commented on DRILL-6755: --- Boaz suggested verifying this by joining with an empty file. {noformat} select count(*) from dfs.`/empty.json` E where E.l_orderkey in (select L.l_orderkey from lineitem L); {noformat} I tested this with Drill 1.15. I had to turn off semijoins to get the desired plan because if a semijoin is used, then the join is re-ordered so that the empty file is on the build side (may be a bug). I was able to verify that the hash join operator does not build a hash table for this query. > HashJoin should not build hash tables when probe side is empty. > --- > > Key: DRILL-6755 > URL: https://issues.apache.org/jira/browse/DRILL-6755 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Boaz Ben-Zvi >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > > Currently when doing an Inner or a Right join we still build hashtables when > the probe side is empty. A performance optimization would be to not build > them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
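To keep the empty file on the probe side as described in the comment above, the semijoin rewrite can be disabled for the session. A hedged sketch (option name `planner.enable_semijoin` assumed from the Drill 1.15 release notes; verify before relying on it):

{noformat}
ALTER SESSION SET `planner.enable_semijoin` = false;
select count(*) from dfs.`/empty.json` E
where E.l_orderkey in (select L.l_orderkey from lineitem L);
-- EXPLAIN PLAN FOR the same query should now show the empty file on
-- the probe side of the HashJoin, so no hash table is built
{noformat}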
[jira] [Closed] (DRILL-6755) HashJoin should not build hash tables when probe side is empty.
[ https://issues.apache.org/jira/browse/DRILL-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-6755. - > HashJoin should not build hash tables when probe side is empty. > --- > > Key: DRILL-6755 > URL: https://issues.apache.org/jira/browse/DRILL-6755 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Boaz Ben-Zvi >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > > Currently when doing an Inner or a Right join we still build hashtables when > the probe side is empty. A performance optimization would be to not build > them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-6517) IllegalStateException: Record count not set for this vector container
[ https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-6517. - > IllegalStateException: Record count not set for this vector container > - > > Key: DRILL-6517 > URL: https://issues.apache.org/jira/browse/DRILL-6517 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.14.0 >Reporter: Khurram Faraaz >Assignee: Boaz Ben-Zvi >Priority: Critical > Labels: ready-to-commit > Fix For: 1.15.0 > > Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill > > > TPC-DS query is Canceled after 2 hrs and 47 mins and we see an > IllegalStateException: Record count not set for this vector container, in > drillbit.log > Steps to reproduce the problem, query profile > (24d7b377-7589-7928-f34f-57d02061acef) is attached here. > {noformat} > In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster > export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"} > and set these options from sqlline, > alter system set `planner.memory.max_query_memory_per_node` = 10737418240; > alter system set `drill.exec.hashagg.fallback.enabled` = true; > To run the query (replace IP-ADDRESS with your foreman node's IP address) > cd /opt/mapr/drill/drill-1.14.0/bin > ./sqlline -u > "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f > /root/query72.sql > {noformat} > Stack trace from drillbit.log > {noformat} > 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] > ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > at > 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) > ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_161] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_161] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161] > Caused by: java.lang.IllegalStateException: Record count not set for this > vector container > at com.google.common.base.Preconditions.checkState(Preconditions.java:173) > ~[guava-18.0.jar:na] > at > org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordBatchSizer.(RecordBatchSizer.java:690) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordBatchSizer.(RecordBatchSizer.java:662) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > 
org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:242) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:218) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) >
[jira] [Comment Edited] (DRILL-6517) IllegalStateException: Record count not set for this vector container
[ https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756773#comment-16756773 ] Robert Hou edited comment on DRILL-6517 at 1/31/19 1:38 AM: I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 12 hours, and then successfully canceled the query. I spoke with Khurram and added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set `drill.exec.hashjoin.fallback.enabled` = true;" because the query was running out of memory. I am able to cancel the query. I am running Drill 1.14, commit 35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested EAND scenario.". The query can be canceled with Drill 1.15. was (Author: rhou): am unable to reproduce this problem with sf1. I ran the query for 2 hours and 12 hours, and then successfully canceled the query. I spoke with Khurram and added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set `drill.exec.hashjoin.fallback.enabled` = true;" because the query was running out of memory. I am able to cancel the query. I am running Drill 1.14, commit 35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested EAND scenario.". The query can be canceled with Drill 1.15. > IllegalStateException: Record count not set for this vector container > - > > Key: DRILL-6517 > URL: https://issues.apache.org/jira/browse/DRILL-6517 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.14.0 >Reporter: Khurram Faraaz >Assignee: Boaz Ben-Zvi >Priority: Critical > Labels: ready-to-commit > Fix For: 1.15.0 > > Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill > > > TPC-DS query is Canceled after 2 hrs and 47 mins and we see an > IllegalStateException: Record count not set for this vector container, in > drillbit.log > Steps to reproduce the problem, query profile > (24d7b377-7589-7928-f34f-57d02061acef) is attached here. 
> {noformat} > In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster > export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"} > and set these options from sqlline, > alter system set `planner.memory.max_query_memory_per_node` = 10737418240; > alter system set `drill.exec.hashagg.fallback.enabled` = true; > To run the query (replace IP-ADDRESS with your foreman node's IP address) > cd /opt/mapr/drill/drill-1.14.0/bin > ./sqlline -u > "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f > /root/query72.sql > {noformat} > Stack trace from drillbit.log > {noformat} > 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] > ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) > ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_161] > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_161] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161] > Caused by: java.lang.IllegalStateException: Record count not set for this > vector container > at com.google.common.base.Preconditions.checkState(Preconditions.java:173) > ~[guava-18.0.jar:na] > at > org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at >
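The "Record count not set" failure in the stack trace above comes down to a precondition check on the container's record count (the Preconditions.checkState frame). A minimal sketch of that failure mode, using illustrative names rather than Drill's actual VectorContainer code:

```java
// Minimal model of the precondition behind the reported IllegalStateException:
// the record count must be set before a downstream operator may read it.
// Names and the -1 sentinel are illustrative, not Drill's implementation.
public class VectorContainerSketch {
    private int recordCount = -1;  // -1 means "not set"

    void setRecordCount(int count) {
        this.recordCount = count;
    }

    int getRecordCount() {
        if (recordCount < 0) {
            // mirrors the message seen in drillbit.log
            throw new IllegalStateException("Record count not set for this vector container");
        }
        return recordCount;
    }

    public static void main(String[] args) {
        VectorContainerSketch c = new VectorContainerSketch();
        try {
            c.getRecordCount();  // fails: upstream never set the count
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        c.setRecordCount(10);
        System.out.println(c.getRecordCount());  // 10
    }
}
```

In the trace, RecordBatchSizer reads the count while an upstream RemovingRecordBatch had not set it, which is the same sequencing bug in miniature.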
[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container
[ https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756773#comment-16756773 ] Robert Hou commented on DRILL-6517: --- I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 12 hours, and then successfully canceled the query. I spoke with Khurram and added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set `drill.exec.hashjoin.fallback.enabled` = true;" because the query was running out of memory. I am able to cancel the query. I am running Drill 1.14, commit 35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested EAND scenario.". > IllegalStateException: Record count not set for this vector container > - > > Key: DRILL-6517 > URL: https://issues.apache.org/jira/browse/DRILL-6517 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.14.0 >Reporter: Khurram Faraaz >Assignee: Boaz Ben-Zvi >Priority: Critical > Labels: ready-to-commit > Fix For: 1.15.0 > > Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill > > > TPC-DS query is Canceled after 2 hrs and 47 mins and we see an > IllegalStateException: Record count not set for this vector container, in > drillbit.log > Steps to reproduce the problem, query profile > (24d7b377-7589-7928-f34f-57d02061acef) is attached here. 
> {noformat} > In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster > export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"} > and set these options from sqlline, > alter system set `planner.memory.max_query_memory_per_node` = 10737418240; > alter system set `drill.exec.hashagg.fallback.enabled` = true; > To run the query (replace IP-ADDRESS with your foreman node's IP address) > cd /opt/mapr/drill/drill-1.14.0/bin > ./sqlline -u > "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f > /root/query72.sql > {noformat} > Stack trace from drillbit.log > {noformat} > 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] > ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) > ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_161] > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_161] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161] > Caused by: java.lang.IllegalStateException: Record count not set for this > vector container > at com.google.common.base.Preconditions.checkState(Preconditions.java:173) > ~[guava-18.0.jar:na] > at > org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at >
[jira] [Comment Edited] (DRILL-6517) IllegalStateException: Record count not set for this vector container
[ https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756773#comment-16756773 ] Robert Hou edited comment on DRILL-6517 at 1/31/19 1:38 AM: I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 12 hours, and then successfully canceled the query. I spoke with Khurram and added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set `drill.exec.hashjoin.fallback.enabled` = true;" because the query was running out of memory. I am able to cancel the query. I am running Drill 1.14, commit 35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested EAND scenario.". The query can be canceled with Drill 1.15. was (Author: rhou): am unable to reproduce this problem with sf1. I ran the query for 2 hours and 12 hours, and then successfully canceled the query. I spoke with Khurram and added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set `drill.exec.hashjoin.fallback.enabled` = true;" because the query was running out of memory. I am able to cancel the query. I am running Drill 1.14, commit 35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested EAND scenario.". > IllegalStateException: Record count not set for this vector container > - > > Key: DRILL-6517 > URL: https://issues.apache.org/jira/browse/DRILL-6517 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.14.0 >Reporter: Khurram Faraaz >Assignee: Boaz Ben-Zvi >Priority: Critical > Labels: ready-to-commit > Fix For: 1.15.0 > > Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill > > > TPC-DS query is Canceled after 2 hrs and 47 mins and we see an > IllegalStateException: Record count not set for this vector container, in > drillbit.log > Steps to reproduce the problem, query profile > (24d7b377-7589-7928-f34f-57d02061acef) is attached here. 
> {noformat} > In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster > export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"} > and set these options from sqlline, > alter system set `planner.memory.max_query_memory_per_node` = 10737418240; > alter system set `drill.exec.hashagg.fallback.enabled` = true; > To run the query (replace IP-ADDRESS with your foreman node's IP address) > cd /opt/mapr/drill/drill-1.14.0/bin > ./sqlline -u > "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f > /root/query72.sql > {noformat} > Stack trace from drillbit.log > {noformat} > 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] > ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) > ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_161] > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_161] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161] > Caused by: java.lang.IllegalStateException: Record count not set for this > vector container > at com.google.common.base.Preconditions.checkState(Preconditions.java:173) > ~[guava-18.0.jar:na] > at > org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690) >
[jira] [Closed] (DRILL-6726) Drill fails to query views created before DRILL-6492 when impersonation is enabled
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-6726. - > Drill fails to query views created before DRILL-6492 when impersonation is > enabled > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > Attachments: student > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view > references a file that includes a schema which has upper case letters, the > view needs to be rebuilt. There may be variations on this issue that I have > not seen. > To reproduce this problem, create a dfs workspace like this: > {noformat} > "drillTestDirP1": { > "location": "/drill/testdata/p1tests", > "writable": true, > "defaultInputFormat": "parquet", > "allowAccessOutsideWorkspace": false > }, > {noformat} > Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this > command: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute > this query: > {noformat} > select * from student_test_v; > {noformat} > Drill will return an exception: > {noformat} > Error: VALIDATION ERROR: Failure while attempting to expand view. Requested > schema drillTestDirP1 not available in schema dfs. > View Context dfs, drillTestDirP1 > View SQL SELECT * > FROM `dfs.drillTestDirP1`.`student` > [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] > (state=,code=0) > {noformat} > I have attached the student parquet file I used. 
> This is what the .view.drill file looks like: > {noformat} > { > "name" : "student_test_v", > "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", > "fields" : [ { > "name" : "**", > "type" : "DYNAMIC_STAR", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] > } > {noformat} > This means that users may not be able to access views that they have created > using previous versions of Drill. We should maintain backwards > compatibility where possible. > As a work-around, these views can be re-created. It would be helpful to users > if the error message explains that these views need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6726) Drill fails to query views created before DRILL-6492 when impersonation is enabled
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756501#comment-16756501 ] Robert Hou commented on DRILL-6726: --- I have encountered another problem related to this one. If I run Drill 1.15, and then I run Drill 1.14, Drill 1.14 cannot access schemas with mixed-case names (names that contain upper case letters). It can access a schema if it uses all lower case letters. For example, if the schema used to be called "drillTestDir", Drill 1.14 must use "drilltestdir" in order to use it. This means that scripts that use "drillTestDir" can break. This may not be a major issue now, but users sometimes try a new version of Drill and, if they run into problems, revert to the older version. We know one user who tried Drill 1.14 and encountered some problems and went back to Drill 1.13. We should keep this in mind in future releases. > Drill fails to query views created before DRILL-6492 when impersonation is > enabled > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > Attachments: student > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view > references a file that includes a schema which has upper case letters, the > view needs to be rebuilt. There may be variations on this issue that I have > not seen. 
> To reproduce this problem, create a dfs workspace like this: > {noformat} > "drillTestDirP1": { > "location": "/drill/testdata/p1tests", > "writable": true, > "defaultInputFormat": "parquet", > "allowAccessOutsideWorkspace": false > }, > {noformat} > Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this > command: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute > this query: > {noformat} > select * from student_test_v; > {noformat} > Drill will return an exception: > {noformat} > Error: VALIDATION ERROR: Failure while attempting to expand view. Requested > schema drillTestDirP1 not available in schema dfs. > View Context dfs, drillTestDirP1 > View SQL SELECT * > FROM `dfs.drillTestDirP1`.`student` > [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] > (state=,code=0) > {noformat} > I have attached the student parquet file I used. > This is what the .view.drill file looks like: > {noformat} > { > "name" : "student_test_v", > "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", > "fields" : [ { > "name" : "**", > "type" : "DYNAMIC_STAR", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] > } > {noformat} > This means that users may not be able to access views that they have created > using previous versions of Drill. We should maintain backwards > compatibility where possible. > As a work-around, these views can be re-created. It would be helpful to users > if the error message explains that these views need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-6709. - > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756486#comment-16756486 ] Robert Hou commented on DRILL-6709: --- I have verified this. > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-6880) Hash-Join: Many null keys on the build side form a long linked chain in the Hash Table
[ https://issues.apache.org/jira/browse/DRILL-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-6880. - > Hash-Join: Many null keys on the build side form a long linked chain in the > Hash Table > -- > > Key: DRILL-6880 > URL: https://issues.apache.org/jira/browse/DRILL-6880 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Boaz Ben-Zvi >Assignee: Boaz Ben-Zvi >Priority: Critical > Fix For: 1.16.0 > > > When building the Hash Table for the Hash-Join, each new key is matched with > an existing key (same bucket) by calling the generated method > `isKeyMatchInternalBuild`, which compares the two. However when both keys are > null, the method returns *false* (meaning not-equal; i.e. it is a new key), > thus the new key is added into the list following the old key. When a third > null key is found, it would be matched with the prior two, and added as well. > Etc etc ... > This way many null values would perform checks at order N^2 / 2. > _Suggested improvement_: The generated code should return a third result, > meaning "two null keys". Then in case of Inner or Left joins all the > duplicate nulls can be discarded. > Below is a simple example, note the time difference between non-null and the > all-nulls tables (also instrumentation showed that for nulls, the method > above was called 1249975000 times!!) 
> {code:java} > 0: jdbc:drill:zk=local> use dfs.tmp; > 0: jdbc:drill:zk=local> create table testNull as (select cast(null as int) > mycol from > dfs.`/data/test128M.tbl` limit 5); > 0: jdbc:drill:zk=local> create table test1 as (select cast(1 as int) mycol1 > from > dfs.`/data/test128M.tbl` limit 6); > 0: jdbc:drill:zk=local> create table test2 as (select cast(2 as int) mycol2 > from dfs.`/data/test128M.tbl` limit 5); > 0: jdbc:drill:zk=local> select count(*) from test1 join test2 on test1.mycol1 > = test2.mycol2; > +-+ > | EXPR$0 | > +-+ > | 0 | > +-+ > 1 row selected (0.443 seconds) > 0: jdbc:drill:zk=local> select count(*) from test1 join testNull on > test1.mycol1 = testNull.mycol; > +-+ > | EXPR$0 | > +-+ > | 0 | > +-+ > 1 row selected (140.098 seconds) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
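The quadratic cost described in the issue can be simulated outside Drill. The sketch below (illustrative names, not Drill's generated `isKeyMatchInternalBuild` code) models a bucket chain where null-vs-null compares as not-equal, so every null key is appended after scanning the whole chain; with 50,000 null build-side rows that is 50000·49999/2 = 1,249,975,000 comparisons, matching the instrumented count in the report:

```java
import java.util.ArrayList;
import java.util.List;

public class NullKeyChain {
    // Simulates the build-side insert: each new key is compared against every
    // entry already in the bucket's chain; null vs null compares as NOT equal,
    // so every null extends the chain and total cost grows as n*(n-1)/2.
    static long insertAll(int n) {
        List<Integer> chain = new ArrayList<>();  // keys already in the bucket
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            Integer newKey = null;                // every build-side key is null
            for (Integer existing : chain) {
                comparisons++;
                // mimics the generated match method: null keys never match
                if (existing != null && existing.equals(newKey)) {
                    break;
                }
            }
            chain.add(newKey);                    // no match found, append to chain
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // 5000 nulls -> 5000*4999/2 comparisons; 50000 would give 1249975000
        System.out.println(insertAll(5000));      // 12497500
    }
}
```

The suggested three-way compare result ("two null keys") lets inner and left joins drop the duplicate nulls instead of chaining them.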
[jira] [Commented] (DRILL-6880) Hash-Join: Many null keys on the build side form a long linked chain in the Hash Table
[ https://issues.apache.org/jira/browse/DRILL-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742671#comment-16742671 ] Robert Hou commented on DRILL-6880: --- I have verified this fix. > Hash-Join: Many null keys on the build side form a long linked chain in the > Hash Table > -- > > Key: DRILL-6880 > URL: https://issues.apache.org/jira/browse/DRILL-6880 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Boaz Ben-Zvi >Assignee: Boaz Ben-Zvi >Priority: Critical > Fix For: 1.16.0 > > > When building the Hash Table for the Hash-Join, each new key is matched with > an existing key (same bucket) by calling the generated method > `isKeyMatchInternalBuild`, which compares the two. However when both keys are > null, the method returns *false* (meaning not-equal; i.e. it is a new key), > thus the new key is added into the list following the old key. When a third > null key is found, it would be matched with the prior two, and added as well. > Etc etc ... > This way many null values would perform checks at order N^2 / 2. > _Suggested improvement_: The generated code should return a third result, > meaning "two null keys". Then in case of Inner or Left joins all the > duplicate nulls can be discarded. > Below is a simple example, note the time difference between non-null and the > all-nulls tables (also instrumentation showed that for nulls, the method > above was called 1249975000 times!!) 
> {code:java} > 0: jdbc:drill:zk=local> use dfs.tmp; > 0: jdbc:drill:zk=local> create table testNull as (select cast(null as int) > mycol from > dfs.`/data/test128M.tbl` limit 5); > 0: jdbc:drill:zk=local> create table test1 as (select cast(1 as int) mycol1 > from > dfs.`/data/test128M.tbl` limit 6); > 0: jdbc:drill:zk=local> create table test2 as (select cast(2 as int) mycol2 > from dfs.`/data/test128M.tbl` limit 5); > 0: jdbc:drill:zk=local> select count(*) from test1 join test2 on test1.mycol1 > = test2.mycol2; > +-+ > | EXPR$0 | > +-+ > | 0 | > +-+ > 1 row selected (0.443 seconds) > 0: jdbc:drill:zk=local> select count(*) from test1 join testNull on > test1.mycol1 = testNull.mycol; > +-+ > | EXPR$0 | > +-+ > | 0 | > +-+ > 1 row selected (140.098 seconds) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file
[ https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737743#comment-16737743 ] Robert Hou commented on DRILL-5796: --- Found our documentation on this; the default limit is 10K rowgroups, which means we are limited to 10K files. > Filter pruning for multi rowgroup parquet file > -- > > Key: DRILL-5796 > URL: https://issues.apache.org/jira/browse/DRILL-5796 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Reporter: Damien Profeta >Assignee: Jean-Blas IMBERT >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > > Today, filter pruning uses the file name as the partitioning key. This means > you can prune a partition only if the whole file belongs to the same partition. > With Parquet, the filter can be pruned at the rowgroup level if the rowgroups > partition your dataset, making the rowgroup, not the file, the unit of work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
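Rowgroup-level pruning as requested in DRILL-5796 amounts to testing each rowgroup's min/max column statistics against the filter and skipping rowgroups that cannot possibly match. A minimal sketch under assumed names (not Drill's planner code), for a predicate of the form `col < threshold`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGroupPruning {
    // Hypothetical stand-in for a Parquet rowgroup's min/max column statistics.
    static class Stats {
        final long min, max;
        Stats(long min, long max) { this.min = min; this.max = max; }
    }

    // Keep only the rowgroups whose [min, max] range can contain a value
    // satisfying "col < threshold"; the rest are pruned without being read.
    static List<Integer> prune(List<Stats> rowGroups, long threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < rowGroups.size(); i++) {
            if (rowGroups.get(i).min < threshold) {  // some value may match
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Stats> groups = Arrays.asList(
            new Stats(0, 50),     // kept: min below threshold
            new Stats(120, 400),  // pruned: every value >= 100
            new Stats(90, 110));  // kept: range straddles threshold
        System.out.println(prune(groups, 100));  // [0, 2]
    }
}
```

The 10K-rowgroup threshold discussed in the comments would cap how many such statistics the planner is willing to examine before giving up on pruning.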
[jira] [Created] (DRILL-6957) Parquet rowgroup filtering can have incorrect file count
Robert Hou created DRILL-6957: - Summary: Parquet rowgroup filtering can have incorrect file count Key: DRILL-6957 URL: https://issues.apache.org/jira/browse/DRILL-6957 Project: Apache Drill Issue Type: Bug Reporter: Robert Hou Assignee: Jean-Blas IMBERT If a query accesses all the files, the Scan operator indicates that one file is accessed. The number of rowgroups is correct. Here is an example query: {noformat} select count(*) from dfs.`/custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120` where cur_tot_bal_amt < 100 {noformat} Here is the plan: {noformat} Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {9.8376721446E9 rows, 4.35668337906E10 cpu, 2.810763469E9 io, 4096.0 network, 0.0 memory}, id = 4477 00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {9.8376721445E9 rows, 4.35668337905E10 cpu, 2.810763469E9 io, 4096.0 network, 0.0 memory}, id = 4476 00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {9.8376721435E9 rows, 4.35668337895E10 cpu, 2.810763469E9 io, 4096.0 network, 0.0 memory}, id = 4475 00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {9.8376721425E9 rows, 4.35668337775E10 cpu, 2.810763469E9 io, 4096.0 network, 0.0 memory}, id = 4474 01-01StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {9.8376721415E9 rows, 4.35668337695E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 4473 01-02 Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 1.4053817345E9, cumulative cost = {8.432290407E9 rows, 2.67022529555E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 4472 01-03SelectionVectorRemover : rowType = RecordType(ANY cur_tot_bal_amt): rowcount = 1.4053817345E9, cumulative cost = {7.0269086725E9 rows, 2.10807260175E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 
 memory}, id = 4471 01-04 Filter(condition=[<($0, 100)]) : rowType = RecordType(ANY cur_tot_bal_amt): rowcount = 1.4053817345E9, cumulative cost = {5.621526938E9 rows, 1.9675344283E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 4470 01-05 Scan(table=[[dfs, /custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120]], selectionRoot=maprfs:/custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120, numFiles=1, numRowGroups=1007, usedMetadataFile=false, columns=[`cur_tot_bal_amt`]]]) : rowType = RecordType(ANY cur_tot_bal_amt): rowcount = 2.810763469E9, cumulative cost = {2.810763469E9 rows, 2.810763469E9 cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 4469 {noformat} numFiles is set to 1 when it should be set to 21. All the files are in one directory. If I add a level of directories (i.e. a directory with multiple directories, each with files), then I get the correct file count. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file
[ https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737694#comment-16737694 ] Robert Hou commented on DRILL-5796: --- It looks like pushdown is performed if there are up to 10K rowgroups. If there are more than 10K rowgroups, I cannot tell if pushdown is being performed. The explain plan suggests it is not being performed. > Filter pruning for multi rowgroup parquet file > -- > > Key: DRILL-5796 > URL: https://issues.apache.org/jira/browse/DRILL-5796 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Reporter: Damien Profeta >Assignee: Jean-Blas IMBERT >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > > Today, filter pruning uses the file name as the partitioning key. This means > you can prune a partition only if the whole file belongs to the same partition. > With Parquet, the filter can be pruned at the rowgroup level if the rowgroups > partition your dataset, making the rowgroup, not the file, the unit of work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file
[ https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736835#comment-16736835 ] Robert Hou commented on DRILL-5796: --- Are there any limits for this feature? I am testing it with roughly 250 files organized in roughly 20 directories. There should only be one file that matches the query. But the Scan operator shows that all 250 files in 20 directories need to be scanned. Perhaps the optimizer decides not to scan row group stats after some threshold? > Filter pruning for multi rowgroup parquet file > -- > > Key: DRILL-5796 > URL: https://issues.apache.org/jira/browse/DRILL-5796 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Reporter: Damien Profeta >Assignee: Jean-Blas IMBERT >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > > Today, filter pruning uses the file name as the partitioning key. This means > you can prune a partition only if the whole file belongs to the same partition. > With Parquet, the filter can be pruned at the rowgroup level if the rowgroups > partition your dataset, making the rowgroup, not the file, the unit of work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (DRILL-5796) Filter pruning for multi rowgroup parquet file
[ https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736835#comment-16736835 ] Robert Hou edited comment on DRILL-5796 at 1/8/19 7:37 AM: --- Are there any limits for this feature? I am testing it with roughly 250 files organized in roughly 20 directories. There should only be one file that matches the query. But the Scan operator shows that all 250 files in 20 directories need to be scanned. Perhaps the optimizer decides not to scan row group stats after some threshold? was (Author: rhou): Is there any limits for this feature? I am testing it with roughly 250 files organized in roughly 20 directories. There should only be one file that matches the query. But the Scan operator shows that all 250 files in 20 directories need to be scanned. Perhaps the optimizer decides not to scan row group stats after some threshold? > Filter pruning for multi rowgroup parquet file > -- > > Key: DRILL-5796 > URL: https://issues.apache.org/jira/browse/DRILL-5796 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Reporter: Damien Profeta >Assignee: Jean-Blas IMBERT >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > > Today, filter pruning uses the file name as the partitioning key. This means > you can remove a partition only if the whole file belongs to the same partition. > With Parquet, you can prune the filter at the rowgroup level, making the unit > of work the rowgroup rather than the whole file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
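[Editor's note] The pruning discussed above works by comparing a filter predicate against each rowgroup's min/max column statistics. The following is an illustrative sketch of that idea (not Drill's actual implementation; the function and data names are hypothetical):

```python
# Illustrative sketch of rowgroup-level filter pruning: a rowgroup can be
# skipped when its min/max statistics show the predicate cannot match any
# row inside it. Not Drill code; names here are invented for illustration.

def can_prune(rowgroup_stats, column, lower, upper):
    """Return True if no row in the rowgroup can satisfy
    lower <= column <= upper, so the rowgroup may be skipped."""
    stats = rowgroup_stats.get(column)
    if stats is None:              # no statistics: must scan the rowgroup
        return False
    col_min, col_max = stats
    return col_max < lower or col_min > upper

# Two rowgroups of one Parquet file, each with its own min/max for order_id.
rowgroups = [
    {"order_id": (1, 100)},
    {"order_id": (101, 200)},
]

# Predicate: order_id BETWEEN 150 AND 180 -- only the second rowgroup survives,
# even though both rowgroups live in the same file.
survivors = [rg for rg in rowgroups if not can_prune(rg, "order_id", 150, 180)]
```

This is why pruning at rowgroup granularity, rather than file granularity, can skip data even when a single file spans multiple partitions.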
[jira] [Updated] (DRILL-6906) File permissions are not being honored
[ https://issues.apache.org/jira/browse/DRILL-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6906: -- Description: I ran sqlline with user "kuser1". {noformat} /opt/mapr/drill/drill-1.15.0.apache/bin/sqlline -u "jdbc:drill:drillbit=10.10.30.206" -n kuser1 -p mapr {noformat} I tried to access a file that is only accessible by root: {noformat} [root@perfnode206 drill-test-framework_krystal]# hf -ls /drill/testdata/impersonation/neg_tc5/student -rwx-- 3 root root 64612 2018-06-19 10:30 /drill/testdata/impersonation/neg_tc5/student {noformat} I am able to read the table, which should not be possible. I used this commit for Drill 1.15. {noformat} git.commit.id=bf2b414ac62cfc515fdd77f2688bb110073d764d git.commit.message.full=DRILL-6866\: Upgrade to SqlLine 1.6.0\n\n1. Changed SqlLine version to 1.6.0.\n2. Overridden new getVersion method in DrillSqlLineApplication.\n3. Set maxColumnWidth to 80 to avoid issue described in DRILL-6769.\n4. Changed colorScheme to obsidian.\n5. Output null value for varchar / char / boolean types as null instead of empty string.\n6. Changed access modifier from package default to public for JDBC classes that implement external interfaces to avoid issues when calling methods from these classes using reflection.\n\ncloses \#1556 {noformat} This is from drillbit.log. It shows that user is kuser1. {noformat} 2018-12-15 05:00:52,516 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] DEBUG o.a.d.e.w.f.QueryStateProcessor - 23eb04fb-1701-bea7-dd97-ecda58795b3b: State change requested PREPARING --> PLANNING 2018-12-15 05:00:52,531 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query with id 23eb04fb-1701-bea7-dd97-ecda58795b3b issued by kuser1: select * from dfs.`/drill/testdata/impersonation/neg_tc5/student` {noformat} It is not clear to me if this is a Drill problem or a file system problem. 
I tested MFS by logging in as kuser1 and trying to copy the file using "hadoop fs -copyToLocal /drill/testdata/impersonation/neg_tc5/student" and got an error, and was not able to copy the file. So I think MFS permissions are working. I also tried with Drill 1.14, and I get the expected error: {noformat} 0: jdbc:drill:drillbit=10.10.30.206> select * from dfs.`/drill/testdata/impersonation/neg_tc5/student` limit 1; Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Object '/drill/testdata/impersonation/neg_tc5/student' not found within 'dfs' [Error Id: cdf18c2a-b005-4f92-b819-d4324e8807d9 on perfnode206.perf.lab:31010] (state=,code=0) {noformat} The commit for Drill 1.14 is: {noformat} git.commit.message.full=[maven-release-plugin] prepare release drill-1.14.0\n git.commit.id=0508a128853ce796ca7e99e13008e49442f83147 {noformat} This problem exists with both Apache JDBC and Simba ODBC. Here is drill-distrib.conf. drill-override.conf is empty. It is the same for both 1.14 and 1.15. 
{noformat} drill.exec: { cluster-id: "secure206-drillbits", zk.connect: "perfnode206.perf.lab:5181,perfnode207.perf.lab:5181,perfnode208.perf.lab:5181", rpc.user.client.threads: "4", options.store.parquet.block-size: "268435456", sys.store.provider.zk.blobroot: "maprfs:///apps/drill", spill.directories: [ "/tmp/drill/spill" ], spill.fs: "maprfs:///", storage.action_on_plugins_override_file: "rename" zk.apply_secure_acl: true, impersonation.enabled: true, impersonation.max_chained_user_hops: 3, options.exec.impersonation.inbound_policies: "[{proxy_principals:{users:[\"mapr\"]},target_principals:{users:[\"*\"]}}]", security.auth.mechanisms: ["PLAIN", "KERBEROS"], security.auth.principal : "mapr/maprs...@qa.lab", security.auth.keytab : "/etc/drill/mapr_maprsasl.keytab", security.user.auth.enabled: true, security.user.auth.packages += "org.apache.drill.exec.rpc.user.security", security.user.auth.impl: "pam4j", security.user.auth.pam_profiles: ["sudo", "login"], http.ssl_enabled: true, ssl.useHadoopConfig: true, http.auth.mechanisms: ["FORM", "SPNEGO"], http.auth.spnego.principal: "HTTP/perfnode206.perf@qa.lab", http.auth.spnego.keytab: "/etc/drill_spnego/perfnode206.keytab" } {noformat} was: I ran sqlline with user "kuser1". {noformat} /opt/mapr/drill/drill-1.15.0.apache/bin/sqlline -u "jdbc:drill:drillbit=10.10.30.206" -n kuser1 -p mapr {noformat} I tried to access a file that is only accessible by root: {noformat} [root@perfnode206 drill-test-framework_krystal]# hf -ls /drill/testdata/impersonation/neg_tc5/student -rwx-- 3 root root 64612 2018-06-19 10:30 /drill/testdata/impersonation/neg_tc5/student {noformat} I am able to read the table, which should not be possible. I used this commit for Drill 1.15. {noformat} git.commit.id=bf2b414ac62cfc515fdd77f2688bb110073d764d git.commit.message.full=DRILL-6866\: Upgrade to SqlLine 1.6.0\n\n1.
[jira] [Updated] (DRILL-6906) File permissions are not being honored
[ https://issues.apache.org/jira/browse/DRILL-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6906: -- Description: I ran sqlline with user "kuser1". {noformat} /opt/mapr/drill/drill-1.15.0.apache/bin/sqlline -u "jdbc:drill:drillbit=10.10.30.206" -n kuser1 -p mapr {noformat} I tried to access a file that is only accessible by root: {noformat} [root@perfnode206 drill-test-framework_krystal]# hf -ls /drill/testdata/impersonation/neg_tc5/student -rwx-- 3 root root 64612 2018-06-19 10:30 /drill/testdata/impersonation/neg_tc5/student {noformat} I am able to read the table, which should not be possible. I used this commit for Drill 1.15. {noformat} git.commit.id=bf2b414ac62cfc515fdd77f2688bb110073d764d git.commit.message.full=DRILL-6866\: Upgrade to SqlLine 1.6.0\n\n1. Changed SqlLine version to 1.6.0.\n2. Overridden new getVersion method in DrillSqlLineApplication.\n3. Set maxColumnWidth to 80 to avoid issue described in DRILL-6769.\n4. Changed colorScheme to obsidian.\n5. Output null value for varchar / char / boolean types as null instead of empty string.\n6. Changed access modifier from package default to public for JDBC classes that implement external interfaces to avoid issues when calling methods from these classes using reflection.\n\ncloses \#1556 {noformat} This is from drillbit.log. It shows that user is kuser1. {noformat} 2018-12-15 05:00:52,516 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] DEBUG o.a.d.e.w.f.QueryStateProcessor - 23eb04fb-1701-bea7-dd97-ecda58795b3b: State change requested PREPARING --> PLANNING 2018-12-15 05:00:52,531 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query with id 23eb04fb-1701-bea7-dd97-ecda58795b3b issued by kuser1: select * from dfs.`/drill/testdata/impersonation/neg_tc5/student` {noformat} It is not clear to me if this is a Drill problem or a file system problem. 
I tested MFS by logging in as kuser1 and trying to copy the file using "hadoop fs -copyToLocal /drill/testdata/impersonation/neg_tc5/student" and got an error, and was not able to copy the file. So I think MFS permissions are working. I also tried with Drill 1.14, and I get the expected error: {noformat} 0: jdbc:drill:drillbit=10.10.30.206> select * from dfs.`/drill/testdata/impersonation/neg_tc5/student` limit 1; Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Object '/drill/testdata/impersonation/neg_tc5/student' not found within 'dfs' [Error Id: cdf18c2a-b005-4f92-b819-d4324e8807d9 on perfnode206.perf.lab:31010] (state=,code=0) {noformat} The commit for Drill 1.14 is: {noformat} git.commit.message.full=[maven-release-plugin] prepare release drill-1.14.0\n git.commit.id=0508a128853ce796ca7e99e13008e49442f83147 {noformat} This problem exists with both Apache JDBC and Simba ODBC. Here is drill-distrib.conf. drill-override.conf is empty. It is the same for both 1.14 and 1.15. 
{noformat} drill.exec: { cluster-id: "secure206-drillbits", zk.connect: "perfnode206.perf.lab:5181,perfnode207.perf.lab:5181,perfnode208.perf.lab:5181", rpc.user.client.threads: "4", options.store.parquet.block-size: "268435456", sys.store.provider.zk.blobroot: "maprfs:///apps/drill", spill.directories: [ "/tmp/drill/spill" ], spill.fs: "maprfs:///", storage.action_on_plugins_override_file: "rename" zk.apply_secure_acl: true, impersonation.enabled: true, impersonation.max_chained_user_hops: 3, options.exec.impersonation.inbound_policies: "[{proxy_principals:{users:[\"mapr\"]},target_principals:{users:[\"*\"]}}]", # security.auth.mechanisms: ["MAPRSASL", "PLAIN", "KERBEROS"], security.auth.mechanisms: ["PLAIN", "KERBEROS"], security.auth.principal : "mapr/maprs...@qa.lab", security.auth.keytab : "/etc/drill/mapr_maprsasl.keytab", security.user.auth.enabled: true, security.user.auth.packages += "org.apache.drill.exec.rpc.user.security", security.user.auth.impl: "pam4j", security.user.auth.pam_profiles: ["sudo", "login"], http.ssl_enabled: true, ssl.useHadoopConfig: true, http.auth.mechanisms: ["FORM", "SPNEGO"], http.auth.spnego.principal: "HTTP/perfnode206.perf@qa.lab", http.auth.spnego.keytab: "/etc/drill_spnego/perfnode206.keytab" } {noformat} was: I ran sqlline with user "kuser1". {noformat} /opt/mapr/drill/drill-1.15.0.apache/bin/sqlline -u "jdbc:drill:drillbit=10.10.30.206" -n kuser1 -p mapr {noformat} I tried to access a file that is only accessible by root: {noformat} [root@perfnode206 drill-test-framework_krystal]# hf -ls /drill/testdata/impersonation/neg_tc5/student -rwx-- 3 root root 64612 2018-06-19 10:30 /drill/testdata/impersonation/neg_tc5/student {noformat} I am able to read the table, which should not be possible. I used this commit for Drill 1.15. {noformat} git.commit.id=bf2b414ac62cfc515fdd77f2688bb110073d764d
[jira] [Created] (DRILL-6906) File permissions are not being honored
Robert Hou created DRILL-6906: - Summary: File permissions are not being honored Key: DRILL-6906 URL: https://issues.apache.org/jira/browse/DRILL-6906 Project: Apache Drill Issue Type: Bug Components: Client - JDBC, Client - ODBC Affects Versions: 1.15.0 Reporter: Robert Hou Assignee: Pritesh Maker Fix For: 1.15.0 I ran sqlline with user "kuser1". {noformat} /opt/mapr/drill/drill-1.15.0.apache/bin/sqlline -u "jdbc:drill:drillbit=10.10.30.206" -n kuser1 -p mapr {noformat} I tried to access a file that is only accessible by root: {noformat} [root@perfnode206 drill-test-framework_krystal]# hf -ls /drill/testdata/impersonation/neg_tc5/student -rwx-- 3 root root 64612 2018-06-19 10:30 /drill/testdata/impersonation/neg_tc5/student {noformat} I am able to read the table, which should not be possible. I used this commit for Drill 1.15. {noformat} git.commit.id=bf2b414ac62cfc515fdd77f2688bb110073d764d git.commit.message.full=DRILL-6866\: Upgrade to SqlLine 1.6.0\n\n1. Changed SqlLine version to 1.6.0.\n2. Overridden new getVersion method in DrillSqlLineApplication.\n3. Set maxColumnWidth to 80 to avoid issue described in DRILL-6769.\n4. Changed colorScheme to obsidian.\n5. Output null value for varchar / char / boolean types as null instead of empty string.\n6. Changed access modifier from package default to public for JDBC classes that implement external interfaces to avoid issues when calling methods from these classes using reflection.\n\ncloses \#1556 {noformat} This is from drillbit.log. It shows that user is kuser1. 
{noformat} 2018-12-15 05:00:52,516 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] DEBUG o.a.d.e.w.f.QueryStateProcessor - 23eb04fb-1701-bea7-dd97-ecda58795b3b: State change requested PREPARING --> PLANNING 2018-12-15 05:00:52,531 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query with id 23eb04fb-1701-bea7-dd97-ecda58795b3b issued by kuser1: select * from dfs.`/drill/testdata/impersonation/neg_tc5/student` {noformat} It is not clear to me if this is a Drill problem or a file system problem. I tested MFS by logging in as kuser1 and trying to copy the file using "hadoop fs -copyToLocal /drill/testdata/impersonation/neg_tc5/student" and got an error, and was not able to copy the file. So I think MFS permissions are working. I also tried with Drill 1.14, and I get the expected error: {noformat} 0: jdbc:drill:drillbit=10.10.30.206> select * from dfs.`/drill/testdata/impersonation/neg_tc5/student` limit 1; Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Object '/drill/testdata/impersonation/neg_tc5/student' not found within 'dfs' [Error Id: cdf18c2a-b005-4f92-b819-d4324e8807d9 on perfnode206.perf.lab:31010] (state=,code=0) {noformat} The commit for Drill 1.14 is: {noformat} git.commit.message.full=[maven-release-plugin] prepare release drill-1.14.0\n git.commit.id=0508a128853ce796ca7e99e13008e49442f83147 {noformat} This problem exists with both Apache JDBC and Simba ODBC. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
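[Editor's note] The expected behavior in DRILL-6906 is plain POSIX permission evaluation: with mode `rwx------` (0o700) owned by root, only root may read the file, so the impersonated user kuser1 should be denied. A minimal sketch of that check (illustrative only, not Drill's or Hadoop's actual code):

```python
# Minimal sketch of the POSIX read check that impersonation should enforce.
# Mode 0o700 (rwx------) owned by root:root means only root can read.

def may_read(mode, owner, group, user, user_groups):
    """Evaluate POSIX owner/group/other read bits for a given user."""
    if user == owner:
        return bool(mode & 0o400)   # owner read bit
    if group in user_groups:
        return bool(mode & 0o040)   # group read bit
    return bool(mode & 0o004)       # other read bit

# /drill/testdata/impersonation/neg_tc5/student: -rwx------ root root
assert may_read(0o700, "root", "root", "root", {"root"}) is True
assert may_read(0o700, "root", "root", "kuser1", {"kuser1"}) is False
```

With impersonation enabled, the query should run the filesystem access as kuser1 and hit the second case, which is why Drill 1.14's VALIDATION ERROR is the correct outcome.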
[jira] [Created] (DRILL-6902) Extra limit operator is not needed
Robert Hou created DRILL-6902: - Summary: Extra limit operator is not needed Key: DRILL-6902 URL: https://issues.apache.org/jira/browse/DRILL-6902 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.15.0 Reporter: Robert Hou Assignee: Pritesh Maker For TPCDS query 49, there is an extra limit operator that is not needed. Here is the query: {noformat} SELECT 'web' AS channel, web.item, web.return_ratio, web.return_rank, web.currency_rank FROM (SELECT item, return_ratio, currency_ratio, Rank() OVER ( ORDER BY return_ratio) AS return_rank, Rank() OVER ( ORDER BY currency_ratio) AS currency_rank FROM (SELECT ws.ws_item_sk AS item, ( Cast(Sum(COALESCE(wr.wr_return_quantity, 0)) AS DEC(15, 4)) / Cast( Sum(COALESCE(ws.ws_quantity, 0)) AS DEC(15, 4)) ) AS return_ratio, ( Cast(Sum(COALESCE(wr.wr_return_amt, 0)) AS DEC(15, 4)) / Cast( Sum( COALESCE(ws.ws_net_paid, 0)) AS DEC(15, 4)) ) AS currency_ratio FROM web_sales ws LEFT OUTER JOIN web_returns wr ON ( ws.ws_order_number = wr.wr_order_number AND ws.ws_item_sk = wr.wr_item_sk ), date_dim WHERE wr.wr_return_amt > 1 AND ws.ws_net_profit > 1 AND ws.ws_net_paid > 0 AND ws.ws_quantity > 0 AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 12 GROUP BY ws.ws_item_sk) in_web) web WHERE ( web.return_rank <= 10 OR web.currency_rank <= 10 ) UNION SELECT 'catalog' AS channel, catalog.item, catalog.return_ratio, catalog.return_rank, catalog.currency_rank FROM (SELECT item, return_ratio, currency_ratio, Rank() OVER ( ORDER BY return_ratio) AS return_rank, Rank() OVER ( ORDER BY currency_ratio) AS currency_rank FROM (SELECT cs.cs_item_sk AS item, ( Cast(Sum(COALESCE(cr.cr_return_quantity, 0)) AS DEC(15, 4)) / Cast( Sum(COALESCE(cs.cs_quantity, 0)) AS DEC(15, 4)) ) AS return_ratio, ( Cast(Sum(COALESCE(cr.cr_return_amount, 0)) AS DEC(15, 4 )) / Cast(Sum( COALESCE(cs.cs_net_paid, 0)) AS DEC( 15, 4)) ) AS currency_ratio FROM catalog_sales cs LEFT OUTER JOIN catalog_returns cr ON ( 
cs.cs_order_number = cr.cr_order_number AND cs.cs_item_sk = cr.cr_item_sk ), date_dim WHERE cr.cr_return_amount > 1 AND cs.cs_net_profit > 1 AND cs.cs_net_paid > 0 AND cs.cs_quantity > 0 AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 12 GROUP BY cs.cs_item_sk) in_cat) catalog WHERE ( catalog.return_rank <= 10 OR catalog.currency_rank <= 10 ) UNION SELECT 'store' AS channel, store.item, store.return_ratio, store.return_rank, store.currency_rank FROM (SELECT item, return_ratio, currency_ratio, Rank() OVER ( ORDER BY return_ratio) AS return_rank, Rank() OVER ( ORDER BY currency_ratio) AS currency_rank FROM (SELECT sts.ss_item_sk AS item, ( Cast(Sum(COALESCE(sr.sr_return_quantity,
[jira] [Created] (DRILL-6897) TPCH 13 has regressed
Robert Hou created DRILL-6897: - Summary: TPCH 13 has regressed Key: DRILL-6897 URL: https://issues.apache.org/jira/browse/DRILL-6897 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.15.0 Reporter: Robert Hou Assignee: Karthikeyan Manivannan Attachments: 240099ed-ef2a-a23a-4559-f1b2e0809e72.sys.drill, 2400be84-c024-cb92-8743-3211589e0247.sys.drill I ran TPCH query 13 at both scale factor 100 and 1000, running each 3x to get a warm start and twice more to verify the regression. It is regressing between 26 and 33%. Here is the query: {noformat} select c_count, count(*) as custdist from ( select c.c_custkey, count(o.o_orderkey) from customer c left outer join orders o on c.c_custkey = o.o_custkey and o.o_comment not like '%special%requests%' group by c.c_custkey ) as orders (c_custkey, c_count) group by c_count order by custdist desc, c_count desc; {noformat} I have attached two profiles. 240099ed-ef2a-a23a-4559-f1b2e0809e72 is for Drill 1.15. 2400be84-c024-cb92-8743-3211589e0247 is for Drill 1.14. The commit for Drill 1.15 is 596227bbbecfb19bdb55dd8ea58159890f83bc9c. The commit for Drill 1.14 is 0508a128853ce796ca7e99e13008e49442f83147. The two plans are nearly the same. One difference is that Drill 1.15 is using four times more memory in operator 07-01 Unordered Mux Exchange. I think the problem may be in operator 09-01 Project. Drill 1.15 is projecting the comment field while Drill 1.14 does not project the comment field. Another issue is that Drill 1.15 takes more processing time to filter the orders table. Filter operator 09-03 takes an average of 19.3s. For Drill 1.14, filter operator 09-04 takes an average of 15.6s. They process the same number of rows, and have the same number of minor fragments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
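[Editor's note] The regression figures above follow from simple relative-change arithmetic: for an old runtime t_old and new runtime t_new, the regression is (t_new - t_old) / t_old. A quick sketch using the filter-operator timings reported in the issue:

```python
# Relative regression between two runtimes, as a percentage.
def regression_pct(t_old, t_new):
    return (t_new - t_old) / t_old * 100

# Filter operator timings from the report: 15.6s on 1.14 vs 19.3s on 1.15,
# which is roughly a 24% slowdown in that operator alone.
filter_slowdown = regression_pct(15.6, 19.3)

# The headline 26-33% range corresponds to, e.g., 100s growing to 126s-133s
# (the 100s baseline here is made up for illustration).
assert round(regression_pct(100.0, 126.0)) == 26
assert round(regression_pct(100.0, 133.0)) == 33
```

Since the filter slowdown alone does not account for the full 26-33%, the extra projected comment field and the Mux Exchange memory growth are plausible additional contributors, as the report suggests.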
[jira] [Resolved] (DRILL-6828) Hit UnrecognizedPropertyException when run tpch queries
[ https://issues.apache.org/jira/browse/DRILL-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou resolved DRILL-6828. --- Resolution: Cannot Reproduce > Hit UnrecognizedPropertyException when run tpch queries > --- > > Key: DRILL-6828 > URL: https://issues.apache.org/jira/browse/DRILL-6828 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.15.0 > Environment: RHEL 7, Apache Drill commit id: > 18e09a1b1c801f2691a05ae7db543bf71874cfea >Reporter: Dechang Gu >Assignee: Robert Hou >Priority: Blocker > Fix For: 1.15.0 > > > Installed Apache Drill 1.15.0 commit id: > 18e09a1b1c801f2691a05ae7db543bf71874cfea DRILL-6763: Codegen optimization of > SQL functions with constant values(\#1481) > Hit the following errors: > {code} > java.sql.SQLException: SYSTEM ERROR: UnrecognizedPropertyException: > Unrecognized field "outgoingBatchSize" (class > org.apache.drill.exec.physical.config.HashPartitionSender), not marked as > ignorable (9 known properties: "receiver-major-fragment", > "initialAllocation", "expr", "userName", "@id", "child", "cost", > "destinations", "maxAllocation"]) > at [Source: (StringReader); line: 1000, column: 29] (through reference > chain: > org.apache.drill.exec.physical.config.HashPartitionSender["outgoingBatchSize"]) > Fragment 3:175 > Please, refer to logs for more information. 
> [Error Id: cc023cdb-9a46-4edd-ad0b-6da1e9085291 on ucs-node6.perf.lab:31010] > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528) > at > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:600) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1288) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61) > at > org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1109) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1120) > at > org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675) > at > org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:196) > at > org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156) > at > org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:227) > at PipSQueak.executeQuery(PipSQueak.java:289) > at PipSQueak.runTest(PipSQueak.java:104) > at PipSQueak.main(PipSQueak.java:477) > Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM > ERROR: UnrecognizedPropertyException: Unrecognized field "outgoingBatchSize" > (class org.apache.drill.exec.physical.config.HashPartitionSender), not marked > as ignorable (9 known properties: "receiver-major-fragment", > "initialAllocation", "expr", "userName", "@id", "child", "cost", > "destinations", "maxAllocation"]) > at [Source: (StringReader); line: 1000, column: 29] (through reference > chain: > org.apache.drill.exec.physical.config.HashPartitionSender["outgoingBatchSize"]) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-6828) Hit UnrecognizedPropertyException when run tpch queries
[ https://issues.apache.org/jira/browse/DRILL-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-6828. - > Hit UnrecognizedPropertyException when run tpch queries > --- > > Key: DRILL-6828 > URL: https://issues.apache.org/jira/browse/DRILL-6828 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.15.0 > Environment: RHEL 7, Apache Drill commit id: > 18e09a1b1c801f2691a05ae7db543bf71874cfea >Reporter: Dechang Gu >Assignee: Robert Hou >Priority: Blocker > Fix For: 1.15.0 > > > Installed Apache Drill 1.15.0 commit id: > 18e09a1b1c801f2691a05ae7db543bf71874cfea DRILL-6763: Codegen optimization of > SQL functions with constant values(\#1481) > Hit the following errors: > {code} > java.sql.SQLException: SYSTEM ERROR: UnrecognizedPropertyException: > Unrecognized field "outgoingBatchSize" (class > org.apache.drill.exec.physical.config.HashPartitionSender), not marked as > ignorable (9 known properties: "receiver-major-fragment", > "initialAllocation", "expr", "userName", "@id", "child", "cost", > "destinations", "maxAllocation"]) > at [Source: (StringReader); line: 1000, column: 29] (through reference > chain: > org.apache.drill.exec.physical.config.HashPartitionSender["outgoingBatchSize"]) > Fragment 3:175 > Please, refer to logs for more information. 
> [Error Id: cc023cdb-9a46-4edd-ad0b-6da1e9085291 on ucs-node6.perf.lab:31010] > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528) > at > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:600) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1288) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61) > at > org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1109) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1120) > at > org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675) > at > org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:196) > at > org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156) > at > org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:227) > at PipSQueak.executeQuery(PipSQueak.java:289) > at PipSQueak.runTest(PipSQueak.java:104) > at PipSQueak.main(PipSQueak.java:477) > Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM > ERROR: UnrecognizedPropertyException: Unrecognized field "outgoingBatchSize" > (class org.apache.drill.exec.physical.config.HashPartitionSender), not marked > as ignorable (9 known properties: "receiver-major-fragment", > "initialAllocation", "expr", "userName", "@id", "child", "cost", > "destinations", "maxAllocation"]) > at [Source: (StringReader); line: 1000, column: 29] (through reference > chain: > org.apache.drill.exec.physical.config.HashPartitionSender["outgoingBatchSize"]) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6828) Hit UnrecognizedPropertyException when run tpch queries
[ https://issues.apache.org/jira/browse/DRILL-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699725#comment-16699725 ] Robert Hou commented on DRILL-6828: --- I think this was a problem with how the build was distributed to the nodes. I will close this for now, and re-open if we hit it again. > Hit UnrecognizedPropertyException when run tpch queries > --- > > Key: DRILL-6828 > URL: https://issues.apache.org/jira/browse/DRILL-6828 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.15.0 > Environment: RHEL 7, Apache Drill commit id: > 18e09a1b1c801f2691a05ae7db543bf71874cfea >Reporter: Dechang Gu >Assignee: Robert Hou >Priority: Blocker > Fix For: 1.15.0 > > > Installed Apache Drill 1.15.0 commit id: > 18e09a1b1c801f2691a05ae7db543bf71874cfea DRILL-6763: Codegen optimization of > SQL functions with constant values(\#1481) > Hit the following errors: > {code} > java.sql.SQLException: SYSTEM ERROR: UnrecognizedPropertyException: > Unrecognized field "outgoingBatchSize" (class > org.apache.drill.exec.physical.config.HashPartitionSender), not marked as > ignorable (9 known properties: "receiver-major-fragment", > "initialAllocation", "expr", "userName", "@id", "child", "cost", > "destinations", "maxAllocation"]) > at [Source: (StringReader); line: 1000, column: 29] (through reference > chain: > org.apache.drill.exec.physical.config.HashPartitionSender["outgoingBatchSize"]) > Fragment 3:175 > Please, refer to logs for more information. 
> [Error Id: cc023cdb-9a46-4edd-ad0b-6da1e9085291 on ucs-node6.perf.lab:31010] > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528) > at > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:600) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1288) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61) > at > org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1109) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1120) > at > org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675) > at > org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:196) > at > org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156) > at > org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:227) > at PipSQueak.executeQuery(PipSQueak.java:289) > at PipSQueak.runTest(PipSQueak.java:104) > at PipSQueak.main(PipSQueak.java:477) > Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM > ERROR: UnrecognizedPropertyException: Unrecognized field "outgoingBatchSize" > (class org.apache.drill.exec.physical.config.HashPartitionSender), not marked > as ignorable (9 known properties: "receiver-major-fragment", > "initialAllocation", "expr", "userName", "@id", "child", "cost", > "destinations", "maxAllocation"]) > at [Source: (StringReader); line: 1000, column: 29] (through reference > chain: > org.apache.drill.exec.physical.config.HashPartitionSender["outgoingBatchSize"]) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
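[Editor's note] The UnrecognizedPropertyException above is Jackson rejecting a field ("outgoingBatchSize") that the receiving drillbit's plan classes do not declare — the classic symptom of mixed builds across a cluster, consistent with the "build distribution" diagnosis. A strict deserializer sketch showing the same failure mode (illustrative only; field set copied from the error message):

```python
# Sketch of a strict deserializer that fails on unknown fields, mimicking
# Jackson's UnrecognizedPropertyException when an older drillbit receives a
# plan fragment serialized by a newer foreman. Not Drill code.

KNOWN_FIELDS = {"receiver-major-fragment", "initialAllocation", "expr",
                "userName", "@id", "child", "cost", "destinations",
                "maxAllocation"}

def deserialize(fragment_json):
    """Accept only fields the receiver's class declares."""
    unknown = set(fragment_json) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"Unrecognized field(s): {sorted(unknown)}")
    return fragment_json

deserialize({"expr": "...", "cost": 1.0})  # known fields: accepted
try:
    # A newer sender adds "outgoingBatchSize"; the older receiver rejects it.
    deserialize({"expr": "...", "outgoingBatchSize": 16384})
    failed = False
except ValueError:
    failed = True
```

Upgrading every node to the same build eliminates the mismatch, which matches why the issue could not be reproduced afterward.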
[jira] [Closed] (DRILL-6567) Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.
[ https://issues.apache.org/jira/browse/DRILL-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-6567. - > Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: > java.lang.reflect.UndeclaredThrowableException. > --- > > Key: DRILL-6567 > URL: https://issues.apache.org/jira/browse/DRILL-6567 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Vitalii Diravka >Priority: Critical > Fix For: 1.15.0 > > > This is TPCDS Query 93. > Query: > /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query93.sql > SELECT ss_customer_sk, > Sum(act_sales) sumsales > FROM (SELECT ss_item_sk, > ss_ticket_number, > ss_customer_sk, > CASE > WHEN sr_return_quantity IS NOT NULL THEN > ( ss_quantity - sr_return_quantity ) * ss_sales_price > ELSE ( ss_quantity * ss_sales_price ) > END act_sales > FROM store_sales > LEFT OUTER JOIN store_returns > ON ( sr_item_sk = ss_item_sk > AND sr_ticket_number = ss_ticket_number ), > reason > WHERE sr_reason_sk = r_reason_sk > AND r_reason_desc = 'reason 38') t > GROUP BY ss_customer_sk > ORDER BY sumsales, > ss_customer_sk > LIMIT 100; > Here is the stack trace: > 2018-06-29 07:00:32 INFO DrillTestLogger:348 - > Exception: > java.sql.SQLException: INTERNAL_ERROR ERROR: > java.lang.reflect.UndeclaredThrowableException > Setup failed for null > Fragment 4:56 > [Error Id: 3c72c14d-9362-4a9b-affb-5cf937bed89e on atsqa6c82.qa.lab:31010] > (org.apache.drill.common.exceptions.ExecutionSetupException) > java.lang.reflect.UndeclaredThrowableException > > org.apache.drill.common.exceptions.ExecutionSetupException.fromThrowable():30 > org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():327 > org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245 > org.apache.drill.exec.physical.impl.ScanBatch.next():164 > 
org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():118 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.physical.impl.BaseRootExec.next():103 > > org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152 > org.apache.drill.exec.physical.impl.BaseRootExec.next():93 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():281 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 > Caused By (java.util.concurrent.ExecutionException) >
[jira] [Resolved] (DRILL-6567) Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.
[ https://issues.apache.org/jira/browse/DRILL-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou resolved DRILL-6567. --- Resolution: Fixed Assignee: Vitalii Diravka (was: Robert Hou) > Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: > java.lang.reflect.UndeclaredThrowableException. > --- > > Key: DRILL-6567 > URL: https://issues.apache.org/jira/browse/DRILL-6567 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Vitalii Diravka >Priority: Critical > Fix For: 1.15.0 > > > This is TPCDS Query 93. > Query: > /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query93.sql > SELECT ss_customer_sk, > Sum(act_sales) sumsales > FROM (SELECT ss_item_sk, > ss_ticket_number, > ss_customer_sk, > CASE > WHEN sr_return_quantity IS NOT NULL THEN > ( ss_quantity - sr_return_quantity ) * ss_sales_price > ELSE ( ss_quantity * ss_sales_price ) > END act_sales > FROM store_sales > LEFT OUTER JOIN store_returns > ON ( sr_item_sk = ss_item_sk > AND sr_ticket_number = ss_ticket_number ), > reason > WHERE sr_reason_sk = r_reason_sk > AND r_reason_desc = 'reason 38') t > GROUP BY ss_customer_sk > ORDER BY sumsales, > ss_customer_sk > LIMIT 100; > Here is the stack trace: > 2018-06-29 07:00:32 INFO DrillTestLogger:348 - > Exception: > java.sql.SQLException: INTERNAL_ERROR ERROR: > java.lang.reflect.UndeclaredThrowableException > Setup failed for null > Fragment 4:56 > [Error Id: 3c72c14d-9362-4a9b-affb-5cf937bed89e on atsqa6c82.qa.lab:31010] > (org.apache.drill.common.exceptions.ExecutionSetupException) > java.lang.reflect.UndeclaredThrowableException > > org.apache.drill.common.exceptions.ExecutionSetupException.fromThrowable():30 > org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():327 > org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245 > 
org.apache.drill.exec.physical.impl.ScanBatch.next():164 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():118 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.physical.impl.BaseRootExec.next():103 > > org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152 > org.apache.drill.exec.physical.impl.BaseRootExec.next():93 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():281 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 > Caused By
[jira] [Closed] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_1
[ https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-6569. - > Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not > read value at 2 in block 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > -- > > Key: DRILL-6569 > URL: https://issues.apache.org/jira/browse/DRILL-6569 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Storage - Hive >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Vitalii Diravka >Priority: Critical > Fix For: 1.15.0 > > > This is TPCDS Query 19. > I am able to scan the parquet file using: >select * from > dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet` > and I get 3,349,279 rows selected. > There are roughly 15 similar failures in the Advanced nightly run, out of 37 > failures. So this issue accounts for about half the failures. > Query: > /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql > SELECT i_brand_id brand_id, > i_brand brand, > i_manufact_id, > i_manufact, > Sum(ss_ext_sales_price) ext_price > FROM date_dim, > store_sales, > item, > customer, > customer_address, > store > WHERE d_date_sk = ss_sold_date_sk > AND ss_item_sk = i_item_sk > AND i_manager_id = 38 > AND d_moy = 12 > AND d_year = 1998 > AND ss_customer_sk = c_customer_sk > AND c_current_addr_sk = ca_address_sk > AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5) > AND ss_store_sk = s_store_sk > GROUP BY i_brand, > i_brand_id, > i_manufact_id, > i_manufact > ORDER BY ext_price DESC, > i_brand, > i_brand_id, > i_manufact_id, > i_manufact > LIMIT 100; > Here is the stack trace: > 2018-06-29 07:00:32 INFO DrillTestLogger:348 - > Exception: > java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block > 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > 
Fragment 4:26 > [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010] > (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at > 2 in block 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > > hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243 > hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227 > > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199 > > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57 > > org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417 > org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54 > org.apache.drill.exec.physical.impl.ScanBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > 
org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 >
[jira] [Commented] (DRILL-6567) Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.
[ https://issues.apache.org/jira/browse/DRILL-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693961#comment-16693961 ] Robert Hou commented on DRILL-6567: --- Enabling "store.hive.optimize_scan_with_native_readers" allows the test to pass. > Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: > java.lang.reflect.UndeclaredThrowableException. > --- > > Key: DRILL-6567 > URL: https://issues.apache.org/jira/browse/DRILL-6567 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Robert Hou >Priority: Critical > Fix For: 1.15.0 > > > This is TPCDS Query 93. > Query: > /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query93.sql > SELECT ss_customer_sk, > Sum(act_sales) sumsales > FROM (SELECT ss_item_sk, > ss_ticket_number, > ss_customer_sk, > CASE > WHEN sr_return_quantity IS NOT NULL THEN > ( ss_quantity - sr_return_quantity ) * ss_sales_price > ELSE ( ss_quantity * ss_sales_price ) > END act_sales > FROM store_sales > LEFT OUTER JOIN store_returns > ON ( sr_item_sk = ss_item_sk > AND sr_ticket_number = ss_ticket_number ), > reason > WHERE sr_reason_sk = r_reason_sk > AND r_reason_desc = 'reason 38') t > GROUP BY ss_customer_sk > ORDER BY sumsales, > ss_customer_sk > LIMIT 100; > Here is the stack trace: > 2018-06-29 07:00:32 INFO DrillTestLogger:348 - > Exception: > java.sql.SQLException: INTERNAL_ERROR ERROR: > java.lang.reflect.UndeclaredThrowableException > Setup failed for null > Fragment 4:56 > [Error Id: 3c72c14d-9362-4a9b-affb-5cf937bed89e on atsqa6c82.qa.lab:31010] > (org.apache.drill.common.exceptions.ExecutionSetupException) > java.lang.reflect.UndeclaredThrowableException > > org.apache.drill.common.exceptions.ExecutionSetupException.fromThrowable():30 > org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():327 > 
org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245 > org.apache.drill.exec.physical.impl.ScanBatch.next():164 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():118 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.physical.impl.BaseRootExec.next():103 > > org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152 > 
org.apache.drill.exec.physical.impl.BaseRootExec.next():93 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():281 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 >
[jira] [Commented] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/
[ https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693962#comment-16693962 ] Robert Hou commented on DRILL-6569: --- Enabling "store.hive.optimize_scan_with_native_readers" allows the test to pass. > Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not > read value at 2 in block 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > -- > > Key: DRILL-6569 > URL: https://issues.apache.org/jira/browse/DRILL-6569 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Storage - Hive >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Vitalii Diravka >Priority: Critical > Fix For: 1.15.0 > > > This is TPCDS Query 19. > I am able to scan the parquet file using: >select * from > dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet` > and I get 3,349,279 rows selected. > There are roughly 15 similar failures in the Advanced nightly run, out of 37 > failures. So this issue accounts for about half the failures. 
> Query: > /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql > SELECT i_brand_id brand_id, > i_brand brand, > i_manufact_id, > i_manufact, > Sum(ss_ext_sales_price) ext_price > FROM date_dim, > store_sales, > item, > customer, > customer_address, > store > WHERE d_date_sk = ss_sold_date_sk > AND ss_item_sk = i_item_sk > AND i_manager_id = 38 > AND d_moy = 12 > AND d_year = 1998 > AND ss_customer_sk = c_customer_sk > AND c_current_addr_sk = ca_address_sk > AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5) > AND ss_store_sk = s_store_sk > GROUP BY i_brand, > i_brand_id, > i_manufact_id, > i_manufact > ORDER BY ext_price DESC, > i_brand, > i_brand_id, > i_manufact_id, > i_manufact > LIMIT 100; > Here is the stack trace: > 2018-06-29 07:00:32 INFO DrillTestLogger:348 - > Exception: > java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block > 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > Fragment 4:26 > [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010] > (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at > 2 in block 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > > hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243 > hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227 > > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199 > > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57 > > org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417 > org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54 > org.apache.drill.exec.physical.impl.ScanBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > 
org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 >
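The workaround mentioned in the comments on DRILL-6567 and DRILL-6569 can be applied before rerunning the failing queries. The option name is taken verbatim from the comments; the statements below are a sketch using standard Drill `ALTER SESSION` / `ALTER SYSTEM` syntax, not output captured from the test runs:

```sql
-- Workaround from the comments: have Drill scan Hive Parquet tables
-- with its native Parquet reader instead of the Hive SerDe reader.
ALTER SESSION SET `store.hive.optimize_scan_with_native_readers` = true;

-- Or, to apply the setting cluster-wide rather than per session:
-- ALTER SYSTEM SET `store.hive.optimize_scan_with_native_readers` = true;
```

With the session option enabled, rerunning query93.sql / query19.sql against the Hive-backed TPCDS tables is reported above to pass.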
[jira] [Updated] (DRILL-6787) Update Spnego webpage
[ https://issues.apache.org/jira/browse/DRILL-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6787: -- Description: A few things should be updated on this webpage: https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ When configuring drillbits in drill-override.conf, the principal and keytab should be corrected. There are two places where this should be corrected. {noformat} drill.exec.http: { auth.spnego.principal:"HTTP/hostname@realm", auth.spnego.keytab:"path/to/keytab", auth.mechanisms: [“SPNEGO”] } {noformat} For the section on Chrome, we should change "hostname/domain" to "domain". Also, the two blanks around the "=" should be removed. {noformat} google-chrome --auth-server-whitelist="domain" example: google-chrome --auth-server-whitelist="machine.example.com" example: google-chrome --auth-server-whitelist="*.example.com" The IP address can also be used example: google-chrome --auth-server-whitelist="10.10.100.101" The URL given to Chrome to access the Web UI should match the domain specified in auth-server-whitelist. If the domain is used in auth-server-whitelist, then the domain should be used with Chrome. If the IP address is used in auth-server-whitelist, then the IP address should be used with Chrome. {noformat} Also, Linux and Mac should be treated in separate paragraphs. These should be the directions for Mac: {noformat} cd /Applications/Google Chrome.app/Contents/MacOS ./"Google Chrome" --auth-server-whitelist="*.example.com" {noformat} was: A few things should be updated on this webpage: https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ When configuring drillbits in drill-override.conf, the principal and keytab should be corrected. There are two places where this should be corrected. 
{noformat} drill.exec.http: { auth.spnego.principal:"HTTP/hostname@realm", auth.spnego.keytab:"path/to/keytab", auth.mechanisms: [“SPNEGO”] } {noformat} For the section on Chrome, we should change "hostname/domain" to "domain". Also, the two blanks around the "=" should be removed. {noformat} google-chrome --auth-server-whitelist="domain" example: google-chrome --auth-server-whitelist="machine.example.com" example: google-chrome --auth-server-whitelist="*.example.com" The IP address can also be used example: google-chrome --auth-server-whitelist="10.10.100.101" The URL given to Chrome to access the Web UI should match the domain specified in auth-server-whitelist. If the domain is used in auth-server-whitelist, then the domain should be used with Chrome. If the IP address is used in auth-server-whitelist, then the IP address should be used with Chrome. {noformat} Also, Linux and Mac should be treated in separate paragraphs. These should be the directions for Mac: {noformat} cd /Applications/Google Chrome.app/Contents/MacOS ./"Google Chrome" --auth-server-whitelist="*.example.com" {noformat} > Update Spnego webpage > - > > Key: DRILL-6787 > URL: https://issues.apache.org/jira/browse/DRILL-6787 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Fix For: 1.15.0 > > > A few things should be updated on this webpage: > https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ > When configuring drillbits in drill-override.conf, the principal and keytab > should be corrected. There are two places where this should be corrected. > {noformat} > drill.exec.http: { > auth.spnego.principal:"HTTP/hostname@realm", > auth.spnego.keytab:"path/to/keytab", > auth.mechanisms: [“SPNEGO”] > } > {noformat} > For the section on Chrome, we should change "hostname/domain" to "domain". > Also, the two blanks around the "=" should be removed. 
> {noformat} > google-chrome --auth-server-whitelist="domain" > example: google-chrome --auth-server-whitelist="machine.example.com" > example: google-chrome --auth-server-whitelist="*.example.com" > The IP address can also be used > example: google-chrome --auth-server-whitelist="10.10.100.101" > The URL given to Chrome to access the Web UI should match the domain > specified in auth-server-whitelist. If the domain is used in > auth-server-whitelist, then the domain should be used with Chrome. If the IP > address is used in auth-server-whitelist, then the IP address should be used > with Chrome. > {noformat} > Also, Linux and Mac should be treated in separate paragraphs. These should > be the directions for Mac: > {noformat} > cd /Applications/Google Chrome.app/Contents/MacOS > ./"Google Chrome"
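As an aside to the Chrome instructions above, a SPNEGO-protected Drill Web UI can also be exercised from the command line. The sketch below assumes a curl build with GSS-API/SPNEGO support, a valid Kerberos ticket in the credential cache, and a hypothetical drillbit host on Drill's default Web UI port 8047:

```shell
# Obtain a ticket for the test principal first, e.g.:
#   kinit user@EXAMPLE.COM
# --negotiate enables SPNEGO; "-u :" tells curl to take credentials
# from the Kerberos ticket cache instead of prompting for a password.
curl --negotiate -u : http://drillbit.example.com:8047/
```

As with the Chrome whitelist, the hostname given to curl must match the service principal's host part for the SPNEGO handshake to succeed.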
[jira] [Updated] (DRILL-6787) Update Spnego webpage
[ https://issues.apache.org/jira/browse/DRILL-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6787: -- Description: A few things should be updated on this webpage: https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ When configuring drillbits in drill-override.conf, the principal and keytab should be corrected. There are two places where this should be corrected. {noformat} drill.exec.http: { auth.spnego.principal:"HTTP/hostname@realm", auth.spnego.keytab:"path/to/keytab", auth.mechanisms: [“SPNEGO”] } {noformat} For the section on Chrome, we should change "hostname/domain" to "domain". Also, the two blanks around the "=" should be removed. {noformat} google-chrome --auth-server-whitelist="domain" example: google-chrome --auth-server-whitelist="machine.example.com" example: google-chrome --auth-server-whitelist="*.example.com" The IP address can also be used example: google-chrome --auth-server-whitelist="10.10.100.101" {noformat} The URL given to Chrome to access the Web UI should match the domain specified in auth-server-whitelist. If the domain is used in auth-server-whitelist, then the domain should be used with Chrome. If the IP address is used in auth-server-whitelist, then the IP address should be used with Chrome. Also, Linux and Mac should be treated in separate paragraphs. These should be the directions for Mac: {noformat} cd /Applications/Google Chrome.app/Contents/MacOS ./"Google Chrome" --auth-server-whitelist="*.example.com" {noformat} was: A few things should be updated on this webpage: https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ When configuring drillbits in drill-override.conf, the principal and keytab should be corrected. There are two places where this should be corrected. 
{noformat} drill.exec.http: { auth.spnego.principal:"HTTP/hostname@realm", auth.spnego.keytab:"path/to/keytab", auth.mechanisms: [“SPNEGO”] } {noformat} For the section on Chrome, we should change "hostname/domain" to "domain". Also, the two blanks around the "=" should be removed. {noformat} google-chrome --auth-server-whitelist="domain" example: google-chrome --auth-server-whitelist="machine.example.com" example: google-chrome --auth-server-whitelist="*.example.com" The IP address can also be used example: google-chrome --auth-server-whitelist="10.10.100.101" The URL given to Chrome to access the Web UI should match the domain specified in auth-server-whitelist. If the domain is used in auth-server-whitelist, then the domain should be used with Chrome. If the IP address is used in auth-server-whitelist, then the IP address should be used with Chrome. {noformat} Also, Linux and Mac should be treated in separate paragraphs. These should be the directions for Mac: {noformat} cd /Applications/Google Chrome.app/Contents/MacOS ./"Google Chrome" --auth-server-whitelist="*.example.com" {noformat} > Update Spnego webpage > - > > Key: DRILL-6787 > URL: https://issues.apache.org/jira/browse/DRILL-6787 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Fix For: 1.15.0 > > > A few things should be updated on this webpage: > https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ > When configuring drillbits in drill-override.conf, the principal and keytab > should be corrected. There are two places where this should be corrected. > {noformat} > drill.exec.http: { > auth.spnego.principal:"HTTP/hostname@realm", > auth.spnego.keytab:"path/to/keytab", > auth.mechanisms: [“SPNEGO”] > } > {noformat} > For the section on Chrome, we should change "hostname/domain" to "domain". > Also, the two blanks around the "=" should be removed. 
> {noformat} > google-chrome --auth-server-whitelist="domain" > example: google-chrome --auth-server-whitelist="machine.example.com" > example: google-chrome --auth-server-whitelist="*.example.com" > The IP address can also be used > example: google-chrome --auth-server-whitelist="10.10.100.101" > {noformat} > The URL given to Chrome to access the Web UI should match the domain > specified in auth-server-whitelist. If the domain is used in > auth-server-whitelist, then the domain should be used with Chrome. If the IP > address is used in auth-server-whitelist, then the IP address should be used > with Chrome. > Also, Linux and Mac should be treated in separate paragraphs. These should > be the directions for Mac: > {noformat} > cd /Applications/Google Chrome.app/Contents/MacOS > ./"Google Chrome"
[jira] [Updated] (DRILL-6787) Update Spnego webpage
[ https://issues.apache.org/jira/browse/DRILL-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6787: -- Description: A few things should be updated on this webpage: https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ When configuring drillbits in drill-override.conf, the principal and keytab should be corrected. There are two places where this should be corrected. {noformat} drill.exec.http: { auth.spnego.principal:"HTTP/hostname@realm", auth.spnego.keytab:"path/to/keytab", auth.mechanisms: [“SPNEGO”] } {noformat} For the section on Chrome, we should change "hostname/domain" to "domain". Also, the two blanks around the "=" should be removed. {noformat} google-chrome --auth-server-whitelist="domain" example: google-chrome --auth-server-whitelist="machine.example.com" example: google-chrome --auth-server-whitelist="*.example.com" The IP address can also be used example: google-chrome --auth-server-whitelist="10.10.100.101" The URL given to Chrome to access the Web UI should match the domain specified in auth-server-whitelist. If the domain is used in auth-server-whitelist, then the domain should be used with Chrome. If the IP address is used in auth-server-whitelist, then the IP address should be used with Chrome. {noformat} Also, Linux and Mac should be treated in separate paragraphs. These should be the directions for Mac: {noformat} cd /Applications/Google Chrome.app/Contents/MacOS ./"Google Chrome" --auth-server-whitelist="*.example.com" {noformat} was: A few things should be updated on this webpage: https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ When configuring drillbits in drill-override.conf, the principal and keytab should be corrected. There are two places where this should be corrected. 
{noformat} drill.exec.http: { auth.spnego.principal:"HTTP/hostname@realm", auth.spnego.keytab:"path/to/keytab", auth.mechanisms: [“SPNEGO”] } {noformat} For the section on Chrome, we should change "hostname/domain" to "domain". Also, the two blanks around the "=" should be removed. {noformat} google-chrome --auth-server-whitelist="domain" example: google-chrome --auth-server-whitelist="machine.example.com" example: google-chrome --auth-server-whitelist="*.example.com" The IP address can also be used example: google-chrome --auth-server-whitelist="10.10.100.101" {noformat} The URL given to Chrome to access the Web UI should match the domain specified in auth-server-whitelist. If the domain is used in auth-server-whitelist, then the domain should be used with Chrome. If the IP address is used in auth-server-whitelist, then the IP address should be used with Chrome. Also, Linux and Mac should be treated in separate paragraphs. These should be the directions for Mac: {noformat} cd /Applications/Google Chrome.app/Contents/MacOS ./"Google Chrome" --auth-server-whitelist="*.example.com" {noformat} > Update Spnego webpage > - > > Key: DRILL-6787 > URL: https://issues.apache.org/jira/browse/DRILL-6787 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Fix For: 1.15.0 > > > A few things should be updated on this webpage: > https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ > When configuring drillbits in drill-override.conf, the principal and keytab > should be corrected. There are two places where this should be corrected. > {noformat} > drill.exec.http: { > auth.spnego.principal:"HTTP/hostname@realm", > auth.spnego.keytab:"path/to/keytab", > auth.mechanisms: [“SPNEGO”] > } > {noformat} > For the section on Chrome, we should change "hostname/domain" to "domain". > Also, the two blanks around the "=" should be removed. 
> {noformat} > google-chrome --auth-server-whitelist="domain" > example: google-chrome --auth-server-whitelist="machine.example.com" > example: google-chrome --auth-server-whitelist="*.example.com" > The IP address can also be used > example: google-chrome --auth-server-whitelist="10.10.100.101" > The URL given to Chrome to access the Web UI should match the domain > specified in auth-server-whitelist. If the domain is used in > auth-server-whitelist, then the domain should be used with Chrome. If the IP > address is used in auth-server-whitelist, then the IP address should be used > with Chrome. > {noformat} > Also, Linux and Mac should be treated in separate paragraphs. These should > be the directions for Mac: > {noformat} > cd /Applications/Google Chrome.app/Contents/MacOS > ./"Google Chrome" --auth-server-whitelist="*.example.com" > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
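For reference, the corrected drill-override.conf fragment with straight ASCII quotes (the curly quotes around SPNEGO in the quoted text above appear to be a copy-paste artifact and would not parse; the hostname, realm, and keytab path are placeholders to be replaced per cluster):

```
drill.exec.http: {
  auth.spnego.principal: "HTTP/hostname@realm",
  auth.spnego.keytab: "path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
```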
[jira] [Updated] (DRILL-6787) Update Spnego webpage
[ https://issues.apache.org/jira/browse/DRILL-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6787: -- Description: A few things should be updated on this webpage: https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ When configuring drillbits in drill-override.conf, the principal and keytab should be corrected. There are two places where this should be corrected. {noformat} drill.exec.http: { auth.spnego.principal:"HTTP/hostname@realm", auth.spnego.keytab:"path/to/keytab", auth.mechanisms: [“SPNEGO”] } {noformat} For the section on Chrome, we should change "hostname/domain" to "domain". Also, the two blanks around the "=" should be removed. {noformat} google-chrome --auth-server-whitelist="domain" example: google-chrome --auth-server-whitelist="machine.example.com" example: google-chrome --auth-server-whitelist="*.example.com" The IP address can also be used example: google-chrome --auth-server-whitelist="10.10.100.101" The URL given to Chrome to access the Web UI should match the domain specified in auth-server-whitelist. If the domain is used in auth-server-whitelist, then the domain should be used with Chrome. If the IP address is used in auth-server-whitelist, then the IP address should be used with Chrome. {noformat} Also, Linux and Mac should be treated in separate paragraphs. These should be the directions for Mac: {noformat} cd /Applications/Google Chrome.app/Contents/MacOS ./"Google Chrome" --auth-server-whitelist="*.example.com" {noformat} was: A few things should be updated on this webpage: https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ When configuring drillbits in drill-override.conf, the principal and keytab should be corrected. There are two places where this should be corrected. 
{noformat} drill.exec.http: { auth.spnego.principal:"HTTP/hostname@realm", auth.spnego.keytab:"path/to/keytab", auth.mechanisms: [“SPNEGO”] } {noformat} For the section on Chrome, we should change "hostname/domain" to "domain". Or "hostname@domain". Also, the two blanks around the "=" should be removed. {noformat} google-chrome --auth-server-whitelist="hostname/domain" {noformat} Also, for the section on Chrome, the "domain" should match the URL given to Chrome to access the Web UI. Also, Linux and Mac should be treated in separate paragraphs. These should be the directions for Mac: {noformat} cd /Applications/Google Chrome.app/Contents/MacOS ./"Google Chrome" --auth-server-whitelist="example.com" {noformat} > Update Spnego webpage > - > > Key: DRILL-6787 > URL: https://issues.apache.org/jira/browse/DRILL-6787 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Fix For: 1.15.0 > > > A few things should be updated on this webpage: > https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ > When configuring drillbits in drill-override.conf, the principal and keytab > should be corrected. There are two places where this should be corrected. > {noformat} > drill.exec.http: { > auth.spnego.principal:"HTTP/hostname@realm", > auth.spnego.keytab:"path/to/keytab", > auth.mechanisms: [“SPNEGO”] > } > {noformat} > For the section on Chrome, we should change "hostname/domain" to "domain". > Also, the two blanks around the "=" should be removed. > {noformat} > google-chrome --auth-server-whitelist="domain" > example: google-chrome --auth-server-whitelist="machine.example.com" > example: google-chrome --auth-server-whitelist="*.example.com" > The IP address can also be used > example: google-chrome --auth-server-whitelist="10.10.100.101" > The URL given to Chrome to access the Web UI should match the domain > specified in auth-server-whitelist. 
If the domain is used in > auth-server-whitelist, then the domain should be used with Chrome. If the IP > address is used in auth-server-whitelist, then the IP address should be used > with Chrome. > {noformat} > Also, Linux and Mac should be treated in separate paragraphs. These should > be the directions for Mac: > {noformat} > cd /Applications/Google Chrome.app/Contents/MacOS > ./"Google Chrome" --auth-server-whitelist="*.example.com" > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6787) Update Spnego webpage
Robert Hou created DRILL-6787: - Summary: Update Spnego webpage Key: DRILL-6787 URL: https://issues.apache.org/jira/browse/DRILL-6787 Project: Apache Drill Issue Type: Bug Affects Versions: 1.14.0 Reporter: Robert Hou Assignee: Bridget Bevens Fix For: 1.15.0 A few things should be updated on this webpage: https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/ When configuring drillbits in drill-override.conf, the principal and keytab should be corrected. There are two places where this should be corrected. {noformat} drill.exec.http: { auth.spnego.principal:"HTTP/hostname@realm", auth.spnego.keytab:"path/to/keytab", auth.mechanisms: [“SPNEGO”] } {noformat} For the section on Chrome, we should change "hostname/domain" to "domain". Or "hostname@domain". Also, the two blanks around the "=" should be removed. {noformat} google-chrome --auth-server-whitelist="hostname/domain" {noformat} Also, for the section on Chrome, the "domain" should match the URL given to Chrome to access the Web UI. Also, Linux and Mac should be treated in separate paragraphs. These should be the directions for Mac: {noformat} cd /Applications/Google Chrome.app/Contents/MacOS ./"Google Chrome" --auth-server-whitelist="example.com" {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
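The Chrome flag described above is identical on Linux and macOS; only the way the browser is launched differs. A minimal sketch that builds (but does not execute) the two launch commands, so the flag syntax can be inspected; `*.example.com` is a placeholder domain and the macOS path assumes the default install location:

```shell
#!/bin/sh
# Sketch: construct the Linux and macOS Chrome launch commands with the
# SPNEGO allowlist flag. Note there are no spaces around "=", per the
# correction requested in this issue. Commands are printed, not run.
WHITELIST='*.example.com'

# Linux: google-chrome is normally on PATH.
linux_cmd="google-chrome --auth-server-whitelist=\"$WHITELIST\""

# macOS: the binary lives inside the application bundle.
mac_cmd="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome --auth-server-whitelist=\"$WHITELIST\""

echo "$linux_cmd"
echo "$mac_cmd"
```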
[jira] [Commented] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603787#comment-16603787 ] Robert Hou commented on DRILL-6726: --- [~arina] I have verified the fix. Thanks! Yes, we test with impersonation enabled most of the time. > Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > Attachments: student > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view > references a file that includes a schema which has upper case letters, the > view needs to be rebuilt. There may be variations on this issue that I have > not seen. > To reproduce this problem, create a dfs workspace like this: > {noformat} > "drillTestDirP1": { > "location": "/drill/testdata/p1tests", > "writable": true, > "defaultInputFormat": "parquet", > "allowAccessOutsideWorkspace": false > }, > {noformat} > Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this > command: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute > this query: > {noformat} > select * from student_test_v; > {noformat} > Drill will return an exception: > {noformat} > Error: VALIDATION ERROR: Failure while attempting to expand view. Requested > schema drillTestDirP1 not available in schema dfs. 
> View Context dfs, drillTestDirP1 > View SQL SELECT * > FROM `dfs.drillTestDirP1`.`student` > [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] > (state=,code=0) > {noformat} > I have attached the student parquet file I used. > This is what the .view.drill file looks like: > {noformat} > { > "name" : "student_test_v", > "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", > "fields" : [ { > "name" : "**", > "type" : "DYNAMIC_STAR", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] > } > {noformat} > This means that users may not be able to access views that they have created > using previous versions of Drill. We should maintain backwards > compatibility where possible. > As a workaround, these views can be re-created. It would be helpful to users > if the error message explains that these views need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
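The failure above boils down to a case-sensitive comparison between the schema name stored in the view file and the schema name Drill resolves at query time. A rough illustration of the mismatch; the names come from the `.view.drill` file quoted above, but the lookup logic is a deliberate simplification, not Drill's actual code, and the lowercased registry entry is hypothetical:

```shell
#!/bin/sh
# Simplified illustration: a view created before DRILL-6492 stores
# "drillTestDirP1" with its original case; a lookup that normalizes case
# differently will not find it with an exact string comparison.
stored_schema="drillTestDirP1"     # from workspaceSchemaPath in .view.drill
resolved_schema="drilltestdirp1"   # hypothetical lowercased registry entry

if [ "$stored_schema" = "$resolved_schema" ]; then
  sensitive_result="found"
else
  sensitive_result="not found"     # => "Requested schema ... not available"
fi

# A case-insensitive comparison (the DRILL-6492 behavior) succeeds:
if [ "$(printf '%s' "$stored_schema" | tr '[:upper:]' '[:lower:]')" = "$resolved_schema" ]; then
  insensitive_result="found"
fi

echo "case-sensitive: $sensitive_result"
echo "case-insensitive: $insensitive_result"
```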
[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6726: -- Description: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a file that includes a schema which has upper case letters, the view needs to be rebuilt. There may be variations on this issue that I have not seen. To reproduce this problem, create a dfs workspace like this: {noformat} "drillTestDirP1": { "location": "/drill/testdata/p1tests", "writable": true, "defaultInputFormat": "parquet", "allowAccessOutsideWorkspace": false }, {noformat} Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this query: {noformat} select * from student_test_v; {noformat} Drill will return an exception: {noformat} Error: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. View Context dfs, drillTestDirP1 View SQL SELECT * FROM `dfs.drillTestDirP1`.`student` [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] (state=,code=0) {noformat} I have attached the student parquet file I used. This is what the .view.drill file looks like: {noformat} { "name" : "student_test_v", "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", "fields" : [ { "name" : "**", "type" : "DYNAMIC_STAR", "isNullable" : true } ], "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] } {noformat} This means that users may not be able to access views that they have created using previous versions of Drill. We should maintain backwards compatibiliity where possible. As work-around, these views can be re-created. 
It would be helpful to users if the error message explains that these views need to be re-created. was: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. There may be variations on this issue that I have not seen. To reproduce this problem, create a dfs workspace like this: {noformat} "drillTestDirP1": { "location": "/drill/testdata/p1tests", "writable": true, "defaultInputFormat": "parquet", "allowAccessOutsideWorkspace": false }, {noformat} Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this query: {noformat} select * from student_test_v; {noformat} Drill will return an exception: {noformat} Error: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. View Context dfs, drillTestDirP1 View SQL SELECT * FROM `dfs.drillTestDirP1`.`student` [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] (state=,code=0) {noformat} I have attached the student parquet file I used. This is what the .view.drill file looks like: {noformat} { "name" : "student_test_v", "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", "fields" : [ { "name" : "**", "type" : "DYNAMIC_STAR", "isNullable" : true } ], "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] } {noformat} This means that users may not be able to access views that they have created using previous versions of Drill. We should maintain backwards compatibiliity where possible. As work-around, these views can be re-created. It would be helpful to users if the error message explains that these views need to be re-created. 
> Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > Attachments: student > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view > references a file that includes a
[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6726: -- Description: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. There may be variations on this issue that I have not seen. To reproduce this problem, create a dfs workspace like this: {noformat} "drillTestDirP1": { "location": "/drill/testdata/p1tests", "writable": true, "defaultInputFormat": "parquet", "allowAccessOutsideWorkspace": false }, {noformat} Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this query: {noformat} select * from student_test_v; {noformat} Drill will return an exception: {noformat} Error: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. View Context dfs, drillTestDirP1 View SQL SELECT * FROM `dfs.drillTestDirP1`.`student` [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] (state=,code=0) {noformat} I have attached the student parquet file I used. This is what the .view.drill file looks like: {noformat} { "name" : "student_test_v", "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", "fields" : [ { "name" : "**", "type" : "DYNAMIC_STAR", "isNullable" : true } ], "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] } {noformat} This means that users may not be able to access views that they have created using previous versions of Drill. We should maintain backwards compatibiliity where possible. As work-around, these views can be re-created. 
It would be helpful to users if the error message explains that these views need to be re-created. was: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. There may be variations on this issue that I have not seen. To reproduce this problem, create a workspace like this: This is the workspace configuration I used: {noformat} "drillTestDirP1": { "location": "/drill/testdata/p1tests", "writable": true, "defaultInputFormat": "parquet", "allowAccessOutsideWorkspace": false }, {noformat} use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this query: {noformat} select * from student_test_v; {noformat} Drill will return an exception: {noformat} Error: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. View Context dfs, drillTestDirP1 View SQL SELECT * FROM `dfs.drillTestDirP1`.`student` [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] (state=,code=0) {noformat} I have attached the student parquet file I used. This is what the .view.drill file looks like: {noformat} { "name" : "student_test_v", "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", "fields" : [ { "name" : "**", "type" : "DYNAMIC_STAR", "isNullable" : true } ], "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] } {noformat} This means that users may not be able to access views that they have created using previous versions of Drill. We should maintain backwards compatibiliity where possible. As work-around, these views can be re-created. 
It would be helpful to users if the error message explains that these views need to be re-created. > Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > Attachments: student > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view > references
[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6726: -- Description: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. There may be variations on this issue that I have not seen. To reproduce this problem, create a workspace like this: This is the workspace configuration I used: {noformat} "drillTestDirP1": { "location": "/drill/testdata/p1tests", "writable": true, "defaultInputFormat": "parquet", "allowAccessOutsideWorkspace": false }, {noformat} use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this query: {noformat} select * from student_test_v; {noformat} Drill will return an exception: {noformat} Error: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. View Context dfs, drillTestDirP1 View SQL SELECT * FROM `dfs.drillTestDirP1`.`student` [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] (state=,code=0) {noformat} I have attached the student parquet file I used. This is what the .view.drill file looks like: {noformat} { "name" : "student_test_v", "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", "fields" : [ { "name" : "**", "type" : "DYNAMIC_STAR", "isNullable" : true } ], "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] } {noformat} This means that users may not be able to access views that they have created using previous versions of Drill. We should maintain backwards compatibiliity where possible. As work-around, these views can be re-created. 
It would be helpful to users if the error message explains that these views need to be re-created. was: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. There may be variations on this issue that I have not seen. To reproduce this problem, use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this query: {noformat} select * from student_test_v; {noformat} Drill will return an exception: {noformat} Error: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. View Context dfs, drillTestDirP1 View SQL SELECT * FROM `dfs.drillTestDirP1`.`student` [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] (state=,code=0) {noformat} I have attached the student parquet file I used. This is the workspace configuration I used: {noformat} "drillTestDirP1": { "location": "/drill/testdata/p1tests", "writable": true, "defaultInputFormat": "parquet", "allowAccessOutsideWorkspace": false }, {noformat} This is what the .view.drill file looks like: {noformat} { "name" : "student_test_v", "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", "fields" : [ { "name" : "**", "type" : "DYNAMIC_STAR", "isNullable" : true } ], "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] } {noformat} This means that users may not be able to access views that they have created using previous versions of Drill. We should maintain backwards compatibiliity where possible. As work-around, these views can be re-created. It would be helpful to users if the error message explains that these views need to be re-created. 
> Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > Attachments: student > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view >
[jira] [Commented] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602488#comment-16602488 ] Robert Hou commented on DRILL-6726: --- [~arina] I updated the description and attached the parquet file that I used. > Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > Attachments: student > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view > references a schema which has upper case letters, the view needs to be > rebuilt. There may be variations on this issue that I have not seen. > To reproduce this problem, use Drill commit > ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute > this query: > {noformat} > select * from student_test_v; > {noformat} > Drill will return an exception: > {noformat} > Error: VALIDATION ERROR: Failure while attempting to expand view. Requested > schema drillTestDirP1 not available in schema dfs. > View Context dfs, drillTestDirP1 > View SQL SELECT * > FROM `dfs.drillTestDirP1`.`student` > [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] > (state=,code=0) > {noformat} > I have attached the student parquet file I used. 
> This is the workspace configuration I used: > {noformat} > "drillTestDirP1": { > "location": "/drill/testdata/p1tests", > "writable": true, > "defaultInputFormat": "parquet", > "allowAccessOutsideWorkspace": false > }, > {noformat} > This is what the .view.drill file looks like: > {noformat} > { > "name" : "student_test_v", > "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", > "fields" : [ { > "name" : "**", > "type" : "DYNAMIC_STAR", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] > } > {noformat} > This means that users may not be able to access views that they have created > using previous versions of Drill. We should maintain backwards > compatibility where possible. > As a workaround, these views can be re-created. It would be helpful to users > if the error message explains that these views need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6726: -- Attachment: student > Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > Attachments: student > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view > references a schema which has upper case letters, the view needs to be > rebuilt. There may be variations on this issue that I have not seen. > To reproduce this problem, use Drill commit > ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute > this query: > {noformat} > select * from student_test_v; > {noformat} > Drill will return an exception: > {noformat} > Error: VALIDATION ERROR: Failure while attempting to expand view. Requested > schema drillTestDirP1 not available in schema dfs. > View Context dfs, drillTestDirP1 > View SQL SELECT * > FROM `dfs.drillTestDirP1`.`student` > [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] > (state=,code=0) > {noformat} > I have attached the student parquet file I used. 
> This is the workspace configuration I used: > {noformat} > "drillTestDirP1": { > "location": "/drill/testdata/p1tests", > "writable": true, > "defaultInputFormat": "parquet", > "allowAccessOutsideWorkspace": false > }, > {noformat} > This is what the .view.drill file looks like: > {noformat} > { > "name" : "student_test_v", > "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", > "fields" : [ { > "name" : "**", > "type" : "DYNAMIC_STAR", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] > } > {noformat} > This means that users may not be able to access views that they have created > using previous versions of Drill. We should maintain backwards > compatibiliity where possible. > As work-around, these views can be re-created. It would be helpful to users > if the error message explains that these views need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6726: -- Description: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. There may be variations on this issue that I have not seen. To reproduce this problem, use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this query: {noformat} select * from student_test_v; {noformat} Drill will return an exception: {noformat} Error: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. View Context dfs, drillTestDirP1 View SQL SELECT * FROM `dfs.drillTestDirP1`.`student` [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] (state=,code=0) {noformat} I have attached the student parquet file I used. This is the workspace configuration I used: {noformat} "drillTestDirP1": { "location": "/drill/testdata/p1tests", "writable": true, "defaultInputFormat": "parquet", "allowAccessOutsideWorkspace": false }, {noformat} This is what the .view.drill file looks like: {noformat} { "name" : "student_test_v", "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`", "fields" : [ { "name" : "**", "type" : "DYNAMIC_STAR", "isNullable" : true } ], "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ] } {noformat} This means that users may not be able to access views that they have created using previous versions of Drill. We should maintain backwards compatibiliity where possible. As work-around, these views can be re-created. 
It would be helpful to users if the error message explains that these views need to be re-created. was: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. To reproduce this problem, use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} The use Drill commit If a query references this schema, Drill will return an exception: {noformat} java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. {noformat} It would be helpful to users if the error message explains that these views need to be re-created. > Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view > references a schema which has upper case letters, the view needs to be > rebuilt. There may be variations on this issue that I have not seen. 
> To reproduce this problem, use Drill commit > ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute > this query: > {noformat} > select * from student_test_v; > {noformat} > Drill will return an exception: > {noformat} > Error: VALIDATION ERROR: Failure while attempting to expand view. Requested > schema drillTestDirP1 not available in schema dfs. > View Context dfs, drillTestDirP1 > View SQL SELECT * > FROM `dfs.drillTestDirP1`.`student` > [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] > (state=,code=0) > {noformat} > I have attached the student parquet file I used. > This is the workspace configuration I used: > {noformat} > "drillTestDirP1": { > "location": "/drill/testdata/p1tests", > "writable": true, > "defaultInputFormat": "parquet", >
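The re-creation workaround mentioned in the description above can be sketched as follows. This is only a sketch using the view and workspace names from this report; the assumption (not stated in the ticket itself) is that it is run on the post-DRILL-6492 build, so that CREATE OR REPLACE rewrites the stored .view.drill metadata and the schema path resolves under the new case-insensitive rules:
{noformat}
-- Re-run the original CREATE on the post-DRILL-6492 build; CREATE OR REPLACE
-- overwrites the existing view definition in place, so no DROP VIEW is needed.
create or replace view `dfs.drillTestDirP1`.student_parquet_v as
select * from `dfs.drillTestDirP1`.student;
{noformat}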
[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6726: -- Description: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. To reproduce this problem, use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} The use Drill commit If a query references this schema, Drill will return an exception: {noformat} java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. {noformat} It would be helpful to users if the error message explains that these views need to be re-created. was: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. To reproduce this problem, use Drill commit {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} If a query references this schema, Drill will return an exception: {noformat} java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. {noformat} It would be helpful to users if the error message explains that these views need to be re-created. 
> Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view > references a schema which has upper case letters, the view needs to be > rebuilt. > To reproduce this problem, use Drill commit > ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this command: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > The use Drill commit > If a query references this schema, Drill will return an exception: > {noformat} > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > {noformat} > It would be helpful to users if the error message explains that these views > need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6726: -- Description: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. To reproduce this problem, use Drill commit {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} If a query references this schema, Drill will return an exception: {noformat} java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. {noformat} It would be helpful to users if the error message explains that these views need to be re-created. was: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. For example: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} If a query references this schema, Drill will return an exception: {noformat} java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. {noformat} It would be helpful to users if the error message explains that these views need to be re-created. 
> Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view > references a schema which has upper case letters, the view needs to be > rebuilt. > To reproduce this problem, use Drill commit > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > If a query references this schema, Drill will return an exception: > {noformat} > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > {noformat} > It would be helpful to users if the error message explains that these views > need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601712#comment-16601712 ] Robert Hou commented on DRILL-6726: --- [~arina] I clarified the issue in the description. Use commit ddb35ce71837376c7caef28c25327ba556bb32f2 to create a view (this is the commit prior to DRILL-6492). Then try to access the view using commit 8bcb103a0e3bcc5f85a03cbed3c6c0cea254ec4e , which is the commit for DRILL-6492. > Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an > existing view was created before (DRILL-6492) was committed, and this view > references a schema which has upper case letters, the view needs to be > rebuilt. For example: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > If a query references this schema, Drill will return an exception: > {noformat} > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > {noformat} > It would be helpful to users if the error message explains that these views > need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6726: -- Description: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If an existing view was created before (DRILL-6492) was committed, and this view references a schema which has upper case letters, the view needs to be rebuilt. For example: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} If a query references this schema, Drill will return an exception: {noformat} java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. {noformat} It would be helpful to users if the error message explains that these views need to be re-created. was: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view references a schema which has upper case letters, the view needs to be rebuilt. For example: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} If a query references this schema, Drill will return an exception: {noformat} java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. {noformat} It would be helpful to users if the error message explains that these views need to be re-created. > Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). 
If an > existing view was created before (DRILL-6492) was committed, and this view > references a schema which has upper case letters, the view needs to be > rebuilt. For example: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > If a query references this schema, Drill will return an exception: > {noformat} > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > {noformat} > It would be helpful to users if the error message explains that these views > need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6726) Drill should return a better error message when an existing view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6726: -- Summary: Drill should return a better error message when an existing view uses a table that has a mixed case schema (was: Drill should return a better error message when a view uses a table that has a mixed case schema) > Drill should return a better error message when an existing view uses a table > that has a mixed case schema > -- > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view > references a schema which has upper case letters, the view needs to be > rebuilt. For example: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > If a query references this schema, Drill will return an exception: > {noformat} > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > {noformat} > It would be helpful to users if the error message explains that these views > need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6726) Drill should return a better error message when a view uses a table that has a mixed case schema
[ https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6726: -- Affects Version/s: (was: 1.14.0) 1.15.0 > Drill should return a better error message when a view uses a table that has > a mixed case schema > > > Key: DRILL-6726 > URL: https://issues.apache.org/jira/browse/DRILL-6726 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.15.0 > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view > references a schema which has upper case letters, the view needs to be > rebuilt. For example: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > If a query references this schema, Drill will return an exception: > {noformat} > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > {noformat} > It would be helpful to users if the error message explains that these views > need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6725) Drill 1.15 cannot use existing views if they reference tables with mixed case schemas
[ https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6725: -- Affects Version/s: (was: 1.14.0) 1.15.0 > Drill 1.15 cannot use existing views if they reference tables with mixed case > schemas > - > > Key: DRILL-6725 > URL: https://issues.apache.org/jira/browse/DRILL-6725 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.15.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Fix For: 1.15.0 > > > Drill 1.15 changes schemas to be case-insensitive (DRILL-6492). If a view > references a schema which has upper case letters, the view needs to be > rebuilt. For example: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > If a query references this schema, Drill will return an exception: > {noformat} > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > {noformat} > Do we have release notes? If so, this should be documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6725) Drill 1.14 cannot use existing views if they reference tables with mixed case schemas
[ https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6725: -- Fix Version/s: (was: 1.14.0) 1.15.0 > Drill 1.14 cannot use existing views if they reference tables with mixed case > schemas > - > > Key: DRILL-6725 > URL: https://issues.apache.org/jira/browse/DRILL-6725 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Fix For: 1.15.0 > > > Drill 1.15 changes schemas to be case-insensitive (DRILL-6492). If a view > references a schema which has upper case letters, the view needs to be > rebuilt. For example: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > If a query references this schema, Drill will return an exception: > {noformat} > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > {noformat} > Do we have release notes? If so, this should be documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6725) Drill 1.14 cannot use existing views if they reference tables with mixed case schemas
[ https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6725: -- Description: Drill 1.15 changes schemas to be case-insensitive (DRILL-6492). If a view references a schema which has upper case letters, the view needs to be rebuilt. For example: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} If a query references this schema, Drill will return an exception: {noformat} java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. {noformat} Do we have release notes? If so, this should be documented. was: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view references a schema which has upper case letters, the view needs to be rebuilt. For example: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} If a query references this schema, Drill will return an exception: {noformat} java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. {noformat} Do we have release notes? If so, this should be documented. > Drill 1.14 cannot use existing views if they reference tables with mixed case > schemas > - > > Key: DRILL-6725 > URL: https://issues.apache.org/jira/browse/DRILL-6725 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Fix For: 1.15.0 > > > Drill 1.15 changes schemas to be case-insensitive (DRILL-6492). If a view > references a schema which has upper case letters, the view needs to be > rebuilt. 
For example: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > If a query references this schema, Drill will return an exception: > {noformat} > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > {noformat} > Do we have release notes? If so, this should be documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6725) Drill 1.15 cannot use existing views if they reference tables with mixed case schemas
[ https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6725: -- Summary: Drill 1.15 cannot use existing views if they reference tables with mixed case schemas (was: Drill 1.14 cannot use existing views if they reference tables with mixed case schemas) > Drill 1.15 cannot use existing views if they reference tables with mixed case > schemas > - > > Key: DRILL-6725 > URL: https://issues.apache.org/jira/browse/DRILL-6725 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Fix For: 1.15.0 > > > Drill 1.15 changes schemas to be case-insensitive (DRILL-6492). If a view > references a schema which has upper case letters, the view needs to be > rebuilt. For example: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > If a query references this schema, Drill will return an exception: > {noformat} > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > {noformat} > Do we have release notes? If so, this should be documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6726) Drill should return a better error message when a view uses a table that has a mixed case schema
Robert Hou created DRILL-6726: - Summary: Drill should return a better error message when a view uses a table that has a mixed case schema Key: DRILL-6726 URL: https://issues.apache.org/jira/browse/DRILL-6726 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.14.0 Reporter: Robert Hou Assignee: Arina Ielchiieva Fix For: 1.15.0 Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view references a schema which has upper case letters, the view needs to be rebuilt. For example: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} If a query references this schema, Drill will return an exception: {noformat} java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. {noformat} It would be helpful to users if the error message explains that these views need to be re-created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6725) Drill 1.14 cannot use existing views if they reference tables with mixed case schemas
[ https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6725: -- Description: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view references a schema which has upper case letters, the view needs to be rebuilt. For example: {noformat} create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; {noformat} If a query references this schema, Drill will return an exception: {noformat} java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. {noformat} Do we have release notes? If so, this should be documented. was: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view references a schema which has upper case letters, the view needs to be rebuilt. For example: create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; If a query references this schema, Drill will return an exception: java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. Do we have release notes? If so, this should be documented. > Drill 1.14 cannot use existing views if they reference tables with mixed case > schemas > - > > Key: DRILL-6725 > URL: https://issues.apache.org/jira/browse/DRILL-6725 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Fix For: 1.14.0 > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view > references a schema which has upper case letters, the view needs to be > rebuilt. 
For example: > {noformat} > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > {noformat} > If a query references this schema, Drill will return an exception: > {noformat} > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > {noformat} > Do we have release notes? If so, this should be documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6725) Drill 1.14 cannot use existing views if they reference tables with mixed case schemas
[ https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6725: -- Description: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view references a schema which has upper case letters, the view needs to be rebuilt. For example: create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; If a query references this schema, Drill will return an exception: java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand view. Requested schema drillTestDirP1 not available in schema dfs. Do we have release notes? If so, this should be documented. was: Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view references a schema which has upper case letters, the view needs to be rebuilt. For example: create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; Do we have release notes? If so, this should be documented. > Drill 1.14 cannot use existing views if they reference tables with mixed case > schemas > - > > Key: DRILL-6725 > URL: https://issues.apache.org/jira/browse/DRILL-6725 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Fix For: 1.14.0 > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view > references a schema which has upper case letters, the view needs to be > rebuilt. For example: > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > If a query references this schema, Drill will return an exception: > java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand > view. Requested schema drillTestDirP1 not available in schema dfs. > Do we have release notes? If so, this should be documented. 
[jira] [Updated] (DRILL-6725) Drill 1.14 cannot use existing views if they reference tables with mixed case schemas
[ https://issues.apache.org/jira/browse/DRILL-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-6725: -- Summary: Drill 1.14 cannot use existing views if they reference tables with mixed case schemas (was: Views cannot use tables with mixed case schemas) > Drill 1.14 cannot use existing views if they reference tables with mixed case > schemas > - > > Key: DRILL-6725 > URL: https://issues.apache.org/jira/browse/DRILL-6725 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Fix For: 1.14.0 > > > Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view > references a schema which has upper case letters, the view needs to be > rebuilt. For example: > create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * > from `dfs.drillTestDirP1`.student; > Do we have release notes? If so, this should be documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6725) Views cannot use tables with mixed case schemas
Robert Hou created DRILL-6725: - Summary: Views cannot use tables with mixed case schemas Key: DRILL-6725 URL: https://issues.apache.org/jira/browse/DRILL-6725 Project: Apache Drill Issue Type: Bug Components: Documentation Affects Versions: 1.14.0 Reporter: Robert Hou Assignee: Bridget Bevens Fix For: 1.14.0 Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view references a schema which has upper case letters, the view needs to be rebuilt. For example: create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from `dfs.drillTestDirP1`.student; Do we have release notes? If so, this should be documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)