[jira] [Updated] (DRILL-7227) TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100
[ https://issues.apache.org/jira/browse/DRILL-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gautam Parai updated DRILL-7227:
--------------------------------
    Labels:   (was: ready-to-commit)

> TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100
> ---------------------------------------------------------------------
>
>                 Key: DRILL-7227
>                 URL: https://issues.apache.org/jira/browse/DRILL-7227
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.16.0
>            Reporter: Robert Hou
>            Assignee: Gautam Parai
>            Priority: Major
>             Fix For: 1.17.0
>
>         Attachments: 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.sys.drill,
>                      2338ae93-155b-356d-382e-0da949c6f439.sys.drill
>
>
> Here is query 78:
> {noformat}
> WITH ws
>      AS (SELECT d_year                 AS ws_sold_year,
>                 ws_item_sk,
>                 ws_bill_customer_sk    ws_customer_sk,
>                 Sum(ws_quantity)       ws_qty,
>                 Sum(ws_wholesale_cost) ws_wc,
>                 Sum(ws_sales_price)    ws_sp
>          FROM   web_sales
>                 LEFT JOIN web_returns
>                        ON wr_order_number = ws_order_number
>                           AND ws_item_sk = wr_item_sk
>                 JOIN date_dim
>                   ON ws_sold_date_sk = d_date_sk
>          WHERE  wr_order_number IS NULL
>          GROUP  BY d_year,
>                    ws_item_sk,
>                    ws_bill_customer_sk),
>      cs
>      AS (SELECT d_year                 AS cs_sold_year,
>                 cs_item_sk,
>                 cs_bill_customer_sk    cs_customer_sk,
>                 Sum(cs_quantity)       cs_qty,
>                 Sum(cs_wholesale_cost) cs_wc,
>                 Sum(cs_sales_price)    cs_sp
>          FROM   catalog_sales
>                 LEFT JOIN catalog_returns
>                        ON cr_order_number = cs_order_number
>                           AND cs_item_sk = cr_item_sk
>                 JOIN date_dim
>                   ON cs_sold_date_sk = d_date_sk
>          WHERE  cr_order_number IS NULL
>          GROUP  BY d_year,
>                    cs_item_sk,
>                    cs_bill_customer_sk),
>      ss
>      AS (SELECT d_year                 AS ss_sold_year,
>                 ss_item_sk,
>                 ss_customer_sk,
>                 Sum(ss_quantity)       ss_qty,
>                 Sum(ss_wholesale_cost) ss_wc,
>                 Sum(ss_sales_price)    ss_sp
>          FROM   store_sales
>                 LEFT JOIN store_returns
>                        ON sr_ticket_number = ss_ticket_number
>                           AND ss_item_sk = sr_item_sk
>                 JOIN date_dim
>                   ON ss_sold_date_sk = d_date_sk
>          WHERE  sr_ticket_number IS NULL
>          GROUP  BY d_year,
>                    ss_item_sk,
>                    ss_customer_sk)
> SELECT ss_item_sk,
>        Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2) ratio,
>        ss_qty                                    store_qty,
>        ss_wc                                     store_wholesale_cost,
>        ss_sp                                     store_sales_price,
>        COALESCE(ws_qty, 0) + COALESCE(cs_qty, 0) other_chan_qty,
>        COALESCE(ws_wc, 0) + COALESCE(cs_wc, 0)   other_chan_wholesale_cost,
>        COALESCE(ws_sp, 0) + COALESCE(cs_sp, 0)   other_chan_sales_price
> FROM   ss
>        LEFT JOIN ws
>               ON ( ws_sold_year = ss_sold_year
>                    AND ws_item_sk = ss_item_sk
>                    AND ws_customer_sk = ss_customer_sk )
>        LEFT JOIN cs
>               ON ( cs_sold_year = ss_sold_year
>                    AND cs_item_sk = cs_item_sk
>                    AND cs_customer_sk = ss_customer_sk )
> WHERE  COALESCE(ws_qty, 0) > 0
>        AND COALESCE(cs_qty, 0) > 0
>        AND ss_sold_year = 1999
> ORDER  BY ss_item_sk,
>           ss_qty DESC,
>           ss_wc DESC,
>           ss_sp DESC,
>           other_chan_qty,
>           other_chan_wholesale_cost,
>           other_chan_sales_price,
>           Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2)
> LIMIT 100;
> {noformat}
> The profile for the new plan is 2338ae93-155b-356d-382e-0da949c6f439. Hash
> partition sender operator (10-00) takes 10-15 minutes. I am not sure why it
> takes so long. It has 10 minor fragments sending to receiver (06-05), which
> has 62 minor fragments. But hash partition sender (16-00) has 10 minor
> fragments sending to receiver (12-06), which has 220 minor fragments, and
> there is no performance issue.
> The profile for the old plan is 23387ab0-cb1c-cd5e-449a-c9bcefc901c1. Both
> plans use the same commit.
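The hash-partition routing the report describes can be sketched as follows. This is a hedged illustration, not Drill's HashPartitionSender implementation: `pick_receiver` is a hypothetical helper, and Drill hashes typed value vectors rather than Python values. The point is that the receiver count (62 vs 220 minor fragments above) only changes the modulus of the routing step, which is why a 10-15 minute gap between the two senders is surprising.

```python
# Hypothetical sketch of hash-partition routing, assuming a single
# distribution key per row. Not Drill's actual sender code.

def pick_receiver(partition_key, num_receivers):
    """Route a row to one of num_receivers minor fragments by hashing its key."""
    return hash(partition_key) % num_receivers

# Ten sender-side rows fanned out to either 62 or 220 receiving fragments:
rows = ["item_%d" % i for i in range(10)]
for num_receivers in (62, 220):
    targets = [pick_receiver(key, num_receivers) for key in rows]
    # Every row lands on a valid receiving minor fragment.
    assert all(0 <= t < num_receivers for t in targets)
```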
[jira] [Created] (DRILL-7245) TPCDS queries 1, 45, 65, 97 are 3x slower when Statistics is enabled at sf 100
Gautam Parai created DRILL-7245:
-----------------------------------

             Summary: TPCDS queries 1, 45, 65, 97 are 3x slower when Statistics is enabled at sf 100
                 Key: DRILL-7245
                 URL: https://issues.apache.org/jira/browse/DRILL-7245
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.16.0
            Reporter: Gautam Parai
            Assignee: Gautam Parai
             Fix For: 1.17.0


Here is query 65:
{noformat}
SELECT s_store_name,
       i_item_desc,
       sc.revenue,
       i_current_price,
       i_wholesale_cost,
       i_brand
FROM   store,
       item,
       (SELECT ss_store_sk,
               Avg(revenue) AS ave
        FROM   (SELECT ss_store_sk,
                       ss_item_sk,
                       Sum(ss_sales_price) AS revenue
                FROM   store_sales,
                       date_dim
                WHERE  ss_sold_date_sk = d_date_sk
                       AND d_month_seq BETWEEN 1199 AND 1199 + 11
                GROUP  BY ss_store_sk,
                          ss_item_sk) sa
        GROUP  BY ss_store_sk) sb,
       (SELECT ss_store_sk,
               ss_item_sk,
               Sum(ss_sales_price) AS revenue
        FROM   store_sales,
               date_dim
        WHERE  ss_sold_date_sk = d_date_sk
               AND d_month_seq BETWEEN 1199 AND 1199 + 11
        GROUP  BY ss_store_sk,
                  ss_item_sk) sc
WHERE  sb.ss_store_sk = sc.ss_store_sk
       AND sc.revenue <= 0.1 * sb.ave
       AND s_store_sk = sc.ss_store_sk
       AND i_item_sk = sc.ss_item_sk
ORDER  BY s_store_name,
          i_item_desc
LIMIT 100;
{noformat}
Here is the new plan.
{noformat}
00-00 Screen : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402776556728E9 rows, 9.49229008144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45433
00-01 Project(s_store_name=[$0], i_item_desc=[$1], revenue=[$2], i_current_price=[$3], i_wholesale_cost=[$4], i_brand=[$5]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402766556728E9 rows, 9.49229007144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45432
00-02 SelectionVectorRemover : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402666556728E9 rows, 9.49228947144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45431
00-03 Limit(fetch=[100]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402566556728E9 rows, 9.49228937144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45430
00-04 SingleMergeExchange(sort0=[0], sort1=[1]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402466556728E9 rows, 9.49228897144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45429
01-01 OrderedMuxExchange(sort0=[0], sort1=[1]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402366556728E9 rows, 9.492286842675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45428
02-01 SelectionVectorRemover : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402266556728E9 rows, 9.492286742675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45427
02-02 Limit(fetch=[100]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402166556728E9 rows, 9.492286642675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45426
02-03 SelectionVectorRemover : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost = {1.170402066556728E9 rows, 9.492286242675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network,
[jira] [Updated] (DRILL-7227) TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100
[ https://issues.apache.org/jira/browse/DRILL-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gautam Parai updated DRILL-7227:
--------------------------------
    Labels: ready-to-commit  (was: )
[jira] [Commented] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed
[ https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834153#comment-16834153 ]

Gautam Parai commented on DRILL-7148:
-------------------------------------
[~anishareddy] / [~agirish] these 2 tests require a baseline update. Could you please update the baseline? Thanks!

> TPCH query 17 increases execution time with Statistics enabled because join
> order is changed
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-7148
>                 URL: https://issues.apache.org/jira/browse/DRILL-7148
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Gautam Parai
>            Assignee: Gautam Parai
>            Priority: Major
>             Fix For: 1.17.0
>
>
> TPCH query 17 with sf 1000 runs 45% slower. One issue is that the join order
> has flipped the build side and the probe side in Major Fragment 01.
> Here is the query:
> select
>     sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>     lineitem l,
>     part p
> where
>     p.p_partkey = l.l_partkey
>     and p.p_brand = 'Brand#13'
>     and p.p_container = 'JUMBO CAN'
>     and l.l_quantity < (
>         select
>             0.2 * avg(l2.l_quantity)
>         from
>             lineitem l2
>         where
>             l2.l_partkey = p.p_partkey
>     );
> Here is original plan:
> {noformat}
> 00-00 Screen : rowType = RecordType(ANY avg_yearly): rowcount = 1.0, cumulative cost = {7.853786601428E10 rows, 6.6179786770537E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 memory}, id = 489493
> 00-01 Project(avg_yearly=[/($0, 7.0)]) : rowType = RecordType(ANY avg_yearly): rowcount = 1.0, cumulative cost = {7.853786601418E10 rows, 6.6179786770527E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 memory}, id = 489492
> 00-02 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): rowcount = 1.0, cumulative cost = {7.853786601318E10 rows, 6.6179786770127E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 memory}, id = 489491
> 00-03 UnionExchange : rowType = RecordType(ANY $f0): rowcount = 1.0, cumulative cost = {7.853786601218E10 rows, 6.6179786768927E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 memory}, id = 489490
> 01-01 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): rowcount = 1.0, cumulative cost = {7.853786601118E10 rows, 6.6179786768127E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489489
> 01-02 Project(l_extendedprice=[$1]) : rowType = RecordType(ANY l_extendedprice): rowcount = 2.948545E9, cumulative cost = {7.553787115668E10 rows, 6.2579792942727E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489488
> 01-03 SelectionVectorRemover : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = 2.948545E9, cumulative cost = {7.253787630218E10 rows, 6.2279793457277E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489487
> 01-04 Filter(condition=[<($0, *(0.2, $4))]) : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = 2.948545E9, cumulative cost = {6.953788144768E10 rows, 6.1979793971827E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489486
> 01-05 HashJoin(condition=[=($2, $3)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = 5.89709E9, cumulative cost = {6.353789173867999E10 rows, 5.8379800146427E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489485
> 01-07 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2]) : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey): rowcount = 5.89709E9, cumulative cost = {4.2417927963E10 rows, 2.71618536905E11 cpu, 1.8599969127E10 io, 9.8471562592256E13 network, 7.92E7 memory}, id = 489476
> 01-09 HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.89709E9, cumulative cost = {3.6417938254E10 rows, 2.53618567778E11 cpu, 1.8599969127E10 io, 9.8471562592256E13 network, 7.92E7 memory}, id = 489475
> 02-01 UnorderedMuxExchange : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.89709E9, cumulative cost = {3.0417948545E10 rows, 1.57618732434E11 cpu, 1.8599969127E10 io, 1.677312E11 network, 7.92E7 memory}, id = 489474
> 04-01
[jira] [Updated] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed
[ https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gautam Parai updated DRILL-7148:
--------------------------------
    Labels: ready-to-commit  (was: )
[jira] [Updated] (DRILL-7231) TPCDS-21 regresses after fix for DRILL-7148
[ https://issues.apache.org/jira/browse/DRILL-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gautam Parai updated DRILL-7231:
--------------------------------
    Description:
The join rowcount regresses a lot after the changes made for DRILL-7148. This affects several TPC-DS queries.

One of the fixes for DRILL-7148 introduced a change in DrillRelMDDistinctRowcount to only use the guess of 0.1 * input_row_count when not all columns in the group-by key have NDV statistics. However, the fix was incorrect and instead caused it to use the guesstimate NDV even when statistics were present.

Since the NDV was estimated as 0.1 * input_row_count because of the regression, the join cardinality was severely underestimated for TPCDS-21: 400M * 15 / Max(40M, 15) = 150.

  was: The join rowcount regresses a lot after changes made for DRILL-7148. This affects several TPC-DS queries.

> TPCDS-21 regresses after fix for DRILL-7148
> -------------------------------------------
>
>                 Key: DRILL-7231
>                 URL: https://issues.apache.org/jira/browse/DRILL-7231
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Gautam Parai
>            Assignee: Gautam Parai
>            Priority: Major
>   Original Estimate: 24h
>   Remaining Estimate: 24h
>
> The join rowcount regresses a lot after the changes made for DRILL-7148. This
> affects several TPC-DS queries.
> One of the fixes for DRILL-7148 introduced a change in
> DrillRelMDDistinctRowcount to only use the guess of 0.1 * input_row_count when
> not all columns in the group-by key have NDV statistics. However, the fix was
> incorrect and instead caused it to use the guesstimate NDV even when
> statistics were present.
> Since the NDV was estimated as 0.1 * input_row_count because of the
> regression, the join cardinality was severely underestimated for TPCDS-21:
> 400M * 15 / Max(40M, 15) = 150.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
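The estimation bug described in DRILL-7231 can be illustrated with the textbook join-cardinality formula the description applies. This is a hedged sketch, not Drill's DrillRelMDDistinctRowcount code; `join_cardinality` is a hypothetical helper.

```python
# Hypothetical sketch of the cardinality math described above -- not
# Drill's actual DrillRelMDDistinctRowcount implementation.

def join_cardinality(left_rows, right_rows, left_ndv, right_ndv):
    """Textbook estimate: |L JOIN R| ~ |L| * |R| / max(NDV(L.key), NDV(R.key))."""
    return left_rows * right_rows / max(left_ndv, right_ndv)

left_rows, right_rows = 400_000_000, 15

# Regression path: the guess NDV (0.1 * input row count) is used on the
# large side even though real statistics exist, inflating the divisor.
guess_ndv = 0.1 * left_rows  # 40M
estimate = join_cardinality(left_rows, right_rows, guess_ndv, right_ndv=15)
print(estimate)  # 150.0 -- the severe underestimate for TPCDS-21
```

With an accurate NDV on the join key the same formula would yield a much larger (and more realistic) join rowcount, which is why the regression flipped plan choices downstream.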
[jira] [Created] (DRILL-7231) TPCDS-21 regresses after fix for DRILL-7148
Gautam Parai created DRILL-7231:
-----------------------------------

             Summary: TPCDS-21 regresses after fix for DRILL-7148
                 Key: DRILL-7231
                 URL: https://issues.apache.org/jira/browse/DRILL-7231
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Gautam Parai
            Assignee: Gautam Parai


The join rowcount regresses a lot after changes made for DRILL-7148. This affects several TPC-DS queries.
[jira] [Commented] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed
[ https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829718#comment-16829718 ]

Gautam Parai commented on DRILL-7148:
-------------------------------------
PR link is present in the Issue Links section
[jira] [Commented] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed
[ https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829717#comment-16829717 ]

Gautam Parai commented on DRILL-7148:
-------------------------------------
[https://github.com/apache/drill/pull/1744]
[jira] [Issue Comment Deleted] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed
[ https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-7148: Comment: was deleted (was: [https://github.com/apache/drill/pull/1744]) > TPCH query 17 increases execution time with Statistics enabled because join > order is changed > > > Key: DRILL-7148 > URL: https://issues.apache.org/jira/browse/DRILL-7148 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Gautam Parai >Assignee: Gautam Parai >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > TPCH query 17 with sf 1000 runs 45% slower. One issue is that the join order > has flipped the build side and the probe side in Major Fragment 01. > Here is the query: > select > sum(l.l_extendedprice) / 7.0 as avg_yearly > from > lineitem l, > part p > where > p.p_partkey = l.l_partkey > and p.p_brand = 'Brand#13' > and p.p_container = 'JUMBO CAN' > and l.l_quantity < ( > select > 0.2 * avg(l2.l_quantity) > from > lineitem l2 > where > l2.l_partkey = p.p_partkey > ); > Here is original plan: > {noformat} > 00-00 Screen : rowType = RecordType(ANY avg_yearly): rowcount = 1.0, > cumulative cost = \{7.853786601428E10 rows, 6.6179786770537E11 cpu, > 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 > memory}, id = 489493 > 00-01 Project(avg_yearly=[/($0, 7.0)]) : rowType = RecordType(ANY > avg_yearly): rowcount = 1.0, cumulative cost = \{7.853786601418E10 rows, > 6.6179786770527E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, > 1.17294998955024E11 memory}, id = 489492 > 00-02 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): > rowcount = 1.0, cumulative cost = \{7.853786601318E10 rows, > 6.6179786770127E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, > 1.17294998955024E11 memory}, id = 489491 > 00-03 UnionExchange : rowType = RecordType(ANY $f0): rowcount = 1.0, > cumulative cost = \{7.853786601218E10 rows, 6.6179786768927E11 cpu, > 
3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 > memory}, id = 489490 > 01-01 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): > rowcount = 1.0, cumulative cost = \{7.853786601118E10 rows, > 6.6179786768127E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, > 1.17294998955024E11 memory}, id = 489489 > 01-02 Project(l_extendedprice=[$1]) : rowType = RecordType(ANY > l_extendedprice): rowcount = 2.948545E9, cumulative cost = > \{7.553787115668E10 rows, 6.2579792942727E11 cpu, 3.0599948545E10 io, > 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489488 > 01-03 SelectionVectorRemover : rowType = RecordType(ANY l_quantity, ANY > l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = > 2.948545E9, cumulative cost = \{7.253787630218E10 rows, > 6.2279793457277E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, > 1.17294998955024E11 memory}, id = 489487 > 01-04 Filter(condition=[<($0, *(0.2, $4))]) : rowType = RecordType(ANY > l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): > rowcount = 2.948545E9, cumulative cost = \{6.953788144768E10 rows, > 6.1979793971827E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, > 1.17294998955024E11 memory}, id = 489486 > 01-05 HashJoin(condition=[=($2, $3)], joinType=[inner], semi-join: =[false]) > : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, > ANY l_partkey, ANY $f1): rowcount = 5.89709E9, cumulative cost = > \{6.353789173867999E10 rows, 5.8379800146427E11 cpu, 3.0599948545E10 io, > 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489485 > 01-07 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2]) : > rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey): > rowcount = 5.89709E9, cumulative cost = \{4.2417927963E10 rows, > 2.71618536905E11 cpu, 1.8599969127E10 io, 9.8471562592256E13 network, 7.92E7 > memory}, id = 489476 > 01-09 
HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY > l_quantity, ANY l_extendedprice, ANY p_partkey, ANY > E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.89709E9, cumulative cost = > \{3.6417938254E10 rows, 2.53618567778E11 cpu, 1.8599969127E10 io, > 9.8471562592256E13 network, 7.92E7 memory}, id = 489475 > 02-01 UnorderedMuxExchange : rowType = RecordType(ANY l_quantity, ANY > l_extendedprice, ANY p_partkey, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = > 5.89709E9, cumulative cost = \{3.0417948545E10 rows, 1.57618732434E11 > cpu, 1.8599969127E10 io, 1.677312E11 network, 7.92E7 memory}, id = 489474 > 04-01 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2],
[jira] [Updated] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed
[ https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-7148: Labels: ready-to-commit (was: )
[jira] [Updated] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed
[ https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-7148: Fix Version/s: (was: 1.16.0) 1.17.0
[jira] [Commented] (DRILL-7114) ANALYZE command generates warnings for stats file and materialization
[ https://issues.apache.org/jira/browse/DRILL-7114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811085#comment-16811085 ] Gautam Parai commented on DRILL-7114: - Fixed with commit ID: e477480e7b9626abc8efd70914d3bfd4321b7258 > ANALYZE command generates warnings for stats file and materialization > - > > Key: DRILL-7114 > URL: https://issues.apache.org/jira/browse/DRILL-7114 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Reporter: Aman Sinha >Assignee: Gautam Parai >Priority: Minor > Fix For: 1.16.0 > > > When I run ANALYZE, I see warnings in the log file as shown below. The > ANALYZE command should not try to read the stats file or materialize the > stats. > {noformat} > 12:04:32.939 [2370143e-c419-f33c-d879-84989712bc85:foreman] WARN > o.a.d.e.p.common.DrillStatsTable - Failed to read the stats file. > java.io.FileNotFoundException: File /tmp/orders3/.stats.drill/0_0.json does > not exist > 12:04:32.939 [2370143e-c419-f33c-d879-84989712bc85:foreman] WARN > o.a.d.e.p.common.DrillStatsTable - Failed to materialize the stats. > Continuing without stats. > java.io.FileNotFoundException: File /tmp/orders3/.stats.drill/0_0.json does > not exist > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
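The warnings above come from unconditionally trying to read and materialize the stats file. The intended behaviour is presumably to probe for the file first and silently continue without statistics when no ANALYZE output exists. A rough sketch of that guard (hypothetical helper name, in Python for illustration; Drill's actual code is Java):

```python
import os


def load_stats(stats_path):
    """Return the stats file contents, or None when no ANALYZE output exists."""
    if not os.path.exists(stats_path):
        # No .stats.drill file yet: proceed without statistics,
        # without logging a FileNotFoundException as a warning.
        return None
    with open(stats_path) as f:
        return f.read()
```

With a check like this, a query on a table that was never analyzed (such as /tmp/orders3 above) proceeds without stats instead of emitting the two warnings.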
[jira] [Resolved] (DRILL-7114) ANALYZE command generates warnings for stats file and materialization
[ https://issues.apache.org/jira/browse/DRILL-7114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai resolved DRILL-7114. - Resolution: Fixed
[jira] [Assigned] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate
[ https://issues.apache.org/jira/browse/DRILL-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai reassigned DRILL-7136: --- Assignee: Boaz Ben-Zvi (was: Gautam Parai) > Num_buckets for HashAgg in profile may be inaccurate > > > Key: DRILL-7136 > URL: https://issues.apache.org/jira/browse/DRILL-7136 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build Test >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Boaz Ben-Zvi >Priority: Major > Fix For: 1.16.0 > > Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill > > > I ran TPCH query 17 with sf 1000. Here is the query: > {noformat} > select > sum(l.l_extendedprice) / 7.0 as avg_yearly > from > lineitem l, > part p > where > p.p_partkey = l.l_partkey > and p.p_brand = 'Brand#13' > and p.p_container = 'JUMBO CAN' > and l.l_quantity < ( > select > 0.2 * avg(l2.l_quantity) > from > lineitem l2 > where > l2.l_partkey = p.p_partkey > ); > {noformat} > One of the hash agg operators has resized 6 times. It should have 4M > buckets. But the profile shows it has 64K buckets. > I have attached a sample profile. In this profile, the hash agg operator is > (04-02). > {noformat} > Operator Metrics > Minor FragmentNUM_BUCKETS NUM_ENTRIES NUM_RESIZING > RESIZING_TIME_MSNUM_PARTITIONS SPILLED_PARTITIONS SPILL_MB > SPILL_CYCLE INPUT_BATCH_COUNT AVG_INPUT_BATCH_BYTES > AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT OUTPUT_BATCH_COUNT > AVG_OUTPUT_BATCH_BYTES AVG_OUTPUT_ROW_BYTESOUTPUT_RECORD_COUNT > 04-00-02 65,536 748,746 6 364 1 > 582 0 813 582,653 18 26,316,456 401 1,631,943 > 25 26,176,350 > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
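The expected bucket count follows directly from the resize counter in the metrics: hash tables typically double on each resize, so 6 resizes starting from the reported 65,536 buckets should end at 4M, matching the description. A quick check (assuming doubling per resize, the usual hash-table growth policy):

```python
# NUM_BUCKETS and NUM_RESIZING as reported in the profile above.
initial_buckets = 65_536
num_resizing = 6

# Doubling the table on each resize:
final_buckets = initial_buckets * 2 ** num_resizing
print(final_buckets)  # 4194304, i.e. the expected 4M buckets
```

So the profile's NUM_BUCKETS of 65,536 looks like a pre-resize value; after 6 doublings it should read 4,194,304.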
[jira] [Created] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed
Gautam Parai created DRILL-7148: --- Summary: TPCH query 17 increases execution time with Statistics enabled because join order is changed Key: DRILL-7148 URL: https://issues.apache.org/jira/browse/DRILL-7148 Project: Apache Drill Issue Type: Bug Affects Versions: 1.16.0 Reporter: Gautam Parai Assignee: Gautam Parai Fix For: 1.16.0 TPCH query 17 with sf 1000 runs 45% slower. One issue is that the join order has flipped the build side and the probe side in Major Fragment 01. Here is the query: select sum(l.l_extendedprice) / 7.0 as avg_yearly from lineitem l, part p where p.p_partkey = l.l_partkey and p.p_brand = 'Brand#13' and p.p_container = 'JUMBO CAN' and l.l_quantity < ( select 0.2 * avg(l2.l_quantity) from lineitem l2 where l2.l_partkey = p.p_partkey ); Here is original plan: {noformat} 00-00 Screen : rowType = RecordType(ANY avg_yearly): rowcount = 1.0, cumulative cost = \{7.853786601428E10 rows, 6.6179786770537E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 memory}, id = 489493 00-01 Project(avg_yearly=[/($0, 7.0)]) : rowType = RecordType(ANY avg_yearly): rowcount = 1.0, cumulative cost = \{7.853786601418E10 rows, 6.6179786770527E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 memory}, id = 489492 00-02 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): rowcount = 1.0, cumulative cost = \{7.853786601318E10 rows, 6.6179786770127E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 memory}, id = 489491 00-03 UnionExchange : rowType = RecordType(ANY $f0): rowcount = 1.0, cumulative cost = \{7.853786601218E10 rows, 6.6179786768927E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 memory}, id = 489490 01-01 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): rowcount = 1.0, cumulative cost = \{7.853786601118E10 rows, 6.6179786768127E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 1.17294998955024E11 
memory}, id = 489489 01-02 Project(l_extendedprice=[$1]) : rowType = RecordType(ANY l_extendedprice): rowcount = 2.948545E9, cumulative cost = \{7.553787115668E10 rows, 6.2579792942727E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489488 01-03 SelectionVectorRemover : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = 2.948545E9, cumulative cost = \{7.253787630218E10 rows, 6.2279793457277E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489487 01-04 Filter(condition=[<($0, *(0.2, $4))]) : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = 2.948545E9, cumulative cost = \{6.953788144768E10 rows, 6.1979793971827E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489486 01-05 HashJoin(condition=[=($2, $3)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = 5.89709E9, cumulative cost = \{6.353789173867999E10 rows, 5.8379800146427E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489485 01-07 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2]) : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey): rowcount = 5.89709E9, cumulative cost = \{4.2417927963E10 rows, 2.71618536905E11 cpu, 1.8599969127E10 io, 9.8471562592256E13 network, 7.92E7 memory}, id = 489476 01-09 HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.89709E9, cumulative cost = \{3.6417938254E10 rows, 2.53618567778E11 cpu, 1.8599969127E10 io, 9.8471562592256E13 network, 7.92E7 memory}, id = 489475 02-01 UnorderedMuxExchange : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY 
p_partkey, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.89709E9, cumulative cost = \{3.0417948545E10 rows, 1.57618732434E11 cpu, 1.8599969127E10 io, 1.677312E11 network, 7.92E7 memory}, id = 489474 04-01 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2, 1301011)]) : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.89709E9, cumulative cost = \{2.4417958836E10 rows, 1.51618742725E11 cpu, 1.8599969127E10 io, 1.677312E11 network, 7.92E7 memory}, id = 489473 04-02 Project(l_quantity=[$1], l_extendedprice=[$2], p_partkey=[$3]) : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey): rowcount = 5.89709E9, cumulative
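Flipping the build and probe sides matters because a hash join materializes the build input into an in-memory hash table and streams the probe input past it, so the planner normally builds on the smaller estimated input. A minimal sketch of that heuristic (illustrative only; the row counts below are made up, and this is not Drill's planner code):

```python
def choose_build_side(left_rows, right_rows):
    """Build the hash table from the smaller estimated input."""
    return "left" if left_rows <= right_rows else "right"


# A bad statistics-based estimate for one input can flip this choice,
# putting the large input on the build side and inflating memory and runtime.
print(choose_build_side(left_rows=5.9e9, right_rows=2.0e7))  # prints "right"
```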
[jira] [Commented] (DRILL-7076) NPE is logged when querying postgres tables
[ https://issues.apache.org/jira/browse/DRILL-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804510#comment-16804510 ] Gautam Parai commented on DRILL-7076: - [~vvysotskyi] could you please let me know how to get/create the data-source i.e. the postgres table. > NPE is logged when querying postgres tables > --- > > Key: DRILL-7076 > URL: https://issues.apache.org/jira/browse/DRILL-7076 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.16.0 > > > NPE is seen in logs when querying Postgres table: > {code:sql} > select 1 from postgres.public.tdt > {code} > Stack trace from {{sqlline.log}}: > {noformat} > 2019-03-05 13:49:19,395 [23819dc0-abf8-24f3-ea81-6ced1b6e11af:foreman] WARN > o.a.d.e.p.common.DrillStatsTable - Failed to materialize the stats. > Continuing without stats. > java.lang.NullPointerException: null > at > org.apache.drill.exec.planner.common.DrillStatsTable$StatsMaterializationVisitor.visit(DrillStatsTable.java:189) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at org.apache.calcite.rel.SingleRel.childrenAccept(SingleRel.java:72) > [calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0] > at org.apache.calcite.rel.RelVisitor.visit(RelVisitor.java:44) > [calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0] > at > org.apache.drill.exec.planner.common.DrillStatsTable$StatsMaterializationVisitor.visit(DrillStatsTable.java:202) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61) > [calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0] > at > org.apache.drill.exec.planner.common.DrillStatsTable$StatsMaterializationVisitor.materialize(DrillStatsTable.java:177) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:235) > 
[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:331) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:178) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:204) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:114) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:80) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:584) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:272) > [drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_191] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_191] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191] > {noformat} > But query runs and returns the correct result. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
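The fact that the query still returns correct results suggests the NPE occurs only in the optional stats-materialization pass: the visitor dereferences per-table statistics that a JDBC-backed (Postgres) scan never provides. A null guard that degrades to "Continuing without stats" would match the logged behaviour; a rough sketch (hypothetical attribute name, in Python for illustration, not Drill's actual Java visitor):

```python
def materialize_stats(scan):
    """Skip stats materialization for scans that expose no stats table."""
    stats = getattr(scan, "stats_table", None)
    if stats is None:
        # JDBC scans carry no Drill statistics: continue without stats
        # rather than raising (and logging) a NullPointerException.
        return None
    return stats
```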
[jira] [Resolved] (DRILL-7123) TPCDS query 83 runs slower when Statistics is disabled
[ https://issues.apache.org/jira/browse/DRILL-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai resolved DRILL-7123. - Resolution: Fixed > TPCDS query 83 runs slower when Statistics is disabled > -- > > Key: DRILL-7123 > URL: https://issues.apache.org/jira/browse/DRILL-7123 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.16.0 > > > Query is TPCDS 83 with sf 100: > {noformat} > WITH sr_items > AS (SELECT i_item_id item_id, > Sum(sr_return_quantity) sr_item_qty > FROM store_returns, > item, > date_dim > WHERE sr_item_sk = i_item_sk > AND d_date IN (SELECT d_date >FROM date_dim >WHERE d_week_seq IN (SELECT d_week_seq > FROM date_dim > WHERE > d_date IN ( '1999-06-30', > '1999-08-28', > '1999-11-18' > ))) > AND sr_returned_date_sk = d_date_sk > GROUP BY i_item_id), > cr_items > AS (SELECT i_item_id item_id, > Sum(cr_return_quantity) cr_item_qty > FROM catalog_returns, > item, > date_dim > WHERE cr_item_sk = i_item_sk > AND d_date IN (SELECT d_date >FROM date_dim >WHERE d_week_seq IN (SELECT d_week_seq > FROM date_dim > WHERE > d_date IN ( '1999-06-30', > '1999-08-28', > '1999-11-18' > ))) > AND cr_returned_date_sk = d_date_sk > GROUP BY i_item_id), > wr_items > AS (SELECT i_item_id item_id, > Sum(wr_return_quantity) wr_item_qty > FROM web_returns, > item, > date_dim > WHERE wr_item_sk = i_item_sk > AND d_date IN (SELECT d_date >FROM date_dim >WHERE d_week_seq IN (SELECT d_week_seq > FROM date_dim > WHERE > d_date IN ( '1999-06-30', > '1999-08-28', > '1999-11-18' > ))) > AND wr_returned_date_sk = d_date_sk > GROUP BY i_item_id) > SELECT sr_items.item_id, >sr_item_qty, >sr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / > 3.0 * >100 sr_dev, >cr_item_qty, >cr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / > 3.0 * >100 cr_dev, >wr_item_qty, >wr_item_qty / ( sr_item_qty + 
cr_item_qty + wr_item_qty ) / > 3.0 * >100 wr_dev, >( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 >average > FROM sr_items, >cr_items, >wr_items > WHERE sr_items.item_id = cr_items.item_id >AND sr_items.item_id = wr_items.item_id > ORDER BY sr_items.item_id, > sr_item_qty > LIMIT 100; > {noformat} > The number of threads for major fragments 1 and 2 has changed when Statistics > is disabled. The number of minor fragments has been reduced from 10 and 15 > fragments down to 3 fragments. Rowcount has changed for major fragment 2 > from 1439754.0 down to 287950.8. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
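The changed rowcount for major fragment 2 is consistent with the planner falling back to a fixed default selectivity once statistics are disabled: 287950.8 is exactly 20% of 1439754.0. Checking the ratio (the inference that a 0.2 default selectivity is responsible is an assumption, not something the report states):

```python
rowcount_with_stats = 1_439_754.0
rowcount_without_stats = 287_950.8

ratio = rowcount_without_stats / rowcount_with_stats
print(round(ratio, 6))  # 0.2
```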
[jira] [Resolved] (DRILL-7120) Query fails with ChannelClosedException when Statistics is disabled
[ https://issues.apache.org/jira/browse/DRILL-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai resolved DRILL-7120. - Resolution: Fixed > Query fails with ChannelClosedException when Statistics is disabled > --- > > Key: DRILL-7120 > URL: https://issues.apache.org/jira/browse/DRILL-7120 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.16.0 > > > TPCH query 5 fails at sf100 when Statistics is disabled. Here is the query: > {noformat} > select > n.n_name, > sum(l.l_extendedprice * (1 - l.l_discount)) as revenue > from > customer c, > orders o, > lineitem l, > supplier s, > nation n, > region r > where > c.c_custkey = o.o_custkey > and l.l_orderkey = o.o_orderkey > and l.l_suppkey = s.s_suppkey > and c.c_nationkey = s.s_nationkey > and s.s_nationkey = n.n_nationkey > and n.n_regionkey = r.r_regionkey > and r.r_name = 'EUROPE' > and o.o_orderdate >= date '1997-01-01' > and o.o_orderdate < date '1997-01-01' + interval '1' year > group by > n.n_name > order by > revenue desc; > {noformat} > This is the error from drillbit.log: > {noformat} > 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> > FINISHED > 2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State to report: FINISHED > 2019-03-04 18:17:51,454 [BitServer-13] WARN > o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming > stream due to memory limits. Current Allocation: 262144. > 2019-03-04 18:17:51,454 [BitServer-13] ERROR > o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer. 
> 2019-03-04 18:17:51,463 [BitServer-13] ERROR > o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. > Connection: /10.10.120.104:31012 <--> /10.10.120.106:53048 (data server). > Closing connection. > io.netty.handler.codec.DecoderException: > org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating > buffer. > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271) > ~[netty-codec-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > [netty-transport-4.0.48.Final.jar:4.0.48.Final] > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) >
[jira] [Resolved] (DRILL-7122) TPCDS queries 29, 25, and 17 are slower when Statistics is disabled.
[ https://issues.apache.org/jira/browse/DRILL-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai resolved DRILL-7122. - Resolution: Fixed > TPCDS queries 29 25 17 are slower when Statistics is disabled. > -- > > Key: DRILL-7122 > URL: https://issues.apache.org/jira/browse/DRILL-7122 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.16.0 > > > Here is query 29 with sf 100: > {noformat} > SELECT i_item_id, >i_item_desc, >s_store_id, >s_store_name, >Avg(ss_quantity)AS store_sales_quantity, >Avg(sr_return_quantity) AS store_returns_quantity, >Avg(cs_quantity)AS catalog_sales_quantity > FROM store_sales, >store_returns, >catalog_sales, >date_dim d1, >date_dim d2, >date_dim d3, >store, >item > WHERE d1.d_moy = 4 >AND d1.d_year = 1998 >AND d1.d_date_sk = ss_sold_date_sk >AND i_item_sk = ss_item_sk >AND s_store_sk = ss_store_sk >AND ss_customer_sk = sr_customer_sk >AND ss_item_sk = sr_item_sk >AND ss_ticket_number = sr_ticket_number >AND sr_returned_date_sk = d2.d_date_sk >AND d2.d_moy BETWEEN 4 AND 4 + 3 >AND d2.d_year = 1998 >AND sr_customer_sk = cs_bill_customer_sk >AND sr_item_sk = cs_item_sk >AND cs_sold_date_sk = d3.d_date_sk >AND d3.d_year IN ( 1998, 1998 + 1, 1998 + 2 ) > GROUP BY i_item_id, > i_item_desc, > s_store_id, > s_store_name > ORDER BY i_item_id, > i_item_desc, > s_store_id, > s_store_name > LIMIT 100; > {noformat} > The hash join order has changed. As a result, one of the hash joins does not > seem to reduce the number of rows significantly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7109) Statistics adds external sort, which spills to disk
[ https://issues.apache.org/jira/browse/DRILL-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799449#comment-16799449 ] Gautam Parai commented on DRILL-7109: - [~rhou] can you please create another Jira for the issue where we see filter predicates of the type $0=$0 in TPCH4 - that is another issue which should be looked at outside the scope of statistics. > Statistics adds external sort, which spills to disk > --- > > Key: DRILL-7109 > URL: https://issues.apache.org/jira/browse/DRILL-7109 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Major > Fix For: 1.16.0 > > > TPCH query 4 with sf 100 runs many times slower. One issue is that an extra > external sort has been added, and both external sorts spill to disk. > Also, the hash join sees 100x more data. > Here is the query: > {noformat} > select > o.o_orderpriority, > count(*) as order_count > from > orders o > where > o.o_orderdate >= date '1996-10-01' > and o.o_orderdate < date '1996-10-01' + interval '3' month > and > exists ( > select > * > from > lineitem l > where > l.l_orderkey = o.o_orderkey > and l.l_commitdate < l.l_receiptdate > ) > group by > o.o_orderpriority > order by > o.o_orderpriority; > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-7129) Join with more than 1 condition is not using stats to compute row count estimate
[ https://issues.apache.org/jira/browse/DRILL-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai reassigned DRILL-7129: --- Assignee: Gautam Parai > Join with more than 1 condition is not using stats to compute row count > estimate > > > Key: DRILL-7129 > URL: https://issues.apache.org/jira/browse/DRILL-7129 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Anisha Reddy >Assignee: Gautam Parai >Priority: Major > Fix For: 1.17.0 > > > Below are the details: > > {code:java} > 0: jdbc:drill:drillbit=10.10.101.108> select count(*) from > `table_stats/Tpch0.01/parquet/lineitem`; +-+ | EXPR$0 | +-+ | > 57068 | +-+ 1 row selected (0.179 seconds) > 0: jdbc:drill:drillbit=10.10.101.108> select count(*) from > `table_stats/Tpch0.01/parquet/partsupp`; +-+ | EXPR$0 | +-+ | > 7474 | +-+ 1 row selected (0.171 seconds) > 0: jdbc:drill:drillbit=10.10.101.108> select count(*) from > `table_stats/Tpch0.01/parquet/lineitem` l, > `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey > and l.l_suppkey = ps.ps_suppkey; +-+ | EXPR$0 | +-+ | 53401 | > +-+ 1 row selected (0.769 seconds) > 0: jdbc:drill:drillbit=10.10.101.108> explain plan including all attributes > for select * from `table_stats/Tpch0.01/parquet/lineitem` l, > `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey > and l.l_suppkey = ps.ps_suppkey; > +--+--+ > | text | json | > +--+--+ > | 00-00 Screen : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): > rowcount = 57068.0, cumulative cost = {313468.8 rows, 2110446.8 cpu, 193626.0 > io, 0.0 network, 197313.6 memory}, id = 107578 00-01 ProjectAllowDup(**=[$0], > **0=[$1]) : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount > = 57068.0, cumulative cost = {307762.0 rows, 2104740.0 cpu, 193626.0 io, 0.0 > network, 197313.6 memory}, id = 107577 00-02 Project(T10¦¦**=[$0], > T11¦¦**=[$3]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, DYNAMIC_STAR > 
T11¦¦**): rowcount = 57068.0, cumulative cost = {250694.0 rows, 1990604.0 > cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107576 00-03 > HashJoin(condition=[AND(=($1, $4), =($2, $5))], joinType=[inner], semi-join: > =[false]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY > l_suppkey, DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = > 57068.0, cumulative cost = {193626.0 rows, 1876468.0 cpu, 193626.0 io, 0.0 > network, 197313.6 memory}, id = 107575 00-05 Project(T10¦¦**=[$0], > l_partkey=[$1], l_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, > ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = > {114136.0 rows, 342408.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = > 107572 00-07 Scan(table=[[dfs, drilltestdir, > table_stats/Tpch0.01/parquet/lineitem]], groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath > [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/lineitem]], > selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/lineitem, > numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, > `l_partkey`, `l_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY > l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {57068.0 > rows, 171204.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107571 00-04 > Project(T11¦¦**=[$0], ps_partkey=[$1], ps_suppkey=[$2]) : rowType = > RecordType(DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = > 7474.0, cumulative cost = {14948.0 rows, 44844.0 cpu, 22422.0 io, 0.0 > network, 0.0 memory}, id = 107574 00-06 Scan(table=[[dfs, drilltestdir, > table_stats/Tpch0.01/parquet/partsupp]], groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath > [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/partsupp]], > selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/partsupp, > numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, > `ps_partkey`, `ps_suppkey`]]]) : 
rowType = RecordType(DYNAMIC_STAR **, ANY > ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {7474.0 > rows, 22422.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107573 > {code} > The NDVs are: l_partkey = 2000, > ps_partkey = 1817, > l_suppkey = 100, > ps_suppkey = 100. > We see that for such joins the estimated row count (57068) equals the lineitem > row count, i.e., the NDV statistics for the join columns are not being used to > reduce the cardinality estimate.
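For context on why the 57068 estimate is suspect, here is the standard textbook (Selinger-style) estimate for an equi-join with multiple equality conditions, applied to the NDVs quoted above. This is an illustrative sketch, not Drill's actual estimator code:

```python
def join_estimate(left_rows, right_rows, ndv_pairs):
    """Inner-join cardinality: |L| * |R| / product of max(ndv_l, ndv_r)
    over the equality conditions (assumes independent join columns)."""
    denom = 1.0
    for ndv_left, ndv_right in ndv_pairs:
        denom *= max(ndv_left, ndv_right)
    return left_rows * right_rows / denom

# NDVs from the issue: l_partkey=2000, ps_partkey=1817,
# l_suppkey=100, ps_suppkey=100; row counts 57068 and 7474.
estimate = join_estimate(57068, 7474, [(2000, 1817), (100, 100)])
print(round(estimate))  # 2133, far below the 57068 rowcount in the plan
```

The actual result of the query is 53401 rows, so the textbook estimate is itself pessimistic here; the point is only that a stats-aware estimate differs sharply from the 57068 the plan shows, which simply echoes the lineitem row count.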
[jira] [Updated] (DRILL-7108) With statistics enabled TPCH 16 has two additional exchange operators
[ https://issues.apache.org/jira/browse/DRILL-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-7108: Labels: ready-to-commit (was: ) > With statistics enabled TPCH 16 has two additional exchange operators > - > > Key: DRILL-7108 > URL: https://issues.apache.org/jira/browse/DRILL-7108 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > TPCH 16 with sf 100 runs 14% slower. Here is the query: > {noformat} > select > p.p_brand, > p.p_type, > p.p_size, > count(distinct ps.ps_suppkey) as supplier_cnt > from > partsupp ps, > part p > where > p.p_partkey = ps.ps_partkey > and p.p_brand <> 'Brand#21' > and p.p_type not like 'MEDIUM PLATED%' > and p.p_size in (38, 2, 8, 31, 44, 5, 14, 24) > and ps.ps_suppkey not in ( > select > s.s_suppkey > from > supplier s > where > s.s_comment like '%Customer%Complaints%' > ) > group by > p.p_brand, > p.p_type, > p.p_size > order by > supplier_cnt desc, > p.p_brand, > p.p_type, > p.p_size; > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7114) ANALYZE command generates warnings for stats file and materialization
[ https://issues.apache.org/jira/browse/DRILL-7114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796306#comment-16796306 ] Gautam Parai commented on DRILL-7114: - [~vitalii] yes you are right - this happens for all queries. I plan to address both issues as part of this JIRA. > ANALYZE command generates warnings for stats file and materialization > - > > Key: DRILL-7114 > URL: https://issues.apache.org/jira/browse/DRILL-7114 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning Optimization >Reporter: Aman Sinha >Assignee: Gautam Parai >Priority: Minor > Fix For: 1.16.0 > > > When I run ANALYZE, I see warnings in the log file as shown below. The > ANALYZE command should not try to read the stats file or materialize the > stats. > {noformat} > 12:04:32.939 [2370143e-c419-f33c-d879-84989712bc85:foreman] WARN > o.a.d.e.p.common.DrillStatsTable - Failed to read the stats file. > java.io.FileNotFoundException: File /tmp/orders3/.stats.drill/0_0.json does > not exist > 12:04:32.939 [2370143e-c419-f33c-d879-84989712bc85:foreman] WARN > o.a.d.e.p.common.DrillStatsTable - Failed to materialize the stats. > Continuing without stats. > java.io.FileNotFoundException: File /tmp/orders3/.stats.drill/0_0.json does > not exist > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
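The warnings above suggest the stats file is discovered only by attempting to read it. A minimal sketch of the guard the JIRA implies (the path layout is taken from the log above; the function name is a hypothetical illustration, not Drill's API):

```python
from pathlib import Path

def maybe_read_stats(table_dir):
    """Return the stats JSON text if present, else None without warning.

    ANALYZE itself creates <table>/.stats.drill/0_0.json, so while the
    command is running the file legitimately does not exist yet; probing
    first avoids the spurious FileNotFoundException warnings in the log.
    """
    stats_file = Path(table_dir) / ".stats.drill" / "0_0.json"
    if not stats_file.exists():
        return None  # plan without statistics; nothing to warn about
    return stats_file.read_text()

print(maybe_read_stats("/tmp/no_such_table"))  # None when no stats exist
```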
[jira] [Assigned] (DRILL-7085) Drill Statistics: analyze table cmd fails
[ https://issues.apache.org/jira/browse/DRILL-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai reassigned DRILL-7085: --- Assignee: Gautam Parai > Drill Statistics: analyze table cmd fails > - > > Key: DRILL-7085 > URL: https://issues.apache.org/jira/browse/DRILL-7085 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Anisha Reddy >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.16.0 > > > On the latest build we are seeing in the regression results that the analyze > cmd is failing. > This is the dir: > {code:java} > [root@qa101-108 bin]# hadoop fs -ls /drill/testdata/table_stats/data_with_0 > Found 1 items -rwxr-xr-x 3 root root 487 2019-03-05 18:26 > /drill/testdata/table_stats/data_with_0/0_0_0.parquet > {code} > {code:java} > 0: jdbc:drill:drillbit=10.10.101.108> set `planner.statistics.use`=true; > +---+--+ > | ok | summary | > +---+--+ > | true | planner.statistics.use updated. | > +---+--+ > 1 row selected (0.164 seconds) > {code} > {code:java} > 0: jdbc:drill:drillbit=10.10.101.108> analyze table `table_stats/data_with_0` > compute statistics; > ++--+ > | ok | summary > | > ++--+ > | false | Table table_stats/data_with_0 is not supported by ANALYZE. Support > is currently limited to directory-based Parquet tables. | > ++--+ > 1 row selected (0.901 seconds) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-1328) Support table statistics
[ https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai resolved DRILL-1328. - Resolution: Fixed Merged with commit IDs 469be17597e7b7c6bc1de9863dcb6c5604a55f0c 3233d8aaff57ac71bd3b726efcd5fdaa92aef861 > Support table statistics > > > Key: DRILL-1328 > URL: https://issues.apache.org/jira/browse/DRILL-1328 > Project: Apache Drill > Issue Type: Improvement >Reporter: Cliff Buchanan >Assignee: Gautam Parai >Priority: Major > Fix For: 1.16.0 > > Attachments: 0001-PRE-Set-value-count-in-splitAndTransfer.patch > > > This consists of several subtasks > * implement operators to generate statistics > * add "analyze table" support to parser/planner > * create a metadata provider to allow statistics to be used by optiq in > planning optimization > * implement statistics functions > Right now, the bulk of this functionality is implemented, but it hasn't been > rigorously tested and needs to have some definite answers for some of the > parts "around the edges" (how analyze table figures out where the table > statistics are located, how a table "append" should work in a read only file > system) > Also, here are a few known caveats: > * table statistics are collected by creating a sql query based on the string > path of the table. This should probably be done with a Table reference. > * Case sensitivity for column statistics is probably iffy > * Math for combining two column NDVs into a joint NDV should be checked. > * Schema changes aren't really being considered yet. > * adding getDrillTable is probably unnecessary; it might be better to do > getTable().unwrap(DrillTable.class) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
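One of the caveats listed above concerns the math for combining two column NDVs into a joint NDV. A common heuristic (shown here as an illustrative sketch, not necessarily what Drill implements) caps the product of the individual NDVs by the table's row count:

```python
def joint_ndv(ndv_a, ndv_b, row_count):
    """Joint distinct-value estimate for a column pair.

    ndv_a * ndv_b is an upper bound on the number of distinct
    combinations, but there can never be more combinations than rows.
    """
    return min(ndv_a * ndv_b, row_count)

print(joint_ndv(2000, 100, 57068))  # 57068: product 200000 capped by rows
print(joint_ndv(10, 4, 57068))      # 40: product is below the row count
```

Without the row-count cap, multiplying NDVs of correlated columns quickly produces joint NDVs larger than the table, which is one way the "should be checked" math can go wrong.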
[jira] [Commented] (DRILL-1328) Support table statistics
[ https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780922#comment-16780922 ] Gautam Parai commented on DRILL-1328: - Reopened the PR to address the most critical review comments. I will merge the PR based upon approval from the reviewer(s) - this will avoid the costly rebase work which comes with keeping the PR open for such a long time. More work will be done as part of subsequent JIRAs. > Support table statistics > > > Key: DRILL-1328 > URL: https://issues.apache.org/jira/browse/DRILL-1328 > Project: Apache Drill > Issue Type: Improvement >Reporter: Cliff Buchanan >Assignee: Gautam Parai >Priority: Major > Fix For: 1.16.0 > > Attachments: 0001-PRE-Set-value-count-in-splitAndTransfer.patch > > > This consists of several subtasks > * implement operators to generate statistics > * add "analyze table" support to parser/planner > * create a metadata provider to allow statistics to be used by optiq in > planning optimization > * implement statistics functions > Right now, the bulk of this functionality is implemented, but it hasn't been > rigorously tested and needs to have some definite answers for some of the > parts "around the edges" (how analyze table figures out where the table > statistics are located, how a table "append" should work in a read only file > system) > Also, here are a few known caveats: > * table statistics are collected by creating a sql query based on the string > path of the table. This should probably be done with a Table reference. > * Case sensitivity for column statistics is probably iffy > * Math for combining two column NDVs into a joint NDV should be checked. > * Schema changes aren't really being considered yet. > * adding getDrillTable is probably unnecessary; it might be better to do > getTable().unwrap(DrillTable.class) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6899) Fix timestamp issues in unit tests ignored with DRILL-6833
Gautam Parai created DRILL-6899: --- Summary: Fix timestamp issues in unit tests ignored with DRILL-6833 Key: DRILL-6899 URL: https://issues.apache.org/jira/browse/DRILL-6899 Project: Apache Drill Issue Type: Bug Affects Versions: 1.15.0 Reporter: Gautam Parai Assignee: Gautam Parai Fix For: 1.16.0 {{The following tests were disabled in the PR for DRILL-6833}} {{IndexPlanTest.testCastTimestampPlan() - Re-enable after the MapRDB format plugin issue is fixed.}} {{IndexPlanTest.testRowkeyJoinPushdown_13() - Re-enable the testcase after fixing the execution issue with HashJoin used as Rowkeyjoin.}} {{IndexPlanTest.testRowkeyJoinPushdown_12() - Remove the testcase since the SemiJoin transformation makes the rowkeyjoinpushdown transformation invalid.}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6501) Revert/modify fix for DRILL-6212 after CALCITE-2223 is fixed
[ https://issues.apache.org/jira/browse/DRILL-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718497#comment-16718497 ] Gautam Parai commented on DRILL-6501: - This may also be the root cause of https://issues.apache.org/jira/browse/DRILL-6897. Needs further investigation. > Revert/modify fix for DRILL-6212 after CALCITE-2223 is fixed > > > Key: DRILL-6501 > URL: https://issues.apache.org/jira/browse/DRILL-6501 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.14.0 >Reporter: Gautam Parai >Assignee: Gautam Parai >Priority: Major > Fix For: 1.16.0 > > Original Estimate: 48h > Remaining Estimate: 48h > > DRILL-6212 is a temporary fix to alleviate issues due to CALCITE-2223. Once, > CALCITE-2223 is fixed this change needs to be reverted back which would > require DrillProjectMergeRule to go back to extending the ProjectMergeRule. > Please take a look at how CALCITE-2223 is eventually fixed (as of now it is > still not clear which fix is the way to do). Depending on the fix we may need > to additional work to integrate these changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6501) Revert/modify fix for DRILL-6212 after CALCITE-2223 is fixed
[ https://issues.apache.org/jira/browse/DRILL-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-6501: Fix Version/s: 1.16.0 > Revert/modify fix for DRILL-6212 after CALCITE-2223 is fixed > > > Key: DRILL-6501 > URL: https://issues.apache.org/jira/browse/DRILL-6501 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.14.0 >Reporter: Gautam Parai >Assignee: Gautam Parai >Priority: Major > Fix For: 1.16.0 > > Original Estimate: 48h > Remaining Estimate: 48h > > DRILL-6212 is a temporary fix to alleviate issues due to CALCITE-2223. Once, > CALCITE-2223 is fixed this change needs to be reverted back which would > require DrillProjectMergeRule to go back to extending the ProjectMergeRule. > Please take a look at how CALCITE-2223 is eventually fixed (as of now it is > still not clear which fix is the way to do). Depending on the fix we may need > to additional work to integrate these changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6882) Handle the cases where RowKeyJoin's left pipeline is being called multiple times.
[ https://issues.apache.org/jira/browse/DRILL-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-6882: Description: RowKeyJoin operator's leftStream pipeline uses IterOutcome.NONE state to repeat the pipeline multiple times. This should be avoided as IterOutcome.NONE will assertion checks in validator. We ran into this issue when trying to fix DRILL-6878. Sample unit test failure from DRILL-6878 select t1.id.ssn as ssn from hbase.`index_test_primary` t1, hbase.`index_test_primary` t2 where t1._id = t2.rowid and cast(t2.activity.irs.firstlogin as timestamp) = to_timestamp('2013-02-04 22:34:38.0', '-MM-dd HH:mm:ss.S') at org.apache.drill.test.DrillTestWrapper.compareMergedOnHeapVectors(DrillTestWrapper.java:630) ~[drill-java-exec-1.15.0-SNAPSHOT-tests.jar:1.15.0-SNAPSHOT] at org.apache.drill.test.DrillTestWrapper.compareOrderedResults(DrillTestWrapper.java:584) ~[drill-java-exec-1.15.0-SNAPSHOT-tests.jar:1.15.0-SNAPSHOT] at org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:157) ~[drill-java-exec-1.15.0-SNAPSHOT-tests.jar:1.15.0-SNAPSHOT] at org.apache.drill.test.TestBuilder.go(TestBuilder.java:139) ~[drill-java-exec-1.15.0-SNAPSHOT-tests.jar:1.15.0-SNAPSHOT] at com.mapr.drill.maprdb.tests.index.IndexPlanTest.testRowkeyJoinPushdown_7(IndexPlanTest.java:1828) ~[test-classes/:na] at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_141] Caused by: org.apache.drill.exec.rpc.RpcException: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: next() [on #851, ScanBatch] called again after it returned NONE. Caller should not have called next() again. was: RowKeyJoin operator's leftStream pipeline uses IterOutcome.NONE state to repeat the pipeline multiple times. This should be avoided as IterOutcome.NONE will assertion checks in validator. We ran into this issue when trying to fix DRILL-6878. > Handle the cases where RowKeyJoin's left pipeline being called multiple times. 
> -- > > Key: DRILL-6882 > URL: https://issues.apache.org/jira/browse/DRILL-6882 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Gautam Parai >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.15.0 > > > RowKeyJoin operator's leftStream pipeline uses IterOutcome.NONE state to > repeat the pipeline multiple times. This should be avoided as > IterOutcome.NONE will assertion checks in validator. We ran into this issue > when trying to fix DRILL-6878. > Sample unit test failure from DRILL-6878 > select t1.id.ssn as ssn from hbase.`index_test_primary` t1, > hbase.`index_test_primary` t2 where t1._id = t2.rowid and > cast(t2.activity.irs.firstlogin as timestamp) = to_timestamp('2013-02-04 > 22:34:38.0', '-MM-dd HH:mm:ss.S') > at > org.apache.drill.test.DrillTestWrapper.compareMergedOnHeapVectors(DrillTestWrapper.java:630) > ~[drill-java-exec-1.15.0-SNAPSHOT-tests.jar:1.15.0-SNAPSHOT] > at > org.apache.drill.test.DrillTestWrapper.compareOrderedResults(DrillTestWrapper.java:584) > ~[drill-java-exec-1.15.0-SNAPSHOT-tests.jar:1.15.0-SNAPSHOT] > at org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:157) > ~[drill-java-exec-1.15.0-SNAPSHOT-tests.jar:1.15.0-SNAPSHOT] > at org.apache.drill.test.TestBuilder.go(TestBuilder.java:139) > ~[drill-java-exec-1.15.0-SNAPSHOT-tests.jar:1.15.0-SNAPSHOT] > at > com.mapr.drill.maprdb.tests.index.IndexPlanTest.testRowkeyJoinPushdown_7(IndexPlanTest.java:1828) > ~[test-classes/:na] > at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_141] > Caused by: org.apache.drill.exec.rpc.RpcException: > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > IllegalStateException: next() [on #851, ScanBatch] called again after it > returned NONE. Caller should not have called next() again. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
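As an aside, the iterator contract violated here ("next() called again after it returned NONE") can be sketched in miniature. The class names below are hypothetical illustrations, not Drill's actual IteratorValidator code:

```python
class FakeUpstream:
    """Hypothetical stand-in for an upstream operator such as ScanBatch."""
    def __init__(self, outcomes):
        self._it = iter(outcomes)

    def next(self):
        return next(self._it)


class ValidatingBatch:
    """Sketch of the contract the error message describes: once next()
    returns NONE, any further next() call is flagged as illegal."""
    def __init__(self, upstream):
        self.upstream = upstream
        self.saw_none = False

    def next(self):
        if self.saw_none:
            raise RuntimeError("next() called again after it returned NONE. "
                               "Caller should not have called next() again.")
        outcome = self.upstream.next()
        if outcome == "NONE":
            self.saw_none = True
        return outcome


batch = ValidatingBatch(FakeUpstream(["OK", "OK", "NONE"]))
print(batch.next(), batch.next(), batch.next())  # OK OK NONE
# A fourth call to batch.next() would raise RuntimeError, which is the
# failure mode that repeating RowKeyJoin's left pipeline triggers.
```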
[jira] [Created] (DRILL-6882) Handle the cases where RowKeyJoin's left pipeline is being called multiple times.
Gautam Parai created DRILL-6882: --- Summary: Handle the cases where RowKeyJoin's left pipeline is being called multiple times. Key: DRILL-6882 URL: https://issues.apache.org/jira/browse/DRILL-6882 Project: Apache Drill Issue Type: Bug Reporter: Gautam Parai Assignee: Gautam Parai RowKeyJoin operator's leftStream pipeline uses IterOutcome.NONE state to repeat the pipeline multiple times. This should be avoided as IterOutcome.NONE will trip assertion checks in the validator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6882) Handle the cases where RowKeyJoin's left pipeline is being called multiple times.
[ https://issues.apache.org/jira/browse/DRILL-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-6882: Description: RowKeyJoin operator's leftStream pipeline uses IterOutcome.NONE state to repeat the pipeline multiple times. This should be avoided as IterOutcome.NONE will trip assertion checks in the validator. We ran into this issue when trying to fix DRILL-6878. was: RowKeyJoin operator's leftStream pipeline uses IterOutcome.NONE state to repeat the pipeline multiple times. This should be avoided as IterOutcome.NONE will trip assertion checks in the validator. > Handle the cases where RowKeyJoin's left pipeline is being called multiple times. > -- > > Key: DRILL-6882 > URL: https://issues.apache.org/jira/browse/DRILL-6882 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Gautam Parai >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.15.0 > > > RowKeyJoin operator's leftStream pipeline uses IterOutcome.NONE state to > repeat the pipeline multiple times. This should be avoided as > IterOutcome.NONE will trip assertion checks in the validator. We ran into this issue > when trying to fix DRILL-6878. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6882) Handle the cases where RowKeyJoin's left pipeline is being called multiple times.
[ https://issues.apache.org/jira/browse/DRILL-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-6882: Fix Version/s: 1.15.0 > Handle the cases where RowKeyJoin's left pipeline is being called multiple times. > -- > > Key: DRILL-6882 > URL: https://issues.apache.org/jira/browse/DRILL-6882 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Gautam Parai >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.15.0 > > > RowKeyJoin operator's leftStream pipeline uses IterOutcome.NONE state to > repeat the pipeline multiple times. This should be avoided as > IterOutcome.NONE will trip assertion checks in the validator. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6882) Handle the cases where RowKeyJoin's left pipeline is being called multiple times.
[ https://issues.apache.org/jira/browse/DRILL-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-6882: Affects Version/s: 1.15.0 > Handle the cases where RowKeyJoin's left pipeline is being called multiple times. > -- > > Key: DRILL-6882 > URL: https://issues.apache.org/jira/browse/DRILL-6882 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Gautam Parai >Assignee: Gautam Parai >Priority: Blocker > Fix For: 1.15.0 > > > RowKeyJoin operator's leftStream pipeline uses IterOutcome.NONE state to > repeat the pipeline multiple times. This should be avoided as > IterOutcome.NONE will trip assertion checks in the validator. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6878) Index Plan Test failures after DRILL-6833
Gautam Parai created DRILL-6878: --- Summary: Index Plan Test failures after DRILL-6833 Key: DRILL-6878 URL: https://issues.apache.org/jira/browse/DRILL-6878 Project: Apache Drill Issue Type: Bug Affects Versions: 1.15.0 Reporter: Gautam Parai Assignee: Gautam Parai Fix For: 1.15.0 [ERROR] Failures: [ERROR] IndexPlanTest.testCastTimestampPlan:1148 Incorrect number of rows returned by query. expected:<1> but was:<0> [ERROR] IndexPlanTest.testRowkeyJoinPushdown_1:1702 Did not find expected pattern in plan: RowKeyJoin 00-00 Screen 00-01 Project(ssn=[$0]) 00-02 Project(ssn=[$1]) 00-03 HashJoin(condition=[=($0, $2)], joinType=[inner], semi-join: =[true]) 00-05 Project(_id=[$0], ITEM=[ITEM($1, 'ssn')]) 00-06 Scan(table=[[hbase, index_test_primary]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///tmp/index_test_primary, condition=null], columns=[`_id`, `id`.`ssn`], maxwidth=2]]) 00-04 Scan(table=[[hbase, index_test_primary]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///tmp/index_test_primary, condition=($0 = {"$date":"2013-02-04T22:34:38.000Z"} ), indexName=hash_i_cast_timestamp_firstlogin], columns=[`_id`], maxwidth=10]]) [ERROR] IndexPlanTest.testRowkeyJoinPushdown_10:1883 Incorrect number of rows returned by query. 
expected:<1> but was:<0> [ERROR] IndexPlanTest.testRowkeyJoinPushdown_12:1918 Did not find expected pattern in plan: HashJoin(.*[ ])+.*Scan.*indexName=i_state_city_dl(.*[ ])+.*RowKeyJoin(.*[ ])+.*RestrictedJsonTableGroupScan(.*[ ])+.*HashAgg(group=[\{0}])(.*[ ])+.*HashJoin 00-00 Screen 00-01 Project(ssn=[$0]) 00-02 Project(ssn=[$1]) 00-03 HashJoin(condition=[=($0, $5)], joinType=[inner], semi-join: =[true]) 00-05 HashJoin(condition=[=($2, $3)], joinType=[inner], semi-join: =[false]) 00-08 Project(_id=[$0], ITEM=[ITEM($1, 'ssn')], ITEM2=[ITEM($2, 'city')]) 00-11 Scan(table=[[hbase, index_test_primary]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///tmp/index_test_primary, condition=null], columns=[`_id`, `id`.`ssn`, `address`.`city`], maxwidth=2]]) 00-07 Project(ITEM0=[$0], ITEM1=[$1]) 00-10 Project(ITEM=[ITEM($0, 'city')], ITEM1=[ITEM($0, 'state')]) 00-14 Scan(table=[[hbase, index_test_primary]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///tmp/index_test_primary, condition=(address.state = "pc"), indexName=i_state_city_dl], columns=[`address`.`city`, `address`.`state`], maxwidth=1]]) 00-04 Project(EXPR$0=[CAST(CAST($0):INTEGER):VARCHAR(10) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"]) 00-06 Project(rowid=[$1], ITEM=[$2], ITEM2=[$3], ITEM0=[$0]) 00-09 HashJoin(condition=[=($2, $0)], joinType=[inner], semi-join: =[false]) 00-13 Project(ITEM=[ITEM($0, 'city')]) 00-16 Scan(table=[[hbase, index_test_primary]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///tmp/index_test_primary, condition=null], columns=[`address`.`city`], maxwidth=2]]) 00-12 Project(rowid=[$0], ITEM0=[$1], ITEM2=[$2]) 00-15 SelectionVectorRemover 00-17 Filter(condition=[=(CAST($2):TIMESTAMP(0), 2013-02-04 22:34:38)]) 00-18 Project(rowid=[$0], ITEM=[ITEM($1, 'city')], ITEM2=[ITEM(ITEM($2, 'irs'), 'firstlogin')]) 00-19 Scan(table=[[hbase, index_test_primary]], groupscan=[JsonTableGroupScan 
[ScanSpec=JsonScanSpec [tableName=maprfs:///tmp/index_test_primary, condition=null], columns=[`rowid`, `address`.`city`, `activity`.`irs`.`firstlogin`], maxwidth=2]]) [ERROR] IndexPlanTest.testRowkeyJoinPushdown_6:1799 Did not find expected pattern in plan: RowKeyJoin 00-00 Screen 00-01 Project(ssn=[$0]) 00-02 Project(ssn=[$1]) 00-03 HashJoin(condition=[=($0, $2)], joinType=[inner], semi-join: =[true]) 00-05 Project(_id=[$0], ITEM=[ITEM($1, 'ssn')]) 00-07 Scan(table=[[hbase, index_test_primary]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///tmp/index_test_primary, condition=null], columns=[`_id`, `id`.`ssn`], maxwidth=2]]) 00-04 Project(EXPR$0=[CAST(CAST($0):INTEGER):VARCHAR(10) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"]) 00-06 Project(rowid=[$1], ITEM=[$2], ITEM2=[$3], ITEM0=[$0]) 00-08 HashJoin(condition=[=($2, $0)], joinType=[inner], semi-join: =[false]) 00-10 Project(ITEM=[ITEM($0, 'city')]) 00-12 Scan(table=[[hbase, index_test_primary]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///tmp/index_test_primary, condition=null], columns=[`address`.`city`], maxwidth=2]]) 00-09 Project(rowid=[$0], ITEM0=[$1], ITEM2=[$2]) 00-11 SelectionVectorRemover 00-13 Filter(condition=[=(CAST($2):TIMESTAMP(0), 2013-02-04 22:34:38)]) 00-14 Project(rowid=[$0], ITEM=[ITEM($1, 'city')], ITEM2=[ITEM(ITEM($2, 'irs'), 'firstlogin')]) 00-15 Scan(table=[[hbase, index_test_primary]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec
[jira] [Updated] (DRILL-6833) MapRDB queries with Run Time Filters with row_key/Secondary Index Should Support Pushdown
[ https://issues.apache.org/jira/browse/DRILL-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-6833: Labels: ready-to-commit (was: )
> MapRDB queries with Run Time Filters with row_key/Secondary Index Should
> Support Pushdown
> -
>
> Key: DRILL-6833
> URL: https://issues.apache.org/jira/browse/DRILL-6833
> Project: Apache Drill
> Issue Type: New Feature
> Affects Versions: 1.15.0
> Reporter: Gautam Parai
> Assignee: Gautam Parai
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.15.0
>
> Drill should push down all row key filters to MapR-DB for queries that only
> have WHERE conditions on row_keys. In the following example, the query only
> has a WHERE clause on row_keys:
> select t.mscIdentities from dfs.root.`/user/mapr/MixTable` t where t.row_key=
> (select max(convert_fromutf8(i.KeyA.ENTRY_KEY)) from
> dfs.root.`/user/mapr/TableIMSI` i where i.row_key='460021050005636')
> A row_key lookup can return at most one row, so the physical plan must leverage
> the MapR-DB row_key pushdown to execute the subquery and then use its result to
> execute the outer query. Currently only the inner query is pushed down; the
> outer query requires a full table scan.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
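One way to see why the pushdown matters is to split the report's query into the two logical steps the planner performs. This is an illustrative sketch only, not the actual rewrite Drill produces; the table names come from the report, and the `<value from step 1>` placeholder is hypothetical:

```sql
-- Step 1 (already pushed down today): the inner subquery resolves to a
-- single value via a row_key point lookup on TableIMSI.
select max(convert_fromutf8(i.KeyA.ENTRY_KEY))
from dfs.root.`/user/mapr/TableIMSI` i
where i.row_key = '460021050005636';

-- Step 2 (currently a full table scan): once that value is known, the
-- outer query is itself a row_key point lookup and should be pushed down
-- to MapR-DB as well, instead of scanning MixTable.
select t.mscIdentities
from dfs.root.`/user/mapr/MixTable` t
where t.row_key = '<value from step 1>';
```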
[jira] [Updated] (DRILL-6770) Queries on MapR-DB JSON tables fail with UnsupportedOperationException: Getting number of rows for tablet not supported
[ https://issues.apache.org/jira/browse/DRILL-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-6770: Labels: ready-to-commit (was: )
> Queries on MapR-DB JSON tables fail with UnsupportedOperationException:
> Getting number of rows for tablet not supported
> ---
>
> Key: DRILL-6770
> URL: https://issues.apache.org/jira/browse/DRILL-6770
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning Optimization, Storage - MapRDB
> Affects Versions: 1.15.0
> Environment: MapR 6.1.0, Drill 1.15.0
> Reporter: Abhishek Girish
> Assignee: Gautam Parai
> Priority: Critical
> Labels: ready-to-commit
> Fix For: 1.15.0
>
> Create a simple MapR-DB JSON table
> {code}
> $ mapr dbshell
> MapR-DB Shell
> maprdb root:> create /tmp/t1
> Table /tmp/t1 created.
> maprdb root:> insert /tmp/t1 --id '1' --v '{"a":1}'
> Document with id: "1" inserted.
> maprdb root:> find /tmp/t1
> {"_id":"1","a":1}
> 1 document(s) found.
> {code}
> Querying this from Drill fails:
> {code}
> > select * from mfs.`/tmp/t1`;
> Error: SYSTEM ERROR: UnsupportedOperationException: Getting number of rows for tablet not supported
> {code}
> Stack Trace:
> {code}
> (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: Error while applying rule DrillTableRule, args [rel#1400499:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[mfs, /tmp/t1])]
> org.apache.drill.exec.work.foreman.Foreman.run():300
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> Caused By (java.lang.RuntimeException) Error while applying rule DrillTableRule, args [rel#1400499:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[mfs, /tmp/t1])]
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():236
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():648
> org.apache.calcite.tools.Programs$RuleSetProgram.run():339
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():425
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():365
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel():252
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():314
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():179
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
> org.apache.drill.exec.work.foreman.Foreman.runSQL():584
> org.apache.drill.exec.work.foreman.Foreman.run():272
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> Caused By (org.apache.drill.common.exceptions.DrillRuntimeException) Error getting region info for table: maprfs:///tmp/t1
> org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan.init():161
> org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan.<init>():83
> org.apache.drill.exec.store.mapr.db.MapRDBFormatPlugin.getGroupScan():81
> org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan():170
> org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan():117
> org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan():112
> org.apache.drill.exec.planner.logical.DrillTable.getGroupScan():99
> org.apache.drill.exec.planner.logical.DrillScanRel.<init>():90
> org.apache.drill.exec.planner.logical.DrillScanRel.<init>():70
> org.apache.drill.exec.planner.logical.DrillScanRel.<init>():63
> org.apache.drill.exec.planner.logical.DrillScanRule.onMatch():38
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():212
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():648
> org.apache.calcite.tools.Programs$RuleSetProgram.run():339
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():425
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():365
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel():252
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():314
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():179
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
> org.apache.drill.exec.work.foreman.Foreman.runSQL():584
>
[jira] [Created] (DRILL-6841) Statistics ANALYZE statement should be more flexible
Gautam Parai created DRILL-6841:
---
Summary: Statistics ANALYZE statement should be more flexible
Key: DRILL-6841
URL: https://issues.apache.org/jira/browse/DRILL-6841
Project: Apache Drill
Issue Type: Improvement
Reporter: Gautam Parai

Investigate and implement other alternatives to the ANALYZE statement, e.g. SELECT ndv(id) FROM, OR enhance the ANALYZE statement to support such options. This may be helpful for users to experiment easily and quickly.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
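For context, the two styles being compared could look like the sketch below. The table and column names are made-up placeholders, and the exact statement syntax should be checked against the Drill docs for the release in use, so treat this as illustrative rather than definitive:

```sql
-- Statement-based approach: a dedicated ANALYZE pass collects statistics
-- (NDV, row counts, etc.) for the whole table up front.
ANALYZE TABLE dfs.tmp.`customers` COMPUTE STATISTICS;

-- Query-based alternative suggested in the issue: compute a single
-- statistic ad hoc with an aggregate function (NDV = number of distinct
-- values), without running a full ANALYZE.
SELECT ndv(id) FROM dfs.tmp.`customers`;
```

The appeal of the second form is iteration speed: a user can probe one column's distinct-value count in a single query while experimenting, instead of paying for a full statistics collection pass.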
[jira] [Created] (DRILL-6833) MapRDB queries with Run Time Filters with row_key/Secondary Index Should Support Pushdown
Gautam Parai created DRILL-6833:
---
Summary: MapRDB queries with Run Time Filters with row_key/Secondary Index Should Support Pushdown
Key: DRILL-6833
URL: https://issues.apache.org/jira/browse/DRILL-6833
Project: Apache Drill
Issue Type: New Feature
Affects Versions: 1.15.0
Reporter: Gautam Parai
Assignee: Gautam Parai
Fix For: 1.15.0

Drill should push down all row key filters to MapR-DB for queries that only have WHERE conditions on row_keys. In the following example, the query only has a WHERE clause on row_keys:

select t.mscIdentities from dfs.root.`/user/mapr/MixTable` t where t.row_key= (select max(convert_fromutf8(i.KeyA.ENTRY_KEY)) from dfs.root.`/user/mapr/TableIMSI` i where i.row_key='460021050005636')

A row_key lookup can return at most one row, so the physical plan must leverage the MapR-DB row_key pushdown to execute the subquery and then use its result to execute the outer query. Currently only the inner query is pushed down; the outer query requires a full table scan.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6824) Drill Query on MapRDB JSON table failing on schema SchemaChangeException, the only distinct Values are NULL and Text
[ https://issues.apache.org/jira/browse/DRILL-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Parai updated DRILL-6824:
Description:
Drill Query on a MapR-DB JSON Table or its View fails with a SchemaChangeException. The only distinct values are NULL and some text.
The issue is that when Drill sees a NULL, or does not see any values for a column in the first batch, it assumes the type is NULLABLE INT. Subsequently, when data shows up that differs from NULLABLE INT, there is a schema change. Operators such as aggregators cannot handle such a schema change and throw a SchemaChangeException.
The short-term solution implemented in this fix: add a CAST expression that casts null values to the target type, so a SchemaChange due to NULLs never occurs. However, the MapR-DB reader code was written differently from the other readers and was still hitting a SchemaChangeException. The code was changed to make it behave like the other scans, i.e. emit a new batch whenever a schema change is encountered.
was: Drill Query on a MapR-DB JSON Table or its View fails with a SchemaChangeException. The only distinct values are NULL and some text.
> Drill Query on MapRDB JSON table failing on schema SchemaChangeException, the
> only distinct Values are NULL and Text
>
> Key: DRILL-6824
> URL: https://issues.apache.org/jira/browse/DRILL-6824
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.14.0, 1.15.0
> Reporter: Gautam Parai
> Assignee: Gautam Parai
> Priority: Major
> Fix For: 1.15.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Drill Query on a MapR-DB JSON Table or its View fails with a
> SchemaChangeException. The only distinct values are NULL and some text.
> The issue is that when Drill sees a NULL, or does not see any values for a
> column in the first batch, it assumes the type is NULLABLE INT. Subsequently,
> when data shows up that differs from NULLABLE INT, there is a
> schema change. Operators such as aggregators cannot handle such a Schema
> Change and throw a SchemaChangeException.
>
> The short-term solution implemented in this fix: add a CAST expression
> that casts null values to the target type, so a SchemaChange due to NULLs
> never occurs. However, the MapR-DB reader code was written
> differently from the other readers and was still hitting a
> SchemaChangeException. The code was changed to make it behave like the
> other scans, i.e. emit a new batch whenever a schema change is encountered.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
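A minimal sketch of the kind of query that trips this, and of the cast-based idea the fix applies internally. The table name `dfs.tmp.events`, the column `comment`, and the VARCHAR width are all hypothetical placeholders, not taken from the issue:

```sql
-- The column `comment` is NULL or absent in the first batch, so Drill
-- infers NULLABLE INT for it; a later batch carries text, the inferred
-- type no longer matches, and operators such as the aggregation below
-- throw a SchemaChangeException.
SELECT comment, COUNT(*)
FROM dfs.tmp.`events`
GROUP BY comment;

-- Pinning the type with an explicit cast means NULL-only batches resolve
-- to the same (VARCHAR) type as later data; the fix injects an equivalent
-- cast for null values automatically.
SELECT CAST(comment AS VARCHAR(255)) AS comment, COUNT(*)
FROM dfs.tmp.`events`
GROUP BY CAST(comment AS VARCHAR(255));
```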
[jira] [Created] (DRILL-6824) Drill Query on MapRDB JSON table failing on schema SchemaChangeException, the only distinct Values are NULL and Text
Gautam Parai created DRILL-6824:
---
Summary: Drill Query on MapRDB JSON table failing on schema SchemaChangeException, the only distinct Values are NULL and Text
Key: DRILL-6824
URL: https://issues.apache.org/jira/browse/DRILL-6824
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.14.0, 1.15.0
Reporter: Gautam Parai
Assignee: Gautam Parai
Fix For: 1.15.0

Drill Query on a MapR-DB JSON Table or its View fails with a SchemaChangeException. The only distinct values are NULL and some text.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)