[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
[ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508205#comment-14508205 ] Abhishek Girish commented on DRILL-2380: Verified on Git.Commit.ID a0a1930 (April 21 build) {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND cs_item_sk = i_item_sk AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND cs_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), ws AS (SELECT i_manufact_id, Sum(ws_ext_sales_price) total_sales FROM web_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ws_item_sk = i_item_sk AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ws_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id) SELECT i_manufact_id, Sum(total_sales) total_sales FROM (SELECT i_manufact_id, total_sales FROM ss UNION ALL SELECT i_manufact_id, total_sales FROM cs UNION ALL SELECT i_manufact_id, total_sales FROM ws) tmp1 GROUP BY i_manufact_id ORDER BY total_sales LIMIT 10; Drill: +---+-+ | i_manufact_id | total_sales | +---+-+ | 930 | 1.18| | 818 | 41.86 | | 913 | 141.9 | | 784 | 184.9 | | 488 | 275.08 | | 993 | 301.6 | | 700 | 340.520004 | | 895 | 802.3 | | 766 | 839.76 | | 858 | 859.18 | +---+-+ 10 rows selected (11.535 seconds) Postgres: i_manufact_id | total_sales ---+- 930 |1.18 818 | 41.86 913 | 141.9 784 | 184.9 488 | 275.08 993 | 301.6 700 | 340.52 895 | 802.3 766 | 839.76 858 | 859.18 (10 rows) {code} The issue is now resolved. TPC-DS Query 33 and simplified variants return wrong results Key: DRILL-2380 URL: https://issues.apache.org/jira/browse/DRILL-2380 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu Priority: Critical Fix For: 0.8.0 TPC-DS query 33 returns wrong results. {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim,
[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
[ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365761#comment-14365761 ] Sean Hsuan-Yi Chu commented on DRILL-2380: -- The failure was due to Union-All. After new union-all had gotten checked in, this query ran and gave the same result as postrgres. TPC-DS Query 33 and simplified variants return wrong results Key: DRILL-2380 URL: https://issues.apache.org/jira/browse/DRILL-2380 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu Priority: Critical Fix For: 0.8.0 TPC-DS query 33 returns wrong results. {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND cs_item_sk = i_item_sk AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND cs_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), ws AS (SELECT i_manufact_id, Sum(ws_ext_sales_price) total_sales FROM web_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ws_item_sk = i_item_sk AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ws_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id) SELECT i_manufact_id, Sum(total_sales) total_sales FROM (SELECT i_manufact_id, total_sales FROM ss UNION ALL SELECT i_manufact_id, total_sales FROM cs UNION ALL SELECT i_manufact_id, total_sales FROM ws) tmp1 GROUP BY i_manufact_id ORDER BY total_sales LIMIT 10; Drill Results: +---+-+ | i_manufact_id | total_sales | +---+-+ | 440 | 0.12| | 434 | 13.16 | | 415 | 14.04 | | 449 | 15.63 | | 563 | 31.46 | | 357 | 49.50 | | 624 | 67.94 | | 192 | 74.40 | | 137 | 83.42 | | 240 | 85.26 | +---+-+ 10 rows selected (7.57 seconds) Postgres Results: i_manufact_id | total_sales ---+- 930 |1.18 818 | 41.86 913 | 141.90 784 | 184.90 488 | 275.08 993 | 301.60 700 | 340.52 895 | 802.30 766 | 839.76 858 | 859.18 (10 rows) {code} The following simplified variants also return wrong results: {code:sql} SELECT sum(x) FROM (SELECT ss_ext_sales_price x, ss_item_sk FROM store_sales GROUP BY ss_item_sk, ss_ext_sales_price UNION ALL SELECT cs_ext_sales_price x, cs_item_sk FROM catalog_sales GROUP BY cs_item_sk, cs_ext_sales_price) tmp GROUP BY x LIMIT 10; Drill Results: ++ | EXPR$0 | ++ | 14141.40 | | 28060.00 | | 30912.70 | | 43706.88 | | 38267.64 | | 10173.00 | | 37829.25 | | 5349.50| | 107515.80 | | 4440.84| ++ 10 rows selected (14.435 seconds) Postgres Results: sum -- 45234.00 5735.31 2275.60 6921.32 2590.46
[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
[ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365707#comment-14365707 ] Sean Hsuan-Yi Chu commented on DRILL-2380: -- Drill gave the same result as Postgres: +---+-+ | i_manufact_id | total_sales | +---+-+ | 930 | 1.18| | 818 | 41.86 | | 913 | 141.9 | | 784 | 184.9 | | 488 | 275.08 | | 993 | 301.6 | | 700 | 340.520004 | | 895 | 802.3 | | 766 | 839.76 | | 858 | 859.18 | +---+-+ 10 rows selected (21.237 seconds) TPC-DS Query 33 and simplified variants return wrong results Key: DRILL-2380 URL: https://issues.apache.org/jira/browse/DRILL-2380 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu Priority: Critical Fix For: 0.9.0 TPC-DS query 33 returns wrong results. {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND cs_item_sk = i_item_sk AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND cs_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), ws AS (SELECT i_manufact_id, Sum(ws_ext_sales_price) total_sales FROM web_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ws_item_sk = i_item_sk AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ws_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id) SELECT i_manufact_id, Sum(total_sales) total_sales FROM (SELECT i_manufact_id, total_sales FROM ss UNION ALL SELECT i_manufact_id, total_sales FROM cs UNION ALL SELECT i_manufact_id, total_sales FROM ws) tmp1 GROUP BY i_manufact_id ORDER BY total_sales LIMIT 10; Drill Results: +---+-+ | i_manufact_id | total_sales | +---+-+ | 440 | 0.12| | 434 | 13.16 | | 415 | 14.04 | | 449 | 15.63 | | 563 | 31.46 | | 357 | 49.50 | | 624 | 67.94 | | 192 | 74.40 | | 137 | 83.42 | | 240 | 85.26 | +---+-+ 10 rows selected (7.57 seconds) Postgres Results: i_manufact_id | total_sales ---+- 930 |1.18 818 | 41.86 913 | 141.90 784 | 184.90 488 | 275.08 993 | 301.60 700 | 340.52 895 | 802.30 766 | 839.76 858 | 859.18 (10 rows) {code} The following simplified variants also return wrong results: {code:sql} SELECT sum(x) FROM (SELECT ss_ext_sales_price x, ss_item_sk FROM store_sales GROUP BY ss_item_sk, ss_ext_sales_price UNION ALL SELECT cs_ext_sales_price x, cs_item_sk FROM catalog_sales GROUP BY cs_item_sk, cs_ext_sales_price) tmp
[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
[ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361405#comment-14361405 ] Aman Sinha commented on DRILL-2380: --- Actually, the simplified query is doing both sum and group-by on the same column.. which is fine but it does not match the intent of the original TPCDS query. The simplified query would better be written as: {code} SELECT y, sum(x) FROM (SELECT ss_ext_sales_price x, ss_item_sk y FROM store_sales GROUP BY ss_item_sk, ss_ext_sales_price UNION ALL SELECT cs_ext_sales_price x, cs_item_sk y FROM catalog_sales GROUP BY cs_item_sk, cs_ext_sales_price) tmp GROUP BY y ORDER BY y LIMIT 10; {code} TPC-DS Query 33 and simplified variants return wrong results Key: DRILL-2380 URL: https://issues.apache.org/jira/browse/DRILL-2380 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu Priority: Critical Fix For: 0.9.0 TPC-DS query 33 returns wrong results. {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND cs_item_sk = i_item_sk AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND cs_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), ws AS (SELECT i_manufact_id, Sum(ws_ext_sales_price) total_sales FROM web_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ws_item_sk = i_item_sk AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ws_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id) SELECT i_manufact_id, Sum(total_sales) total_sales FROM (SELECT i_manufact_id, total_sales FROM ss UNION ALL SELECT i_manufact_id, total_sales FROM cs UNION ALL SELECT i_manufact_id, total_sales FROM ws) tmp1 GROUP BY i_manufact_id ORDER BY total_sales LIMIT 10; Drill Results: +---+-+ | i_manufact_id | total_sales | +---+-+ | 440 | 0.12| | 434 | 13.16 | | 415 | 14.04 | | 449 | 15.63 | | 563 | 31.46 | | 357 | 49.50 | | 624 | 67.94 | | 192 | 74.40 | | 137 | 83.42 | | 240 | 85.26 | +---+-+ 10 rows selected (7.57 seconds) Postgres Results: i_manufact_id | total_sales ---+- 930 |1.18 818 | 41.86 913 | 141.90 784 | 184.90 488 | 275.08 993 | 301.60 700 | 340.52 895 | 802.30 766 | 839.76 858 | 859.18 (10 rows) {code} The following simplified variants also return wrong results: {code:sql} SELECT sum(x) FROM (SELECT ss_ext_sales_price x, ss_item_sk FROM store_sales GROUP BY ss_item_sk, ss_ext_sales_price UNION ALL SELECT cs_ext_sales_price x, cs_item_sk FROM catalog_sales GROUP BY cs_item_sk, cs_ext_sales_price) tmp GROUP BY x LIMIT 10; Drill Results:
[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
[ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361363#comment-14361363 ] Sean Hsuan-Yi Chu commented on DRILL-2380: -- I noticed in your simplified queries, there are always limit 10. This means we just choose 10 results among everything. There is no guarantee which row comes back. TPC-DS Query 33 and simplified variants return wrong results Key: DRILL-2380 URL: https://issues.apache.org/jira/browse/DRILL-2380 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu Priority: Critical Fix For: 0.9.0 TPC-DS query 33 returns wrong results. {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND cs_item_sk = i_item_sk AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND cs_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), ws AS (SELECT i_manufact_id, Sum(ws_ext_sales_price) total_sales FROM web_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ws_item_sk = i_item_sk AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ws_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id) SELECT i_manufact_id, Sum(total_sales) total_sales FROM (SELECT i_manufact_id, total_sales FROM ss UNION ALL SELECT i_manufact_id, total_sales FROM cs UNION ALL SELECT i_manufact_id, total_sales FROM ws) tmp1 GROUP BY i_manufact_id ORDER BY total_sales LIMIT 10; Drill Results: +---+-+ | i_manufact_id | total_sales | +---+-+ | 440 | 0.12| | 434 | 13.16 | | 415 | 14.04 | | 449 | 15.63 | | 563 | 31.46 | | 357 | 49.50 | | 624 | 67.94 | | 192 | 74.40 | | 137 | 83.42 | | 240 | 85.26 | +---+-+ 10 rows selected (7.57 seconds) Postgres Results: i_manufact_id | total_sales ---+- 930 |1.18 818 | 41.86 913 | 141.90 784 | 184.90 488 | 275.08 993 | 301.60 700 | 340.52 895 | 802.30 766 | 839.76 858 | 859.18 (10 rows) {code} The following simplified variants also return wrong results: {code:sql} SELECT sum(x) FROM (SELECT ss_ext_sales_price x, ss_item_sk FROM store_sales GROUP BY ss_item_sk, ss_ext_sales_price UNION ALL SELECT cs_ext_sales_price x, cs_item_sk FROM catalog_sales GROUP BY cs_item_sk, cs_ext_sales_price) tmp GROUP BY x LIMIT 10; Drill Results: ++ | EXPR$0 | ++ | 14141.40 | | 28060.00 | | 30912.70 | | 43706.88 | | 38267.64 | | 10173.00 | | 37829.25 | | 5349.50| | 107515.80 | | 4440.84| ++ 10 rows selected (14.435 seconds) Postgres Results: sum -- 45234.00 5735.31
[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
[ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361391#comment-14361391 ] Sean Hsuan-Yi Chu commented on DRILL-2380: -- For the original Query 33, I ran the query but remove d limit. Compared Drill's result with Postgres, for each row with matched i_manufact_id, total_sales are different. TPC-DS Query 33 and simplified variants return wrong results Key: DRILL-2380 URL: https://issues.apache.org/jira/browse/DRILL-2380 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu Priority: Critical Fix For: 0.9.0 TPC-DS query 33 returns wrong results. {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND cs_item_sk = i_item_sk AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND cs_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), ws AS (SELECT i_manufact_id, Sum(ws_ext_sales_price) total_sales FROM web_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ws_item_sk = i_item_sk AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ws_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id) SELECT i_manufact_id, Sum(total_sales) total_sales FROM (SELECT i_manufact_id, total_sales FROM ss UNION ALL SELECT i_manufact_id, total_sales FROM cs UNION ALL SELECT i_manufact_id, total_sales FROM ws) tmp1 GROUP BY i_manufact_id ORDER BY total_sales LIMIT 10; Drill Results: +---+-+ | i_manufact_id | total_sales | +---+-+ | 440 | 0.12| | 434 | 13.16 | | 415 | 14.04 | | 449 | 15.63 | | 563 | 31.46 | | 357 | 49.50 | | 624 | 67.94 | | 192 | 74.40 | | 137 | 83.42 | | 240 | 85.26 | +---+-+ 10 rows selected (7.57 seconds) Postgres Results: i_manufact_id | total_sales ---+- 930 |1.18 818 | 41.86 913 | 141.90 784 | 184.90 488 | 275.08 993 | 301.60 700 | 340.52 895 | 802.30 766 | 839.76 858 | 859.18 (10 rows) {code} The following simplified variants also return wrong results: {code:sql} SELECT sum(x) FROM (SELECT ss_ext_sales_price x, ss_item_sk FROM store_sales GROUP BY ss_item_sk, ss_ext_sales_price UNION ALL SELECT cs_ext_sales_price x, cs_item_sk FROM catalog_sales GROUP BY cs_item_sk, cs_ext_sales_price) tmp GROUP BY x LIMIT 10; Drill Results: ++ | EXPR$0 | ++ | 14141.40 | | 28060.00 | | 30912.70 | | 43706.88 | | 38267.64 | | 10173.00 | | 37829.25 | | 5349.50| | 107515.80 | | 4440.84| ++ 10 rows selected (14.435 seconds) Postgres Results: sum -- 45234.00
[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
[ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361401#comment-14361401 ] Aman Sinha commented on DRILL-2380: --- [~seanhychu]'s point is valid .. doing a LIMIT on the results in your simplified query cannot be compared with the Postgres results. The original query has and ORDER BY on total_sales. In your simplified query pls add an {{ORDER BY x}} before the limit. TPC-DS Query 33 and simplified variants return wrong results Key: DRILL-2380 URL: https://issues.apache.org/jira/browse/DRILL-2380 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu Priority: Critical Fix For: 0.9.0 TPC-DS query 33 returns wrong results. {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND cs_item_sk = i_item_sk AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND cs_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), ws AS (SELECT i_manufact_id, Sum(ws_ext_sales_price) total_sales FROM web_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ws_item_sk = i_item_sk AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ws_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id) SELECT i_manufact_id, Sum(total_sales) total_sales FROM (SELECT i_manufact_id, total_sales FROM ss UNION ALL SELECT i_manufact_id, total_sales FROM cs UNION ALL SELECT i_manufact_id, total_sales FROM ws) tmp1 GROUP BY i_manufact_id ORDER BY total_sales LIMIT 10; Drill Results: +---+-+ | i_manufact_id | total_sales | +---+-+ | 440 | 0.12| | 434 | 13.16 | | 415 | 14.04 | | 449 | 15.63 | | 563 | 31.46 | | 357 | 49.50 | | 624 | 67.94 | | 192 | 74.40 | | 137 | 83.42 | | 240 | 85.26 | +---+-+ 10 rows selected (7.57 seconds) Postgres Results: i_manufact_id | total_sales ---+- 930 |1.18 818 | 41.86 913 | 141.90 784 | 184.90 488 | 275.08 993 | 301.60 700 | 340.52 895 | 802.30 766 | 839.76 858 | 859.18 (10 rows) {code} The following simplified variants also return wrong results: {code:sql} SELECT sum(x) FROM (SELECT ss_ext_sales_price x, ss_item_sk FROM store_sales GROUP BY ss_item_sk, ss_ext_sales_price UNION ALL SELECT cs_ext_sales_price x, cs_item_sk FROM catalog_sales GROUP BY cs_item_sk, cs_ext_sales_price) tmp GROUP BY x LIMIT 10; Drill Results: ++ | EXPR$0 | ++ | 14141.40 | | 28060.00 | | 30912.70 | | 43706.88 | | 38267.64 | | 10173.00 | | 37829.25 | | 5349.50| | 107515.80 | | 4440.84| ++ 10 rows selected
[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
[ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361320#comment-14361320 ] Sean Hsuan-Yi Chu commented on DRILL-2380: -- in progress now! TPC-DS Query 33 and simplified variants return wrong results Key: DRILL-2380 URL: https://issues.apache.org/jira/browse/DRILL-2380 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu Priority: Critical Fix For: 0.9.0 TPC-DS query 33 returns wrong results. {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND cs_item_sk = i_item_sk AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND cs_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), ws AS (SELECT i_manufact_id, Sum(ws_ext_sales_price) total_sales FROM web_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ws_item_sk = i_item_sk AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ws_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id) SELECT i_manufact_id, Sum(total_sales) total_sales FROM (SELECT i_manufact_id, total_sales FROM ss UNION ALL SELECT i_manufact_id, total_sales FROM cs UNION ALL SELECT i_manufact_id, total_sales FROM ws) tmp1 GROUP BY i_manufact_id ORDER BY total_sales LIMIT 10; Drill Results: +---+-+ | i_manufact_id | total_sales | +---+-+ | 440 | 0.12| | 434 | 13.16 | | 415 | 14.04 | | 449 | 15.63 | | 563 | 31.46 | | 357 | 49.50 | | 624 | 67.94 | | 192 | 74.40 | | 137 | 83.42 | | 240 | 85.26 | +---+-+ 10 rows selected (7.57 seconds) Postgres Results: i_manufact_id | total_sales ---+- 930 |1.18 818 | 41.86 913 | 141.90 784 | 184.90 488 | 275.08 993 | 301.60 700 | 340.52 895 | 802.30 766 | 839.76 858 | 859.18 (10 rows) {code} The following simplified variants also return wrong results: {code:sql} SELECT sum(x) FROM (SELECT ss_ext_sales_price x, ss_item_sk FROM store_sales GROUP BY ss_item_sk, ss_ext_sales_price UNION ALL SELECT cs_ext_sales_price x, cs_item_sk FROM catalog_sales GROUP BY cs_item_sk, cs_ext_sales_price) tmp GROUP BY x LIMIT 10; Drill Results: ++ | EXPR$0 | ++ | 14141.40 | | 28060.00 | | 30912.70 | | 43706.88 | | 38267.64 | | 10173.00 | | 37829.25 | | 5349.50| | 107515.80 | | 4440.84| ++ 10 rows selected (14.435 seconds) Postgres Results: sum -- 45234.00 5735.31 2275.60 6921.32 2590.46 6615.09 14080.77 24819.76 25127.20 (10 rows) SELECT sum(x) FROM (SELECT sum(ss_ext_sales_price) x,