[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

2015-04-22 Thread Abhishek Girish (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508205#comment-14508205
 ] 

Abhishek Girish commented on DRILL-2380:


Verified on Git.Commit.ID a0a1930 (April 21 build)

{code:sql}
WITH ss 
 AS (SELECT i_manufact_id, 
Sum(ss_ext_sales_price) total_sales 
 FROM   store_sales, 
date_dim, 
customer_address, 
item 
 WHERE  i_manufact_id IN (SELECT i_manufact_id 
  FROM   item 
  WHERE  i_category IN ( 'Books' )) 
AND ss_item_sk = i_item_sk 
AND ss_sold_date_sk = d_date_sk 
AND d_year = 1999 
AND d_moy = 3 
AND ss_addr_sk = ca_address_sk 
AND ca_gmt_offset = -5 
 GROUP  BY i_manufact_id), 
 cs 
 AS (SELECT i_manufact_id, 
Sum(cs_ext_sales_price) total_sales 
 FROM   catalog_sales, 
date_dim, 
customer_address, 
item 
 WHERE  i_manufact_id IN (SELECT i_manufact_id 
  FROM   item 
  WHERE  i_category IN ( 'Books' )) 
AND cs_item_sk = i_item_sk 
AND cs_sold_date_sk = d_date_sk 
AND d_year = 1999 
AND d_moy = 3 
AND cs_bill_addr_sk = ca_address_sk 
AND ca_gmt_offset = -5 
 GROUP  BY i_manufact_id), 
 ws 
 AS (SELECT i_manufact_id, 
Sum(ws_ext_sales_price) total_sales 
 FROM   web_sales, 
date_dim, 
customer_address, 
item 
 WHERE  i_manufact_id IN (SELECT i_manufact_id 
  FROM   item 
  WHERE  i_category IN ( 'Books' )) 
AND ws_item_sk = i_item_sk 
AND ws_sold_date_sk = d_date_sk 
AND d_year = 1999 
AND d_moy = 3 
AND ws_bill_addr_sk = ca_address_sk 
AND ca_gmt_offset = -5 
 GROUP  BY i_manufact_id) 
SELECT i_manufact_id, 
   Sum(total_sales) total_sales 
FROM   (SELECT i_manufact_id, total_sales 
FROM   ss 
UNION ALL 
SELECT i_manufact_id, total_sales
FROM   cs 
UNION ALL 
SELECT i_manufact_id, total_sales
FROM   ws) tmp1 
GROUP  BY i_manufact_id 
ORDER  BY total_sales
LIMIT 10;

Drill:
+---+-+
| i_manufact_id | total_sales |
+---+-+
| 930   | 1.18|
| 818   | 41.86   |
| 913   | 141.9   |
| 784   | 184.9   |
| 488   | 275.08  |
| 993   | 301.6   |
| 700   | 340.520004 |
| 895   | 802.3   |
| 766   | 839.76  |
| 858   | 859.18  |
+---+-+
10 rows selected (11.535 seconds)

Postgres:
 i_manufact_id | total_sales 
---+-
   930 |1.18
   818 |   41.86
   913 |   141.9
   784 |   184.9
   488 |  275.08
   993 |   301.6
   700 |  340.52
   895 |   802.3
   766 |  839.76
   858 |  859.18
(10 rows)
{code}

The issue is now resolved.

 TPC-DS Query 33 and simplified variants return wrong results
 

 Key: DRILL-2380
 URL: https://issues.apache.org/jira/browse/DRILL-2380
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.8.0


 TPC-DS query 33 returns wrong results. 
 {code:sql}
 WITH ss 
  AS (SELECT i_manufact_id, 
 Sum(ss_ext_sales_price) total_sales 
  FROM   store_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ss_item_sk = i_item_sk 
 AND ss_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ss_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  cs 
  AS (SELECT i_manufact_id, 
 Sum(cs_ext_sales_price) total_sales 
  FROM   catalog_sales, 
 date_dim, 
 

[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365761#comment-14365761
 ] 

Sean Hsuan-Yi Chu commented on DRILL-2380:
--

The failure was due to Union-All. After new union-all had gotten checked in, 
this query ran and gave the same result as postrgres.

 TPC-DS Query 33 and simplified variants return wrong results
 

 Key: DRILL-2380
 URL: https://issues.apache.org/jira/browse/DRILL-2380
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.8.0


 TPC-DS query 33 returns wrong results. 
 {code:sql}
 WITH ss 
  AS (SELECT i_manufact_id, 
 Sum(ss_ext_sales_price) total_sales 
  FROM   store_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ss_item_sk = i_item_sk 
 AND ss_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ss_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  cs 
  AS (SELECT i_manufact_id, 
 Sum(cs_ext_sales_price) total_sales 
  FROM   catalog_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND cs_item_sk = i_item_sk 
 AND cs_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND cs_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  ws 
  AS (SELECT i_manufact_id, 
 Sum(ws_ext_sales_price) total_sales 
  FROM   web_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ws_item_sk = i_item_sk 
 AND ws_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ws_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id) 
 SELECT i_manufact_id, 
Sum(total_sales) total_sales 
 FROM   (SELECT i_manufact_id, total_sales 
 FROM   ss 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   cs 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   ws) tmp1 
 GROUP  BY i_manufact_id 
 ORDER  BY total_sales
 LIMIT 10;
 Drill Results:
 +---+-+
 | i_manufact_id | total_sales |
 +---+-+
 | 440   | 0.12|
 | 434   | 13.16   |
 | 415   | 14.04   |
 | 449   | 15.63   |
 | 563   | 31.46   |
 | 357   | 49.50   |
 | 624   | 67.94   |
 | 192   | 74.40   |
 | 137   | 83.42   |
 | 240   | 85.26   |
 +---+-+
 10 rows selected (7.57 seconds)
 Postgres Results:
  i_manufact_id | total_sales 
 ---+-
930 |1.18
818 |   41.86
913 |  141.90
784 |  184.90
488 |  275.08
993 |  301.60
700 |  340.52
895 |  802.30
766 |  839.76
858 |  859.18
 (10 rows)
 {code}
 The following simplified variants also return wrong results:
 {code:sql}
 SELECT sum(x)
 FROM
 (SELECT ss_ext_sales_price x, ss_item_sk
 FROM  store_sales
  GROUP BY ss_item_sk, ss_ext_sales_price
 UNION ALL
 SELECT cs_ext_sales_price x, cs_item_sk
 FROM catalog_sales
 GROUP BY cs_item_sk, cs_ext_sales_price) tmp
 GROUP BY x
 LIMIT 10;
 Drill Results:
 ++
 |   EXPR$0   |
 ++
 | 14141.40   |
 | 28060.00   |
 | 30912.70   |
 | 43706.88   |
 | 38267.64   |
 | 10173.00   |
 | 37829.25   |
 | 5349.50|
 | 107515.80  |
 | 4440.84|
 ++
 10 rows selected (14.435 seconds)
 Postgres Results:
sum
 --   
  45234.00
   5735.31
   2275.60
   6921.32
   2590.46
   

[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365707#comment-14365707
 ] 

Sean Hsuan-Yi Chu commented on DRILL-2380:
--

Drill gave the same result as Postgres:

+---+-+
| i_manufact_id | total_sales |
+---+-+
| 930   | 1.18|
| 818   | 41.86   |
| 913   | 141.9   |
| 784   | 184.9   |
| 488   | 275.08  |
| 993   | 301.6   |
| 700   | 340.520004 |
| 895   | 802.3   |
| 766   | 839.76  |
| 858   | 859.18  |
+---+-+
10 rows selected (21.237 seconds)

 TPC-DS Query 33 and simplified variants return wrong results
 

 Key: DRILL-2380
 URL: https://issues.apache.org/jira/browse/DRILL-2380
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.9.0


 TPC-DS query 33 returns wrong results. 
 {code:sql}
 WITH ss 
  AS (SELECT i_manufact_id, 
 Sum(ss_ext_sales_price) total_sales 
  FROM   store_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ss_item_sk = i_item_sk 
 AND ss_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ss_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  cs 
  AS (SELECT i_manufact_id, 
 Sum(cs_ext_sales_price) total_sales 
  FROM   catalog_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND cs_item_sk = i_item_sk 
 AND cs_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND cs_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  ws 
  AS (SELECT i_manufact_id, 
 Sum(ws_ext_sales_price) total_sales 
  FROM   web_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ws_item_sk = i_item_sk 
 AND ws_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ws_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id) 
 SELECT i_manufact_id, 
Sum(total_sales) total_sales 
 FROM   (SELECT i_manufact_id, total_sales 
 FROM   ss 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   cs 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   ws) tmp1 
 GROUP  BY i_manufact_id 
 ORDER  BY total_sales
 LIMIT 10;
 Drill Results:
 +---+-+
 | i_manufact_id | total_sales |
 +---+-+
 | 440   | 0.12|
 | 434   | 13.16   |
 | 415   | 14.04   |
 | 449   | 15.63   |
 | 563   | 31.46   |
 | 357   | 49.50   |
 | 624   | 67.94   |
 | 192   | 74.40   |
 | 137   | 83.42   |
 | 240   | 85.26   |
 +---+-+
 10 rows selected (7.57 seconds)
 Postgres Results:
  i_manufact_id | total_sales 
 ---+-
930 |1.18
818 |   41.86
913 |  141.90
784 |  184.90
488 |  275.08
993 |  301.60
700 |  340.52
895 |  802.30
766 |  839.76
858 |  859.18
 (10 rows)
 {code}
 The following simplified variants also return wrong results:
 {code:sql}
 SELECT sum(x)
 FROM
 (SELECT ss_ext_sales_price x, ss_item_sk
 FROM  store_sales
  GROUP BY ss_item_sk, ss_ext_sales_price
 UNION ALL
 SELECT cs_ext_sales_price x, cs_item_sk
 FROM catalog_sales
 GROUP BY cs_item_sk, cs_ext_sales_price) tmp
 

[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

2015-03-13 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361405#comment-14361405
 ] 

Aman Sinha commented on DRILL-2380:
---

Actually, the simplified query is doing both sum and group-by on the same 
column.. which is fine but it does not match the intent of the original TPCDS 
query.  The simplified query would better be written as: 
{code}
SELECT y, sum(x)
FROM
(SELECT ss_ext_sales_price x, ss_item_sk y
FROM  store_sales
 GROUP BY ss_item_sk, ss_ext_sales_price
UNION ALL
SELECT cs_ext_sales_price x, cs_item_sk y
FROM catalog_sales
GROUP BY cs_item_sk, cs_ext_sales_price) tmp
GROUP BY y
ORDER BY y
LIMIT 10;
{code}

 TPC-DS Query 33 and simplified variants return wrong results
 

 Key: DRILL-2380
 URL: https://issues.apache.org/jira/browse/DRILL-2380
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.9.0


 TPC-DS query 33 returns wrong results. 
 {code:sql}
 WITH ss 
  AS (SELECT i_manufact_id, 
 Sum(ss_ext_sales_price) total_sales 
  FROM   store_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ss_item_sk = i_item_sk 
 AND ss_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ss_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  cs 
  AS (SELECT i_manufact_id, 
 Sum(cs_ext_sales_price) total_sales 
  FROM   catalog_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND cs_item_sk = i_item_sk 
 AND cs_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND cs_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  ws 
  AS (SELECT i_manufact_id, 
 Sum(ws_ext_sales_price) total_sales 
  FROM   web_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ws_item_sk = i_item_sk 
 AND ws_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ws_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id) 
 SELECT i_manufact_id, 
Sum(total_sales) total_sales 
 FROM   (SELECT i_manufact_id, total_sales 
 FROM   ss 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   cs 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   ws) tmp1 
 GROUP  BY i_manufact_id 
 ORDER  BY total_sales
 LIMIT 10;
 Drill Results:
 +---+-+
 | i_manufact_id | total_sales |
 +---+-+
 | 440   | 0.12|
 | 434   | 13.16   |
 | 415   | 14.04   |
 | 449   | 15.63   |
 | 563   | 31.46   |
 | 357   | 49.50   |
 | 624   | 67.94   |
 | 192   | 74.40   |
 | 137   | 83.42   |
 | 240   | 85.26   |
 +---+-+
 10 rows selected (7.57 seconds)
 Postgres Results:
  i_manufact_id | total_sales 
 ---+-
930 |1.18
818 |   41.86
913 |  141.90
784 |  184.90
488 |  275.08
993 |  301.60
700 |  340.52
895 |  802.30
766 |  839.76
858 |  859.18
 (10 rows)
 {code}
 The following simplified variants also return wrong results:
 {code:sql}
 SELECT sum(x)
 FROM
 (SELECT ss_ext_sales_price x, ss_item_sk
 FROM  store_sales
  GROUP BY ss_item_sk, ss_ext_sales_price
 UNION ALL
 SELECT cs_ext_sales_price x, cs_item_sk
 FROM catalog_sales
 GROUP BY cs_item_sk, cs_ext_sales_price) tmp
 GROUP BY x
 LIMIT 10;
 Drill Results:
 

[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

2015-03-13 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361363#comment-14361363
 ] 

Sean Hsuan-Yi Chu commented on DRILL-2380:
--

I noticed in your simplified queries, there are always limit 10.

This means we just choose 10 results among everything. There is no guarantee 
which row comes back.

 TPC-DS Query 33 and simplified variants return wrong results
 

 Key: DRILL-2380
 URL: https://issues.apache.org/jira/browse/DRILL-2380
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.9.0


 TPC-DS query 33 returns wrong results. 
 {code:sql}
 WITH ss 
  AS (SELECT i_manufact_id, 
 Sum(ss_ext_sales_price) total_sales 
  FROM   store_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ss_item_sk = i_item_sk 
 AND ss_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ss_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  cs 
  AS (SELECT i_manufact_id, 
 Sum(cs_ext_sales_price) total_sales 
  FROM   catalog_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND cs_item_sk = i_item_sk 
 AND cs_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND cs_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  ws 
  AS (SELECT i_manufact_id, 
 Sum(ws_ext_sales_price) total_sales 
  FROM   web_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ws_item_sk = i_item_sk 
 AND ws_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ws_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id) 
 SELECT i_manufact_id, 
Sum(total_sales) total_sales 
 FROM   (SELECT i_manufact_id, total_sales 
 FROM   ss 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   cs 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   ws) tmp1 
 GROUP  BY i_manufact_id 
 ORDER  BY total_sales
 LIMIT 10;
 Drill Results:
 +---+-+
 | i_manufact_id | total_sales |
 +---+-+
 | 440   | 0.12|
 | 434   | 13.16   |
 | 415   | 14.04   |
 | 449   | 15.63   |
 | 563   | 31.46   |
 | 357   | 49.50   |
 | 624   | 67.94   |
 | 192   | 74.40   |
 | 137   | 83.42   |
 | 240   | 85.26   |
 +---+-+
 10 rows selected (7.57 seconds)
 Postgres Results:
  i_manufact_id | total_sales 
 ---+-
930 |1.18
818 |   41.86
913 |  141.90
784 |  184.90
488 |  275.08
993 |  301.60
700 |  340.52
895 |  802.30
766 |  839.76
858 |  859.18
 (10 rows)
 {code}
 The following simplified variants also return wrong results:
 {code:sql}
 SELECT sum(x)
 FROM
 (SELECT ss_ext_sales_price x, ss_item_sk
 FROM  store_sales
  GROUP BY ss_item_sk, ss_ext_sales_price
 UNION ALL
 SELECT cs_ext_sales_price x, cs_item_sk
 FROM catalog_sales
 GROUP BY cs_item_sk, cs_ext_sales_price) tmp
 GROUP BY x
 LIMIT 10;
 Drill Results:
 ++
 |   EXPR$0   |
 ++
 | 14141.40   |
 | 28060.00   |
 | 30912.70   |
 | 43706.88   |
 | 38267.64   |
 | 10173.00   |
 | 37829.25   |
 | 5349.50|
 | 107515.80  |
 | 4440.84|
 ++
 10 rows selected (14.435 seconds)
 Postgres Results:
sum
 --   
  45234.00
   5735.31
  

[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

2015-03-13 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361391#comment-14361391
 ] 

Sean Hsuan-Yi Chu commented on DRILL-2380:
--

For the original Query 33, I ran the query but remove d limit. 

Compared Drill's result with Postgres, for each row with matched 
i_manufact_id, total_sales are different.

 TPC-DS Query 33 and simplified variants return wrong results
 

 Key: DRILL-2380
 URL: https://issues.apache.org/jira/browse/DRILL-2380
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.9.0


 TPC-DS query 33 returns wrong results. 
 {code:sql}
 WITH ss 
  AS (SELECT i_manufact_id, 
 Sum(ss_ext_sales_price) total_sales 
  FROM   store_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ss_item_sk = i_item_sk 
 AND ss_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ss_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  cs 
  AS (SELECT i_manufact_id, 
 Sum(cs_ext_sales_price) total_sales 
  FROM   catalog_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND cs_item_sk = i_item_sk 
 AND cs_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND cs_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  ws 
  AS (SELECT i_manufact_id, 
 Sum(ws_ext_sales_price) total_sales 
  FROM   web_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ws_item_sk = i_item_sk 
 AND ws_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ws_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id) 
 SELECT i_manufact_id, 
Sum(total_sales) total_sales 
 FROM   (SELECT i_manufact_id, total_sales 
 FROM   ss 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   cs 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   ws) tmp1 
 GROUP  BY i_manufact_id 
 ORDER  BY total_sales
 LIMIT 10;
 Drill Results:
 +---+-+
 | i_manufact_id | total_sales |
 +---+-+
 | 440   | 0.12|
 | 434   | 13.16   |
 | 415   | 14.04   |
 | 449   | 15.63   |
 | 563   | 31.46   |
 | 357   | 49.50   |
 | 624   | 67.94   |
 | 192   | 74.40   |
 | 137   | 83.42   |
 | 240   | 85.26   |
 +---+-+
 10 rows selected (7.57 seconds)
 Postgres Results:
  i_manufact_id | total_sales 
 ---+-
930 |1.18
818 |   41.86
913 |  141.90
784 |  184.90
488 |  275.08
993 |  301.60
700 |  340.52
895 |  802.30
766 |  839.76
858 |  859.18
 (10 rows)
 {code}
 The following simplified variants also return wrong results:
 {code:sql}
 SELECT sum(x)
 FROM
 (SELECT ss_ext_sales_price x, ss_item_sk
 FROM  store_sales
  GROUP BY ss_item_sk, ss_ext_sales_price
 UNION ALL
 SELECT cs_ext_sales_price x, cs_item_sk
 FROM catalog_sales
 GROUP BY cs_item_sk, cs_ext_sales_price) tmp
 GROUP BY x
 LIMIT 10;
 Drill Results:
 ++
 |   EXPR$0   |
 ++
 | 14141.40   |
 | 28060.00   |
 | 30912.70   |
 | 43706.88   |
 | 38267.64   |
 | 10173.00   |
 | 37829.25   |
 | 5349.50|
 | 107515.80  |
 | 4440.84|
 ++
 10 rows selected (14.435 seconds)
 Postgres Results:
sum
 --   
  45234.00
   

[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

2015-03-13 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361401#comment-14361401
 ] 

Aman Sinha commented on DRILL-2380:
---

[~seanhychu]'s point is valid .. doing a LIMIT on the results in your 
simplified query cannot be compared with the Postgres results.  The original 
query has and ORDER BY on total_sales.  In your simplified query pls add an 
{{ORDER BY x}} before the limit. 

 TPC-DS Query 33 and simplified variants return wrong results
 

 Key: DRILL-2380
 URL: https://issues.apache.org/jira/browse/DRILL-2380
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.9.0


 TPC-DS query 33 returns wrong results. 
 {code:sql}
 WITH ss 
  AS (SELECT i_manufact_id, 
 Sum(ss_ext_sales_price) total_sales 
  FROM   store_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ss_item_sk = i_item_sk 
 AND ss_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ss_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  cs 
  AS (SELECT i_manufact_id, 
 Sum(cs_ext_sales_price) total_sales 
  FROM   catalog_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND cs_item_sk = i_item_sk 
 AND cs_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND cs_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  ws 
  AS (SELECT i_manufact_id, 
 Sum(ws_ext_sales_price) total_sales 
  FROM   web_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ws_item_sk = i_item_sk 
 AND ws_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ws_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id) 
 SELECT i_manufact_id, 
Sum(total_sales) total_sales 
 FROM   (SELECT i_manufact_id, total_sales 
 FROM   ss 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   cs 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   ws) tmp1 
 GROUP  BY i_manufact_id 
 ORDER  BY total_sales
 LIMIT 10;
 Drill Results:
 +---+-+
 | i_manufact_id | total_sales |
 +---+-+
 | 440   | 0.12|
 | 434   | 13.16   |
 | 415   | 14.04   |
 | 449   | 15.63   |
 | 563   | 31.46   |
 | 357   | 49.50   |
 | 624   | 67.94   |
 | 192   | 74.40   |
 | 137   | 83.42   |
 | 240   | 85.26   |
 +---+-+
 10 rows selected (7.57 seconds)
 Postgres Results:
  i_manufact_id | total_sales 
 ---+-
930 |1.18
818 |   41.86
913 |  141.90
784 |  184.90
488 |  275.08
993 |  301.60
700 |  340.52
895 |  802.30
766 |  839.76
858 |  859.18
 (10 rows)
 {code}
 The following simplified variants also return wrong results:
 {code:sql}
 SELECT sum(x)
 FROM
 (SELECT ss_ext_sales_price x, ss_item_sk
 FROM  store_sales
  GROUP BY ss_item_sk, ss_ext_sales_price
 UNION ALL
 SELECT cs_ext_sales_price x, cs_item_sk
 FROM catalog_sales
 GROUP BY cs_item_sk, cs_ext_sales_price) tmp
 GROUP BY x
 LIMIT 10;
 Drill Results:
 ++
 |   EXPR$0   |
 ++
 | 14141.40   |
 | 28060.00   |
 | 30912.70   |
 | 43706.88   |
 | 38267.64   |
 | 10173.00   |
 | 37829.25   |
 | 5349.50|
 | 107515.80  |
 | 4440.84|
 ++
 10 rows selected 

[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

2015-03-13 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361320#comment-14361320
 ] 

Sean Hsuan-Yi Chu commented on DRILL-2380:
--

in progress now!

 TPC-DS Query 33 and simplified variants return wrong results
 

 Key: DRILL-2380
 URL: https://issues.apache.org/jira/browse/DRILL-2380
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.9.0


 TPC-DS query 33 returns wrong results. 
 {code:sql}
 WITH ss 
  AS (SELECT i_manufact_id, 
 Sum(ss_ext_sales_price) total_sales 
  FROM   store_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ss_item_sk = i_item_sk 
 AND ss_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ss_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  cs 
  AS (SELECT i_manufact_id, 
 Sum(cs_ext_sales_price) total_sales 
  FROM   catalog_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND cs_item_sk = i_item_sk 
 AND cs_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND cs_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  ws 
  AS (SELECT i_manufact_id, 
 Sum(ws_ext_sales_price) total_sales 
  FROM   web_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ws_item_sk = i_item_sk 
 AND ws_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ws_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id) 
 SELECT i_manufact_id, 
Sum(total_sales) total_sales 
 FROM   (SELECT i_manufact_id, total_sales 
 FROM   ss 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   cs 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   ws) tmp1 
 GROUP  BY i_manufact_id 
 ORDER  BY total_sales
 LIMIT 10;
 Drill Results:
 +---+-+
 | i_manufact_id | total_sales |
 +---+-+
 | 440   | 0.12|
 | 434   | 13.16   |
 | 415   | 14.04   |
 | 449   | 15.63   |
 | 563   | 31.46   |
 | 357   | 49.50   |
 | 624   | 67.94   |
 | 192   | 74.40   |
 | 137   | 83.42   |
 | 240   | 85.26   |
 +---+-+
 10 rows selected (7.57 seconds)
 Postgres Results:
  i_manufact_id | total_sales 
 ---+-
930 |1.18
818 |   41.86
913 |  141.90
784 |  184.90
488 |  275.08
993 |  301.60
700 |  340.52
895 |  802.30
766 |  839.76
858 |  859.18
 (10 rows)
 {code}
 The following simplified variants also return wrong results:
 {code:sql}
 SELECT sum(x)
 FROM
 (SELECT ss_ext_sales_price x, ss_item_sk
 FROM  store_sales
  GROUP BY ss_item_sk, ss_ext_sales_price
 UNION ALL
 SELECT cs_ext_sales_price x, cs_item_sk
 FROM catalog_sales
 GROUP BY cs_item_sk, cs_ext_sales_price) tmp
 GROUP BY x
 LIMIT 10;
 Drill Results:
 ++
 |   EXPR$0   |
 ++
 | 14141.40   |
 | 28060.00   |
 | 30912.70   |
 | 43706.88   |
 | 38267.64   |
 | 10173.00   |
 | 37829.25   |
 | 5349.50|
 | 107515.80  |
 | 4440.84|
 ++
 10 rows selected (14.435 seconds)
 Postgres Results:
sum
 --   
  45234.00
   5735.31
   2275.60
   6921.32
   2590.46
   6615.09
  14080.77
  24819.76
  25127.20
 (10 rows)
 SELECT sum(x)
 FROM
 (SELECT sum(ss_ext_sales_price) x,