Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16118 )

Change subject: IMPALA-9902: add rewrite of TPC-DS q38
......................................................................


Patch Set 1:

(3 comments)

Hi Tim, thanks for this patch! I only have two minor comments. I left one 
comment in test_tpcds_queries.py and the other one regarding 
EXPECTED_TPCDS_QUERIES_COUNT in the following.

Specifically, I am wondering if we need to change EXPECTED_TPCDS_QUERIES_COUNT 
from 72 to 73 at 
https://github.com/apache/impala/blob/master/tests/util/parse_util.py#L25 since 
we add one more query in this patch. I recall that the following 2 tests depend 
on the value of EXPECTED_TPCDS_QUERIES_COUNT.

1) 
tests/infra/test_stress_infra.py::TestStressInfra::test_stress_finds_workloads: 
https://github.com/apache/impala/blob/master/tests/infra/test_stress_infra.py#L66,
 and
2) 
tests/infra/test_perf_infra.py::TestPerfInfra::test_run_workload_finds_queries: 
https://github.com/apache/impala/blob/master/tests/infra/test_perf_infra.py#L44

http://gerrit.cloudera.org:8080/#/c/16118/1/testdata/workloads/tpcds/queries/tpcds-q38.test
File testdata/workloads/tpcds/queries/tpcds-q38.test:

http://gerrit.cloudera.org:8080/#/c/16118/1/testdata/workloads/tpcds/queries/tpcds-q38.test@1
PS1, Line 1: ====
           : ---- QUERY: TPCDS-Q38
           : -- This is an unofficial variant that is rewritten to use EXISTS 
subqueries instead of
           : -- INTERSECT
           : select count(*) from (
           :     select distinct c_last_name, c_first_name, d_date
           :     from store_sales, date_dim dd1, customer c1
           :           where store_sales.ss_sold_date_sk = dd1.d_date_sk
           :       and store_sales.ss_customer_sk = c1.c_customer_sk
           :       and d_month_seq between 1189 and 1189 + 11
           :       and exists (
           :    select distinct c_last_name, c_first_name, d_date
           :    from catalog_sales, date_dim dd2, customer c2
           :          where catalog_sales.cs_sold_date_sk = dd2.d_date_sk
           :      and catalog_sales.cs_bill_customer_sk = c2.c_customer_sk
           :      and d_month_seq between 1189 and 1189 + 11
           :      and c1.c_last_name <=> c2.c_last_name and c1.c_first_name <=> 
c2.c_first_name and dd1.d_date <=> dd2.d_date
           :      )
           :      and exists (
           :    select distinct c_last_name, c_first_name, d_date
           :    from web_sales, date_dim dd3, customer c3
           :          where web_sales.ws_sold_date_sk = dd3.d_date_sk
           :      and web_sales.ws_bill_customer_sk = c3.c_customer_sk
           :      and d_month_seq between 1189 and 1189 + 11
           :      and c1.c_last_name <=> c3.c_last_name and c1.c_first_name <=> 
c3.c_first_name and dd1.d_date <=> dd3.d_date
           :      )
           : ) hot_cust
           : limit 100
           : ---- RESULTS
           : 108
           : ---- TYPES
           : BIGINT
           : ====
I just found that HS2 is not able to compile the rewritten Q38 due to the 
following error.

Error while compiling statement: FAILED: SemanticException [Error 10249]: Line 
15:9 Unsupported SubQuery Expression 'd_date': Only 1 SubQuery expression is 
supported.

But I am still able to use HS2 to verify the correctness by using the 
unmodified query and I found the results from Impala and HS2 match.


http://gerrit.cloudera.org:8080/#/c/16118/1/testdata/workloads/tpcds/queries/tpcds-q8.test
File testdata/workloads/tpcds/queries/tpcds-q8.test:

http://gerrit.cloudera.org:8080/#/c/16118/1/testdata/workloads/tpcds/queries/tpcds-q8.test@4
PS1, Line 4: -- INTERSECT.
Thanks Tim for catching this! After checking 
$IMPALA_HOME/toolchain/toolchain-packages-gcc7.5.0/tpc-ds-2.1.0/share/tpc-ds/query_templates/query8.tpl,
 I just realized that Q8 is indeed a variant.


http://gerrit.cloudera.org:8080/#/c/16118/1/tests/query_test/test_tpcds_queries.py
File tests/query_test/test_tpcds_queries.py:

http://gerrit.cloudera.org:8080/#/c/16118/1/tests/query_test/test_tpcds_queries.py@127
PS1, Line 127:   def test_tpcds_q38(self, vector):
I am wondering if we need to add the modified Q38 under the class of 
TestTpcdsDecimalV2Query 
(https://github.com/apache/impala/blob/master/tests/query_test/test_tpcds_queries.py#L269).

On the other hand, I am also wondering if it is better to change the name of 
the test to test_tpcds_q38a so that one can tell from its name that the query 
is a variant.



--
To view, visit http://gerrit.cloudera.org:8080/16118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81807683aa265a946729e15156bd2e33724103e1
Gerrit-Change-Number: 16118
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Sun, 28 Jun 2020 19:19:12 +0000
Gerrit-HasComments: Yes

Reply via email to