Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16118 )
Change subject: IMPALA-9902: add rewrite of TPC-DS q38 ...................................................................... Patch Set 1: (3 comments) Hi Tim, thanks for this patch! I only have two minor comments. I left one comment in test_tpcds_queries.py and the other one regarding EXPECTED_TPCDS_QUERIES_COUNT in the following. Specifically, I am wondering if we need to change EXPECTED_TPCDS_QUERIES_COUNT from 72 to 73 at https://github.com/apache/impala/blob/master/tests/util/parse_util.py#L25 since we add one more query in this patch. I recall that the following 2 tests depend on the value of EXPECTED_TPCDS_QUERIES_COUNT. 1) tests/infra/test_stress_infra.py::TestStressInfra::test_stress_finds_workloads: https://github.com/apache/impala/blob/master/tests/infra/test_stress_infra.py#L66, and 2) tests/infra/test_perf_infra.py::TestPerfInfra::test_run_workload_finds_queries: https://github.com/apache/impala/blob/master/tests/infra/test_perf_infra.py#L44 http://gerrit.cloudera.org:8080/#/c/16118/1/testdata/workloads/tpcds/queries/tpcds-q38.test File testdata/workloads/tpcds/queries/tpcds-q38.test: http://gerrit.cloudera.org:8080/#/c/16118/1/testdata/workloads/tpcds/queries/tpcds-q38.test@1 PS1, Line 1: ==== : ---- QUERY: TPCDS-Q38 : -- This is an unofficial variant that is rewritten to use EXISTS subqueries instead of : -- INTERSECT : select count(*) from ( : select distinct c_last_name, c_first_name, d_date : from store_sales, date_dim dd1, customer c1 : where store_sales.ss_sold_date_sk = dd1.d_date_sk : and store_sales.ss_customer_sk = c1.c_customer_sk : and d_month_seq between 1189 and 1189 + 11 : and exists ( : select distinct c_last_name, c_first_name, d_date : from catalog_sales, date_dim dd2, customer c2 : where catalog_sales.cs_sold_date_sk = dd2.d_date_sk : and catalog_sales.cs_bill_customer_sk = c2.c_customer_sk : and d_month_seq between 1189 and 1189 + 11 : and c1.c_last_name <=> c2.c_last_name and c1.c_first_name <=> c2.c_first_name and dd1.d_date <=> dd2.d_date : ) : and exists ( : select distinct c_last_name, c_first_name, d_date : from web_sales, date_dim dd3, customer c3 : where web_sales.ws_sold_date_sk = dd3.d_date_sk : and web_sales.ws_bill_customer_sk = c3.c_customer_sk : and d_month_seq between 1189 and 1189 + 11 : and c1.c_last_name <=> c3.c_last_name and c1.c_first_name <=> c3.c_first_name and dd1.d_date <=> dd3.d_date : ) : ) hot_cust : limit 100 : ---- RESULTS : 108 : ---- TYPES : BIGINT : ==== I just found that HS2 is not able to compile the rewritten Q38 due to the following error. Error while compiling statement: FAILED: SemanticException [Error 10249]: Line 15:9 Unsupported SubQuery Expression 'd_date': Only 1 SubQuery expression is supported. But I am still able to use HS2 to verify the correctness by using the unmodified query and I found the results from Impala and HS2 match. http://gerrit.cloudera.org:8080/#/c/16118/1/testdata/workloads/tpcds/queries/tpcds-q8.test File testdata/workloads/tpcds/queries/tpcds-q8.test: http://gerrit.cloudera.org:8080/#/c/16118/1/testdata/workloads/tpcds/queries/tpcds-q8.test@4 PS1, Line 4: -- INTERSECT. Thanks Tim for catching this! After checking $IMPALA_HOME/toolchain/toolchain-packages-gcc7.5.0/tpc-ds-2.1.0/share/tpc-ds/query_templates/query8.tpl, I just realized that Q8 is indeed a variant. http://gerrit.cloudera.org:8080/#/c/16118/1/tests/query_test/test_tpcds_queries.py File tests/query_test/test_tpcds_queries.py: http://gerrit.cloudera.org:8080/#/c/16118/1/tests/query_test/test_tpcds_queries.py@127 PS1, Line 127: def test_tpcds_q38(self, vector): I am wondering if we need to add the modified Q38 under the class of TestTpcdsDecimalV2Query (https://github.com/apache/impala/blob/master/tests/query_test/test_tpcds_queries.py#L269). On the other hand, I am also wondering if it is better to change the name of the test to test_tpcds_q38a so that one can tell from its name that the query is a variant. -- To view, visit http://gerrit.cloudera.org:8080/16118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I81807683aa265a946729e15156bd2e33724103e1 Gerrit-Change-Number: 16118 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Sun, 28 Jun 2020 19:19:12 +0000 Gerrit-HasComments: Yes
