Sahil Takiar has uploaded a new patch set (#4).

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside 
specific analytic clauses
......................................................................

IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic 
clauses

Hive does not support aggregates inside the following analytic clauses:

* AVG ... OVER
* COUNT ... OVER
* FIRSTVALUE ... OVER
* LAG ... OVER
* LASTVALUE ... OVER
* LEAD ... OVER
* MAX ... OVER
* MIN ... OVER
* SUM ... OVER

So the following query works in Impala, but not in Hive:

SELECT
LEAD(SUM(1)) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY 
MAX(a1.tinyint_col_3))
FROM table_1 a1;

but the following query works in both Impala and Hive:

SELECT
LEAD(1) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY MAX(a1.tinyint_col_3))
FROM table_1 a1;

This patch modifies the qgen code so that it doesn't create aggregates inside 
the above
analytic clauses, if the HiveProfile is used. A new method called
get_analytic_funcs_that_cannot_contain_aggs() is added to the DefaultProfile 
and to the
HiveProfile. The implementation in the DefaultProfile returns an empty list. The
implementation in the HiveProfile returns the list of methods above. The
QueryGenerator._create_agg_or_analytic_tree() method is modified so that it 
checks the
get_analytic_funcs_that_cannot_contain_aggs() method when populating the 
possible
function types of the next child in the function tree. If it finds any of these 
functions
already exist in the tree, it ensures that the next child function cannot be an 
aggregate
function.

Misc Changes:

A few miscellaneous changes were made that popped up during testing:

* Fixed a possible NPE in _create_boolean_func_tree
* Disabled ONLY_USE_EQUALITY_JOIN_PREDICATES for the HiveProfile, this should 
have been
done in IMPALA-4101; Hive doesn't support all equality-joins, so specific types 
need to
be disabled, see IMPALA-4101 for more details

Testing:

* Unit tests added: test_query_generator.py and 
test_hive_create_agg_or_analytic_tree.py
* All unit tests pass
* Tested locally against Hive
* Tested against Impala via Leopard
* Tested against Impala via the discrepancy searcher

Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
---
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_agg_or_analytic_tree.py
A tests/comparison/tests/test_query_generator.py
4 files changed, 190 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/4581/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>

Reply via email to