Taras Bobrovytsky has submitted this change and it was merged.

Change subject: IMPALA-4101: qgen: Hive join predicates should only contain
equality functions
......................................................................


IMPALA-4101: qgen: Hive join predicates should only contain equality functions

Background:

Hive only supports equi-joins in its JOIN clause, while Postgres and Impala also
support other comparison operators such as <, <=, >, and >=. This change modifies
the QueryGenerator._create_relational_join_condition and
QueryGenerator._create_boolean_func_tree methods so that they construct only
equality join conditions when the query profile requires it, as the HiveProfile
does.

The _create_boolean_func_tree method is reached via
QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause ->
_create_relational_join_condition -> _create_boolean_func_tree. It is invoked
when constructing the JOIN, WHERE, and HAVING clauses, and it creates a tree of
functions that would typically be found in any of these clauses.

Changes:

The parameter "signatures" is added to the method _create_boolean_func_tree. It
lists the function signatures the method is allowed to use. Previously, this list
was always populated by calling _funcs_to_allowed_signatures(FUNCS); if
"signatures" is not specified, the code still defaults to the result of that
call.
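
The defaulting behavior of the new parameter can be sketched roughly as follows.
This is a simplified, standalone illustration and not the qgen code: the
Signature tuple, the contents of FUNCS, and the bodies of both functions are
assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for qgen's signature objects.
Signature = namedtuple("Signature", ["func_name", "arg_count"])

# Illustrative function list; the real FUNCS in qgen is much larger.
FUNCS = ["Equals", "And", "Or", "LessThan"]


def funcs_to_allowed_signatures(funcs):
    """Assumed helper: turn function names into candidate signatures."""
    return [Signature(name, 2) for name in funcs]


def create_boolean_func_tree(signatures=None):
    """Return the function names the tree builder may draw from.

    When the caller passes no signatures, fall back to everything derived
    from FUNCS, which mirrors the behavior before this change.
    """
    if signatures is None:
        signatures = funcs_to_allowed_signatures(FUNCS)
    return [sig.func_name for sig in signatures]


all_names = create_boolean_func_tree()
join_names = create_boolean_func_tree([Signature("Equals", 2)])
```

Existing callers (for example, WHERE and HAVING construction) that do not pass
"signatures" keep their old behavior, while the JOIN path can pass a restricted
list.
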
A new method, get_allowed_join_signatures, is introduced in the DefaultProfile. It
returns the list of function signatures that are allowed within a JOIN clause. The
DefaultProfile allows all given signatures, while the HiveProfile only allows the
Equals and And functions, as well as any function that operates over only one
column.
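
A minimal sketch of how the two profiles might differ is shown below. Only the
class names and the method name come from this change; the Signature tuple, its
column_count field, the method's argument, and the example function names are
assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical signature type: a function name plus the number of columns it
# operates over. The real qgen signature objects are richer than this.
Signature = namedtuple("Signature", ["func_name", "column_count"])


class DefaultProfile(object):
    def get_allowed_join_signatures(self, signatures):
        # The default profile places no restriction on JOIN functions.
        return list(signatures)


class HiveProfile(DefaultProfile):
    ALLOWED_FUNC_NAMES = frozenset(["Equals", "And"])

    def get_allowed_join_signatures(self, signatures):
        # Keep Equals and And, plus any function over a single column.
        return [
            sig for sig in signatures
            if sig.func_name in self.ALLOWED_FUNC_NAMES or sig.column_count == 1
        ]


candidates = [
    Signature("Equals", 2),
    Signature("And", 2),
    Signature("Or", 2),
    Signature("LessThan", 2),
    Signature("IsNull", 1),  # illustrative single-column function
]

default_allowed = DefaultProfile().get_allowed_join_signatures(candidates)
hive_allowed = HiveProfile().get_allowed_join_signatures(candidates)
```

The result of such a method could then be passed as the "signatures" argument
when the JOIN condition is built.
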
The reason for these restrictions is that Hive only allows equality joins, does
not allow OR operators in the join clause, and restricts functions that operate
over multiple tables. This last restriction is somewhat subtle: if one side of the
equals operator contains a function that operates over two different tables, the
other side of the operator cannot contain either of those tables. While it is
possible to have functions that take multiple input parameters, the inputs must be
taken from specific tables to prevent Hive from throwing a compile-time exception.
Supporting this in the qgen code would require significant effort and changes to
some core methods (_create_relational_join_condition and
_populate_func_with_vals), so it is best to disable these functions for Hive
altogether.
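
To make the multi-table restriction concrete, the rule as stated above can be
modeled as a small check. This is only a model of the sentence in this commit
message, not Hive's actual semantic analysis; the function name and the
representation of each side as a set of table aliases are assumptions.

```python
def violates_multi_table_rule(left_tables, right_tables):
    """Return True if an equality condition breaks the rule described above.

    Each argument is the collection of table aliases referenced on one side of
    the equals operator. If one side references two or more tables, the other
    side must not reference any of them.
    """
    left, right = set(left_tables), set(right_tables)
    for side, other in ((left, right), (right, left)):
        if len(side) >= 2 and side & other:
            return True
    return False


# f(t1.a, t2.b) = t1.c  -> t1 appears on both sides, so the rule is violated.
bad = violates_multi_table_rule({"t1", "t2"}, {"t1"})

# f(t1.a, t2.b) = t3.c  -> the right side uses a different table.
ok_three_tables = violates_multi_table_rule({"t1", "t2"}, {"t3"})

# t1.a = t2.a           -> a plain equi-join between two tables.
ok_simple = violates_multi_table_rule({"t1"}, {"t2"})
```

Enforcing a check like this during generation would mean tracking which tables
feed each function argument, which is why the change disables multi-column
functions for Hive instead.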

Note that _create_boolean_func_tree still allows OR operators due to some logic
around its "and_or_fill_ratio" variable. The plan is to fix this in a future patch
that specifically focuses on removing OR operators from Hive JOIN clauses.

A minor change to discrepancy_searcher makes the logs print "Hive" instead of
"Impala" when running against Hive.

Testing:

* Added a new unit test that ensures the HiveProfile only returns equality joins
* Unit tests pass
* Tested against Hive locally
* Tested against Impala via Leopard
* Tested against Impala via the Discrepancy Checker
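
The kind of assertion the new unit test makes can be sketched as below. This is
not the test added in this change: the tuple-based tree representation, the
helper names, and the set of non-equality function names are assumptions chosen
to keep the example self-contained.

```python
# Hypothetical names for the comparison functions a Hive JOIN must not contain.
NON_EQUALITY_FUNCS = frozenset([
    "NotEquals", "LessThan", "LessThanOrEquals",
    "GreaterThan", "GreaterThanOrEquals",
])


def iter_func_names(node):
    """Yield every function name in a tree of (name, [args]) tuples.

    Leaves are plain strings such as "t1.a" and yield nothing.
    """
    if isinstance(node, tuple):
        name, args = node
        yield name
        for arg in args:
            for child_name in iter_func_names(arg):
                yield child_name


def is_equi_join_condition(tree):
    """True if the tree contains no non-equality comparison functions."""
    return not any(name in NON_EQUALITY_FUNCS for name in iter_func_names(tree))


equi_tree = ("And", [
    ("Equals", ["t1.a", "t2.a"]),
    ("Equals", ["t1.b", "t2.b"]),
])
mixed_tree = ("And", [
    ("Equals", ["t1.a", "t2.a"]),
    ("LessThan", ["t1.b", "t2.b"]),
])
```

A real test would generate many join conditions with the HiveProfile and apply
a check of this shape to each one.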

Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Reviewed-on: http://gerrit.cloudera.org:8080/4419
Reviewed-by: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Tested-by: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_relational_join_condition.py
4 files changed, 139 insertions(+), 20 deletions(-)

Approvals:
  Taras Bobrovytsky: Looks good to me, approved; Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
