Done, thanks Henry!

--Sahil

On Mon, Sep 26, 2016 at 3:57 PM, Henry Robinson <he...@cloudera.com> wrote:

> stak...@cloudera.com - Could you please fill out your Gerrit profile? In
> the top-right corner there's a drop-down menu with 'settings' as an option.
>
> That stops you from showing up on these reviews as 'Anonymous Coward'.
>
> On 25 September 2016 at 18:14, Anonymous Coward (Code Review) <
> ger...@cloudera.org> wrote:
>
>> Hello Michael Brown,
>>
>> I'd like you to reexamine a change.  Please visit
>>
>>     http://gerrit.cloudera.org:8080/4419
>>
>> to look at the new patch set (#9).
>>
>> Change subject: IMPALA-4101: qgen: Hive join predicates should only
>> contains equality functions
>> ......................................................................
>>
>> IMPALA-4101: qgen: Hive join predicates should only contains equality
>> functions
>>
>> Background:
>>
>> Hive only supports equi-joins in its JOIN clause, while Postgres and
>> Impala support more
>> complex functions such as <, <=, >, >=, etc. This change modifies the
>> QueryGenerator._create_relational_join_condition and
>> QueryGenerator._create_boolean_func_tree methods to only construct
>> equality join
>> conditions under certain conditions.
>>
>> The _create_boolean_func_tree method is invoked via
>> QueryGenerator -> create_query -> _create_from_clause ->
>> _create_join_clause ->
>> _create_relational_join_condition -> _create_boolean_func_tree. This
>> method is invoked
>> when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree
>> of functions
>> that would typically be found in any of these clauses.
>>
>> Changes:
>>
>> The parameter "signatures" is added to the method
>> _create_boolean_func_tree, and it lists
>> out all the allowed signatures the function is allowed to use.
>> Previously, this list of
>> signatures was populated by calling _funcs_to_allowed_signatures(FUNCS),
>> and if
>> "signatures" is not specified, then the code defaults back to the results
>> of that method.
>> A new method in the DefaultProfile called get_allowed_join_signatures is
>> introduced and
>> returns a list of function signatures that are allowed within a JOIN
>> clause. The
>> DefaultProfile allows all given signatures, while the HiveProfile only
>> allows for the
>> Equals and And functions, as well as any function that operates over only
>> one column.
>> The reason for these restrictions is that Hive only allows equality
>> joins, does not allow
>> OR operators in the join clause, and has some restrictions on functions
>> that operate over
>> multiple different tables. This last restriction is somewhat subtle; if
>> one side of the
>> equals operator contains a function that operates over two different
>> tables, the other
>> side of the operator cannot contain either of those tables. While it is
>> possible to have
>> functions that take in multiple input parameters, the inputs must be
>> taken from specific
>> tables to prevent Hive from throwing a compile time exception. Adding
>> support for this in
>> qgen code will require significant effort and modification to some core
>> methods
>> (_create_relational_join_condition and _populate_func_with_vals), so
>> it's best to disable
>> these for Hive altogether.
>>
>> Note that the _create_boolean_func_tree still allows for OR operators due
>> to some logic
>> around its "and_or_fill_ratio" variable. The plan is to fix this in a
>> future patch that
>> specifically focuses on removing OR operators from Hive JOIN clauses.
>>
>> Minor change to discrepancy_searcher so that the logs print out "Hive"
>> instead of
>> "Impala" when running against Hive.
>>
>> Testing:
>>
>> * Added a new unit test that ensures the HiveProfile only returns
>> equality joins
>> * Unit tests pass
>> * Tested against Hive locally
>> * Tested against Impala via Leopard
>> * Tested against Impala via the Discrepancy Checker
>>
>> Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
>> ---
>> M tests/comparison/discrepancy_searcher.py
>> M tests/comparison/query_generator.py
>> M tests/comparison/query_profile.py
>> A tests/comparison/tests/hive/test_hive_create_relational_join
>> _condition.py
>> 4 files changed, 139 insertions(+), 20 deletions(-)
>>
>>
>>   git pull ssh://gerrit.cloudera.org:29418/Impala-ASF
>> refs/changes/19/4419/9
>> --
>> To view, visit http://gerrit.cloudera.org:8080/4419
>> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>>
>> Gerrit-MessageType: newpatchset
>> Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
>> Gerrit-PatchSet: 9
>> Gerrit-Project: Impala-ASF
>> Gerrit-Branch: master
>> Gerrit-Owner: stak...@cloudera.com
>> Gerrit-Reviewer: David Knupp <dkn...@cloudera.com>
>> Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
>> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
>> Gerrit-Reviewer: stak...@cloudera.com
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "impala-cr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to impala-cr+unsubscr...@cloudera.com.
>> For more options, visit https://groups.google.com/a/cloudera.com/d/optout
>> .
>>
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>

Reply via email to