Done, thanks Henry! --Sahil
On Mon, Sep 26, 2016 at 3:57 PM, Henry Robinson <he...@cloudera.com> wrote: > stak...@cloudera.com - Could you please fill out your Gerrit profile? In > the top-right corner there's a drop-down menu with 'settings' as an option. > > That stops you from showing up on these reviews as 'Anonymous Coward'. > > On 25 September 2016 at 18:14, Anonymous Coward (Code Review) < > ger...@cloudera.org> wrote: > >> Hello Michael Brown, >> >> I'd like you to reexamine a change. Please visit >> >> http://gerrit.cloudera.org:8080/4419 >> >> to look at the new patch set (#9). >> >> Change subject: IMPALA-4101: qgen: Hive join predicates should only >> contains equality functions >> ...................................................................... >> >> IMPALA-4101: qgen: Hive join predicates should only contains equality >> functions >> >> Background: >> >> Hive only supports equi-joins in its JOIN clause, while Postgres and >> Impala support more >> complex functions such as <, <=, >, >=, etc. This change modifies the >> QueryGenerator._create_relational_join_condition and >> QueryGenerator._create_boolean_func_tree methods to only construct >> equality join >> conditions under certain conditions. >> >> The _create_boolean_func_tree method is invoked via >> QueryGenerator -> create_query -> _create_from_clause -> >> _create_join_clause -> >> _create_relational_join_condition -> _create_boolean_func_tree. This >> method is invoked >> when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree >> of functions >> that would typically be found in any of these clauses. >> >> Changes: >> >> The parameter "signatures" is added to the method >> _create_boolean_func_tree, and it lists >> out all the allowed signatures the function is allowed to use. >> Previously, this list of >> signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), >> and if >> "signatures" is not specified, then the code defaults back to the results >> of that method. >> A new method in the DefaultProfile called get_allowed_join_signatures is >> introduced and >> returns a list of function signatures that are allowed within a JOIN >> clause. The >> DefaultProfile allows all given signatures, while the HiveProfile only >> allows for the >> Equals and And functions, as well as any function that operates over only >> one column. >> The reason for these restrictions is that Hive only allows equality >> joins, does not allow >> OR operators in the join clause, and has some restrictions on functions >> that operate over >> multiple different tables. This last restriction is somewhat subtle; if >> one side of the >> equals operator contains a function that operates over two different >> tables, the other >> side of the operator cannot contain either of those tables. While it is >> possible to have >> functions that take in multiple input parameters, the inputs must be >> taken from specific >> tables to prevent Hive from throwing a compile time exception. Adding >> support for this in >> qgen code will require significant effort and modification to some core >> methods >> (_create_relational_join_condition and _populate_func_with_vals), so >> it's best to disable >> these for Hive altogether. >> >> Note that the _create_boolean_func_tree still allows for OR operators due >> to some logic >> around its "and_or_fill_ratio" variable. The plan is to fix this in a >> future patch that >> specifically focuses on removing OR operators from Hive JOIN clauses. >> >> Minor change to discrepancy_searcher so that the logs print out "Hive" >> instead of >> "Impala" when running against Hive. >> >> Testing: >> >> * Added a new unit test that ensures the HiveProfile only returns >> equality joins >> * Unit tests pass >> * Tested against Hive locally >> * Tested against Impala via Leopard >> * Tested against Impala via the Discrepancy Checker >> >> Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459 >> --- >> M tests/comparison/discrepancy_searcher.py >> M tests/comparison/query_generator.py >> M tests/comparison/query_profile.py >> A tests/comparison/tests/hive/test_hive_create_relational_join >> _condition.py >> 4 files changed, 139 insertions(+), 20 deletions(-) >> >> >> git pull ssh://gerrit.cloudera.org:29418/Impala-ASF >> refs/changes/19/4419/9 >> -- >> To view, visit http://gerrit.cloudera.org:8080/4419 >> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings >> >> Gerrit-MessageType: newpatchset >> Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459 >> Gerrit-PatchSet: 9 >> Gerrit-Project: Impala-ASF >> Gerrit-Branch: master >> Gerrit-Owner: stak...@cloudera.com >> Gerrit-Reviewer: David Knupp <dkn...@cloudera.com> >> Gerrit-Reviewer: Michael Brown <mi...@cloudera.com> >> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com> >> Gerrit-Reviewer: stak...@cloudera.com >> >> -- >> You received this message because you are subscribed to the Google Groups >> "impala-cr" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to impala-cr+unsubscr...@cloudera.com. >> For more options, visit https://groups.google.com/a/cloudera.com/d/optout >> . >> > > > > -- > Henry Robinson > Software Engineer > Cloudera > 415-994-6679 >