[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Patch Set 6: Code-Review+2 Verified+1

--
To view, visit http://gerrit.cloudera.org:8080/4581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Sahil Takiar
Gerrit-Reviewer: Taras Bobrovytsky
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Taras Bobrovytsky has submitted this change and it was merged.

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Hive does not support aggregates inside the following analytic clauses:

* AVG ... OVER
* COUNT ... OVER
* FIRST_VALUE ... OVER
* LAG ... OVER
* LAST_VALUE ... OVER
* LEAD ... OVER
* MAX ... OVER
* MIN ... OVER
* SUM ... OVER

So the following query works in Impala, but not in Hive:

SELECT LEAD(SUM(1)) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY MAX(a1.tinyint_col_3)) FROM table_1 a1;

but the following query works in both Impala and Hive:

SELECT LEAD(1) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY MAX(a1.tinyint_col_3)) FROM table_1 a1;

This patch modifies the qgen code so that it doesn't create aggregates inside the above analytic clauses when the HiveProfile is used.

A new method called get_analytic_funcs_that_cannot_contain_aggs() is added to the DefaultProfile and to the HiveProfile. The implementation in the DefaultProfile returns an empty list. The implementation in the HiveProfile returns the list of functions above.

The QueryGenerator._create_agg_or_analytic_tree() method is modified so that it checks get_analytic_funcs_that_cannot_contain_aggs() when populating the possible function types of the next child in the function tree. If it finds that any of these functions already exist in the tree, it ensures that the next child function cannot be an aggregate function.
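The generator-side check described above can be sketched roughly as follows. This is a minimal sketch, not the patch's code: only get_analytic_funcs_that_cannot_contain_aggs(), DefaultProfile and HiveProfile come from the commit message. The plain SQL-name strings, the AGG/BASIC type labels and the allowed_child_func_types() helper are hypothetical simplifications of what _create_agg_or_analytic_tree() does.

```python
# Hypothetical, simplified model of the profile hook. The real qgen profiles
# live in tests/comparison/query_profile.py and may return function classes
# rather than the plain SQL names used here.


class DefaultProfile(object):

  def get_analytic_funcs_that_cannot_contain_aggs(self):
    # Impala accepts aggregates inside every analytic function.
    return []


class HiveProfile(DefaultProfile):

  def get_analytic_funcs_that_cannot_contain_aggs(self):
    # Analytic functions that Hive rejects when an aggregate appears inside them.
    return ['AVG', 'COUNT', 'FIRST_VALUE', 'LAG', 'LAST_VALUE', 'LEAD', 'MAX',
            'MIN', 'SUM']


# Hypothetical labels for the kinds of function the generator may pick next.
AGG = 'AGG'
BASIC = 'BASIC'


def allowed_child_func_types(profile, funcs_in_tree, candidate_types):
  """Return the function types the next child in the function tree may use.

  funcs_in_tree: names of the analytic functions already placed in the tree.
  candidate_types: the types the generator would otherwise choose from.
  """
  restricted = set(profile.get_analytic_funcs_that_cannot_contain_aggs())
  if any(func in restricted for func in funcs_in_tree):
    # An aggregate child here would produce something like LEAD(SUM(1)) OVER (...),
    # which Hive rejects, so aggregates are removed from the candidates.
    return [t for t in candidate_types if t != AGG]
  return list(candidate_types)


if __name__ == '__main__':
  print(allowed_child_func_types(HiveProfile(), ['LEAD'], [AGG, BASIC]))     # ['BASIC']
  print(allowed_child_func_types(DefaultProfile(), ['LEAD'], [AGG, BASIC]))  # ['AGG', 'BASIC']
```

With the HiveProfile, a tree that already holds LEAD can only grow non-aggregate children, which yields queries shaped like the working LEAD(1) example. The DefaultProfile leaves the candidates untouched, so Impala-targeted generation is unchanged.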
Misc Changes:

A few miscellaneous changes were also made to address issues that came up during testing:

* Fixed a possible NPE in _create_boolean_func_tree
* Disabled ONLY_USE_EQUALITY_JOIN_PREDICATES for the HiveProfile; this should have been done in IMPALA-4101. Hive doesn't support all equality joins, so specific types need to be disabled (see IMPALA-4101 for more details)

Testing:

* Unit tests added: test_query_generator.py and test_hive_create_agg_or_analytic_tree.py
* All unit tests pass
* Tested locally against Hive
* Tested against Impala via Leopard
* Tested against Impala via the discrepancy searcher

Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Reviewed-on: http://gerrit.cloudera.org:8080/4581
Reviewed-by: Taras Bobrovytsky
Tested-by: Taras Bobrovytsky
---
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_agg_or_analytic_tree.py
A tests/comparison/tests/test_query_generator.py
4 files changed, 189 insertions(+), 6 deletions(-)

Approvals:
  Taras Bobrovytsky: Looks good to me, approved; Verified

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Sahil Takiar
Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Sahil Takiar has posted comments on this change.

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Patch Set 6:

(1 comment)

@Taras, comments addressed.

http://gerrit.cloudera.org:8080/#/c/4581/5/tests/comparison/query_generator.py
File tests/comparison/query_generator.py:

PS5, Line 602: allow
> allowed
Done

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Sahil Takiar
Gerrit-Reviewer: Taras Bobrovytsky
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Sahil Takiar has uploaded a new patch set (#6).

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Hive does not support aggregates inside the following analytic clauses:

* AVG ... OVER
* COUNT ... OVER
* FIRST_VALUE ... OVER
* LAG ... OVER
* LAST_VALUE ... OVER
* LEAD ... OVER
* MAX ... OVER
* MIN ... OVER
* SUM ... OVER

So the following query works in Impala, but not in Hive:

SELECT LEAD(SUM(1)) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY MAX(a1.tinyint_col_3)) FROM table_1 a1;

but the following query works in both Impala and Hive:

SELECT LEAD(1) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY MAX(a1.tinyint_col_3)) FROM table_1 a1;

This patch modifies the qgen code so that it doesn't create aggregates inside the above analytic clauses when the HiveProfile is used.

A new method called get_analytic_funcs_that_cannot_contain_aggs() is added to the DefaultProfile and to the HiveProfile. The implementation in the DefaultProfile returns an empty list. The implementation in the HiveProfile returns the list of functions above.

The QueryGenerator._create_agg_or_analytic_tree() method is modified so that it checks get_analytic_funcs_that_cannot_contain_aggs() when populating the possible function types of the next child in the function tree. If it finds that any of these functions already exist in the tree, it ensures that the next child function cannot be an aggregate function.
Misc Changes:

A few miscellaneous changes were also made to address issues that came up during testing:

* Fixed a possible NPE in _create_boolean_func_tree
* Disabled ONLY_USE_EQUALITY_JOIN_PREDICATES for the HiveProfile; this should have been done in IMPALA-4101. Hive doesn't support all equality joins, so specific types need to be disabled (see IMPALA-4101 for more details)

Testing:

* Unit tests added: test_query_generator.py and test_hive_create_agg_or_analytic_tree.py
* All unit tests pass
* Tested locally against Hive
* Tested against Impala via Leopard
* Tested against Impala via the discrepancy searcher

Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
---
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_agg_or_analytic_tree.py
A tests/comparison/tests/test_query_generator.py
4 files changed, 189 insertions(+), 6 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/4581/6

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Sahil Takiar
Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4581/5/tests/comparison/query_generator.py
File tests/comparison/query_generator.py:

PS5, Line 602: allow
allowed

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Sahil Takiar
Gerrit-Reviewer: Taras Bobrovytsky
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Sahil Takiar has posted comments on this change.

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Patch Set 5:

(5 comments)

@Taras, updated. Comments addressed.

http://gerrit.cloudera.org:8080/#/c/4581/2//COMMIT_MSG
Commit Message:

Line 20:
> It's still not exactly clear to me what is not allowed. Can you give an exa
Done

http://gerrit.cloudera.org:8080/#/c/4581/2/tests/comparison/query_generator.py
File tests/comparison/query_generator.py:

PS2, Line 584: basic
> It's not clear what the funcs argument means. Can you rename it something m
Done

Line 797: # Check if the func_tree contains any analytic functions returned by
> this line and the comment above it should be right above line 805
Done

PS2, Line 803: # Plac
> this should be 4 spaces
Done

http://gerrit.cloudera.org:8080/#/c/4581/2/tests/comparison/query_profile.py
File tests/comparison/query_profile.py:

PS2, Line 693: get_analytic_funcs_that_cannot_co
> how about renaming this to allow_analytics_with_aggs?
I renamed it to get_analytic_funcs_that_cannot_contain_aggs(), which is a little more verbose but hopefully more descriptive. Changing it to allow_analytics_with_aggs would require returning the opposite set of functions (analytic functions that can contain aggs, rather than analytic functions that cannot contain aggs).

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Sahil Takiar
Gerrit-Reviewer: Taras Bobrovytsky
Gerrit-HasComments: Yes
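The naming trade-off in the reply above can be made concrete: the two naming styles describe complementary sets of analytic functions. A minimal sketch, assuming an illustrative universe of analytic functions; ALL_ANALYTIC_FUNCS, HIVE_CANNOT_CONTAIN_AGGS and analytics_that_can_contain_aggs() are hypothetical names, not qgen code.

```python
# Hypothetical, illustrative universe of analytic functions; the real
# generator's set may differ.
ALL_ANALYTIC_FUNCS = frozenset([
    'AVG', 'COUNT', 'DENSE_RANK', 'FIRST_VALUE', 'LAG', 'LAST_VALUE', 'LEAD',
    'MAX', 'MIN', 'RANK', 'ROW_NUMBER', 'SUM'])

# The Hive list from the commit message, written with SQL function names.
HIVE_CANNOT_CONTAIN_AGGS = frozenset([
    'AVG', 'COUNT', 'FIRST_VALUE', 'LAG', 'LAST_VALUE', 'LEAD', 'MAX', 'MIN',
    'SUM'])


def analytics_that_can_contain_aggs(cannot_contain_aggs):
  """Derive the set that an allow_analytics_with_aggs-style hook would return."""
  return ALL_ANALYTIC_FUNCS - frozenset(cannot_contain_aggs)


# Default profile: the "cannot" list is simply empty, while the "can" list
# has to enumerate every analytic function.
print(analytics_that_can_contain_aggs([]) == ALL_ANALYTIC_FUNCS)  # True
print(sorted(analytics_that_can_contain_aggs(HIVE_CANNOT_CONTAIN_AGGS)))
# ['DENSE_RANK', 'RANK', 'ROW_NUMBER']
```

One consequence the sketch suggests is that the "cannot contain" style lets the default profile return an empty list. An "allow" style would instead need a full list that has to track every analytic function the generator knows about.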
[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Sahil Takiar has uploaded a new patch set (#5).

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Hive does not support aggregates inside the following analytic clauses:

* AVG ... OVER
* COUNT ... OVER
* FIRST_VALUE ... OVER
* LAG ... OVER
* LAST_VALUE ... OVER
* LEAD ... OVER
* MAX ... OVER
* MIN ... OVER
* SUM ... OVER

So the following query works in Impala, but not in Hive:

SELECT LEAD(SUM(1)) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY MAX(a1.tinyint_col_3)) FROM table_1 a1;

but the following query works in both Impala and Hive:

SELECT LEAD(1) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY MAX(a1.tinyint_col_3)) FROM table_1 a1;

This patch modifies the qgen code so that it doesn't create aggregates inside the above analytic clauses when the HiveProfile is used.

A new method called get_analytic_funcs_that_cannot_contain_aggs() is added to the DefaultProfile and to the HiveProfile. The implementation in the DefaultProfile returns an empty list. The implementation in the HiveProfile returns the list of functions above.

The QueryGenerator._create_agg_or_analytic_tree() method is modified so that it checks get_analytic_funcs_that_cannot_contain_aggs() when populating the possible function types of the next child in the function tree. If it finds that any of these functions already exist in the tree, it ensures that the next child function cannot be an aggregate function.
Misc Changes:

A few miscellaneous changes were also made to address issues that came up during testing:

* Fixed a possible NPE in _create_boolean_func_tree
* Disabled ONLY_USE_EQUALITY_JOIN_PREDICATES for the HiveProfile; this should have been done in IMPALA-4101. Hive doesn't support all equality joins, so specific types need to be disabled (see IMPALA-4101 for more details)

Testing:

* Unit tests added: test_query_generator.py and test_hive_create_agg_or_analytic_tree.py
* All unit tests pass
* Tested locally against Hive
* Tested against Impala via Leopard
* Tested against Impala via the discrepancy searcher

Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
---
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_agg_or_analytic_tree.py
A tests/comparison/tests/test_query_generator.py
4 files changed, 189 insertions(+), 6 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/4581/5

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Sahil Takiar has uploaded a new patch set (#4).

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Hive does not support aggregates inside the following analytic clauses:

* AVG ... OVER
* COUNT ... OVER
* FIRST_VALUE ... OVER
* LAG ... OVER
* LAST_VALUE ... OVER
* LEAD ... OVER
* MAX ... OVER
* MIN ... OVER
* SUM ... OVER

So the following query works in Impala, but not in Hive:

SELECT LEAD(SUM(1)) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY MAX(a1.tinyint_col_3)) FROM table_1 a1;

but the following query works in both Impala and Hive:

SELECT LEAD(1) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY MAX(a1.tinyint_col_3)) FROM table_1 a1;

This patch modifies the qgen code so that it doesn't create aggregates inside the above analytic clauses when the HiveProfile is used.

A new method called get_analytic_funcs_that_cannot_contain_aggs() is added to the DefaultProfile and to the HiveProfile. The implementation in the DefaultProfile returns an empty list. The implementation in the HiveProfile returns the list of functions above.

The QueryGenerator._create_agg_or_analytic_tree() method is modified so that it checks get_analytic_funcs_that_cannot_contain_aggs() when populating the possible function types of the next child in the function tree. If it finds that any of these functions already exist in the tree, it ensures that the next child function cannot be an aggregate function.
Misc Changes:

A few miscellaneous changes were also made to address issues that came up during testing:

* Fixed a possible NPE in _create_boolean_func_tree
* Disabled ONLY_USE_EQUALITY_JOIN_PREDICATES for the HiveProfile; this should have been done in IMPALA-4101. Hive doesn't support all equality joins, so specific types need to be disabled (see IMPALA-4101 for more details)

Testing:

* Unit tests added: test_query_generator.py and test_hive_create_agg_or_analytic_tree.py
* All unit tests pass
* Tested locally against Hive
* Tested against Impala via Leopard
* Tested against Impala via the discrepancy searcher

Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
---
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_agg_or_analytic_tree.py
A tests/comparison/tests/test_query_generator.py
4 files changed, 190 insertions(+), 6 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/4581/4

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Sahil Takiar has uploaded a new patch set (#3).

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Hive does not support aggregates inside the following analytic clauses:

* AVG ... OVER
* COUNT ... OVER
* FIRST_VALUE ... OVER
* LAG ... OVER
* LAST_VALUE ... OVER
* LEAD ... OVER
* MAX ... OVER
* MIN ... OVER
* SUM ... OVER

So the following query works in Impala, but not in Hive:

SELECT LEAD(SUM(1)) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY MAX(a1.tinyint_col_3)) FROM table_1 a1;

but the following query works in both Impala and Hive:

SELECT LEAD(1) OVER (PARTITION BY MAX(a1.tinyint_col_3) ORDER BY MAX(a1.tinyint_col_3)) FROM table_1 a1;

This patch modifies the qgen code so that it doesn't create aggregates inside the above analytic clauses when the HiveProfile is used.

A new method called get_analytic_funcs_that_cannot_contain_aggs() is added to the DefaultProfile and to the HiveProfile. The implementation in the DefaultProfile returns an empty list. The implementation in the HiveProfile returns the list of functions above.

The QueryGenerator._create_agg_or_analytic_tree() method is modified so that it checks get_analytic_funcs_that_cannot_contain_aggs() when populating the possible function types of the next child in the function tree. If it finds that any of these functions already exist in the tree, it ensures that the next child function cannot be an aggregate function.
Misc Changes:

A few miscellaneous changes were also made to address issues that came up during testing:

* Fixed a possible NPE in _create_boolean_func_tree
* Disabled ONLY_USE_EQUALITY_JOIN_PREDICATES for the HiveProfile; this should have been done in IMPALA-4101. Hive doesn't support all equality joins, so specific types need to be disabled (see IMPALA-4101 for more details)

Testing:

* Unit tests added: test_query_generator.py and test_hive_create_agg_or_analytic_tree.py
* All unit tests pass
* Tested locally against Hive
* Tested against Impala via Leopard
* Tested against Impala via the discrepancy searcher

Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
---
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_agg_or_analytic_tree.py
A tests/comparison/tests/test_query_generator.py
4 files changed, 192 insertions(+), 6 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/4581/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4581/2//COMMIT_MSG
Commit Message:

Line 20:
It's still not exactly clear to me what is not allowed. Can you give an example query here that is allowed in Impala but not Hive?

http://gerrit.cloudera.org:8080/#/c/4581/2/tests/comparison/query_generator.py
File tests/comparison/query_generator.py:

PS2, Line 584: funcs
It's not clear what the funcs argument means. Can you rename it to something more descriptive and add it to the function docstring?

Line 797: child_null_arg_pools = set()
this line and the comment above it should be right above line 805

PS2, Line 803:
this should be 4 spaces

http://gerrit.cloudera.org:8080/#/c/4581/2/tests/comparison/query_profile.py
File tests/comparison/query_profile.py:

PS2, Line 693: get_analytics_cannot_contain_aggs
how about renaming this to allow_analytics_with_aggs?

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Taras Bobrovytsky
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Sahil Takiar has uploaded a new patch set (#2).

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Hive does not support aggregates inside the following analytic clauses:

* AVG ... OVER
* COUNT ... OVER
* FIRST_VALUE ... OVER
* LAG ... OVER
* LAST_VALUE ... OVER
* LEAD ... OVER
* MAX ... OVER
* MIN ... OVER
* SUM ... OVER

This patch modifies the qgen code so that it doesn't create aggregates inside the above analytic clauses when the HiveProfile is used.

A new method called get_analytics_cannot_contain_aggs() is added to the DefaultProfile and to the HiveProfile. The implementation in the DefaultProfile returns an empty list. The implementation in the HiveProfile returns the list of functions above.

The QueryGenerator._create_agg_or_analytic_tree() method is modified so that it checks get_analytics_cannot_contain_aggs() when populating the possible function types of the next child in the function tree. If it finds that any of these functions already exist in the tree, it ensures that the next child function cannot be an aggregate function.
Misc Changes:

A few miscellaneous changes were also made to address issues that came up during testing:

* Fixed a possible NPE in _create_boolean_func_tree
* Disabled ONLY_USE_EQUALITY_JOIN_PREDICATES for the HiveProfile; this should have been done in IMPALA-4101. Hive doesn't support all equality joins, so specific types need to be disabled (see IMPALA-4101 for more details)

Testing:

* Unit tests added: test_query_generator.py and test_hive_create_agg_or_analytic_tree.py
* All unit tests pass
* Tested locally against Hive
* Tested against Impala via Leopard
* Tested against Impala via the discrepancy searcher

Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
---
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_agg_or_analytic_tree.py
A tests/comparison/tests/test_query_generator.py
4 files changed, 188 insertions(+), 6 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/4581/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses
Sahil Takiar has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4581

Change subject: IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

IMPALA-4232: qgen: Hive does not support aggregates inside specific analytic clauses

Hive does not support aggregates inside the following analytic clauses:

* AVG ... OVER
* COUNT ... OVER
* FIRST_VALUE ... OVER
* LAG ... OVER
* LAST_VALUE ... OVER
* LEAD ... OVER
* MAX ... OVER
* MIN ... OVER
* SUM ... OVER

This patch modifies the qgen code so that it doesn't create aggregates inside the above analytic clauses when the HiveProfile is used.

A new method called get_analytics_cannot_contain_aggs() is added to the DefaultProfile and to the HiveProfile. The implementation in the DefaultProfile returns an empty list. The implementation in the HiveProfile returns the list of functions above.

The QueryGenerator._create_agg_or_analytic_tree() method is modified so that it checks get_analytics_cannot_contain_aggs() when populating the possible function types of the next child in the function tree. If it finds that any of these functions already exist in the tree, it ensures that the next child function cannot be an aggregate function.
Misc Changes:

A few miscellaneous changes were also made to address issues that came up during testing:

* Fixed a possible NPE in _create_boolean_func_tree
* Fixed a possible NPE in _populate_func_with_vals
* Disabled ONLY_USE_EQUALITY_JOIN_PREDICATES for the HiveProfile; this should have been done in IMPALA-4101. Hive doesn't support all equality joins, so specific types need to be disabled (see IMPALA-4101 for more details)

Testing:

* Unit tests added: test_query_generator.py and test_hive_create_agg_or_analytic_tree.py
* All unit tests pass
* Tested locally against Hive
* Tested against Impala via Leopard
* Tested against Impala via the discrepancy searcher

Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
---
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_agg_or_analytic_tree.py
A tests/comparison/tests/test_query_generator.py
4 files changed, 192 insertions(+), 9 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/4581/1

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie1096c4cde7ea52a52b39e31cd93242da53b549f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sahil Takiar