[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@285 PS3, Line 285: [['round_v1','dround_v1'], > We're not keeping decimal_v1 around indefinitely though are we? * Yes, we want to remove decimal_v1. * The alias does not solve the limitations of our V2 functions. * For users that do not want to switch to V2 the V1 workaround is acceptable. * For users who want to switch to V2 it seems weird to tell them to use a V1-semantics function as workaround, i.e., they'd get inconsistent rounding behavior in their workload or even within the same query. I think it's better to be clear that it doesn't work in V2. -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 15 Nov 2017 23:25:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@285 PS3, Line 285: [['round_v1','dround_v1'], > If users rely on that behavior they can use DECIMAL_V1. Users typically aren't happy with workarounds that involve changing their queries. -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 15 Nov 2017 22:45:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@285 PS3, Line 285: [['round_v1','dround_v1'], > Might as well add it now if we're going to add it. If users rely on that behavior they can use DECIMAL_V1. Allowing non-constant 2nd args in DECIMAL_V2 is really a new feature. If we want to that then imo we should focus on fixing the current functions to allow that in a separate patch. -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 15 Nov 2017 22:41:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3
Alex Behm has abandoned this change. ( http://gerrit.cloudera.org:8080/8426 ) Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3 .. Abandoned It's sad but I couldn't make this change work reliably. Data loading frequently fails in Hive due to underreplicated HDFS blocks. Abandoning to reflect reality. -- To view, visit http://gerrit.cloudera.org:8080/8426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 Gerrit-Change-Number: 8426 Gerrit-PatchSet: 3 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6183: Fix Decimal to Double conversion
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8547 ) Change subject: IMPALA-6183: Fix Decimal to Double conversion .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9bf81d039e5037f22c64a32b328832235aafe9e3 Gerrit-Change-Number: 8547 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 14 Nov 2017 22:28:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalence class computation in FE .. Patch Set 4: (24 comments) http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91 PS3, Line 91: /** > I think If the predicate transfer is the starting point to consider whether Fair point. I was thinking of a definition like this: Slot A has a value transfer to slot B if for all rows containing both slots slot B has the same value as slot A or the tuple containing slot B is NULL. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@102 PS4, Line 102: * Slot value transfer: Slot value transfers: http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@105 PS4, Line 105: * Its symmetric closure is a equivalence relation. Its equivalence class is called slot What does "Its" refer to here? I think it's easier to state less formally: Each slot is contained in exactly one equivalence class. A slot equivalence class is a set of slots where each pair of slots has a mutual value transfer. Equivalence classes are assigned an arbitrary id to distinguish them from another. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1527 PS4, Line 1527: // select * from (select A.a, B.b from A left join B on A.a=B.b) v where b is null to drive it home even further use "B.col is null" as the predicate to show that the NULL-checking predicate is unrelated to the slots participating in the equivalence http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1648 PS4, Line 1648: // A map from equivalence class IDs to equivalence classese. The equivalence classes typo: classes http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1813 PS4, Line 1813:* Returns a map of slot equivalence classes on the set of slots in given tuples. the given tuples http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1838 PS4, Line 1838:* propagation. Each mapping assigns every slot in srcSids to slot in destTid which has Each mapping assigns every slot in srcSids to a slot in destTid which has a value transfer from srcSid. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1989 PS4, Line 1989: + "\n" + tc + "Condensed Graph:\n" + condensedTc); move the first "\n" to previous line http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1998 PS4, Line 1998: // transform equality predicates into a transfer graph remove (pretty clear from function comment) http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@691 PS4, Line 691: interface SlotRefComparator { Move this into SlotRef since it's SlotRef specific? http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@696 PS4, Line 696:* Returns if this expr matches 'that'. Two exprs match if: Returns true if this expr matches 'that' http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@699 PS4, Line 699:* 2. For every pair of corresponding SlotRef, slotRefCmp.matches returns true. For every pair of corresponding SlotRefs, slotRefCmp.matches() returns true. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@700 PS4, Line 700:* 3. For every pair of corresponding non-SlotRef exprs, localEquals returns true. localEquals() (we generally use () for function names to make it clear we are referring to a function) http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@719 PS4, Line 719: if (fn_ == null && that.fn_ == null) return true; I think we should separate matches() and localEquals() more cleanly. I think localEquals() should be non-abstract and have a default implementation that checks the type and fn_ of this and that. Subclasses can override and call super.localEquals() first.
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 14: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 14 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 13 Nov 2017 23:28:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Turns on client connections when local catalog initialized. .. Patch Set 24: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8202 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Gerrit-Change-Number: 8202 Gerrit-PatchSet: 24 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 13 Nov 2017 17:51:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 14: Code-Review+1 (1 comment) I'm happy with this change. Nice work. http://gerrit.cloudera.org:8080/#/c/7793/14/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/14/be/src/util/min-max-filter.cc@166 PS14, Line 166: // For integer filter types, add a GetCastIntMinMax() that return the min/max values for Sorry I didn't realize this was already in the .h file. Better to remove this comment in favor of the one in .h -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 14 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 10 Nov 2017 05:54:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8498 ) Change subject: IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I04bb76d20541fa035d88167b593d1b8bc3873e89 Gerrit-Change-Number: 8498 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Fri, 10 Nov 2017 01:14:47 + Gerrit-HasComments: No
[Impala-ASF-CR] Remove unused/defunct Maven repositories.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8497 ) Change subject: Remove unused/defunct Maven repositories. .. Patch Set 1: Code-Review+2 Nice, thanks -- To view, visit http://gerrit.cloudera.org:8080/8497 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79eb6c483561726c7cbaf86874001f1979128720 Gerrit-Change-Number: 8497 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 09 Nov 2017 00:31:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8498 ) Change subject: IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I04bb76d20541fa035d88167b593d1b8bc3873e89 Gerrit-Change-Number: 8498 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Thu, 09 Nov 2017 00:29:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8498 ) Change subject: IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8498/2/tests/custom_cluster/test_always_false_filter.py File tests/custom_cluster/test_always_false_filter.py: http://gerrit.cloudera.org:8080/#/c/8498/2/tests/custom_cluster/test_always_false_filter.py@45 PS2, Line 45: cursor.fetchall() comment why we need to fetchall here -- To view, visit http://gerrit.cloudera.org:8080/8498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I04bb76d20541fa035d88167b593d1b8bc3873e89 Gerrit-Change-Number: 8498 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Thu, 09 Nov 2017 00:23:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8506 ) Change subject: IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc327266cfb8b5c05efec97348528cea6904bb20 Gerrit-Change-Number: 8506 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Thu, 09 Nov 2017 00:04:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 13: (7 comments) Looks pretty good to me http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/exec/kudu-scanner.cc@200 PS13, Line 200: if (!filter->GetCastIntMinMax(col_type, _min, _max)) { Unfortunate that that every Kudu client has to do this. We should consider filing a JIRA against Kudu to address perform this logic on their side. http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc@139 PS13, Line 139: return std::numeric_limits::max(); using std::numeric_limits; at the top if this file http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc@165 PS13, Line 165: #define NUMERIC_MIN_MAX_FILTER_CAST(NAME) \ Brief comment especially about the return value would be good. http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@604 PS13, Line 604: // must be a SlotRef pointing to a column. Wc can allow implicit integer casts typo: We can allow implicit integer casts http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@605 PS13, Line 605: // by casting the min/max value before sending them to Kudu. min/max values http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test File testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test: http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@98 PS13, Line 98: where a.tinyint_col = b.int_col Let's make the min/max filter selective, e.g. by adding where b.int_col in (0,1) or something like that http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@103 PS13, Line 103: # The min/max values in the filter are outside the range representable for the target col. Let's also add a non-selective case where the min/max values fall outside the range of the target column, something like: select STRAIGHT_JOIN count(*) from alltypes a join [BROADCAST] (values(min_int() x), (max_int())) v where a.tinyint_col = v.x -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 13 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 09 Nov 2017 00:02:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. Patch Set 3: Code-Review+1 (4 comments) I'm happy with the FE changes. Would be good to get a second pair of eyes on the BE portion. http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc File be/src/exprs/decimal-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc@111 PS1, Line 111: DCHECK(!scale.is_null); > Added a test. Agree it's good to be consistent, but it might bite users when we switch DECIMAL_V2 to the default (their queries that have non-constant args in round() will not analyze). We should be sure to clearly document this new behavior/limitation. http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@284 PS1, Line 284: # TODO: Remove when DECIMAL_V1 is removed. > Because what if someone typed in "select dround(x)". We need to be able to Good point. I was thinking dround() vs round() in toSql() doesn't matter that much, but probably it's better to preserve the existing behavior like you suggest. http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@289 PS3, Line 289: [['round','dround','round_v1','dround_v1'], This function is used both decimal_v1/v2 modes. http://gerrit.cloudera.org:8080/#/c/8398/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: http://gerrit.cloudera.org:8080/#/c/8398/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2439 PS3, Line 2439: AnalyzesOk("select round(cast(1.123 as double), int_col) from functional.alltypes", add a positive test with a non-NULL constant second argument to both v1/v2 -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Comment-Date: Tue, 07 Nov 2017 01:26:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc@1052 PS3, Line 1052: ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L); > I made the change in PS4 Oops, missed that, sorry! -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 06 Nov 2017 23:16:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 06 Nov 2017 23:16:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 ) Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278 PS3, Line 278: lock_guard l(subscribers_lock_); : lock_guard t(topic_lock_); > IMO, we shouldn't use spinlock for topic_lock_ since we can potentially do My general opinion on this is "if it ain't broke, don't fix it". Do we have any evidence that these locks are a problem, or is this a case of premature optimization? In any case, this is not the main focus of this patch, so let's avoid creeping in unrelated changes. I vote for reverting. If we have evidence that a mutex is not good here, then let's change that in a separate patch. -- To view, visit http://gerrit.cloudera.org:8080/8449 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65 Gerrit-Change-Number: 8449 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Mon, 06 Nov 2017 22:05:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc@1052 PS3, Line 1052: ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L); > At this point the query has been removed from all tracking structures so I My understanding is that archiving can take a long time (e.g. huge profile), so it means this metric may be inaccurate for an extended period. I agree it probably doesn't matter that much, but why not try to make the counter as accurate as possible. There is a non-trivial amount of work to be done at this point. -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 06 Nov 2017 21:54:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@993 PS2, Line 993: RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, cause)); > The only reason CancelInternal() can fail is if the query isn't registered Good point http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc@1052 PS3, Line 1052: ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L); Why keep it above ArchiveQuery()? Not your change, but the function comment on ArchiveQuery seems wrong and confusing with decrementing the metric here. -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 04 Nov 2017 04:23:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8322 ) Change subject: IMPALA-1422: support a constant on LHS of IN predicates. .. Patch Set 6: (12 comments) Some high-level comments before I dig deeper. http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@94 PS2, Line 94: // TODO: derived slot refs (e.g., star-expanded) will not have rawPath set. Why can't this be addressed in this patch? http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@287 PS2, Line 287:* rewritten, null is returned. If 'inPred' is rewritten, the resulting expression ... and the RHS is a subquery. http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@299 PS2, Line 299:* 2) Predicate is NOT IN and RHS returns a single row. What if the RHS subquery is correlated? http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319 PS2, Line 319:* NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL THEN C ELSE e END) = C) Is the transformation even correct if RHS is correlated? http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@312 PS6, Line 312:*a) No group by or analytic function in subquery. Ideally, we should not have to distinguish between cases 3a and 3b, and we should always do this rewrite: C NOT IN (RHS) Rewrites to: NOT EXISTS (SELECT 1 FROM (RHS) v WHERE C IS NULL OR IFNULL(v.e, C) = C) My understanding is that such a rewrite dues not work for case 3a when there is correlation, so we use an alternate rewrite that does not require a new inline view. I think we should reorganize the comments here to reflect this line of thinking and the limitations. For example, something like this: 3) Predicate is NOT IN and RHS returns multiple rows. General rewrite: a) No group by or analytic b) Group by or analytic function in subquery http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@316 PS6, Line 316:*REWRITE: Use IFNULL() instead of CASE to simplify. http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319 PS6, Line 319:* NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL THEN C ELSE e END) = C) I think there's another case where this rewrite is not correct: when the subquery has an order by + limit. Basically, we are reimplementing the "predicate push down" correctness checks here. I'm wondering whether we should reduce the scope of this patch and only allow those case where the generic rewrite works. In a separate JIRA we should consider whether we can make the generic rewrite also work for case 3a, and if not what to do instead. What do you think? http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@321 PS6, Line 321:*Example: The following query gives me an IllegalStateException: explain select id from functional.alltypes t1 where 10 not in (select id from functional.alltypestiny order by id limit 2); Existing bug or related to this change? http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@328 PS6, Line 328:*Note: it useful to think of C NOT IN (RHS) as the boolean expansion: it is useful http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340 PS6, Line 340:*b) Group by or analytic function in subquery. extra space after "by" http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@345 PS6, Line 345:* NOT EXISTS (SELECT x FROM (RHS) t WHERE C = t or t IS NOT NULL) What does C=t mean here? "t" is a table alias and cannot be compared to a C It's not immediately clear to me why the rewrite here is equivalent to the generic rewrite described in 3a. Please explain (see my suggestion above). Alternatively, I'm also ok if you want to just use the generic rewrite here. http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@354 PS6, Line 354:* - Assumes that subquery is uncorrelated,
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@993 PS2, Line 993: RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, cause)); What if we hit an error here? Need to still maintain the metric. http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@1008 PS2, Line 1008: ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L); To be most accurate, shouldn't we move this to the end of this function? There's still some non-trivial work in thus function after this point. http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json@477 PS2, Line 477: "label": "Queries Registered", Imo we need to be careful with terminology. Our query states are already hard to reason about for users. How do the queries displayed in our "/queries" page relate to "registered" queries? I know the answer, but users might not due to confusion with what "registered", "in-flight", and "waiting-to-be-closed" mean. I don't really have a good alternative to "registered", but we should at least add in the description that "registered" refers to "in-flight" and "waiting-to-be-closed" queries from the "/queries" page. -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 23:51:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 11: (7 comments) http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27 PS9, Line 27: > RUNTIME_FILTER_WAIT_TIME_MS applies the same to bloom and min-max - its the Thanks for explaining. Can you add this text into a Google Doc for us to keep track of the evolution/meaning of these query options? No need to polish, just put it somewhere. I think the new types of filters will affect our query options in non-trivial ways and we should come up with a plan that minimizes user confusion, adding new options, and deprecating options. http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41 PS9, Line 41: > Those results are posted in the review comments. I can include them here as Thanks. No need to add here. http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202 PS9, Line 202: // Maximum number of bloom runtime filters allowed per query > There's basically two reasons for this: Thanks. Please add this text to a Google Doc for tracking the evolution of these query options. Even though the min/max filters are smaller and bounded in size, I think extreme queries with a large number of joins and join conditions can still cause havoc. Keep in mind that we now allow an *unbounded* number of such filters. Crazy queries will happen. We should not hold this patch, but the lack of safeguards is concerning to me and we should continue thinking. http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift@103 PS9, Line 103: 12: optional string kudu_col_name case sensitive? http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@359 PS11, Line 359: public Operator getJoinOp() { return exprCmpOp_; } getExprCmpOp() http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@602 PS11, Line 602: if (!(targetExpr instanceof SlotRef) I think only explicit casts are problematic. Implicit casts should be ok, or am I missing something? http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test: http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test@144 PS11, Line 144: on a.month = cast(b.month + 1 as int); Was this your change? Why the change? -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 11 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 03 Nov 2017 23:22:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Correct log line in start-impala-cluster.py.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8432 ) Change subject: Correct log line in start-impala-cluster.py. .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8432 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e Gerrit-Change-Number: 8432 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 02 Nov 2017 22:21:43 + Gerrit-HasComments: No
[Impala-ASF-CR] Correct log line in start-impala-cluster.py.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8432 ) Change subject: Correct log line in start-impala-cluster.py. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py@398 PS1, Line 398: executors = options.cluster_size - options.num_coordinators Could executors be negative due to the same bug? -- To view, visit http://gerrit.cloudera.org:8080/8432 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e Gerrit-Change-Number: 8432 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Wed, 01 Nov 2017 23:58:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 9: (14 comments) http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@13 PS9, Line 13: In RuntimeFilterGenerator in the planner, each partitioned hash join ... each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those ... http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27 PS9, Line 27: For now, min-max filters are only applied at the KuduScanner, which Not specific to the code changes, and I don't expect a response here (probably too long :)). How do the existing query options around runtime filters affect the new min/max filters on Kudu? For example, what does DISABLE_ROW_RUNTIME_FILTERING mean for the Kudu min/max filters? How should users think about setting: RUNTIME_FILTER_WAIT_TIME_MS In particular, are min/max filters more effective against Kudu PK or partition columns? http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41 PS9, Line 41: Perf Testing: Contrived extreme queries are good data points, but how about running the TPCH/DS perf suites against Kudu? http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@44 PS9, Line 44: - Ran a contrived query with a filter that does eliminate any rows does not eliminate http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202 PS9, Line 202: // Maximum number of bloom runtime filters allowed per query I think I understand why you did this, but it seems confusing from a user's perspective. Ok to leave, but do you have a story around the eventual meaning of existing query options when HDFS can do min/max and Kudu can do bloom? http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/PlanNode.java@730 PS9, Line 730: output.append(Joiner.on(", ").join(filtersStr) + "\n"); just return the string? don't think we need the 'output' StringBuilder http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@111 PS9, Line 111: private final Operator joinOp_; Let's call this cmpOp_ or exprCmpOp_ or something else because "joinOp_" usually indicates the join type like left outer, right outer, etc. http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@170 PS9, Line 170: SlotRef slotRef = expr.unwrapSlotRef(false); Add a comment stating that the validity of this is checked elsewhere (and where exactly) http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@368 PS9, Line 368: if (node instanceof HdfsScanNode && type_ != TRuntimeFilterType.BLOOM) { I feel like these checks belong in the caller. Having an addTarget() function be a co-op in some cases seems difficult to reason about. Can be cleaned with a isValidTarget() helper function. http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@463 PS9, Line 463: // We only enforce a limit on the number of bloom filters as they are much more This seems really confusing for users. I'm ok with checking in this version, and let's discuss separately. http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test@1248 PS9, Line 1248: | runtime filters: RF004 <- o_orderkey, RF005 <- o_clerk To me the re-numbering is a little strange. We can think about how to address this in a follow-on change. I'm thinking that ideally users should be able to quickly determine the number of runtime filters based on the max RF id, so we could assign the real RF id lazily instead of eagerly. http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test File testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test:
[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalence class computation in FE .. Patch Set 3: (62 comments) http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@281 PS3, Line 281: return new FunctionCallExpr("if", ifParams); Let's please avoid code changes unrelated to the purpose of this patch as much as reasonable. Cleanup is nice in general, but this patch is already complex and huge so let's not add anything extra. Such changes can also make backporting more difficult due to conflicts, so cleanup should be done in a separate change. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91 PS3, Line 91: * Slot A has value transfer to slot B if a predicate on A can be applied to B in most We need a precise definition. The original definition was more precise. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@93 PS3, Line 93: * It is a reflexive, transitive binary relation on the set of slots. Not sure this part adds value. What's the significance of these statements with respect to our use of value transfers for planning purposes? If we don't make use of these terms/properties anywhere else in the code, then we should remove these statements. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@277 PS3, Line 277: // The SCC-condensed graph representation of value transfer relationship. The SCC-condensed graph representation of all slot value transfers. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@278 PS3, Line 278: SccCondensedGraph valueTransferGraph; public/private? http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1146 PS3, Line 1146:* Create and register an auxiliary predicate to express an mutual value transfer a mutual value transfer http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1467 PS3, Line 1467:* by replacing the slots of a source predicate with slots of the destTid, if for each how about: if every source slot has a value transfer to a slot in destTid http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1513 PS3, Line 1513: // its referenced tuples are NULL. For example: Let's simplify the example and make it as clear as possible: select * from (select A.a, B.b from A left join B on A.a=B.b) v where b is null http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1618 PS3, Line 1618:* For each slot equivalence class, adds/removes predicates from conjuncts such that it Need a definition of equivalence class in the Analyzer class comment. You do mention the term "equivalence class" but I don't think it has the same meaning of the "equivalence class" terminology used here. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1636 PS3, Line 1636: // A map from equivalence class ids to slot equivalence classes containing on slots Garbled sentence, please clean up http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1956 PS3, Line 1956:* Compute the value transfer graph based on the registered value transfer relation the registered value transfers http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1973 PS3, Line 1973: String condensedTc = globalState_.valueTransferGraph.print(); missing space in string http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1982 PS3, Line 1982:* Populate the value transfer relation from eq conjuncts to graph 'g'. Add value-transfer edges to 'g' based on the registered equi-join conjuncts. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2059 PS3, Line 2059:* Returns the equivalence class of slotID. of the given slot id http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2062 PS3, Line 2062: public List getEquivClass(SlotId slot) { slot -> sid
[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8426 ) Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3 .. Patch Set 3: I am running into MAPREDUCE-6441 with this patch precisely during data load. -- To view, visit http://gerrit.cloudera.org:8080/8426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 Gerrit-Change-Number: 8426 Gerrit-PatchSet: 3 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 31 Oct 2017 21:15:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Turns on client connections when local catalog initialized. .. Patch Set 19: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8202 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Gerrit-Change-Number: 8202 Gerrit-PatchSet: 19 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 31 Oct 2017 19:06:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Turns on client connections when local catalog initialized. .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@1014 PS16, Line 1014: /// Flag that records if backend and/or client services have been started. > they're on back-to-back lines now in impala_server.cc so I think we're fine Ahh right, I wrote this comment before seeing the other file :). lgtm -- To view, visit http://gerrit.cloudera.org:8080/8202 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Gerrit-Change-Number: 8202 Gerrit-PatchSet: 16 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 31 Oct 2017 19:06:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Turns on client connections when local catalog initialized. .. Patch Set 18: Code-Review+1 (3 comments) I'm happy with this change, but would be great to get another pair of eyes on it. Dan, can you take a look? http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@98 PS17, Line 98: ///- Start ImpalaInternalService API > currently, the main flips this bit-- its the one orchestrating the sequence sounds good http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@1014 PS16, Line 1014: /// an accurate expiration time, and this structure guarantees that we will always > I think the thing I'm objecting to here is that the internal state will dep Works for me. To make the relationship between 'services_started_' and 'is_ready' clear, I think we should add a DCHECK right before setting 'is_ready' that asserts 'services_started_' must be true. http://gerrit.cloudera.org:8080/#/c/8202/18/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8202/18/be/src/service/impala-server.cc@2050 PS18, Line 2050: I just set to true? -- To view, visit http://gerrit.cloudera.org:8080/8202 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Gerrit-Change-Number: 8202 Gerrit-PatchSet: 18 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 31 Oct 2017 18:51:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8426 ) Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3 .. Patch Set 3: I'm still testing this change. With this change we seem to be more prone to hitting the Hive/mapred local executor race (MAPREDUCE-6441 MAPREDUCE-6992). Will sync with PhilZ to see how to proceed. -- To view, visit http://gerrit.cloudera.org:8080/8426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 Gerrit-Change-Number: 8426 Gerrit-PatchSet: 3 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Comment-Date: Tue, 31 Oct 2017 18:29:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8426 ) Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8426/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8426/2//COMMIT_MSG@9 PS2, Line 9: and HBase configuration of our > Update? Done -- To view, visit http://gerrit.cloudera.org:8080/8426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 Gerrit-Change-Number: 8426 Gerrit-PatchSet: 2 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Comment-Date: Tue, 31 Oct 2017 18:26:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3
Hello Bharath Vissapragada, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8426 to look at the new patch set (#3). Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3 .. IMPALA-3887: Use dfs.namenode.replication.min=3 Changes the HDFS config of our mini-cluster to use: dfs.namenode.replication.min=3 Several of our tests including planner tests expect HDFS blocks to have exactly three replicas. With the default HDFS configuration of dfs.namenode.replication.min=1, HDFS acks writes as soon as the min replication is satisfied. This can lead to test failures because HDFS replication is racing to replicate blocks while our tests are executing and expecting to see all replicas. A more detailed description is in the JIRA. Testing: - A core/hdfs ran passed - A core/hdfs run with data loading passed Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 --- M fe/src/test/resources/hbase-site.xml.template M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl 2 files changed, 21 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/8426/3 -- To view, visit http://gerrit.cloudera.org:8080/8426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 Gerrit-Change-Number: 8426 Gerrit-PatchSet: 3 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada
[Impala-ASF-CR] IMPALA-6127: Fix timeout in TestRuntimeFilter.test wait time
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8427 ) Change subject: IMPALA-6127: Fix timeout in TestRuntimeFilter.test_wait_time .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8427 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee005bee8e0a535ce59d2e23e56be6004f2eb9de Gerrit-Change-Number: 8427 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Tue, 31 Oct 2017 16:22:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Turns on client connections when local catalog initialized. .. Patch Set 17: (7 comments) http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@81 PS17, Line 81: /// whitespace http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@85 PS17, Line 85: /// by clients at the same time. Might want to weave in something like this to motivate the specific startup sequence: The Impala server is considered 'ready' iff it can successfully process requests in all of its roles. The goal is make the state of the service easy to reason about. http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@87 PS17, Line 87: /// 1. Init Init() http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@89 PS17, Line 89: ///- Wait (indefinitelt) for local catalog to be initialized from statestore (if coordinator) typo: indefinitely http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@93 PS17, Line 93: /// 2. Start Start() http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@98 PS17, Line 98: /// Membership callback thread: When is the 'is_ready' metric set? http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@1014 PS16, Line 1014: ExpirationQueue queries_by_timestamp_; > clarified the comment. Fair point. My preference would be to use 'is_ready' to avoid redundant state. If metrics are seen as a view on the internal state (which I agree with!), then 'is_ready' should report the value of this 'services_started_' flag and not store a separate value. -- To view, visit http://gerrit.cloudera.org:8080/8202 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Gerrit-Change-Number: 8202 Gerrit-PatchSet: 17 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 31 Oct 2017 16:21:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8426 ) Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8426/1/fe/src/test/resources/hbase-site.xml.template File fe/src/test/resources/hbase-site.xml.template: http://gerrit.cloudera.org:8080/#/c/8426/1/fe/src/test/resources/hbase-site.xml.template@38 PS1, Line 38: : : dfs.namenode.replication.min : 3 : > Good question, let me give it a try. I tried without the min replication here and it works fine. Removed it. -- To view, visit http://gerrit.cloudera.org:8080/8426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 Gerrit-Change-Number: 8426 Gerrit-PatchSet: 1 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Comment-Date: Tue, 31 Oct 2017 15:57:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3
Hello Bharath Vissapragada, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8426 to look at the new patch set (#2). Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3 .. IMPALA-3887: Use dfs.namenode.replication.min=3 Changes the HDFS and HBase configuration of our mini-cluster to use: dfs.namenode.replication.min=3 Several of our tests including planner tests expect HDFS blocks to have exactly three replicas. With the default HDFS configuration of dfs.namenode.replication.min=1, HDFS acks writes as soon as the min replication is satisfied. This can lead to test failures because HDFS replication is racing to replicate blocks while our tests are executing and expecting to see all replicas. A more detailed description is in the JIRA. Testing: - A core/hdfs ran passed - A core/hdfs run with data loading passed Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 --- M fe/src/test/resources/hbase-site.xml.template M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl 2 files changed, 21 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/8426/2 -- To view, visit http://gerrit.cloudera.org:8080/8426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 Gerrit-Change-Number: 8426 Gerrit-PatchSet: 2 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada
[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8426 ) Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3 .. Patch Set 1: (1 comment) Don't have many data points regarding the effects on build+test runtimes. The master-core tests have pretty high variance. A quick examination showed times ranging roughly from 4:50 to 5:20. The same job with this patch ran in 5:12 so at least it seems within the same ballpark. I'll do an exhaustive test run and report numbers on that. http://gerrit.cloudera.org:8080/#/c/8426/1/fe/src/test/resources/hbase-site.xml.template File fe/src/test/resources/hbase-site.xml.template: http://gerrit.cloudera.org:8080/#/c/8426/1/fe/src/test/resources/hbase-site.xml.template@38 PS1, Line 38: : : dfs.namenode.replication.min : 3 : > Isn't this redundant? Per my understanding, this only goes into HDFS servic Good question, let me give it a try. I originally tried without modifying hbase-site at all and that didn't work, maybe setting dfs.replication is enough. I agree that setting the min replication here seems weird/unnecessary. -- To view, visit http://gerrit.cloudera.org:8080/8426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 Gerrit-Change-Number: 8426 Gerrit-PatchSet: 1 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Comment-Date: Tue, 31 Oct 2017 05:27:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Turns on client connections when local catalog initialized. .. Patch Set 16: (9 comments) Nice, this looks much better http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@105 PS16, Line 105: /// TODO: The state of a running query is currently not cleaned up if the Let's document the startup procedure and the reasoning behind it here http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@1014 PS16, Line 1014: /// Flag that records if backend and/or client services have been started. Clarify the meaning of "and/or" here. The distinction seems important. Will this flag be set if only one service has been started or not? I know I suggested this new flag :), but now I'm wondering whether we can use the existing "is_ready" metric for this purpose. http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@1017 PS16, Line 1017: boost::mutex services_started_lock_; std::atomic_bool instead of this lock? http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.cc@1626 PS16, Line 1626: // Register this backend only if services have been started. Feels clearer to me to check this at the caller. Can be hard to reason about functions that are no-ops depending on internal state. Any reason to have the check in here? In any case we should add a function comment for the new behavior (ideally in the membership callback assuming you agree to move this check) http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.cc@1933 PS16, Line 1933: Status ImpalaServer::Init(int32_t thrift_be_port, int32_t beeswax_port, Why reformat fn args? Seemed ok the way it was. http://gerrit.cloudera.org:8080/#/c/8202/16/tests/custom_cluster/test_catalog_wait.py File tests/custom_cluster/test_catalog_wait.py: http://gerrit.cloudera.org:8080/#/c/8202/16/tests/custom_cluster/test_catalog_wait.py@24 PS16, Line 24: """Impalad must wait for the catalog prior to opening up client ports. Specify the waiting condition more precisely. An impalad must wait for the initial catalog update to arrive via the statestore. http://gerrit.cloudera.org:8080/#/c/8202/16/tests/custom_cluster/test_catalog_wait.py@54 PS16, Line 54: self.expect_connection(self.cluster.impalads[0]) Ideally we'd also check the internal service http://gerrit.cloudera.org:8080/#/c/8202/16/tests/custom_cluster/test_catalog_wait.py@59 PS16, Line 59: def test_query_subset(self): Can we combine this test wit the one above? They seem very similar http://gerrit.cloudera.org:8080/#/c/8202/16/tests/custom_cluster/test_catalog_wait.py@71 PS16, Line 71: self.cluster.impalads[0].service.get_metric_value('impala-server.ready', 1); I'm wondering whether this can be flaky. We often use self.wait_for_metric_value() in case we expect delays. These impalads might still be waiting for the initial catalog update. -- To view, visit http://gerrit.cloudera.org:8080/8202 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Gerrit-Change-Number: 8202 Gerrit-PatchSet: 16 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 31 Oct 2017 05:11:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3
Alex Behm has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8426 Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3 .. IMPALA-3887: Use dfs.namenode.replication.min=3 Changes the HDFS and HBase configuration of our mini-cluster to use: dfs.namenode.replication.min=3 Several of our tests including planner tests expect HDFS blocks to have exactly three replicas. With the default HDFS configuration of dfs.namenode.replication.min=1, HDFS acks writes as soon as the min replication is satisfied. This can lead to test failures because HDFS replication is racing to replicate blocks while our tests are executing and expecting to see all replicas. A more detailed description is in the JIRA. Testing: - A core/hdfs ran passed - A core/hdfs run with data loading passed Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 --- M fe/src/test/resources/hbase-site.xml.template M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl 2 files changed, 26 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/8426/1 -- To view, visit http://gerrit.cloudera.org:8080/8426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 Gerrit-Change-Number: 8426 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-6124: Fix alter table ddl updates and test
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8411 ) Change subject: IMPALA-6124: Fix alter table ddl updates and test .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8411/4/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/8411/4/tests/metadata/test_last_ddl_time_update.py@25 PS4, Line 25: class TestLastDdlTimeUpdate(ImpalaTestSuite): > Does this test still make sense? There's no code on our side that modifies Sorry, I was thinking ahead. Nvm this comment. -- To view, visit http://gerrit.cloudera.org:8080/8411 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3126252e7709304d3e1fa4bb06a0b847180bd6cf Gerrit-Change-Number: 8411 Gerrit-PatchSet: 4 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 30 Oct 2017 23:43:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6124: Fix alter table ddl updates and test
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8411 ) Change subject: IMPALA-6124: Fix alter table ddl updates and test .. Patch Set 6: Code-Review+1 (1 comment) Lgtm. Bharath, please do the +2 http://gerrit.cloudera.org:8080/#/c/8411/4/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/8411/4/tests/metadata/test_last_ddl_time_update.py@25 PS4, Line 25: class TestLastDdlTimeUpdate(ImpalaTestSuite): Does this test still make sense? There's no code on our side that modifies the lastDdlTime -- To view, visit http://gerrit.cloudera.org:8080/8411 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3126252e7709304d3e1fa4bb06a0b847180bd6cf Gerrit-Change-Number: 8411 Gerrit-PatchSet: 6 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 30 Oct 2017 23:42:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6118: Fix assertion failure in coordinator bloom filter updating
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8410 ) Change subject: IMPALA-6118: Fix assertion failure in coordinator bloom filter updating .. Patch Set 3: Code-Review+2 Seems fine, std::string is weird -- To view, visit http://gerrit.cloudera.org:8080/8410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia28f9eb8d0b3e795d798949a1cfcfeaaec7040ed Gerrit-Change-Number: 8410 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 30 Oct 2017 23:33:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Turns on client connections when local catalog initialized. .. Patch Set 15: (8 comments) http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/frontend.cc File be/src/service/frontend.cc: http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/frontend.cc@31 PS15, Line 31: #ifndef NDEBUG What's the harm in always compiling it in? http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/frontend.cc@256 PS15, Line 256: if (result) return true; return result? http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impala-server.h@125 PS15, Line 125: /// Initializes client RPC services. Must be called after InitInternalServices. Is a we typically say InitInternalServices() to make it clear it's a function http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impala-server.cc@1541 PS15, Line 1541: // Additionally wait for the local catalog to be initialized if the server is a Suggest rephrasing "wait" to clarify. "wait" could be misinterpreted as blocking. Maybe say something like: Announce the availability of this impalad coordinator once the local catalog has been initialized. http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impalad-main.cc File be/src/service/impalad-main.cc: http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impalad-main.cc@91 PS15, Line 91: Status status = impala_server->StartInternalServices(); As discussed, ordering is not quite right yet. It might make sense to add flags for client_services_started and internal_services_started to address the race between opening internal/client ports and announcing the availability of this daemon through the membership callback. http://gerrit.cloudera.org:8080/#/c/8202/15/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/8202/15/bin/start-impala-cluster.py@81 PS15, Line 81: # For testing: list of comma-separated delays, in milliseconds, that delay catalog clarify which catalog exactly http://gerrit.cloudera.org:8080/#/c/8202/15/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8202/15/fe/src/main/java/org/apache/impala/service/Frontend.java@877 PS15, Line 877: } add a final LOG.info() here declaring success (and remove the one in impala-server) http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/test/java/org/apache/impala/service/FrontendTest.java File fe/src/test/java/org/apache/impala/service/FrontendTest.java: http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/test/java/org/apache/impala/service/FrontendTest.java@a69 PS12, Line 69: > I was a bit surprised to see authorization configs here-- what was the inte Sentry can configured in a service mode or with a file-based auth policy. I suppose this test was checking that the readiness check works regardless of auth policy. Not sure how much sense that makes. For auth we use AuthorizationTest. I'm thinking this test doesn't make sense anymore. -- To view, visit http://gerrit.cloudera.org:8080/8202 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Gerrit-Change-Number: 8202 Gerrit-PatchSet: 15 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 30 Oct 2017 21:36:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6126: ASAN detects heap-use-after-free in thrift-server-test
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8412 ) Change subject: IMPALA-6126: ASAN detects heap-use-after-free in thrift-server-test .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8412/1/be/src/rpc/thrift-server-test.cc File be/src/rpc/thrift-server-test.cc: http://gerrit.cloudera.org:8080/#/c/8412/1/be/src/rpc/thrift-server-test.cc@109 PS1, Line 109: string current_executable_path; Brief comment why this needs to be here. Alternative: static member in the ThriftParamsTest class? -- To view, visit http://gerrit.cloudera.org:8080/8412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2cd434757de2cd384def5b360a479e51812a Gerrit-Change-Number: 8412 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 30 Oct 2017 16:41:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6124: Make assertion in ddl update test resilient to long runtime
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8411 ) Change subject: IMPALA-6124: Make assertion in ddl update test resilient to long runtime .. Patch Set 1: Code-Review+2 lgtm once you complete testing -- To view, visit http://gerrit.cloudera.org:8080/8411 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3126252e7709304d3e1fa4bb06a0b847180bd6cf Gerrit-Change-Number: 8411 Gerrit-PatchSet: 1 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Sat, 28 Oct 2017 17:00:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6118: Fix assertion failure in coordinator bloom filter updating
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8410 ) Change subject: IMPALA-6118: Fix assertion failure in coordinator bloom filter updating .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8410/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8410/1/be/src/runtime/coordinator.cc@1180 PS1, Line 1180: bloom_filter_.directory.clear(); I think the bigger problem here is actually that the memory of the string is not freed (meaning we have a big chunk of untracked memory!). You could create a new string on the stack and move() the contents into that string, so the memory is freed. Then I think the old assumption of 0 capacity is valid again. -- To view, visit http://gerrit.cloudera.org:8080/8410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia28f9eb8d0b3e795d798949a1cfcfeaaec7040ed Gerrit-Change-Number: 8410 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Sat, 28 Oct 2017 04:41:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6123: Fix column order of a query test in test inline view limit
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8409 ) Change subject: IMPALA-6123: Fix column order of a query test in test_inline_view_limit .. Patch Set 2: Any idea how this test succeeded in the pre-commit tests? -- To view, visit http://gerrit.cloudera.org:8080/8409 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11667872b8788a8b0040bf9252bf07b987b5d330 Gerrit-Change-Number: 8409 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Oct 2017 04:01:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6123: Fix column order of a query test in test inline view limit
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8409 ) Change subject: IMPALA-6123: Fix column order of a query test in test_inline_view_limit .. Patch Set 2: Actually nvm, I see it only fails i exhaustive. Makes sense. -- To view, visit http://gerrit.cloudera.org:8080/8409 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11667872b8788a8b0040bf9252bf07b987b5d330 Gerrit-Change-Number: 8409 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Oct 2017 04:02:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. Patch Set 1: (13 comments) http://gerrit.cloudera.org:8080/#/c/8398/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8398/1//COMMIT_MSG@16 PS1, Line 16: This is implemented by introducing a "hack" where we make the parser How about we say rewrite or translation instead of hack? I don't think it's really a terrible hack - it's a minimal change. http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc File be/src/exprs/decimal-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc@111 PS1, Line 111: DCHECK(!scale.is_null); Add a FE test case in AnalyzeExprsTest.java that makes sure this is enforced. The existing round() functions that take a DECIMAL argument must have a non-NULL, constant second argument. New tests should be added to make sure that is also enforced for your new round() function. There's also a subtle change in behavior we should consider. The existing round() function accepts non-constant arguments, should the new round function() also allow that or not? http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc@116 PS1, Line 116: bool ovf = false; overflow http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions.h File be/src/exprs/decimal-functions.h: http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions.h@58 PS1, Line 58: FunctionContext*, const DoubleVal&, const IntVal&); add arg names for consistency http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@284 PS1, Line 284: [['round_v1','dround_v1'], Why keep the dround_v1 alias? http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@286 PS1, Line 286: [['round','dround'], 'DECIMAL', ['DOUBLE', 'INT'], 'impala::DecimalFunctions::RoundTo'], Can you organize these and comment on whether these are used in v1/v2 or both? Seems confusing now. http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@287 PS1, Line 287: [['round','dround','round_v1','dround_v1'], Why keep the dround_v1 alias? http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@403 PS1, Line 403: [['round','dround','round_v1','dround_v1'], Why keep the dround_v1 alias in these? http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/cup/sql-parser.cup@70 PS1, Line 70: private TQueryOptions queryOptions; add TODO to remove when decimal v1 is gone, maybe come up with a specific thing you can grep for later in the code, e.g. DECIMAL_V1 and use that consistently in the TODOs for removal http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@100 PS1, Line 100: FunctionCallExpr functionCallExpr = new FunctionCallExpr(fnName, params); newline http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@111 PS1, Line 111: // nullif(x, y) -> if(x DISTINCT FROM y, x, NULL) newline http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@385 PS1, Line 385:* This can only be called for functions that return wildcard decimals and the first Is this comment still accurate? http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@407 PS1, Line 407: // The situtation where none of the parameters are decimal, but the return type is * Shrink comment to: None of the parameters are decimal but the return type is decimal. * It's not clear to me why we'd use (38,0) in this case, can you explain? For example, round(double, int) does not have decimal args. Shouldn't the return type be (38,X) where X is the value of the second arg if it is a constant? -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Fri, 27 Oct 2017
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 13 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 26 Oct 2017 17:06:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Alex Behm has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Reviewed-on: http://gerrit.cloudera.org:8080/7564 Tested-by: Impala Public Jenkins Reviewed-by: Alex Behm--- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/service/fe-support.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test 12 files changed, 775 insertions(+), 108 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 14 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 12: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@288 PS10, Line 288: if (!currentTestCase.isValid()) { > Correct, every PlannerTest .test file ends with "". I've removed parse I agree. This code is really confusing and could use some cleanup - but not in this patch. I believe we already have a check similar to the one you proposed. Try creating an invalid .test file, e.g., by omitting the last "". -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 12 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 25 Oct 2017 18:39:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] DRAFT IMPALA-5185: Skip pages based on Parquet::Statistics
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7354 ) Change subject: DRAFT IMPALA-5185: Skip pages based on Parquet::Statistics .. Patch Set 3: Lars, if this patch is not actively being worked on can we abandon? -- To view, visit http://gerrit.cloudera.org:8080/7354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8eec838c5baf22167049f570dd0ef9762c5ae0a6 Gerrit-Change-Number: 7354 Gerrit-PatchSet: 3 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 25 Oct 2017 05:18:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4978 / IMPALA-5631: [DOCS] Add FQDN known issue
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7388 ) Change subject: IMPALA-4978 / IMPALA-5631: [DOCS] Add FQDN known issue .. Patch Set 2: Documenting this seems kinda important. Adding Sailesh as reviewer to help move this forward. Sailesh, feel free to find another suitable person to review this. -- To view, visit http://gerrit.cloudera.org:8080/7388 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib039d0102878f1c05470371f581cb258287b9bc0 Gerrit-Change-Number: 7388 Gerrit-PatchSet: 2 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 25 Oct 2017 05:18:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6004: Fix test row filters failure on ASAN
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8358 ) Change subject: IMPALA-6004: Fix test_row_filters failure on ASAN .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test File testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test: http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test@350 PS1, Line 350: SET RUNTIME_FILTER_WAIT_TIME_MS=$RUNTIME_FILTER_WAIT_TIME_MS; > Its not really necessary. There's one case (14) that seems to legitimately Thanks for making this change. I agree it's maybe not strictly necessary, but fine-tuning the timeout on a case-by-case basis has already caused many build failures and subsequent fixes to bump the timeouts. I'm hoping that this blanket configuration will ultimately be more stable. -- To view, visit http://gerrit.cloudera.org:8080/8358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia098735594b36a72f02bf7edd051171689618051 Gerrit-Change-Number: 8358 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 25 Oct 2017 05:05:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Prereqs for load test system testing
Alex Behm has abandoned this change. ( http://gerrit.cloudera.org:8080/5851 ) Change subject: Prereqs for load test system testing .. Abandoned Abandoning to reflect that this is not actively being worked on. -- To view, visit http://gerrit.cloudera.org:8080/5851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I38b5b4fe8c30d1c5f591bbe37b6be8bb8a5398ab Gerrit-Change-Number: 5851 Gerrit-PatchSet: 1 Gerrit-Owner: Harrison Sheinblatt
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 11: (13 comments) Looks good, final comments. http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455 PS10, Line 455: TQueryOptions options; > I've renamed the parameter. Thanks for investigating and providing an explanation! We should summarize your findings and add them to the function comment of FeSupport#NativeParseQueryOptions() to explain why we must pass in an existing TQueryOptions for this to work as intended. http://gerrit.cloudera.org:8080/#/c/7564/11/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/7564/11/be/src/service/fe-support.cc@453 PS11, Line 453: Java_org_apache_impala_service_FeSupport_NativeParseQueryOptions( Nice and clean! Thanks. http://gerrit.cloudera.org:8080/#/c/7564/11/be/src/service/fe-support.cc@454 PS11, Line 454: JNIEnv* env, jclass caller_class, jstring options_str, jbyteArray tquery_options) { rename options_str to csv_query_options for consistency http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@397 PS10, Line 397: int maxNumFilters = ctx.getQueryOptions().getMax_num_runtime_filters(); > 0 is a valid value. MAX_NUM_RUNTIME_FILTERS=0 effectively removes every run Sorry, my mistake. For some reason I didn't read the ">=" correctly. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java File fe/src/main/java/org/apache/impala/service/FeSupport.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@93 PS10, Line 93: > See my response to be/src/service/fe-support.cc L455. Yikes! What a can of worms. Thanks for investigating and explaining. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@288 PS10, Line 288: if (!currentTestCase.isValid()) { > L288-L293 handles test cases found in the middle of the test file. All PlannerTest .test files should end with "", so I think the code here in L228 is enough. I tried your patch locally using a .test file with 3 tests and the query option parsing worked as expected without L338. I might be missing something, if so, can you point me to an example file where we also need the parseQueryOptions() call in L338? http://gerrit.cloudera.org:8080/#/c/7564/11/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java: http://gerrit.cloudera.org:8080/#/c/7564/11/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@359 PS11, Line 359: " - " + e.getMessage()); also pass in 'e' as the cause http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test: http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@38 PS11, Line 38: DISTRIBUTEDPLAN What's the value in keeping this section? http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@174 PS11, Line 174: DISTRIBUTEDPLAN What's the value in keeping this section? http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@257 PS11, Line 257: DISTRIBUTEDPLAN What's the value in keeping this section? http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@392 PS11, Line 392: MAX_NUM_RUNTIME_FILTERS=4 How about setting to 3 so we can see both local and non-local filters being removed from the distributed plan http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@582 PS11, Line 582: DISTRIBUTEDPLAN What's the value in keeping this section?
[Impala-ASF-CR] IMPALA-6060: Check the return value of JNI exception handling functions
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8334 ) Change subject: IMPALA-6060: Check the return value of JNI exception handling functions .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3ed88bf8739c56a066f2402727c8204e96aa116 Gerrit-Change-Number: 8334 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Tue, 24 Oct 2017 17:24:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6060: Check the return value of JNI exception handling functions
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8334 ) Change subject: IMPALA-6060: Check the return value of JNI exception handling functions .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc File be/src/util/jni-util.cc: http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc@190 PS3, Line 190: auto msg = static_cast(env->CallStaticObjectMethod(jni_util_class(), > Same here and in other cases below: The use of auto obscures the code. This It is somewhat obvious from the static_cast, of course. Still, auto generally adds cognitive burden to the reader unless it really helps readability -- To view, visit http://gerrit.cloudera.org:8080/8334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3ed88bf8739c56a066f2402727c8204e96aa116 Gerrit-Change-Number: 8334 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Tue, 24 Oct 2017 00:42:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6060: Check the return value of JNI exception handling functions
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8334 ) Change subject: IMPALA-6060: Check the return value of JNI exception handling functions .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc File be/src/util/jni-util.cc: http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc@190 PS3, Line 190: auto msg = static_cast(env->CallStaticObjectMethod(jni_util_class(), Same here and in other cases below: The use of auto obscures the code. This is a jstring which really is not obvious and does matter. http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc@194 PS3, Line 194: auto oom_msg = Substitute(oom_msg_template, "throwableToString"); no auto please http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc@201 PS3, Line 201: auto stack = static_cast(env->CallStaticObjectMethod(jni_util_class(), no auto please http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc@205 PS3, Line 205: auto oom_msg = Substitute(oom_msg_template, "throwableToStackTrace"); no auto please -- To view, visit http://gerrit.cloudera.org:8080/8334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3ed88bf8739c56a066f2402727c8204e96aa116 Gerrit-Change-Number: 8334 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Tue, 24 Oct 2017 00:41:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8350 ) Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/8350/3/testdata/common/text_delims_table.py File testdata/common/text_delims_table.py: http://gerrit.cloudera.org:8080/#/c/8350/3/testdata/common/text_delims_table.py@60 PS3, Line 60: print ("LOAD DATA LOCAL INPATH '%s' " Mention that this is intended to be printed into one of the sections of our functional schema template. -- To view, visit http://gerrit.cloudera.org:8080/8350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 Gerrit-Change-Number: 8350 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Tue, 24 Oct 2017 00:22:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Turns on client connections when local catalog initialized. .. Patch Set 12: (10 comments) http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/frontend.h File be/src/service/frontend.h: http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/frontend.h@171 PS12, Line 171: Status WaitForCatalog(); Why even return anything? Success is implied by this function returning. http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/frontend.cc File be/src/service/frontend.cc: http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/frontend.cc@253 PS12, Line 253: JniLocalFrame jni_frame; This JNI frame stuff is not needed here because we are not creating any local references. The same is true for SetCatalogInitialized() - frame stuff not needed. http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/impala-server.cc@2045 PS12, Line 2045: if (thrift_be_server_.get()) { I wonder if this can cause problems. A coordinator+executor impalad might have registered with the statestore but not received the initial catalog update yet. This will not stop other coordinators from scheduling work on this impalad - but the InternalService port has not yet been opened, meaning those queries will fail. I think we need to strictly distinguish between the internal and client facing services. http://gerrit.cloudera.org:8080/#/c/8202/12/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/8202/12/bin/start-impala-cluster.py@81 PS12, Line 81: parser.add_option("--disable_catalog", dest="disable_catalog", default=False, disable_catalogd? We seem to be consistently referring to processes with their "d" suffix http://gerrit.cloudera.org:8080/#/c/8202/12/bin/start-impala-cluster.py@255 PS12, Line 255: def is_catalog_ready(impala_cluster): is_catalogd_up()? To distinguish the other "catalog ready" which refers to the local catalog cache of an impalad. http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/Frontend.java@865 PS12, Line 865: LOG.info("Waiting for local catalog to be initialized."); Users may not know what "initialized" means, better to state explicitly what we are waiting for, e.g.: Waiting for the first catalog update from the statestore. http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/Frontend.java@866 PS12, Line 866: int num_tries = 0; numTries http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/Frontend.java@905 PS12, Line 905: "Analyzing a query is not support when the local catalog is not initialized."); I'd chose a different phrasing because "not supported" could be misinterpreted as a missing feature or might make the user think he/she did something wrong - but hitting this is not expected and probably a bug on our side. How about: "Local catalog has not been initialized. Aborting query analysis." http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/JniFrontend.java File fe/src/main/java/org/apache/impala/service/JniFrontend.java: http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/JniFrontend.java@583 PS12, Line 583: public void setCatalogInitialized() { setCatalogIsReady() for consistency http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/test/java/org/apache/impala/service/FrontendTest.java File fe/src/test/java/org/apache/impala/service/FrontendTest.java: http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/test/java/org/apache/impala/service/FrontendTest.java@a69 PS12, Line 69: Can you think of a reason for keeping this test around? The interesting case was the "not ready" case which is now gone, so I'm not sure this test makes sense anymore. -- To view, visit http://gerrit.cloudera.org:8080/8202 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Gerrit-Change-Number: 8202 Gerrit-PatchSet: 12 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6004: Fix test row filters failure on ASAN
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8358 ) Change subject: IMPALA-6004: Fix test_row_filters failure on ASAN .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test File testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test: http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test@350 PS1, Line 350: SET RUNTIME_FILTER_WAIT_TIME_MS=10; Do you think it makes sense to have different timeouts for each different test case in this file? If not, we should consider using the same timeout everywhere in this test and set the query option based on solution like this: https://gerrit.cloudera.org/#/c/8357 -- To view, visit http://gerrit.cloudera.org:8080/8358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia098735594b36a72f02bf7edd051171689618051 Gerrit-Change-Number: 8358 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Mon, 23 Oct 2017 18:58:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6060: Check the return value of JNI exception handling functions
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8334 ) Change subject: IMPALA-6060: Check the return value of JNI exception handling functions .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc File be/src/util/jni-util.cc: http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc@193 PS1, Line 193: auto oom_msg_template = "$0 throws an unchecked throwable. JVM likely runs out of " This case of auto obscures the code. Might not be clear to everybody that this is a const char* and not a std::string. How about this err message: "$0 threw an unchecked exception. The JVM is likely out of memory (OOM)." http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc@197 PS1, Line 197: if (msg == nullptr) { > RETURN_ERROR_IF_EXC will call GetJniExceptionMsg again, which will result i You are right that RETURN_ERROR_IF_EXC() does not make sense here. I stand by my main point. We need to check for exceptions after every JNI function call, that's the standard practice and we should simply follow it here. Yes, checking for msg == nullptr might lead to the same result in practice, but that check is unnecessarily indirect. The right way to handle this case is: if (env->ExceptionOccurred()) { env->ExceptionClear(); ... } -- To view, visit http://gerrit.cloudera.org:8080/8334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3ed88bf8739c56a066f2402727c8204e96aa116 Gerrit-Change-Number: 8334 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Mon, 23 Oct 2017 18:28:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls. .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 9 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 23 Oct 2017 17:51:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8354 ) Change subject: IMPALA-6070: Parallel compute_table_stats.py .. Patch Set 2: Code-Review+1 Thanks for adding the logging. Let's MikeB do the final +2. -- To view, visit http://gerrit.cloudera.org:8080/8354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d Gerrit-Change-Number: 8354 Gerrit-PatchSet: 2 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 23 Oct 2017 17:46:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8354 ) Change subject: IMPALA-6070: Parallel compute_table_stats.py .. Patch Set 1: (2 comments) Nice http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py File tests/util/compute_table_stats.py: http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py@37 PS1, Line 37: print 'Executing: %s' % statement Do these prints come out clean or all garbled due to the parallelism? http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py@119 PS1, Line 119: compute_stats(client_factory, db_names=db_names, please add newline to separate more cleanly -- To view, visit http://gerrit.cloudera.org:8080/8354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d Gerrit-Change-Number: 8354 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Mon, 23 Oct 2017 17:09:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8350 ) Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8350/2/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/8350/2/testdata/bin/generate-schema-statements.py@528 PS2, Line 528: insert = eval_section(section['DEPENDENT_LOAD']) Are these two sections intended to be mutually exclusive? If so, then we should throw an error if both are specified. If not, in what order are they run? -- To view, visit http://gerrit.cloudera.org:8080/8350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 Gerrit-Change-Number: 8350 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Mon, 23 Oct 2017 16:58:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 8: Code-Review+1 (1 comment) Dimitris should +2 http://gerrit.cloudera.org:8080/#/c/8235/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@122 PS8, Line 122: // Limits the number of errors logged when loading huge partitioned tables. remove "huge" -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 8 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 23 Oct 2017 16:47:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6070: Parallel data load.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8320 ) Change subject: IMPALA-6070: Parallel data load. .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac Gerrit-Change-Number: 8320 Gerrit-PatchSet: 2 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Mon, 23 Oct 2017 16:41:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@221 PS7, Line 221: // are excluded because they are considered hidden from Impala's perspecitve Shrink: // Number of hidden files excluded from file metadata loading . More details at isValidDataFile(). http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@226 PS7, Line 226: // Number of files skipped while computeing the block metadata information. Shrink: // Number of files skipped from file metadata loading because the the files have not changed since the last load. More details at hasFileChanged(). http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@248 PS7, Line 248: // If set to true, reloads the block metadata only when the files in this path reloads the file metadata (let's try to use the new "file metadata" terminology consistently) http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@249 PS7, Line 249: // has changed since last load (more details in hasFileChanged()). have changed http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@785 PS7, Line 785:* much higher throughput of RPC calls for listStatus/listFiles. Based on our ... RPC calls for listStatus/listFiles. For simplicity, the filesystem type is determined based on the table's root path and not for each partition individually. Based on our experiments, S3 showed ... http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@791 PS7, Line 791:* This method is not optimized for tables with mixed partition types (partitions mapped I don'd think we need this much detail, how about folding a simple sentence into the existing paragraph above, see suggestion above. http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@794 PS7, Line 794:* This may not work well with the mixed partition types and needs to fixed. Don't think this is needed. http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@819 PS7, Line 819: // For tables without partitions, we have no block metadata to load. remove "," http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@841 PS7, Line 841: LOG.error("Encountered an error loading block metadata for table: " + Could this flood the log when HDFS is under pressure or there is an intermittent Kerberos issue? I'm thinking we should aggregate these into a single log message. Do you think having every single path helps with supportability? My feeling is it's more important to note how many tasks failed, but I'll defer to you if you think having the paths is useful. -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Sat, 21 Oct 2017 00:12:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6803: [DOCS] Clarify scope of STRAIGHT JOIN hint
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8336 ) Change subject: IMPALA-6803: [DOCS] Clarify scope of STRAIGHT_JOIN hint .. Patch Set 4: Code-Review+2 Thanks! -- To view, visit http://gerrit.cloudera.org:8080/8336 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd8e4d8e5260f5a2b7b5c52dc3269bb373b9c76f Gerrit-Change-Number: 8336 Gerrit-PatchSet: 4 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mala Chikka Kempanna Gerrit-Comment-Date: Fri, 20 Oct 2017 23:55:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6803: [DOCS] Clarify scope of STRAIGHT JOIN hint
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8336 ) Change subject: IMPALA-6803: [DOCS] Clarify scope of STRAIGHT_JOIN hint .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8336/3/docs/shared/impala_common.xml File docs/shared/impala_common.xml: http://gerrit.cloudera.org:8080/#/c/8336/3/docs/shared/impala_common.xml@2869 PS3, Line 2869: in the query block containing the hint. It does not affect the join order I suggest removing the "nested within those table references part" because WHERE-clause subqueries do not belong to "those table references" Phrasing: It does not affect the join order of any nested queries, such as views, inline views or WHERE-clause subqueries. -- To view, visit http://gerrit.cloudera.org:8080/8336 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd8e4d8e5260f5a2b7b5c52dc3269bb373b9c76f Gerrit-Change-Number: 8336 Gerrit-PatchSet: 3 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mala Chikka Kempanna Gerrit-Comment-Date: Fri, 20 Oct 2017 16:27:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 ) Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq .. Patch Set 4: (3 comments) Nice cleanup! I don't have much to say - patch makes a lot of sense to me. http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h@276 PS4, Line 276: std::vector string_slot_offsets_; What's the reason for keeping a separate vector? We also have these in the scan node's tuple descriptor. http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h@341 PS4, Line 341: /// - 'copy_out_strings': if true, strings in returned tuples that pass conjuncts are copy_strings? http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/scanner-context.cc@110 PS4, Line 110: parent_->scan_node_->num_owned_io_buffers_.Add(-1); Not sure this counter is still useful. Do you think we can get rid of it? -- To view, visit http://gerrit.cloudera.org:8080/8172 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941 Gerrit-Change-Number: 8172 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Oct 2017 05:37:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6060: Check the return value of JNI exception handling functions
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8334 ) Change subject: IMPALA-6060: Check the return value of JNI exception handling functions .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.h File be/src/util/jni-util.h: http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.h@159 PS1, Line 159: /// lives as long as this guard. jstr should not be null. Usually such a comment indicates a precondition, but in this case a nullptr is handled gracefully by returning a status. I think it's actually better to make it a precondition and use a DCHECK instead of returning a Status. http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc File be/src/util/jni-util.cc: http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc@30 PS1, Line 30: Status JniUtfCharGuard::create(JNIEnv *env, jstring jstr, JniUtfCharGuard *out) { fix pointer style while you are here: JNIEnv* env, etc. http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc@197 PS1, Line 197: if (msg == nullptr) { I think the real cause of the bug is here. The general rule is that you always need to check for exceptions after a JNI function call. I think we are missing a RETURN_ERROR_IF_EXC(env); at this point. This handles checked as well as unchecked Java exceptions, including OOM. http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc@208 PS1, Line 208: if (stack == nullptr) { same here, I think the right fix is to add a RETURN_ERROR_IF_EXC(env); -- To view, visit http://gerrit.cloudera.org:8080/8334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3ed88bf8739c56a066f2402727c8204e96aa116 Gerrit-Change-Number: 8334 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Fri, 20 Oct 2017 04:41:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4918: Support getting column comments via HS2
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8315 ) Change subject: IMPALA-4918: Support getting column comments via HS2 .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1d33dfd031b5344d7136695b623cec76143ada5c Gerrit-Change-Number: 8315 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 20 Oct 2017 04:11:15 + Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Clarify scope of STRAIGHT JOIN hint
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8336 ) Change subject: [DOCS] Clarify scope of STRAIGHT_JOIN hint .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/8336/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8336/1//COMMIT_MSG@7 PS1, Line 7: [DOCS] Clarify scope of STRAIGHT_JOIN hint IMPALA-6083 http://gerrit.cloudera.org:8080/#/c/8336/1/docs/shared/impala_common.xml File docs/shared/impala_common.xml: http://gerrit.cloudera.org:8080/#/c/8336/1/docs/shared/impala_common.xml@2869 PS1, Line 2869: in the topmost query block where the hint occurs. It does not affect the join order "topmost" is confusing. It is simply the query block in which the hint appears. How about: ... affects the join order of table references in the query block containing the hint. http://gerrit.cloudera.org:8080/#/c/8336/1/docs/shared/impala_common.xml@2870 PS1, Line 2870: of any queries nested within those table references, such as subqueries or views. of any nested queries, such as views, inline views or WHERE-clause subqueries. (generally, not all nested queries are part of a table reference) http://gerrit.cloudera.org:8080/#/c/8336/1/docs/shared/impala_common.xml@2872 PS1, Line 2872: whichever top-level or nested queries need a specific join order. apply the hint to all query blocks that need a fixed join order -- To view, visit http://gerrit.cloudera.org:8080/8336 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd8e4d8e5260f5a2b7b5c52dc3269bb373b9c76f Gerrit-Change-Number: 8336 Gerrit-PatchSet: 1 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mala Chikka Kempanna Gerrit-Comment-Date: Thu, 19 Oct 2017 19:03:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6073: Fail on misconfigured CLASSPATH.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8327 ) Change subject: IMPALA-6073: Fail on misconfigured CLASSPATH. .. Patch Set 1: (3 comments) Thanks for fixing, lgtm, only minor nits http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc File be/src/common/init.cc: http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc@178 PS1, Line 178: char *classpath = getenv("CLASSPATH"); style: char* http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc@179 PS1, Line 179: assert(classpath != NULL); Why not DCHECK and add a message http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc@182 PS1, Line 182: assert(std::string(classpath).find("*") == std::string::npos); DCHECK and include an error message -- To view, visit http://gerrit.cloudera.org:8080/8327 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d Gerrit-Change-Number: 8327 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Thu, 19 Oct 2017 03:45:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1958 PS7, Line 1958: Lists.partition(allHmsPartitionsToAdd, MAX_PARTITION_UPDATES_PER_RPC)) { While you are here, can you also change bulkAlterPartitions() to use Lists.partition()? http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@514 PS7, Line 514: def test_alter_table_create_many_partitions(self, vector, unique_database): This test doesn't belong in the TestLibCache class. Don't we have existing tests that test the multi-partition add alter? Seems like this test belongs there. http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@519 PS7, Line 519: self.client.execute('use {0}'.format(unique_database)) "use" can generally cause problems in tests because we sometimes run a test against different impalads for example to exercise sync_ddl. While I think it is ok in this specific instance, it's generally better practice not so issue "use" statements unless absolutely necessary. http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@527 PS7, Line 527: # 473 -1 0 0B NOT CACHED NOT CACHED TEXTfalse hdfs://localhost:20500/test-warehouse/test_alter_table_create_many_partitions_5b4888f5.db/t/p=473 This comment can become stale quickly. I suggest not adding it an output row verbatim but mentioning that show partitions contains the partition HDFS paths which we expect to contain p=val sub-directories that you are going to search for with the regex. -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 7 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 19 Oct 2017 03:40:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6070: Parallel data load.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8320 ) Change subject: IMPALA-6070: Parallel data load. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh File testdata/bin/run-hive-server.sh: http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75 PS1, Line 75: HADOOP_HEAPSIZE="1024" hive --service hiveserver2 > ${LOGDIR}/hive-server2.out 2>&1 & > People like Alex are those whom I was most concerned about, as I know he us :). I'm still using that good-old machine, mem should be fine (fingers crossed). -- To view, visit http://gerrit.cloudera.org:8080/8320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac Gerrit-Change-Number: 8320 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 19 Oct 2017 00:47:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6070: Parallel data load.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8320 ) Change subject: IMPALA-6070: Parallel data load. .. Patch Set 1: (2 comments) Changes like these tend to be slow and painful to test, so I'm in favor of not parallelizing additional things in this patch. Additional steps can be improved later. http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@33 PS1, Line 33: What testing did you do? Does the data load still run on a non-beefy local machine? http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh File testdata/bin/run-hive-server.sh: http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75 PS1, Line 75: HADOOP_HEAPSIZE="1024" hive --service hiveserver2 > ${LOGDIR}/hive-server2.out 2>&1 & > This looks like it will also increase HADOOP_HEAPSIZE when not doing a para I'd prefer to keep this change. Our Hive server tends to OOM pretty easily when doing anything non-trivial with Hive on our mini cluster. -- To view, visit http://gerrit.cloudera.org:8080/8320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac Gerrit-Change-Number: 8320 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 19 Oct 2017 00:07:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5976: Remove equivalent class computation in FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalent class computation in FE .. Patch Set 1: (47 comments) Very nice. First wave of comments. I have not exhaustively gone through everything yet since this is a very substantial patch. http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG@7 PS1, Line 7: IMPALA-5976: Remove equivalent class computation in FE Remove equivalence classes from FE http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a2017 PS1, Line 2017: Very satisfying to see this code deleted! http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a2890 PS1, Line 2890: This function was nice for debugging. Does your patch add a similar function that nicely prints the two graphs? http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1253 PS1, Line 1253: return !tids.isEmpty() && (tids.size() > 1 || isOjConjunct(e) || isFullOuterJoined(e) This new formatting is extremely hard to read, please reformat for readability. The old formatting was much better. http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1496 PS1, Line 1496: for (SlotId dst : getEquivSlots(srcSid)) { Unless we are absolutely convinced that the change is correct, I think we should try to preserve the semantics of the original code. My understanding is that the new getEquivSlots() returns slots that have a mutual value transfer with srcSid - which is different than what the original code did. http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1608 PS1, Line 1608:* For each equivalence class, adds/removes predicates from conjuncts such that it Refers to "equivalence class" again which is now an undefined term. If you want to use that term we should define what it means. Let's think about how to evolve the terminology. Based on my understanding of your patch it looks like you want to redefine "equivalence class" to be a set of slots where each pair of slots has a mutual value transfer. http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1626 PS1, Line 1626: // Equivalence classes only containing slots belonging to lhsTids. What does this mean? Is the Integer key the SCC id? http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1802 PS1, Line 1802: if (slotDesc.getId().asInt() >= g.numVertices()) continue; I've seen this check in a couple of places, would be nice if that could be handled by 'g' instead of all callsites. http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1952 PS1, Line 1952: public void computeValueTransfer() { computeValueTransferGraph http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1972 PS1, Line 1972: private void constructValueTransferFromEqPredicates(Graph g) { valueTransfers (plural) http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2047 PS1, Line 2047:* Returns the ids of slots that are in the same equivalence class as slotId Comment need updating. The concept of "equivalence class" does not exist anymore and we should not use that term. http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2055 PS1, Line 2055: for (int dst: g.getSccMembers(g.getSccId(slotId.asInt( { In terms of the g API I think callers should have to care about the existence of SCCs. Can we hide those details? http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2070 PS1, Line 2070: Collections.sort(result); Shouldn't the destinations already be sorted? http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java@702 PS1, Line 702: if (!localCmp.apply(Pair.create(this, that))) return false; It's very unfortunate that we have to create a new pair object to do a SlotRef comparison. We should try to avoid that generate an excessive number of objects. Expr.equals() really is called quite frequently and SlotRef is the most common Expr. Using a Java Comparator is not ideal because we don't provide
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 10: (27 comments) http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455 PS10, Line 455: jbyteArray thrift_in_query_options) { Not sure what 'thrift_in_query_options' means. How about using 'tquery_options' instead? Ideally, we would not need to pass in the existing query options. http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@457 PS10, Line 457: DeserializeThriftMsg(env, thrift_in_query_options, ); Needs error handling, e.g. using THROW_IF_ERROR_RET http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@458 PS10, Line 458: const char *o = env->GetStringUTFChars(options_str, NULL); * You need to release the string UTF chars, take a look at JniUtfCharGuard in jni-util.h * style: const char* o http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@461 PS10, Line 461: TParseQueryOptionsResult result; I think it's simpler to convert all Status to an exception. That way we don't need the TParseQueryOptionsResult at all, and the error handling is consistent (always throws an exception in case of error). http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java@119 PS10, Line 119: // Create runtime filters. 'singleNodePlan' now points to the root of the distributed Comment is wrong. 'singleNodePlan' definitely does not point to the root of the distributed plan. Use rootFragment.getPlanRoot() instead http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@354 PS10, Line 354: public void addTarget(RuntimeFilterTarget target) { targets_.add(target); } add newline before http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@397 PS10, Line 397: Preconditions.checkState(maxNumFilters >= 0); Why is 0 not a valid value? http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@423 PS10, Line 423: filter.computeNdvEstimate(); My understanding is that IMPALA-3450 has been fixed and we can remove this code. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@537 PS10, Line 537:* The assigned filters are the ones for which 'scanNode' can be used a destination can be used as a destination node http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@539 PS10, Line 539:* 1. If DISABLE_ROW_RUNTIME_FILTERING query option is set, a filter is only assigned to If the DISABLE_ROW_RUNTIME_FILTERING query option is set ... http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@541 PS10, Line 541:* 2. If RUNTIME_FILTER_MODE query option is set to LOCAL, a filter is only assigned to If the RUNTIME_FILTER_MODE query option is set ... http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@542 PS10, Line 542:*'scanNode' if the filter is local to the scan node. This doesn't really explain what the LOCAL option means. How about: ... a filter is only assigned to 'scanNode' if the filter is produced within the same fragment that contains the scan node. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@563 PS10, Line 563: if (!isSingleNodeExec && runtimeFilterMode == TRuntimeFilterMode.LOCAL && Why the !isSingleNodeExec condition? I think that !isLocalTarget implies !isSingleNodeExec. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@576 PS10, Line 576: static private boolean IsLocalTarget(RuntimeFilter filter, ScanNode targetNode) { isLocalTarget() http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@583 PS10, Line 583: static private boolean IsBoundByPartitionColumns(Analyzer analyzer, Expr targetExpr, isBoundByPartitionColumns()
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 6: (15 comments) http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@38 PS6, Line 38: DEFINE_int32(max_hdfs_parts_parallel_load, 5, Let's spell out "parts" into "partitions" to avoid confusion. http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@40 PS6, Line 40: "tables. Due to HDFS architectural limitations, it is unlikely to get a linear " In the sentence "Due to HDFS architectural..." what does the "it" after the comma refer to? Better to spell it out, e.g., "the response time of block metadata loading is unlikely to improve beyond 5 threads." http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@42 PS6, Line 42: DEFINE_int32(max_s3_parts_parallel_load, 20, Does ADLS fall into this or the HDFS bucket? What about other filesystems? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@216 PS6, Line 216: private class PathStorageMdLoadingStats { Naming seems a little weird. I think "Storage" is too generic and "Block" is too specific. How about we settle on "FileMetadata", so we'd have: FileMetadataLoadStats FileMetadataLoadRequest http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@240 PS6, Line 240: // If set to true, reloads the block metadata only when the underlying file Please clarify "the underlying file" http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@413 PS6, Line 413: ++loadStats.ignoredFiles; I think we should distinguish the hidden file and already-up-to-date file cases, e.g. split ignoredFiles into 'hiddenFiles' and 'skippedFreshFiles' or similar http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773 PS6, Line 773:* Returns the thread pool size to load the block metadata of this table. Command on how we distinguish between HDFS, S3 and other filesystems (using the table root path?), and what the behavior is for mixed-filesystem tables. http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@793 PS6, Line 793: return Math.max(Math.min(numPaths, threadPoolSize), 1); For my understanding, why is the max() needed here? Is it not be a precondition that numPaths > 0 and threadPoolSize > 0? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@800 PS6, Line 800:* metadata is reloaded, else it is loaded from scratch. Difference between "reloaded" and "loaded from scratch" is not very clear. Maybe say "incrementally refreshed" vs. "reloaded from scratch"? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@804 PS6, Line 804: int numPathsToLoad = partsByPath.keySet().size(); partsByPath.size()? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@812 PS6, Line 812: for (Path p: partsByPath.keySet()) { The tasks are ordered randomly. I wonder if submitting the tasks in a clustered fashion would be better or worse. Not relevant to this patch, but might be interesting as a follow-on investigation. http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@821 PS6, Line 821: } catch (ExecutionException | InterruptedException e) { We still consider the load successful if one of these fails. What do you think the right behavior should be? Is a partially loaded table an acceptable outcome? http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@822 PS6, Line 822: LOG.error("Encountered an error loading block metadata for table: " + Let's also dump the path http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@898 PS6, Line 898:* the newly created HdfsPartitions in parallel. remove "the" http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/util/ListMap.java File fe/src/main/java/org/apache/impala/util/ListMap.java: http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/util/ListMap.java@38 PS6, Line 38: private List list_ = Collections.synchronizedList(new ArrayList()); The CC requirements and behavior are not clear to me. Why is it not sufficient to make all methods synchronized? -- To
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 15: Csaba, are those failing tests specifically require a timestamp that issues such a warning? If not, I suggest we just change the timestamps of those tests in a way that preserves test coverage and avoids the warnings problem. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 13 Oct 2017 18:16:10 + Gerrit-HasComments: No
[Impala-ASF-CR] Download toolchain in parallel.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8237 ) Change subject: Download toolchain in parallel. .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46a6088bb002402c7653dbc8257dff869afb26ec Gerrit-Change-Number: 8237 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 09 Oct 2017 21:17:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 ) Change subject: IMPALA-6011: Remove use of Guava Hasher. .. Patch Set 2: Afaik these hashes are stored in Navigator and used for finding exact duplicates or something like that. I would need to confirm how these hashes are used. The safer route for now is to not change the hashes to avoid doing more damage. -- To view, visit http://gerrit.cloudera.org:8080/8216 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I04bb436eb93a86b8db24b2a97f14047f828b Gerrit-Change-Number: 8216 Gerrit-PatchSet: 2 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 09 Oct 2017 18:03:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2636: HS2 GetTables() returns TABLE TYPE as TABLE for VIEW
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7353 ) Change subject: IMPALA-2636: HS2 GetTables() returns TABLE_TYPE as TABLE for VIEW .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90616388e6181cf342b3de389af940214ed46428 Gerrit-Change-Number: 7353 Gerrit-PatchSet: 4 Gerrit-Owner: sandeep akinapelliGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: sandeep akinapelli Gerrit-Comment-Date: Fri, 06 Oct 2017 23:07:52 + Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 ) Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS .. Patch Set 5: Code-Review+2 I'm still working with Bharath on getting diagnostics for the table sizes, but this patch looks good. -- To view, visit http://gerrit.cloudera.org:8080/7999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03 Gerrit-Change-Number: 7999 Gerrit-PatchSet: 5 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Silvius Rus Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 06 Oct 2017 23:07:13 + Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 ) Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml File docs/shared/impala_common.xml: http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1227 PS4, Line 1227: COMPUTE INCREMENTAL STATS during the lifetime of a table, (or vice versa) http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1243 PS4, Line 1243: be cached on every impalad host that is eligible to be a coordinator. as it must be cached on the catalogd and on every ... http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1244 PS4, Line 1244: If this metadata for a table exceeds 2 GB, you might experience service downtime. It's worse than that. If the aggregate metadata of *all* tables combined gets to 2GB you may experience downtime. http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_partitioning.xml File docs/topics/impala_partitioning.xml: http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_partitioning.xml@612 PS4, Line 612: as new partitions are added, Impala includes a variation of this statement that is intended for use with How about: includes a variation of this statement that allows computing statistics on a per-partition basis such that stats can be incrementally updated when new partitions are added. http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_perf_stats.xml File docs/topics/impala_perf_stats.xml: http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_perf_stats.xml@361 PS4, Line 361: COMPUTE STATS statement might take hours, or even days. For such tables, use Sorry, I disagree with the "For such tables, use COMPUTE INCREMENTAL STATS" part. I think we need to be very careful about recommending incremental stats. We can document what it does, but I think we should go out of out way to not explicitly recommend it for any reason. -- To view, visit http://gerrit.cloudera.org:8080/7999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03 Gerrit-Change-Number: 7999 Gerrit-PatchSet: 4 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Silvius Rus Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 06 Oct 2017 21:53:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 ) Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml File docs/shared/impala_common.xml: http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247 PS2, Line 1247: does not affect > fair pointer for me, but my comment is about whether this wording is clear Fine with me to expand this to add my earlier explanation of what the "incremental" part of the stats actually is (to explain why dropping the "incremental" portion is fine). http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248 PS2, Line 1248: optimizations such as partition pruning. > such as partition pruning or join ordering. Actually I would remove partition pruning because stats have nothing to do with that (and we don't want to imply that they do) -- To view, visit http://gerrit.cloudera.org:8080/7999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03 Gerrit-Change-Number: 7999 Gerrit-PatchSet: 2 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Silvius Rus Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 06 Oct 2017 17:46:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 ) Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml File docs/shared/impala_common.xml: http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228 PS2, Line 1228: DROP STATS and : DROP INCREMENTAL STATS) > are these drops required? They are not required if you *exactly* what you are doing, but that does not apply to most people. Dropping first is always safe and definitely the recommended practice. If you do not drop and switch back-and-forth between compute stats and compute incremental sats, then you may end up with "unexpected" table metadata. http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243 PS2, Line 1243: be cached on every impalad host. If this metadata for a table exceeds > more specifically, impalads that are also coordinators? Yes http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243 PS2, Line 1243: metadata for a table exceeds : 2 GB > is there a diagnostic page that we can point to here that explains how to f This is a longer story. I asked Bharath to help here. http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247 PS2, Line 1247: does not affect > does that mean lack of stats has not affect on optimization or something el Please see my explanation on what "incremental" stats is in previous patch sets. Dropping *incremental* stats has no effect on the row counts and NDVs used in query optimization. -- To view, visit http://gerrit.cloudera.org:8080/7999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03 Gerrit-Change-Number: 7999 Gerrit-PatchSet: 2 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Silvius Rus Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 06 Oct 2017 17:34:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6021: Revert "IMPALA-6009: Upgrade Guava to 14.0.1"
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8225 ) Change subject: IMPALA-6021: Revert "IMPALA-6009: Upgrade Guava to 14.0.1" .. Patch Set 1: Code-Review+2 Lesson learned: Other components should revert immediately. PhilZ is right of course :) -- To view, visit http://gerrit.cloudera.org:8080/8225 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd07d9b011ebd407d7a274725a10b6371d024186 Gerrit-Change-Number: 8225 Gerrit-PatchSet: 1 Gerrit-Owner: Michael BrownGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 06 Oct 2017 16:02:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8191 ) Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8191 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d Gerrit-Change-Number: 8191 Gerrit-PatchSet: 4 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 06 Oct 2017 15:21:55 + Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 ) Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml File docs/shared/impala_common.xml: http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1224 PS2, Line 1224: For a particular table, use either COMPUTE STATS or Yes! http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1233 PS2, Line 1233: When you run COMPUTE INCREMENTAL STATS on a table for the first time, I suggest some minor rephrasing to drive home the "don't switch mantra" a little more, see comments. http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1234 PS2, Line 1234: the statistics are computed again from scratch regardless of whether you previously ran regardless of whether the table has existing stats. http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1236 PS2, Line 1236: for scanning the entire table when switching from COMPUTE STATS to when running COMPUTE INCREMENTAL STATS for the first time on a given table. (do not mention switching... not supposed to do that) http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1244 PS2, Line 1244: 2 GB, a serious error can occur. If only a limited number of partitions are actively being If the aggregate metadata of all tables exceeds 2 GB you may experience service downtime (daemon crashes). ("serious error" really isn't clear to me) http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1245 PS2, Line 1245: added or inserted into, you can run COMPUTE INCREMENTAL STATS for the active Sorry my phrasing might have been misleading. By "active" partitions I meant those partitions that are being queried (i.e. read)... if you query some partitions very infrequently then there is no point in keeping incremental stats for them. http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248 PS2, Line 1248: optimizations such as partition pruning. such as partition pruning or join ordering. http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml File docs/topics/impala_partitioning.xml: http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@624 PS2, Line 624: subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables Need to be careful here because "large tables" could be misinterpreted to mean "tables with many partitions". I'd prefer to avoid the word "suitable" and instead use a phrasing that states it enables updating the stats as partitions are added. Whether incremental stats is "suitable" for anything is questionable because of the huge memory downside. I'd agree that incremental stats could be suitable in situations where you have a huge partitioned table with a small rolling window of "active" partitions, so you only ever need to keep incremental stats on let's say <100 partitions. http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml File docs/topics/impala_perf_stats.xml: http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361 PS2, Line 361: COMPUTE STATS statement might take hours, or even days. That situation is where you switch Rephrase to avoid "switch" since switching is bad -- To view, visit http://gerrit.cloudera.org:8080/7999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03 Gerrit-Change-Number: 7999 Gerrit-PatchSet: 2 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Silvius Rus Gerrit-Comment-Date: Fri, 06 Oct 2017 06:43:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2190: [DOCS] from timestamp() and to timestamp()
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8046 ) Change subject: IMPALA-2190: [DOCS] from_timestamp() and to_timestamp() .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8046 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5f6dfd3fb99a70975d712bbef6c05900eddadd27 Gerrit-Change-Number: 8046 Gerrit-PatchSet: 3 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Andre Calfa Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Comment-Date: Fri, 06 Oct 2017 06:16:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.
Alex Behm has abandoned this change. ( http://gerrit.cloudera.org:8080/8216 ) Change subject: IMPALA-6011: Remove use of Guava Hasher. .. Abandoned Guava does some weird stuff internally such that the hash it produces for the same query is different than the hash produced by the code in this patch. Probably safer not to pursue this. We can probably fix it, but I imagine the new code will be more complex/controversial, so I don't think it's worth the hassle Sorry for the noise. -- To view, visit http://gerrit.cloudera.org:8080/8216 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I04bb436eb93a86b8db24b2a97f14047f828b Gerrit-Change-Number: 8216 Gerrit-PatchSet: 2 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong