[Impala-ASF-CR] IMPALA-9778: Refactor HdfsPartition to be immutable
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15985 ) Change subject: IMPALA-9778: Refactor HdfsPartition to be immutable .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/15985/9/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/15985/9/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@86 PS9, Line 86: public class HdfsPartition implements FeFsPartition, PrunablePartition { is it possible to annotate tihs as @Immutable? I think that'll trigger error-prone to check to make sure it truly is immutable -- To view, visit http://gerrit.cloudera.org:8080/15985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib52e5810d01d5e0c910daacb9c98977426d3914c Gerrit-Change-Number: 15985 Gerrit-PatchSet: 10 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 05 Jun 2020 14:52:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: Sorry for repeated comments, but one more thought occurred me to me: the most effective case of min-max filter is when min == max, and thus the filter becomes an equality. An equality predicate can then serve to prune Kudu hash partitions, which isn't doable with bloom or range predicates. -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 04:12:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124 PS3, Line 124: BLOOM_MIN_MAX = 2 > Right, sortedness in kudu is based on the (compound) primary key. So, min-m BTW forgot that I took some measurements of the ApplyPredicatePrimitive performance bfeore: int8 NOT NULL (c >= 0 AND c < 2) 3152.2M evals/sec 0.88 cycles/eval int16 NOT NULL (c >= 0 AND c < 2) 3309.6M evals/sec 0.85 cycles/eval int32 NOT NULL (c >= 0 AND c < 2) 3384.0M evals/sec 0.85 cycles/eval int64 NOT NULL (c >= 0 AND c < 2) 1847.6M evals/sec 1.57 cycles/eval float NOT NULL (c >= 0 AND c < 2) 3268.3M evals/sec 0.88 cycles/eval double NOT NULL (c >= 0 AND c < 2) 2245.2M evals/sec 1.27 cycles/eval -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 03:56:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 ) Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124 PS3, Line 124: BLOOM_MIN_MAX = 2 > This is my understanding, I'll check further within Kudu team. Right, sortedness in kudu is based on the (compound) primary key. So, min-max predicates can help in the following ways: (1) eliminate range partitions -- eg a min/max on a timestamp could would help a lot if the table is timestamp-partitioned by day, whereas a bloom filter on timestamp is probably not useful since cardinality is very high (2) convert to primary key-based range scans -- if we have inequality/equality predicates on a prefix of the primary key, we can convert those predicates into a range scan on the table using b-tree style indexes. Again if you had a min-max on timestamp and timestamp were the leading portion of your key, it's a big win. Or, if you had a min-max on timestamp, a composite primary key of (entity_id, timestamp), and an equality predicate on entity_id, it would still do the PK range scan conversion for a big win. (3) normal row-by-row evaluation - this is still useful since, particularly for primitive columns, it's very fast to evaluate (SIMD-accelerated in latest release, etc). It's likely much faster to evaluate than a bloom filter which will probably have some frequency of L1 cache misses if not L2/LLC, plus the hash calculation and mixing itself which likely takes a couple cycles. So, you're right that it may not eliminate much data in some queries, but the cost might be small enough that it's worth doing even if it eliminates 10-20%. To validate this I loaded 800M rows into a table using 'kudu perf loadgen' and then compared scanning with $ :./build/latest/bin/kudu perf table_scan localhost default.loadgen_auto_0b01a9eba6d54f2ba7fa5b59e3a597c5 --columns=int_val --num-threads=10 --predicates ["AND", [">=", "int_val", 0]] vs the same without the predicate. In this case the predicate matches all rows, but there's no special short-circuit logic, so it actually evaluates it everywhere. Wall clock times: with the predicate: 1.7023 +- 0.0266 seconds time elapsed ( +- 1.56% ) without the predicate: 1.6514 +- 0.0370 seconds time elapsed ( +- 2.24% ) Also running perf record -a, I can see that about 5% of the total CPU was used on the tserver in ApplyPredicatePrimitive. Admittedly, this overhead will go up on a percentage basis after some other in-flight work we have is finished (eg more CPU-efficient integer encoding bankim's working on) but I think we also have more room to optimize ApplyPredicatePrimitive by using AVX2 instead of just SSE4 instructions when available. As for the overhead of sending to the coordinator and broadcasting, isn't this a _tiny_ amount of data? ie it's only two values, which for primitives is going to be a max of 16 bytes each, and for varlen, you could feasibly truncate to a fixed size bound even in the case of a long string. So, I'm generally in favor of sending and evaluating both. In the best case it's a huge win (eliminating range partitions) and in the worst case it's probably not a major cost. One further thought here as a future extension: we could provide an API in the Kudu scanner that says "this predicate is optional: drop it if you think it's not filtering much". For the case of these propagated predicates feeding into a join, dropping them at runtime if they are ineffective is worthwhile (I think Impala already does something like this in its scanner path, right?) -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 10 Apr 2020 03:49:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP: Asynchronous code generation
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15105 ) Change subject: WIP: Asynchronous code generation .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/15105/9/be/src/codegen/codegen-fn-ptr.h File be/src/codegen/codegen-fn-ptr.h: http://gerrit.cloudera.org:8080/#/c/15105/9/be/src/codegen/codegen-fn-ptr.h@41 PS9, Line 41: static constexpr std::memory_order mem_order_ = std::memory_order_acq_rel; just passing through: I think you have it correct here. Relaxed is insuffiicent because it would permit reordering of the writing of the code and the storing of the pointer. In other words, your compilation thread does things like: 1. allocate memory for function machine code 2. write code to allocated memory 3. publish pointer and without "release" semantics on the publishing of the pointer, there is no guarantee that #2 and #3 won't be reordered. In practice, this is really unlikely to happen, because there are probably some other locking operations in between #2 and #3 and because x86 is a TSO architecture, but worth doing it right regardless. Also worth noting that acq/rel semantics on x86 isn't expensive, they are just simple load/store operations. It only prevents compiler reordering. So, there is little to gain by using relaxed. -- To view, visit http://gerrit.cloudera.org:8080/15105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7cbfa7c6734dcf03641629429057d6a4194aa6b Gerrit-Change-Number: 15105 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Feb 2020 21:18:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9116: KUDU-2989. Work around SASL bug when FQDN is >d characters
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/14614 ) Change subject: IMPALA-9116: KUDU-2989. Work around SASL bug when FQDN is >=64 characters .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4898814f2f7ab87151798336414dde7078d28a4a Gerrit-Change-Number: 14614 Gerrit-PatchSet: 1 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 01 Nov 2019 17:17:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7506: support global INVALIDATE METADATA in local catalog mode
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/14307 ) Change subject: IMPALA-7506: support global INVALIDATE METADATA in local catalog mode .. Patch Set 8: (3 comments) I generally like this solution -- seems simpler than trying to track the full contents of the cache. Is it possible to use this solution for catalog v1 as well, and remove the CatalogObjectVersionSet thing? I think that thing is buggy/complex, maybe would be nice to just have one implementation to worry about. (ok to do that in a follow-up if it's not trivial, or if you think it's a risky change, maybe we shouldn't do it at all) http://gerrit.cloudera.org:8080/#/c/14307/8/be/src/util/impalad-metrics.h File be/src/util/impalad-metrics.h: http://gerrit.cloudera.org:8080/#/c/14307/8/be/src/util/impalad-metrics.h@119 PS8, Line 119: /// Minimal catalog object version in local catalog cache of Coordinator. this isn't actually true -- it's no longer the minimum, but rather a lower bound, right? ie it's not a tight bound like it used to be in v1. We should probably rename the metric, variables, thrift fields, etc, to reflect that. http://gerrit.cloudera.org:8080/#/c/14307/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/14307/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1587 PS8, Line 1587: lastResetStartVersion_ = startVersion; What if there are two concurrent calls to INVALIDATE METADATA? is it possible that the lastResetStartVersion_ will end up going downward, because the setting of startVersion is under a different lock acquisition from the version writelock above? http://gerrit.cloudera.org:8080/#/c/14307/8/tests/authorization/test_grant_revoke.py File tests/authorization/test_grant_revoke.py: http://gerrit.cloudera.org:8080/#/c/14307/8/tests/authorization/test_grant_revoke.py@364 PS8, Line 364: self.client.execute("grant role {0} to group foobar".format(role_name)) why were these changes necessary? -- To view, visit http://gerrit.cloudera.org:8080/14307 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib61a7ab1ffa062620ffbc2dadc34bd7a8ca9e549 Gerrit-Change-Number: 14307 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 22 Oct 2019 23:18:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9001: Fix SPNEGO for requests with no 'Authorization'
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/14352 ) Change subject: IMPALA-9001: Fix SPNEGO for requests with no 'Authorization' .. Patch Set 1: Code-Review+2 I suppose adding a test for this with the ApacheDS KDC and HttpUrlConnection is too complicated? If so, that's fine. -- To view, visit http://gerrit.cloudera.org:8080/14352 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id9b6ac99b799324ec22e95fd1eb022d5ad6f54bd Gerrit-Change-Number: 14352 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 04 Oct 2019 16:25:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8228: Ownership support for Ranger authz
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/14106 ) Change subject: IMPALA-8228: Ownership support for Ranger authz .. Patch Set 11: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/14106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I737b7164a3e7afb9996b3402e6872effd663f7b4 Gerrit-Change-Number: 14106 Gerrit-PatchSet: 11 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Austin Nobis Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 12 Sep 2019 03:32:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8228: Ownership support for Ranger authz
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/14106 ) Change subject: IMPALA-8228: Ownership support for Ranger authz .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/14106/6/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/14106/6/fe/src/main/java/org/apache/impala/service/Frontend.java@783 PS6, Line 783: TODO mind filing a JIRA and putting the number here? http://gerrit.cloudera.org:8080/#/c/14106/6/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java File fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java: http://gerrit.cloudera.org:8080/#/c/14106/6/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java@3060 PS6, Line 3060: authorize("select count(*) from functional.alltypes") nit: might make this test a little easier to read to do something like create an Map: ImmutableMap.builder() .put("select count(*) from functional.alltypes", selectError("functional.alltypes")) .put("select id from functional.alltypes", selectError("functional.alltypes")) ... .build(); and then you can loop over the queries in the before/after case below to avoid having to duplicate all the queries twice http://gerrit.cloudera.org:8080/#/c/14106/6/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/14106/6/tests/authorization/test_ranger.py@665 PS6, Line 665: # Run show tables on the db. The resulting list should be empty. This happens : # because the created table's ownership information is not aggresively cached : # by the current Catalog implementations. Hence the analysis pass does not : # have access to the ownership information to verify if the current session : # user is actually the owner. We need to fix this by caching the HMS metadata : # more aggressively when the table loads. this is true only in localcatalog, right? otherwise the 'create table' call would have fully populated the cache rather than leaving an incomplete table? http://gerrit.cloudera.org:8080/#/c/14106/6/tests/authorization/test_ranger.py@669 PS6, Line 669: We need to fix this by caching the HMS metadata : # more aggressively when the table loads. add a TODO(IMPALA-x) with jira -- To view, visit http://gerrit.cloudera.org:8080/14106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I737b7164a3e7afb9996b3402e6872effd663f7b4 Gerrit-Change-Number: 14106 Gerrit-PatchSet: 6 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Austin Nobis Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 10 Sep 2019 23:49:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-8228: Ownership support for Ranger authz
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/14106 ) Change subject: [WIP] IMPALA-8228: Ownership support for Ranger authz .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2673 PS2, Line 2673: FeTable table = getTable(fqTableName.getDb(), fqTableName.getTbl()); we seem to have lost the "checkNotNull' here? was that intentional? http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java File fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java: http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java@138 PS2, Line 138: tableName_ this could be null in the case of non-table-specific statements, which seems like it would fail http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java@189 PS2, Line 189: dbName this should be 'database_' right? http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/authorization/AuthorizableFactory.java File fe/src/main/java/org/apache/impala/authorization/AuthorizableFactory.java: http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/authorization/AuthorizableFactory.java@35 PS2, Line 35: String ownerUser mind adding @Nullable annotations here and below, if this is allowed to be null to indicate no known owner? http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/authorization/AuthorizableTable.java File fe/src/main/java/org/apache/impala/authorization/AuthorizableTable.java: http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/authorization/AuthorizableTable.java@30 PS2, Line 30: private final String ownerUser_; how about using @Nullable on this? http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/authorization/AuthorizableTable.java@37 PS2, Line 37: ownerUser_ = ownerUser; would an empty owner string be valid? maybe we should be checking that it's not empty? http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/authorization/AuthorizableTable.java@55 PS2, Line 55: public String getOwnerUser() { return ownerUser_; } this is @Override right? http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java File fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java: http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java@71 PS2, Line 71: onTable mind giving this a more explicit name like 'onTableWithUnknownOwner' or something? think that's better than just overloading, so it's clear that when you have an owner you should use a different call. Or, get rid of this overload and explicitly pass the null owner at call sites http://gerrit.cloudera.org:8080/#/c/14106/2/fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java@74 PS2, Line 74: // TableNotFound Analsis exceptions and instead mask that as an typo -- To view, visit http://gerrit.cloudera.org:8080/14106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I737b7164a3e7afb9996b3402e6872effd663f7b4 Gerrit-Change-Number: 14106 Gerrit-PatchSet: 2 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Austin Nobis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 20 Aug 2019 17:25:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8869: Fix handling of HTTP keep-alive when returning 401
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/14076 ) Change subject: IMPALA-8869: Fix handling of HTTP keep-alive when returning 401 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d5f80dbcde5b623a1d0586b5d763a062dd21afa Gerrit-Change-Number: 14076 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 20 Aug 2019 02:53:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13932 ) Change subject: IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor .. Patch Set 2: I'm out the next two weeks on PTO, so probably best if someone else reviews this -- To view, visit http://gerrit.cloudera.org:8080/13932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce Gerrit-Change-Number: 13932 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 26 Jul 2019 23:26:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8783: Add Kerberos SPNEGO support to the http hs2 server
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13918 ) Change subject: IMPALA-8783: Add Kerberos SPNEGO support to the http hs2 server .. Patch Set 7: Code-Review+2 k, seems like this is good. Just to triple check, I think you should run one manual test where you do add some code locally like: if (!token.empty()) { token[5]++; } and make sure that it doesn't allow connecting :) -- To view, visit http://gerrit.cloudera.org:8080/13918 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15d9a842ab37ebc34b9fde5917137ff2961d870a Gerrit-Change-Number: 13918 Gerrit-PatchSet: 7 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 26 Jul 2019 23:18:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8783: Add Kerberos SPNEGO support to the http hs2 server
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13918 ) Change subject: IMPALA-8783: Add Kerberos SPNEGO support to the http hs2 server .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/13918/6/be/src/transport/THttpServer.cpp File be/src/transport/THttpServer.cpp: http://gerrit.cloudera.org:8080/#/c/13918/6/be/src/transport/THttpServer.cpp@184 PS6, Line 184: if (has_ldap_ && (!got_negotiate_auth || !has_kerberos_)) { shouldn't this be conditioned on got_basic_auth? otherwise a non-authenticated request is going to count as a basic_auth_failure http://gerrit.cloudera.org:8080/#/c/13918/6/be/src/transport/THttpServer.cpp@196 PS6, Line 196: is_complete) { : if (metrics_enabled_) combine these two? http://gerrit.cloudera.org:8080/#/c/13918/6/be/src/transport/THttpServer.cpp@200 PS6, Line 200: total_negotiate_auth_failure_ shouldn't we only increment this if there was a token present? the normal flow is that someone will connect, send no auth header, and get a 401. We don't want to count this 401 as a failed authentication -- To view, visit http://gerrit.cloudera.org:8080/13918 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15d9a842ab37ebc34b9fde5917137ff2961d870a Gerrit-Change-Number: 13918 Gerrit-PatchSet: 6 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 26 Jul 2019 22:41:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8783: Add Kerberos SPNEGO support to the http hs2 server
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13918 ) Change subject: IMPALA-8783: Add Kerberos SPNEGO support to the http hs2 server .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/13918/1/be/src/transport/THttpServer.cpp File be/src/transport/THttpServer.cpp: http://gerrit.cloudera.org:8080/#/c/13918/1/be/src/transport/THttpServer.cpp@187 PS1, Line 187: bool is_complete; Similar question to the earlier comment -- isn't it possible for SpnegoStep to return "OK" but not "complete"? In that case, it would be responding with a token, and you'd want to be sending the token back with the Unauthorized response? http://gerrit.cloudera.org:8080/#/c/13918/3/be/src/transport/THttpServer.cpp File be/src/transport/THttpServer.cpp: http://gerrit.cloudera.org:8080/#/c/13918/3/be/src/transport/THttpServer.cpp@199 PS3, Line 199: call the auth function with an empty string to allow them to : // generate return headers. shouldn't they have generated the return header from the bad token? ie i htink the SPNEGO spec includes the possibility of a multi-step authentication, where the client sends a token, the server responds with a different token (but not authorized). Is that not the case? -- To view, visit http://gerrit.cloudera.org:8080/13918 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15d9a842ab37ebc34b9fde5917137ff2961d870a Gerrit-Change-Number: 13918 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 26 Jul 2019 20:39:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8783: Add Kerberos SPNEGO support to the http hs2 server
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13918 ) Change subject: IMPALA-8783: Add Kerberos SPNEGO support to the http hs2 server .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/13918/1/be/src/rpc/authentication.cc File be/src/rpc/authentication.cc: http://gerrit.cloudera.org:8080/#/c/13918/1/be/src/rpc/authentication.cc@565 PS1, Line 565: connection_context->return_headers; nit: std::move() around this http://gerrit.cloudera.org:8080/#/c/13918/1/be/src/transport/THttpServer.cpp File be/src/transport/THttpServer.cpp: http://gerrit.cloudera.org:8080/#/c/13918/1/be/src/transport/THttpServer.cpp@256 PS1, Line 256: h << "WWW-Authenticate: Negotiate" << CRLF; shouldn't we be using a Negotiate header which includes the token returned by gssapi here? I guess this will work for certain configurations where the server's initial challenge is empty, but I think there might be cases where this isn't the case. -- To view, visit http://gerrit.cloudera.org:8080/13918 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15d9a842ab37ebc34b9fde5917137ff2961d870a Gerrit-Change-Number: 13918 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 25 Jul 2019 17:47:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Support SPNEGO for Impala webserver
Hello Thomas Tauber-Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13774 to look at the new patch set (#4). Change subject: Support SPNEGO for Impala webserver .. Support SPNEGO for Impala webserver This ports over changes from kudu commit 1f291b77ef0868ac888a850678adc2d7cce65529 which implemented SPNEGO for the Kudu webserver. Unfortunately, thorough testing of this is difficult given that curl isn't currently in the toolchain. I was able to manually test this by adding a 'sleep(1000)' call into the newly added test case, then setting up $KRB5_CONFIG in my shell to point to the temporary KDC's environment, and using 'curl -u : --negotiate http://...' to authenticate. Strangely, using the version of curl on el7 didn't seem to work properly (perhaps an el7 curl bug) but using curl on my Ubuntu 18 laptop I was able to authenticate with SPNEGO. Change-Id: Ife2b04310e1571d231bf8ee1bcfd3b7afc2edd8f --- M be/src/exec/kudu-util.h M be/src/gutil/strings/escaping.cc M be/src/util/CMakeLists.txt A be/src/util/kudu-status-util.h M be/src/util/webserver-test.cc M be/src/util/webserver.cc M be/src/util/webserver.h 7 files changed, 267 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/13774/4 -- To view, visit http://gerrit.cloudera.org:8080/13774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ife2b04310e1571d231bf8ee1bcfd3b7afc2edd8f Gerrit-Change-Number: 13774 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] Support SPNEGO for Impala webserver
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13774 ) Change subject: Support SPNEGO for Impala webserver .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/13774/1/be/src/util/kudu-status-util.h File be/src/util/kudu-status-util.h: http://gerrit.cloudera.org:8080/#/c/13774/1/be/src/util/kudu-status-util.h@26 PS1, Line 26: \ > Might as well fix the formatting while you're here Done http://gerrit.cloudera.org:8080/#/c/13774/1/be/src/util/webserver.cc File be/src/util/webserver.cc: http://gerrit.cloudera.org:8080/#/c/13774/1/be/src/util/webserver.cc@203 PS1, Line 203: // is invalid, a bad Status will be returned (and the other out-parameters left untouched). > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/13774/1/be/src/util/webserver.cc@214 PS1, Line 214: RETURN_NOT_OK(kudu::gssapi::SpnegoStep(neg_token, _token_b64, _complete, authn_user)); > line too long (96 > 90) Done http://gerrit.cloudera.org:8080/#/c/13774/1/be/src/util/webserver.cc@370 PS1, Line 370: We assume that security::InitKerberosForServer() I don't think the --principal is actually necessary -- the way SPNEGO works is that the client provides a principal, and the server will look in the configured keytab for any matching principal. This is what allows users to have multiple DNS names, and clients can connect to them via any name (useful for L4 load balancing or CNAMEs to work, for example). If we don't have KRB5_KTNAME set at this point, it'll be null, and we'll get the expected bad Status below. Does that seem right? We actually do have some test case for this on the equivalent Kudu code, but it wasn't really easy to port it over to Impala's tests. > What about the opposite case - where --principal is set but > --webserver_require_spnego=false? Right now I think we just silently allow > this, but users may find it surprising that setting up kerberos doesn't > automatically secure the webserver. Maybe worth logging a warning in that > case? The problem with this is that, while SPNEGO is a standard, it's a huge pain in the ass for users to set up, and in fact I think even on secure clusters, most users don't enable SPNEGO on their web UIs. If we started WARNING about being insecure if you don't enable SPNEGO I think people will get pretty grouchy. That said, this is still useful because we'll use this to allow access via Knox Trusted Proxy support, which uses SPNEGO to authenticate the Knox service to Impala (and Knox itself gets authentication via pluggable modules, including SSO integration) -- To view, visit http://gerrit.cloudera.org:8080/13774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife2b04310e1571d231bf8ee1bcfd3b7afc2edd8f Gerrit-Change-Number: 13774 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 10 Jul 2019 04:45:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Support SPNEGO for Impala webserver
Hello Thomas Tauber-Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13774 to look at the new patch set (#2). Change subject: Support SPNEGO for Impala webserver .. Support SPNEGO for Impala webserver This ports over changes from kudu commit 1f291b77ef0868ac888a850678adc2d7cce65529 which implemented SPNEGO for the Kudu webserver. Unfortunately, thorough testing of this is difficult given that curl isn't currently in the toolchain. I was able to manually test this by adding a 'sleep(1000)' call into the newly added test case, then setting up $KRB5_CONFIG in my shell to point to the temporary KDC's environment, and using 'curl -u : --negotiate http://...' to authenticate. Strangely, using the version of curl on el7 didn't seem to work properly (perhaps an el7 curl bug) but using curl on my Ubuntu 18 laptop I was able to authenticate with SPNEGO. Change-Id: Ife2b04310e1571d231bf8ee1bcfd3b7afc2edd8f --- M be/src/exec/kudu-util.h M be/src/gutil/strings/escaping.cc M be/src/util/CMakeLists.txt A be/src/util/kudu-status-util.h M be/src/util/webserver-test.cc M be/src/util/webserver.cc M be/src/util/webserver.h 7 files changed, 265 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/13774/2 -- To view, visit http://gerrit.cloudera.org:8080/13774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ife2b04310e1571d231bf8ee1bcfd3b7afc2edd8f Gerrit-Change-Number: 13774 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13818 ) Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy() .. Patch Set 1: Code-Review+2 (1 comment) +2 pending one issue with the DCHECK http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.inline.h File be/src/rpc/rpc-mgr.inline.h: http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.inline.h@44 PS1, Line 44: DCHECK(!IsResolvedAddress(MakeNetworkAddress(hostname))) << hostname << "\n" I don't think this DCHECK is valid -- it's OK to run things in an environment without DNS, if you give the hosts principals like impala/1.2.3.4. -- To view, visit http://gerrit.cloudera.org:8080/13818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0 Gerrit-Change-Number: 13818 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 09 Jul 2019 03:33:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13665 ) Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 08 Jul 2019 22:52:02 + Gerrit-HasComments: No
[Impala-ASF-CR] build: use thin static archives
Hello Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13775 to look at the new patch set (#2). Change subject: build: use thin static archives .. build: use thin static archives This enables thin static archives for our internal libraries during the build. This makes linking much faster since the static archives just point to object files instead of copying them. This reduces the size of intermediate '.a' files for my debug build from about 1.4GB to 58MB. Incremental rebuilds are slightly sped up, though maybe in the realm of noise (likely the performance improvement depends on how much RAM is available for buffer caching the IO): Without patch incremental build of catalogd after modifying CatalogObjects.thrift: real0m53.433s user7m6.400s sys 1m21.610s With patch: real0m44.772s user7m6.632s sys 1m16.870s Change-Id: I3b54b5be8658c951914758c406cca55d4cc1756e --- M CMakeLists.txt 1 file changed, 9 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/13775/2 -- To view, visit http://gerrit.cloudera.org:8080/13775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3b54b5be8658c951914758c406cca55d4cc1756e Gerrit-Change-Number: 13775 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] build: use thin static archives
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13775 ) Change subject: build: use thin static archives .. Patch Set 1: > Patch Set 1: > > I tried this out and I couldn't tell a difference when doing buildall.sh > -skiptests. Can you include a measurement of what got faster? hmm, it should make the generated .a files much smaller and thus link faster. Let me run some tests, maybe I put this in the wrong spot in the CMakeLists.txt file (I copied it from Kudu where we saw a speedup but will admit I didn't verify it in Impala) -- To view, visit http://gerrit.cloudera.org:8080/13775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3b54b5be8658c951914758c406cca55d4cc1756e Gerrit-Change-Number: 13775 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 02 Jul 2019 19:17:45 + Gerrit-HasComments: No
[Impala-ASF-CR] Support SPNEGO for Impala webserver
Todd Lipcon has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13774 Change subject: Support SPNEGO for Impala webserver .. Support SPNEGO for Impala webserver This ports over changes from kudu commit 1f291b77ef0868ac888a850678adc2d7cce65529 which implemented SPNEGO for the Kudu webserver. Unfortunately, thorough testing of this is difficult given that curl isn't currently in the toolchain. I was able to manually test this by adding a 'sleep(1000)' call into the newly added test case, then setting up $KRB5_CONFIG in my shell to point to the temporary KDC's environment, and using 'curl -u : --negotiate http://...' to authenticate. Strangely, using the version of curl on el7 didn't seem to work properly (perhaps an el7 curl bug) but using curl on my Ubuntu 18 laptop I was able to authenticate with SPNEGO. Change-Id: Ife2b04310e1571d231bf8ee1bcfd3b7afc2edd8f --- M be/src/exec/kudu-util.h M be/src/gutil/strings/escaping.cc M be/src/util/CMakeLists.txt A be/src/util/kudu-status-util.h M be/src/util/webserver-test.cc M be/src/util/webserver.cc M be/src/util/webserver.h 7 files changed, 263 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/13774/1 -- To view, visit http://gerrit.cloudera.org:8080/13774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ife2b04310e1571d231bf8ee1bcfd3b7afc2edd8f Gerrit-Change-Number: 13774 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon
[Impala-ASF-CR] build: use thin static archives
Todd Lipcon has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13775 Change subject: build: use thin static archives .. build: use thin static archives This enables thin static archives for our internal libraries during the build. This makes linking much faster since the static archives just point to object files instead of copying them. Change-Id: I3b54b5be8658c951914758c406cca55d4cc1756e --- M CMakeLists.txt 1 file changed, 9 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/13775/1 -- To view, visit http://gerrit.cloudera.org:8080/13775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3b54b5be8658c951914758c406cca55d4cc1756e Gerrit-Change-Number: 13775 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon
[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13665 ) Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories .. Patch Set 11: (3 comments) http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@545 PS8, Line 545: :* If 'recursive' is true, all > Actually, I added this line in the doc since I noticed that the RecursingIt hrm, were the semantics already broken then? See the unit test AcidUtilsTest.testAcidStateNoBase: assertFiltering(new String[]{ "base_01.txt", "post_upgrade.txt", "delta_05_05_/", "delta_05_05_/lmn.txt", "base_10/", "delta_012_012_/", "delta_012_012_/_0", "delta_012_012_/_1"}, "", // writeIdList that accepts all transactions as valid new String[]{ // Post upgrade files are ignored if there is a valid base. "delta_012_012_/_0", "delta_012_012_/_1"}); ie here the empty base directory has important semantics. Maybe we don't have a valid e2e test case to reproduce this yet because of this Hive bug: https://issues.apache.org/jira/browse/HIVE-21750 but it certainly seems like this is an issue. Let's file a JIRA to make sure we validate this case before releasing this feature. http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@731 PS11, Line 731: // we use listFiles API to natively recurse on S3 for performance reasons, so we Are we really gaining much from using this RecursiveIteratorWithFilter class for the S3 case now? It seems like this is getting to be a bit of spaghetti, with this being named "Recursive" but having a flag that makes it non-recursive. What about just having a separate class like 'FilteringIterator' which wraps RemoteIterator and does filtering, and keep this one as always-recursive, so the code path is easier to follow? http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@736 PS11, Line 736: LOG.info("Ignoring {} since its contained in a ignored directory", This probably shouldn't be at INFO level -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 11 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 01 Jul 2019 22:40:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13768 ) Change subject: Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95 .. Patch Set 3: Code-Review+2 ps2 fixes the whitespace, ps3 is just a rebase. Carrying +2 -- To view, visit http://gerrit.cloudera.org:8080/13768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I17f90561e0ea6b0917fff51b055225060a4fa549 Gerrit-Change-Number: 13768 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 01 Jul 2019 21:18:23 + Gerrit-HasComments: No
[Impala-ASF-CR] Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95
Hello Michael Ho, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13768 to look at the new patch set (#2). Change subject: Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95 .. Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95 This updates to the latest build of squeasel, which fixes a few small issues and also improves support for keepalive. This patch doesn't itself enable keepalive, but switches to the new APIs. Change-Id: I17f90561e0ea6b0917fff51b055225060a4fa549 --- M be/src/thirdparty/squeasel/squeasel.c M be/src/thirdparty/squeasel/squeasel.h M be/src/util/webserver.cc M be/src/util/webserver.h 4 files changed, 77 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/13768/2 -- To view, visit http://gerrit.cloudera.org:8080/13768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I17f90561e0ea6b0917fff51b055225060a4fa549 Gerrit-Change-Number: 13768 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound connections
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13764 ) Change subject: IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound connections .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13764 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa1d66d83aea1cc82d07fc6217be5fc1306695bc Gerrit-Change-Number: 13764 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 28 Jun 2019 23:42:36 + Gerrit-HasComments: No
[Impala-ASF-CR] Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95
Hello Michael Ho, Lars Volker, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/13768 to review the following change. Change subject: Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95 .. Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95 This updates to the latest build of squeasel, which fixes a few small issues and also improves support for keepalive. This patch doesn't itself enable keepalive, but switches to the new APIs. Change-Id: I17f90561e0ea6b0917fff51b055225060a4fa549 --- M be/src/thirdparty/squeasel/squeasel.c M be/src/thirdparty/squeasel/squeasel.h M be/src/util/webserver.cc M be/src/util/webserver.h 4 files changed, 76 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/13768/1 -- To view, visit http://gerrit.cloudera.org:8080/13768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I17f90561e0ea6b0917fff51b055225060a4fa549 Gerrit-Change-Number: 13768 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] Update kudu/security from 9ebcb77aa911aae76c48e717af24e643cb81908d
Hello Michael Ho, Lars Volker, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/13767 to review the following change. Change subject: Update kudu/security from 9ebcb77aa911aae76c48e717af24e643cb81908d .. Update kudu/security from 9ebcb77aa911aae76c48e717af24e643cb81908d This updates the kudu security code to the latest version, which includes support for GSSAPI calls, necessary for SPNEGO. This is a straight rsync of the kudu/util source code except for some minor CMakeLists changes that were carried over from the old version. Change-Id: Ie7c91193fd49f8ca1234b23cf61fc90c1fdbe2e0 --- M be/src/kudu/security/CMakeLists.txt A be/src/kudu/security/gssapi.cc A be/src/kudu/security/gssapi.h M be/src/kudu/security/init.cc M be/src/kudu/security/openssl_util.cc M be/src/kudu/security/test/mini_kdc.cc M be/src/kudu/security/test/mini_kdc.h M be/src/kudu/security/tls_socket-test.cc M be/src/kudu/security/tls_socket.cc M be/src/kudu/security/token-test.cc M be/src/kudu/security/token.proto M be/src/kudu/security/token_signer.cc M be/src/kudu/security/token_signer.h M be/src/kudu/security/token_verifier.cc M be/src/kudu/util/test_util.cc M be/src/kudu/util/test_util.h 16 files changed, 848 insertions(+), 172 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/13767/1 -- To view, visit http://gerrit.cloudera.org:8080/13767 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie7c91193fd49f8ca1234b23cf61fc90c1fdbe2e0 Gerrit-Change-Number: 13767 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-8585: Insert data into ACID table during dataload
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13633 ) Change subject: IMPALA-8585: Insert data into ACID table during dataload .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13633/1/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/13633/1/testdata/datasets/functional/functional_schema_template.sql@2137 PS1, Line 2137: -- Compactions could be also interesting, but seem to be sporadically very slow. > can you file a JIRA for this issue that you mentioned? mind filing this JIRA and adding a TODO here? then we can commit this with the workaround for now but also track the fix -- To view, visit http://gerrit.cloudera.org:8080/13633 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id459519d5b963ea760c44719c12736bf104938f9 Gerrit-Change-Number: 13633 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 27 Jun 2019 15:25:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13665 ) Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories .. Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@541 PS8, Line 541: temporary directories the code now seems to also skip any files that match this pattern, not just directories. I think that's probably OK but we should be clear on the semantics. http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@545 PS8, Line 545: all underlying files (except which are :* in the ignored directories) does this mean we no longer yield directories, and only yield files? does this not break the case of an empty base data dir, which actually has semantic value? http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@568 PS8, Line 568: isS3AFileSystem(p) why'd you change from isS3AFileSystem(fs) to isS3AFileSystem(p)? In the case that 'p' isn't a fully qualified path, this will check the wrong filesystem, and we don't have any preconditions checking that 'p' must be fully qualified. http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@568 PS8, Line 568: isS3AFileSystem perhaps we can add a @VisibleForTesting way we can make this path get used even for non-S3, so we can get test coverage here? http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@626 PS8, Line 626: LISTING_TYPE style nit: enums should be named like classes (ListingType) http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@699 PS8, Line 699: // if the current file is on a ignored path return early : if (isIgnoredPath(fileStatus)) return; how does this prevent recursion into tmp dirs in the recursive listFiles case? It seems like this is only checking the filename (last path component) and not intermediate path components, which we'd need to be doing to filter out contents of subdirectories for the "native recursion" -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 27 Jun 2019 06:46:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8627: re-enable catalog v2 in containers
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13708 ) Change subject: IMPALA-8627: re-enable catalog v2 in containers .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/13708/2/docker/catalogd/Dockerfile File docker/catalogd/Dockerfile: http://gerrit.cloudera.org:8080/#/c/13708/2/docker/catalogd/Dockerfile@27 PS2, Line 27: "-catalog_topic_mode=minimal", "-hms_event_polling_interval_s=1",\ > Do you think it would be useful to enable them one by one so that triage of I noticed we have some tests which try to detect whether "v2" is enabled, not specifically checking for polling vs localcatalog, and those tests fail if you enable one without the other. We may need to change those checks to be more specific to one feature or the other to do this. -- To view, visit http://gerrit.cloudera.org:8080/13708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3b4dd7060c3977c4a943b2492008c1dd601402a2 Gerrit-Change-Number: 13708 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 24 Jun 2019 19:28:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8682: Add authorized proxy user/group test coverage with Ranger
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13679 ) Change subject: IMPALA-8682: Add authorized proxy user/group test coverage with Ranger .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If6f797600720e6432b85cac8f13afe8fa5624596 Gerrit-Change-Number: 13679 Gerrit-PatchSet: 5 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 21 Jun 2019 17:39:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8682: Add authorized proxy user/group test coverage with Ranger
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13679 ) Change subject: IMPALA-8682: Add authorized proxy user/group test coverage with Ranger .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/13679/4/tests/authorization/test_authorized_proxy.py File tests/authorization/test_authorized_proxy.py: http://gerrit.cloudera.org:8080/#/c/13679/4/tests/authorization/test_authorized_proxy.py@195 PS4, Line 195: # what are these commented-out lines? http://gerrit.cloudera.org:8080/#/c/13679/4/tests/authorization/test_authorized_proxy.py@231 PS4, Line 231: # Try to user we are not authorized to delegate to. typo? -- To view, visit http://gerrit.cloudera.org:8080/13679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If6f797600720e6432b85cac8f13afe8fa5624596 Gerrit-Change-Number: 13679 Gerrit-PatchSet: 4 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 20 Jun 2019 16:21:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Backport KUDU-2871 (part 1): disable TLS 1.3.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13689 ) Change subject: Backport KUDU-2871 (part 1): disable TLS 1.3. .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13689 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae77e06906e01d8442e0f767e7f920bd330cc5da Gerrit-Change-Number: 13689 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 20 Jun 2019 15:44:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7534. Handle invalidation races in CatalogdMetaProvider
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13664 ) Change subject: IMPALA-7534. Handle invalidation races in CatalogdMetaProvider .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/13664/4/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/13664/4/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@243 PS4, Line 243: final AtomicInteger piggybackSuccessCountForTests = new AtomicInteger(); > nit: final Done http://gerrit.cloudera.org:8080/#/c/13664/4/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@250 PS4, Line 250: final AtomicInteger piggybackExceptionCountForTests = new AtomicInteger(); > nit: final Done http://gerrit.cloudera.org:8080/#/c/13664/4/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@434 PS4, Line 434: returns an invalidation for the table name list > just for my sake of understanding. Is the invalidate of table name done in yea, because a table was created, the list of tables in that DB gets invalidated. -- To view, visit http://gerrit.cloudera.org:8080/13664 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70f377db88e204825a909389f28dc3451815235c Gerrit-Change-Number: 13664 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 19 Jun 2019 21:39:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7534. Handle invalidation races in CatalogdMetaProvider
Hello Bharath Vissapragada, Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13664 to look at the new patch set (#5). Change subject: IMPALA-7534. Handle invalidation races in CatalogdMetaProvider .. IMPALA-7534. Handle invalidation races in CatalogdMetaProvider This handles a race condition in which a cache invalidation concurrent with a cache load would potentially be skipped, causing out-of-date data to persist in the cache. This would present itself as spurious "table not found" errors. A new test case triggers the issue reliably by injecting latency into the metadata fetch RPC and running DDLs concurrently on the same database across 8 threads. With the fix, the test passes reliably. Another option to fix this might have been to switch to Caffeine instead of Guava's loading cache. However, Caffeine requires Java 8, and LocalCatalog is being backported to Impala 2.x which still can run on Java 7. So, working around the Guava issue will make backporting (and future backports) easier. Change-Id: I70f377db88e204825a909389f28dc3451815235c --- M be/src/exec/catalog-op-executor.cc M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java M tests/custom_cluster/test_local_catalog.py 4 files changed, 260 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/13664/5 -- To view, visit http://gerrit.cloudera.org:8080/13664 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I70f377db88e204825a909389f28dc3451815235c Gerrit-Change-Number: 13664 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7534. Handle invalidation races in CatalogdMetaProvider
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13664 ) Change subject: IMPALA-7534. Handle invalidation races in CatalogdMetaProvider .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/13664/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/13664/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@455 PS2, Line 455: //load to be triggered, which sees the post-invalidated data in catalogd. > Yeah, I mean even a unit test would be helpful. The logic seems simple enou I verified that the new e2e test does cover these cases, but also added a specific targeted unit test along with some test-only counters. -- To view, visit http://gerrit.cloudera.org:8080/13664 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70f377db88e204825a909389f28dc3451815235c Gerrit-Change-Number: 13664 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 18 Jun 2019 22:44:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7534. Handle invalidation races in CatalogdMetaProvider
Hello Bharath Vissapragada, Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13664 to look at the new patch set (#4). Change subject: IMPALA-7534. Handle invalidation races in CatalogdMetaProvider .. IMPALA-7534. Handle invalidation races in CatalogdMetaProvider This handles a race condition in which a cache invalidation concurrent with a cache load would potentially be skipped, causing out-of-date data to persist in the cache. This would present itself as spurious "table not found" errors. A new test case triggers the issue reliably by injecting latency into the metadata fetch RPC and running DDLs concurrently on the same database across 8 threads. With the fix, the test passes reliably. Another option to fix this might have been to switch to Caffeine instead of Guava's loading cache. However, Caffeine requires Java 8, and LocalCatalog is being backported to Impala 2.x which still can run on Java 7. So, working around the Guava issue will make backporting (and future backports) easier. Change-Id: I70f377db88e204825a909389f28dc3451815235c --- M be/src/exec/catalog-op-executor.cc M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java M tests/custom_cluster/test_local_catalog.py 4 files changed, 243 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/13664/4 -- To view, visit http://gerrit.cloudera.org:8080/13664 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I70f377db88e204825a909389f28dc3451815235c Gerrit-Change-Number: 13664 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7534. Handle invalidation races in CatalogdMetaProvider
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13664 ) Change subject: IMPALA-7534. Handle invalidation races in CatalogdMetaProvider .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/13664/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/13664/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@442 PS2, Line 442: // > We're not setting hit = true, so we're counting this case as a cache miss? yea, I was wondering whether to add another separate metric for "piggybacked hits" or something, but wasn't sure it was worth it. What do you think? http://gerrit.cloudera.org:8080/#/c/13664/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@455 PS2, Line 455: boolean hit = false; > Do we have tests for these exceptional code paths? Including the case when I'm guessing this gets covered by our stress test which does concurrent queries and invalidates. I'll double check that this is actually covered and add another stress test if not. -- To view, visit http://gerrit.cloudera.org:8080/13664 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70f377db88e204825a909389f28dc3451815235c Gerrit-Change-Number: 13664 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 18 Jun 2019 20:43:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7534. Handle invalidation races in CatalogdMetaProvider
Hello Bharath Vissapragada, Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13664 to look at the new patch set (#3). Change subject: IMPALA-7534. Handle invalidation races in CatalogdMetaProvider .. IMPALA-7534. Handle invalidation races in CatalogdMetaProvider This handles a race condition in which a cache invalidation concurrent with a cache load would potentially be skipped, causing out-of-date data to persist in the cache. This would present itself as spurious "table not found" errors. A new test case triggers the issue reliably by injecting latency into the metadata fetch RPC and running DDLs concurrently on the same database across 8 threads. With the fix, the test passes reliably. Another option to fix this might have been to switch to Caffeine instead of Guava's loading cache. However, Caffeine requires Java 8, and LocalCatalog is being backported to Impala 2.x which still can run on Java 7. So, working around the Guava issue will make backporting (and future backports) easier. Change-Id: I70f377db88e204825a909389f28dc3451815235c --- M be/src/exec/catalog-op-executor.cc M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M tests/custom_cluster/test_local_catalog.py 3 files changed, 159 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/13664/3 -- To view, visit http://gerrit.cloudera.org:8080/13664 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I70f377db88e204825a909389f28dc3451815235c Gerrit-Change-Number: 13664 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7534. Handle invalidation races in CatalogdMetaProvider
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13664 ) Change subject: IMPALA-7534. Handle invalidation races in CatalogdMetaProvider .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/13664/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/13664/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@427 PS2, Line 427: private ValueType loadWithCaching(String itemString, > I think this could use a method comment since the logic is non-trivial (pro Ack http://gerrit.cloudera.org:8080/#/c/13664/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@435 PS2, Line 435: Object inCache = cache_.get(key, () -> f); > Just for my understanding, here we are relying on the inherent thread-safet yea, will add a comment http://gerrit.cloudera.org:8080/#/c/13664/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@573 PS2, Line 573: Uninterruptibles.sleepUninterruptibly(10, TimeUnit.MILLISECONDS); > Remove? woops! http://gerrit.cloudera.org:8080/#/c/13664/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@687 PS2, Line 687: RuntimeException > Does this mean the query is not eventually retried? Do you know if we ever interrupt threads? Otherwise I don't think this introduces any actual possibility for an exception. -- To view, visit http://gerrit.cloudera.org:8080/13664 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70f377db88e204825a909389f28dc3451815235c Gerrit-Change-Number: 13664 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 18 Jun 2019 19:21:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8584: Add cookie support to the HTTP HS2 server
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13672 ) Change subject: IMPALA-8584: Add cookie support to the HTTP HS2 server .. Patch Set 2: (5 comments) didn't look in detail yet, a couple high-level questions first http://gerrit.cloudera.org:8080/#/c/13672/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13672/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-8584: Add cookie support to the HTTP HS2 server Some questions about the design here: - by persisting the cookies in a bimap, what happens when you have multiple clients separately authenticating as the same user? It seems like you'd end up handing the same UUID cookie to all of the remote users, and then they'd all expire at the same time, causing them to all have to go back to LDAP to reauthenticate simultaneously, which might be problematic for a large number of clients. It might be better to just use unique session IDs, rather than maintaining the reverse mapping from username->uuid. - Similar question: if Impala is behind a load balancer of some kind, when a user reconnects, they'll use the same cookie to talk to a different impalad. That other impalad won't find the cookie and will force them to re-auth each time they switch backends I guess? In typical deployments, do people reconnect frequently through the load balancer or tend to use sticky sessions? An alternative here would be to use HMAC signatures in the cookie, using a secret shared via the statestore or somesuch. Any idea what Hive does, for comparison? Would be good to document a couple of these design points in the commit message or the JIRA. http://gerrit.cloudera.org:8080/#/c/13672/2/be/src/rpc/authentication.cc File be/src/rpc/authentication.cc: http://gerrit.cloudera.org:8080/#/c/13672/2/be/src/rpc/authentication.cc@546 PS2, Line 546: random_device Any idea what the source of this entropy is? Is it subject to entropy deletion DOS attacks? (or entropy depletion associated perf problems?) http://gerrit.cloudera.org:8080/#/c/13672/2/be/src/rpc/authentication.cc@562 PS2, Line 562: cookie = Substitute("impala.hs2.auth=$0", PrintId(cookie_id)); Do we want any expiration on the cookie? Or is the default (session cookie) appropriate? Also, I would guess we want this to be an http-only and secure cookie (assuming https is enabled)? Might be some other settings that would be a good idea for security: see https://odino.org/security-hardening-http-cookies/ which has a bunch of recommendations. http://gerrit.cloudera.org:8080/#/c/13672/2/be/src/rpc/cookie-mgr.cc File be/src/rpc/cookie-mgr.cc: http://gerrit.cloudera.org:8080/#/c/13672/2/be/src/rpc/cookie-mgr.cc@61 PS2, Line 61: username_to_cookie_.emplace(std::piecewise_construct, std::forward_as_tuple(username), hrm, this is odd. YOu can't just do .emplace(username, Cookie(cookie, UnixMillis()); here? http://gerrit.cloudera.org:8080/#/c/13672/2/be/src/transport/THttpServer.cpp File be/src/transport/THttpServer.cpp: http://gerrit.cloudera.org:8080/#/c/13672/2/be/src/transport/THttpServer.cpp@104 PS2, Line 104: cookieValue_ = string(value); the Cookie header can pass multiple cookies, separated by ';'. It doesn't seem like that's getting handled here? It might also be possible to pass multiple Cookie headers, but not sure about that one. -- To view, visit http://gerrit.cloudera.org:8080/13672 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I647c06f94ef91aa3b6413e91576c4ec506ed57f4 Gerrit-Change-Number: 13672 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 18 Jun 2019 19:19:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8667. Remove --pull incremental stats flag
Hello Bharath Vissapragada, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/13671 to review the following change. Change subject: IMPALA-8667. Remove --pull_incremental_stats flag .. IMPALA-8667. Remove --pull_incremental_stats flag This flag was added as a "chicken bit" -- so we could disable the new feature if we had some problems with it. It's been out in the wild for a number of months and we haven't seen any such problems, so at this point let's stop maintaining the old code path. Change-Id: I8878fcd8a2462963c7db3183a003bb9816dda8f9 --- M be/src/common/global-flags.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M docs/topics/impala_metadata.xml M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M tests/common/custom_cluster_test_suite.py M tests/conftest.py D tests/custom_cluster/test_incremental_stats.py 15 files changed, 36 insertions(+), 180 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/13671/1 -- To view, visit http://gerrit.cloudera.org:8080/13671 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8878fcd8a2462963c7db3183a003bb9816dda8f9 Gerrit-Change-Number: 13671 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada
[Impala-ASF-CR] IMPALA-7534. Handle invalidation races in CatalogdMetaProvider
Hello Bharath Vissapragada, Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13664 to look at the new patch set (#2). Change subject: IMPALA-7534. Handle invalidation races in CatalogdMetaProvider .. IMPALA-7534. Handle invalidation races in CatalogdMetaProvider This handles a race condition in which a cache invalidation concurrent with a cache load would potentially be skipped, causing out-of-date data to persist in the cache. This would present itself as spurious "table not found" errors. A new test case triggers the issue reliably by injecting latency into the metadata fetch RPC and running DDLs concurrently on the same database across 8 threads. With the fix, the test passes reliably. Change-Id: I70f377db88e204825a909389f28dc3451815235c --- M be/src/exec/catalog-op-executor.cc M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M tests/custom_cluster/test_local_catalog.py 3 files changed, 133 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/13664/2 -- To view, visit http://gerrit.cloudera.org:8080/13664 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I70f377db88e204825a909389f28dc3451815235c Gerrit-Change-Number: 13664 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] WIP: IMPALA-7434. Handle invalidation races in CatalogdMetaProvider
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13664 ) Change subject: WIP: IMPALA-7434. Handle invalidation races in CatalogdMetaProvider .. Patch Set 1: > Patch Set 1: > > I'm trying to understand the issue here. Apart from the versioning problem, > is the thread-safety issue in loadWithCaching() described in > https://softwaremill.com/race-condition-cache-guava-caffeine/ not a problem > here? That's exactly the problem this is fixing. It ensures that an invalidate concurrent with loadWithCaching() will cause the concurrent load to _not_ write its result to the cache. In terms of multiple threads calling loadWithCaching() at the same time, I think our code appropriately ensures that the first thread who gets there inserts a Future, and the second thread piggy-backs onto that same future. -- To view, visit http://gerrit.cloudera.org:8080/13664 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70f377db88e204825a909389f28dc3451815235c Gerrit-Change-Number: 13664 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 18 Jun 2019 17:04:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8658: Populate missing Ranger audit fields
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13601 ) Change subject: IMPALA-8658: Populate missing Ranger audit fields .. Patch Set 8: > Patch Set 8: > > > Patch Set 7: Verified-1 > > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4483/ > > InetAddresses.forName() only takes IP address and not a hostname. I'm > switching back to InetAddress.getByName(host) to get an IP address of a host. That's exactly my confusion, though -- how are we getting something other than an IP address into this field? Are we doing reverse lookups somewhere on the C++ side? I'm afraid to be doing DNS lookups in the audit path here and potentially fail or slow down queries if DNS is unreliable (hint: DNS is always unreliable) -- To view, visit http://gerrit.cloudera.org:8080/13601 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I167bc35411ad9b4164c292077ff082671967cff8 Gerrit-Change-Number: 13601 Gerrit-PatchSet: 8 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 18 Jun 2019 08:18:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. IMPALA-8542. Add an access trace for the data cache This adds a relatively simple JSON-formatted access trace for the data cache feature. Each partition stores a trace file named 'trace.txt', with each line representing a hit, miss, or store into the cache. The trace is collected using the kudu::AsyncLogger class which handles buffering and deferring the actual IO to a background thread. By default, the full cache key info is written to the trace (including the file paths), but a flag can enable anonymization (128-bit city-hashing) of the file names in case any user would like to capture a trace to be shared publically without divulging their table names. Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Reviewed-on: http://gerrit.cloudera.org:8080/13425 Tested-by: Impala Public Jenkins Reviewed-by: Michael Ho --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h 3 files changed, 297 insertions(+), 4 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Ho: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 9 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] fe: improve logging for metadata loading
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13463 ) Change subject: fe: improve logging for metadata loading .. fe: improve logging for metadata loading This annotates various catalogd-internal calls associated with refreshing and loading metadata with a 'String why' parameter, useful for logging. This can help understand why a particular piece of metadata was loaded, and in the case of REFRESH calls, who issued the original refresh. Additionally, some of the log statements were improved to add a bit of extra detail such as the list of partitions being refreshed (abbreviated) and whether or not the table schema is being refreshed. Change-Id: I4d90a5f075b05d2dc96692b3349abe35ce24b8b8 Reviewed-on: http://gerrit.cloudera.org:8080/13463 Tested-by: Impala Public Jenkins Reviewed-by: Vihang Karajgaonkar --- M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java 18 files changed, 207 insertions(+), 144 deletions(-) Approvals: Impala Public Jenkins: Verified Vihang Karajgaonkar: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/13463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I4d90a5f075b05d2dc96692b3349abe35ce24b8b8 Gerrit-Change-Number: 13463 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] WIP: IMPALA-7434. Handle invalidation races in CatalogdMetaProvider
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13664 ) Change subject: WIP: IMPALA-7434. Handle invalidation races in CatalogdMetaProvider .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/13664/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13664/1//COMMIT_MSG@7 PS1, Line 7: WIP: IMPALA-7434. Handle invalidation races in CatalogdMetaProvider oops this should say IMPALA-7534. http://gerrit.cloudera.org:8080/#/c/13664/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/13664/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@240 PS1, Line 240: 7543 also got it wrong here :) -- To view, visit http://gerrit.cloudera.org:8080/13664 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70f377db88e204825a909389f28dc3451815235c Gerrit-Change-Number: 13664 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 18 Jun 2019 00:09:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP: IMPALA-7434. Handle invalidation races in CatalogdMetaProvider
Hello Bharath Vissapragada, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/13664 to review the following change. Change subject: WIP: IMPALA-7434. Handle invalidation races in CatalogdMetaProvider .. WIP: IMPALA-7434. Handle invalidation races in CatalogdMetaProvider This handles a race condition in which a cache invalidation concurrent with a cache load would potentially be skipped, causing out-of-date data to persist in the cache. This would present itself as spurious "table not found" errors. A new test case triggers the issue reliably by injecting latency into the metadata fetch RPC and running DDLs concurrently on the same database across 8 threads. With the fix, the test passes reliably. WIP because it probably needs a little more cleanup and testing, but posting for early review Change-Id: I70f377db88e204825a909389f28dc3451815235c --- M be/src/exec/catalog-op-executor.cc M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M tests/custom_cluster/test_local_catalog.py 3 files changed, 113 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/13664/1 -- To view, visit http://gerrit.cloudera.org:8080/13664 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I70f377db88e204825a909389f28dc3451815235c Gerrit-Change-Number: 13664 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada
[Impala-ASF-CR] IMPALA-8671: Do not re-create RangerAuthorizationChecker instance on catalog update
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13663 ) Change subject: IMPALA-8671: Do not re-create RangerAuthorizationChecker instance on catalog update .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabd5c1b5c7cafc7a03681def0c0f6f775d690c41 Gerrit-Change-Number: 13663 Gerrit-PatchSet: 4 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 17 Jun 2019 20:33:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8671: Do not re-create RangerAuthorizationChecker instance on catalog update
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13663 ) Change subject: IMPALA-8671: Do not re-create RangerAuthorizationChecker instance on catalog update .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/13663/3/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java: http://gerrit.cloudera.org:8080/#/c/13663/3/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java@79 PS3, Line 79: // Ranger plugin. do we need to reinit the ranger plugin for explicit REFRESH AUTHORIZATION calls? http://gerrit.cloudera.org:8080/#/c/13663/3/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java@81 PS3, Line 81: authzChecker_ = new RangerAuthorizationChecker(authzConfig_); this isn't safe under concurrency unless authzChecker is volatile. Should we just initialize it in the constructor? -- To view, visit http://gerrit.cloudera.org:8080/13663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabd5c1b5c7cafc7a03681def0c0f6f775d690c41 Gerrit-Change-Number: 13663 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 17 Jun 2019 20:15:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8671: Do not re-create RangerAuthorizationChecker instance on catalog update
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13663 ) Change subject: IMPALA-8671: Do not re-create RangerAuthorizationChecker instance on catalog update .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/13663/2/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java: http://gerrit.cloudera.org:8080/#/c/13663/2/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java@39 PS2, Line 39: authzChecker_ where does this ever get set? -- To view, visit http://gerrit.cloudera.org:8080/13663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabd5c1b5c7cafc7a03681def0c0f6f775d690c41 Gerrit-Change-Number: 13663 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 17 Jun 2019 20:09:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-8630: Hash the full path when calculating consistent remote placement
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13545 ) Change subject: [WIP] IMPALA-8630: Hash the full path when calculating consistent remote placement .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/13545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b Gerrit-Change-Number: 13545 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 17 Jun 2019 17:59:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8617: Add support for lz4 in parquet
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13582 ) Change subject: IMPALA-8617: Add support for lz4 in parquet .. Patch Set 7: Code-Review+2 looks good. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/13582 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia6850a39ef3f1e0e7ba48e08eef1d4f7cbb74d0c Gerrit-Change-Number: 13582 Gerrit-PatchSet: 7 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 17 Jun 2019 17:44:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8658: Populate missing Ranger audit fields
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13601 ) Change subject: IMPALA-8658: Populate missing Ranger audit fields .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/13601/5/fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java File fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java: http://gerrit.cloudera.org:8080/#/c/13601/5/fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java@43 PS5, Line 43:* @param sqlStmt the SQL statement to be logged for auditing nit: add a @param for sessionState? http://gerrit.cloudera.org:8080/#/c/13601/5/fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java: http://gerrit.cloudera.org:8080/#/c/13601/5/fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java@59 PS5, Line 59: clusterName_ = clusterName; should you Preconditions.checkNotNull() for these? or Objects.firstNonNull(clusterName, "") or something? The above comment seems to indicate that if they end up null then ranger will hit NPE http://gerrit.cloudera.org:8080/#/c/13601/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/13601/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@367 PS5, Line 367: requestingUser, ddlRequest.getHeader it seems like 'requestingUser' always is being constructed out of ddlRequest.getHeader().getRequesting_user(), right? Given that, can we just pass ddlRequest.getHeader() and not the redundant username here? http://gerrit.cloudera.org:8080/#/c/13601/5/fe/src/main/java/org/apache/impala/util/TSessionStateUtil.java File fe/src/main/java/org/apache/impala/util/TSessionStateUtil.java: http://gerrit.cloudera.org:8080/#/c/13601/5/fe/src/main/java/org/apache/impala/util/TSessionStateUtil.java@44 PS5, Line 44: InetAddresses.forString(session.getNetwork_address().getHostname()) : .getHostAddress(); hrm, now I'm wondering why we can't just use getNetowkr_address().getHostname() directly here? -- To view, visit http://gerrit.cloudera.org:8080/13601 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I167bc35411ad9b4164c292077ff082671967cff8 Gerrit-Change-Number: 13601 Gerrit-PatchSet: 5 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 17 Jun 2019 17:30:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8599: Create a Maven module for query event hook API
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13653 ) Change subject: IMPALA-8599: Create a Maven module for query event hook API .. Patch Set 2: Code-Review+2 (1 comment) Looks fine. We should probably reorganize all this stuff so that the different modules are submodules within a top-level directory with a pom to tie them together (multi-module maven project) instead of depending on each other as separate projects tied together by cmake, but I guess that's out of scope for this patch. Any interest in taking that on? http://gerrit.cloudera.org:8080/#/c/13653/2/query-event-hook-api/pom.xml File query-event-hook-api/pom.xml: http://gerrit.cloudera.org:8080/#/c/13653/2/query-event-hook-api/pom.xml@51 PS2, Line 51: : org.apache.maven.plugins : maven-compiler-pluginhttp://gerrit.cloudera.org:8080/13653 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I84c422d83c19b75c3d1d7a772b971f4f7704d44c Gerrit-Change-Number: 13653 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 17 Jun 2019 17:19:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-8630: Hash the full path when calculating consistent remote placement
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13545 ) Change subject: [WIP] IMPALA-8630: Hash the full path when calculating consistent remote placement .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/13545/2/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: http://gerrit.cloudera.org:8080/#/c/13545/2/common/fbs/CatalogObjects.fbs@81 PS2, Line 81: // A hash of the absolute file path > I wonder if this even makes the struct bigger, given padding. Or if we can FWIW we currently use a flatbuffer 'table' instead of a 'struct' which makes all the fields optional, and ends up taking extra space since there's some kind of vtable entry for each field or something. Given that I don't think there's padding involved. But we could probably get a win by moving all of the non-optional fields into a struct -- To view, visit http://gerrit.cloudera.org:8080/13545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b Gerrit-Change-Number: 13545 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 17 Jun 2019 15:41:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-8630: Hash the full path when calculating consistent remote placement
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13545 ) Change subject: [WIP] IMPALA-8630: Hash the full path when calculating consistent remote placement .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/13545/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13545/2//COMMIT_MSG@27 PS2, Line 27: The alternative is to construct the DescriptorTbl in the scheduler the advantage of the alternative is saving some extra memory per file descriptor. The FileDescs are persistent per-file so this can add up to many MB of increased consumption on a big catalogd. That said, with the recent work to reduce consumption and add eviction on catalogd, maybe it's not such a big deal as it used to be (eg IMPALA-7406 saved 100 bytes per FileFbDesc and here we're only losing a little bit of that gain) http://gerrit.cloudera.org:8080/#/c/13545/2/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/13545/2/be/src/scheduling/scheduler.cc@801 PS2, Line 801: uint32_t hash = HashUtil::Hash(_file_split->full_path_hash, : sizeof(hdfs_file_split->full_path_hash), 0); why hash the hash instead of just making the hash here be a uint32_t directly? http://gerrit.cloudera.org:8080/#/c/13545/2/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: http://gerrit.cloudera.org:8080/#/c/13545/2/common/fbs/CatalogObjects.fbs@82 PS2, Line 82: int should be a uint? -- To view, visit http://gerrit.cloudera.org:8080/13545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b Gerrit-Change-Number: 13545 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 23:20:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] fe: improve logging for metadata loading
Hello Vihang Karajgaonkar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13463 to look at the new patch set (#4). Change subject: fe: improve logging for metadata loading .. fe: improve logging for metadata loading This annotates various catalogd-internal calls associated with refreshing and loading metadata with a 'String why' parameter, useful for logging. This can help understand why a particular piece of metadata was loaded, and in the case of REFRESH calls, who issued the original refresh. Additionally, some of the log statements were improved to add a bit of extra detail such as the list of partitions being refreshed (abbreviated) and whether or not the table schema is being refreshed. Change-Id: I4d90a5f075b05d2dc96692b3349abe35ce24b8b8 --- M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java 18 files changed, 207 insertions(+), 144 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/13463/4 -- To view, visit http://gerrit.cloudera.org:8080/13463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4d90a5f075b05d2dc96692b3349abe35ce24b8b8 Gerrit-Change-Number: 13463 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] fe: improve logging for metadata loading
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13463 ) Change subject: fe: improve logging for metadata loading .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/13463/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/13463/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@a948 PS3, Line 948: > having this and the one after fetchAllPartitions call seems useful to quick OK. Planning on working on a nicer patch to expose these timings as more structured information and even back to the profile of the user who submitted it, but will just add these back for now. http://gerrit.cloudera.org:8080/#/c/13463/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@103 PS3, Line 103: > nit, unnecessary new line Done -- To view, visit http://gerrit.cloudera.org:8080/13463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d90a5f075b05d2dc96692b3349abe35ce24b8b8 Gerrit-Change-Number: 13463 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 14 Jun 2019 22:58:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Hello Michael Ho, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13425 to look at the new patch set (#7). Change subject: IMPALA-8542. Add an access trace for the data cache .. IMPALA-8542. Add an access trace for the data cache This adds a relatively simple JSON-formatted access trace for the data cache feature. Each partition stores a trace file named 'trace.txt', with each line representing a hit, miss, or store into the cache. The trace is collected using the kudu::AsyncLogger class which handles buffering and deferring the actual IO to a background thread. By default, the full cache key info is written to the trace (including the file paths), but a flag can enable anonymization (128-bit city-hashing) of the file names in case any user would like to capture a trace to be shared publically without divulging their table names. Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h 3 files changed, 297 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/7 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 7 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13557 ) Change subject: IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load .. Patch Set 3: Ran a build locally with LocalCatalog enabled. I got two failures but they were both "Exception: ("DB {0} didn't show up after {1}s", 'hms_sanity_db', 30)" which is tracked under another JIRA. -- To view, visit http://gerrit.cloudera.org:8080/13557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia85d6b10669866dcc9f2bf7385a4775e483dd6b0 Gerrit-Change-Number: 13557 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 14 Jun 2019 22:19:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Hello Michael Ho, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13425 to look at the new patch set (#4). Change subject: IMPALA-8542. Add an access trace for the data cache .. IMPALA-8542. Add an access trace for the data cache This adds a relatively simple JSON-formatted access trace for the data cache feature. Each partition stores a trace file named 'trace.txt', with each line representing a hit, miss, or store into the cache. The trace is collected using the kudu::AsyncLogger class which handles buffering and deferring the actual IO to a background thread. By default, the full cache key info is written to the trace (including the file paths), but a flag can enable anonymization (128-bit city-hashing) of the file names in case any user would like to capture a trace to be shared publically without divulging their table names. Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h 3 files changed, 296 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/4 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 5: r4 fixes the comments, r5 is a rebase (had a trivial conflict to deal with from the mtime fix adding some DCHECKs in the key encoding path) -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 00:39:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Hello Michael Ho, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13425 to look at the new patch set (#5). Change subject: IMPALA-8542. Add an access trace for the data cache .. IMPALA-8542. Add an access trace for the data cache This adds a relatively simple JSON-formatted access trace for the data cache feature. Each partition stores a trace file named 'trace.txt', with each line representing a hit, miss, or store into the cache. The trace is collected using the kudu::AsyncLogger class which handles buffering and deferring the actual IO to a background thread. By default, the full cache key info is written to the trace (including the file paths), but a flag can enable anonymization (128-bit city-hashing) of the file names in case any user would like to capture a trace to be shared publically without divulging their table names. Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h 3 files changed, 296 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/5 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@656 PS2, Line 656: // Limit the write concurrency to avoid blocking the caller (which could be calling > I think we usually go with "start the arguments on a new line indented by f Done http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@870 PS2, Line 870: "); > We usually use all-caps for constants Done http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@874 PS2, Line 874: case STORE: jw.String("S"); break; > see above, google style guide disagrees Done http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@88 PS3, Line 88: "(Advanced) Collect a trace of all lookups in the data cache."); > nit: the indent here and other places in this change seems to be different Done http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@98 PS3, Line 98: impala-cache-trace.txt > The data cache by default removes all files with names containing prefix CA it currently creates the trace file with open() flags to truncate the existing file if there is one, so it's essentially the same behavior. If a user wants to archive them before restarting, I think it's up to them. Is that what you were asking? http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@116 PS3, Line 116: if (file_) Flush(); > Does it make sense to also call file_->Close() here or does the d'tor take dtor handles it http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@148 PS3, Line 148: > nit: unnecessary blank line. Done http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@186 PS3, Line 186: unique_ptr underlying_logger_; : unique_ptr logger_; > Would be nice to briefly document these variables. Done http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@498 PS3, Line 498: if (FLAGS_data_cache_enable_tracing) { : tracer_.reset(new Tracer(path_ + "/" + TRACE_FILE_NAME)); : RETURN_IF_ERROR(tracer_->Start()); : } > I wonder how this may affect the available storage space allocated for the yea, I didn't want to add the complexity of accounting for the trace file, since as you said, this is meant for experimentation. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@887 PS3, Line 887: char buf[BUF_LEN]; > nit: same name as 'buf' declared outside of the scope. While functionally c Done -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 00:30:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8583: Add metrics for Basic auth
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13640 ) Change subject: IMPALA-8583: Add metrics for Basic auth .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6d1b1bcc093e4b00802bc108ff4d8d4e2cdfd88f Gerrit-Change-Number: 13640 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 00:19:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8658: Populate missing Ranger audit fields
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13601 ) Change subject: IMPALA-8658: Populate missing Ranger audit fields .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/13601/3/fe/src/main/java/org/apache/impala/util/TSessionStateUtil.java File fe/src/main/java/org/apache/impala/util/TSessionStateUtil.java: http://gerrit.cloudera.org:8080/#/c/13601/3/fe/src/main/java/org/apache/impala/util/TSessionStateUtil.java@47 PS3, Line 47: return InetAddress.getByName(session.getNetwork_address().getHostname()) > Ranger wants it to be an IP address, although technically it's just a strin I guess I'm wondering: would this ever _not_ be a dotted-decimal address? I'm guessing we pick it up from the inbound socket in our thrift server. I'm afraid of accidentally doing resolution here. Maybe we should use https://google.github.io/guava/releases/20.0/api/docs/com/google/common/net/InetAddresses.html#forString-java.lang.String- here? -- To view, visit http://gerrit.cloudera.org:8080/13601 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I167bc35411ad9b4164c292077ff082671967cff8 Gerrit-Change-Number: 13601 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 00:16:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13557 ) Change subject: IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load .. Patch Set 1: (4 comments) r2 fixes the comments. Will rebase and run a full test run locally as suggested by Tim http://gerrit.cloudera.org:8080/#/c/13557/1/fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java File fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java: http://gerrit.cloudera.org:8080/#/c/13557/1/fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java@40 PS1, Line 40: * Represents a table with incomplete metadata. The metadata may be incomplete because > yea I think this only gets used by the "v1" catalog. I'll add a comment. Done http://gerrit.cloudera.org:8080/#/c/13557/1/fe/src/main/java/org/apache/impala/catalog/local/FailedLoadLocalTable.java File fe/src/main/java/org/apache/impala/catalog/local/FailedLoadLocalTable.java: http://gerrit.cloudera.org:8080/#/c/13557/1/fe/src/main/java/org/apache/impala/catalog/local/FailedLoadLocalTable.java@33 PS1, Line 33: private TableLoadingException cause_; > nit: final? Done http://gerrit.cloudera.org:8080/#/c/13557/1/fe/src/main/java/org/apache/impala/catalog/local/FailedLoadLocalTable.java@37 PS1, Line 37: cause > nit: checkNotNull()? Done http://gerrit.cloudera.org:8080/#/c/13557/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java File fe/src/main/java/org/apache/impala/service/MetadataOp.java: http://gerrit.cloudera.org:8080/#/c/13557/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java@317 PS1, Line 317: || table instanceof FeIncompleteTable > Can't we just override isLoaded() in FailedLoad..Table impl to make it less the problem with making 'isLoaded()' return false in the "failed table" case is that StmtMetadataLoader will then think it has to keep trying to load the table. Agree this stuff is pretty janky, mostly because the original implementation uses IncompleteTable to mean both an unloaded placeholder table and a loaded-but-failed table. Didn't want to try to refactor all of that legacy code here. -- To view, visit http://gerrit.cloudera.org:8080/13557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia85d6b10669866dcc9f2bf7385a4775e483dd6b0 Gerrit-Change-Number: 13557 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 13 Jun 2019 20:32:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load
Hello Bharath Vissapragada, Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13557 to look at the new patch set (#2). Change subject: IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load .. IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load This adds support in LocalCatalog for IncompleteTables. A new FeIncompleteTable interface is introduced, and when a table fails to load, it produces a FeIncompleteTable instead of an immediate error. This allows DROP TABLE statements to get past analysis and execute even when tables are in bad states (eg a Kudu table with no backing table in Kudu itself). This re-enables some tests that were previously disabled for LocalCatalog. Change-Id: Ia85d6b10669866dcc9f2bf7385a4775e483dd6b0 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java A fe/src/main/java/org/apache/impala/catalog/FeIncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java A fe/src/main/java/org/apache/impala/catalog/local/FailedLoadLocalTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java M fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java M tests/common/skip.py M tests/query_test/test_kudu.py 12 files changed, 133 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/13557/2 -- To view, visit http://gerrit.cloudera.org:8080/13557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia85d6b10669866dcc9f2bf7385a4775e483dd6b0 Gerrit-Change-Number: 13557 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] fe: improve logging for metadata loading
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13463 ) Change subject: fe: improve logging for metadata loading .. Patch Set 1: (7 comments) patch set r2 has fixes for the comments. r3 will rebase to tip of master http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1954 PS1, Line 1954: why > will do Done http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@103 PS1, Line 103: hiveexec > mistake -- eclipse loves to auto-import shaded crap :( Done http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@628 PS1, Line 628: new Exception() > yea, this was some debugging junk I left in by mistake Done http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@891 PS1, Line 891: org.apache.hadoop.hive.metastore.api.Table msTbl, String why) throws TableLoadingException { > line too long (98 > 90) Done http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@813 PS1, Line 813: insert > would be used to say processing table-level insert event from HMS for consi Done http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java: http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@170 PS1, Line 170: assertNotNull(catalog_.getOrLoadTable(hbaseDb.getName(), "alltypessmallbinary", "test")); > line too long (93 > 90) Done http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@172 PS1, Line 172: assertNotNull(catalog_.getOrLoadTable(hbaseDb.getName(), "hbasealltypeserror", "test")); > line too long (92 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/13463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d90a5f075b05d2dc96692b3349abe35ce24b8b8 Gerrit-Change-Number: 13463 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 13 Jun 2019 20:23:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] fe: improve logging for metadata loading
Hello Vihang Karajgaonkar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13463 to look at the new patch set (#2). Change subject: fe: improve logging for metadata loading .. fe: improve logging for metadata loading This annotates various catalogd-internal calls associated with refreshing and loading metadata with a 'String why' parameter, useful for logging. This can help understand why a particular piece of metadata was loaded, and in the case of REFRESH calls, who issued the original refresh. Additionally, some of the log statements were improved to add a bit of extra detail such as the list of partitions being refreshed (abbreviated) and whether or not the table schema is being refreshed. Change-Id: I4d90a5f075b05d2dc96692b3349abe35ce24b8b8 --- M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java 18 files changed, 207 insertions(+), 147 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/13463/2 -- To view, visit http://gerrit.cloudera.org:8080/13463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4d90a5f075b05d2dc96692b3349abe35ce24b8b8 Gerrit-Change-Number: 13463 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] fe: improve logging for metadata loading
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13463 ) Change subject: fe: improve logging for metadata loading .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@515 PS1, Line 515: needed to fetch partition stats > Can we create constants for these reasons. Seems they never change. Given they're only used in one place, and not 'magic numbers' that benefit from a descriptive variable name, not sure what the benefit is. Doesn't it just add another line of code and cause you to have to navigate around when reading the code? http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1954 PS1, Line 1954: why > Nit, I think have nouns as the arguments are more readable. eg: reload(Tabl will do http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@103 PS1, Line 103: hiveexec > Is there a particular reason to use the shaded version of Joiner? mistake -- eclipse loves to auto-import shaded crap :( http://gerrit.cloudera.org:8080/#/c/13463/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@628 PS1, Line 628: new Exception() > This will potentially add a lot of traces in the log. Don't we generally lo yea, this was some debugging junk I left in by mistake -- To view, visit http://gerrit.cloudera.org:8080/13463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d90a5f075b05d2dc96692b3349abe35ce24b8b8 Gerrit-Change-Number: 13463 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 13 Jun 2019 20:10:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8658: Populate missing Ranger audit fields
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13601 ) Change subject: IMPALA-8658: Populate missing Ranger audit fields .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/13601/3/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: http://gerrit.cloudera.org:8080/#/c/13601/3/common/thrift/JniCatalog.thrift@642 PS3, Line 642: 6: required string sql_stmt Can you name this 'redacted_sql_stmt' or something so that it's clear this is supposed to be post-redaction? http://gerrit.cloudera.org:8080/#/c/13601/3/common/thrift/JniCatalog.thrift@640 PS3, Line 640: : // The SQL statement to be logged. : 6: required string sql_stmt : : // The client IP address. : 7: required string client_ip What do you think about moving these to TCatalogServiceRequestHeader? They seem generally useful for logging/tracing what's going on on the catalogd for any operation, and 'client_ip' certainly seems to fit nicely with 'requesting_user' which is already present there. http://gerrit.cloudera.org:8080/#/c/13601/3/fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java File fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java: http://gerrit.cloudera.org:8080/#/c/13601/3/fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java@111 PS3, Line 111: } catch (UnknownHostException e) { : throw new AnalysisException("Unable to get the client IP address", e); seems weird that this can throw. Is it trying to reverse-lookup or something? http://gerrit.cloudera.org:8080/#/c/13601/3/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java: http://gerrit.cloudera.org:8080/#/c/13601/3/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@222 PS3, Line 222: } catch (UnknownHostException e) { : throw new InternalException("Unable to get the client IP address", e); : } similar to elsewhere, it seems like this should never happen if we just have IP addresses and not names. Perhaps we can catch it and throw as RuntimeException in getClientIpAddress()? Or we could use a different way of getting it that can't throw? http://gerrit.cloudera.org:8080/#/c/13601/3/fe/src/main/java/org/apache/impala/util/TSessionStateUtil.java File fe/src/main/java/org/apache/impala/util/TSessionStateUtil.java: http://gerrit.cloudera.org:8080/#/c/13601/3/fe/src/main/java/org/apache/impala/util/TSessionStateUtil.java@47 PS3, Line 47: return InetAddress.getByName(session.getNetwork_address().getHostname()) We could avoid the 'throws UnknownHostException' by just using String.format("%s:%d", ...) with the host and port, no? -- To view, visit http://gerrit.cloudera.org:8080/13601 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I167bc35411ad9b4164c292077ff082671967cff8 Gerrit-Change-Number: 13601 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 13 Jun 2019 20:03:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8633 : Insert event should not error when table does not exists
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13548 ) Change subject: IMPALA-8633 : Insert event should not error when table does not exists .. Patch Set 6: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/13548/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/13548/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@825 PS6, Line 825: e : . tiny nit: can you move 'e' to the next line? this breaking is pretty unnatural -- To view, visit http://gerrit.cloudera.org:8080/13548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I961cd7cbede4c248dba538c7fabb4bc708e49693 Gerrit-Change-Number: 13548 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 13 Jun 2019 19:56:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8585: Insert data into ACID table during dataload
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13633 ) Change subject: IMPALA-8585: Insert data into ACID table during dataload .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13633/1/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/13633/1/testdata/datasets/functional/functional_schema_template.sql@2137 PS1, Line 2137: -- Compactions could be also interesting, but seem to be sporadically very slow. can you file a JIRA for this issue that you mentioned? -- To view, visit http://gerrit.cloudera.org:8080/13633 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id459519d5b963ea760c44719c12736bf104938f9 Gerrit-Change-Number: 13633 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 13 Jun 2019 19:52:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8583: Add metrics for BASIC auth
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13640 ) Change subject: IMPALA-8583: Add metrics for BASIC auth .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/13640/1/be/src/rpc/auth-provider.h File be/src/rpc/auth-provider.h: http://gerrit.cloudera.org:8080/#/c/13640/1/be/src/rpc/auth-provider.h@33 PS1, Line 33: MatricGroup this doesn't seem right http://gerrit.cloudera.org:8080/#/c/13640/1/be/src/transport/THttpServer.h File be/src/transport/THttpServer.h: http://gerrit.cloudera.org:8080/#/c/13640/1/be/src/transport/THttpServer.h@84 PS1, Line 84: explicit no need for explicit on multi-arg ctor http://gerrit.cloudera.org:8080/#/c/13640/1/be/src/transport/THttpServer.cpp File be/src/transport/THttpServer.cpp: http://gerrit.cloudera.org:8080/#/c/13640/1/be/src/transport/THttpServer.cpp@152 PS1, Line 152: total_basic_auth_success_->Increment(1); semi-related to this patch: do we do any logging of the incorrect login attempts? http://gerrit.cloudera.org:8080/#/c/13640/1/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/13640/1/common/thrift/metrics.json@945 PS1, Line 945: BASIC auth in the long description perhaps we should say Basic (username/password) Auth, or something? People may not understand the underlying HTTP mechanism but would like to understand that this represents incorrect password attempts http://gerrit.cloudera.org:8080/#/c/13640/1/common/thrift/metrics.json@949 PS1, Line 949: BASIC nit: BASIC isn't an acronym so I think it should just be capitalized as "Basic" http://gerrit.cloudera.org:8080/#/c/13640/1/common/thrift/metrics.json@955 PS1, Line 955: auth similar to above, maybe say something like "due to an invalid username and password" -- To view, visit http://gerrit.cloudera.org:8080/13640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6d1b1bcc093e4b00802bc108ff4d8d4e2cdfd88f Gerrit-Change-Number: 13640 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 13 Jun 2019 19:45:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8617: Add support for lz4 in parquet
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13582 ) Change subject: IMPALA-8617: Add support for lz4 in parquet .. Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc File be/src/exec/parquet/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@97 PS6, Line 97: const string& column_name style nit: take this by value, and then initialize the member variable as column_name_(std::move(column_name)) http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@400 PS6, Line 400: const string& column_name_; nit: it's dangerous (and anti-style-guide) to store a reference as a member variable like this -- whoever passed in this string might well mutate or destroy the string leaving a dangling reference here. Typically we lean towards using pointers for cases like this, and documenting clearly in the constructor that there's a lifetime requirement for a passed-in raw pointer. In this case, though, it's not a hot path at all so I think you should just make this a 'string' vs adding lifetime dependencies http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@773 PS6, Line 773: Parquet file '$0' column '$1' hit an error nit: change this to something like "Error writing partquet file '$0' column '$1': $2" (so it's more obvious this is on the _write_ side rather than the read side http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@903 PS6, Line 903: return Status(Substitute("Parquet file '$0' column '$1' hit an error. $2", same http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@997 PS6, Line 997: // Map parquet codecs to Impala codecs. Its important to do the above check before : // we do any mapping. : parquet::CompressionCodec::type parquet_codec = ConvertImpalaToParquetCodec(codec); : codec = ConvertParquetToImpalaCodec(parquet_codec); this code isn't very intuitive to me. What's going on here? We're mapping the impala codec to parquet, and then back to impala? Is this serving some kind of canonicalization? If so, can you add some comment explaining why we do this back-and-forth round-trip? http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/parquet-column-readers.cc File be/src/exec/parquet/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/parquet-column-readers.cc@1346 PS6, Line 1346: Parquet file '$0' column '$1' hit an error nit: think this reads better as "Error decompressing parquet file '$0' column '$1': $2" Is the offset of the page readily available? If not, no need to plumb it through a bunch of code, but if we can get it easily, I think that would aid in debugging if we ever hit this error http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/util/compress.cc File be/src/util/compress.cc: http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/util/compress.cc@349 PS6, Line 349: int64_t length = -1; nit: we tend towards the "early return" style instead of the "one return" style. You can get rid of this local variable and just change the assignments below on L355 and L360 to return statements, and then get rid of the 'else' on line 356 and 361, for a net reduction of several lines of code http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/util/compress.cc@383 PS6, Line 383: CHECK(size > 0) You can do: CHECK_GT(size, 0) << "LZ4_compress_default() failed" which will end up including the 'size' value in the failure. That said, maybe we're better off with an if statement and return a bad Status here indicating a runtime error, since it seems you do have error checking at the call site, etc? -- To view, visit http://gerrit.cloudera.org:8080/13582 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia6850a39ef3f1e0e7ba48e08eef1d4f7cbb74d0c Gerrit-Change-Number: 13582 Gerrit-PatchSet: 6 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 13 Jun 2019 17:36:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8658: Populate missing Ranger audit fields
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13601 ) Change subject: IMPALA-8658: Populate missing Ranger audit fields .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/13601/2/fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java: http://gerrit.cloudera.org:8080/#/c/13601/2/fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java@117 PS2, Line 117: auditEvent.setClientIP(InetAddress.getLocalHost().getHostAddress()); this seems like the impala IP, not the client's IP. I'm not sure exactly where, but I know we have the end user IP somewhere in our session state http://gerrit.cloudera.org:8080/#/c/13601/2/fe/src/test/resources/ranger-hive-audit.xml File fe/src/test/resources/ranger-hive-audit.xml: http://gerrit.cloudera.org:8080/#/c/13601/2/fe/src/test/resources/ranger-hive-audit.xml@27 PS2, Line 27: ranger.plugin.hive.ambari.cluster.name if this parameter isn't set by the end user, do we risk an uninterpretable NPE or something? Should we be checking it somewhere during initialization to make sure it's set? -- To view, visit http://gerrit.cloudera.org:8080/13601 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I167bc35411ad9b4164c292077ff082671967cff8 Gerrit-Change-Number: 13601 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 13 Jun 2019 16:03:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8654: Log the SQL statement in the Ranger audit log
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13590 ) Change subject: IMPALA-8654: Log the SQL statement in the Ranger audit log .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13590 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id9f584ac4209604675eb13b6d6f349c6cbb1a387 Gerrit-Change-Number: 13590 Gerrit-PatchSet: 4 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 13 Jun 2019 16:01:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8654: Log the SQL statement in the Ranger audit log
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13590 ) Change subject: IMPALA-8654: Log the SQL statement in the Ranger audit log .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13590 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id9f584ac4209604675eb13b6d6f349c6cbb1a387 Gerrit-Change-Number: 13590 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 11 Jun 2019 23:36:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8617: Add support for lz4 in parquet
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13582 ) Change subject: IMPALA-8617: Add support for lz4 in parquet .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/13582/2/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: http://gerrit.cloudera.org:8080/#/c/13582/2/common/thrift/generate_error_codes.py@421 PS2, Line 421:"LZ4Block: Decompressed size is not correct."), > I think this is probably a result of having abort_on_error=0 by default. Fo it seems we're missing the underlying HDFS path which had the bad block, though. Is it possible to ensure that this gets propagated up? If that's a lot of scope creep for this commit maybe we can file it as a follow-up? My fear is that someone hits this issue and we can't tell them which file is broken. -- To view, visit http://gerrit.cloudera.org:8080/13582 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia6850a39ef3f1e0e7ba48e08eef1d4f7cbb74d0c Gerrit-Change-Number: 13582 Gerrit-PatchSet: 2 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 11 Jun 2019 23:08:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8649: Fix confusing SHOW GRANT error messages
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13587 ) Change subject: IMPALA-8649: Fix confusing SHOW GRANT error messages .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb88bdc19cd1223902b44e3634f756d086332266 Gerrit-Change-Number: 13587 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Austin Nobis Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 11 Jun 2019 23:06:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8617: Add support for lz4 in parquet
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13582 ) Change subject: IMPALA-8617: Add support for lz4 in parquet .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/13582/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13582/2//COMMIT_MSG@21 PS2, Line 21: <4 byte big endian uncompressed size> nit: some weird formatting here and below http://gerrit.cloudera.org:8080/#/c/13582/2//COMMIT_MSG@25 PS2, Line 25: parquer typo http://gerrit.cloudera.org:8080/#/c/13582/2/be/src/util/compress.cc File be/src/util/compress.cc: http://gerrit.cloudera.org:8080/#/c/13582/2/be/src/util/compress.cc@355 PS2, Line 355: length = sizeof (int32_t); style nit: I don't think we usually have a space between sizeof and '(' http://gerrit.cloudera.org:8080/#/c/13582/2/be/src/util/compress.cc@381 PS2, Line 381: int64_t size = LZ4_compress_default(reinterpret_cast(input), lz4.h indicates that this can return 0 to indicate failure of some kind. I can't think of any reasons it would happen aside from OOM but it's probably worth a CHECK http://gerrit.cloudera.org:8080/#/c/13582/2/be/src/util/decompress-test.cc File be/src/util/decompress-test.cc: http://gerrit.cloudera.org:8080/#/c/13582/2/be/src/util/decompress-test.cc@559 PS2, Line 559: EXPECT_EQ(Ubsan::MemCmp(input_, output, input_len), 0); do we ever expect input_ or output_ to be null? if not, we can use regular memcmp here http://gerrit.cloudera.org:8080/#/c/13582/2/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: http://gerrit.cloudera.org:8080/#/c/13582/2/common/thrift/generate_error_codes.py@421 PS2, Line 421:"LZ4Block: Decompressed size is not correct."), when these errors get propagated up, do we get the context of which file had the issue? -- To view, visit http://gerrit.cloudera.org:8080/13582 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia6850a39ef3f1e0e7ba48e08eef1d4f7cbb74d0c Gerrit-Change-Number: 13582 Gerrit-PatchSet: 2 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 11 Jun 2019 20:51:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8551: Make the grant/revoke error messages to be more user friendly
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13525 ) Change subject: IMPALA-8551: Make the grant/revoke error messages to be more user friendly .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13525 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8995f5dc88b211cd3af415713802cfeac44fe576 Gerrit-Change-Number: 13525 Gerrit-PatchSet: 7 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 10 Jun 2019 19:57:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8633 : Insert event should not error when table does not exists
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13548 ) Change subject: IMPALA-8633 : Insert event should not error when table does not exists .. Patch Set 4: (1 comment) Is there any e2e test we could write to stress these paths? eg if you were to run the metastore poller with a 5 second frequency, and you did some random sequence of hive operations, would it end up triggering it? I'm wondering whether we have similar issues in other events. For example, if you ALTER TABLE followed by DROP, will Impala get desynchronized? http://gerrit.cloudera.org:8080/#/c/13548/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/13548/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@829 PS4, Line 829: } should this have an else { throw e }? -- To view, visit http://gerrit.cloudera.org:8080/13548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I961cd7cbede4c248dba538c7fabb4bc708e49693 Gerrit-Change-Number: 13548 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 10 Jun 2019 19:10:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] fe: improve logging for metadata loading
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13463 ) Change subject: fe: improve logging for metadata loading .. Patch Set 1: Vihang, any chance you can take a look at this? -- To view, visit http://gerrit.cloudera.org:8080/13463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d90a5f075b05d2dc96692b3349abe35ce24b8b8 Gerrit-Change-Number: 13463 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 10 Jun 2019 19:11:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8551: Make the grant/revoke error messages to be more user friendly
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13525 ) Change subject: IMPALA-8551: Make the grant/revoke error messages to be more user friendly .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/13525/6/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/13525/6/tests/authorization/test_ranger.py@572 PS6, Line 572: assert "Error granting a privilege to Ranger. Ranger error message: " \ don't these asserts need to be updated? -- To view, visit http://gerrit.cloudera.org:8080/13525 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8995f5dc88b211cd3af415713802cfeac44fe576 Gerrit-Change-Number: 13525 Gerrit-PatchSet: 5 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 10 Jun 2019 19:03:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump CDP BUILD NUMBER to 1153860
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13574 ) Change subject: Bump CDP_BUILD_NUMBER to 1153860 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief3458c7dbf2ee973b586e2d0d54cc0739b6ae3b Gerrit-Change-Number: 13574 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 10 Jun 2019 19:01:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8551: Bump CDP BUILD NUMBER to 1153860
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13525 ) Change subject: IMPALA-8551: Bump CDP_BUILD_NUMBER to 1153860 .. Patch Set 5: (2 comments) Looks pretty good modulo some nits. One thing, though: would it be possible to split this into one commit to bump the CDP_BUILD_NUMBER and one to change the error messages? I'm afraid when we look through lists of commits to backport across branches, we'll see this and think "oh it's just a CDP version bump, we can skip this one and pick up a later one" whereas in fact it also contains a functional change. http://gerrit.cloudera.org:8080/#/c/13525/5/fe/src/main/java/org/apache/impala/authorization/ranger/RangerCatalogdAuthorizationManager.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerCatalogdAuthorizationManager.java: http://gerrit.cloudera.org:8080/#/c/13525/5/fe/src/main/java/org/apache/impala/authorization/ranger/RangerCatalogdAuthorizationManager.java@173 PS5, Line 173: } catch (Exception e) { pre-existing issue, but: is it possible to add a DEBUG level log message here which includes the stack trace of the exception? I'm afraid at some point we'll trigger an NPE or something within the Ranger client code, and the excepption will get swallowed, and all we'll see is "null" in the error message. (same below) http://gerrit.cloudera.org:8080/#/c/13525/5/fe/src/main/java/org/apache/impala/authorization/ranger/RangerCatalogdAuthorizationManager.java@174 PS5, Line 174: throw new InternalException("Error granting a privilege to Ranger. " + nit: can we reword this to "in Ranger" instead of "to Ranger" -- the privilege is being granted to some target user, not to the ranger service itself. -- To view, visit http://gerrit.cloudera.org:8080/13525 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8995f5dc88b211cd3af415713802cfeac44fe576 Gerrit-Change-Number: 13525 Gerrit-PatchSet: 5 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 10 Jun 2019 15:50:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Update mustache to commit b290952d8eb93d085214d8c8c9eab8559df9f606
Todd Lipcon has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13563 Change subject: Update mustache to commit b290952d8eb93d085214d8c8c9eab8559df9f606 .. Update mustache to commit b290952d8eb93d085214d8c8c9eab8559df9f606 This fixes some issues with regard to template variable scoping that will be useful for Knox integration. Change-Id: If26f3aaa2a3279a1f6a300c4f4cee7ec899e22ed --- M be/src/thirdparty/mustache/README M be/src/thirdparty/mustache/mustache.cc M be/src/thirdparty/mustache/mustache.h M be/src/util/webserver.cc 4 files changed, 184 insertions(+), 102 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/13563/1 -- To view, visit http://gerrit.cloudera.org:8080/13563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If26f3aaa2a3279a1f6a300c4f4cee7ec899e22ed Gerrit-Change-Number: 13563 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon
[Impala-ASF-CR] IMPALA-8551: Bump CDP BUILD NUMBER to 1153860
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13525 ) Change subject: IMPALA-8551: Bump CDP_BUILD_NUMBER to 1153860 .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/13525/2/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/13525/2/tests/authorization/test_ranger.py@578 PS2, Line 578: > We can but we have to parse the error message and that can be somewhat erro hrm. I agree that parsing the error message isn't super robust, but I think we could at least do something like a simple regex substitution to remove ^HTTP \d+ Error: .+' and replace it with something like "Ranger error: ... (HTTP code 403)" so that it's clear that the error came from ranger and not produced within Impala or within the user's connection. I'm concerned that if a user using Hue sees an error pop up like this they may think there's something wrong with their HTTP connection to Hue, for example. -- To view, visit http://gerrit.cloudera.org:8080/13525 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8995f5dc88b211cd3af415713802cfeac44fe576 Gerrit-Change-Number: 13525 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 07 Jun 2019 17:13:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13557 ) Change subject: IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/13557/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13557/1//COMMIT_MSG@17 PS1, Line 17: This re-enables some tests that were previously disabled for > Were you able to run the end-to-end tests with the local catalog enabled? I I ran the kudu tests but didn't re-run the full suite. You think it's worth running a parameterized job and make sure I didn't regress something else? Is there an existing jenkins job that can easily be configured like that? http://gerrit.cloudera.org:8080/#/c/13557/1/fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java File fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java: http://gerrit.cloudera.org:8080/#/c/13557/1/fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java@40 PS1, Line 40: * Represents a table with incomplete metadata. The metadata may be incomplete because > Trying to understand - does this IncompleteTable implementation get used in yea I think this only gets used by the "v1" catalog. I'll add a comment. -- To view, visit http://gerrit.cloudera.org:8080/13557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia85d6b10669866dcc9f2bf7385a4775e483dd6b0 Gerrit-Change-Number: 13557 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 07 Jun 2019 16:38:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load
Hello Tim Armstrong, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/13557 to review the following change. Change subject: IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load .. IMPALA-8459. LocalCatalog: Allow dropping tables that fail to load This adds support in LocalCatalog for IncompleteTables. A new FeIncompleteTable interface is introduced, and when a table fails to load, it produces a FeIncompleteTable instead of an immediate error. This allows DROP TABLE statements to get past analysis and execute even when tables are in bad states (eg a Kudu table with no backing table in Kudu itself). This re-enables some tests that were previously disabled for LocalCatalog. Change-Id: Ia85d6b10669866dcc9f2bf7385a4775e483dd6b0 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java A fe/src/main/java/org/apache/impala/catalog/FeIncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java A fe/src/main/java/org/apache/impala/catalog/local/FailedLoadLocalTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java M fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java M tests/common/skip.py M tests/query_test/test_kudu.py 12 files changed, 127 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/13557/1 -- To view, visit http://gerrit.cloudera.org:8080/13557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia85d6b10669866dcc9f2bf7385a4775e483dd6b0 Gerrit-Change-Number: 13557 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8635. Use local metastore URIs when checking for Kudu/HMS integration
Todd Lipcon has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13555 Change subject: IMPALA-8635. Use local metastore URIs when checking for Kudu/HMS integration .. IMPALA-8635. Use local metastore URIs when checking for Kudu/HMS integration The patch for IMPALA-8504 (part 2) (6bb404dc359) checks to see if Impala and Kudu are configured against the same metastore to determine if the HMS integration is enabled. However, instead of using its own metastore URI config, it uses the configuration stored on the remote HMS. This is error prone because it's not common for the HMS configuration to store its own URI. Instead, we should use our own config. This patch changes to using the local configuration for this purpose. More robust would be to use the HMS "UUID" support, since it's possible that Kudu and Impala are talking to different HMS instances sharing a backing DB, but that work is deferred to a later commit since it depends on Kudu-side changes. This commit doesn't add any tests, but fixes the existing tests when running against Hive 3, where the HMS server side uses a different configuration key for the metastore URIs. Change-Id: Id7a4c2cc0580f7c4dc5cfceed30b91e87c547612 --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java 3 files changed, 9 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/13555/1 -- To view, visit http://gerrit.cloudera.org:8080/13555 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id7a4c2cc0580f7c4dc5cfceed30b91e87c547612 Gerrit-Change-Number: 13555 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon