Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13215 )
Change subject: IMPALA-8438: Store WriteId and valid writeId list for table and partitio ...................................................................... Patch Set 3: (17 comments) http://gerrit.cloudera.org:8080/#/c/13215/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13215/3//COMMIT_MSG@12 PS3, Line 12: Tests: I think if we add some of this info to the profile or to EXPLAIN output we could write some better end-to-end tests that actually modify a table and see that we pick up the new write ids on REFRESH http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java@238 PS3, Line 238: return -1l; nit: please use capital L so this doesn't look like -11. (same below) http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java@258 PS3, Line 258: public static long getShimVersion() { can we rename this to getMajorVersion or something since it's not the full version number? http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@374 PS3, Line 374: * @param wrtId the write id to get the corresponding txn I had to trace my way through Hive code to understand what this meant. My understanding is the following: - if we pass no writeId, we'll get the current *latest* state of writes for a table - if we pass a write ID, that's assuming that we are actually a writer on this table that has already been allocated an ID, and it'll translate our writeID back to a transaction list, and see which prior writes to the table were committed when our write started? Do we have any use case for the latter yet? If not, maybe we should not expose it. http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@378 PS3, Line 378: long writeId can we use Optional<Long> here? http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@236 PS3, Line 236: //Keep the log for HMS 3 related dev. How about putting this into the query timeline event marked above? This way it will show up in profiles, which is moderately useful. http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@285 PS3, Line 285: HdfsPartition hPartition = (HdfsPartition)part; this won't work with LocalCatalog mode enabled. Probably best to just remove this, and instead add the relevant metadata to teh output of 'SHOW PARTITIONS' if you think it's useful for debugging. http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@617 PS3, Line 617: // For Acid table. can you be more clear here? eg "The last committed write ID which modified this partition." http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@618 PS3, Line 618: private long writeId_ = -1l; -1L http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@120 PS3, Line 120: // Last commited write id for this table. The way this got implemented, this is just a high watermark write id, which isn't necessarily the most recently committed. Not sure we have a strong reason to cache this. http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@127 PS3, Line 127: validWrtIds_ rename to 'validWriteIds_' (no need to abbreviate things) http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@285 PS3, Line 285: * @param wrtId the write id to get the corresponding txn per note elsewhere, I don't think we have a good use for this http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@289 PS3, Line 289: if (LOG.isTraceEnabled()) LOG.trace("Get valid writeIds for table: " + name_); getFullName() http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@290 PS3, Line 290: wrtIds nit: don't abbreviate http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@298 PS3, Line 298: LOG.trace("Valid writeIds: " + (wrtIds == null ? "null" : wrtIds)); I don't think you need to check for null here, since null will append to a string OK, right? http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@311 PS3, Line 311: writeId_ maybe I misunderstood what this gets used for, but I don't think this has the effect we're looking for. http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@314 PS3, Line 314: writeId_ = MetastoreShim.getHighestWriteId(validWrtIds_); I'm not sure this is useful. The high watermark doesn't tell us the latest committed transaction, because transactions can commit out of order. -- To view, visit http://gerrit.cloudera.org:8080/13215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6edbd64424edf0ba88af110ab8b958a1966b8b54 Gerrit-Change-Number: 13215 Gerrit-PatchSet: 3 Gerrit-Owner: Yongzhi Chen <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sudhanshu Arora <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Yongzhi Chen <[email protected]> Gerrit-Comment-Date: Fri, 03 May 2019 06:35:38 +0000 Gerrit-HasComments: Yes
