Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13215 )

Change subject: IMPALA-8438: Store WriteId and valid writeId list for table and 
partitio
......................................................................


Patch Set 3:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/13215/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13215/3//COMMIT_MSG@12
PS3, Line 12: Tests:
I think if we add some of this info to the profile or to EXPLAIN output we 
could write some better end-to-end tests that actually modify a table and see 
that we pick up the new write ids on REFRESH


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java@238
PS3, Line 238:     return -1l;
nit: please use capital L so this doesn't look like -11. (same below)


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java@258
PS3, Line 258:   public static long getShimVersion() {
can we rename this to getMajorVersion or something since it's not the full 
version number?


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@374
PS3, Line 374:   * @param wrtId the write id to get the corresponding txn
I had to trace my way through Hive code to understand what this meant. My 
understanding is the following:

- if we pass no writeId, we'll get the current *latest* state of writes for a 
table
- if we pass a write ID, that's assuming that we are actually a writer on this 
table that has already been allocated an ID, and it'll translate our writeID 
back to a transaction list, and see which prior writes to the table were 
committed when our write started?

Do we have any use case for the latter yet? If not, maybe we should not expose 
it.


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@378
PS3, Line 378: long writeId
can we use Optional<Long> here?


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java
File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@236
PS3, Line 236:       //Keep the log for HMS 3 related dev.
How about putting this into the query timeline event marked above? This way it 
will show up in profiles, which is moderately useful.


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@285
PS3, Line 285:         HdfsPartition hPartition = (HdfsPartition)part;
this won't work with LocalCatalog mode enabled. Probably best to just remove 
this, and instead add the relevant metadata to teh output of 'SHOW PARTITIONS' 
if you think it's useful for debugging.


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@617
PS3, Line 617:   // For Acid table.
can you be more clear here? eg "The last committed write ID which modified this 
partition."


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@618
PS3, Line 618:   private long writeId_ = -1l;
-1L


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@120
PS3, Line 120:   // Last commited write id for this table.
The way this got implemented, this is just a high watermark write id, which 
isn't necessarily the most recently committed. Not sure we have a strong reason 
to cache this.


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@127
PS3, Line 127: validWrtIds_
rename to 'validWriteIds_' (no need to abbreviate things)


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@285
PS3, Line 285:    * @param wrtId the write id to get the corresponding txn
per note elsewhere, I don't think we have a good use for this


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@289
PS3, Line 289:     if (LOG.isTraceEnabled()) LOG.trace("Get valid writeIds for 
table: " + name_);
getFullName()


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@290
PS3, Line 290: wrtIds
nit: don't abbreviate


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@298
PS3, Line 298:       LOG.trace("Valid writeIds: " + (wrtIds == null ? "null" : 
wrtIds));
I don't think you need to check for null here, since null will append to a 
string OK, right?


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@311
PS3, Line 311: writeId_
maybe I misunderstood what this gets used for, but I don't think this has the 
effect we're looking for.


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/main/java/org/apache/impala/catalog/Table.java@314
PS3, Line 314:       writeId_ = MetastoreShim.getHighestWriteId(validWrtIds_);
I'm not sure this is useful. The high watermark doesn't tell us the latest 
committed transaction, because transactions can commit out of order.



--
To view, visit http://gerrit.cloudera.org:8080/13215
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6edbd64424edf0ba88af110ab8b958a1966b8b54
Gerrit-Change-Number: 13215
Gerrit-PatchSet: 3
Gerrit-Owner: Yongzhi Chen <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sudhanshu Arora <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Yongzhi Chen <[email protected]>
Gerrit-Comment-Date: Fri, 03 May 2019 06:35:38 +0000
Gerrit-HasComments: Yes

Reply via email to