Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16392 )

Change subject: MPALA-10075: Reuse unchanged partition instances
......................................................................


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@7
PS5, Line 7: MPALA
typo: IMPALA


http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@11
PS5, Line 11: redudant
typo: redundant


http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@13
PS5, Line 13: changes
typo: changed


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@293
PS5, Line 293:       if (oldFd == null || oldFd.getFileLength() != 
fd.getFileLength()
             :           || oldFd.getModificationTime() != 
fd.getModificationTime()
Are you sure that we shouldn't check other properties too? Or the modification 
time will change for all other changes?

An example that comes to mind is HDFS rebalancing - my guess is that if the 
locations of some blocks change for a file, the modification time won't change, 
so we won't propagate the change, which can lead to suboptimal plans.

One more thing: I would prefer to move this logic to FileDescriptor, e.g. in a 
function like equalsTo(), or isUnchanged().


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1541
PS5, Line 1541: filesChanged_
I think that relying on filesChanged_ is counter intuitive in equalsTo. 
Changing the name to something like "equalsToOriginal" would reflect the 
semantics better, as we assume here that the partition is compared to the same 
partition before the update.

Checking the files here again is also a solution - it would waste some CPU but 
the semantics would be clear.


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1541
PS5, Line 1541: oldInstance == oldInstance_
Is it possible for these two to be different?


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1545
PS5, Line 1545: oldInstance_
We are using oldInstance_ here and in the last line while using oldInstance in 
all other places. This is correct as we have checked that the two are equal, 
but I would prefer more consistency.


http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@933
PS5, Line 933:       // TODO: trace
Do you plan to change the logging below?


http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py
File tests/metadata/test_reuse_partitions.py:

http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py@49
PS5, Line 49: test_reuse_partitions
An idea to extend this test is to run the same queries for an insert_only 
transactional table. Some operations are not supported there (ALTER TABLE), but 
I think that other operations should lead to the same result. What may affect 
this is writeId - if it is changed while the partition actually doesn't change, 
it could lead to less reuse than in the the non-transactional case. It would be 
good to verify that this doesn't happen.


http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py@55
PS5, Line 55:       "create table %s.%s (id int) partitioned by (p int) stored 
as textfile"
            :       % (unique_database, tbl_name))
nit: +2 indentation



--
To view, visit http://gerrit.cloudera.org:8080/16392
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2dd645c260d271291021e52fdac4b74924df1170
Gerrit-Change-Number: 16392
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Comment-Date: Tue, 06 Oct 2020 10:44:08 +0000
Gerrit-HasComments: Yes

Reply via email to