Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21072 )

Change subject: IMPALA-12831: Fix HdfsTable.toMinimalTCatalogObject() failed by 
concurrent modification
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21072/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/21072/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2133
PS1, Line 2133:   /**
> Would it be hard to synchronize access to partitionMap_?
We recently fixed a performance regression on INVALIDATE that it triggers table 
loading and could be blocked by concurrent DDLs (IMPALA-11501). Adding 
synchronization here means INVALIDATE could again be blocked by concurrent 
DDLs. I think the executiing time is more important than memory usage. So the 
current patch is best effort to get the partition list to optimize the memory 
usage. If the read lock can't be acquired, we still have the same performance 
since tryReadLock() has a zero timeout.

BTW, the partition ID changes when a partition is reloaded (though the 
partition name unchanged) since HdfsPartition objects are immutable and are 
assigned an incremental id (IMPALA-9778).

To avoid INVALIDATE be blocked by concurrent DDLs, maybe another approach is 
using ConcurrentHashMap as partitionMap_. But in some places we do clear the 
map, e.g.
https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L706
So we could send an empty map here before all partitions are loaded.

What do you think?


http://gerrit.cloudera.org:8080/#/c/21072/1/tests/custom_cluster/test_concurrent_ddls.py
File tests/custom_cluster/test_concurrent_ddls.py:

http://gerrit.cloudera.org:8080/#/c/21072/1/tests/custom_cluster/test_concurrent_ddls.py@215
PS1, Line 215: refresh
> refresh_handle would be clearer ase there are other operations in parallel
Done



--
To view, visit http://gerrit.cloudera.org:8080/21072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9f8e65c0bd24000241eedf8ca765c1e4e14fdb3
Gerrit-Change-Number: 21072
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Comment-Date: Tue, 27 Feb 2024 08:48:46 +0000
Gerrit-HasComments: Yes

Reply via email to