Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11109 )
Change subject: IMPALA-7386. Replace CatalogObjectVersionQueue with a multiset ...................................................................... IMPALA-7386. Replace CatalogObjectVersionQueue with a multiset The implementation of CatalogObjectVersionQueue was based on a priority queue, which is not a good choice of data structure for the use case. This class exposes the following operations, with corresponding runtimes for the heap: - addVersion: O(lg n) - removeVersion: O(n) - getMinVersion: O(1) Given that every update of a catalog object requires removing the old version, the O(n) runtime could get quite expensive, particularly for operations like INVALIDATE METADATA which end up removing the old version of all of the objects in the catalog -- thus becoming accidentally quadratic. The new patch switches to a TreeMultiset coupled with a simple cache of the min element, with runtimes: - addVersion: O(lg n) - removeVersion: O(lg n) - getMinVersion: O(1) The downside of this patch is increased memory overhead: we are going from a PriorityQueue which has 8 bytes overhead per element to a TreeMultiset which has 48 bytes overhead per element. In a catalog with millions of tables this won't be insignificant; however, in such a catalog the performance savings are probably critical. According to a quick JMH benchmark, removal of an element from a PriorityQueue with 1M elements takes about 3.5ms, so an operation which invalidates all million tables would take about 3500 seconds without this patch. Meanwhile, the same benchmark using a TreeSet takes about 28ns per removal, so the whole invalidation would take about 28ms. This patch also makes the data structure synchronized, just for simplicity's sake, since uncontended locks are very cheap in Java. Change-Id: I1b6c58a1acef9b02fcc26a04d048ee9cc47dc0ef Reviewed-on: http://gerrit.cloudera.org:8080/11109 Tested-by: Todd Lipcon <[email protected]> Reviewed-by: Todd Lipcon <[email protected]> --- M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java D fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionSet.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java A fe/src/test/java/org/apache/impala/catalog/CatalogObjectVersionSetTest.java 6 files changed, 223 insertions(+), 94 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I1b6c58a1acef9b02fcc26a04d048ee9cc47dc0ef Gerrit-Change-Number: 11109 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
