Hello Bharath Vissapragada, Vuk Ercegovac,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/11109
to review the following change.
Change subject: IMPALA-7386. Replace CatalogObjectVersionQueue with a multiset
......................................................................
IMPALA-7386. Replace CatalogObjectVersionQueue with a multiset
The implementation of CatalogObjectVersionQueue was based on a priority
queue, which is not a good choice of data structure for the use case.
This class exposes the following operations, with corresponding runtimes
for the heap:
- addVersion: O(lg n)
- removeVersion: O(n)
- getMinVersion: O(1)
Given that every update of a catalog object requires removing the old
version, the O(n) runtime could get quite expensive, particularly for
operations like INVALIDATE METADATA which end up removing the old
version of all of the objects in the catalog -- thus becoming
accidentally quadratic.
The new patch switches to a TreeMultiset coupled with a simple cache of
the min element, with runtimes:
- addVersion: O(lg n)
- removeVersion: O(lg n)
- getMinVersion: O(1)
The downside of this patch is increased memory overhead: we are going
from a PriorityQueue which has 8 bytes overhead per element to a
TreeMultiset which has 48 bytes overhead per element. In a catalog with
millions of tables this won't be insignificant; however, in such a
catalog the performance savings are probably critical. According to
a quick JMH benchmark, removal of an element from a PriorityQueue with
1M elements takes about 3.5ms, so an operation which invalidates all
million tables would take about 3500 seconds without this patch.
Meanwhile, the same benchmark using a TreeSet takes about 28ns per
removal, so the whole invalidation would take about 28ms.
This patch also makes the data structure synchronized, just for
simplicity's sake, since uncontended locks are very cheap in Java.
Change-Id: I1b6c58a1acef9b02fcc26a04d048ee9cc47dc0ef
---
M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
D fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java
A fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionSet.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
A fe/src/test/java/org/apache/impala/catalog/CatalogObjectVersionSetTest.java
6 files changed, 214 insertions(+), 92 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/11109/1
--
To view, visit http://gerrit.cloudera.org:8080/11109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1b6c58a1acef9b02fcc26a04d048ee9cc47dc0ef
Gerrit-Change-Number: 11109
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>