Hello Bharath Vissapragada, Csaba Ringhofer, Impala Public Jenkins, Vuk
Ercegovac,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11109
to look at the new patch set (#3).
Change subject: IMPALA-7386. Replace CatalogObjectVersionQueue with a multiset
......................................................................
IMPALA-7386. Replace CatalogObjectVersionQueue with a multiset
The implementation of CatalogObjectVersionQueue was based on a priority
queue, which is not a good choice of data structure for the use case.
This class exposes the following operations, with corresponding runtimes
for the heap:
- addVersion: O(lg n)
- removeVersion: O(n)
- getMinVersion: O(1)
Given that every update of a catalog object requires removing the old
version, the O(n) runtime could get quite expensive, particularly for
operations like INVALIDATE METADATA which end up removing the old
version of all of the objects in the catalog -- thus becoming
accidentally quadratic.
The new patch switches to a TreeMultiset coupled with a simple cache of
the min element, with runtimes:
- addVersion: O(lg n)
- removeVersion: O(lg n)
- getMinVersion: O(1)
The downside of this patch is increased memory overhead: we are going
from a PriorityQueue which has 8 bytes overhead per element to a
TreeMultiset which has 48 bytes overhead per element. In a catalog with
millions of tables this won't be insignificant; however, in such a
catalog the performance savings are probably critical. According to
a quick JMH benchmark, removal of an element from a PriorityQueue with
1M elements takes about 3.5ms, so an operation which invalidates all
million tables would take about 3500 seconds without this patch.
Meanwhile, the same benchmark using a TreeSet takes about 28ns per
removal, so the whole invalidation would take about 28ms.
This patch also makes the data structure synchronized, just for
simplicity's sake, since uncontended locks are very cheap in Java.
Change-Id: I1b6c58a1acef9b02fcc26a04d048ee9cc47dc0ef
---
M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
D fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java
A fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionSet.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
A fe/src/test/java/org/apache/impala/catalog/CatalogObjectVersionSetTest.java
6 files changed, 223 insertions(+), 94 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/11109/3
--
To view, visit http://gerrit.cloudera.org:8080/11109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1b6c58a1acef9b02fcc26a04d048ee9cc47dc0ef
Gerrit-Change-Number: 11109
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>