[Impala-ASF-CR] IMPALA-7386. Replace CatalogObjectVersionQueue with a multiset

Todd Lipcon (Code Review) Thu, 02 Aug 2018 10:53:27 -0700

Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11109 )


Change subject: IMPALA-7386. Replace CatalogObjectVersionQueue with a multiset
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11109/1/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionSet.java
File fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionSet.java:

http://gerrit.cloudera.org:8080/#/c/11109/1/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionSet.java@40
PS1, Line 40: TreeMultiset<Long>
> Is it possible to have much more objects than versions? If yes, then it may
Internally the TreeMultiset actually uses a pair of <element, count> in its 
tree node: 
https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/TreeMultiset.java#L570

I'm having some trouble matching up the memory measurements from 
https://github.com/DimitrisAndreou/memory-measurer/blob/master/ElementCostInDataStructures.txt
 with the actual class members above, though. From looking at above, each tree 
node should have 5 object references, 1 long, and 3 ints, which should add up 
to 60 bytes. Add the 16-byte object header and 4 bytes padding (objects must be 
a multiple of 8) and you have 80 bytes per tree node. Then the boxed Long 
itself should be another 24 bytes (16 byte header plus 8 byte long). So, that 
makes a total of 104 bytes per entry, more than I'd originally counted in the 
commit message.

Weirdly it looks like TreeSet in JDK is implemented on top of a TreeMap<E, 
Object> where each entry is mapped to some PRESENT singleton. So for memory 
overhead we have to look at TreeMap.Entry: 
https://github.com/baratali/jdk8/blob/master/src/main/java/openjdk/jdk/src/share/classes/java/util/TreeMap.java#L2048
It seems like each Entry has five object references (40 bytes), 1 bool which 
needs 7 bytes of padding, and the 16-byte object header. So, 64 bytes per tree 
node. Then our Pair structure would be 16 byte header, plus 8 byte long, plus 4 
byte count, 4 bytes padding. That adds up to 64+16+8+4+4 = 96 per entry.

So, if we switched to TreeSet<LongIntPair> we'd probably save 8 bytes per entry 
(<10%). Seems to me like if it's probably not quite worth the extra code here 
since we'd need comparators, more complex removal logic, etc.



--
To view, visit http://gerrit.cloudera.org:8080/11109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b6c58a1acef9b02fcc26a04d048ee9cc47dc0ef
Gerrit-Change-Number: 11109
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Thu, 02 Aug 2018 17:52:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7386. Replace CatalogObjectVersionQueue with a multiset

Reply via email to