Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11109 )
Change subject: IMPALA-7386. Replace CatalogObjectVersionQueue with a multiset ...................................................................... Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11109/1/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionSet.java File fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionSet.java: http://gerrit.cloudera.org:8080/#/c/11109/1/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionSet.java@40 PS1, Line 40: TreeMultiset<Long> > Is it possible to have much more objects than versions? If yes, then it may Internally the TreeMultiset actually uses a pair of <element, count> in its tree node: https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/TreeMultiset.java#L570 I'm having some trouble matching up the memory measurements from https://github.com/DimitrisAndreou/memory-measurer/blob/master/ElementCostInDataStructures.txt with the actual class members above, though. From looking at above, each tree node should have 5 object references, 1 long, and 3 ints, which should add up to 60 bytes. Add the 16-byte object header and 4 bytes padding (objects must be a multiple of 8) and you have 80 bytes per tree node. Then the boxed Long itself should be another 24 bytes (16 byte header plus 8 byte long). So, that makes a total of 104 bytes per entry, more than I'd originally counted in the commit message. Weirdly it looks like TreeSet in JDK is implemented on top of a TreeMap<E, Object> where each entry is mapped to some PRESENT singleton. So for memory overhead we have to look at TreeMap.Entry: https://github.com/baratali/jdk8/blob/master/src/main/java/openjdk/jdk/src/share/classes/java/util/TreeMap.java#L2048 It seems like each Entry has five object references (40 bytes), 1 bool which needs 7 bytes of padding, and the 16-byte object header. So, 64 bytes per tree node. Then our Pair structure would be 16 byte header, plus 8 byte long, plus 4 byte count, 4 bytes padding. That adds up to 64+16+8+4+4 = 96 per entry. So, if we switched to TreeSet<LongIntPair> we'd probably save 8 bytes per entry (<10%). Seems to me like if it's probably not quite worth the extra code here since we'd need comparators, more complex removal logic, etc. -- To view, visit http://gerrit.cloudera.org:8080/11109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b6c58a1acef9b02fcc26a04d048ee9cc47dc0ef Gerrit-Change-Number: 11109 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Thu, 02 Aug 2018 17:52:40 +0000 Gerrit-HasComments: Yes
