Hello Bharath Vissapragada, Tianyi Wang, Tim Armstrong, Csaba Ringhofer, Impala 
Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11232

to look at the new patch set (#3).

Change subject: IMPALA-7406. Avoid memory overhead of FlatBuffer object 
wrappers for FileDescriptor
......................................................................

IMPALA-7406. Avoid memory overhead of FlatBuffer object wrappers for 
FileDescriptor

This patch saves approximately 100 bytes per file in the catalog and
impalad heap by changing HdfsPartition to store raw byte[] arrays
instead of FileDescriptor wrapper objects. The outward-facing APIs of
HdfsPartition remain the same: the patch uses Guava's lazy list
transformation utility to return a List which is backed by the
underlying byte[] and wraps on the fly.

No new tests are added since existing tests cover these code paths from
an end-to-end basis. I tested the memory usage by running a catalogd
with background loading enabled, waiting until loading finished, and
then running 'jmap -histo:live' a few times on the catalogd until the
result stabilized. I then used a quick python script[1] to diff the
histograms and look for any changes with high magnitude. I found the
following significant diffs:

Object counts:
  Negative (good):
    java.nio.HeapByteBuffer: 12131 -> 35 (-12096)
    org.apache.impala.fb.FbFileDesc: 12090 -> 0 (-12090)
    org.apache.impala.catalog.HdfsPartition$FileDescriptor: 12090 -> 0 (-12090)
    java.util.ArrayList: 37825 -> 28028 (-9797)
    [Ljava.lang.Object;: 24393 -> 15035 (-9358)
  Positive (bad):
   com.google.common.collect.RegularImmutableList: 1694 -> 2019 (+325)
   com.google.common.collect.SingletonImmutableList: 8560 -> 19142 (+10582)

  Net: -55431 + 10934 = -44497

Heap size:
  Negative (good):
    java.nio.HeapByteBuffer: 582288 -> 1680 (-580608)
    [Ljava.lang.Object;: 3560960 -> 3026448 (-534512)
    org.apache.impala.fb.FbFileDesc: 290160 -> 0 (-290160)
    java.util.ArrayList: 907800 -> 672672 (-235128)
    org.apache.impala.catalog.HdfsPartition$FileDescriptor: 193440 -> 0 
(-193440)
  Positive (bad):
    com.google.common.collect.RegularImmutableList: 54208 -> 64608 (+10400)
    com.google.common.collect.SingletonImmutableList: 205440 -> 459408 (+253968)

  Net: -1833848 + 264368 = -1569480

To summarize, the measurements confirm that this patch saved about 130
bytes and 3.7 objects per file descriptor. The savings are a bit more
than estimated because switching from ArrayList to ImmutableList inside
HdfsPartition seems to have also yielded some savings as it optimizes
for the singleton and empty case.

In terms of performance, it would be very unlikely that this patch
causes a regression. The short-lived allocations of the wrapper objects
will all come out of the TLAB, and even loading a partition with a
million partitions will only allocate ~100MB of short-lived objects,
which should only take a few milliseconds to collect (young-generation
collections take time relative to the number of live objects, not dead
objects). The other overheads associated with any operation on a
million-partition table would surely dominate.

[1] https://gist.github.com/cb9c52907bbc416b055ffdaa34aaaa77

Change-Id: I986dd5c05ac8904898d01a6ef3487bd7225e0500
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
1 file changed, 60 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/11232/3
--
To view, visit http://gerrit.cloudera.org:8080/11232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I986dd5c05ac8904898d01a6ef3487bd7225e0500
Gerrit-Change-Number: 11232
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>

Reply via email to