Hi all, I seem to be running into a problem when using LZO compression under a heavy, write-only load. I am using 0.90 RC1 and, thus, the LZO compressor code that supports the reinit() method (from Kevin Weil's github, version 0.4.8). Since there are several Hadoop LZO incarnations around, I am pointing my question at this list.
It looks like the compressor uses direct byte buffers to hold the original and compressed bytes in memory, so the native code can work on them without the JVM having to copy anything around. The direct buffers can be reused after a reinit() call, but will often be newly created in the init() method, because the existing buffers may be the wrong size for reuse. In the latter case, the buffers previously used by the compressor instance become eligible for garbage collection.

I think the problem is that this collection never occurs (in time), because the GC does not consider it necessary yet. The GC knows nothing about the native heap, and based on the state of the JVM heap there is no reason to finalize these objects yet. However, the native memory behind a direct byte buffer is only freed in its finalizer, so the native heap keeps growing. On write-only loads, a full GC will rarely happen, because the heap will not grow far beyond the mem stores (no block cache is used). So what happens is that the machine starts swapping before the GC ever cleans up the direct byte buffers. I am guessing that without the reinit() support, the buffers were collected earlier, because the referring objects would also be collected every now and then, or things would perhaps just never be promoted to an older generation.

When I do a pmap on a running RS after it has grown to some 40 GB resident size (with a 16 GB heap), it shows a lot of near-64 MB anon blocks (presumably native heap). I saw this before with the 0.4.6 version of Hadoop LZO, but that was under normal load. After that I went back to an HBase version that does not require reinit(). Now I am on 0.90 with the new LZO, but I never did a heavy load like this one with it, until now...

Can anyone with a better understanding of the LZO code confirm that the above could be the case? If so, would it be possible to change the LZO compressor (and decompressor) to use just one fixed-size buffer (they all appear to be near 64 MB anyway), or to reuse an existing buffer even when it is not the exact required size, as long as it is large enough to make do? Having short-lived direct byte buffers is apparently a discouraged practice. If anyone can provide some pointers on what to look out for, I could invest some time in creating a patch.

Thanks,
Friso
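
P.S. For concreteness, this is roughly the buffer reuse I have in mind. The class and method names below are made up for illustration; this is just a sketch of the idea, not the actual compressor code:

    import java.nio.ByteBuffer;

    // Hypothetical sketch, not the actual LzoCompressor source: reuse an
    // existing direct buffer whenever its capacity is sufficient, instead
    // of allocating a new one and leaving the old one to the finalizer.
    class BufferReuseSketch {
        static ByteBuffer reuseOrAllocate(ByteBuffer existing, int requiredSize) {
            if (existing != null && existing.capacity() >= requiredSize) {
                existing.clear(); // reset position/limit; keeps the same native memory
                return existing;
            }
            // Only allocate when there is no buffer yet or it is genuinely too
            // small. Each allocateDirect() grabs native heap that is not freed
            // until the buffer object is finalized.
            return ByteBuffer.allocateDirect(requiredSize);
        }
    }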
