We ran jmap on one of our mappers and found the top memory consumers were as follows:
 num   #instances      #bytes  Class description
--------------------------------------------------------------------------
   1:       24405   291733256  byte[]
   2:        6056    40228984  int[]
   3:      388799    19966776  char[]
   4:      101779    16284640  org.codehaus.jackson.impl.ReaderBasedParser
   5:      369623    11827936  java.lang.String
   6:      111059     8769424  java.util.HashMap$Entry[]
   7:      204083     8163320  org.codehaus.jackson.impl.JsonReadContext
   8:      211374     6763968  java.util.HashMap$Entry
   9:      102551     5742856  org.codehaus.jackson.util.TextBuffer
  10:      105854     5080992  java.nio.HeapByteBuffer
  11:      105821     5079408  java.nio.HeapCharBuffer
  12:      104578     5019744  java.util.HashMap
  13:      102551     4922448  org.codehaus.jackson.io.IOContext
  14:      101782     4885536  org.codehaus.jackson.map.DeserializationConfig
  15:      101783     4071320  org.codehaus.jackson.sym.CharsToNameCanonicalizer
  16:      101779     4071160  org.codehaus.jackson.map.deser.StdDeserializationContext
  17:      101779     4071160  java.io.StringReader
  18:      101754     4070160  java.util.HashMap$KeyIterator
It looks like Jackson is using a lot of memory: there are roughly 100,000
instances each of ReaderBasedParser, IOContext, and DeserializationConfig. Our
mapper reads Avro data files. Does Avro use Jackson heavily when reading Avro
files? Is there any way to reduce this? Thanks.
Ey-Chih Chow
From: [email protected]
To: [email protected]
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse
All of those instances are short-lived. If you are running out of memory, it's
not likely due to a lack of object reuse. Missing reuse tends to cost more CPU
time in the garbage collector, but it does not cause out-of-memory conditions.
This can be hard to do on a cluster, but grabbing 'jmap -histo' output from a
JVM whose heap usage is larger than expected can often quickly identify the
cause of memory consumption issues.
I'm not sure whether AvroUtf8InputFormat can safely reuse its Utf8 instances.
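Reuse is only safe when callers never hold on to a returned object across calls. A minimal sketch of that hazard, using hypothetical Holder and ReusingReader classes (they are not Avro or Hadoop APIs): keeping the returned references yields the last record repeated, while copying the value out on each call preserves every record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical mutable record holder; stands in for something like a reused Utf8.
final class Holder {
    String value;
}

// Hypothetical reader that hands back the SAME holder on every call.
final class ReusingReader {
    private final Iterator<String> source;
    private final Holder reused = new Holder();

    ReusingReader(List<String> records) {
        this.source = records.iterator();
    }

    // Overwrites the shared holder in place; returns null at end of input.
    Holder next() {
        if (!source.hasNext()) {
            return null;
        }
        reused.value = source.next();
        return reused;
    }
}

final class AliasingDemo {
    // Keeps the returned holders themselves: every entry aliases one object.
    static List<String> collectReferences(List<String> records) {
        ReusingReader reader = new ReusingReader(records);
        List<Holder> kept = new ArrayList<>();
        Holder h;
        while ((h = reader.next()) != null) {
            kept.add(h);
        }
        List<String> out = new ArrayList<>();
        for (Holder k : kept) {
            out.add(k.value);
        }
        return out;
    }

    // Copies the value out before the next call overwrites the holder.
    // value is an immutable String here; a mutable payload would need a real copy.
    static List<String> collectCopies(List<String> records) {
        ReusingReader reader = new ReusingReader(records);
        List<String> out = new ArrayList<>();
        Holder h;
        while ((h = reader.next()) != null) {
            out.add(h.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(collectReferences(records)); // [c, c, c]
        System.out.println(collectCopies(records));     // [a, b, c]
    }
}
```

This is why an input format can only switch to reuse if every consumer copies what it needs before calling next() again.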
On 5/31/11 5:40 PM, "ey-chih chow" <[email protected]> wrote:
I actually looked into the Avro code to find out how Avro does object reuse.
Looking at AvroUtf8InputFormat, I have the following question: why does a new
Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key,
NullWritable value) is called? Will this eat up too much memory when we call
next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8
object for all calls to next(key, value)? Will this save memory?
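A minimal sketch of the reuse idea being asked about, assuming a hypothetical MutableText class as a stand-in for a mutable Utf8 (it is not Avro's Utf8 API and not AvroUtf8InputFormat's code). One loop allocates a new holder per record; the other overwrites a single holder, growing its byte array only when a record is longer than any seen so far. Only holder allocations are counted; the String decoding in toString() still allocates on every call.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical mutable text holder; NOT org.apache.avro.util.Utf8.
final class MutableText {
    static int instances = 0;           // counts constructor calls
    private byte[] bytes = new byte[0];
    private int length = 0;

    MutableText() {
        instances++;
    }

    // Overwrites the contents, reallocating the backing array only if too small.
    void set(String s) {
        byte[] src = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length < src.length) {
            bytes = new byte[src.length];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
}

final class ReuseDemo {
    // Allocates a fresh holder for every record; returns total characters seen.
    static int readFresh(List<String> records) {
        int total = 0;
        for (String r : records) {
            MutableText t = new MutableText();
            t.set(r);
            total += t.toString().length();
        }
        return total;
    }

    // Reuses one holder for all records; callers must not keep references to it.
    static int readReusing(List<String> records) {
        MutableText t = new MutableText();
        int total = 0;
        for (String r : records) {
            t.set(r);
            total += t.toString().length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("alpha", "beta", "gamma");
        MutableText.instances = 0;
        readFresh(records);
        System.out.println("fresh:   " + MutableText.instances + " holders");
        MutableText.instances = 0;
        readReusing(records);
        System.out.println("reusing: " + MutableText.instances + " holder");
    }
}
```

Fewer holder allocations mainly reduce garbage-collection work; whether that lowers peak heap usage depends on what else the job keeps live.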
Thanks.
Ey-Chih Chow
From: [email protected]
To: [email protected]
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700
Hi,
We have several MapReduce jobs that use Avro. They take too much memory when
running in production. Can anybody suggest some object reuse techniques to cut
down memory usage? Thanks.
Ey-Chih Chow