no, and even GenericData.Record simply writes using a StringBuilder; I doubt 
this is the culprit.

On 6/1/11 3:14 PM, "ey-chih chow" 
<[email protected]<mailto:[email protected]>> wrote:

We use a lot of toString() call on the avro Utf8 object.  Will this cause 
Jackson call?  Thanks.

Ey-Chih

________________________________
From: [email protected]<mailto:[email protected]>
To: [email protected]<mailto:[email protected]>
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse

This is great info.

Jackson should only be used once when the file is opened, so this is confusing 
from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently? 
 There are over 100000 Jackson DeserializationConfig objects.

Another place that parses the schema is in AvroSerialization.java.  Does the 
Hadoop getDeserializer() API method get called once per job, or per record?  If 
this is called more than once per map job, it might explain this.

In principle, Jackson is only used by a mapper during initialization.  The 
below indicates that this may not be the case or that something outside of Avro 
is causing a lot of Jackson JSON parsing.

Are you using something that is converting the Avro data to Json form?  
toString() on most Avro datum objects will do a lot of work with Jackson, for 
example — but the below are deserializer objects not serializer objects so that 
is not likely the issue.

On 6/1/11 11:34 AM, "ey-chih chow" 
<[email protected]<mailto:[email protected]>> wrote:

We ran jmap on one of our mapper and found the top usage as follows:

num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator

It looks like Jackson eats up a lot of memory.  Our mapper reads in files of 
the avro format.  Does avro use Jackson a lot in reading the avro files?  Is 
there any way to improve this?  Thanks.

Ey-Chih Chow

________________________________
From: [email protected]<mailto:[email protected]>
To: [email protected]<mailto:[email protected]>
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.   If you are running out of memory, its 
not likely due to object reuse.  This tends to cause more CPU time in the 
garbage collector, but not out of memory conditions.  This can be hard to do on 
a cluster, but grabbing 'jmap –histo' output from a JVM that has a 
larger-than-expected JVM heap usage can often be used to quickly identify the 
cause of memory consumption issues.

I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or 
not.


On 5/31/11 5:40 PM, "ey-chih chow" 
<[email protected]<mailto:[email protected]>> wrote:

I actually looked into Avro code to find out how Avro does object reuse.  I 
looked at AvroUtf8InputFormat and got the following question.  Why a new Utf8 
object has to be created each time the method next(AvroWrapper<Utf8> key, 
NullWritable value) is called ?  Will this eat up too much memory when we call 
next(key, value) many times?  Since Utf8 is mutable, can we just create one 
Utf8 object for all the calls to next(key, value)?  Will this save memory?  
Thanks.

Ey-Chih Chow

________________________________
From: [email protected]<mailto:[email protected]>
To: [email protected]<mailto:[email protected]>
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

Hi,

We have several mapreduce jobs using avro.  They take too much memory when 
running on production.  Can anybody suggest some object reuse techniques to cut 
down memory usage?  Thanks.

Ey-Chih Chow

Reply via email to