All of those instances are short-lived. If you are running out of memory, it's not likely due to a lack of object reuse. Allocating a new object on every call tends to cost more CPU time in the garbage collector, but it does not cause out-of-memory conditions. It can be hard to do on a cluster, but grabbing 'jmap -histo' output from a JVM whose heap usage is larger than expected can often quickly identify the cause of memory consumption issues.
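To make the trade-off concrete, here is a rough sketch of allocating per record versus reusing one mutable instance. This is not Avro's code: MutableText is a made-up stand-in for a mutable holder like Utf8, and readFresh/readReused are hypothetical names. It only shows why reuse is unsafe whenever the caller keeps references to earlier records.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseSketch {
    // Hypothetical stand-in for a mutable text holder such as Avro's Utf8.
    static final class MutableText {
        private final StringBuilder chars = new StringBuilder();

        // Overwrites the current contents in place and returns this instance.
        MutableText set(String s) {
            chars.setLength(0);
            chars.append(s);
            return this;
        }

        @Override
        public String toString() {
            return chars.toString();
        }
    }

    // One new holder per record: more short-lived garbage, but results are independent.
    static List<MutableText> readFresh(String[] records) {
        List<MutableText> out = new ArrayList<>();
        for (String r : records) {
            out.add(new MutableText().set(r));
        }
        return out;
    }

    // One shared holder: less garbage, but every retained reference is the same object.
    static List<MutableText> readReused(String[] records) {
        List<MutableText> out = new ArrayList<>();
        MutableText shared = new MutableText();
        for (String r : records) {
            out.add(shared.set(r));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "c"};
        System.out.println(readFresh(records));  // prints [a, b, c]
        System.out.println(readReused(records)); // prints [c, c, c]
    }
}
```

If the consumer fully processes each record before asking for the next one, the reused version behaves the same and allocates less. If it holds on to records, as the lists here do, only the fresh-allocation version is correct.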
I'm not sure whether AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.

On 5/31/11 5:40 PM, "ey-chih chow" <[email protected]> wrote:

I actually looked into the Avro code to find out how Avro does object reuse. I looked at AvroUtf8InputFormat and have the following question. Why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called? Will this eat up too much memory when we call next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)? Will this save memory? Thanks.

Ey-Chih Chow
________________________________
From: [email protected]
To: [email protected]
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

Hi,

We have several mapreduce jobs using avro. They take too much memory when running in production. Can anybody suggest some object reuse techniques to cut down memory usage? Thanks.

Ey-Chih Chow
