Hi,
I am running a Pig script to process some web app logs and got this
Java heap error. The task logs look like this:


...
2010-02-24 10:54:52,147 INFO org.apache.hadoop.mapred.ReduceTask: Read 2186251 bytes from map-output for attempt_201002240944_0068_m_000058_0
2010-02-24 10:54:52,147 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201002240944_0068_m_000058_0 -> (25, 116) from H1S1
2010-02-24 10:54:52,578 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2010-02-24 10:54:52,578 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2010-02-24 10:54:52,578 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2010-02-24 10:54:52,579 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
2010-02-24 10:54:58,209 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201002240944_0068_r_000047_0 Merge of the 31 files in-memory complete. Local file is /hadoopdata3/local/taskTracker/jobcache/job_201002240944_0068/attempt_201002240944_0068_r_000047_0/output/map_2.out of size 11223155
2010-02-24 10:54:58,210 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 33 files left.
2010-02-24 10:54:58,213 INFO org.apache.hadoop.mapred.Merger: Merging 33 sorted segments
2010-02-24 10:54:58,215 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 33 segments left of total size: 74732551 bytes
2010-02-24 10:55:05,420 INFO org.apache.hadoop.mapred.ReduceTask: Merged 33 segments, 74732551 bytes to disk to satisfy reduce memory limit
2010-02-24 10:55:05,421 INFO org.apache.hadoop.mapred.ReduceTask: Merging 2 files, 23137352 bytes from disk
2010-02-24 10:55:05,422 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2010-02-24 10:55:05,422 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2010-02-24 10:55:05,428 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 23137344 bytes
2010-02-24 10:55:17,314 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 420142440(410295K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:21,483 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 420216440(410367K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:22,425 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 420217528(410368K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:23,263 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 420228280(410379K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:24,236 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 408702304(399123K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:24,991 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 408714624(399135K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:26,012 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 415409064(405672K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:26,778 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 417917824(408122K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:27,777 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 417832352(408039K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:28,571 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 417834120(408041K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:29,311 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 417857624(408064K) committed = 551223296(538304K) max = 715849728(699072K)
...

Eventually the used memory exceeds the max and the task fails with a
Java heap size error. My understanding is that the intermediate map
outputs are too big for the in-memory merge, but which configuration
setting should I change?
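To see how close the reduce task's heap gets to its limit, the `used`, `committed` and `max` byte counts in the SpillableMemoryManager lines can be pulled out with a short script. This is a hypothetical helper written for illustration only (`heap_usage` and `SAMPLE` are made-up names, not part of Pig or Hadoop), and it assumes the log lines keep exactly the `used = N(NK) committed = N(NK) max = N(NK)` format shown in the log:

```python
import re

# Hypothetical helper, not a Pig/Hadoop API. Assumes the SpillableMemoryManager
# message format seen in the log: "used = <bytes>(<K>K) committed = ... max = ...".
PATTERN = re.compile(
    r"used = (\d+)\(\d+K\) committed = (\d+)\(\d+K\) max = (\d+)\(\d+K\)"
)

# Two entries copied from the task log, still wrapped as a mail client might.
SAMPLE = """
2010-02-24 10:55:17,314 INFO
org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
called (Collection threshold exceeded) init = 5439488(5312K) used =
420142440(410295K) committed = 551223296(538304K) max =
715849728(699072K)
2010-02-24 10:55:29,311 INFO
org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
called (Collection threshold exceeded) init = 5439488(5312K) used =
417857624(408064K) committed = 551223296(538304K) max =
715849728(699072K)
"""


def heap_usage(log_text):
    """Return a (used, committed, max) byte triple for each low-memory event."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [tuple(int(g) for g in m.groups()) for m in PATTERN.finditer(flat)]


for used, committed, max_bytes in heap_usage(SAMPLE):
    # Report used and committed heap as a fraction of the JVM's max heap.
    print(f"used {used / max_bytes:.1%} of max, "
          f"committed {committed / max_bytes:.1%} of max")
```

Because it collapses whitespace before matching, the same function should also work on the unwrapped task log, which makes it easier to follow the trend in `used` across all of the low-memory events leading up to the heap error.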

Regards,
Shawn
