Hi, I am running a Pig script to process some webapp logs and got a Java heap error. The task logs look like this:
...
2010-02-24 10:54:52,147 INFO org.apache.hadoop.mapred.ReduceTask: Read 2186251 bytes from map-output for attempt_201002240944_0068_m_000058_0
2010-02-24 10:54:52,147 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201002240944_0068_m_000058_0 -> (25, 116) from H1S1
2010-02-24 10:54:52,578 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2010-02-24 10:54:52,578 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2010-02-24 10:54:52,578 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2010-02-24 10:54:52,579 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
2010-02-24 10:54:58,209 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201002240944_0068_r_000047_0 Merge of the 31 files in-memory complete. Local file is /hadoopdata3/local/taskTracker/jobcache/job_201002240944_0068/attempt_201002240944_0068_r_000047_0/output/map_2.out of size 11223155
2010-02-24 10:54:58,210 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 33 files left.
2010-02-24 10:54:58,213 INFO org.apache.hadoop.mapred.Merger: Merging 33 sorted segments
2010-02-24 10:54:58,215 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 33 segments left of total size: 74732551 bytes
2010-02-24 10:55:05,420 INFO org.apache.hadoop.mapred.ReduceTask: Merged 33 segments, 74732551 bytes to disk to satisfy reduce memory limit
2010-02-24 10:55:05,421 INFO org.apache.hadoop.mapred.ReduceTask: Merging 2 files, 23137352 bytes from disk
2010-02-24 10:55:05,422 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2010-02-24 10:55:05,422 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2010-02-24 10:55:05,428 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 23137344 bytes
2010-02-24 10:55:17,314 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 420142440(410295K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:21,483 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 420216440(410367K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:22,425 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 420217528(410368K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:23,263 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 420228280(410379K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:24,236 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 408702304(399123K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:24,991 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 408714624(399135K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:26,012 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 415409064(405672K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:26,778 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 417917824(408122K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:27,777 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 417832352(408039K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:28,571 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 417834120(408041K) committed = 551223296(538304K) max = 715849728(699072K)
2010-02-24 10:55:29,311 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 417857624(408064K) committed = 551223296(538304K) max = 715849728(699072K)
...

Eventually the used memory grows until it hits the max, and the task fails with a Java heap size error.

My understanding is that the intermediate map outputs are too big for the in-memory merge, but which configuration setting should I change?

Regards,
Shawn
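Each SpillableMemoryManager line in the log reports the heap pools as `name = bytes(kilobytesK)`. To see how close each low-memory event gets to the heap ceiling, a small helper like the one below can pull those numbers out. This is only a hypothetical sketch: `parse_heap_usage` and `used_fraction` are illustrative names, not part of Pig or Hadoop, and the regex assumes the exact `init/used/committed/max` format shown in the log above.

```python
import re

# Matches the "name = bytes(kilobytesK)" pairs printed in the
# SpillableMemoryManager lines, e.g. "used = 420142440(410295K)".
# Assumes the format seen in the log above; other Pig versions may differ.
_FIELD = re.compile(r"(init|used|committed|max) = (\d+)\(\d+K\)")


def parse_heap_usage(line: str) -> dict:
    """Return the init/used/committed/max byte counts found in one log line."""
    return {name: int(value) for name, value in _FIELD.findall(line)}


def used_fraction(line: str) -> float:
    """Fraction of the maximum heap that was in use when the handler fired."""
    fields = parse_heap_usage(line)
    return fields["used"] / fields["max"]


if __name__ == "__main__":
    sample = (
        "2010-02-24 10:55:17,314 INFO org.apache.pig.impl.util.SpillableMemoryManager: "
        "low memory handler called (Collection threshold exceeded) "
        "init = 5439488(5312K) used = 420142440(410295K) "
        "committed = 551223296(538304K) max = 715849728(699072K)"
    )
    print(f"{used_fraction(sample):.1%}")  # prints 58.7%
```

Running it over every handler line in the task log would show whether `used` creeps steadily toward `max` or plateaus before the final failure.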
