There are two ways Pig controls the memory used by large bags:
1. Triggers set on GC, similar to the mechanism Julien describes here:
https://techblug.wordpress.com/2011/07/21/detecting-low-memory-in-java-part-2/
When Pig is notified of high memory usage, it goes through the
list of Spi
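For anyone who wants to see what a GC-triggered low-memory notification looks like, here is a minimal sketch using the standard java.lang.management API. It is not Pig's actual code. The class name LowMemoryWatcher, the helper names, and the 0.5 fraction are all made up for illustration, and the callback only stands in for whatever would ask the registered bags to spill.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

// Hypothetical illustration only; names and thresholds are assumptions.
final class LowMemoryWatcher {

    // Threshold in bytes for a pool whose maximum size is maxBytes.
    static long thresholdFor(long maxBytes, double fraction) {
        return (long) (maxBytes * fraction);
    }

    // Pick the largest heap pool that supports collection-usage thresholds
    // (typically the old/tenured generation). Returns null if there is none.
    static MemoryPoolMXBean largestHeapPool() {
        MemoryPoolMXBean best = null;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP
                    || !pool.isCollectionUsageThresholdSupported()) {
                continue;
            }
            if (best == null
                    || pool.getUsage().getMax() > best.getUsage().getMax()) {
                best = pool;
            }
        }
        return best;
    }

    // Ask the JVM to notify us when the pool is still above the threshold
    // right after a GC, i.e. when collecting did not free enough memory.
    // Returns false if no suitable pool with a defined maximum was found.
    static boolean install(double fraction, Runnable onLowMemory) {
        MemoryPoolMXBean pool = largestHeapPool();
        if (pool == null || pool.getUsage().getMax() <= 0) {
            return false;
        }
        pool.setCollectionUsageThreshold(
                thresholdFor(pool.getUsage().getMax(), fraction));

        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(notification.getType())) {
                onLowMemory.run(); // a real system would request spills here
            }
        };
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener(listener, null, null);
        return true;
    }

    public static void main(String[] args) {
        boolean installed = install(0.5,
                () -> System.out.println("low memory after GC: spill requested"));
        System.out.println("watcher installed: " + installed);
    }
}
```

Note that the callback fires on a notification after a GC, not on an actual write to disk, so counting callbacks and counting real spills can give quite different numbers.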
Daniel,
IIRC, spill requests are triggered by a GC, while spill_count is incremented by
an actual spill, so the former number may be a bit misleading (if GC is
effective, lots of GCs might be fine).
D
On Wed, Aug 3, 2011 at 10:12 AM, Daniel Dai wrote:
Spill means Pig needs to dump memory to disk. It happens when Pig
deals with a large key and runs short of memory. A high number
indicates that Pig needs to write to disk frequently, so performance may
degrade; you may want to explore approaches such as using a skewed join.
Daniel
On Tue, Aug 2, 2011
org.apache.pig.PigCounters
  PROACTIVE_SPILL_COUNT_RECS              0   2,372,598   2,372,598
  SPILLABLE_MEMORY_MANAGER_SPILL_COUNT    0          64          64
  PROACTIVE_SPILL_COUNT_BAGS
I was checking my jobtracker and I have no idea what these three counters are
representative of...
Can anyone shed some light, please?
-S