Hey guys, how are the bags passed to an EvalFunc stored? I was looking at the Accumulator interface, and it says the reason it is needed for COUNT and SUM is that an EvalFunc is always given the entire bag when it is run on a bag.
I always thought that if I did COUNT(TABLE) or SUM(TABLE.FIELD), the code inside, something like for (Tuple entry : inputDataBag) { ... stuff }, was using an actual iterator that walked the bag sequentially, without necessarily holding the entire bag in memory all at once. Since it's an iterator, I assumed there was no way to do anything other than stream through it.

I'm asking because Accumulator has no way of telling Pig "I've seen enough": it streams through the entire bag no matter what happens. Hypothetically speaking, if I were writing a "5th item of a sorted bag" UDF, I would want to stop executing once I've seen the 5th entry of a 5-million-entry bag, if possible. Is there an easy way to make this happen?
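To make the hypothetical concrete, here is a minimal sketch of the early-exit behaviour I have in mind. It is plain Java with no Pig classes: `CountingSource` is a made-up stand-in for a bag (an assumption, not Pig's DataBag), and `nth` is a hypothetical helper name. It only shows that returning from a for-each loop stops pulling entries from a lazy iterator; it says nothing about how Pig actually stores or materializes the bag.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class NthItemSketch {

    /** Hypothetical stand-in for a bag: yields 0..size-1 lazily and counts what was pulled. */
    static final class CountingSource implements Iterable<Integer> {
        private final int size;
        int pulled = 0; // number of entries actually handed to the consumer

        CountingSource(int size) {
            this.size = size;
        }

        @Override
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                private int next = 0;

                @Override
                public boolean hasNext() {
                    return next < size;
                }

                @Override
                public Integer next() {
                    if (next >= size) {
                        throw new NoSuchElementException();
                    }
                    pulled++;
                    return next++;
                }
            };
        }
    }

    /** Returns the n-th entry (1-based) and stops iterating as soon as it is found. */
    static <T> T nth(Iterable<T> bag, int n) {
        int seen = 0;
        for (T entry : bag) {
            if (++seen == n) {
                return entry; // early exit: the remaining entries are never requested
            }
        }
        return null; // the bag had fewer than n entries
    }

    public static void main(String[] args) {
        CountingSource bag = new CountingSource(5_000_000);
        Integer fifth = nth(bag, 5);
        // prints "fifth=4 pulled=5"
        System.out.println("fifth=" + fifth + " pulled=" + bag.pulled);
    }
}
```

With a source like this, only five entries are ever pulled. What I can't tell is whether the same holds for the bag an EvalFunc receives, or whether an Accumulator can signal anything equivalent.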