Hi Thejas,

I am using 0.9. What I see is that two members of PODemux, a POForeach and
the myPlans ArrayList, seem to be keeping two deep copies of the same set
of data bags.

Thanks,
Shubham.

On Mon, Oct 10, 2011 at 6:57 PM, Thejas Nair <[email protected]> wrote:

> What version of pig are you using? You might want to try 0.9.1 .
> This sounds like the issue described in
> https://issues.apache.org/jira/browse/PIG-1815 .
>
> Thanks,
> Thejas
>
>
> On 10/10/11 2:22 PM, Shubham Chopra wrote:
>
>> The job I am trying to run performs some projections and aggregations. I
>> see that the map tasks repeatedly fail with an OOM and the following stack
>> trace:
>>
>> Error: java.lang.OutOfMemoryError: Java heap space
>>        at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:69)
>>        at org.apache.pig.data.BinSedesTuple.<init>(BinSedesTuple.java:82)
>>        at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
>>        at org.apache.pig.data.BinInterSedes.readTuple(BinInterSedes.java:109)
>>        at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:270)
>>        at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>>        at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
>>        at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>>        at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
>>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>        at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>>        at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
>>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:141)
>>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:238)
>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:171)
>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:162)
>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
>>        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>        at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
>>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
>>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
>>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
>>
>>
>> An analysis of the heap dump showed that, apart from the io.sort buffer,
>> the remaining memory was consumed almost in its entirety by
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux
>> (predominantly by an ArrayList and a POForeach).
>>
>> Should using the combiner be causing this high memory consumption? Is
>> there any way to make the combiner run more frequently and aggregate the
>> data more aggressively? The data I am using shrinks by a factor of at
>> least 10 after the combiner step and is neatly partitioned to maximize
>> the combiner's effectiveness.
>>
>> Thanks,
>> Shubham.
>>
>>
>
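On the combiner-frequency question above: in this Hadoop generation the
combiner is invoked from MapOutputBuffer.sortAndSpill (visible in the stack
trace), so it runs once per spill, and spill-related properties govern how
often it fires. A hedged sketch of knobs one might try, not a verified fix;
the property names are the 0.20-era ones and the script name is a
placeholder:

```shell
# A smaller sort buffer and an earlier spill threshold produce more,
# smaller spills, so the combiner runs more often on less data per run.
# min.num.spills.for.combine additionally lets the combiner run again
# when spill files are merged. "myscript.pig" is a placeholder name.
pig -Dio.sort.mb=64 \
    -Dio.sort.spill.percent=0.70 \
    -Dmin.num.spills.for.combine=2 \
    myscript.pig
```

The same properties can also be set inside a Pig script with `set`, e.g.
`set io.sort.mb 64;`. Note this trades heap pressure for extra spill I/O.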
