Thanks Ashutosh. Is there any workaround for this? Would increasing the heap
size help?
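
If heap is the right knob, I am guessing the relevant setting would be the
per-task heap in mapred-site.xml rather than the daemon heap, something
along these lines (the -Xmx value below is just a guess on my part):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>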


On 7/8/10 1:59 PM, "Ashutosh Chauhan" <ashutosh.chau...@gmail.com> wrote:

> Syed,
> 
> You are likely hit by https://issues.apache.org/jira/browse/PIG-1442 .
> Your query and stacktrace look very similar to the one in the jira
> ticket. This may get fixed in the 0.8 release.
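> 
> Until then, one thing worth trying (assuming the combiner really is the
> trigger, as the PigCombiner frames in your traces suggest) is to turn
> the combiner off when launching the script:
> 
>   pig -Dpig.exec.nocombiner=true yourscript.pig
> 
> That trades extra shuffle data for lower memory pressure in the map
> tasks, so treat it as a stopgap rather than a fix.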
> 
> Ashutosh
> 
> On Thu, Jul 8, 2010 at 13:42, Syed Wasti <mdwa...@hotmail.com> wrote:
>> Sorry about the delay; I was held up with other things.
>> Here are the script and the errors below:
>> 
>> AA = LOAD 'table1' USING PigStorage('\t') AS
>> (ID,b,c,d,e,f,g,h,i,j,k,l,m,n,o);
>> 
>> AB = FOREACH AA GENERATE ID, b, d, e, f, n, o;
>> 
>> AC = FILTER AB BY o == 1;
>> 
>> AD = GROUP AC BY (ID, b);
>> 
>> AE = FOREACH AD { A = DISTINCT AC.d;
>>        GENERATE group.ID, (chararray) 'S' AS type, group.b,
>>        (int) COUNT_STAR(AC) AS cnt, (int) COUNT(A) AS cnt_distinct; };
>> 
>> The same steps are repeated to load 5 different tables and then a UNION is
>> done on them.
>> 
>> Final_res = UNION AE, AF, AG, AH, AI;
>> 
>> The actual number of columns will be 15; here I am showing just one table.
>> 
>> Final_table = FOREACH Final_res GENERATE ID,
>>                ((type == 'S' AND b == 1) ? cnt : 0) AS c12_tmp,
>>                ((type == 'S' AND b == 2) ? cnt : 0) AS c13_tmp,
>>                ((type == 'S' AND b == 1) ? cnt_distinct : 0) AS c12_distinct_tmp,
>>                ((type == 'S' AND b == 2) ? cnt_distinct : 0) AS c13_distinct_tmp;
>> 
>> It works fine until here; it is only after adding this last part of the
>> query that it starts throwing heap errors.
>> 
>> grp_id = GROUP Final_table BY ID;
>> 
>> Final_data = FOREACH grp_id GENERATE group AS ID,
>> SUM(Final_table.c12_tmp), SUM(Final_table.c13_tmp),
>> SUM(Final_table.c12_distinct_tmp), SUM(Final_table.c13_distinct_tmp);
>> 
>> STORE Final_data;
>> 
>> 
>> Error: java.lang.OutOfMemoryError: Java heap space
>>  at java.util.ArrayList.<init>(ArrayList.java:112)
>>  at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:63)
>>  at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
>>  at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:55)
>>  at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
>>  at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:130)
>>  at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:289)
>>  at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
>>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>  at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>>  at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>>  at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>>  at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1217)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1227)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
>> 
>> 
>> Error: java.lang.OutOfMemoryError: Java heap space
>>  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.createDataBag(POCombinerPackage.java:139)
>>  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:148)
>>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:168)
>>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:159)
>>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
>>  at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>  at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1217)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1227)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
>> 
>> 
>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>  at java.util.AbstractList.iterator(AbstractList.java:273)
>>  at org.apache.pig.data.DefaultTuple.getMemorySize(DefaultTuple.java:185)
>>  at org.apache.pig.data.InternalCachedBag.add(InternalCachedBag.java:89)
>>  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:168)
>>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:168)
>>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:159)
>>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
>>  at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>  at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1217)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1227)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
>> 
>> 
>> Error: GC overhead limit exceeded
>> -------
>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>  at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
>>  at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:55)
>>  at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
>>  at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:130)
>>  at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:289)
>>  at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
>>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>  at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>>  at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>>  at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>>  at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1217)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1227)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
>>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
>> 
>> 
>> 
>> On 7/7/10 5:50 PM, "Ashutosh Chauhan" <ashutosh.chau...@gmail.com> wrote:
>> 
>>> Syed,
>>> 
>>> One-line stack traces aren't much help :) Please provide the full stack
>>> trace and the Pig script that produced it, and we can take a look.
>>> 
>>> Ashutosh
>>> On Wed, Jul 7, 2010 at 14:09, Syed Wasti <mdwa...@hotmail.com> wrote:
>>> 
>>>> 
>>>> I am running my Pig scripts on our QA cluster (with 4 datanodes, see
>>>> below), which has the Cloudera CDH2 release installed; the global heap max
>>>> is -Xmx4096m. I am constantly getting OutOfMemory errors (see below) on my
>>>> map and reduce jobs when I try to run my script against large data, where
>>>> it produces around 600 maps.
>>>> Looking for some tips on the best configuration for Pig and how to get rid
>>>> of these errors. Thanks.
>>>> 
>>>> 
>>>> 
>>>> Error: GC overhead limit exceeded
>>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>> 
>>>> Regards
>>>> Syed
>>>> 
>> 
>> 
>> 
> 

