My input data size is 9 GB and I am using 20 machines.
My grouping criteria have two cases, so I want 1) counts by the criteria I
have grouped on, and 2) counts of the two individual cases within each group.
So my script in detail is:
counts = FOREACH grouped {
    selectedFields1 = FILTER rawItems BY type == 'EMPLOYER';
    selectedFields2 = FILTER rawItems BY type == 'LOCATION';
    GENERATE
        FLATTEN(group) AS (item1, item2, item3, type),
        SUM(selectedFields1.count) AS selectFields1Count,
        SUM(selectedFields2.count) AS selectFields2Count,
        COUNT(rawItems) AS groupCriteriaCount;
}
Is there a way to do this?
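
One shape that might work, following your suggestion to filter before
grouping, is to tag each row up front so the per-type sums become plain
SUMs. This is an untested sketch: the tagged/employerCount/locationCount
names are placeholders, and it assumes count is loaded as a long:

rawItems = LOAD 'in' AS (item1, item2, item3, type, count:long);

-- tag each row with per-type counts before grouping
tagged = FOREACH rawItems GENERATE
    item1, item2, item3, type, count,
    (type == 'EMPLOYER' ? count : 0L) AS employerCount,
    (type == 'LOCATION' ? count : 0L) AS locationCount;

grouped = GROUP tagged BY (item1, item2, item3, type);

-- plain SUM/COUNT over the pre-tagged columns, no nested FILTERs
counts = FOREACH grouped GENERATE
    FLATTEN(group) AS (item1, item2, item3, type),
    SUM(tagged.employerCount) AS selectFields1Count,
    SUM(tagged.locationCount) AS selectFields2Count,
    COUNT(tagged) AS groupCriteriaCount;

Since that FOREACH uses only algebraic SUM/COUNT, Pig should be able to
apply the combiner, which might also help with the memory problem.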
On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <[email protected]> wrote:
> You are not doing grouping followed by counting. You are doing grouping
> followed by filtering followed by counting.
> Try filtering before grouping.
>
> D
>
> On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <[email protected]> wrote:
>
> > Hi,
> >
> > I have a Pig script which does a simple GROUP followed by counting, and
> > I get this error. My data is certainly not big enough to cause this
> > out-of-memory error. Is there a chance that this is because of some bug?
> > Did anyone come across this kind of error before?
> >
> > I am using Pig 0.9.1 with Hadoop 0.20.205.
> >
> > My script:
> > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> >
> > grouped = GROUP rawItems BY (item1, item2, item3, type);
> >
> > counts = FOREACH grouped {
> >     selectedFields = FILTER rawItems BY type == 'EMPLOYER';
> >     GENERATE
> >         FLATTEN(group) AS (item1, item2, item3, type),
> >         SUM(selectedFields.count) AS count
> > }
> >
> > Stack Trace:
> >
> > 2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child (main): Error
> > running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:406)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:453)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
> >     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> >     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:662)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:425)
> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >
> > Thanks
> > -Rohini
> >
>
--
Regards
-Rohini
--
People of accomplishment rarely sat back & let things happen to them. They
went out & happened to things - Leonardo Da Vinci