Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Dmitriy Ryaboy Thu, 22 Mar 2012 12:27:21 -0700

Pushing the filter up is done automatically in some cases, but this one is
different because the group key needs to change.


D
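A plain-Python model (not Pig) of one reason the second script below is harder to rewrite automatically. That script needs both filtered sums and the unfiltered `COUNT(rawItems)` from the same group, so moving a single FILTER above the GROUP removes rows that the count still needs. The sample rows and the helper name `group_row_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows shaped like the LOAD schema in this thread:
# (item1, item2, item3, type, count)
ROWS = [
    ("a", "b", "c", "EMPLOYER", 3),
    ("a", "b", "c", "EMPLOYER", 4),
    ("a", "b", "c", "LOCATION", 5),
    ("x", "y", "z", "LOCATION", 7),
]


def group_row_counts(rows):
    """Model of COUNT(rawItems) per (item1, item2, item3, type) group."""
    return dict(Counter(row[:4] for row in rows))


# groupCriteriaCount needs every row of each group...
unfiltered = group_row_counts(ROWS)
# ...but a FILTER moved above the GROUP drops rows before they are counted.
pushed_down = group_row_counts([r for r in ROWS if r[3] == "EMPLOYER"])

print(unfiltered)
print(pushed_down)
```

Because the filter alone loses rows, a correct rewrite has to restructure the grouping itself. One example is the derived key field suggested further down in this thread.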

On Wed, Mar 21, 2012 at 11:41 PM, Prashant Kommireddi <[email protected]> wrote:

> Sure, I can do that. Isn't this something that should be done already? Or
> does it not work if the filter operates on a field that is part of the
> group?
>
> On Wed, Mar 21, 2012 at 11:02 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
> > Prashant, mind filing a JIRA with this example? Technically, this is
> > something we could do automatically.
> >
> > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <[email protected]> wrote:
> >
> > > Please pull your FILTER out of the nested FOREACH that follows the GROUP
> > > BY and apply it earlier:
> > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > >
> > > In this case, you could use a FILTER followed by a bincond to introduce a
> > > new field, "employerOrLocation", then do a GROUP BY that includes the new
> > > field in its key.
> > >
> > > Thanks,
> > > Prashant
> > >
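A plain-Python model (not Pig) of the approach suggested above. It FILTERs the rows to the two cases, derives `employerOrLocation` with a bincond-style conditional, then groups on a key that includes the derived field. The sample rows, the base key `(item1, item2, item3)`, and the function name `filter_bincond_group` are assumptions for illustration. Each group is reduced with running totals instead of a materialized bag. That matches the kind of computation Pig can hand to its combiner when the FOREACH after a GROUP uses only algebraic functions such as SUM and COUNT.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like the LOAD schema:
# (item1, item2, item3, type, count)
ROWS = [
    ("a", "b", "c", "EMPLOYER", 3),
    ("a", "b", "c", "EMPLOYER", 4),
    ("a", "b", "c", "LOCATION", 5),
    ("a", "b", "c", "OTHER", 9),
    ("x", "y", "z", "LOCATION", 7),
]


def filter_bincond_group(rows):
    """FILTER -> bincond-derived field -> GROUP BY (item1, item2, item3, employerOrLocation).

    Returns {group key: (sum of count, number of rows)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for item1, item2, item3, type_, count in rows:
        # FILTER: keep only the two cases of interest.
        if type_ not in ("EMPLOYER", "LOCATION"):
            continue
        # bincond, in Pig roughly: (type == 'EMPLOYER' ? 'EMPLOYER' : 'LOCATION')
        employer_or_location = "EMPLOYER" if type_ == "EMPLOYER" else "LOCATION"
        acc = totals[(item1, item2, item3, employer_or_location)]
        acc[0] += count  # running SUM(count)
        acc[1] += 1      # running COUNT
    return {k: tuple(v) for k, v in totals.items()}


print(filter_bincond_group(ROWS))
```

The `OTHER` row never reaches a group. This mirrors how a FILTER placed before the GROUP keeps unneeded rows out of the shuffle.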
> > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <[email protected]> wrote:
> > >
> > > > My input data is 9 GB and I am using 20 machines.
> > > >
> > > > My grouping criteria cover two cases, so I want 1) counts by the criteria I
> > > > have grouped on and 2) counts of each of the two individual cases within
> > > > each group.
> > > >
> > > > So my script in detail is:
> > > >
> > > > counts = FOREACH grouped {
> > > >     selectedFields1 = FILTER rawItems BY type == 'EMPLOYER';
> > > >     selectedFields2 = FILTER rawItems BY type == 'LOCATION';
> > > >     GENERATE
> > > >         FLATTEN(group) as (item1, item2, item3, type),
> > > >         SUM(selectedFields1.count) as selectFields1Count,
> > > >         SUM(selectedFields2.count) as selectFields2Count,
> > > >         COUNT(rawItems) as groupCriteriaCount;
> > > > }
> > > >
> > > > Is there a way to do this?
> > > >
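A plain-Python model (not Pig) of one way to get all three numbers above without a nested FILTER. It computes per-row bincond columns before the GROUP, so every aggregate becomes a plain SUM or COUNT. In Pig, the per-row step would be something like `(type == 'EMPLOYER' ? count : 0)`. The column names, function names, and sample rows here are hypothetical. The check compares the rewrite with a model of the nested-FILTER script, which holds each group's whole bag.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like the LOAD schema:
# (item1, item2, item3, type, count)
ROWS = [
    ("a", "b", "c", "EMPLOYER", 3),
    ("a", "b", "c", "EMPLOYER", 4),
    ("a", "b", "c", "LOCATION", 5),
    ("x", "y", "z", "LOCATION", 7),
]


def key(row):
    """Group key from the script: (item1, item2, item3, type)."""
    return row[:4]


def nested_filters(rows):
    """Model of the nested FOREACH: materialize each group's bag, then filter it twice."""
    bags = defaultdict(list)
    for row in rows:
        bags[key(row)].append(row)
    out = {}
    for k, bag in bags.items():
        employer = [r[4] for r in bag if r[3] == "EMPLOYER"]
        location = [r[4] for r in bag if r[3] == "LOCATION"]
        # SUM over an empty selection is modeled as None (null).
        out[k] = (
            sum(employer) if employer else None,
            sum(location) if location else None,
            len(bag),
        )
    return out


def flag_columns(rows):
    """Model of the rewrite: per-row flag columns, then running SUM/COUNT per group."""
    acc = defaultdict(lambda: [0, 0, 0])
    for row in rows:
        # Hypothetical per-row columns, like (type == 'EMPLOYER' ? count : 0) in Pig.
        employer_count = row[4] if row[3] == "EMPLOYER" else 0
        location_count = row[4] if row[3] == "LOCATION" else 0
        totals = acc[key(row)]
        totals[0] += employer_count
        totals[1] += location_count
        totals[2] += 1
    return {k: tuple(v) for k, v in acc.items()}


nested = nested_filters(ROWS)
flags = flag_columns(ROWS)
# Treat null sums as 0 before comparing the two models.
normalized = {k: tuple(0 if x is None else x for x in v) for k, v in nested.items()}
print(normalized == flags)
```

The only difference shows up for groups with no matching rows. The flag-column version reports 0, while the nested version reports null, assuming Pig's SUM returns null for an empty bag (modeled as None).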
> > > >
> > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <[email protected]> wrote:
> > > >
> > > > > You are not doing grouping followed by counting. You are doing grouping
> > > > > followed by filtering followed by counting.
> > > > > Try filtering before grouping.
> > > > >
> > > > > D
> > > > >
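A plain-Python model (not Pig) of the suggestion to filter before grouping, applied to the single-filter script in the original message below. Both orders give the same sums for groups that contain EMPLOYER rows. The sample rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like the LOAD schema:
# (item1, item2, item3, type, count)
ROWS = [
    ("a", "b", "c", "EMPLOYER", 3),
    ("a", "b", "c", "EMPLOYER", 4),
    ("a", "b", "c", "LOCATION", 5),
    ("x", "y", "z", "LOCATION", 7),
]


def key(row):
    """Group key from the script: (item1, item2, item3, type)."""
    return row[:4]


def group_then_filter(rows):
    """Model of GROUP followed by a nested FILTER and SUM: every bag is built first."""
    bags = defaultdict(list)
    for row in rows:
        bags[key(row)].append(row)  # the whole group is held at once
    out = {}
    for k, bag in bags.items():
        selected = [r[4] for r in bag if r[3] == "EMPLOYER"]
        out[k] = sum(selected) if selected else None  # empty SUM modeled as null
    return out


def filter_then_group(rows):
    """Model of FILTER before GROUP: non-matching rows never reach a group."""
    sums = defaultdict(int)
    for row in rows:
        if row[3] == "EMPLOYER":
            sums[key(row)] += row[4]  # running sum, no bag kept
    return dict(sums)


nested = group_then_filter(ROWS)
pushed = filter_then_group(ROWS)
print(pushed)
print({k: v for k, v in nested.items() if v is not None} == pushed)
```

The nested version still emits groups without EMPLOYER rows, with a null sum. This assumes Pig's SUM of an empty bag is null. The filter-first version never creates those groups, so there is less data to shuffle and hold in a reducer.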
> > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have a Pig script which does a simple GROUP followed by counting, and
> > > > > > I get this error. My data is certainly not big enough to cause this
> > > > > > out-of-memory error. Is there a chance that this is caused by a bug?
> > > > > > Has anyone come across this kind of error before?
> > > > > >
> > > > > > I am using Pig 0.9.1 with Hadoop 0.20.205.
> > > > > >
> > > > > > My script:
> > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > > > >
> > > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > > >
> > > > > > counts = FOREACH grouped {
> > > > > >     selectedFields = FILTER rawItems BY type == 'EMPLOYER';
> > > > > >     GENERATE
> > > > > >         FLATTEN(group) as (item1, item2, item3, type),
> > > > > >         SUM(selectedFields.count) as count;
> > > > > > }
> > > > > >
> > > > > > Stack Trace:
> > > > > >
> > > > > > 2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child (main): Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:406)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:453)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
> > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
> > > > > >        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> > > > > >        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:662)
> > > > > >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:425)
> > > > > >        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > > > > >        at java.security.AccessController.doPrivileged(Native Method)
> > > > > >        at javax.security.auth.Subject.doAs(Subject.java:396)
> > > > > >        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> > > > > >        at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > > > > >
> > > > > > Thanks
> > > > > > -Rohini
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards
> > > > -Rohini
> > > >
> > >
> >
>
