Prashant,

I just added the stack trace as a comment to the open JIRA.


Thanks for the help.

-Rohini

On Fri, Mar 23, 2012 at 12:46 PM, Prashant Kommireddi <[email protected]> wrote:

> Rohini, it's also fine if you reply with the stack trace here. I can
> add it to the JIRA.
>
> Thanks,
> Prashant
>
> On Thu, Mar 22, 2012 at 7:10 PM, Prashant Kommireddi <[email protected]> wrote:
>
> > Rohini,
> >
> > Here is the JIRA. https://issues.apache.org/jira/browse/PIG-2610
> >
> > Can you please post the stacktrace as a comment to it?
> >
> > Thanks,
> > Prashant
> >
> >
> > On Thu, Mar 22, 2012 at 2:37 PM, Jonathan Coveney <[email protected]> wrote:
> >
> >> Rohini,
> >>
> >> In the meantime, something like the following should work:
> >>
> >> raw = LOAD 'input' using MyCustomLoader();
> >>
> >> searches = FOREACH raw GENERATE
> >>               day, searchType,
> >>               FLATTEN(impBag) AS (adType, clickCount)
> >>           ;
> >>
> >> searches_2 = foreach searches generate *,
> >>               ( adType == 'type1' ? clickCount : 0 ) as type1_clickCount,
> >>               ( adType == 'type2' ? clickCount : 0 ) as type2_clickCount;
> >>
> >> groupedSearches = GROUP searches_2 BY (day, searchType) PARALLEL 50;
> >> counts = FOREACH groupedSearches {
> >>                GENERATE
> >>                   FLATTEN(group) AS (day, searchType),
> >>                   COUNT(searches_2) AS numSearches,
> >>                   SUM(searches_2.clickCount) AS clickCountPerSearchType,
> >>                   SUM(searches_2.type1_clickCount) AS type1ClickCount,
> >>                   SUM(searches_2.type2_clickCount) AS type2ClickCount;
> >>       }
> >> ;
> >>
> >> 2012/3/22 Rohini U <[email protected]>
> >>
> >> > Thanks Prashant,
> >> > I am using Pig 0.9.1 and Hadoop 0.20.205.
> >> >
> >> > Thanks,
> >> > Rohini
> >> >
> >> > On Thu, Mar 22, 2012 at 1:27 PM, Prashant Kommireddi <[email protected]> wrote:
> >> >
> >> > > This makes more sense: the grouping and the filter are on different
> >> > > columns. I will open a JIRA soon.
> >> > >
> >> > > What version of Pig and Hadoop are you using?
> >> > >
> >> > > Thanks,
> >> > > Prashant
> >> > >
> >> > > On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <[email protected]> wrote:
> >> > >
> >> > > > Hi Prashant,
> >> > > >
> >> > > > Here is my script in full.
> >> > > >
> >> > > >
> >> > > > raw = LOAD 'input' using MyCustomLoader();
> >> > > >
> >> > > > searches = FOREACH raw GENERATE
> >> > > >                day, searchType,
> >> > > >                FLATTEN(impBag) AS (adType, clickCount)
> >> > > >            ;
> >> > > >
> >> > > > groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> >> > > > counts = FOREACH groupedSearches {
> >> > > >                type1 = FILTER searches BY adType == 'type1';
> >> > > >                type2 = FILTER searches BY adType == 'type2';
> >> > > >                GENERATE
> >> > > >                    FLATTEN(group) AS (day, searchType),
> >> > > >                    COUNT(searches) AS numSearches,
> >> > > >                    SUM(searches.clickCount) AS clickCountPerSearchType,
> >> > > >                    SUM(type1.clickCount) AS type1ClickCount,
> >> > > >                    SUM(type2.clickCount) AS type2ClickCount;
> >> > > >        }
> >> > > > ;
> >> > > >
> >> > > > As you can see above, I am summing the click counts by day and search
> >> > > > type in clickCountPerSearchType, and for each of them I need the counts
> >> > > > broken down by ad type.
> >> > > >
> >> > > > Thanks for your help!
> >> > > > Thanks,
> >> > > > Rohini
> >> > > >
> >> > > >
> >> > > > On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi
> >> > > > <[email protected]>wrote:
> >> > > >
> >> > > > > Hi Rohini,
> >> > > > >
> >> > > > > From your query it looks like you are already grouping by TYPE, so I
> >> > > > > am not sure why you would want the SUM of, say, the "EMPLOYER" type in
> >> > > > > "LOCATION" and vice versa. Your output is already broken down by TYPE.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Prashant
> >> > > > >
> >> > > > > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <[email protected]>
> >> > wrote:
> >> > > > >
> >> > > > > > Thanks for the suggestion, Prashant. However, that will not work in
> >> > > > > > my case.
> >> > > > > >
> >> > > > > > If I filter before the group and include the new field in the group as
> >> > > > > > you suggested, I get the individual counts broken down by the select
> >> > > > > > field criteria. However, I also want the totals without taking the
> >> > > > > > select fields into account. That is why I took the approach I described
> >> > > > > > in my earlier emails.
> >> > > > > >
> >> > > > > > Thanks
> >> > > > > > Rohini
> >> > > > > >
> >> > > > > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
> >> > > > > [email protected]
> >> > > > > > >wrote:
> >> > > > > >
> >> > > > > > > Please pull your FILTER out of the GROUP BY and do it earlier:
> >> > > > > > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> >> > > > > > >
> >> > > > > > > In this case, you could use a FILTER followed by a bincond to
> >> > > > > > > introduce a new field "employerOrLocation", then do a GROUP BY and
> >> > > > > > > include the new field in the GROUP BY clause.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > > Prashant
> >> > > > > > >
> >> > > > > > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <
> [email protected]
> >> >
> >> > > > wrote:
> >> > > > > > >
> >> > > > > > > > My input data size is 9 GB and I am using 20 machines.
> >> > > > > > > >
> >> > > > > > > > My grouping criteria has two cases, so I want 1) counts by the
> >> > > > > > > > criteria I have grouped on and 2) counts of the two individual
> >> > > > > > > > cases in each of my groups.
> >> > > > > > > >
> >> > > > > > > > So my script in detail is:
> >> > > > > > > >
> >> > > > > > > > counts = FOREACH grouped {
> >> > > > > > > >              selectedFields1 = FILTER rawItems BY type == 'EMPLOYER';
> >> > > > > > > >              selectedFields2 = FILTER rawItems BY type == 'LOCATION';
> >> > > > > > > >              GENERATE
> >> > > > > > > >                  FLATTEN(group) AS (item1, item2, item3, type),
> >> > > > > > > >                  SUM(selectedFields1.count) AS selectFields1Count,
> >> > > > > > > >                  SUM(selectedFields2.count) AS selectFields2Count,
> >> > > > > > > >                  COUNT(rawItems) AS groupCriteriaCount;
> >> > > > > > > >          }
> >> > > > > > > >
> >> > > > > > > > Is there a way to do this?
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <
> >> > > > [email protected]>
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > You are not doing grouping followed by counting. You are doing
> >> > > > > > > > > grouping followed by filtering followed by counting.
> >> > > > > > > > > Try filtering before grouping.
> >> > > > > > > > >
> >> > > > > > > > > D
> >> > > > > > > > >
> >> > > > > > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <
> >> > [email protected]
> >> > > >
> >> > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi,
> >> > > > > > > > > >
> >> > > > > > > > > > I have a Pig script which does a simple GROUP followed by
> >> > > > > > > > > > counting, and I get this error. My data is certainly not big
> >> > > > > > > > > > enough to cause this out-of-memory error. Is there a chance that
> >> > > > > > > > > > this is because of some bug? Did anyone come across this kind of
> >> > > > > > > > > > error before?
> >> > > > > > > > > >
> >> > > > > > > > > > I am using Pig 0.9.1 with Hadoop 0.20.205.
> >> > > > > > > > > >
> >> > > > > > > > > > My script:
> >> > > > > > > > > > rawItems = LOAD 'in' AS (item1, item2, item3, type, count);
> >> > > > > > > > > >
> >> > > > > > > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> >> > > > > > > > > >
> >> > > > > > > > > > counts = FOREACH grouped {
> >> > > > > > > > > >              selectedFields = FILTER rawItems BY type == 'EMPLOYER';
> >> > > > > > > > > >              GENERATE
> >> > > > > > > > > >                  FLATTEN(group) AS (item1, item2, item3, type),
> >> > > > > > > > > >                  SUM(selectedFields.count) AS count;
> >> > > > > > > > > >          }
> >> > > > > > > > > >
> >> > > > > > > > > > Stack Trace:
> >> > > > > > > > > >
> >> > > > > > > > > > 2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child (main): Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:406)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:453)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
> >> > > > > > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
> >> > > > > > > > > >     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> >> > > > > > > > > >     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:662)
> >> > > > > > > > > >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:425)
> >> > > > > > > > > >     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >> > > > > > > > > >     at java.security.AccessController.doPrivileged(Native Method)
> >> > > > > > > > > >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >> > > > > > > > > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> >> > > > > > > > > >     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks
> >> > > > > > > > > > -Rohini
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > > Regards
> >> > > > > > > > -Rohini
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > > **
> >> > > > > > > > People of accomplishment rarely sat back & let things happen to
> >> > > > > > > > them. They went out & happened to things - Leonardo Da Vinci
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Regards
> >> > -Rohini
> >> >
> >> > --
> >> > **
> >> > People of accomplishment rarely sat back & let things happen to them. They
> >> > went out & happened to things - Leonardo Da Vinci
> >> >
> >>
> >
> >
>
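
A note on Jonathan's workaround from the thread above: it replaces the two
nested FILTERs with two extra columns computed by a bincond before the GROUP,
so the per-type sums become plain SUMs over the grouped bag. The sketch below
is not Pig and was not posted by anyone in the thread. It is a small Python
model, using made-up rows and hypothetical function names (nested_filter,
bincond_columns), that only illustrates that the two formulations should
produce the same per-group numbers.

```python
from collections import defaultdict

# Hypothetical toy rows: (day, searchType, adType, clickCount).
ROWS = [
    ("d1", "web", "type1", 3),
    ("d1", "web", "type2", 5),
    ("d1", "web", "type1", 2),
    ("d2", "img", "type2", 7),
]


def nested_filter(rows):
    """Model of the original script: group first, then filter each bag."""
    groups = defaultdict(list)
    for day, search_type, ad_type, clicks in rows:
        groups[(day, search_type)].append((ad_type, clicks))

    result = {}
    for key, bag in groups.items():
        # The whole bag for a group is held in memory and scanned per filter.
        type1 = [clicks for ad_type, clicks in bag if ad_type == "type1"]
        type2 = [clicks for ad_type, clicks in bag if ad_type == "type2"]
        total = sum(clicks for _, clicks in bag)
        result[key] = (len(bag), total, sum(type1), sum(type2))
    return result


def bincond_columns(rows):
    """Model of the workaround: add 0-or-clickCount columns, then just sum."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for day, search_type, ad_type, clicks in rows:
        acc = totals[(day, search_type)]
        acc[0] += 1                                      # COUNT
        acc[1] += clicks                                 # SUM(clickCount)
        acc[2] += clicks if ad_type == "type1" else 0    # SUM(type1_clickCount)
        acc[3] += clicks if ad_type == "type2" else 0    # SUM(type2_clickCount)
    return {key: tuple(acc) for key, acc in totals.items()}


if __name__ == "__main__":
    assert nested_filter(ROWS) == bincond_columns(ROWS)
    for key, values in sorted(bincond_columns(ROWS).items()):
        print(key, values)
```

The second function never needs a whole group in memory at once, which is the
property the workaround relies on. One difference I would expect in real Pig:
SUM over an empty filtered bag should give null, whereas the bincond version
gives 0 for a group with no rows of that type.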
