Now that I've turned off noSplitCombination, we have 640 mappers. The relation being ranked likely has billions, or even 1+ trillion, records.
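If Rank really does create roughly one counter per task, then 640 map tasks would already be well past the 240-counter ceiling on their own. One lever, sketched below on the assumption that the loader supports split combination and that larger splits are acceptable for this job, is to combine input splits more aggressively so the rank job runs fewer map tasks; the split size shown is only a placeholder, and with input this large it may still not be enough to get under the limit.

-- Sketch only: larger combined splits mean fewer map tasks, and therefore
-- fewer per-task counters from rank. The 1 GB value is an illustrative
-- placeholder, not a tuned setting.
set pig.maxCombinedSplitSize 1073741824;

a0 = load '<filename>' using PigStorage('\t','-schema');
a1 = rank a0;
a2 = foreach a1 generate col1 .. col16, rank_a0 as sequence_number;
a3 = order a2 by sequence_number;
store a3 into 'outputfile' using PigStorage('\t','-schema');
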
On Fri, Apr 5, 2013 at 10:47 AM, Bill Graham <[email protected]> wrote:

> How many mappers and reducers do you have? Skimming the Rank code, it looks
> like it creates at least N counters per task, which would be a scalability
> bug.
>
> On Friday, April 5, 2013, Lauren Blau wrote:
>
> > This is definitely caused by the RANK operator. Is there some way to
> > reduce the number of counters generated by this operator when working
> > with large data?
> > Thanks
> >
> > On Thu, Apr 4, 2013 at 7:01 PM, Lauren Blau <[email protected]> wrote:
> >
> > > I can think of only 2 things that have changed since this script last
> > > ran successfully: switched to using the range specification of the
> > > schema for a2, and the input data has grown considerably.
> > >
> > > Lauren
> > >
> > > On Thu, Apr 4, 2013 at 7:00 PM, Lauren Blau <[email protected]> wrote:
> > >
> > >> No.
> > >>
> > >> On Thu, Apr 4, 2013 at 4:54 PM, Dmitriy Ryaboy <[email protected]> wrote:
> > >>
> > >>> Do you have any special properties set?
> > >>> Like the pig.udf.profile one maybe..
> > >>> D
> > >>>
> > >>> On Thu, Apr 4, 2013 at 6:25 AM, Lauren Blau <[email protected]> wrote:
> > >>>
> > >>> > I'm running a simple script to add a sequence_number to a relation,
> > >>> > sort the result, and store it to a file:
> > >>> >
> > >>> > a0 = load '<filename>' using PigStorage('\t','-schema');
> > >>> > a1 = rank a0;
> > >>> > a2 = foreach a1 generate col1 .. col16, rank_a0 as sequence_number;
> > >>> > a3 = order a2 by sequence_number;
> > >>> > store a3 into 'outputfile' using PigStorage('\t','-schema');
> > >>> >
> > >>> > I get the following error:
> > >>> >
> > >>> > org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 241 max=240
> > >>> >     at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:61)
> > >>> >     at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:68)
> > >>> >     at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.readFields(AbstractCounterGroup.java:174)
> > >>> >     at org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:278)
> > >>> >     at org.apache.hadoop.mapreduce.counters.AbstractCounters.readFields(AbstractCounters.java:303)
> > >>> >     at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
> > >>> >     at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
> > >>> >     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:951)
> > >>> >     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:835)
> > >>> >
> > >>> > We aren't able to raise our counter limit any higher (policy), and I
> > >>> > don't understand why I should need so many counters for such a simple
> > >>> > script anyway.
> > >>> > Running Apache Pig version 0.11.1-SNAPSHOT (r: unknown),
> > >>> > compiled Mar 22 2013, 10:19:19.
> > >>> >
> > >>> > Can someone help?
> > >>> >
> > >>> > Thanks,
> > >>> > Lauren
>
> --
> Sent from Gmail Mobile
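The most direct fix for the LimitExceededException itself is to raise the counter ceiling, which the thread rules out on policy grounds. Purely as a hypothetical sketch, assuming a Hadoop 2.x-style cluster (suggested by the org.apache.hadoop.mapreduce.counters.Limits class in the trace) and assuming a per-job override is honored where the limit is actually enforced, the override would go at the top of the script:

-- Hypothetical sketch only: policy here forbids raising the limit, and whether
-- a per-job override takes effect depends on the Hadoop version and on where
-- the counter limit is enforced (submitting client vs. application master).
set mapreduce.job.counters.max 1000;
-- ...followed by the same load / rank / order / store pipeline as above.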
