The code snippet I wrote was for use inside a UDF; it is not part of Pig Latin. To get at things like counters when running Pig code, you would have to write a Java driver program that uses the new API from https://issues.apache.org/jira/browse/PIG-1478 and https://issues.apache.org/jira/browse/PIG-1333
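A minimal sketch of the UDF-side pattern, assuming a filter-style UDF that counts both kept and dropped records. So that it runs without Pig or Hadoop on the classpath, `PigStatusReporter` is replaced by a hypothetical `Reporter` interface and Hadoop's counter by a hypothetical `Counter` class, and the `FilterCounters` enum is made up for illustration. In a real Pig 0.8 UDF the class would extend `FilterFunc`, and the lookup would be `PigStatusReporter.getInstance().getCounter(myEnum)`. The null checks stand in for the null reporter you can hit on the client side.

```java
import java.util.HashMap;
import java.util.Map;

public class CountingFilterSketch {
    // Hypothetical counter names; a real UDF would define its own enum.
    enum FilterCounters { KEPT, DROPPED }

    // Hypothetical stand-in for Hadoop's Counter class.
    static final class Counter {
        private long value;
        void increment(long incr) { value += incr; }
        long getValue() { return value; }
    }

    // Hypothetical stand-in for PigStatusReporter; getCounter may return null.
    interface Reporter {
        Counter getCounter(Enum<?> name);
    }

    // Task-side reporter that keeps counters in memory.
    static final class MapReporter implements Reporter {
        private final Map<Enum<?>, Counter> counters = new HashMap<>();

        @Override
        public Counter getCounter(Enum<?> name) {
            return counters.computeIfAbsent(name, k -> new Counter());
        }
    }

    private final Reporter reporter; // null models the client side

    CountingFilterSketch(Reporter reporter) {
        this.reporter = reporter;
    }

    // Increments a counter, skipping silently when no reporter or counter exists.
    private void incr(Enum<?> name) {
        if (reporter == null) {
            return;
        }
        Counter counter = reporter.getCounter(name);
        if (counter != null) {
            counter.increment(1L);
        }
    }

    // Keeps non-empty values and counts both kept and dropped records.
    boolean exec(String value) {
        boolean keep = value != null && !value.isEmpty();
        incr(keep ? FilterCounters.KEPT : FilterCounters.DROPPED);
        return keep;
    }

    public static void main(String[] args) {
        MapReporter reporter = new MapReporter();
        CountingFilterSketch udf = new CountingFilterSketch(reporter);
        for (String s : new String[] {"a", "", null, "b"}) {
            udf.exec(s);
        }
        // Prints: kept=2 dropped=2
        System.out.println("kept=" + reporter.getCounter(FilterCounters.KEPT).getValue()
                + " dropped=" + reporter.getCounter(FilterCounters.DROPPED).getValue());

        // Client side: no reporter, so the record is filtered but not counted.
        new CountingFilterSketch(null).exec("c");
    }
}
```

Counting both outcomes in the UDF gives you the "records filtered out" number directly, instead of backing into it from job input and output counts.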
-Dmitriy

On Mon, Oct 18, 2010 at 2:57 AM, Josh Devins <[email protected]> wrote:

> Ah, sorry, just saw that this should read:
>
> PigStatusReporter.getInstance() and there is no special counters
> keyword/variable. However is this common for Pig, being able to access
> static methods directly from within a Pig script?
>
> Thanks,
>
> Josh
>
>
> On 18 October 2010 11:56, Josh Devins <[email protected]> wrote:
>
>> Thanks, I will explore the stats in MR mode a bit once I'm on 0.8/trunk.
>>
>> I will also have a look at wrapping some of the standard loaders to get
>> better stats out of them. Is this of interest to anyone else? Should I
>> submit back to PiggyBank?
>>
>> This syntax of counters.PigStatusReporter, is that documented somewhere? Is
>> it only on 0.8/trunk? What other variables do we have access to in the
>> "native" Pig script other than "counters"?
>>
>> Josh
>>
>>
>>
>> On 17 October 2010 19:44, Dmitriy Ryaboy <[email protected]> wrote:
>>
>>> No on Filters (though every MR job tells you the number of records
>>> ingested, and the number returned, and as of 0.8 it also tells you which
>>> relations were being produced in the job -- so you can sort of back into
>>> that).
>>> EB sort of gives you 2), most of the loaders in there give you number of
>>> malformed records, though they do not store the bad records anywhere.
>>> I am not sure what you mean by 3) -- you can just increment counters.
>>> PigStatusReporter.getInstance().getCounter(myEnum).increment(1L);
>>>
>>> (watch out for a null reporter when you are still in the client-side).
>>>
>>> -D
>>>
>>>
>>> On Sat, Oct 16, 2010 at 2:28 PM, Josh Devins <[email protected]> wrote:
>>>
>>> > I've seen a few threads about counters, PigStats, Elephant-Bird's stats
>>> > utility class, etc.
>>> >
>>> > http://www.mail-archive.com/[email protected]/msg00900.html
>>> > http://www.mail-archive.com/user%40pig.apache.org/msg00034.html
>>> >
>>> > Has any progress been made on this or to provide a comprehensive
>>> > stats/counter mechanism?
>>> >
>>> > What I'm looking to do is three-fold:
>>> >
>>> > 1) Get stats on the number of records that are filtered out when using
>>> > the FILTER operation
>>> > 2) Get stats on the number of records dropped/not loaded in a LOAD
>>> > function (and actual copies of the records/rows from the file for later
>>> > evaluation)
>>> > 3) Output my own stats from a Pig job (without resorting to writing my
>>> > own UDF and pushing things into PigStats using the Elephant-Bird utility)
>>> >
>>> > If any of this is possible, it would be great to see some examples or
>>> > documentation. I would hate to go to raw Hadoop MR code just to get to
>>> > counters.
>>> >
>>> > Thanks,
>>> >
>>> > Josh
>>> >
>>>
>>
>>
>
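Josh's point 2) (count the records a loader drops and keep copies of them) goes one step beyond what the EB loaders do, since they count malformed records but do not store them. A minimal sketch, assuming tab-delimited input with a fixed field count; the `MalformedRecordSketch` class and its methods are hypothetical. In a real wrapper this check would sit inside the `getNext()` of a `LoadFunc` subclass that delegates to the wrapped loader, the count would go to a `PigStatusReporter` counter, and the bad rows would be written somewhere durable instead of an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MalformedRecordSketch {
    private final int expectedFields; // assumed schema width
    private long goodRecords;
    private long malformedRecords;
    private final List<String> badLines = new ArrayList<>();

    MalformedRecordSketch(int expectedFields) {
        this.expectedFields = expectedFields;
    }

    // Splits a tab-delimited line; malformed lines are counted, retained, and yield null.
    List<String> parse(String line) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        if (fields.length != expectedFields) {
            malformedRecords++;
            badLines.add(line);
            return null;
        }
        goodRecords++;
        return Arrays.asList(fields);
    }

    long goodRecords() { return goodRecords; }

    long malformedRecords() { return malformedRecords; }

    List<String> badLines() { return Collections.unmodifiableList(badLines); }

    public static void main(String[] args) {
        MalformedRecordSketch loader = new MalformedRecordSketch(2);
        for (String line : new String[] {"a\tb", "oops", "x\ty", "1\t2\t3"}) {
            loader.parse(line);
        }
        // Prints: good=2 malformed=2 retained=2
        System.out.println("good=" + loader.goodRecords()
                + " malformed=" + loader.malformedRecords()
                + " retained=" + loader.badLines().size());
    }
}
```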
