Ah, sorry, I just saw that this should read:

PigStatusReporter.getInstance(), and there is no special "counters"
keyword/variable. However, is it common in Pig to be able to call
static methods directly from within a Pig script?
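
For anyone reading this in the archive, here is a minimal sketch of the
null-safe increment pattern hinted at in Dmitriy's note below about a null
reporter on the client side. It assumes the call is made from Java code such
as a UDF or loader. StubReporter, StubCounter, MyCounters and incrementSafely
are hypothetical stand-ins so the example runs without Pig or Hadoop on the
classpath; in a real job the reporter would come from
PigStatusReporter.getInstance() and the enum would be your own.

```java
import java.util.EnumMap;
import java.util.Map;

public class CounterSketch {
    /** Hypothetical counter names; a real job would define its own enum. */
    enum MyCounters { BAD_RECORDS }

    /** Stand-in for Hadoop's Counter (NOT the real class). */
    static final class StubCounter {
        private long value;
        void increment(long by) { value += by; }
        long getValue() { return value; }
    }

    /** Stand-in for PigStatusReporter (NOT the real class). */
    static final class StubReporter {
        private final Map<MyCounters, StubCounter> counters =
                new EnumMap<>(MyCounters.class);

        StubCounter getCounter(MyCounters name) {
            return counters.computeIfAbsent(name, k -> new StubCounter());
        }
    }

    /**
     * Increments the named counter when a reporter and counter are available.
     * Returns false (and does nothing) when either is null, which is the
     * situation to guard against while still on the client side.
     */
    static boolean incrementSafely(StubReporter reporter, MyCounters name, long by) {
        if (reporter == null) {
            return false;
        }
        StubCounter counter = reporter.getCounter(name);
        if (counter == null) {
            return false;
        }
        counter.increment(by);
        return true;
    }

    public static void main(String[] args) {
        // No reporter yet (client side): the increment is skipped.
        System.out.println(incrementSafely(null, MyCounters.BAD_RECORDS, 1L));

        // Reporter available (inside a task): the counter goes up by one.
        StubReporter reporter = new StubReporter();
        incrementSafely(reporter, MyCounters.BAD_RECORDS, 1L);
        System.out.println(reporter.getCounter(MyCounters.BAD_RECORDS).getValue());
    }
}
```

Wrapping the guard in one helper keeps the null checks out of the record
processing code, so a UDF or loader can call it once per bad record.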

Thanks,

Josh


On 18 October 2010 11:56, Josh Devins <[email protected]> wrote:

> Thanks, I will explore the stats in MR mode a bit once I'm on 0.8/trunk.
>
> I will also have a look at wrapping some of the standard loaders to get
> better stats out of them. Is this of interest to anyone else? Should I
> submit back to PiggyBank?
>
> This syntax of counters.PigStatusReporter, is that documented somewhere? Is
> it only on 0.8/trunk? What other variables do we have access to in the
> "native" Pig script other than "counters"?
>
> Josh
>
>
>
> On 17 October 2010 19:44, Dmitriy Ryaboy <[email protected]> wrote:
>
>> No on Filters (though every MR job tells you the number of records
>> ingested, and the number returned, and as of 0.8 it also tells you which
>> relations were being produced in the job -- so you can sort of back into
>> that).
>> EB sort of gives you 2), most of the loaders in there give you number of
>> malformed records, though they do not store the bad records anywhere.
>> I am not sure what you mean by 3) -- you can just increment counters.
>> PigStatusReporter.getInstance().getCounter(myEnum).increment(1L);
>>
>> (watch out for a null reporter when you are still in the client-side).
>>
>> -D
>>
>>
>> On Sat, Oct 16, 2010 at 2:28 PM, Josh Devins <[email protected]> wrote:
>>
>> > I've seen a few threads about counters, PigStats, Elephant-Bird's stats
>> > utility class, etc.
>> >
>> > http://www.mail-archive.com/[email protected]/msg00900.html
>> > http://www.mail-archive.com/user%40pig.apache.org/msg00034.html
>> >
>> > Has any progress been made on this or to provide a comprehensive
>> > stats/counter mechanism?
>> >
>> > What I'm looking to do is three-fold:
>> >
>> > 1) Get stats on the number of records that are filtered out when using
>> > the FILTER operation
>> > 2) Get stats on the number of records dropped/not loaded in a LOAD
>> > function (and actual copies of the records/rows from the file for later
>> > evaluation)
>> > 3) Output my own stats from a Pig job (without resorting to writing my
>> > own UDF and pushing things into PigStats using the Elephant-Bird utility)
>> >
>> > If any of this is possible, it would be great to see some examples or
>> > documentation. I would hate to go to raw Hadoop MR code just to get to
>> > counters.
>> >
>> > Thanks,
>> >
>> > Josh
>> >
>>
>
>
