Thanks Dmitriy. That's not quite what I want. I want something like in SQL, 
where you can get the total count of tuples (or another value of interest) and 
use it like a variable. (Sorry, I don't know whether "variable" is the right 
term in PIG, but PIG passes command-line parameters as variables, right?) Such 
a variable would be convenient for quickly calculating statistics in PIG 
scripts. I also realize that PIG may not allow a variable to be used this way, 
so this might just be a misconception on my part...

Thanks,

Michael

--- On Tue, 2/23/10, Dmitriy Ryaboy <[email protected]> wrote:

From: Dmitriy Ryaboy <[email protected]>
Subject: Re: count total number of tuples in a bag?
To: [email protected]
Date: Tuesday, February 23, 2010, 6:10 PM

c = FOREACH b GENERATE group as key, COUNT(a);

will give you the number of rows in a per key.

a_all = group a ALL;
a_count = FOREACH a_all GENERATE COUNT(a);

will give you the total number of rows in a.
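
Put together with the LOAD from the original question, a minimal sketch might
look like the following. The path 'sth' and the field names come from the
question; the c_all and c_count aliases and the DUMP statements are assumptions
added here only to make the example complete.

```pig
-- Load the data and group it by key, as in the original question.
a = LOAD 'sth' AS (key, value);
b = GROUP a BY key;

-- Number of rows in a per key.
c = FOREACH b GENERATE group AS key, COUNT(a);

-- Total number of rows in a: put every row into one group, then count it.
a_all = GROUP a ALL;
a_count = FOREACH a_all GENERATE COUNT(a);

-- Total number of rows in c (one row per distinct key), counted the same way.
c_all = GROUP c ALL;
c_count = FOREACH c_all GENERATE COUNT(c);

-- Each result is a relation holding a single one-field tuple.
DUMP a_count;
DUMP c_count;
```

Note that a_count and c_count are relations, not scalar variables, so each
count comes back as a one-row result.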

Does that answer your question?


On Tue, Feb 23, 2010 at 3:54 PM, jiang licht <[email protected]> wrote:

> Excuse me if I missed an important part of the PIG documentation and am
> asking a trivial question here :) What is the best way to find out the total
> number of tuples (rows) in a bag of loaded data? For example, after "a = LOAD
> 'sth' AS (key, value); b = GROUP a BY key; c = FOREACH b GENERATE key;" I
> want to know how many tuples were loaded into 'a' and how many are left in
> 'c'. One way might be to use a UDF. But is there built-in support for
> counting this in PIG?
>
> Thanks,
>
> Michael
>
>
>



      
