Thanks Dmitriy. That's not quite what I want. I want something like in SQL, where you can get a total count of tuples (or some other value of interest) and then use it like a variable. (Sorry, I don't know if "variable" is the right word in PIG, but PIG passes command-line parameters as variables, right?) Such a variable would be convenient for quickly calculating statistics in PIG scripts. Though I also realize it might not be valid to use a variable this way in PIG, so it might just be a misconception on my part...
Thanks,
Michael

--- On Tue, 2/23/10, Dmitriy Ryaboy <[email protected]> wrote:

From: Dmitriy Ryaboy <[email protected]>
Subject: Re: count total number of tuples in a bag?
To: [email protected]
Date: Tuesday, February 23, 2010, 6:10 PM

c = FOREACH b GENERATE group as key, COUNT(a);

will give you the number of rows in a per key.

a_all = group a ALL;
a_count = FOREACH a_all GENERATE COUNT(a);

will give you the total number of rows in a.

Does that answer your question?

On Tue, Feb 23, 2010 at 3:54 PM, jiang licht <[email protected]> wrote:

> Excuse me I could have missed important part of PIG document and asked this
> trivial question here :) What is the best way to find out the total number
> of tuples (rows) in the bag of data loaded? For example, after "a = LOAD
> 'sth' AS (key, value); b = GROUP a BY key; c = FOREACH b GENERATE key;" I
> want to know how many tuples are loaded to 'a' and total number left in 'c'.
> One way might be to use a udf function. But is there a support of counting
> this in PIG?
>
> Thanks,
>
> Michael
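
On using the count "like a variable": later Pig releases (0.8 onward, as far as I know) let a relation that holds exactly one row be referenced as a scalar inside an expression, which comes close to the SQL-style usage described in the reply. A minimal sketch, assuming such a release. The field types and the names total, n, key_counts and shares are made up for illustration, and 'sth' is the placeholder path from the question:

```pig
-- Assumes Pig 0.8 or later (scalar projection of a single-row relation).
-- 'sth' is the placeholder input from the question; the types are assumed.
a = LOAD 'sth' AS (key:chararray, value:chararray);

-- Total number of rows in a, as a relation with exactly one row.
a_all   = GROUP a ALL;
a_count = FOREACH a_all GENERATE COUNT(a) AS total;

-- Number of rows per key.
b          = GROUP a BY key;
key_counts = FOREACH b GENERATE group AS key, COUNT(a) AS n;

-- a_count.total is read as a scalar here, much like a variable.
shares = FOREACH key_counts GENERATE key, n,
         (double)n / (double)a_count.total AS fraction;

DUMP shares;
```

The scalar reference only works when a_count has a single row, which GROUP ... ALL gives for non-empty input. On releases without this feature, one workaround is to run the count as a separate script and pass the result into a second script with -param.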
