On Jun 30, 2010, at 3:17 PM, Syed Wasti wrote:
> OR use the COUNT_STAR function to compute the number of elements in a bag.
>
> lsccnt = FOREACH lscg generate group.from_state, group.to_state,
>          COUNT_STAR(lsc);
>
>
> On 6/30/10 3:12 PM, "Syed Wasti" wrote:
>
>> I guess this is what you are looking for:
>>
>> lsccnt = FOREACH lscg {
>>     dist_id = DISTINCT lsc.listener_id;
>>     GENERATE group.from_state, group.to_state, COUNT(dist_id);
>> };
>>
>>
>> On 6/30/10 2:18 PM, "elein" wrote:
>>
>>>
>>> lsc = LOAD '/user/hadoop/radio_event/listenerStateChange/2010-06-30'
>>>     AS (daterecorded:chararray, listener_id:long, to_state:chararray,
>>>         from_state:chararray);
>>> describe lsc;
>>> lscg = group lsc by (from_state, to_state);
>>> describe lscg;
>>> //lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT(lsc.listener_id);
>>> lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT(lsc);
>>>
>>> The first lsccnt line generates (,,0L) and the second generates (,,54321).
>>> What I want is tuples like
>>> (state1,state2,123)
>>> (state3,state2,456)
>>>
>>> And so on for each combination of from_state and to_state.
>>>
>>> What am I missing?
>>>
>>> elein

I will try both of those things. However, I've found that I'm dealing with an
XML file instead of a tab-separated file, and I need to figure out how to get
access to the loader UDF. Obviously I'm a newbie just getting started, and my
environment is not quite together yet.

Thank you,

elein
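
For anyone who hits the same (,,0L) and (,,54321) output: a likely explanation, assuming the file really is XML rather than tab-separated, is that a LOAD with no USING clause falls back to the default PigStorage loader and splits on tabs. Each line then probably lands in daterecorded, and the other three fields come back null. That would give a single group with empty from_state and to_state, a COUNT of 0 over the null listener_id values, and a COUNT(lsc) equal to the number of lines. The sketch below is one way to check that assumption. The aliases nulls, nullg, and nullcnt are made up for illustration and are not from the original script.

```pig
-- Same LOAD as in the original script (default PigStorage, tab-delimited).
lsc = LOAD '/user/hadoop/radio_event/listenerStateChange/2010-06-30'
      AS (daterecorded:chararray, listener_id:long, to_state:chararray,
          from_state:chararray);

-- Keep only the rows whose grouping keys failed to load.
nulls = FILTER lsc BY from_state IS NULL OR to_state IS NULL;

-- Count them. COUNT_STAR includes tuples with null fields, whereas COUNT
-- skips tuples whose first field is null.
nullg   = GROUP nulls ALL;
nullcnt = FOREACH nullg GENERATE COUNT_STAR(nulls);
DUMP nullcnt;
```

If that count matches the 54321 seen earlier, the grouping and COUNT logic are fine and the loader is the part to fix.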
