On Jun 30, 2010, at 3:17 PM, Syed Wasti wrote:

> OR use the COUNT_STAR function to compute the number of elements in a bag.
> 
> lsccnt = FOREACH lscg generate group.from_state, group.to_state,
> COUNT_STAR(lsc);
> 
> 
> On 6/30/10 3:12 PM, "Syed Wasti" <[email protected]> wrote:
> 
>> I guess this is what you are looking for:
>> 
>> lsccnt =    FOREACH lscg {
>>            dist_id = DISTINCT lsc.listener_id;
>>            GENERATE group.from_state, group.to_state, COUNT(dist_id);
>>            };
>> 
>> 
>> On 6/30/10 2:18 PM, "elein" <[email protected]> wrote:
>> 
>>> 
>>> lsc = LOAD '/user/hadoop/radio_event/listenerStateChange/2010-06-30'
>>>   AS (daterecorded:chararray, listener_id:long, to_state:chararray,
>>> from_state:chararray);
>>> describe lsc;
>>> lscg = group lsc by (from_state, to_state);
>>> describe lscg;
>>> -- lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT(lsc.listener_id);
>>> lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT(lsc);
>>> 
>>> The first lsccnt line generates (,,0L) and the second generates (,,54321).
>>> What I want is tuples like
>>> (state1,state2,123)
>>> (state3,state2,456)
>>> 
>>> And so on for each combination of from_state and to_state.
>>> 
>>> What am I missing?
>>> 
>>> elein
>>> [email protected]

I will try both of those things.  However, I've found that I'm dealing
with an XML file instead of a tab-separated file, and I need to figure
out how to get access to the loader UDF.  Obviously I'm a newbie just
getting started, and my environment is not quite together.

Thank you,

elein
[email protected]



