Prashant, thank you for the advice. That's a good one!
On Thu, Mar 7, 2013 at 12:04 PM, Prashant Kommireddi <[email protected]>wrote: > You can group on ip which will result in ip -> bag mappings. Let's say you > loaded (ip, hour, count) dataset as follows > > grunt> a = load 'data' using PigStorage(',') as (ip:chararray, hour:int, > count:int); > grunt> b = group a by > ip; > grunt> describe b; > b: {group: chararray,a: {(ip: chararray,hour: int,count: int)}} > > The group here is ip and the resulting bag contains (ip, hour, count) > tuples. > grunt> dump b; > > (128.187.97.22,{(128.187.97.22,0,180),(128.187.97.22,1,84),(128.187.97.22,2,25),(128.187.97.22,22,31),(128.187.97.22,23,2)}) > > You can write a very simple UDF that returns a Tuple based on the contents > of this bag that would have what you need. The UDF would traverse through > the tuples in the bag and create a map of hour->count. Your result tuple > should be sorted by hour and only return the counts in an order (m0...m23). > > -Prashant > > On Wed, Mar 6, 2013 at 7:20 AM, Eugene Morozov <[email protected] > >wrote: > > > Hello! > > > > I have a script that gives me following result: > > > > time_grouped = GROUP joined BY (ip, hour); > > counts = FOREACH time_grouped GENERATE group.ip as ip, group.hour as > hour, > > COUNT(joined) as count; > > > > (128.187.97.22, 0, 180) > > (128.187.97.22, 1, 84) > > (128.187.97.22, 2, 25) > > (128.187.97.22, 22, 31) > > (128.187.97.22, 23, 2) > > > > That is IP address, hour of day, counter. > > I'd like to get following: > > > > (128.187.97.22, (m1,m2, m3, ..., m23)) > > m1-m23 corresponds to the counter. And if there is nothing for particular > > hour, then I'd like to have 0 instead empty value. > > > > The trick here is that if I do not have anything for particular hour, > then > > I won't have count for it. > > Is there a way to achieve the goal? > > > > Thanks in advance > > -- > > Evgeny Morozov > > Developer Grid Dynamics > > Skype: morozov.evgeny > > www.griddynamics.com > > [email protected] > > > -- Evgeny Morozov Developer Grid Dynamics Skype: morozov.evgeny www.griddynamics.com [email protected]
