You're going to need to write a UDF (ideally one that implements the Accumulator interface), and you'll need a nested foreach that sorts the timestamps before they go into the UDF.
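
For what it's worth, here is a rough sketch of the logic such a UDF would have to carry out, written in plain Python rather than as a real Pig UDF. It assumes the hits are grouped per visitor (guid only, not guid and url) and have been sorted by timestamp, so that each hit can be subtracted from the one that follows it. The function name `time_on_pages` and the tuple layout are made up for illustration; the sample values are the ones from the quoted output below.

```python
from typing import Iterable, List, Tuple


def time_on_pages(hits: Iterable[Tuple[int, str]]) -> List[Tuple[str, int]]:
    """Return (url, seconds_on_page) for every hit of one visitor except the last.

    Hypothetical helper: `hits` is an iterable of (timestamp, url) pairs
    belonging to a single visitor.
    """
    # Sort by timestamp, which is what the nested ORDER would do in Pig.
    ordered = sorted(hits)
    durations = []
    # Pair each hit with the one after it and take the difference.
    for (ts, url), (next_ts, _next_url) in zip(ordered, ordered[1:]):
        durations.append((url, next_ts - ts))
    return durations


# Sample hits for one visitor, deliberately out of order.
hits = [
    (1277793285, "http://www.url.com/"),
    (1277793315, "http://www.url.com/?image=580"),
    (1277793359, "http://www.url.com/?image=582"),
    (1277793470, "http://www.url.com/?image=582"),
    (1277793387, "http://www.url.com/?image=585"),
]
result = time_on_pages(hits)
```

The last hit in a visit has no following hit, so this sketch gives it no duration; how to treat it (drop it, or assign a default) is a choice you would have to make in the real UDF.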

2012/1/26 Thejas Nair <[email protected]>

> I haven't understood your data/schema.
>
> I am hoping this is close to what you are trying to solve -
> schema Inp: (timestamp : int, user, url);
>
> user_url_group = group inp by (user, url);
> session_duration = foreach user_url_group generate group.user as user,
>     group.url as url, MAX(inp.timestamp) - MIN(inp.timestamp) as duration;
>
> -Thejas
>
> On 1/25/12 2:12 AM, David Houston wrote:
>
>> Hi,
>>
>> I have a group of records that gets output like the below.
>>
>> ((1010046645226466896,http://www.url.com/),1277793285)
>> ((1010046645226466896,http://www.url.com/?image=580),1277793315)
>> ((1010046645226466896,http://www.url.com/?image=582),1277793359)
>> ((1010046645226466896,http://www.url.com/?image=582),1277793470)
>> ((1010046645226466896,http://www.url.com/?image=585),1277793387)
>>
>> The code that gets me here is:
>>
>> ht = FOREACH A GENERATE CONCAT(visid_high,visid_low) AS guid, service,
>>     hit_time_gmt, page_url as url;
>>
>> grpd = GROUP ht BY (guid, url) PARALLEL 20;
>>
>> B = FOREACH grpd {
>>     t = DISTINCT ht.hit_time_gmt;
>>     GENERATE group, flatten(t);
>> }
>>
>> What I'm having difficulty doing is working out how I would subtract
>> the next value from the last to work out how long a user spent on each page.
>>
>> Any help would be greatly appreciated.
>>
>> Many thanks
>>
>> Dave
