You'll need to write a UDF (ideally one that implements the Accumulator
interface), and sort the bag inside a nested foreach before passing it to the UDF.
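
Something along these lines might work as a starting point. It is only a
sketch: the jar name and the class com.example.TimeOnPage are hypothetical
(you would have to write that UDF yourself), and I am assuming you want to
group by visitor only, so that consecutive hits on different URLs land in
the same bag.

```pig
-- Sketch only: timeonpage.jar and com.example.TimeOnPage are hypothetical.
REGISTER timeonpage.jar;
DEFINE TimeOnPage com.example.TimeOnPage();

ht = FOREACH A GENERATE CONCAT(visid_high, visid_low) AS guid,
                        hit_time_gmt, page_url AS url;

-- Assumption: group by visitor only, not by (guid, url), so each bag
-- holds all of one visitor's hits.
grpd = GROUP ht BY guid PARALLEL 20;

B = FOREACH grpd {
    -- Sort inside the nested foreach so the UDF sees hits in time order.
    sorted = ORDER ht BY hit_time_gmt;
    -- The hypothetical UDF would emit one (url, seconds) tuple per hit,
    -- subtracting each hit's timestamp from the next one's.
    GENERATE group AS guid, FLATTEN(TimeOnPage(sorted)) AS (url, seconds);
};
```

If the UDF implements Accumulator, Pig can feed it the sorted bag in
batches rather than materializing the whole bag in memory, which helps
for visitors with very many hits.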

2012/1/26 Thejas Nair <[email protected]>

> I haven't understood your data/schema.
>
> I am hoping this is close to what you are trying to solve -
> schema inp: (timestamp: int, user, url);
>
> user_url_group = group inp by (user, url);
> session_duration = foreach user_url_group generate group.user as user,
> group.url as url, MAX(inp.timestamp) - MIN(inp.timestamp) as duration;
>
> -Thejas
>
>
>
>
> On 1/25/12 2:12 AM, David Houston wrote:
>
>> Hi,
>>
>> I have a group of records that is output like the below.
>>
>> ((1010046645226466896,http://www.url.com/),1277793285)
>> ((1010046645226466896,http://www.url.com/?image=580),1277793315)
>> ((1010046645226466896,http://www.url.com/?image=582),1277793359)
>> ((1010046645226466896,http://www.url.com/?image=582),1277793470)
>> ((1010046645226466896,http://www.url.com/?image=585),1277793387)
>>
>>
>> The code that gets me here is:
>>
>> ht = FOREACH A GENERATE CONCAT(visid_high,visid_low) AS guid, service,
>> hit_time_gmt, page_url as url;
>>
>> grpd = GROUP ht BY (guid, url) PARALLEL 20;
>>
>> B = FOREACH grpd {
>> t = DISTINCT ht.hit_time_gmt;
>>
>> GENERATE group, flatten(t);
>> }
>>
>>
>> What I'm having difficulty with is working out how to subtract each
>> value from the next one to work out how long a user spent on each page.
>>
>> Any help would be greatly appreciated.
>>
>>
>> Many thanks
>>
>> Dave
>>
>
>
