Some clarification on the message below. Ignore the outer bag; I had removed some
data elements for clarity and simplicity. Basically, I'm trying to find a way to go
from:

{(pg),(pg),...,(pg)}
to
{(pg,pg,...,pg)}

for an arbitrary number of "pg" tuples.
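As a language-neutral illustration of the reshaping only (not Pig Latin), here is a minimal Python sketch. It assumes each bag arrives as a list of one-element tuples, and the function name `bag_to_tuple` is hypothetical:

```python
def bag_to_tuple(bag):
    """Turn a bag of single-element tuples, e.g. [("pg1",), ("pg2",)],
    into one flat tuple ("pg1", "pg2").

    Assumption: the bag is given as a Python list of 1-tuples.
    The order of the input list is preserved.
    """
    # Unpack each one-element tuple; any other arity raises ValueError.
    return tuple(page for (page,) in bag)


print(bag_to_tuple([("homepage",), ("pg1",), ("pg2",)]))
```

The result is an ordinary, hashable tuple rather than a bag, which is the property needed for it to serve as a grouping key.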

SB

-----Original Message-----
From: Steve Bernstein [mailto:[email protected]] 
Sent: Wednesday, August 29, 2012 4:28 PM
To: [email protected]
Subject: group by clickstream

Hi all,
I have a bag, clickstreams: {clickStream: {pageName: chararray}}, in which
each row represents a sequence of pages and events in a single session on a
website. The interior bag, clickStream, represents this as a sequence of one
or more single-element tuples, e.g.,

{(homepage),(pg1),(pg2),...,(pgN)}

I'd like to group by the sequences so I can get counts and ultimately sort to 
find the most common clickstreams.  A bag can't be a key for grouping, I've 
discovered, but it seems like it ought to be easy to flatten the clickstream 
bag into some other form such that the sequences can be used as keys for 
grouping.  But I can't figure it out.
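The grouping described above amounts to keying each session on a hashable form of its sequence, counting, and sorting by count. A minimal Python sketch of that idea (not Pig Latin; `most_common_clickstreams` and the sample sessions are hypothetical):

```python
from collections import Counter


def most_common_clickstreams(sessions, n=None):
    """Count identical clickstreams and return (sequence, count) pairs,
    most frequent first.

    Assumption: `sessions` is an iterable of bags, each bag a list of
    one-element tuples such as [("homepage",), ("pg1",)].
    """
    # A list (bag) is unhashable, so convert each one to a flat tuple
    # before using it as a grouping key.
    counts = Counter(tuple(page for (page,) in bag) for bag in sessions)
    # most_common() sorts by count, highest first; n=None returns all keys.
    return counts.most_common(n)


sessions = [
    [("homepage",), ("pg1",), ("pg2",)],
    [("homepage",), ("pg3",)],
    [("homepage",), ("pg1",), ("pg2",)],
]
for seq, count in most_common_clickstreams(sessions):
    print(count, " > ".join(seq))
```

In Pig the analogous shape would be: turn each bag into a tuple or string field, GROUP on that field, COUNT each group, and ORDER by the count descending.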

Any ideas?

Thanks!
Steve
