This seems reasonable, except that it would make more sense to convert 
query parameters to maps.  By definition a query parameter is key=value, and a 
map is generally easier to work with than a bag, since there's no need to 
flatten it.
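
To illustrate the idea, here is a minimal sketch in Python rather than Pig 
Latin. The URL, the parameter names (zip, state, os, browser), and the 
DIMENSIONS table are all made up for the example; your clickstream will have 
its own keys.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical mapping from dimension name to the query keys that belong to it.
DIMENSIONS = {
    "geo": ("zip", "state"),
    "browser": ("os", "browser"),
}


def query_to_map(url):
    """Return the query parameters of a URL as a dict.

    If a key is repeated, the last value wins.
    """
    query = urlsplit(url).query
    return dict(parse_qsl(query, keep_blank_values=True))


def group_by_dimension(params, dimensions=DIMENSIONS):
    """Split a flat parameter map into one sub-map per dimension."""
    return {
        dim: {key: params[key] for key in keys if key in params}
        for dim, keys in dimensions.items()
    }


url = "http://example.com/click?zip=92122&state=CA&os=Unix&browser=FireFox"
params = query_to_map(url)
grouped = group_by_dimension(params)
```

Because each value is looked up by key, a dimension's fields can be read 
directly (for example grouped["geo"]["zip"]) without flattening anything or 
relying on field position.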

Alan.

On Jun 11, 2012, at 10:55 AM, Mohit Anchlia wrote:

> I am looking at how to parse URLs with query parameters to process
> clickstream data. Are there any examples I can look at? The steps I
> envision are:
> 
> 1) Read lines and convert query parameters into bags, where each bag is a
> group of fields for a particular dimension table. So if Geo is one of the
> dimensions, group all the geo-related information from that URL as a bag.
> In the end it would look like {{92122,CA},{Unix,FireFox}}. In this example
> the first bag is the GEO dimension and the second is the Browser dimension.
> 2) Load these into an OLAP staging database
> 3) Populate star schema from staging tables
> 
> I am sure other people are already doing this, so I thought I'd check
> whether this makes sense.
