I am looking into how to parse URLs with query parameters in order to process
clickstream data. Are there any examples I can look at? The steps I
envision are:

1) Read each line and convert its query parameters into bags, where each bag is
the group of fields for a particular dimension table. For example, if Geo is one
of the dimensions, group all the geo-related information from that URL into one bag.
In the end the result would look like {{92122,CA},{Unix,FireFox}}, where the
first bag is the Geo dimension and the second is the Browser dimension.
2) Load these into an OLAP staging database
3) Populate the star schema from the staging tables
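For illustration, the three steps above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a definitive pipeline: the query parameter names (zip, state, os, browser), the DIMENSIONS mapping, and the table names (staging, dim_geo, dim_browser, fact_clicks) are all hypothetical, an in-memory sqlite3 database stands in for the OLAP staging database, and a dict of tuples plays the role of the nested bags.

```python
import sqlite3
from urllib.parse import urlsplit, parse_qs

# Hypothetical mapping: dimension name -> query parameters belonging to it.
DIMENSIONS = {
    "geo": ["zip", "state"],
    "browser": ["os", "browser"],
}

def url_to_bags(url):
    """Step 1: group a URL's query parameters into one tuple per dimension.

    Missing parameters become None; for repeated parameters only the
    first value is kept.
    """
    params = parse_qs(urlsplit(url).query)  # e.g. {"zip": ["92122"], ...}
    return {
        dim: tuple(params.get(name, [None])[0] for name in fields)
        for dim, fields in DIMENSIONS.items()
    }

def load_star_schema(conn, urls):
    cur = conn.cursor()
    # Step 2: a flat staging table with one row per click.
    cur.execute("CREATE TABLE staging (zip TEXT, state TEXT, os TEXT, browser TEXT)")
    for url in urls:
        bags = url_to_bags(url)
        cur.execute("INSERT INTO staging VALUES (?, ?, ?, ?)",
                    bags["geo"] + bags["browser"])
    # Step 3: dimension tables with surrogate keys, one row per distinct combination.
    cur.execute("CREATE TABLE dim_geo (geo_id INTEGER PRIMARY KEY, zip TEXT, state TEXT)")
    cur.execute("CREATE TABLE dim_browser (browser_id INTEGER PRIMARY KEY, os TEXT, browser TEXT)")
    cur.execute("INSERT INTO dim_geo (zip, state) SELECT DISTINCT zip, state FROM staging")
    cur.execute("INSERT INTO dim_browser (os, browser) SELECT DISTINCT os, browser FROM staging")
    # Fact table referencing the dimensions; SQLite's IS is a NULL-safe equality,
    # so clicks with missing parameters still find their dimension row.
    cur.execute("CREATE TABLE fact_clicks (geo_id INTEGER, browser_id INTEGER)")
    cur.execute("""
        INSERT INTO fact_clicks (geo_id, browser_id)
        SELECT g.geo_id, b.browser_id
        FROM staging s
        JOIN dim_geo g ON g.zip IS s.zip AND g.state IS s.state
        JOIN dim_browser b ON b.os IS s.os AND b.browser IS s.browser
    """)
    conn.commit()
```

For example, `url_to_bags("http://example.com/p?zip=92122&state=CA&os=Unix&browser=FireFox")` returns `{"geo": ("92122", "CA"), "browser": ("Unix", "FireFox")}`, which corresponds to {{92122,CA},{Unix,FireFox}}. This sketch rebuilds the dimension tables from scratch on each run; an incremental load would instead need to look up existing surrogate keys before inserting new dimension rows.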

I am sure other people are already doing this, so I thought I'd check
whether this approach makes sense.
