Re: HBase Schema Design for clickstream data

2012-06-27 Thread Amandeep Khurana
Mohit,

What will your read patterns be later on? Are you going to read per session, for a time period, for a set of users, or process the entire dataset every time? That will play an important role in defining your keys and columns.

-Amandeep

On Tue, Jun 26, 2012 at 1:34 PM,

2012-06-27 Thread Mohit Anchlia
The analysis includes:
- Visitor level
- Session level: visitors can have multiple sessions
- Page hits and conversions: popular pages, sequence of pages hit in one session
- Orders purchased: mostly determined by URL and query parameters

How should I go about designing the schema?

Thanks

2012-06-27 Thread Amandeep Khurana
That's not a whole lot of information to base schema recommendations on. However, at a high level, you should structure your row keys so that you minimize the need for scans and can retrieve the data you need directly by row key. So, putting the user in the row
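The idea of letting the row key do the work can be illustrated with a small sketch. The key layout here (user ID, then session start time in milliseconds, then an event sequence number, all fixed-width big-endian) is an assumption for illustration, not something proposed in the thread. `SortedTable` is a hypothetical in-memory stand-in that only mimics HBase's byte-ordered rows; it is not the HBase client API.

```python
import bisect
import struct
from typing import Dict, Iterator, List, Tuple

# Hypothetical key layout (an assumption, not from the thread):
#   user_id (8 bytes) | session_start_ms (8 bytes) | event_seq (4 bytes)
# HBase orders row keys as unsigned byte strings, so fixed-width big-endian
# integers sort numerically and all rows for one user end up contiguous.


def make_row_key(user_id: int, session_start_ms: int, event_seq: int) -> bytes:
    return struct.pack(">QQI", user_id, session_start_ms, event_seq)


def user_prefix(user_id: int) -> bytes:
    # Every row for this user starts with these 8 bytes.
    return struct.pack(">Q", user_id)


def session_prefix(user_id: int, session_start_ms: int) -> bytes:
    # Every page hit in one session starts with these 16 bytes.
    return struct.pack(">QQ", user_id, session_start_ms)


class SortedTable:
    """Hypothetical in-memory stand-in for a table whose rows are kept in byte order."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._rows: Dict[bytes, str] = {}

    def put(self, key: bytes, value: str) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep keys sorted like HBase rows
        self._rows[key] = value

    def prefix_scan(self, prefix: bytes) -> Iterator[Tuple[bytes, str]]:
        # Jump to the first key >= prefix, then read only the contiguous run
        # of keys sharing that prefix, instead of walking the whole table.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


if __name__ == "__main__":
    table = SortedTable()
    table.put(make_row_key(42, 1_340_700_000_000, 0), "/home")
    table.put(make_row_key(42, 1_340_700_000_000, 1), "/product?id=7")
    table.put(make_row_key(7, 1_340_600_000_000, 0), "/home")
    table.put(make_row_key(42, 1_340_800_000_000, 0), "/cart")

    # All of user 42's page hits, ordered by session and then by event.
    print([v for _, v in table.prefix_scan(user_prefix(42))])
    # The page sequence of a single session.
    print([v for _, v in table.prefix_scan(session_prefix(42, 1_340_700_000_000))])
```

With a user-first key, per-visitor and per-session reads become bounded range reads (in HBase, a scan with start and stop rows) rather than full-table scans. A query by time period across all users would not benefit from this layout, which is why the read patterns asked about earlier matter before settling on a key.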