>
> Read the part about monotonically increasing keys in the HBase book. There
> have been lots of other threads in the dist-list about this topic too.

Thanks for mentioning that, Doug. I did see that in the HBase book. My wording was poor. I meant that the column names would be derived from data like [timestamp, action details, session ID].

I've been trying to figure out if I could use the cell's timestamp (and have no garbage collection) so that the key name would be derived from [action details, session ID]. The downside of that approach is that I'd need to load all of the cells into memory and sort them in order to do some of the analysis I need.

I don't remember seeing an admonition against monotonically increasing column names. Is that also a bad idea?

Thanks for your help,

Leif Wickland

>
> -----Original Message-----
> From: Leif Wickland [mailto:[email protected]]
> Sent: Monday, June 13, 2011 1:29 PM
> To: [email protected]
> Subject: Re: Question from HBase book: "HBase currently does not do well
> with anything about two or three column families"
>
> >
> > If they have divergent read and write patterns why not put them in
> > separate tables?
> >
>
> That's an entirely fair question. I'm new to this. I figured if the data
> was related to the same thing and could have the same key, then it ought to
> go into various CFs on that key in a single table. I got the feeling from
> reading the BigTable paper that the typical design approach was to dump lots
> of CFs into a table. It seems like that's not the HBase-way, though.
>
> For the most part it's not a big deal to store the data in separate tables.
> However, I'm curious what you'd recommend for one particular part of it.
> Specifically I'd like to store actions within a web visit. I've been
> planning to store individual actions as columns in their own column family,
> keyed by something like [timestamp, action details, session ID]. In another
> column family I'd been planning on storing statistics about the actions,
> such as first time, end time, count, etc. When writing to the actions CF,
> I'd need to read from and possibly update the stats CF. Would your
> recommendation be to store that kind of data in the same CF, two CFs in the
> same table, or in two separate tables?
>
> My thought was that I could use row locking to avoid races to update the
> stats after inserting into actions if I took the two CF approach.
>
> Thanks for your feedback,
>
> Leif Wickland
>
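
For illustration only, here is a rough sketch of the [timestamp, action details, session ID] column-name layout discussed in this thread. It uses no HBase client at all. It only shows how such a name could be packed into bytes so that plain lexicographic byte ordering (which is how HBase orders column qualifiers) matches chronological order. The function names, the NUL separator, and the sample values are all hypothetical, and the sketch assumes timestamps are non-negative milliseconds and that the action string contains no NUL byte.

```python
import struct

SEP = b"\x00"  # hypothetical separator; assumes the action string never contains NUL


def make_qualifier(timestamp_ms: int, action: str, session_id: str) -> bytes:
    """Pack [timestamp, action details, session ID] into one byte string."""
    # A big-endian unsigned 64-bit prefix makes byte order equal numeric order.
    prefix = struct.pack(">Q", timestamp_ms)
    return prefix + action.encode("utf-8") + SEP + session_id.encode("utf-8")


def parse_qualifier(qualifier: bytes):
    """Reverse make_qualifier: return (timestamp_ms, action, session_id)."""
    (timestamp_ms,) = struct.unpack(">Q", qualifier[:8])
    action, session_id = qualifier[8:].split(SEP, 1)
    return timestamp_ms, action.decode("utf-8"), session_id.decode("utf-8")


# Made-up actions from one visit, deliberately listed out of order.
qualifiers = [
    make_qualifier(1307993400000, "click:/cart", "s-42"),
    make_qualifier(1307993340000, "view:/home", "s-42"),
    make_qualifier(1307993460000, "submit:/checkout", "s-42"),
]

# Sorting the raw bytes yields the actions in time order, with no
# client-side sort on a separately stored timestamp.
ordered = [parse_qualifier(q) for q in sorted(qualifiers)]
for timestamp_ms, action, session_id in ordered:
    print(timestamp_ms, action, session_id)
```

With the alternative layout, where the name is derived only from [action details, session ID] and the cell timestamp carries the time, the same ordering would have to be produced by reading the cells and sorting them on the client, which is the downside mentioned above.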
