Re: "monotonically increasing column names." No problem with that.
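Here is a minimal Python sketch of the kind of column name discussed in this thread: a composite qualifier built from [timestamp, action details, session ID]. HBase orders the columns of a row by comparing qualifier bytes as unsigned values. Because of that, the sketch puts the timestamp first as a fixed-width, 8-byte big-endian integer, so byte order follows time order and a read of the row returns actions chronologically. The layout (8-byte timestamp, NUL separator, UTF-8 strings) and the helper names `encode_qualifier` and `decode_qualifier` are hypothetical choices for illustration, not an HBase API. The sketch assumes non-negative millisecond timestamps and action details that contain no NUL byte.

```python
import struct


def encode_qualifier(ts_millis: int, action: str, session_id: str) -> bytes:
    """Hypothetical composite qualifier layout:
    [8-byte big-endian timestamp][action details][0x00][session ID].
    Assumes 0 <= ts_millis < 2**64 and no NUL byte in `action`."""
    return (struct.pack(">Q", ts_millis)          # fixed width keeps byte order == time order
            + action.encode("utf-8") + b"\x00"    # NUL separates the variable-length parts
            + session_id.encode("utf-8"))


def decode_qualifier(qualifier: bytes):
    """Inverse of encode_qualifier: returns (ts_millis, action, session_id)."""
    (ts_millis,) = struct.unpack(">Q", qualifier[:8])
    action, session_id = qualifier[8:].split(b"\x00", 1)
    return ts_millis, action.decode("utf-8"), session_id.decode("utf-8")


# Sorting raw bytes (unsigned, lexicographic) yields chronological order.
quals = [
    encode_qualifier(1307999999000, "click", "s1"),
    encode_qualifier(1307990000000, "view", "s2"),
    encode_qualifier(1308000000000, "view", "s1"),
]
for q in sorted(quals):
    print(decode_qualifier(q))
# prints, in order:
# (1307990000000, 'view', 's2')
# (1307999999000, 'click', 's1')
# (1308000000000, 'view', 's1')
```

The fixed-width binary timestamp is what makes this work. With decimal strings, b"999" sorts after b"1000" because the comparison stops at the first differing byte.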
-----Original Message-----
From: Leif Wickland [mailto:[email protected]]
Sent: Monday, June 13, 2011 5:29 PM
To: [email protected]
Subject: Re: Question from HBase book: "HBase currently does not do well with anything about two or three column families"

>
> Read the part about monotonically increasing keys in the HBase book.
> There have been lots of other threads in the dist-list about this topic too.

Thanks for mentioning that, Doug. I did see that in the HBase book. My wording was poor.

I meant that the column names would be derived from data like [timestamp, action details, session ID]. I've been trying to figure out if I could use the cell's timestamp (and have no garbage collection) so that the key name would be derived from [action details, session ID]. The downside of that approach is that I'd need to load all of the cells into memory and sort them in order to do some of the analysis I need.

I don't remember seeing an admonition against monotonically increasing column names. Is that also a bad idea?

Thanks for your help,

Leif Wickland

>
> -----Original Message-----
> From: Leif Wickland [mailto:[email protected]]
> Sent: Monday, June 13, 2011 1:29 PM
> To: [email protected]
> Subject: Re: Question from HBase book: "HBase currently does not do
> well with anything about two or three column families"
>
> >
> > If they have divergent read and write patterns why not put them in
> > separate tables?
> >
>
> That's an entirely fair question. I'm new to this. I figured if the
> data was related to the same thing and could have the same key, then
> it ought to go into various CFs on that key in a single table. I got
> the feeling from reading the BigTable paper that the typical design
> approach was to dump lots of CFs into a table. It seems like that's not the
> HBase-way, though.
>
> For the most part it's not a big deal to store the data in separate tables.
> However, I'm curious what you'd recommend for one particular part of it.
> Specifically I'd like to store actions within a web visit. I've been
> planning to store individual actions as columns in their own column
> family, keyed by something like [timestamp, action details, session
> ID]. In another column family I'd been planning on storing statistics
> about the actions, such as first time, end time, count, etc. When
> writing to the actions CF, I'd need to read from and possibly update
> the stats CF. Would your recommendation be to store that kind of data
> in the same CF, two CFs in the same table, or in two separate tables?
>
> My thought was that I could use row locking to avoid races to update
> the stats after inserting into actions if I took the two CF approach.
>
> Thanks for your feedback,
>
> Leif Wickland
>
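The race Leif describes is a read-modify-write. The stats (first time, end time, count) must be read, combined with the new action, and written back. Without coordination, two concurrent writers for the same visit could overwrite each other's update. The following Python sketch models this with an in-memory stand-in for one table that holds both an `actions` and a `stats` family, serializing updates per row with a lock. `InMemoryVisitTable` and its methods are hypothetical and do not call HBase. In a real cluster the equivalent protection would come from HBase's own row locks or its atomic operations such as increment or check-and-put, which this sketch does not attempt to reproduce.

```python
import threading


class InMemoryVisitTable:
    """Hypothetical stand-in for one table with two column families,
    'actions' and 'stats', keyed by a visit row key. Not HBase: it only
    models the read-modify-write that needs a per-row lock."""

    def __init__(self):
        self._rows = {}                 # row key -> {"actions": {...}, "stats": {...}}
        self._locks = {}                # row key -> threading.Lock
        self._guard = threading.Lock()  # protects creation of rows and their locks

    def _row(self, row_key):
        # Create the row and its lock exactly once, even under concurrency.
        with self._guard:
            if row_key not in self._rows:
                self._rows[row_key] = {"actions": {}, "stats": {}}
                self._locks[row_key] = threading.Lock()
            return self._rows[row_key], self._locks[row_key]

    def record_action(self, row_key, ts, action, session_id):
        row, lock = self._row(row_key)
        with lock:  # one writer per row at a time
            # Insert into the actions "family"...
            row["actions"][(ts, action, session_id)] = b""
            # ...then read, update, and write back the stats "family".
            stats = row["stats"]
            stats["count"] = stats.get("count", 0) + 1
            stats["first_time"] = min(stats.get("first_time", ts), ts)
            stats["end_time"] = max(stats.get("end_time", ts), ts)

    def stats(self, row_key):
        row, lock = self._row(row_key)
        with lock:
            return dict(row["stats"])
```

Because both families live in the same row here, a single per-row lock covers the action insert and the stats update together. With two separate tables, the writes would land in different rows, and one row lock could no longer cover both.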
