> That's an entirely fair question. I'm new to this. I figured if the data
> was related to the same thing and could have the same key, then it ought to
> go into various CFs on that key in a single table. I got the feeling from
> reading the BigTable paper that the typical design approach was to dump lots
> of CFs into a table. It seems like that's not the HBase-way, though.

My interpretation is that they try to keep that number low. From page 2:
"It is our intent that the number of distinct column families in a table be
small (in the hundreds at most)".

> For the most part it's not a big deal to store the data in separate tables.
> However, I'm curious what you'd recommend for one particular part of it.
> Specifically I'd like to store actions within a web visit. I've been
> planning to store individual actions as columns in their own column family,
> keyed by something like [timestamp, action details, session ID]. In another
> column family I'd been planning on storing statistics about the actions,
> such as first time, end time, count, etc. When writing to the actions CF,
> I'd need to read from and possibly update the stats CF. Would your
> recommendation be to store that kind of data in the same CF, two CFs in the
> same table, or in two separate tables?

Could you just store that in the same family?

> My thought was that I could use row locking to avoid races to update the
> stats after inserting into actions if I took the two CF approach.

Row locking is rarely a good idea: it doesn't scale, and row locks currently
aren't persisted anywhere except in the region server's memory, so if that
RS dies, the locks are lost with it. Using a single family might be better
for you.

J-D
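
A minimal sketch of the single-family suggestion, written as a pure in-memory Python model rather than HBase client code. Everything in it is hypothetical: `Row`, `check_and_put`, `record_action` and the `act-`/`stat-` qualifier prefixes are illustrative names, and it assumes the server applies each single-row write atomically. In place of client-side row locks it uses a compare-and-set retry loop, loosely modelled on HBase's checkAndPut: the action column and the updated stats go into the same row and family in one write, and that write only lands if `stat-count` still holds the value that was read.

```python
import threading


class Row:
    """Hypothetical in-memory stand-in for one row holding a single family.

    The private lock models the server applying each single-row write
    atomically; clients never hold it between their own read and write.
    """

    def __init__(self):
        self._cells = {}  # qualifier -> value, all in the same family
        self._server_lock = threading.Lock()

    def get(self):
        # Consistent snapshot of the whole row.
        with self._server_lock:
            return dict(self._cells)

    def check_and_put(self, qualifier, expected, updates):
        # Apply `updates` only if `qualifier` still holds `expected`
        # (None means "absent"). Returns True if the write was applied.
        with self._server_lock:
            if self._cells.get(qualifier) != expected:
                return False
            self._cells.update(updates)
            return True


def record_action(row, ts, details, session_id):
    """Store one action column and refresh the stats in a single write."""
    action_col = f"act-{ts:013d}-{details}-{session_id}"
    while True:
        snapshot = row.get()
        seen_count = snapshot.get("stat-count")
        count = seen_count or 0
        updates = {
            action_col: details,
            "stat-count": count + 1,
            "stat-first": min(snapshot.get("stat-first", ts), ts),
            "stat-last": max(snapshot.get("stat-last", ts), ts),
        }
        # If another writer bumped the count since our read, retry.
        if row.check_and_put("stat-count", seen_count, updates):
            return


if __name__ == "__main__":
    row = Row()
    writers = [
        threading.Thread(target=record_action, args=(row, 1000 + i, "click", "s1"))
        for i in range(50)
    ]
    for w in writers:
        w.start()
    for w in writers:
        w.join()
    cells = row.get()
    print(cells["stat-count"], cells["stat-first"], cells["stat-last"])
    # prints: 50 1000 1049
```

Because every successful write bumps `stat-count`, a matching count means nothing else touched the row since the read, so `stat-first` and `stat-last` stay consistent without any lock being held between the client's read and its write.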
