Hello, I wanted to use separate tables for each log type since they are considerably big, around 100 GB per month, so it just seemed natural to put them into different tables since I don't need to query them all together.
Thanks for the heads up.
Mete

On Mon, May 21, 2012 at 6:12 PM, Ian Varley <[email protected]> wrote:
> Mete,
>
> Why separate tables per log type? Why not a single table with the key:
>
> <log type><date>
>
> That's roughly the approach used by OpenTSDB (with "metric id" instead of
> "log type", but same idea). OpenTSDB goes further by "bucketing" values
> into rows using a base timestamp in the row key and offset timestamps in
> the column qualifiers, for more efficiency.
>
> If you start the key with log type, you can do partial scans for a
> specific date, but only within a single log type; to scan across all log
> types, you'd need to do multiple scans (one per log type). If you have a
> fixed and relatively small number of log types (less than 20, say), this
> could still be the best approach, but if it's a very frequent operation to
> scan by time across all log types and you have a lot of log types, you
> might want to reconsider that.
>
> The case for using a hash as the start of the key is really just to avoid
> region server "hot spotting" (where, even though you have lots of machines,
> all your insert traffic is going to one of them, because all inserts are
> happening "now" and only one region server contains the range that "now" is
> in). Salting or hashing a timestamp-based key spreads that out so the load
> is evenly distributed; but it prevents you from doing linear scans over the
> time dimension. That's why OpenTSDB (and similar models) start the key with
> another value that "spreads" the data over all servers.
>
> Ian
>
> On May 21, 2012, at 7:56 AM, mete wrote:
>
> > Hello folks,
> >
> > i am trying to come up with a nice key design for storing logs in the
> > company. I am planning to index them and store the row key in the index
> > for random reads.
> >
> > I need to balance the writes equally between the R.S. and i could not
> > understand how opentsdb does that with prefixing the metric id.
> > (i related the metric id with the log type) In my log storage case a log
> > line just has a type and a date and the rest of it is not really very
> > useful information.
> >
> > So i think that i can create a table for every distinct log type and i
> > need a random salt to route to a different R.S., similar to this:
> > <salt>-<date>
> >
> > But with this approach i believe i will lose the ability to do effective
> > partial scans to a specific date. (if for some reason i need that) What
> > do you think? And for the salt approach do you use randomly generated
> > salts or hashes that actually mean something? (like the hash of the date)
> >
> > I am using random uuids at the moment but i am trying to find a better
> > approach, any feedback is welcome
> >
> > cheers
> > Mete
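To make the trade-off in this thread concrete, here is a minimal sketch of the two key layouts being compared: Ian's single-table `<log type><base timestamp>` key with OpenTSDB-style row bucketing, and the salted `<salt>-<date>` key, whose time-range scans must fan out over every salt bucket. This is illustrative Python, not the HBase Java client API, and the constants (`ROW_INTERVAL`, `NUM_BUCKETS`) and function names are assumptions for the sketch, not anything from OpenTSDB's source.

```python
import hashlib
import struct

ROW_INTERVAL = 3600   # seconds of data per row; OpenTSDB buckets by hour
NUM_BUCKETS = 8       # assumed salt bucket count, roughly one per region server

def opentsdb_style_key(log_type_id: int, ts: int) -> bytes:
    """<log type id><base timestamp>: keys sort by log type, then by time,
    so one partial scan covers a time range within a single log type.
    The sub-hour offset goes into the column qualifier instead."""
    base_ts = ts - (ts % ROW_INTERVAL)
    return struct.pack(">HI", log_type_id, base_ts)

def qualifier(ts: int) -> bytes:
    """Column qualifier: the event's offset inside its row's time bucket."""
    return struct.pack(">H", ts % ROW_INTERVAL)

def salted_key(ts: int) -> bytes:
    """<salt><timestamp>: the salt is *derived* from the timestamp, so a
    reader can recompute it; a random UUID prefix cannot be recomputed."""
    salt = hashlib.md5(struct.pack(">I", ts)).digest()[0] % NUM_BUCKETS
    return bytes([salt]) + struct.pack(">I", ts)

def salted_scan_ranges(start_ts: int, stop_ts: int):
    """The cost of salting: a single logical time-range scan fans out
    into one (start, stop) key pair per bucket, merged client-side."""
    return [(bytes([b]) + struct.pack(">I", start_ts),
             bytes([b]) + struct.pack(">I", stop_ts))
            for b in range(NUM_BUCKETS)]
```

With the first layout, all events of one log type in the same hour share a row and keys sort linearly in time; with the second, writes spread over `NUM_BUCKETS` regions, but every time-range read needs `NUM_BUCKETS` scans.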
