I'm starting a new project, which is pretty simple: it will be something like Google Analytics, but of course a bit smaller. What is required: web servers handle requests that carry a kind of generic key/value list. The requests will come in at a fairly high rate, let's say 1000 req per second. So far, I guess there will be no problem handling that and storing it in HBase, right?
On the other hand, of course, the data must be processed and monitored. That needs to be time based, i.e. I want to get statistics about a time period, let's say from day A to day B. That should work, BUT! If I want a fast scan, I need to have the timestamp in the row key, right? Otherwise I will need to do a full scan, which can take a lot of time if there is much data. But if I have the timestamp in the key, I will end up with hot regions, as described here: http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/

So what would be a better way to have fast scans without hot regions?

Cheers,
Andre
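
PS: To make the question concrete, here is a minimal sketch of one possible key layout, usually called salting or bucketing: put a small bucket number in front of the timestamp so that writes arriving at the same moment land in different key ranges. This is only an assumption about what might work, not something tested against a cluster. NUM_BUCKETS, bucket_for, row_key and scan_ranges are made-up names, and the code only builds byte strings; it does not talk to HBase.

```python
import hashlib
import struct

# Hypothetical value: how many key ranges the writes are spread over.
NUM_BUCKETS = 16


def bucket_for(event_id: str) -> int:
    """Derive a stable bucket number from the event id (assumed to be unique per request)."""
    digest = hashlib.md5(event_id.encode("utf-8")).digest()
    return digest[0] % NUM_BUCKETS


def row_key(timestamp_ms: int, event_id: str) -> bytes:
    """Build the key as: 1-byte bucket + 8-byte big-endian timestamp + event id.

    Big-endian encoding keeps byte order equal to numeric order,
    so rows inside one bucket stay sorted by time.
    """
    prefix = struct.pack(">BQ", bucket_for(event_id), timestamp_ms)
    return prefix + event_id.encode("utf-8")


def scan_ranges(start_ms: int, end_ms: int) -> list:
    """Return one (start_row, stop_row) pair per bucket for [start_ms, end_ms).

    A time-range query then becomes NUM_BUCKETS short range scans
    instead of one full table scan.
    """
    return [
        (struct.pack(">BQ", b, start_ms), struct.pack(">BQ", b, end_ms))
        for b in range(NUM_BUCKETS)
    ]
```

The trade-off, as far as I can tell, is that a query from day A to day B has to run one range scan per bucket and merge the results on the client, in exchange for writes no longer piling up on a single region.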
