Hello all,

We are currently evaluating HBase as a possible way to store our log data in a structured way, and I want to verify a few things I was not able to find online. Specifically, this is what we are trying to achieve:
* be able to quickly search for logs within a specific time range;
* limit the number of map tasks in our MapReduce jobs to only those areas we're interested in.

As I understand it, there is a tradeoff:

* if the row key starts with the timestamp (so that regions are split on time), a single region server can become a hotspot, because all new writes land in the same region. This is bad when writing data under high load;
* if the timestamp is not the first part of the key, a MapReduce job will have to do a full table scan and will end up with a huge number of map tasks.

Is there a known solution or workaround for this problem that people have used? Since our time-range queries are usually bounded by days, we were considering creating a new table for each day, but that looks like a bit of an ugly hack.

Any ideas or suggestions about this?

Regards,
Leon Mergen
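
P.S. To make the table-per-day idea concrete, here is a minimal sketch in plain Python of how a job might work out which daily tables cover a given time range. It does not talk to HBase at all. The `logs_YYYYMMDD` naming scheme and the function names `table_for` and `tables_for_range` are hypothetical, made up purely for illustration, and timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Hypothetical naming scheme for the per-day tables (not an HBase convention).
TABLE_PREFIX = "logs_"


def table_for(ts: datetime) -> str:
    """Name of the per-day table that would hold a log entry written at ts."""
    return TABLE_PREFIX + ts.astimezone(timezone.utc).strftime("%Y%m%d")


def tables_for_range(start: datetime, end: datetime) -> List[str]:
    """Per-day tables a job would have to read to cover start..end (inclusive)."""
    if end < start:
        raise ValueError("end must not be before start")
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(TABLE_PREFIX + day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return names
```

For example, a query from 22:00 UTC on one day to 01:00 UTC two days later would touch three tables, so a job would only be pointed at those three instead of the whole data set. The cost is that every reader and writer has to know the naming scheme, which is part of why it feels like a hack.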
