On May 15, 2012, at 7:10 AM, Leon Mergen <[email protected]> wrote:

> Hello all,
>
> We are currently evaluating HBase as a way to store our log data in a
> structured form, and I want to verify a few things I could not find
> online. Specifically, we are trying to:
>
> * quickly search for logs within a specific time range;
> * limit the number of map tasks in our MapReduce jobs to only the parts of
> the data we're interested in.
>
> As I understand it, there is a tradeoff:
>
> * if the timestamp is the leading part of the row key (and therefore of the
> split keys), new writes all go to the same region, so a single region server
> becomes a hotspot. This is bad when writing data at a high load;
> * if the timestamp is not the leading part of the row key, a MapReduce job
> over a time range has to do a full table scan and ends up with a huge number
> of map tasks.
>
> Is there a known solution or workaround for this problem that people have
> used? Since our time-range queries are usually per day, we were considering
> creating a new table for each day, but that looks like a bit of an ugly
> hack.
>
> Any ideas or suggestions?
>
> Regards,
>
> Leon Mergen
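
The tradeoff described in the question can be illustrated with a small, self-contained Python sketch. It does not talk to HBase at all; it only models the fact that HBase keeps rows sorted by row key and that each region holds one contiguous key range. The key layouts (`timestamp|host` and `host|timestamp`), the host names, and the `logs_YYYYMMDD` per-day table naming are hypothetical illustrations, not taken from the original thread.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical key layouts for illustration only. HBase stores rows sorted by
# row key, and each region serves one contiguous range of keys.


def ts_first_key(host: str, ts: datetime) -> bytes:
    """Timestamp leads the key: b'YYYYmmddHHMMSS|host'.
    New events always sort after existing keys, so under load every write
    lands in the last region (the hotspot described above)."""
    return f"{ts:%Y%m%d%H%M%S}|{host}".encode()


def host_first_key(host: str, ts: datetime) -> bytes:
    """Host leads the key: b'host|YYYYmmddHHMMSS'. Writes spread across hosts."""
    return f"{host}|{ts:%Y%m%d%H%M%S}".encode()


def is_single_range(all_keys, wanted) -> bool:
    """True if `wanted` is one unbroken run of `all_keys` in sorted order,
    i.e. a single start/stop-row scan would return exactly those rows."""
    ordered = sorted(all_keys)
    idx = [i for i, k in enumerate(ordered) if k in wanted]
    return bool(idx) and idx == list(range(idx[0], idx[-1] + 1))


def daily_tables(start: date, end: date, prefix: str = "logs_") -> list:
    """Names of hypothetical per-day tables covering start..end (inclusive)."""
    names, day = [], start
    while day <= end:
        names.append(f"{prefix}{day:%Y%m%d}")
        day += timedelta(days=1)
    return names


def demo(make_key) -> bool:
    """Does a 02:00-04:00 window map to a single key range under `make_key`?"""
    base = datetime(2012, 5, 15, tzinfo=timezone.utc)
    events = [(h, base + timedelta(hours=i))
              for i in range(6) for h in ("web1", "web2", "web3")]
    lo, hi = base + timedelta(hours=2), base + timedelta(hours=4)
    keys = [make_key(h, t) for h, t in events]
    wanted = {make_key(h, t) for h, t in events if lo <= t < hi}
    return is_single_range(keys, wanted)


if __name__ == "__main__":
    print("timestamp first:", demo(ts_first_key))  # True: one contiguous range
    print("host first:", demo(host_first_key))     # False: scattered per host
    print(daily_tables(date(2012, 5, 14), date(2012, 5, 16)))
```

With the timestamp leading, the two-hour window is a single contiguous run of keys (one range scan), but for the same reason every new write sorts last and hits one region. With the host leading, writes are spread out, but the window is scattered across the key space, so a time-range job cannot be narrowed to one range. The per-day table idea sidesteps this by choosing tables from the date range instead of scanning one large table.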
