Designing Row Key

Phil Evans Wed, 07 Mar 2012 02:23:56 -0800

Dear All,

We’re currently designing a Row Key for our schema, and this has raised a
number of questions that we’ve struggled to find definitive answers to. We
think we understand what goes on, and hoped someone on the list would be
able to help clarify!


Ultimately, the data we are storing is time series data and we understand
the issues that can arise from having the reverse-order timestamp in the
leftmost part of the key. However, from what I’ve read, the solution used
by the OpenTSDB project, prefixing the reverse-order date with some sort
of salted value (the metric type), would work well for us, too.
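
For concreteness, the key layout we have in mind looks roughly like the
sketch below (Python for brevity). The 2-byte metric id, the 8-byte
timestamp width and the name make_row_key are illustrative assumptions on
our part, not OpenTSDB's actual format.

```python
import struct

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def make_row_key(metric_id: int, timestamp_ms: int) -> bytes:
    """Build a key of the form <metric id><reversed timestamp>.

    Hypothetical layout: 2-byte big-endian metric id followed by an
    8-byte big-endian reversed timestamp. Subtracting from LONG_MAX
    makes newer rows sort first within a metric.
    """
    reversed_ts = LONG_MAX - timestamp_ms
    return struct.pack(">HQ", metric_id, reversed_ts)


# Keys compare byte-wise, the way HBase orders rows.
older = make_row_key(1, 1_331_115_000_000)
newer = make_row_key(1, 1_331_115_836_000)
other_metric = make_row_key(2, 1_331_115_836_000)
```

With this layout all rows for one metric are contiguous, and within a
metric the most recent row comes first.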

   - Due to the shape of the data we are storing, it is quite likely that a
   handful of those salted values (perhaps 3 or 4 of them) will have
   significantly more rows stored against them than the others. Could this
   result in a particular node getting full? The impression I’ve got from
   *HBase - The Definitive Guide* is that regions can be moved between
   nodes. Is that correct, and does this happen automatically? Is it
   possible for one of those metric types/salted values to be stored over a
   number of different regions, to stop a particular node from being
   overloaded?
   - Secondly, from a data recovery point of view, our assumption is that,
   should a node fail, we’re covered because the data is partially replicated
   to multiple nodes (by HDFS), and therefore the regions previously served by
   the failed node can be reconstructed and made available via a different
   node. Is that a correct assumption? For development purposes we are
   currently running with three nodes. Is that sufficient? Is there a
   recommended minimum number of nodes?

Thanks for taking the time to read my email, and apologies if some of these
questions are a bit basic!

Looking forward to your response,
Cheers,

Phil.
