Dear All,

We’re currently designing a row key for our schema, and this has raised a number of questions to which we’ve struggled to find definitive answers. We think we understand what goes on, but hoped someone on the list would be able to help clarify!
Ultimately, the data we are storing is time series data, and we understand the issues that can arise from having the reverse-order timestamp in the leftmost part of the key. However, from what I’ve read, the solution used by the OpenTSDB project of prefixing the reverse-order date with some sort of salted value (the metric type) would work well for us, too.

- Due to the shape of the data we are storing, it is quite likely that a handful of those salted values (perhaps 3 or 4 of them) will have significantly more rows stored against them than the others. Could this result in a particular node getting full? From the impression I’ve got from *HBase - The Definitive Guide*, it appears that regions can be moved between nodes. Is that correct, and does this happen automatically? Is it possible for one of those metric types/salted values to be stored over a number of different regions to stop a particular node from being nailed?

- Secondly, from a data recovery point of view, our assumption is that, should a node fail, we’re covered because the data is partially replicated to multiple nodes (by HDFS), and therefore the regions previously served by the failed node can be reconstructed and made available via a different node. Is that a correct assumption? For development purposes we are currently running with three nodes. Is that sufficient? Is there a recommended minimum number of nodes?

Thanks for taking the time to read my email, and apologies if some of these questions are a bit basic!

Looking forward to your response,

Cheers,
Phil.
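
P.S. In case it helps to make the question concrete, here is a rough sketch (in Python, purely for illustration) of the key layout we have in mind: a fixed-width metric prefix followed by a reverse-order timestamp. The 3-byte metric ID width, the millisecond timestamps and the `make_row_key` name are our own assumptions for this example, not anything taken from OpenTSDB or the HBase API.

```python
import struct

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE

METRIC_ID_WIDTH = 3   # assumed prefix width in bytes (hypothetical choice)


def make_row_key(metric_id: int, timestamp_ms: int) -> bytes:
    """Build a row key of the form <metric id><reverse timestamp>.

    Keys are compared byte by byte, so rows for one metric are stored
    together, and within a metric the newest row sorts first.
    """
    if not 0 <= metric_id < 2 ** (8 * METRIC_ID_WIDTH):
        raise ValueError("metric_id does not fit in the prefix")
    if not 0 <= timestamp_ms <= LONG_MAX:
        raise ValueError("timestamp_ms out of range")

    # Fixed-width, big-endian prefix so that byte order matches numeric order.
    prefix = metric_id.to_bytes(METRIC_ID_WIDTH, "big")

    # Reverse-order timestamp: larger (newer) timestamps give smaller values.
    reverse_ts = struct.pack(">q", LONG_MAX - timestamp_ms)

    return prefix + reverse_ts


if __name__ == "__main__":
    older = make_row_key(1, 1_000)
    newer = make_row_key(1, 2_000)
    other_metric = make_row_key(2, 2_000)
    # Sorting the raw bytes mimics the lexicographic ordering of row keys.
    for key in sorted([older, newer, other_metric]):
        print(key.hex())
```

With this layout, every row for one of our 3 or 4 busy metric types falls into a single contiguous key range, which is exactly the situation the first question above is asking about.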
