Hi there-

You probably also want to see this section in the RefGuide on schema design...

http://hbase.apache.org/book.html#rowkey.design

... as well as this for region-RS assignment (and failover)...

http://hbase.apache.org/book.html#regions.arch

re: "recommended minimum number of nodes?"

The RefGuide comments on this as well...

http://hbase.apache.org/book.html#arch.overview

Good luck!

On 3/7/12 5:23 AM, "Phil Evans" <[email protected]> wrote:

>Dear All,
>
>We're currently designing a Row Key for our schema and this has raised a
>number of queries which we've struggled to find a definitive answer to but
>think we understand what goes on and hoped someone on the list would be
>able to help clarify!
>
>Ultimately, the data we are storing is time series data and we understand
>the issues that can arise from having the reverse order timestamp in the
>left most part of the key. However, from what I've read the solution used
>by the OpenTSDB project for prefixing the reverse order date with some sort
>of salted value (the metric type) would work well for us, too.
>
> - Due to the shape of the data we are storing, it is quite likely that a
>   handful of those salted values (perhaps 3 or 4 of them) will have
>   significantly more rows stored against them than the others. Could this
>   result in a particular node getting full? From the impression I've got
>   from *HBase - The Definitive Guide* it appears that it's possible for
>   regions to get moved between nodes. Is that correct and does this happen
>   automatically? Is it possible for one of those metric types/salted values
>   to be stored over a number of different regions to stop a particular node
>   from being nailed?
> - Secondly, from a data recovery point of view, our assumption is, should
>   a node fail we're covered because the data is partially replicated to
>   multiple nodes (by HDFS) and therefore the regions previously served by
>   the failed node can be reconstructed and made available via a different
>   node. Is that a correct assumption? For development purposes we are
>   currently running with three nodes. Is that sufficient? Is there a
>   recommended minimum number of nodes?
>
>Thanks for taking the time to read my email and apologies if some of these
>questions are a bit basic!
>
>Looking forward to your response,
>Cheers,
>
>Phil.
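
For anyone following the thread, here is a rough sketch of the kind of key the question describes: a metric prefix followed by a reverse-order timestamp, with an optional bucket byte so that one heavy metric is written to several key ranges instead of one. The layout, the row_key function, the source field and the buckets parameter are all made up for illustration; this is not OpenTSDB's actual key format and it uses no HBase API, it only builds the bytes.

```python
import struct
import zlib

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def row_key(metric: str, ts_millis: int, source: str = "", buckets: int = 1) -> bytes:
    """Build metric | 0x00 | bucket | reverse timestamp (hypothetical layout)."""
    if not 1 <= buckets <= 256:
        raise ValueError("buckets must fit in one byte")
    if not 0 <= ts_millis <= LONG_MAX:
        raise ValueError("timestamp out of range")
    # Deterministic bucket from a secondary field (assumed here to be the
    # data source), so rows of one metric spread over `buckets` key ranges.
    bucket = zlib.crc32(source.encode("utf-8")) % buckets
    # Subtracting from LONG_MAX makes newer rows sort before older ones.
    reverse_ts = LONG_MAX - ts_millis
    # Big-endian packing keeps byte order consistent with numeric order.
    return metric.encode("utf-8") + b"\x00" + struct.pack(">BQ", bucket, reverse_ts)


if __name__ == "__main__":
    older = row_key("cpu", 1_000)
    newer = row_key("cpu", 2_000)
    print(newer < older)  # newer rows sort first within a metric and bucket
```

The trade-off with the bucket byte is on the read side: fetching a time range for one metric then takes one scan per bucket, so the bucket count is worth keeping small.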
