I should clarify that I've been pre-splitting tables at each shard so that each tablet consists of a single row.
On Oct 30, 2012, at 3:06 PM, Krishmin Rai wrote: > Hi All, > We're working with an index table whose row is a shardId (an integer, like > the wiki-search or IndexedDoc examples). I was just wondering what the right > strategy is for choosing a number of partitions, particularly given a cluster > that could potentially grow. > > If I simply set the number of shards equal to the number of slave nodes, > additional nodes would not improve query performance (at least over the data > already ingested). But starting with more partitions than slave nodes would > result in multiple tablets per tablet server… I'm not really sure how that > would impact performance, particularly given that all queries against the > table will be batchscanners with an infinite range. > > Just wondering how others have addressed this problem, and if there are any > performance rules of thumb regarding the ratio of tablets to tablet servers. > > Thanks! > Krishmin