100 shards on a node will almost certainly be slow, but at least it would be scalable. 7TB of data on one node is going to be slow regardless of how you shard it.
I might choose a number with more useful divisors than 100, perhaps 96 or 144.

wunder

On Feb 28, 2013, at 4:25 PM, Mark Miller wrote:

> You will pay some in performance, but it's certainly not bad practice. It's a
> good choice for setting up so that you can scale later. You just have to do
> some testing to make sure it fits your requirements. The Collections API even
> has built-in support for this - you can specify more shards than nodes and it
> will overload a node. See the documentation. Later you can start up a new
> replica on another machine and kill/remove the original.
>
> - Mark
>
> On Feb 28, 2013, at 7:10 PM, Chris Simpson <chrissimpson1...@outlook.com>
> wrote:
>
>> Dear Lucene / Solr Community-
>>
>> I recently posted this question on Stackoverflow, but it doesn't seem to be
>> going too far. Then I found this mailing list and was hoping perhaps to have
>> more luck:
>>
>> Question-
>>
>> If I plan on holding 7TB of data in a Solr Cloud, is it bad practice to
>> begin with 1 server holding 100 shards and then begin populating the
>> collection where once the size grew, each shard ultimately will be peeled
>> off into its own dedicated server (holding ~70GB each with its own dedicated
>> resources and replicas)?
>>
>> That is, I would start the collection with 100 shards locally, then as data
>> grew, I could peel off one shard at a time and give it its own server --
>> dedicated w/plenty of resources.
>>
>> Is this okay to do -- or would I somehow incur a massive bottleneck
>> internally by putting that many shards in 1 server to start with while data
>> was low?
>>
>> Thank you.
>> Chris
>>
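
To make the divisor point concrete: a shard count spreads evenly over N nodes only when N divides it, so a count with more divisors leaves more balanced layouts to grow into. A minimal Python sketch follows; the helper name `divisors` is made up for illustration and has nothing to do with Solr itself.

```python
def divisors(n):
    """Return every node count that divides n shards evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]


# Compare the shard counts mentioned in this thread.
for shards in (100, 96, 144):
    nodes = divisors(shards)
    print(f"{shards} shards: {len(nodes)} even layouts, node counts {nodes}")
```

By this measure 100 shards balance over 9 different node counts, 96 over 12, and 144 over 15. The extra options for 96 and 144 include 3, 6, 8, and 12 nodes, none of which divide 100.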