100 shards on a node will almost certainly be slow, but at least it will be 
scalable. 7TB of data on one node is going to be slow regardless of how you 
shard it.

I might choose a number with more useful divisors than 100, perhaps 96 or 144.
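
To make the divisor point concrete, here is a rough sketch that lists the divisors of each candidate shard count. My reading is that each divisor is a node count across which the shards split evenly; the helper name `divisors` is made up for illustration and is not anything from Solr.

```python
def divisors(n):
    """Return every positive integer that divides n evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]


# Candidate shard counts mentioned in this thread.
for shards in (100, 96, 144):
    ds = divisors(shards)
    # Each divisor is a node count that gives every node the same number of shards.
    print(f"{shards} shards: {len(ds)} even layouts -> {ds}")
```

Taking 7TB as 7000GB, that works out to roughly 73GB per shard at 96 shards and roughly 49GB at 144, versus the ~70GB in the original plan.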

wunder

On Feb 28, 2013, at 4:25 PM, Mark Miller wrote:

> You will pay some in performance, but it's certainly not bad practice. It's a 
> good choice for setting up so that you can scale later. You just have to do 
> some testing to make sure it fits your requirements. The Collections API even 
> has built-in support for this - you can specify more shards than nodes and it 
> will overload a node. See the documentation. Later you can start up a new 
> replica on another machine and kill/remove the original.
> 
> - Mark
> 
> On Feb 28, 2013, at 7:10 PM, Chris Simpson <chrissimpson1...@outlook.com> 
> wrote:
> 
>> Dear Lucene / Solr Community-
>> 
>> I recently posted this question on Stack Overflow, but it doesn't seem to be 
>> going too far. Then I found this mailing list and was hoping perhaps to have 
>> more luck:
>> 
>> Question-
>> 
>> If I plan on holding 7TB of data in a Solr Cloud, is it bad practice to 
>> begin with 1 server holding 100 shards and then begin populating the 
>> collection where once the size grew, each shard ultimately will be peeled 
>> off into its own dedicated server (holding ~70GB ea with its own dedicated 
>> resources and replicas)?
>> 
>> That is, I would start the collection with 100 shards locally, then as data 
>> grew, I could peel off one shard at a time and give it its own server -- 
>> dedicated w/plenty of resources.
>> 
>> Is this okay to do -- or would I somehow incur a massive bottleneck 
>> internally by putting that many shards in 1 server to start with while data 
>> was low?
>> 
>> Thank you.
>> Chris
>> 
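
For reference, a minimal sketch of what a Collections API CREATE request with more shards than nodes might look like. It only builds the URL and does not send it. The host, port, and collection name are placeholders, and the parameter names (`numShards`, `replicationFactor`, `maxShardsPerNode`) are as I recall them from the Solr documentation, so check them against the docs for your version.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, max_shards_per_node,
                          replication_factor=1):
    """Build (but do not send) a Collections API CREATE request URL.

    Parameter names are an assumption based on the Solr Collections API
    docs; verify them for the Solr version you run.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Allows one node to host many shards of the same collection.
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical single-node setup: 96 shards, all on one machine to start.
url = create_collection_url("http://localhost:8983/solr", "bigdata", 96, 96)
print(url)
```

Issuing a GET against the printed URL would be the actual request; moving a shard later is the separate step Mark describes (start a replica on another machine, then remove the original).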



