Thanks Erik,

I forwarded your thoughts to management and put in a good word for Lucid 
Imagination.

Regards,
Kallin Nagelberg

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Thursday, April 08, 2010 2:18 PM
To: solr-user@lucene.apache.org
Subject: Re: index corruption / deployment strategy

Kallin,

Corrupting the index is a very rare occurrence these days, and 
practically impossible I'm told, thanks to Lucene's improvements over 
the last several releases (hardware malfunctions aside).

A single index is the best way to go, in my opinion - though at your  
scale you're probably looking at sharding it and using distributed  
search.  So you'll have multiple physical indexes, one for each shard,  
and a single virtual index in the eyes of your searching clients.
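
To illustrate (host names and core paths below are just placeholders, 
not a recommendation for your actual layout), a distributed query simply 
lists the shards to fan out to:

  http://host1:8983/solr/select?q=foo&shards=host1:8983/solr,host2:8983/solr

Each shard is an ordinary Solr core; the node that receives the request 
merges the per-shard results before returning them to the client.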

Backups, of course, are sensible, and Solr's replication handler can 
help here: you can request a backup from it periodically. You'll be 
using replication anyway to scale to your query volume.
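
For example (assuming the replication handler is enabled at its usual 
/replication path on the master), a cron job could periodically trigger 
a snapshot of the index with:

  http://master:8983/solr/replication?command=backup

and that same handler is what the slaves poll to pull index updates for 
query scaling.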

As for hardware scaling, there are variables to consider, such as how 
faceting, sorting, and query speed compare on a single large index 
versus sharded ones.  I'm guessing you'll be best off with at least two 
shards, though possibly more depending on these variables.

        Erik
         @ Lucid Imagination

p.s. have your higher-ups give us a call if they'd like to discuss 
their concerns and consider commercial support for your mission-critical, 
large-scale use of Solr :)



On Apr 8, 2010, at 1:33 PM, Nagelberg, Kallin wrote:
> I've been evaluating Solr for use on a high-traffic website for some  
> time, and things are looking positive. I have some concerns from my  
> higher-ups that I need to address. I have suggested that we use a  
> single index in order to keep things simple, but there are  
> suggestions to split our documents amongst different indexes.
>
> The primary motivation for this split is a worry about potential  
> index corruption, i.e., if we only have one index and it becomes  
> corrupt, what do we do? I never considered this to be an issue since  
> we would have backups etc., but I think they have had issues with  
> other search technology in the past, where one big index resulted in  
> frequent and difficult-to-recover-from corruption. Do you think this  
> is a concern with Solr? If so, what would you suggest to mitigate  
> the risk?
>
> My second question involves general deployment strategy. We expect  
> about 50 million documents, each on average a few paragraphs, and  
> our website receives maybe 10 million hits a day. Can anyone provide  
> an idea of the number of servers, clustering/replication setup, etc.  
> that might be appropriate for this scenario? I'm interested to hear  
> what others' experience is with similar situations.
>
> Thanks,
> -Kallin Nagelberg
>
