Hi Rahul,

This is unfortunately not enough information for anyone to give you very 
precise answers, so I'll just give some rough ones:

* best disk - SSD :)
* CPU - multicore, depends on query complexity, concurrency, etc.
* sharded search and failover - start with SolrCloud; there are a couple of 
pages about it on the Wiki, and see 
http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/
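For a sense of what pre-SolrCloud distributed search involves: in Solr 3.x you fan a query out yourself with the `shards` parameter on the `/select` handler. A minimal sketch of building such a URL (host names and core names below are hypothetical placeholders, not your actual topology):

```python
# Sketch: hand-building a Solr 3.x distributed-search URL.
# "host1", "host2", "production", "archive" are placeholder names.
from urllib.parse import urlencode

def sharded_query_url(host, core, q, shards):
    """Return a /select URL that fans the query out across `shards`."""
    params = urlencode({"q": q, "shards": ",".join(shards)})
    return "http://{}:8983/solr/{}/select?{}".format(host, core, params)

url = sharded_query_url(
    "host1", "production", "invoice",
    ["host1:8983/solr/production", "host2:8983/solr/archive"],
)
print(url)
```

Note that with the plain `shards` parameter there is no built-in failover - if one shard is down the whole query fails - which is why people put replicas behind a load balancer per shard, and why SolrCloud (which handles routing and failover via ZooKeeper) is the direction to look at.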

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>________________________________
>From: Rahul Warawdekar <rahul.warawde...@gmail.com>
>To: solr-user <solr-user@lucene.apache.org>
>Sent: Tuesday, October 11, 2011 11:47 AM
>Subject: Architecture and Capacity planning for large Solr index
>
>Hi All,
>
>I am working on a Solr search based project, and would highly appreciate
>help/suggestions from you all regarding Solr architecture and capacity
>planning.
>Details of the project are as follows
>
>1. There are 2 databases from which, data needs to be indexed and made
>searchable,
>                - Production
>                - Archive
>2. Production database will retain 6 months of data and archive data every
>month.
>3. Archive database will retain 3 years of data.
>4. Database is SQL Server 2008 and Solr version is 3.1
>
>Data to be indexed contains a huge volume of attachments (PDF, Word, Excel,
>etc.), approximately 200 GB per month.
>We are planning to do a full index every month (multithreaded) and
>incremental indexing on a daily basis.
>The Solr index size is coming to approximately 25 GB per month.
>
>If we were to use distributed search, what would be the best configuration
>for Production as well as Archive indexes ?
>What would be the best CPU/RAM/Disk configuration ?
>How can I implement failover mechanism for sharded searches ?
>
>Please let me know in case I need to share more information.
>
>
>-- 
>Thanks and Regards
>Rahul A. Warawdekar
>
>
>
