Hi Rahul,

This is unfortunately not enough information for anyone to give you very precise answers, so I'll just give some rough ones:
* best disk - SSD :)
* CPU - multicore; depends on query complexity, concurrency, etc.
* sharded search and failover - start with SolrCloud; there are a couple of pages about it on the Wiki and http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

>________________________________
>From: Rahul Warawdekar <rahul.warawde...@gmail.com>
>To: solr-user <solr-user@lucene.apache.org>
>Sent: Tuesday, October 11, 2011 11:47 AM
>Subject: Architecture and Capacity planning for large Solr index
>
>Hi All,
>
>I am working on a Solr search based project and would highly appreciate
>help/suggestions from you all regarding Solr architecture and capacity
>planning.
>Details of the project are as follows:
>
>1. There are 2 databases from which data needs to be indexed and made
>searchable:
>   - Production
>   - Archive
>2. The Production database will retain 6 months of data and archive data
>every month.
>3. The Archive database will retain 3 years of data.
>4. The database is SQL Server 2008 and the Solr version is 3.1.
>
>The data to be indexed contains a huge volume of attachments (PDF, Word,
>Excel, etc.), approximately 200 GB per month.
>We are planning to do a full index every month (multithreaded) and
>incremental indexing on a daily basis.
>The Solr index size comes to approximately 25 GB per month.
>
>If we were to use distributed search, what would be the best configuration
>for the Production and Archive indexes?
>What would be the best CPU/RAM/disk configuration?
>How can I implement a failover mechanism for sharded searches?
>
>Please let me know in case I need to share more information.
>
>--
>Thanks and Regards
>Rahul A. Warawdekar
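P.S. To put rough numbers on the index sizes mentioned in the thread, and to illustrate one pre-SolrCloud failover option, here is a back-of-the-envelope sketch. The 25 GB/month, 6-month, and 3-year figures come from the question above; the host names, ports, and two-shard layout are made-up placeholders, and the pipe-delimited `shards` syntax is Solr's legacy distributed-search mechanism for listing alternate hosts per shard (SolrCloud handles this automatically and is the recommended route).

```python
# Rough capacity math from the figures in the thread.
INDEX_GB_PER_MONTH = 25          # reported Solr index growth per month
PRODUCTION_MONTHS = 6            # Production DB retains 6 months of data
ARCHIVE_MONTHS = 3 * 12          # Archive DB retains 3 years of data

production_index_gb = INDEX_GB_PER_MONTH * PRODUCTION_MONTHS   # ~150 GB
archive_index_gb = INDEX_GB_PER_MONTH * ARCHIVE_MONTHS         # ~900 GB


def shards_param(shard_replicas):
    """Build a legacy distributed-search 'shards' parameter value.

    Solr's distributed search takes comma-separated shards; each shard
    may list pipe-separated alternate hosts, and the coordinating node
    can fail over between alternates if one is unreachable.
    """
    return ",".join("|".join(replicas) for replicas in shard_replicas)


# Hypothetical 2-shard archive index, two replicas per shard.
param = shards_param([
    ["solr1:8983/solr/archive", "solr2:8983/solr/archive"],
    ["solr3:8983/solr/archive", "solr4:8983/solr/archive"],
])

print(production_index_gb)   # 150
print(archive_index_gb)      # 900
print(param)
```

So you would be sizing for roughly a 150 GB production index and a 900 GB archive index at steady state, which is the main argument for sharding the archive side and for SSDs on both.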