Thanks, Otis!
Please ignore my earlier email, which did not have all the information.

My business requirements have changed a bit.
We now need one year of rolling data in Production, with the following
details (rough sizing arithmetic follows the list):
    - Number of records -> 1.2 million
    - Solr index size for these records comes to approximately 200 - 220 GB
      (includes large attachments)
    - Approx 250 users searching the application, with a peak of 1 search
      request every 40 seconds
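
Just as a sanity check, here is the back-of-the-envelope arithmetic behind
those numbers as a rough sketch only. I am assuming the index splits roughly
evenly across the 3 shards described below, and that the "1 search request
every 40 seconds" peak is per user; the class name is just for illustration.

public class CapacitySketch {
    public static void main(String[] args) {
        double indexSizeGb = 220.0;          // upper end of the 200 - 220 GB estimate
        int numShards = 3;
        int users = 250;
        double secondsBetweenSearches = 40.0; // assumed to be per user

        double gbPerShard = indexSizeGb / numShards;     // ~73 GB per shard
        double peakQps = users / secondsBetweenSearches; // ~6.3 searches/sec

        System.out.printf("Index size per shard : ~%.0f GB%n", gbPerShard);
        System.out.printf("Peak query rate      : ~%.1f searches/sec%n", peakQps);
    }
}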

I am planning to address this using Solr distributed search on a VMware
virtualized environment, as follows.

1. The whole index will be split across 3 shards, with 3 masters and 6
slaves (load balanced); a sample distributed query is sketched after this
list.

2. Master configuration for each server is as follows
    - 4 CPUs
    - 16 GB RAM
    - 300 GB disk space

3. Slave configuration for each server is as follows
    - 4 CPUs
    - 16 GB RAM
    - 150 GB disk space

4. I am planning to use a SAN instead of local storage for the Solr index.
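
To make the shard layout concrete, here is a minimal SolrJ (3.x) sketch of
how I expect the distributed query to be sent. The host names (shard1-lb,
etc.) are placeholders for the load-balanced endpoints in front of each
shard's two slaves, and the field name and query are made up.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedSearchSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder host names: each entry is the load-balanced endpoint
        // in front of the two slaves serving that shard.
        String shards = "shard1-lb:8983/solr,shard2-lb:8983/solr,shard3-lb:8983/solr";

        // Any of the slave endpoints can act as the coordinating node.
        SolrServer solr = new CommonsHttpSolrServer("http://shard1-lb:8983/solr");

        SolrQuery query = new SolrQuery("attachment_text:invoice"); // placeholder query
        query.set("shards", shards);   // fan the request out to all 3 shards
        query.setRows(10);

        QueryResponse rsp = solr.query(query);
        System.out.println("Total hits across shards: " + rsp.getResults().getNumFound());
    }
}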

My questions are as follows:
Will 3 shards serve the purpose here?
Is a SAN a good option for storing the Solr index, given the high index
volume?




On Mon, Nov 21, 2011 at 3:05 PM, Rahul Warawdekar <
rahul.warawde...@gmail.com> wrote:

> Thanks !
>
> My business requirements have changed a bit.
> We need one year rolling data in Production.
> The index size for the same comes to approximately 200 - 220 GB.
> I am planning to address this using Solr distributed search as follows.
>
> 1. Whole index to be split up between 3 shards, with 3 masters and 6
> slaves (load balanced)
> 2. Master configuration
>  will be 4 CPU
>
>
>
> On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
>
>> Hi Rahul,
>>
>> This is unfortunately not enough information for anyone to give you very
>> precise answers, so I'll just give some rough ones:
>>
>> * best disk - SSD :)
>> * CPU - multicore, depends on query complexity, concurrency, etc.
>> * sharded search and failover - start with SolrCloud, there are a couple
>> of pages about it on the Wiki and
>> http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/
>>
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>> >________________________________
>> >From: Rahul Warawdekar <rahul.warawde...@gmail.com>
>> >To: solr-user <solr-user@lucene.apache.org>
>> >Sent: Tuesday, October 11, 2011 11:47 AM
>> >Subject: Architecture and Capacity planning for large Solr index
>> >
>> >Hi All,
>> >
>> >I am working on a Solr search based project, and would highly appreciate
>> >help/suggestions from you all regarding Solr architecture and capacity
>> >planning.
>> >Details of the project are as follows
>> >
>> >1. There are 2 databases from which, data needs to be indexed and made
>> >searchable,
>> >                - Production
>> >                - Archive
>> >2. Production database will retain 6 months old data and archive data
>> every
>> >month.
>> >3. Archive database will retain 3 years old data.
>> >4. Database is SQL Server 2008 and Solr version is 3.1
>> >
>> >Data to be indexed contains a huge volume of attachments (PDF, Word,
>> excel
>> >etc..), approximately 200 GB per month.
>> >We are planning to do a full index every month (multithreaded) and
>> >incremental indexing on a daily basis.
>> >The Solr index size is coming to approximately 25 GB per month.
>> >
>> >If we were to use distributed search, what would be the best
>> configuration
>> >for Production as well as Archive indexes ?
>> >What would be the best CPU/RAM/Disk configuration ?
>> >How can I implement failover mechanism for sharded searches ?
>> >
>> >Please let me know in case I need to share more information.
>> >
>> >
>> >--
>> >Thanks and Regards
>> >Rahul A. Warawdekar
>> >
>> >
>> >
>>
>
>
>
> --
> Thanks and Regards
> Rahul A. Warawdekar
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar
