Re: Architecture and Capacity planning for large Solr index

2011-11-23 Thread Erick Erickson
Whether three shards will give you adequate throughput is not an
answerable question. Here's what I suggest. Get a single box
of the size you expect your servers to be and index 1/3 of your
documents on it. Run stress tests. That's really the only way to
be fairly sure your hardware is adequate.
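A minimal driver for such a stress test might look like the sketch below (Python; the Solr URL, queries, and worker count are placeholders you would replace with your own — the point is to measure latency percentiles under concurrency that matches your expected peak):

```python
import concurrent.futures
import time
import urllib.parse
import urllib.request

def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already-sorted list of latencies."""
    if not sorted_vals:
        raise ValueError("no samples")
    k = int(round(p / 100.0 * len(sorted_vals))) - 1
    return sorted_vals[max(0, min(len(sorted_vals) - 1, k))]

def query_once(base_url, q):
    """Issue one Solr query; return its latency in seconds."""
    params = urllib.parse.urlencode({"q": q, "wt": "json"})
    start = time.time()
    urllib.request.urlopen("%s/select?%s" % (base_url, params)).read()
    return time.time() - start

def run_load(base_url, queries, workers=10):
    """Fire the queries from a thread pool; return sorted latencies."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(lambda q: query_once(base_url, q), queries))

# Usage against a real test box (illustrative host and queries):
#   lat = run_load("http://testbox:8983/solr", ["foo", "bar baz"] * 500)
#   print("p50=%.3fs p95=%.3fs" % (percentile(lat, 50), percentile(lat, 95)))
```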

As far as SANs are concerned, local storage is almost always
better. I'd advise against trying to share the index amongst
slaves, SAN or not. And using the SAN for each slave's copy
seems unnecessary with storage as cheap as it is. What
advantage do you see in this scenario?

Best
Erick

On Mon, Nov 21, 2011 at 3:18 PM, Rahul Warawdekar
rahul.warawde...@gmail.com wrote:
 Thanks Otis !
 Please ignore my earlier email, which did not have all the information.

 My business requirements have changed a bit.
 We now need one year rolling data in Production, with the following details
    - Number of records - 1.2 million
    - Solr index size for these records comes to approximately 200 - 220
 GB. (includes large attachments)
    - Approx 250 users who will be searching the application, with a peak of
 1 search request every 40 seconds.
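For rough sizing, those numbers imply a fairly modest query rate; a back-of-the-envelope check (assuming the peak means one request per user every 40 seconds):

```python
users = 250
seconds_between_requests = 40.0   # peak: one search per user every 40 s

peak_qps = users / seconds_between_requests
print("peak cluster QPS: %.2f" % peak_qps)               # 6.25

index_gb = 220.0                  # upper estimate of total index size
shards = 3
print("index per shard: %.1f GB" % (index_gb / shards))  # ~73.3
```

At roughly 6 QPS, index size rather than query concurrency looks like the dominant constraint here — though, as noted elsewhere in the thread, only a stress test will confirm that.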

 I am planning to address this using Solr distributed search on a VMWare
 virtualized environment as follows.

 1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves
 (load balanced)

 2. Master configuration for each server is as follows
    - 4 CPUs
    - 16 GB RAM
    - 300 GB disk space

 3. Slave configuration for each server is as follows
    - 4 CPUs
    - 16 GB RAM
    - 150 GB disk space

 4. I am planning to use a SAN instead of local storage to store the Solr index.

 And my questions are as follows:
 Will 3 shards serve the purpose here?
 Is SAN a good option for storing the Solr index, given the high index volume?
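For reference, in Solr 3.x (pre-SolrCloud) a distributed search across the three shards is requested explicitly via the `shards` parameter; a sketch of such a request, with made-up host names:

```python
import urllib.parse

# Hypothetical slave endpoints, one per shard (each behind the load balancer).
shard_urls = [
    "solr-slave1:8983/solr",
    "solr-slave2:8983/solr",
    "solr-slave3:8983/solr",
]

params = urllib.parse.urlencode({
    "q": "contract attachment",
    "rows": 10,
    "shards": ",".join(shard_urls),  # Solr fans the query out and merges results
})
print("http://solr-slave1:8983/solr/select?" + params)
```

Note that 3.x distributed search has no built-in failover if a shard endpoint dies; the load balancer in front of each shard's slaves is what provides it, which matches the 6-slave layout proposed above.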




 On Mon, Nov 21, 2011 at 3:05 PM, Rahul Warawdekar 
 rahul.warawde...@gmail.com wrote:

 Thanks !

 My business requirements have changed a bit.
 We need one year rolling data in Production.
 The index size for the same comes to approximately 200 - 220 GB.
 I am planning to address this using Solr distributed search as follows.

 1. Whole index to be split up between 3 shards, with 3 masters and 6
 slaves (load balanced)
 2. Master configuration
  will be 4 CPU



 On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:

 Hi Rahul,

 This is unfortunately not enough information for anyone to give you very
 precise answers, so I'll just give some rough ones:

 * best disk - SSD :)
 * CPU - multicore, depends on query complexity, concurrency, etc.
 * sharded search and failover - start with SolrCloud; there are a couple
 of pages about it on the Wiki, and see
 http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/


 
 From: Rahul Warawdekar rahul.warawde...@gmail.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Tuesday, October 11, 2011 11:47 AM
 Subject: Architecture and Capacity planning for large Solr index
 
 Hi All,
 
 I am working on a Solr search based project, and would highly appreciate
 help/suggestions from you all regarding Solr architecture and capacity
 planning.
 Details of the project are as follows
 
 1. There are 2 databases from which data needs to be indexed and made
 searchable:
                 - Production
                 - Archive
 2. The Production database will retain 6 months of data and archive older
 data every month.
 3. The Archive database will retain 3 years of data.
 4. The database is SQL Server 2008 and the Solr version is 3.1.
 
 Data to be indexed contains a huge volume of attachments (PDF, Word,
 Excel, etc.), approximately 200 GB per month.
 We are planning to do a full index every month (multithreaded) and
 incremental indexing on a daily basis.
 The Solr index size is coming to approximately 25 GB per month.
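If the SQL Server data comes in through DataImportHandler (a common setup, though not stated here), the daily incremental pass is just a delta-import triggered over HTTP; the host and handler path below are illustrative:

```python
import urllib.parse

# Request a nightly cron job might send to each master (illustrative host).
base = "http://solr-master1:8983/solr/dataimport"
params = urllib.parse.urlencode({
    "command": "delta-import",  # incremental: only rows changed since last run
    "clean": "false",           # keep existing documents
    "commit": "true",           # commit when the import finishes
})
print(base + "?" + params)
```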
 
 If we were to use distributed search, what would be the best
 configuration
 for Production as well as Archive indexes?
 What would be the best CPU/RAM/Disk configuration?
 How can I implement a failover mechanism for sharded searches?
 
 Please let me know in case I need to share more information.
 
 
 --
 Thanks and Regards
 Rahul A. Warawdekar
 
 
 








