Hi, folks,

 

We're planning to use SOLR for our project and have some questions. This
is all very new to us, so any help is really appreciated.

 

1) We're storing two types of documents; both have pretty much the same
fields (with a few extra fields for one type). The important thing is
that the search queries will have to return the Docs of both types. So
we're thinking of having one index for both types with the "type" field,
just like in the example here: 

http://wiki.apache.org/solr/MultipleIndexes#head-9e6bee989c8120974eee9df0944b58a28d489ba2

This field will also be used as a facet if a user wants to limit the
search results by type.
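
To make the idea concrete, here is a minimal sketch of the kind of request
we have in mind. It assumes the standard q, fq, facet and facet.field
request parameters; the host, the handler path and the "article" type
value are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical host and handler path; adjust to the actual deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(user_query, doc_type=None):
    """Build a select URL that searches both types and facets on "type"."""
    params = [
        ("q", user_query),
        ("facet", "true"),
        ("facet.field", "type"),  # facet counts per document type
    ]
    if doc_type is not None:
        # A filter query restricts the results to one type.
        params.append(("fq", "type:" + doc_type))
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Search across both types, then the same search limited to one type.
print(build_search_url("solr replication"))
print(build_search_url("solr replication", doc_type="article"))
```

Without doc_type the query returns docs of both types along with the
per-type facet counts; with it, the user's choice becomes a filter query.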

 

Do you think it's a sound idea?

 

2) The hardware configuration.

 

First, some numbers we're expecting.

 

* The average size of a doc: ~100K

 

* The number of indexes: 1

 

* The query response time we're looking for: < 200 - 300ms

 

* The number of stored docs:

1st year: 500K - 1M

2nd year: 2-3M

 

* The estimated number of concurrent users per second:

1st year: 15 - 25

2nd year: 40 - 60

 

* The estimated number of queries

1st year: 15 - 25

2nd year: 40 - 60
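
For reference, a back-of-the-envelope sketch of the raw document volume
these numbers imply. It assumes "100K" means roughly 100 KiB per doc; the
on-disk index may be considerably smaller or larger than this, depending
on which fields are stored and how they are analyzed.

```python
# Assumption: "~100K" per doc is read as 100 KiB.
AVG_DOC_BYTES = 100 * 1024


def raw_corpus_gib(num_docs, avg_doc_bytes=AVG_DOC_BYTES):
    """Raw size of the source documents in GiB (not the index size)."""
    return num_docs * avg_doc_bytes / 1024 ** 3


estimates = [
    ("1st year, low", 500_000),
    ("1st year, high", 1_000_000),
    ("2nd year, low", 2_000_000),
    ("2nd year, high", 3_000_000),
]
for label, docs in estimates:
    print(f"{label}: ~{raw_corpus_gib(docs):.0f} GiB of raw documents")
```

Under that assumption the raw documents come to roughly 48 - 95 GiB in the
first year and 191 - 286 GiB in the second.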

 

Now the questions

 

* Should we do sharding or not? 

If we start without sharding, how hard will it be to enable it?

Is it just some config changes + the index rebuild or is it more?

 

My personal opinion is to go without sharding at first and enable it
later if we do get a lot of documents.
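
To make the question concrete, here is a rough sketch of what I understand
sharding to involve on the client side. It assumes distributed search is
driven by the shards request parameter and that the indexing client
decides which shard each doc goes to. The shard addresses and the
hash-by-id routing are only illustrative assumptions, not a recommended
setup.

```python
import zlib
from urllib.parse import urlencode

# Hypothetical shard addresses (host:port/path, no scheme).
SHARDS = ["solr1:8983/solr", "solr2:8983/solr"]


def shard_for(doc_id, shards=SHARDS):
    """Pick the shard that should index a doc, using a stable hash of its id."""
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def distributed_query_url(user_query, shards=SHARDS):
    """Query the first shard and ask it to fan out over all shards."""
    params = urlencode({"q": user_query, "shards": ",".join(shards)})
    return "http://" + shards[0] + "/select?" + params


print(shard_for("doc-42"))
print(distributed_query_url("solr replication"))
```

If that picture is right, moving from one index to N shards means
re-partitioning the docs, which is why I'm asking whether an index rebuild
is all it takes.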

 

* How should we organize our clusters?

Should we have 2 or more identical Masters (meaning that all the
updates/optimisations/etc. are done on every one of them)?

An alternative, afaik, is to reconfigure one slave to become the new
Master. How hard is that?

 

* Basically, we can get servers of two kinds:

 

* Single Processor, Dual Core Opteron 2214HE

* 2 GB DDR2 SDRAM

* 1 x 250 GB (7200 RPM) SATA Drive(s)

 

* Dual Processor, Quad Core 5335

* 16 GB Memory (Fully Buffered)

* 2 x 73 GB (10k RPM) 2.5" SAS Drive(s), RAID 1

 

The second, more powerful one is more expensive, of course.

 

* How can we take advantage of the multiprocessor/multicore servers? 

Is there some special setup required to make, say, 2 instances of SOLR
run on the same server using different processors/cores?

 

* Does it make much difference to get a more powerful Master? 

Or, on the contrary, as slaves will be queried more often, they should
be the better ones? Maybe just the HDDs for the slaves should be as fast
as possible?

 

* How many slaves does it make sense to have per Master?

 

What's (roughly) the performance gain going from 1 to 2, 2 to 3, etc.?

 

When does it stop making sense to add more slaves? 

As far as I understand, it depends mainly on the size of the index.
However, I'd guess the time required to push the index to a large number
of slaves can become a problem too, correct?
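
A rough sketch of the arithmetic behind this question. It assumes the
query estimates above are per second and uses Little's law (queries in
flight = arrival rate x latency). The per-slave throughput is a purely
hypothetical placeholder that would have to be measured on real hardware.

```python
import math


def in_flight_queries(qps, latency_s):
    """Little's law: average number of queries being served at once."""
    return qps * latency_s


def slaves_needed(peak_qps, per_slave_qps):
    """Slaves required to carry peak_qps if each one sustains per_slave_qps."""
    return math.ceil(peak_qps / per_slave_qps)


# Hypothetical figure, not a measurement: queries/sec one slave can sustain.
ASSUMED_PER_SLAVE_QPS = 20
TARGET_LATENCY_S = 0.3  # upper end of the 200 - 300ms target

for year, peak_qps in [("1st year", 25), ("2nd year", 60)]:
    in_flight = in_flight_queries(peak_qps, TARGET_LATENCY_S)
    slaves = slaves_needed(peak_qps, ASSUMED_PER_SLAVE_QPS)
    print(f"{year}: ~{in_flight:.1f} queries in flight, {slaves} slave(s)")
```

This ignores redundancy and the cost of distributing index updates to each
slave, which is the part I'm least sure about.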

 

Thanks,

Andrey.
