On 3/12/2013 4:12 AM, kobe.free.wo...@gmail.com wrote:
> Following is the prod scenario:
> 1. Web Server 1 (with above mentioned configuration) - will be hosting Solr
> instance and web site.
> 2. Web Server 2 (with above mentioned configuration) - will be hosting
> second Solr instance and web site.
> Does this scenario look fine w.r.t. the indexing/searching performance?
> Also, on the front end are .NET web applications that issue queries via HTTP
> requests to our searchers.
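Side note on those HTTP queries: a request to Solr's select handler is just a
plain GET, so it's easy to reproduce outside the .NET layer for testing or
timing. A minimal Python sketch -- the host, port, core name, and query field
below are made-up placeholders, not your actual setup:

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import json

# Hypothetical Solr host and core name -- substitute your own.
base_url = "http://localhost:8983/solr/mycore/select"

params = {
    "q": "title:solr",  # assumed query field, for illustration only
    "rows": 10,
    "wt": "json",
}
url = base_url + "?" + urlencode(params)
print(url)

# Against a live Solr instance you could then do:
# response = json.load(urlopen(url))
# print(response["response"]["numFound"])
```

Issuing the same URL with curl is a quick way to measure query latency
independently of the web application.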
It's always recommended that Solr live on separate hardware from
everything else, and I'll add my +1 to wunder's "don't use Windows" note
here too. You've already gotten some awesome replies about why; here's
my two cents:
Busy web servers, especially those that run full applications, tend to
be hungry for CPU and RAM resources. This also describes Solr, which is
itself a web application (java servlet). If Solr is not the only thing
on the box, then nobody can even make a guess about whether the hardware
you're using will be big enough. Even when Solr is the only thing on
the box, advice found here is often only a guess. Adding additional
software to the machine guarantees that it's a guess, and a vague one at
best.
If your servers have plenty of CPU and RAM left over even when the web
server reaches peak load, then you might be OK. The "500 users per
minute" figure you've given sounds like a lot.
Note that any enumeration of RAM resources must include the amount of
required OS disk cache, not just the amount of RAM required by the
applications themselves. Here's a blog post about how Lucene (and Solr)
uses RAM and the OS disk cache. When it's big enough, the OS disk cache
is helpful even for applications that don't use MMap:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
The number of documents (800K) and fields (450 per doc) that you've
mentioned sounds like it will produce an index size that's way too big
to fit in the OS disk cache on a 12GB server, unless all of those fields
contain numeric data encoded in numeric data types rather than fully
tokenized text.
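To make that concrete, here's a back-of-envelope calculation. The
bytes-per-field figures are pure assumptions for illustration -- real index
size depends heavily on your analysis chains, stored fields, and term
distribution, so only an actual test index will tell you for sure:

```python
# Rough index-size sketch. Per-field byte costs are guesses,
# not measurements.
docs = 800_000
fields_per_doc = 450

bytes_numeric = 10   # assumed on-disk cost of a numeric (trie) field
bytes_text = 100     # assumed on-disk cost of a tokenized text field

all_numeric_gb = docs * fields_per_doc * bytes_numeric / 1024**3
all_text_gb = docs * fields_per_doc * bytes_text / 1024**3

print(f"all-numeric estimate: {all_numeric_gb:.1f} GB")
print(f"all-text estimate:    {all_text_gb:.1f} GB")
```

Even with these crude numbers you can see the pattern: an all-numeric index
might fit in the disk cache on a 12GB box, but an index of tokenized text
fields at that scale almost certainly won't.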
If you are searching and/or filtering on very many of those fields, plus
facets, Solr is going to require a lot of heap memory, further reducing
the amount of OS disk cache available. With a web application receiving
several hundred requests per minute running on the same hardware, 12GB
probably won't be anywhere near enough ... I'd say the absolute minimum
you'd want to consider for your combined setup would be 64GB, and more
might be a good idea. Depending on the total index size, 32GB might be
enough for a dedicated Solr server.
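Here's the same point as arithmetic. Every number below is an assumption I'm
making up to show the shape of the budget, not a measurement of your system:

```python
# RAM budget sketch for a combined web + Solr box.
# All figures are illustrative assumptions.
total_ram_gb = 12

jvm_heap_gb = 6    # assumed Solr heap under heavy search/facet load
web_app_gb = 3     # assumed web server + .NET application footprint
os_gb = 1          # assumed OS overhead

disk_cache_gb = total_ram_gb - jvm_heap_gb - web_app_gb - os_gb
index_size_gb = 30  # assumed index size for a largely-text index

print(f"RAM left for OS disk cache: {disk_cache_gb} GB")
print(f"fraction of index cacheable: {disk_cache_gb / index_size_gb:.0%}")
```

When only a small fraction of the index can live in the disk cache, most
queries hit the disk itself, and that's where the performance falls off a
cliff. Redo this arithmetic with your real heap and index numbers before
settling on hardware.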
Thanks,
Shawn