On 3/12/2013 4:12 AM, kobe.free.wo...@gmail.com wrote:
> Following is the prod scenario:
>
> 1. Web Server 1 (with above mentioned configuration) - will be hosting Solr
> instance and web site.
> 2. Web Server 2 (with above mentioned configuration) - will be hosting
> second Solr instance and web site.
>
> Does this scenario look fine w.r.t. indexing/searching performance?
> Also, on the front end are .NET web applications that issue queries via HTTP
> requests to our searchers.

It's always recommended that Solr live on separate hardware from everything else, and I'll add my +1 to wunder's "don't use Windows" note here too. You've already gotten some awesome replies about why; here's my two cents:

Busy web servers, especially those that run full applications, tend to be hungry for CPU and RAM resources. This also describes Solr, which is itself a web application (a Java servlet). If Solr is not the only thing on the box, then nobody can even make a guess about whether the hardware you're using will be big enough. Even when Solr is the only thing on the box, advice found here is often only a guess. Adding additional software to the machine guarantees that it's a guess, and a vague one at best.

If your servers have plenty of CPU and RAM left over even when the web server reaches peak load, then you might be OK. The "500 users per minute" figure you've given sounds like a lot.
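To put that figure in perspective, here's a back-of-envelope sketch. The requests-per-user number is purely a made-up assumption for illustration; your actual traffic pattern will differ:

```python
# Rough sustained-load estimate from the "500 users per minute" figure.
# requests_per_user is a hypothetical average, not a number from your post.
users_per_minute = 500
requests_per_user = 3  # assumed: page load plus a couple of searches

requests_per_second = users_per_minute * requests_per_user / 60
print(f"~{requests_per_second:.0f} requests/sec sustained")
```

Even a modest multiplier like that puts you in the tens of requests per second, continuously, on the same box that's also serving searches.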

Note that any enumeration of RAM resources must include the amount of required OS disk cache, not just the amount of RAM required by the applications themselves. Here's a blog post about how Lucene (and Solr) uses RAM and the OS disk cache. When it's big enough, the OS disk cache is helpful even for applications that don't use MMap:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
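If you want to see what the OS is actually doing with your RAM, on Linux (one more point in favor of moving off Windows) the kernel reports the page cache size directly. A quick sketch:

```python
# Print total RAM and current page-cache size from /proc/meminfo.
# Linux-specific; the "Cached:" line is RAM the kernel is using to
# cache file data, which is what Lucene benefits from.
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith(("MemTotal:", "Cached:")):
            print(line.strip())
```

On a healthy dedicated Solr box, most of the RAM not claimed by the JVM heap should show up in that Cached figure.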

The number of documents (800K) and fields (450 per doc) that you've mentioned sounds like it will produce an index size that's way too big to fit in the OS disk cache on a 12GB server, unless all of those fields contain numeric data encoded in numeric data types rather than fully tokenized text.
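To see why, a crude estimate helps. The bytes-per-field figure below is entirely an assumption for illustration; real index size depends on analysis chains, stored fields, and compression, and tokenized text fields can be far larger:

```python
# Crude index-size estimate from the numbers in your post.
# bytes_per_field is a guessed average, not a measured value.
docs = 800_000
fields_per_doc = 450
bytes_per_field = 50  # hypothetical average across all field types

index_bytes = docs * fields_per_doc * bytes_per_field
print(f"~{index_bytes / 2**30:.1f} GiB estimated index size")
```

Even with that conservative guess, the estimate comes out well above what a 12GB machine can hold in disk cache after the JVM and the web application take their share.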

If you are searching and/or filtering on very many of those fields, plus facets, Solr is going to require a lot of heap memory, further reducing the amount of OS disk cache available. With a web application receiving several hundred requests per minute running on the same hardware, 12GB probably won't be anywhere near enough ... I'd say the absolute minimum you'd want to consider for your combined setup would be 64GB, and more might be a good idea. Depending on the total index size, 32GB might be enough for a dedicated Solr server.
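The arithmetic behind that recommendation is simple: total RAM has to cover the Solr heap, the web application, and the OS itself, and whatever is left over is all the disk cache you get. A sketch with hypothetical placeholder sizes (none of these are measurements from your system):

```python
# RAM budget sketch. All sizes are assumed placeholders in GiB.
total_ram = 12
solr_heap = 4     # assumed Solr JVM heap for heavy search/facet load
web_app = 3       # assumed .NET application footprint
os_overhead = 1   # assumed OS baseline

disk_cache_left = total_ram - solr_heap - web_app - os_overhead
print(f"{disk_cache_left} GiB left for OS disk cache")
```

Run the same budget against a multi-gigabyte index and it's clear why the leftover cache on a shared 12GB box falls badly short.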

Thanks,
Shawn
