Thanks Shawn. You're absolutely right about the performance balance, though it's good to hear it from an experienced source (if you don't mind me calling you that!). Fortunately we don't have a top-performance requirement, and our audience is small, so query volume is low. On similar systems we're "managing" to provide a Solr service with a 3TB index on 160GB of RAM, though we rely on scripts to handle the occasionally necessary service restart when someone submits a more exotic query. That setup, by the way, gives response times of ~45-90 seconds for uncached queries. My question really comes from the hope that we can do away with the restart scripts, which I doubt do the Solr service any good (if necessary they will simply kill the processes and restart), and get response times below 20 seconds.
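For anyone curious, the sort of watchdog I mean looks roughly like the sketch below; the ping URL, core name, check interval and restart command are illustrative assumptions, not our actual script.

#!/usr/bin/env python
# Hypothetical watchdog sketch, not our production script.  It pings a
# Solr core and restarts the service if the ping handler stops answering.
import subprocess
import time
import urllib.request

PING_URL = "http://localhost:8983/solr/collection1/admin/ping"  # core name is an assumption
CHECK_INTERVAL_SECS = 60        # how often to probe
PING_TIMEOUT_SECS = 120         # how long we tolerate waiting for a response
RESTART_CMD = ["systemctl", "restart", "solr"]   # assumes a systemd unit named "solr"

def solr_alive():
    """Return True if the ping handler answers with HTTP 200 inside the timeout."""
    try:
        with urllib.request.urlopen(PING_URL, timeout=PING_TIMEOUT_SECS) as resp:
            return resp.getcode() == 200
    except Exception:
        return False

if __name__ == "__main__":
    while True:
        if not solr_alive():
            # Blunt instrument: a restart throws away warmed caches, which is
            # exactly why we'd rather have enough RAM to avoid needing it.
            subprocess.run(RESTART_CMD, check=False)
        time.sleep(CHECK_INTERVAL_SECS)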
-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: 10 December 2013 17:37
To: solr-user@lucene.apache.org
Subject: Re: Solr hardware memory question

On 12/10/2013 9:51 AM, Hoggarth, Gil wrote:
> We're probably going to be building a Solr service to handle a dataset
> of ~60TB, which for our data and schema typically gives a Solr index
> size of 1/10th - i.e., 6TB. Given there's a general rule about the
> amount of hardware memory required should exceed the size of the Solr
> index (exceed to also allow for the operating system etc.), how have
> people handled this situation? Do I really need, for example, 12
> servers with 512GB RAM, or are there other techniques to handling this?

That really depends on what kind of query volume you'll have and what kind of performance you want. If your query volume is low and you can deal with slow individual queries, then you won't need that much memory. If either of those requirements increases, you'd probably need more memory, up to the 6TB total -- or 12TB if you need to double the total index size for redundancy purposes. If your index is constantly growing like most are, you need to plan for that too.

Putting the entire index into RAM is required for *top* performance, but not for base functionality. It might be possible to put only a fraction of your index into RAM. Only testing can determine what you really need to obtain the performance you're after.

Perhaps you've already done this, but you should try as much as possible to reduce your index size. Store as few fields as possible, only just enough to build a search result list/grid and retrieve the full document from the canonical data store. Save termvectors and docvalues on as few fields as possible. If you can, reduce the number of terms produced by your analysis chains.

Thanks,
Shawn
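For anyone else sizing a similar index, the "store as few fields as possible" advice translates into schema.xml roughly as sketched below. The field names are made up for illustration: only an id and a short title are stored for building the results list, while the large body field is indexed with no stored value and no term vectors.

<!-- Illustrative sketch, not a real schema: keep stored data to the minimum
     needed for a result list, and strip term vectors from the big text field. -->
<field name="id"    type="string"       indexed="true" stored="true" required="true" />
<field name="title" type="text_general" indexed="true" stored="true" />
<field name="body"  type="text_general" indexed="true" stored="false"
       termVectors="false" termPositions="false" termOffsets="false" />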