Again, thank you for this incredible information; I feel on much firmer
footing now. I'm going to test distributing this across 10 servers,
borrowing a Hadoop cluster temporarily, and see how it does with enough
memory to have the whole index cached. But I'm thinking we'll try the
SSD route, as our index will probably settle in the half-terabyte range
eventually; there's still a lot of active development.

I guess the RAM disk would work in our case also, as we only index in
batches, and eventually I'd like to do that outside of Solr and just
update the index (I'm presuming this is doable in SolrCloud, but I
haven't put it to the test yet). If I could repurpose Hadoop to index
the shards, that would be ideal, though I haven't quite figured out how
to go about it yet.

David


-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Friday, April 19, 2013 9:42 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud loadbalancing, replication, and failover

On 4/19/2013 3:48 AM, David Parks wrote:
> The physical memory is 90% utilized (21.18GB of 23.54GB). Solr has a
> dark grey allocation of 602MB and a light grey allocation of an
> additional 108MB, for a JVM total of 710MB. If I understand correctly,
> Solr memory utilization is *not* for caching (unless I configure
> document caches or some of the other cache options in Solr, which
> don't seem to apply in this case, and I haven't altered them from
> their defaults).

Right.  Solr does have caches, but they serve specific purposes.  The OS is
much better at general large-scale caching than Solr is.  Solr caches get
cleared (and possibly re-warmed) whenever you issue a commit on your index
that makes new documents visible.
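
For illustration, a hard commit issued over HTTP looks like this (the
host, port, and core name here are just the Solr 4.x example defaults;
adjust for your setup):

  curl 'http://localhost:8983/solr/collection1/update?commit=true'

Once the commit opens a new searcher, the old caches are discarded and
any configured autowarming runs against the new one.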

> So, assuming this box were dedicated to one Solr instance/shard, what
> JVM heap should I set? Does that matter? 24GB JVM heap? Or keep it
> lower and ensure the OS cache has plenty of room to operate? (This is
> an Ubuntu 12.10 server instance.)

The JVM heap to use is highly dependent on the nature of your queries,
the number of documents, the number of unique terms, etc.  The best
thing to do is to try it out with a relatively large heap and see how
much memory actually gets used inside the JVM.  The jvisualvm and
jconsole tools will give you nice graphs of JVM memory usage.  The
jstat program will give you raw numbers on the command line that you'll
need to add up to get the full picture.  Due to the garbage collection
model that Java uses, what you'll see is a sawtooth pattern - memory
usage climbs to the max heap, then garbage collection drops it back to
the memory actually in use.
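
As a sketch (assuming an Oracle/Sun JDK, and using 12345 as a stand-in
for Solr's actual process ID), you can sample the raw generation sizes
every five seconds like this:

  jstat -gc 12345 5000

Adding the S0U, S1U, EU, and OU columns (all in KB) at each sample
gives the total heap actually in use; the dips in that total are the
"low" points discussed below.
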
Generally speaking, you want to have more heap available than the "low"
point of that sawtooth pattern.  If that low point is around 3GB when
you are hitting your index hard with queries and updates, then you
would want to give Solr a heap of 4 to 6 GB.
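
With the Jetty-based example setup that ships with Solr 4.x, that would
look something like this (the 4GB/6GB numbers are just the hypothetical
values from above):

  java -Xms4g -Xmx6g -jar start.jar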

> Would I be wise to just put the index on a RAM disk and guarantee 
> performance?  Assuming I installed sufficient RAM?

A RAM disk is a very good way to guarantee performance - but RAM disks
are ephemeral.  Reboot or have an OS crash and it's gone; you'll have
to reindex.  Also remember that you actually need room for at *least*
twice the size of your index so that Solr (Lucene) has space to do
merges, and the worst-case scenario is *three* times the index size.
Merging happens during normal indexing, not just when you optimize.  If
you have enough RAM for three times your index size, and it takes less
than an hour or two to rebuild the index, then a RAM disk might be a
viable way to go.  I suspect that this won't work for you.
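
If you do want to experiment anyway, a minimal sketch on Ubuntu (the
mount point and the 48g size are hypothetical - size the tmpfs to at
least three times your index, per the above):

  sudo mkdir -p /mnt/solrindex
  sudo mount -t tmpfs -o size=48g tmpfs /mnt/solrindex

Point the core's dataDir at a directory under that mount, and remember
that everything there disappears on reboot.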

Thanks,
Shawn
