Re: Solr RAM Requirements
Hi Chak,

Rather than comparing the overall size of your index to the RAM available for the OS disk cache, you might want to look at particular files. For example, if you allow phrase queries, then the size of the *.prx files is relevant; if you don't, you can look at the size of your *.frq files instead. You also might want to take a look at free memory when you start up Solr, and then watch it fill up as you get more queries (or send cache-warming queries).

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search

KaktuChakarabati wrote:
> My question was mainly about the fact that there seem to be two different
> aspects to Solr's RAM usage: in-process and out-of-process. By that I mean:
> yes, I know the many different parameters/caches to do with Solr's
> in-process memory usage and the related culprits. However, I also understand
> that for actual index access (postings lists, positional index, etc.), Solr
> mostly delegates the access/caching to the OS disk cache. So I guess my
> question is more about that: namely, what would be a good way to calculate
> an overall RAM requirement profile for a server running Solr?

--
View this message in context: http://old.nabble.com/Solr-RAM-Requirements-tp27924551p27933779.html
Sent from the Solr - User mailing list archive at Nabble.com.
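Tom's file-size check can be scripted. A minimal sketch, assuming your index lives in a single directory (pass your core's data/index path as the first argument; the path itself is yours to fill in):

```shell
#!/bin/sh
# Sum the on-disk bytes of the Lucene files that need to stay hot in the
# OS cache. *.frq (term frequencies) is read by ordinary term queries;
# *.prx (term positions) is additionally read by phrase queries.
index_bytes() {
    ext="$1"; dir="$2"
    # wc -c prints a per-file count, plus a "total" line when there are
    # multiple files; the last line's first field is the sum either way.
    find "$dir" -type f -name "*.$ext" -exec wc -c {} + 2>/dev/null \
        | awk 'END { print (NR ? $1 : 0) }'
}

# Only run the report if an index directory was given on the command line.
if [ -n "$1" ]; then
    for ext in frq prx; do
        echo "$ext: $(index_bytes "$ext" "$1") bytes"
    done
fi
```

Comparing these totals against `free` output after warm-up gives a rough idea of how much of the hot data the OS cache can actually hold.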
Solr RAM Requirements
Hey,

I am trying to understand what kind of calculation I should do in order to come up with a reasonable RAM size for a given Solr machine. Suppose the index size is 16GB, and the max heap allocated to the JVM is about 12GB. The machine I'm trying now has 24GB. When the machine has been running for a while serving production, I can see in top that the resident memory taken by the JVM is indeed at 12GB.

Now, on top of this, I should assume that if I want the whole index to fit in the disk cache, I need about 12GB + 16GB = 28GB of RAM just for that. Is this kind of calculation correct, or am I off here? Any other recommendations anyone could make w.r.t. these numbers?

Thanks,
-Chak

--
View this message in context: http://old.nabble.com/Solr-RAM-Requirements-tp27924551p27924551.html
Sent from the Solr - User mailing list archive at Nabble.com.
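The arithmetic in the question can be sketched as a few lines of Python. The 2GB allowance for the OS itself is an assumption added here, not a figure from the thread:

```python
def ram_needed_gb(jvm_heap_gb, index_gb, os_overhead_gb=2):
    """RAM for a fully resident JVM heap plus the whole index in the
    OS page cache, with an assumed small allowance for the OS itself."""
    return jvm_heap_gb + index_gb + os_overhead_gb

# Numbers from the thread: 12GB heap, 16GB index, 24GB machine.
need = ram_needed_gb(12, 16)
print(f"need ~{need}GB, have 24GB, short by {need - 24}GB")
```

As the replies below point out, this is only an upper bound for "everything resident"; in practice only the hot subset of the index has to stay in the page cache.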
Re: Solr RAM Requirements
On Tue, Mar 16, 2010 at 9:08 PM, KaktuChakarabati jimmoe...@gmail.com wrote:

> Hey, I am trying to understand what kind of calculation I should do in
> order to come up with a reasonable RAM size for a given Solr machine.
> Suppose the index size is 16GB, and the max heap allocated to the JVM is
> about 12GB. The machine I'm trying now has 24GB. When the machine has been
> running for a while serving production, I can see in top that the resident
> memory taken by the JVM is indeed at 12GB. Now, on top of this, I should
> assume that if I want the whole index to fit in the disk cache, I need
> about 12GB + 16GB = 28GB of RAM just for that. Is this kind of calculation
> correct, or am I off here?

Hmmm... not quite. The idea of the RAM usage isn't simply to hold the index in memory; if you want that, use a RAMDirectory. The memory being used will be a combination of various caches (Lucene and Solr), index buffers et al., and of course the server itself. The specifics depend very much on what your server is doing at any given time (e.g. lots of concurrent searches, lots of indexing, or both) and on how things are set up in your solrconfig.xml.

A really excellent resource that's worth looking at regarding all this can be found here:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

> Any other recommendations anyone could make w.r.t. these numbers?
>
> Thanks,
> -Chak
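For reference, the Solr caches Peter mentions are configured in the `<query>` section of solrconfig.xml. This is an illustrative sketch only; the sizes and the warming query are placeholders, not recommendations:

```xml
<query>
  <!-- cached filter bitsets, used heavily by fq parameters and faceting -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <!-- ordered doc-id lists for recently run queries -->
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="64"/>
  <!-- stored fields of recently returned documents -->
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <!-- cache-warming queries, as Tom suggests elsewhere in the thread;
       the query text here is a placeholder -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">some popular query</str></lst>
    </arr>
  </listener>
</query>
```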
Re: Solr RAM Requirements
There are certainly a number of widely varying opinions on the use of RAMDirectory. Basically, though, if you need the index to be persistent at some point (i.e. saved across reboots, crashes, etc.), you'll need to write to a disk, so RAMDirectory becomes somewhat superfluous in this case. Generally, good hardware and fast disks are a better bet, since you'll probably want to have them anyway :-)

From my own experience with varying types/sizes of indexes, and the general wisdom gleaned from the experts, the amount of memory required for a given environment is very much a 'how long is a piece of string' type of scenario. It depends on so many factors that it's impractical to come up with an easy 'standardized' formula.

What I've found useful as rough guidance (in addition to the very useful URL I mentioned earlier): if your server is doing lots of indexing and not much searching, you want your OS filesystem cache to have access to a healthy amount of memory. If you're doing lots of searching/reading (and particularly faceting), you'll want a good amount of RAM for Solr/Lucene caching (which caches need what depends on the type of data you're searching). If you have a server that is doing a lot of both indexing and searching, you should consider breaking them out using replication, and possibly using load balancers (if you have lots of concurrent querying going on).

It stands to reason that the bigger the index gets, the more memory will generally be required for working on various aspects of it. When you get into very large indexes, it becomes more efficient to distribute the indexing across servers (and to replicate those servers), so that no single machine has huge cache lists to traverse. Again, the 'Scaling Lucene and Solr' page goes into these scenarios and is well worth studying.

On Wed, Mar 17, 2010 at 12:29 AM, KaktuChakarabati jimmoe...@gmail.com wrote:

> Hey Peter, Thanks for your reply.
> My question was mainly about the fact that there seem to be two different
> aspects to Solr's RAM usage: in-process and out-of-process. By that I mean:
> yes, I know the many different parameters/caches to do with Solr's
> in-process memory usage and the related culprits. However, I also understand
> that for actual index access (postings lists, positional index, etc.), Solr
> mostly delegates the access/caching to the OS disk cache. So I guess my
> question is more about that: namely, what would be a good way to calculate
> an overall RAM requirement profile for a server running Solr?
>
> Also, I was under the impression that the benefits from RAMDirectory would
> be minimal when disk caches are effective, no? And does RAMDirectory work
> with replication? If so, doesn't it slow it down? (On each replication,
> load up the entire index into RAM at once?)
>
> [...]
>
> --
> View this message in context: http://old.nabble.com/Solr-RAM-Requirements-tp27924551p27926536.html
> Sent from the Solr - User mailing list archive at Nabble.com.
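Breaking indexing and searching apart, as Peter suggests, can be done with Solr's built-in ReplicationHandler (available from Solr 1.4, the current release at the time of this thread). A sketch of the two solrconfig.xml sides; the host name and poll interval are placeholder values:

```xml
<!-- On the indexing (master) box -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each search (slave) box; masterUrl is a placeholder -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

With this split, the master's RAM mostly serves index buffers and the filesystem cache, while the slaves' RAM serves the Solr/Lucene query caches.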
Re: Solr RAM Requirements
Just turn your entire disk into RAM: http://www.hyperossystems.co.uk/ 800X faster. Who cares if it swaps to 'disk' then :-)

Dennis Gearon

Signature Warning: EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded'. Laugh at http://www.yert.com/film.php

--- On Tue, 3/16/10, Peter Sturge peter.stu...@googlemail.com wrote:

From: Peter Sturge peter.stu...@googlemail.com
Subject: Re: Solr RAM Requirements
To: solr-user@lucene.apache.org
Date: Tuesday, March 16, 2010, 6:25 PM

> [...]