Re: Solr RAM Requirements

2010-03-17 Thread Tom Burton-West

Hi Chak,

Rather than comparing the overall size of your index to the RAM available
for the OS disk cache, you might want to look at particular files. For
example, if you allow phrase queries, then the size of the *prx files is
relevant; if you don't, you can look at the size of your *frq files. You
might also want to take a look at the free memory when you start up Solr
and then watch it fill up as you get more queries (or send cache-warming
queries).
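
As a rough illustration (this assumes a Lucene 2.x/3.x-era index layout,
where positions live in *prx files and term frequencies in *frq files; the
index path below is hypothetical), a quick Python sketch for that comparison:

    import glob
    import os

    INDEX_DIR = "/var/solr/data/index"  # hypothetical; point at your index dir

    def total_size(pattern):
        # Sum the sizes (in bytes) of all index files matching the pattern.
        paths = glob.glob(os.path.join(INDEX_DIR, pattern))
        return sum(os.path.getsize(p) for p in paths)

    gb = 1024 ** 3
    # Positions: only read if you allow phrase queries.
    print("*.prx (positions):   %.2f GB" % (total_size("*.prx") / gb))
    # Term frequencies: read for ordinary term queries.
    print("*.frq (frequencies): %.2f GB" % (total_size("*.frq") / gb))

Comparing these numbers, rather than the whole index size, against free RAM
gives a better sense of what the OS cache actually needs to hold.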

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search





KaktuChakarabati wrote:
 
 My question was mainly about the fact that there seem to be two different
 aspects to Solr's RAM usage: in-process and out-of-process.
 By that I mean: yes, I know the many different parameters/caches to do with
 Solr's in-process memory usage and the related culprits, but I also
 understand that for actual index access (postings lists, positional index,
 etc.), Solr mostly delegates the access/caching to the OS disk cache.
 So I guess my question is more about that: namely, what would be a good
 way to calculate an overall RAM requirement profile for a server running
 Solr?
 
 



Solr RAM Requirements

2010-03-16 Thread KaktuChakarabati

Hey,
I am trying to understand what kind of calculation I should do in order to
come up with a reasonable RAM size for a given Solr machine.

Suppose the index size is 16GB.
The max heap allocated to the JVM is about 12GB.

The machine I'm trying now has 24GB.
When the machine has been running for a while serving production, I can see
in top that the resident memory taken by the JVM is indeed at 12GB.
Now, on top of this, I assume that if I want the whole index to fit in the
disk cache, I need about 12GB + 16GB = 28GB of RAM just for that. Is this
kind of calculation correct, or am I off here?
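
To spell out the arithmetic (a minimal sketch of the back-of-the-envelope
calculation above; it deliberately ignores the OS itself and any other
processes on the box):

    heap_gb = 12    # max heap (-Xmx) given to the JVM
    index_gb = 16   # on-disk index size

    # Naive requirement: the JVM heap plus enough page cache to hold
    # the whole index in memory at once.
    required_gb = heap_gb + index_gb
    print("~%d GB RAM for heap + fully cached index" % required_gb)  # ~28 GB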

Any other recommendations anyone could make w.r.t. these numbers?

Thanks,
-Chak



Re: Solr RAM Requirements

2010-03-16 Thread Peter Sturge
On Tue, Mar 16, 2010 at 9:08 PM, KaktuChakarabati jimmoe...@gmail.com wrote:


 Hey,
 I am trying to understand what kind of calculation I should do in order to
 come up with a reasonable RAM size for a given Solr machine.

 Suppose the index size is 16GB.
 The max heap allocated to the JVM is about 12GB.

 The machine I'm trying now has 24GB.
 When the machine has been running for a while serving production, I can
 see in top that the resident memory taken by the JVM is indeed at 12GB.
 Now, on top of this, I assume that if I want the whole index to fit in the
 disk cache, I need about 12GB + 16GB = 28GB of RAM just for that. Is this
 kind of calculation correct, or am I off here?


Hmmm... not quite. The idea of the RAM usage isn't simply to hold the index
in memory - if you want that, use a RAMDirectory.
The memory being used will be a combination of various caches (Lucene and
Solr), index buffers, et al., and of course the server itself. The specifics
depend very much on what your server is doing at any given time - e.g. lots
of concurrent searches, lots of indexing, both, etc. - and how things are
set up in your solrconfig.xml.
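
To put rough numbers on the Solr-side caches, here is a hedged sketch of one
way to estimate their footprint (the per-entry costs are approximations: a
filterCache entry can be a bitset of maxDoc bits when the match set is
large, and max_doc / avg_doc_kb below are hypothetical figures you would
measure for your own data):

    max_doc = 20000000         # docs in the index (hypothetical)
    filter_cache_size = 512    # 'size' attribute of <filterCache>
    doc_cache_size = 512       # 'size' attribute of <documentCache>
    avg_doc_kb = 4             # avg stored-document size in kB (hypothetical)

    # Worst case, each filterCache entry is a bitset over all docs:
    # maxDoc / 8 bytes per entry.
    filter_cache_mb = filter_cache_size * (max_doc / 8.0) / 2**20
    doc_cache_mb = doc_cache_size * avg_doc_kb / 1024.0

    print("filterCache worst case: ~%.0f MB" % filter_cache_mb)  # ~1221 MB
    print("documentCache:          ~%.1f MB" % doc_cache_mb)     # ~2.0 MB

On top of that come the query result cache, the field caches used for
sorting and faceting, and index-time buffers (ramBufferSizeMB in
solrconfig.xml), so treat any such total as a lower bound.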

A really excellent resource that's worth looking at regarding all this can
be found here:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr



 Any other recommendations anyone could make w.r.t. these numbers?

 Thanks,
 -Chak




Re: Solr RAM Requirements

2010-03-16 Thread Peter Sturge
There are certainly a number of widely varying opinions on the use of
RAMDirectory.
Basically, though, if you need the index to be persistent at some point
(i.e. saved across reboots, crashes, etc.), you'll need to write to disk,
so a RAMDirectory becomes somewhat superfluous in this case.

Generally, good hardware and fast disks are a better bet, since you'll
probably want to have them anyway :-)

From my own experience with varying types/sizes of indexes, and the general
wisdom gleaned from the experts, the amount of memory required for a given
environment is very much a 'how long is a piece of string' type of scenario.
It depends on so many factors that it's impractical to come up with an easy
'standardized' formula.

What I've found useful as rough guidance (in addition to the very useful
URL I mentioned earlier): if your server is doing lots of indexing and not
much searching, you want your OS file-system cache to have access to a
healthy amount of memory.
If you're doing lots of searching/reading (and particularly faceting),
you'll want a good amount of RAM for Solr/Lucene caching (which caches need
what depends on the type of data you're searching).
If you have a server that is doing a lot of both indexing and searching,
you should consider breaking them out using replication, and possibly using
load balancers (if you have lots of concurrent querying going on).
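
On Linux, a quick way to check how much memory the OS file-system cache
actually has to work with (a minimal sketch, assuming /proc/meminfo is
available):

    # Report page-cache and free memory from /proc/meminfo (values in kB).
    meminfo = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            meminfo[key] = int(rest.split()[0])

    cached_gb = meminfo["Cached"] / 2**20
    free_gb = meminfo["MemFree"] / 2**20
    print("page cache: %.1f GB, free: %.1f GB" % (cached_gb, free_gb))

If 'Cached' stays far below your index (or *frq/*prx) size while the box is
under load, the index isn't fully cached and queries will hit the disk.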

It stands to reason that the bigger the index gets, the more memory will
generally be required for working on various aspects of it. When you get
into very large indexes, it becomes more efficient to distribute the
indexing across servers (and to replicate those servers), so that no single
machine has huge cache lists to traverse. Again, the 'Scaling Lucene and
Solr' page goes into these scenarios and is well worth studying.



On Wed, Mar 17, 2010 at 12:29 AM, KaktuChakarabati jimmoe...@gmail.com wrote:


 Hey Peter,
 Thanks for your reply.
 My question was mainly about the fact that there seem to be two different
 aspects to Solr's RAM usage: in-process and out-of-process.
 By that I mean: yes, I know the many different parameters/caches to do with
 Solr's in-process memory usage and the related culprits, but I also
 understand that for actual index access (postings lists, positional index,
 etc.), Solr mostly delegates the access/caching to the OS disk cache.
 So I guess my question is more about that: namely, what would be a good way
 to calculate an overall RAM requirement profile for a server running Solr?
 Also, I was under the impression that the benefits from RAMDirectory would
 be minimal when disk caches are effective, no?
 And does RAMDirectory work with replication? If so, doesn't it slow it
 down? (On each replication, load up the entire index to RAM at once?)







Re: Solr RAM Requirements

2010-03-16 Thread Dennis Gearon
Just turn your entire disk into RAM:

http://www.hyperossystems.co.uk/

800X faster. Who cares if it swaps to 'disk' then :-)


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

