Thanks Walter, you guys gave me really nice ideas about RAM approximation.

2013/4/11 Walter Underwood <wun...@wunderwood.org>
> Here is the situation where merging can require 3X space. It can only
> happen if you force merge, then index with merging turned off, but we had
> Ultraseek customers do that.
>
> * All documents are merged into a single segment.
> * Without a merge, all documents are replaced.
> * This results in one segment of deleted documents and one of new
>   documents (2X).
> * A merge takes place, creating a new segment of the same size, thus 3X.
>
> For normal operation, 2X is plenty of room.
>
> wunder
>
> On Apr 11, 2013, at 6:46 AM, Michael Ryan wrote:
>
> > I've investigated this in the past. The worst case is 2*indexSize
> > additional disk space (3*indexSize total) during an optimize.
> >
> > In our system, we use LogByteSizeMergePolicy, and used to have a
> > mergeFactor of 10. We would see the worst case happen when there were
> > exactly 20 segments (or some other multiple of 10, I believe) at the
> > start of the optimize. IIRC, it would merge those 20 segments down to 2
> > segments, and then merge those 2 segments down to 1 segment. 1*indexSize
> > of space was used by the original index (because there was still a
> > reader open on it), 1*indexSize was used by the 2 segments, and
> > 1*indexSize was used by the 1 segment. This is the worst case because
> > there are two full additional copies of the index on disk. Normally,
> > when the number of segments is not a multiple of the mergeFactor, some
> > part of the index will not take part in both merges (and the part that
> > is excluded would usually be the largest segments).
> >
> > We worked around this by doing multiple optimize passes, where the
> > first pass merges down to between 2 and 2*mergeFactor-1 segments (based
> > on a great tip from Lance Norskog on the mailing list a couple of years
> > ago).
> >
> > I'm not sure if the current merge policy implementations still have
> > this issue.
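Michael's 20-segment worst case works out numerically like this. A rough back-of-envelope sketch of the arithmetic (plain Python, not Solr code; the function name is mine):

```python
def worst_case_peak_disk(index_size_gb):
    """Peak transient disk use during the two-pass optimize Michael
    describes: 20 equal segments merged down to 2, then 2 merged down
    to 1, while an open reader still pins the original segments."""
    original = index_size_gb   # original 20 segments, held by the open reader
    pass1 = index_size_gb      # intermediate copy: 20 segments merged to 2
    pass2 = index_size_gb      # final copy: 2 segments merged to 1
    return original + pass1 + pass2

print(worst_case_peak_disk(5))  # 15, i.e. 3*indexSize for a 5 GB index
```

When the segment count is not a multiple of the mergeFactor, the largest segments skip one of the two passes, so the peak stays below 3X.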
> >
> > -Michael
> >
> > -----Original Message-----
> > From: Furkan KAMACI [mailto:furkankam...@gmail.com]
> > Sent: Thursday, April 11, 2013 2:44 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Approximately needed RAM for 5000 query/second at a Solr
> > machine?
> >
> > Hi Walter;
> >
> > Is there any document or anything else that says the worst case is
> > three times the disk space? Twice versus three times is a real
> > difference when we are talking about GBs of disk space.
> >
> > 2013/4/10 Walter Underwood <wun...@wunderwood.org>
> >
> >> Correct, except the worst case maximum for disk space is three times.
> >> --wunder
> >>
> >> On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote:
> >>
> >>> You're mixing up disk and RAM requirements when you talk about
> >>> having twice the disk size. Solr does _NOT_ require twice the index
> >>> size of RAM to optimize, it requires twice the size on _DISK_.
> >>>
> >>> In terms of RAM requirements, you need to create an index, run
> >>> realistic queries at the installation, and measure.
> >>>
> >>> Best
> >>> Erick
> >>>
> >>> On Tue, Apr 9, 2013 at 10:32 PM, bigjust <bigj...@lambdaphil.es> wrote:
> >>>>
> >>>>>> On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
> >>>>>>> These are really good metrics for me:
> >>>>>>> You say that RAM size should be at least the index size, and that
> >>>>>>> it is better to have RAM twice the index size (because of the
> >>>>>>> worst-case scenario).
> >>>>>>> On the other hand, let's assume that I have more RAM than twice
> >>>>>>> the index size on the machine. Can Solr use that extra RAM, or is
> >>>>>>> twice the index size approximately a practical maximum?
> >>>>>> What we have been discussing is the OS cache, which is memory
> >>>>>> that is not used by programs. The OS uses that memory to make
> >>>>>> everything run faster. The OS will instantly give that memory up
> >>>>>> if a program requests it.
> >>>>>> Solr is a Java program, and Java uses memory a little
> >>>>>> differently, so Solr most likely will NOT use more memory when it
> >>>>>> is available.
> >>>>>> In a "normal" directly executable program, memory can be
> >>>>>> allocated at any time, and given back to the system at any time.
> >>>>>> With Java, you tell it the maximum amount of memory the program
> >>>>>> is ever allowed to use. Because of how memory is used inside
> >>>>>> Java, most long-running Java programs (like Solr) will allocate
> >>>>>> up to the configured maximum even if they don't really need that
> >>>>>> much memory.
> >>>>>> Most Java virtual machines will never give the memory back to the
> >>>>>> system even if it is not required.
> >>>>>> Thanks, Shawn
> >>>>>>
> >>>> Furkan KAMACI <furkankam...@gmail.com> writes:
> >>>>
> >>>>> I am sorry, but you said:
> >>>>>
> >>>>> *you need enough free RAM for the OS to cache the maximum amount
> >>>>> of disk space all your indexes will ever use*
> >>>>>
> >>>>> I have made an assumption about the indexes on my machine. Let's
> >>>>> assume they total 5 GB. So it is better to have at least 5 GB of
> >>>>> RAM? OK, Solr will use RAM up to what I define for the Java
> >>>>> process. When we think about the indexes on storage being cached
> >>>>> in RAM by the OS, is that what you are talking about: having more
> >>>>> than 5 GB, or 10 GB, of RAM for my machine?
> >>>>>
> >>>>> 2013/4/10 Shawn Heisey <s...@elyograg.org>
> >>>>
> >>>> 10 GB. Because when Solr shuffles the data around, it could use up
> >>>> to twice the size of the index in order to optimize the index on
> >>>> disk.
> >>>>
> >>>> -- Justin
> >>
> >> --
> >> Walter Underwood
> >> wun...@wunderwood.org

> --
> Walter Underwood
> wun...@wunderwood.org