Re: fieldCache problem OOM exception

2011-07-23 Thread Santiago Bazerque
Bernd, in our case, optimizing the index seems to flush the FieldCache for
some reason. On the other hand, doing a few commits without optimizing seems
to make the problem worse.

Hope that helps. We would like to give it a try and debug this in Lucene,
but we are pressed for time right now. Perhaps later next week we will.

Best,
Santiago

On Fri, Jul 22, 2011 at 4:01 AM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:

> The current status of my installation is that with some tweaking of
> the JVM I get a runtime of about 2 weeks until OldGen (14GB) is filled
> to 100 percent and won't free anything even with a FullGC.
> In a heap dump taken at that point, the fieldCache accounts for over
> 80 percent of the whole heap (20GB). And that is what eats up all OldGen
> until OOM.
> Next week I will switch to tomcat 6.x to see how that one behaves, but
> I don't have much hope. It is just a different container and wouldn't
> change anything about how Lucene eats up memory with the fieldCache.
>
> After digging through all the code, logging and debugging, I can say that it
> seems not to be a memory leak.
>
> Solr uses Lucene's fieldCache under the hood of the servlet
> container.
> The fieldCache grows until everything cacheable is in memory or OOM
> is reached, whichever comes first.
>
> The description says: "Provides introspection of the Lucene FieldCache,
> this is **NOT** a cache that is managed by Solr."
> So it seems to be a Lucene problem.
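For illustration, a minimal sketch (assuming the Lucene 3.x FieldCache API, not code taken from the Solr source) of what happens the first time a field is sorted or faceted on: the whole column of values for that field is loaded into the cache, keyed by reader and field, and it stays there until the IndexReader is closed.

    // Hedged sketch: FieldCache.DEFAULT is Lucene's process-wide cache.
    // Loading a field creates one entry per document in the index.
    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    public class FieldCacheWarmDemo {
        // For a ~29.5M-document index this materializes ~29.5M values
        // for the given field and keeps them referenced by the cache.
        static String[] warm(IndexReader reader, String field) throws IOException {
            return FieldCache.DEFAULT.getStrings(reader, field);
        }
    }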
>
> As a matter of fact, and due to this limitation, solr can't be used
> with a single huge index. I don't know how other applications which are
> using Lucene and its fieldCache (and there are a lot of them) are
> handling this and how they take care of the size of the fieldCache.
> And I currently don't know how to calculate the limit.
> Say, for example: the combined size of the *.tii and *.tis files in the
> index should be the -Xmx size of your JVM to be safe from fieldCache
> OOMs.
>
> Maybe an expert can give more detailed info about the fieldCache and its
> possible maximum size.
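As a purely illustrative back-of-the-envelope (the constants below are assumptions, not measurements), the cost of a single string sort field is roughly one int ord per document plus the unique term values themselves:

    // Hedged estimate of one StringIndex-style FieldCache entry (Lucene 3.x).
    public class FieldCacheEstimate {
        public static void main(String[] args) {
            long numDocs = 29557308L;       // document count from this index
            long uniqueTerms = 21514939L;   // e.g. unique terms in f_dctitle (from Luke)
            int avgTermBytes = 40;          // assumed in-memory size per term

            long ordsBytes  = numDocs * 4L;               // one int per document
            long termsBytes = uniqueTerms * avgTermBytes; // the term values
            double gb = (ordsBytes + termsBytes) / (1024.0 * 1024.0 * 1024.0);
            System.out.printf("~%.2f GB for this one field%n", gb);
        }
    }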
>
> Some data about our index:
> -rw-r--r-- 1 solr users 84448291214 19. Jul 10:43 _12jl.fdt
> -rw-r--r-- 1 solr users   236458468 19. Jul 10:43 _12jl.fdx
> -rw-r--r-- 1 solr users        1208 19. Jul 10:30 _12jl.fnm
> -rw-r--r-- 1 solr users 19950615826 19. Jul 11:20 _12jl.frq
> -rw-r--r-- 1 solr users   532031548 19. Jul 11:20 _12jl.nrm
> -rw-r--r-- 1 solr users 20616887682 19. Jul 11:20 _12jl.prx
> -rw-r--r-- 1 solr users   291149087 19. Jul 11:20 _12jl.tii
> -rw-r--r-- 1 solr users 30850743727 19. Jul 11:20 _12jl.tis
> -rw-r--r-- 1 solr users          20  9. Jun 11:11 segments.gen
> -rw-r--r-- 1 solr users         274 19. Jul 11:20 segments_pl
> Size: 146,15 GB
> Docs: 29.557.308
>
>
> Regards,
> Bernd
>
>
> On 22.07.2011 00:10, Santiago Bazerque wrote:
>
>> Hello Erick,
>>
>> I have a 1.7MM-document, 3.6GB index. I also have an unusually large
>> number of dynamic fields that I use for sorting. My FieldCache currently
>> has about 13.000 entries, even though my index only gets 1-3 queries per
>> second. Each query sorts by two dynamic fields and facets on 3-4 fields
>> that are fixed. These latter fields are always in the field cache; what I
>> find suspicious is the other ~13.000 entries that are sitting there.
>>
>> I am using a 32GB heap, and I am seeing periodic OOM errors (I didn't
>> spot a regular pattern as Bernd did, but haven't increased RAM as
>> methodically as he has).
>>
>> If you need any more info, I'll be glad to post it to the list.
>>
>> Best,
>> Santiago
>>
>> On Fri, Jun 17, 2011 at 9:13 AM, Erick Erickson wrote:
>>
>>  Sorry, it was late last night when I typed that...
>>>
>>> Basically, if you sort and facet on #all# the fields you mentioned, it
>>> should populate the cache in one go. If the problem is that you just
>>> have too many unique terms for all those operations, then it should go
>>> bOOM.
>>>
>>> But, frankly, that's unlikely; I'm just suggesting that to be sure the
>>> easy case isn't the problem. Take a memory snapshot at that point just
>>> to see; it should be a high-water mark.
>>>
>>> The fact that you increase the heap and can then run for longer is
>>> extremely suspicious, and really smells like a memory issue, so we'd
>>> like to pursue it.
>>>
>>> I'd be really interested if anyone else is seeing anything 

Re: fieldCache problem OOM exception

2011-07-21 Thread Santiago Bazerque
Hello Erick,

I have a 1.7MM-document, 3.6GB index. I also have an unusually large number
of dynamic fields that I use for sorting. My FieldCache currently has about
13.000 entries, even though my index only gets 1-3 queries per second. Each
query sorts by two dynamic fields and facets on 3-4 fields that are fixed.
These latter fields are always in the field cache; what I find suspicious is
the other ~13.000 entries that are sitting there.
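To make the query pattern concrete, here is a hedged SolrJ 3.x sketch (the URL and the field names are made-up placeholders, not our real schema). Every request sorts on two per-request dynamic fields, so over time each distinct sort field name leaves its own entry in the Lucene FieldCache:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class DynamicSortQuery {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("*:*");
            // two dynamic sort fields (hypothetical names), different per request
            q.addSortField("rank_user42_d", SolrQuery.ORDER.desc);
            q.addSortField("score_user42_d", SolrQuery.ORDER.asc);
            // the 3-4 fixed facet fields (hypothetical names)
            q.setFacet(true);
            q.addFacetField("category", "lang", "type");
            server.query(q); // first use of each sort field populates the FieldCache
        }
    }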

I am using a 32GB heap, and I am seeing periodic OOM errors (I didn't spot
a regular pattern as Bernd did, but haven't increased RAM as methodically as
he has).

If you need any more info, I'll be glad to post it to the list.

Best,
Santiago

On Fri, Jun 17, 2011 at 9:13 AM, Erick Erickson wrote:

> Sorry, it was late last night when I typed that...
>
> Basically, if you sort and facet on #all# the fields you mentioned, it
> should populate the cache in one go. If the problem is that you just have
> too many unique terms for all those operations, then it should go bOOM.
>
> But, frankly, that's unlikely; I'm just suggesting that to be sure the
> easy case isn't the problem. Take a memory snapshot at that point just to
> see; it should be a high-water mark.
>
> The fact that you increase the heap and can then run for longer is
> extremely suspicious, and really smells like a memory issue, so we'd like
> to pursue it.
>
> I'd be really interested if anyone else is seeing anything similar,
> these are the
> scary ones...
>
> Best
> Erick
>
> On Fri, Jun 17, 2011 at 3:09 AM, Bernd Fehling wrote:
> > Hi Erick,
> > I will take some memory snapshots during the next week,
> > but how can it be that one query leads to OOMs?
> >
> > - I started with 6g for JVM --> 1 day until OOM.
> > - increased to 8 g --> 2 days until OOM
> > - increased to 10g --> 3.5 days until OOM
> > - increased to 16g --> 5 days until OOM
> > - currently 20g --> about 7 days until OOM
> >
> > Starting the system takes about 3.5g and goes up to about 4g after a
> while.
> >
> > The only dirty workaround so far is to restart the whole system after 5
> > days.
> > Not really nice.
> >
> > The problem seems to be the fieldCache, which sits under the hood of Jetty.
> > Do you know of any sizing options for the fieldCache to limit its memory
> > consumption?
> >
> > Regards
> > Bernd
> >
> >> On 17.06.2011 03:37, Erick Erickson wrote:
> >>
> >> Well, if my theory is right, you should be able to generate OOMs at will
> >> by
> >> sorting and faceting on all your fields in one query.
> >>
> >> But Lucene's cache should be garbage collected; can you take some memory
> >> snapshots during the week? It should hit a point and stay steady there.
> >>
> >> How much memory are you giving your JVM? It looks like a lot given your
> >> memory snapshot.
> >>
> >> Best
> >> Erick
> >>
> >> On Thu, Jun 16, 2011 at 3:01 AM, Bernd Fehling wrote:
> >>>
> >>> Hi Erick,
> >>>
> >>> yes I'm sorting and faceting.
> >>>
> >>> 1) Fields for sorting:
> >>>   sort=f_dccreator_sort, sort=f_dctitle, sort=f_dcyear
> >>>   The parameter "facet.sort=" is empty, only using parameter "sort=".
> >>>
> >>> 2) Fields for faceting:
> >>>   f_dcperson, f_dcsubject, f_dcyear, f_dccollection, f_dclang,
> >>>   f_dctypenorm, f_dccontenttype
> >>>   Other faceting parameters:
> >>>   ...&facet=true&facet.mincount=1&facet.limit=100&facet.sort=&facet.prefix=&...
> >>>
> >>> 3) The LukeRequestHandler takes too long for my huge index, so this is
> >>>   from the standalone Luke (compiled for solr3.2):
> >>>   f_dccreator_sort = 10.029.196
> >>>   f_dctitle        = 21.514.939
> >>>   f_dcyear         =      1.471
> >>>   f_dcperson       = 14.138.165
> >>>   f_dcsubject      =  8.012.319
> >>>   f_dccollection   =      1.863
> >>>   f_dclang         =        299
> >>>   f_dctypenorm     =         14
> >>>   f_dccontenttype  =        497
> >>>
> >>> numDocs:      28.940.964
> >>> numTerms:    686.813.235
> >>> optimized:    true
> >>> hasDeletions: false
> >>>
> >>> What can you read/calculate from these values?
> >>>
> >>> Is my index too big for Lucene/Solr?
> >>>
> >>> What I don't understand is why the fieldCache is not garbage collected
> >>> and therefore reduced in size from time to time.
> >>>
> >>> Regards
> >>> Bernd
> >>>
> >>> On 15.06.2011 17:50, Erick Erickson wrote:
> 
>  The first question I have is whether you're sorting and/or
>  faceting on many unique string values? I'm guessing
>  that sometimes you are. So, some questions to help
>  pin it down:
>  1> what fields are you sorting on?
>  2> what fields are you faceting on?
>  3> how many unique terms are in each (see the solr admin page)?
> 
>  Best
>  Erick
> 
>  On Wed, Jun 15, 2011 at 8:22 AM, Bernd Fehling wrote:
> >
> > Dear list,
> >
> > after getting an OOM exception after one week of operation with
> > solr 3.2, I used MemoryAnalyzer on the heap dump file.
> > It looks like the field

Re: Weird optimize performance degradation

2011-06-22 Thread Santiago Bazerque
Thanks for your answers Erick & Mohammad!

I'll get back to the list if I have more specific info about this issue; so
far the index is performing normally again.

Best,
Santiago

On Mon, Jun 20, 2011 at 9:29 AM, Erick Erickson wrote:

> Hmmm, that is odd, anyone else want to chime in here?
>
> But optimizing isn't going to help with the strange commit
> times; it'll only make them worse. It's not doing you much if
> any good, so I'd think about not optimizing.
>
> About the commit times in general. Depending upon when the
> merge happens, lots of work can go on under the covers.
>
> Here's a detailed look at merging...
> http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
>
> But the short form is that, depending upon the number of
> segments and the merge policy, you may periodically hit
> a commit that copies, perhaps,  #all# the current segments
> into a single segment, which will create a large pause.
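For reference, a hedged sketch of the knob behind that cascade at the Lucene 3.1 level (Solr normally sets this from solrconfig.xml rather than in code; the values below are just the defaults, shown for illustration):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.LogByteSizeMergePolicy;
    import org.apache.lucene.util.Version;

    public class MergeConfigSketch {
        public static IndexWriterConfig config() {
            // With mergeFactor=10 (the default), roughly every 10th flush
            // triggers a merge, and every 10th of those a bigger one, and so
            // on -- which is where the occasional very long pause comes from.
            LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
            mp.setMergeFactor(10);
            return new IndexWriterConfig(Version.LUCENE_31,
                    new StandardAnalyzer(Version.LUCENE_31)).setMergePolicy(mp);
        }
    }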
>
> But it's always possible that something's wonky with documents
> that have a very large number of fields.
>
> There's some interesting work being done on trunk to flatten
> out this curve, but that's not going to do you much good
> in the 3.x code line...
>
> Best
> Erick
>
> On Sun, Jun 19, 2011 at 10:32 AM, Santiago Bazerque wrote:
> > Hello Erick, thanks for your answer!
> >
> > Yes, our over-optimization is mainly due to paranoia over these strange
> > commit times. The long optimize time persisted in all the subsequent
> > commits, and this is consistent with what we are seeing in other
> production
> > indexes that have the same problem. Once the anomaly shows up, it never
> > commits quickly again.
> >
> > I combed through the last 50k documents that were added before the first
> > slow commit. I found one with a larger than usual number of fields (didn't
> > write down the number, but it was a few thousand).
> >
> > I deleted it, and the following optimize was normal again (110 seconds).
> So
> > I'm pretty sure a document with lots of fields is the cause of the
> slowdown.
> >
> > If that would be useful, I can do some further testing to confirm this
> > hypothesis and send the document to the list.
> >
> > Thanks again for your answer.
> >
> > Best,
> > Santiago
> >
> > On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> First, there's absolutely no reason to optimize this often, if at all.
> >> Older versions of Lucene would search faster on an optimized index, but
> >> this is no longer necessary. Optimize will reclaim space from deleted
> >> documents, but it is generally recommended to be performed fairly rarely,
> >> often at off-peak hours.
> >>
> >> Note that optimize will re-write your entire index into a single new
> >> segment,
> >> so following your pattern it'll take longer and longer each time.
> >>
> >> But the speed change happening at 500,000 documents is suspiciously
> >> close to the default mergeFactor of 10 X 50,000. Do subsequent
> >> optimizes (i.e. on the 750,000th document) still take that long? But
> >> this doesn't make sense because if you're optimizing instead of
> >> committing, each optimize should reduce your index to 1 segment and
> >> you'll never hit a merge.
> >>
> >> So I'm a little confused. If you're really optimizing every 50K docs,
> what
> >> I'd expect to see is successively longer times, and at the end of each
> >> optimize I'd expect there to be only one segment in your index.
> >>
> >> Are you sure you're not just seeing successively longer times on each
> >> optimize and just noticing it after 10?
> >>
> >> Best
> >> Erick
> >>
> >> On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque wrote:
> >> > Hello!
> >> >
> >> > Here is a puzzling experiment:
> >> >
> >> > I build an index of about 1.2MM documents using SOLR 3.1. The index has a
> >> > large number of dynamic fields (about 15.000). Each document has about 100
> >> > fields.
> >> >
> >> > I add the documents in batches of 20, and every 50.000 documents I optimize
> >> > the index.
> >> >
> >> > The first 10 optimizes (up to exactly 500k documents) take less than a
> >> > minute and a half.
>

Re: Weird optimize performance degradation

2011-06-19 Thread Santiago Bazerque
Hello Erick, thanks for your answer!

Yes, our over-optimization is mainly due to paranoia over these strange
commit times. The long optimize time persisted in all the subsequent
commits, and this is consistent with what we are seeing in other production
indexes that have the same problem. Once the anomaly shows up, it never
commits quickly again.

I combed through the last 50k documents that were added before the first
slow commit. I found one with a larger than usual number of fields (didn't
write down the number, but it was a few thousand).

I deleted it, and the following optimize was normal again (110 seconds). So
I'm pretty sure a document with lots of fields is the cause of the slowdown.
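In case it is useful to anyone hitting the same thing, a hedged SolrJ 3.x sketch of how one could scan for such field-heavy documents (the URL, the id field, the threshold and the page size are assumptions, and it only sees stored fields):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class FindFieldHeavyDocs {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            int pageSize = 1000;
            for (int start = 0; ; start += pageSize) {
                SolrQuery q = new SolrQuery("*:*");
                q.setStart(start);
                q.setRows(pageSize);
                SolrDocumentList page = server.query(q).getResults();
                if (page.isEmpty()) break;
                for (SolrDocument doc : page) {
                    int n = doc.getFieldNames().size();
                    if (n > 1000) { // "a few thousand fields" -- threshold assumed
                        System.out.println(doc.getFieldValue("id") + ": " + n + " fields");
                    }
                }
            }
        }
    }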

If that would be useful, I can do some further testing to confirm this
hypothesis and send the document to the list.

Thanks again for your answer.

Best,
Santiago

On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson wrote:

> First, there's absolutely no reason to optimize this often, if at all.
> Older versions of Lucene would search faster on an optimized index, but
> this is no longer necessary. Optimize will reclaim space from deleted
> documents, but it is generally recommended to be performed fairly rarely,
> often at off-peak hours.
>
> Note that optimize will re-write your entire index into a single new
> segment,
> so following your pattern it'll take longer and longer each time.
>
> But the speed change happening at 500,000 documents is suspiciously
> close to the default mergeFactor of 10 X 50,000. Do subsequent
> optimizes (i.e. on the 750,000th document) still take that long? But
> this doesn't make sense because if you're optimizing instead of
> committing, each optimize should reduce your index to 1 segment and
> you'll never hit a merge.
>
> So I'm a little confused. If you're really optimizing every 50K docs, what
> I'd expect to see is successively longer times, and at the end of each
> optimize I'd expect there to be only one segment in your index.
>
> Are you sure you're not just seeing successively longer times on each
> optimize and just noticing it after 10?
>
> Best
> Erick
>
> On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque wrote:
> > Hello!
> >
> > Here is a puzzling experiment:
> >
> > I build an index of about 1.2MM documents using SOLR 3.1. The index has a
> > large number of dynamic fields (about 15.000). Each document has about 100
> > fields.
> >
> > I add the documents in batches of 20, and every 50.000 documents I optimize
> > the index.
> >
> > The first 10 optimizes (up to exactly 500k documents) take less than a
> > minute and a half.
> >
> > But the 11th and all subsequent commits take north of 10 minutes. The commit
> > logs look identical (in the INFOSTREAM.txt file), but what used to be
> >
> >   Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene Merge
> > Thread #0]: merge: total 50 docs
> >
> > Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge
> > Thread #0]: merge store matchedCount=2 vs 2
> >
> >
> > now eats a lot of time:
> >
> >
> >   Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene Merge
> > Thread #0]: merge: total 55 docs
> >
> > Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge
> > Thread #0]: merge store matchedCount=2 vs 2
> >
> >
> > What could be happening between those two lines that takes 10 minutes at
> > full CPU? (And with 50k fewer docs it used to take so much less.)
> >
> >
> > Thanks in advance,
> >
> > Santiago
> >
>


Weird optimize performance degradation

2011-06-19 Thread Santiago Bazerque
Hello!

Here is a puzzling experiment:

I build an index of about 1.2MM documents using SOLR 3.1. The index has a
large number of dynamic fields (about 15.000). Each document has about 100
fields.

I add the documents in batches of 20, and every 50.000 documents I optimize
the index.
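Roughly, the indexing loop looks like this hedged SolrJ 3.1 sketch (the URL, field names and document contents are placeholders, not the real data):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 1; i <= 1200000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", i);
                doc.addField("attr_" + (i % 15000) + "_s", "value"); // one of ~15.000 dynamic fields
                batch.add(doc);
                if (batch.size() == 20) {     // documents are added in batches of 20
                    server.add(batch);
                    batch.clear();
                }
                if (i % 50000 == 0) {         // optimize every 50.000 documents
                    server.optimize();
                }
            }
            server.commit();
        }
    }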

The first 10 optimizes (up to exactly 500k documents) take less than a
minute and a half.

But the 11th and all subsequent commits take north of 10 minutes. The commit
logs look identical (in the INFOSTREAM.txt file), but what used to be

   Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene Merge
Thread #0]: merge: total 50 docs

Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge
Thread #0]: merge store matchedCount=2 vs 2


now eats a lot of time:


   Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene Merge
Thread #0]: merge: total 55 docs

Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge
Thread #0]: merge store matchedCount=2 vs 2


What could be happening between those two lines that takes 10 minutes at
full CPU? (And with 50k fewer docs it used to take so much less.)


Thanks in advance,

Santiago


Large number of dynamic fields

2011-06-02 Thread Santiago Bazerque
Hello,

I have a 7GB index with 2MM documents. Each document has about 400 fields,
but the fields are dynamic and in total there are ~200k distinct fields.

We're using SOLR 3.1 and tomcat 5.5. We are seeing very slow start-up times
(from tomcat startup to SOLR being ready to answer queries takes about 5
minutes). We have tried from 8 to 32GB of memory, with little difference.

Would you say SOLR is not suitable for such a large number of fields?
Committing ~10k docs takes about 5 minutes as well.

Thanks in advance,
Santiago


Slow, CPU-bound commit

2011-05-09 Thread Santiago Bazerque
Hello,

I am using the new SOLR 3.1 for a 2.6GB, 1MM-document index. Reading the
forums and the archive I learned that SOLR and Lucene now manage commits and
transactions a bit differently than in previous versions, and indeed I feel
the behavior has changed.

Here's the thing: committing a few hundred documents is consistently taking
about 12 minutes of pure CPU fury on a 6168 AMD Opteron processor.

Here is a log of the commit (taken from INFOSTREAM.txt):

http://pastebin.com/1rFK3Fs1

These numbers improved significantly after we increased ramBufferSizeMB to
1024, here is the full solrconfig.xml:

http://pastebin.com/M1Tw0ATe
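For what it's worth, my understanding (hedged, not verified against the Solr source) is that ramBufferSizeMB is handed straight to Lucene's IndexWriter RAM buffer; at the Lucene 3.1 level the equivalent would be something like:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.util.Version;

    public class RamBufferSketch {
        public static IndexWriterConfig config() {
            // Buffer up to ~1024 MB of added documents in RAM before flushing
            // a new segment; larger buffers mean fewer, bigger flushes.
            return new IndexWriterConfig(Version.LUCENE_31,
                    new StandardAnalyzer(Version.LUCENE_31))
                    .setRAMBufferSizeMB(1024.0);
        }
    }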

Do these numbers look normal to you? The index is being used for searching
while the commit takes place (about 1 search per second).

Thanks in advance,
Santiago