On Mon, Feb 07, 2011 at 02:06:00PM +0100, Markus Jelsma said:
> Heap usage can spike after a commit. Existing caches are still in use and new 
> caches are being generated and/or auto warmed. Can you confirm this is the 
> case?

We see spikes after replication which I suspect is, as you say, because 
of the ensuing commit.

What we seem to have found is that when we weren't using the Concurrent 
GC stop-the-world gc runs would kill the app. Now that we're using CMS 
we occasionally find ourselves in situations where the app still has 
memory "left over" but the load on the machine spikes, the GC duty cycle 
goes to 100 and the app never recovers.

Restarting usually helps but sometimes we have to take the machine out 
of the laod balancer, wait for a number of minutes and then out it back 
in.

We're working on two hypotheses 

Firstly - we're CPU bound somehow and that at some point we cross some 
threshhold and GC or something else is just unable to to keep up. So 
whilst it looks like instantaneous death of the app it's actually 
gradual resource exhaustion where the definition of 'gradual' is 'a very 
short period of time' (as opposed to some cataclysmic infinite loop bug 
somewhere).

Either that or ... Secondly - there's some sort of Query Of Death that 
kills machines. We just haven't found it yet, even when replaying logs. 

Or some combination of both. Or other things. It's maddeningly 
frustrating.

We're also got to try deploying a custom solr.war and try using the 
MMapDirectory to see if that helps with anything.





Reply via email to