Stefan Egli created OAK-2683:
--------------------------------

             Summary: the "hitting the observation queue limit" problem
                 Key: OAK-2683
                 URL: https://issues.apache.org/jira/browse/OAK-2683
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: core, mongomk, segmentmk
            Reporter: Stefan Egli
             Fix For: 1.3.0


There are several tickets in this area:
* OAK-2587: observation threading being too eager, causing the observation 
queue to grow
* OAK-2669: avoiding diffing from mongo by using persistent cache instead.
* OAK-2349: possibly a duplicate of, or at least similar to, OAK-2669
* OAK-2562: diffcache is inefficient

Still, I think it makes sense to create this summarizing ticket to describe 
again what happens when the observation queue hits its limit - and eventually 
how this can be improved.

Consider the following scenario (also compare with OAK-2587, which focused 
more on the eagerness of threading):
* the rate of incoming commits is high and generates many changes for the 
observation queues, so those queues become somewhat filled/loaded
* depending on the underlying nodestore, calculating the diffs is more or less 
expensive - but at least for mongomk it is important that the diff can be 
served from the cache
** in the case of mongomk it can happen that diffs are no longer found in the 
cache and thus require a round-trip to mongo - which is orders of magnitude 
slower than the cache of course. Dequeuing then becomes slower, so the queue 
grows even faster.
** not sure about tarmk - I believe it should always be fast there
* so based on the above, there can be a situation where the queue grows and 
hits the configured limit
* if this limit is reached, the current mechanism is to collapse any subsequent 
change into one big change marked as an external event - let's call this a 
collapsed-change (see the sketch after this list)
* this collapsed-change becomes part of the normal queue and eventually 'walks 
down the queue' to be processed normally - so there is a high chance that yet 
another collapsed-change is created should the queue hit the limit again. This 
game can go on for a while, leaving the queue containing many/mostly such 
collapsed-changes.
* there is an additional assumption that diffing such collapsed-changes is more 
expensive than normal diffing - plus it is almost guaranteed that the diff 
cannot be shared between observation listeners, since the exact 'collapse 
borders' depend on the timing of each listener's queue - i.e. the collapsed 
diffs are unique and thus not cacheable.
* as a result: once you have those collapsed diffs you can hardly get rid of 
them - they are heavy to process, hence dequeuing is very slow
* at the same time, a typical system almost always has some commits happening 
- e.g. with Sling on top you have Sling discovery doing heartbeats every now 
and then - so there are always new commits adding to the load.
* this creates a situation where even a small additional commit rate can keep 
all the queues filled - because the queues are full of 'heavy collapsed diffs' 
that have to be calculated for each and every listener (of which you could 
have e.g. 150-200) individually.
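
A minimal sketch of the collapse-on-full behaviour described above - purely 
illustrative, not Oak's actual BackgroundObserver code; all class and field 
names here are made up. Each listener has its own bounded queue, and once the 
limit is hit, further commits are merged into a single entry that can only be 
reported as an external change spanning an arbitrary revision range:

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: per-listener bounded queue that collapses
// overflowing commits into one "external" entry once the limit is hit.
class ListenerQueueSketch {

    // Illustrative change record: a before/after revision pair.
    static final class Change {
        final String before;
        final String after;
        final boolean collapsed; // true = "one big change marked as external"

        Change(String before, String after, boolean collapsed) {
            this.before = before;
            this.after = after;
            this.collapsed = collapsed;
        }
    }

    private final Deque<Change> queue = new ArrayDeque<>();
    private final int limit;

    ListenerQueueSketch(int limit) {
        this.limit = limit;
    }

    synchronized void contentChanged(String before, String after) {
        if (queue.size() < limit) {
            // normal case: enqueue the individual change
            queue.addLast(new Change(before, after, false));
        } else {
            // queue limit hit: merge this commit into the tail entry.
            // The resulting diff spans (tail.before .. after) and its
            // boundaries depend on this listener's timing, so it is
            // effectively unique and cannot be served from a shared cache.
            Change tail = queue.removeLast();
            queue.addLast(new Change(tail.before, after, true));
        }
    }

    synchronized Change poll() {
        return queue.pollFirst();
    }
}
{code}

With e.g. 150-200 listeners each holding such a queue, every collapsed entry 
ends up triggering its own uncacheable diff, which is what keeps dequeuing 
slow even after the original commit burst is over.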

So again, possible solutions for this:
* OAK-2669: tune diffing via persistent cache
* OAK-2587: have more threads to remain longer 'in the cache zone'
* tune your input speed explicitly to avoid filling the observation queues 
(this would be specific to your use-case of course, but can be seen as 
explicitly throttling on the input side)
* increase the relevant caches to the max
* but I think we will come up with a broader improvement of this observation 
queue limit problem, by either
** doing flow control - e.g. via the commit rate limiter (also see OAK-1659) - 
see the sketch after this list
** moving the handling of observation changes out to a messaging subsystem - be 
it to handle local events only (since handling external events makes the system 
problematic wrt scalability if not done right) - also see the [corresponding 
suggestion on dev list|http://markmail.org/message/b5trr6csyn4zzuj7]
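
To make the flow-control option a bit more concrete, here is a purely 
illustrative sketch - the names below are assumptions, not Oak API; the real 
hook would be the commit rate limiter discussed in OAK-1659. The idea is to 
briefly delay commits while the observation queues are above a high-water 
mark, so the queues get a chance to drain before they ever hit the hard limit:

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: delay commits while the observation queues are
// above a high-water mark, instead of letting them hit the hard limit.
class CommitThrottleSketch {

    private final AtomicInteger queuedChanges = new AtomicInteger();
    private final int highWaterMark;
    private final long delayMillis;
    private final long maxWaitMillis;

    CommitThrottleSketch(int highWaterMark, long delayMillis, long maxWaitMillis) {
        this.highWaterMark = highWaterMark;
        this.delayMillis = delayMillis;
        this.maxWaitMillis = maxWaitMillis;
    }

    // Called by the observation side when a change is queued / dequeued.
    void changeQueued()   { queuedChanges.incrementAndGet(); }
    void changeDequeued() { queuedChanges.decrementAndGet(); }

    // Called before a commit is applied; backs off (bounded) under load.
    void beforeCommit() throws InterruptedException {
        long waited = 0;
        while (queuedChanges.get() > highWaterMark && waited < maxWaitMillis) {
            TimeUnit.MILLISECONDS.sleep(delayMillis);
            waited += delayMillis;
        }
    }
}
{code}

This is essentially the explicit throttling on the input side mentioned above, 
just applied generically in the repository rather than per use-case.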


