[ 
https://issues.apache.org/jira/browse/OAK-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810698#comment-13810698
 ] 

Alexander Klimetschek edited comment on OAK-1133 at 10/31/13 8:59 PM:
----------------------------------------------------------------------

*Clustered/external events* are a somewhat separate topic. I totally agree that 
you want to avoid them. AFAICS most events can be handled locally. In my 
experience most application use cases require local handling anyway, since 
they depend on other things that are local to the instance, most commonly a 
sticky session from the web server that ensures users see the data as quickly 
as possible.

But I still think an enormous amount of work is being wasted. Look at *actual 
listeners in applications that register broadly for all events*: all the 
eventing/threading machinery has to run just for the listener to figure out, in 
80% of the cases, that the event can be discarded. And the listener has to read 
the repository data again in a separate session just to do that check. *This is 
not about convenience, it is about reducing unnecessary work*.

Now the same principle can be used for cluster events, if you need them. I 
don't think an external mechanism such as JMS would really help, as you would 
add an extra data stream between instances that would need to send *all 
events*, since you cannot know whether a listener on a target instance is 
interested in a given event or not. (You cannot assume that the listener code 
is shared and that the specific registration happens on all cluster nodes, 
which is what you would need in order to filter events down to just the ones 
for which there are listeners on the other nodes.)

A cluster sync already happens between instances, and once a change arrives 
on the target instance, the same approach as proposed here could apply: the 
registered filters would run (not sure if the Oak {{Observer}} as a hook works 
here) and trigger (local) events only if needed. Of course those *listener 
registrations would include whether they care about local or external events*. 
Again, this differs from today, where you have to look at the event as it 
arrives in the listener to decide "oh this was external, no I don't care" after 
having already wasted precious resources on that event. By default a listener 
would not register for external events; doing so would require a very 
deliberate extra step in the registration API, to discourage accidentally 
registering for them, based on the experience that maybe 98% of observation use 
cases don't need external events.
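
To make the registration idea above more concrete, here is a minimal sketch in plain Java. All names in it ({{Origin}}, {{Change}}, {{FilteredListenerRegistry}}, {{registerIncludingExternal}}) are hypothetical and exist neither in Oak nor in JCR. It only illustrates the two points made above: the filter and the local/external scope are evaluated before a listener is invoked, and external events require an explicit opt-in. Dispatch is synchronous here for brevity; an asynchronous listener would queue the change instead of calling the listener directly.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical: where a change originated relative to this instance. */
enum Origin { LOCAL, EXTERNAL }

/** Hypothetical change record, as it might be seen at commit or cluster-sync time. */
final class Change {
    final String path;
    final Origin origin;

    Change(String path, Origin origin) {
        this.path = path;
        this.origin = origin;
    }
}

/** Hypothetical registry; an illustration only, not an Oak or JCR API. */
final class FilteredListenerRegistry {

    private static final class Registration {
        final Set<Origin> origins;
        final Predicate<Change> filter;
        final Consumer<Change> listener;

        Registration(Set<Origin> origins, Predicate<Change> filter, Consumer<Change> listener) {
            this.origins = origins;
            this.filter = filter;
            this.listener = listener;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();

    /** Default registration: the listener only ever sees local changes. */
    void register(Predicate<Change> filter, Consumer<Change> listener) {
        registrations.add(new Registration(EnumSet.of(Origin.LOCAL), filter, listener));
    }

    /** Deliberate extra step: the listener also sees external (cluster) changes. */
    void registerIncludingExternal(Predicate<Change> filter, Consumer<Change> listener) {
        registrations.add(new Registration(EnumSet.allOf(Origin.class), filter, listener));
    }

    /**
     * Runs scope and filter checks first and invokes a listener only on a match.
     * Returns the number of listeners that were notified.
     */
    int dispatch(Change change) {
        int delivered = 0;
        for (Registration r : registrations) {
            if (r.origins.contains(change.origin) && r.filter.test(change)) {
                r.listener.accept(change);
                delivered++;
            }
        }
        return delivered;
    }
}
```

With this shape, a change that no registration is interested in costs one scope check and one filter evaluation per registration, and no event object is ever handed to a listener thread.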


was (Author: alexander.klimetschek):
Clustered/external events is a somewhat separate topic. I totally agree that 
you want to avoid them. AFAICS most events can be handled locally - from my 
experience most application use cases require local handling anyway, since 
there are other things local to the instance that you depend on, mostly a 
sticky session from the web server to ensure users see the data as quickly as 
possible.

But still I think an enormous waste is going on if you look at actual listeners 
in applications that registers broadly for all events and you need all the 
eventing/threading going on, just to figure out in 80% of the cases that this 
event can be discarded by the listener.  And the listener has to read the 
repository data again in a separate session just to do the check. This is not 
convenience, this is reducing unnecessary work.

Now the same principle can be used for cluster events, if you need them: I 
don't think an external mechanism such as JMS would really help, as you would 
add an extra data stream between instances that would need to send *all 
events*, since you cannot know if a listener on a target instance is interested 
in it or not (you cannot assume the listener code is shared and the specific 
registration happens on all cluster nodes, allowing you to filter out events to 
just the ones for which there are listeners on the other nodes).

There already is a cluster sync happening between instances and once it arrives 
on the target instance, the same approach as proposed here would happen: those 
registered filters would run (not sure if the oak {{Observer}} as hook works 
here) and only send out events if needed. Of course those registrations/filters 
would include whether they care about local or external events. Again different 
from today where you have to look at the event as it arrives in the listener to 
decide "oh this was external, no I don't care" after having already wasted 
precious resources for that event. And by default a listener would not register 
for external events, it would need to be a very dedicated extra step in the 
registration API to do so, to discourage accidentally registering for them, 
based on the experience that maybe 98% of observation use cases don't need 
external events.

> Observation listener PLUS
> -------------------------
>
>                 Key: OAK-1133
>                 URL: https://issues.apache.org/jira/browse/OAK-1133
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: commons, jcr
>            Reporter: Alexander Klimetschek
>              Labels: performance
>
> Oak should provide an *extended and efficient JCR observation listener* 
> mechanism to support common use cases not handled well by the restricted 
> options of the JCR observation (only base path, node types and raw events). 
> Those cases require listeners to register much more broadly and then filter 
> out their specific cases themselves, thus putting too many events into the 
> observation system and creating a huge overhead due to asynchronous access to 
> the modified JCR data to do the filtering. This easily becomes a big 
> performance bottleneck when there are many writes and thus many events.
> See the previous discussions [on the 
> list|http://markmail.org/message/oyq7fnfrveceemoh] and in OAK-1120.
> The goals should be:
> * performance: handle filtering as early as possible, during the commit, 
> where access to the modified data is already present
> * provide robust implementation for typical filtering cases
> * provide an asynchronous listener mechanism as in JCR
> * minimize the effect on the lower levels of Oak (a visible addition in 
> oak-commons or oak-jcr should be enough)
> * for delete events, allow filtering on the to-be-deleted data (currently not 
> possible in JCR listeners, which run after the fact)
> * if possible: design as an extension of JCR observation to simplify 
> migration for existing code
> * if possible: provide an intelligent listener that can work with pure JCR 
> (aka Jackrabbit 2) as well, by falling back to in-listener-filtering
> * maybe: a synchronous option using the same simple interface (instead of 
> using raw Oak plugins directly); however, not sure if there is a benefit if 
> such listeners can only read data and not change or block the session commit
> Typical filtering cases:
> - paths with globbing support (for example /content/foo/*/something)
> - check for property values (equal, not equal, contains etc.), most 
> importantly sling:resourceType in Sling apps
> - allow checking properties on child nodes as well, typically jcr:content
> - node types (already in JCR observation)
> - created/modified/deleted events, separate from move/copy
> - and more... a custom filter should be possible to pass through (with 
> similar access as the {{Observer}})
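
As an illustration of the first two filtering cases listed above (path globbing and property checks), here is a small self-contained sketch in plain Java. The types {{NodeChange}} and {{ChangeFilters}} are hypothetical and not part of Oak or JCR, and the glob semantics are an assumption: {{*}} is taken to match exactly one non-empty path segment. The resource type value {{my/app/page}} in the usage below is made up.

```java
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical view of a changed node: its path plus its (string) properties. */
final class NodeChange {
    final String path;
    final Map<String, String> properties;

    NodeChange(String path, Map<String, String> properties) {
        this.path = path;
        this.properties = properties;
    }
}

/** Hypothetical filter factory; an illustration only, not an Oak or JCR API. */
final class ChangeFilters {

    private ChangeFilters() {
    }

    /** Matches paths against a glob in which '*' stands for one non-empty path segment. */
    static Predicate<NodeChange> pathGlob(String glob) {
        StringBuilder regex = new StringBuilder();
        boolean first = true;
        // Split on '*' (keeping trailing empty parts) and quote the literal pieces.
        for (String literal : glob.split("\\*", -1)) {
            if (!first) {
                regex.append("[^/]+");
            }
            regex.append(Pattern.quote(literal));
            first = false;
        }
        Pattern pattern = Pattern.compile(regex.toString());
        return change -> pattern.matcher(change.path).matches();
    }

    /** Matches nodes whose property 'name' is present and equal to 'expected'. */
    static Predicate<NodeChange> propertyEquals(String name, String expected) {
        return change -> expected.equals(change.properties.get(name));
    }
}
```

Because the filters are ordinary predicates, they compose with {{and}}, {{or}} and {{negate}}, for example {{ChangeFilters.pathGlob("/content/foo/*/something").and(ChangeFilters.propertyEquals("sling:resourceType", "my/app/page"))}}. A custom filter, as mentioned in the last list item, would simply be another predicate.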



--
This message was sent by Atlassian JIRA
(v6.1#6144)
