Re: Efficiently process observation event for local changes
fwiw: I think separating queues for listeners interested in local events from a queue for listeners interested in global events is a very promising approach.

Cheers
Michael

On 23 Mar 2015, at 16:03, Chetan Mehrotra chetan.mehro...@gmail.com wrote:

> After discussing this further with Marcel and Michael we came to the conclusion that we can achieve similar performance by making use of the persistent cache for storing the diff. This would require a slight change in the way we interpret the diff JSOP. It should not require any change in the current logic related to observation event generation.
>
> Opened OAK-2669 to track that.
>
> One thing that we might still want to do is to use separate queue sizes for listeners interested in local events only and those which can work with external events. On a system like AEM there are 180 listeners which listen for external changes and ~20 which only listen to local changes. So it makes sense to have bigger queues for such listeners.
>
> Chetan Mehrotra
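The idea of giving local-only listeners their own, larger queues can be sketched roughly as follows. This is a minimal illustration, not Oak code: the class names `ListenerQueues`, `ListenerQueue` and `Change`, the `external` flag, and both capacity values are assumptions made up for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch; none of these types are part of the Oak API. */
public class ListenerQueues {

    /** A change as seen by the dispatcher; 'external' marks changes made on another cluster node. */
    static final class Change {
        final String revision;
        final boolean external;

        Change(String revision, boolean external) {
            this.revision = revision;
            this.external = external;
        }
    }

    /** One bounded queue per listener; the capacity depends on what the listener is interested in. */
    static final class ListenerQueue {
        final boolean localOnly;
        final BlockingQueue<Change> queue;

        ListenerQueue(boolean localOnly, int capacity) {
            this.localOnly = localOnly;
            this.queue = new ArrayBlockingQueue<>(capacity);
        }

        /** Returns false only when the change had to be dropped because the queue is full. */
        boolean offer(Change change) {
            if (localOnly && change.external) {
                // Filtered before queuing, so it never occupies a slot.
                return true;
            }
            return queue.offer(change);
        }
    }

    /** Assumed capacities, chosen for illustration only. */
    static ListenerQueue forListener(boolean localOnly) {
        return new ListenerQueue(localOnly, localOnly ? 10_000 : 1_000);
    }

    public static void main(String[] args) {
        ListenerQueue localListener = forListener(true);
        localListener.offer(new Change("r1", true));   // external change, skipped
        localListener.offer(new Change("r2", false));  // local change, queued
        System.out.println("queued for local-only listener: " + localListener.queue.size());
    }
}
```

Because external changes never enter a local-only queue, the larger capacity is spent entirely on local changes.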
Re: Efficiently process observation event for local changes
Related to this, I've created https://issues.apache.org/jira/browse/OAK-2683 which is about an issue that happens when the observation queue limit is reached.

Cheers,
Stefan

On 3/23/15 4:03 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote:

> After discussing this further with Marcel and Michael we came to the conclusion that we can achieve similar performance by making use of the persistent cache for storing the diff. This would require a slight change in the way we interpret the diff JSOP. It should not require any change in the current logic related to observation event generation.
>
> Opened OAK-2669 to track that.
>
> One thing that we might still want to do is to use separate queue sizes for listeners interested in local events only and those which can work with external events. On a system like AEM there are 180 listeners which listen for external changes and ~20 which only listen to local changes. So it makes sense to have bigger queues for such listeners.
>
> Chetan Mehrotra
Re: Efficiently process observation event for local changes
On 23.3.15 5:04, Chetan Mehrotra wrote:

> B - Proposed Changes
> ---
>
> 1. Move the notion of listening to local events to Observer level - So upon any new change detected we only push the change to a given queue if it is local and the bound listener is only interested in local changes. Currently we push all changes, which later get filtered out. Instead we would avoid doing that at the first level itself and keep the queue content limited to local changes only.

I think there is no change needed in the Observer API itself as you can already figure out from the passed CommitInfo whether a commit is external or not. BTW please take care with the term local as there is also the concept of session local commits.

> 2. Attach the calculated diff as part of the commit info which is attached to the given change. This would eliminate the chance of a cache miss altogether and would ensure observation is not delayed due to slow processing of the diff. This can be done on a best effort basis: if the diff is too large then we do not attach it and in that case we diff again.
>
> 3. For listeners which are only interested in local events we can use a different queue size limit, i.e. allow larger queues for such listeners.
>
> Later we can also look into using a journal (or persistent queue) for local event processing.

Definitely something to try out. A few points to consider:

* There doesn't seem to be too much of a difference to me whether this is routed via a cache or directly attached to commits. Either way it adds additional memory requirements and churn, which need to be managed.

* When introducing persisted queuing we need to be careful not to just move the bottleneck to IO.

* An eventual implementation should not break the fundamental design. Either hide it in the implementation or find a clean way to put this into the overall design.

Michael
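Proposal 2 above (attach the diff on a best effort basis, diff again otherwise) could look roughly like the sketch below. It is only an illustration under stated assumptions: `BestEffortDiff`, `CommitRecord`, the size threshold and the `recompute` callback are hypothetical names and values, not part of Oak's CommitInfo.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of best-effort diff attachment; not an Oak API. */
public class BestEffortDiff {

    /** Assumed limit above which a diff is not attached to the commit. */
    static final int MAX_ATTACHED_DIFF_CHARS = 16 * 1024;

    /** Commit metadata carrying the diff only when it was small enough. */
    static final class CommitRecord {
        final String revision;
        final String attachedDiff; // null when the diff exceeded the limit

        CommitRecord(String revision, String attachedDiff) {
            this.revision = revision;
            this.attachedDiff = attachedDiff;
        }
    }

    /** Attaches the diff if it fits under the limit, otherwise leaves it out. */
    static CommitRecord record(String revision, String diff) {
        String attached = diff.length() <= MAX_ATTACHED_DIFF_CHARS ? diff : null;
        return new CommitRecord(revision, attached);
    }

    /** Uses the attached diff when present and falls back to recomputing it otherwise. */
    static String diffFor(CommitRecord commit, Supplier<String> recompute) {
        return commit.attachedDiff != null ? commit.attachedDiff : recompute.get();
    }

    public static void main(String[] args) {
        CommitRecord small = record("r1", "^\"/a/b\":{}");
        String diff = diffFor(small, () -> {
            throw new IllegalStateException("recompute should not be needed for a small diff");
        });
        System.out.println(diff);
    }
}
```

As noted in the reply, the attached diff still costs memory for as long as the commit sits in a queue, so the threshold trades memory churn against the cost of diffing again.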
Re: Efficiently process observation event for local changes
On 23.3.15 11:03, Stefan Egli wrote:

> Going one step further we could also discuss completely moving the handling of the 'observation queues' to an actual messaging system. Whether this would be embedded in an oak instance or whether it would be shared between instances in an oak cluster might be a different question (the embedded variant would have less implication on the overall oak model, esp. also timing-wise). But the observation model quite exactly matches the publish-subscribe semantics - it actually matches pub-sub more than it fits into the 'cache semantics' to me.

Definitely something to try out, given someone finds the time for it. ;-) Mind you that some time ago I implemented persisting events to Apache Kafka [1], which wasn't greeted with great enthusiasm though...

OTOH the same concern regarding pushing the bottleneck to IO applies here. Furthermore, filtering the persisted events through access control is something we have yet to figure out, as AC is a) session scoped and b) depends on the tree hierarchy.

Michael

[1] https://github.com/mduerig/oak-kafka

> .. just saying ..
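To make the publish-subscribe reading of observation concrete, here is a minimal in-memory sketch. All names (`ObservationBus`, `Event`, `Subscriber`) are hypothetical, and the per-subscriber `Predicate` is only a stand-in for the session-scoped access control check mentioned above, which is applied at delivery time because it differs per subscriber.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical in-memory pub-sub sketch; not an Oak or Kafka API. */
public class ObservationBus {

    /** A published observation event, identified here only by path and origin. */
    static final class Event {
        final String path;
        final boolean external;

        Event(String path, boolean external) {
            this.path = path;
            this.external = external;
        }
    }

    /** A subscriber with its own filter, standing in for a session-scoped permission check. */
    static final class Subscriber {
        final Predicate<Event> filter;
        final List<Event> delivered = new ArrayList<>();

        Subscriber(Predicate<Event> filter) {
            this.filter = filter;
        }
    }

    private final List<Subscriber> subscribers = new ArrayList<>();

    Subscriber subscribe(Predicate<Event> filter) {
        Subscriber subscriber = new Subscriber(filter);
        subscribers.add(subscriber);
        return subscriber;
    }

    /** Publishes once; every subscriber decides at delivery time whether it may see the event. */
    void publish(Event event) {
        for (Subscriber subscriber : subscribers) {
            if (subscriber.filter.test(event)) {
                subscriber.delivered.add(event);
            }
        }
    }

    public static void main(String[] args) {
        ObservationBus bus = new ObservationBus();
        Subscriber contentOnly = bus.subscribe(e -> e.path.startsWith("/content"));
        bus.publish(new Event("/content/a", false));
        bus.publish(new Event("/system/b", false));
        System.out.println("delivered to content subscriber: " + contentOnly.delivered.size());
    }
}
```

A persisted variant would have to store events before this per-subscriber filtering, which is exactly where the open access control question arises.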
Re: Efficiently process observation event for local changes
Going one step further we could also discuss completely moving the handling of the 'observation queues' to an actual messaging system. Whether this would be embedded in an oak instance or whether it would be shared between instances in an oak cluster might be a different question (the embedded variant would have less implication on the overall oak model, esp. also timing-wise). But the observation model quite exactly matches the publish-subscribe semantics - it actually matches pub-sub more than it fits into the 'cache semantics' to me.

.. just saying ..