Re: Efficiently process observation event for local changes

2015-03-30 Thread Michael Marth
fwiw: I think separating queues for listeners interested in local events from a
queue for listeners interested in global events is a very promising approach.

Cheers
Michael

 On 23 Mar 2015, at 16:03, Chetan Mehrotra chetan.mehro...@gmail.com wrote:
 
 After discussing this further with Marcel and Michael, we came to the conclusion
 that we can achieve similar performance by making use of the persistent cache for
 storing the diff. This would require a slight change in the way we interpret the
 diff JSOP. This should not require any change in the current logic related to
 observation event generation. Opened OAK-2669 to track that.
 
 One thing that we might still want to do is to use separate queue sizes for
 listeners interested in local events only and those which can work with
 external events. On a system like AEM there are 180 listeners which listen for
 external changes and ~20 which only listen to local changes. So it makes sense
 to have bigger queues for the local-only listeners.
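 
 A minimal sketch of the separate queue sizes, assuming hypothetical names
 and limits (ListenerQueues and its two constants are made up for
 illustration and are not existing Oak API):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch; none of these names are Oak APIs.
final class ListenerQueues {
    // Assumed limits, chosen only for illustration.
    static final int EXTERNAL_QUEUE_LIMIT = 1_000;
    static final int LOCAL_ONLY_QUEUE_LIMIT = 10_000;

    private ListenerQueues() {
    }

    /** Returns the queue capacity for a listener, based on what it listens to. */
    static int queueLimitFor(boolean localOnly) {
        return localOnly ? LOCAL_ONLY_QUEUE_LIMIT : EXTERNAL_QUEUE_LIMIT;
    }

    /** Creates a bounded queue sized for the given kind of listener. */
    static <T> BlockingQueue<T> newQueue(boolean localOnly) {
        return new ArrayBlockingQueue<>(queueLimitFor(localOnly));
    }
}
```

 With ~20 local-only listeners against 180 others, the larger limit would
 apply to only a small share of the queues, which keeps the extra memory
 bounded.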
 
 Chetan Mehrotra
 
 On Mon, Mar 23, 2015 at 4:09 PM, Michael Dürig mdue...@apache.org wrote:
 
 
 
 On 23.3.15 11:03 , Stefan Egli wrote:
 
 Going one step further, we could also discuss moving the handling of the
 'observation queues' completely to an actual messaging system.
 Whether this would be embedded in an Oak instance or
 shared between instances in an Oak cluster might be a different question
 (the embedded variant would have fewer implications for the overall Oak
 model, especially timing-wise). But the observation model matches
 publish-subscribe semantics quite exactly - to me it actually matches pub-sub
 better than it fits into the 'cache semantics'.
 
 
 Definitely something to try out, given someone finds the time for it. ;-)
 Mind you that some time ago I implemented persisting events to Apache Kafka
 [1], which wasn't greeted with great enthusiasm though...
 
 OTOH the same concern regarding pushing the bottleneck to IO applies here.
 Furthermore, filtering the persisted events through access control is
 something we have yet to figure out, as AC is a) session scoped and b)
 depends on the tree hierarchy.
 
 Michael
 
 
 [1] https://github.com/mduerig/oak-kafka
 
 
 
 .. just saying ..
 
 On 3/23/15 10:47 AM, Michael Dürig mdue...@apache.org wrote:
 
 
 
 On 23.3.15 5:04 , Chetan Mehrotra wrote:
 
 B - Proposed Changes
 ---
 
 1. Move the notion of listening to local events to the Observer level. For a
 queue whose bound listener is only interested in local changes, we would push
 a newly detected change only if it is local. Currently we push all changes,
 which do get filtered out later; instead we would filter at the first level
 itself and keep the queue content limited to local changes only.
 
 
 I think there is no change needed in the Observer API itself as you can
 already figure out from the passed CommitInfo whether a commit is
 external or not. BTW please take care with the term local as there is
 also the concept of session local commits.
 
 
 2. Attach the calculated diff to the commit info which is attached to
 the given change. This would eliminate the chance of a cache
 miss altogether and would ensure observation is not delayed due to slow
 processing of the diff. This can be done on a best-effort basis: if the diff
 is too large, we do not attach it, and in that case we diff again.
 
 3. For listeners which are only interested in local events we can use a
 different queue size limit, i.e. allow larger queues for such listeners.
 
 Later we can also look into using a journal (or persistent queue) for
 local event processing.
 
 
 Definitely something to try out. A few points to consider:
 
 * There doesn't seem to be too much of a difference to me whether this
 is routed via a cache or directly attached to commits. Either way it
 adds additional memory requirements and churn, which need to be managed.
 
 * When introducing persisted queuing we need to be careful not to just
 move the bottleneck to IO.
 
 * An eventual implementation should not break the fundamental design.
 Either hide it in the implementation or find a clean way to put this
 into the overall design.
 
 Michael
 
 
 
 



Re: Efficiently process observation event for local changes

2015-03-25 Thread Stefan Egli
Related to this, I've created

https://issues.apache.org/jira/browse/OAK-2683

which is about an issue that happens when the observation queue limit is
reached.

Cheers,
Stefan

On 3/23/15 4:03 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote:

After discussing this further with Marcel and Michael, we came to the conclusion
that we can achieve similar performance by making use of the persistent cache for
storing the diff. This would require a slight change in the way we interpret the
diff JSOP. This should not require any change in the current logic related to
observation event generation. Opened OAK-2669 to track that.

One thing that we might still want to do is to use separate queue sizes for
listeners interested in local events only and those which can work with
external events. On a system like AEM there are 180 listeners which listen for
external changes and ~20 which only listen to local changes. So it makes sense
to have bigger queues for the local-only listeners.

Chetan Mehrotra




Re: Efficiently process observation event for local changes

2015-03-23 Thread Michael Dürig



On 23.3.15 5:04 , Chetan Mehrotra wrote:

B - Proposed Changes
---

1. Move the notion of listening to local events to the Observer level. For a
queue whose bound listener is only interested in local changes, we would push
a newly detected change only if it is local. Currently we push all changes,
which do get filtered out later; instead we would filter at the first level
itself and keep the queue content limited to local changes only.


I think there is no change needed in the Observer API itself as you can 
already figure out from the passed CommitInfo whether a commit is 
external or not. BTW please take care with the term local as there is 
also the concept of session local commits.
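
A minimal sketch of filtering on the external flag at enqueue time. Change
and FilteringQueue are hypothetical stand-ins made up for illustration; they
are not Oak's Observer or CommitInfo types, and the `external` field only
models the information that is said above to be derivable from the passed
CommitInfo:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for a change plus its "external" flag; not an Oak class.
final class Change {
    final String revision;
    final boolean external; // true if the commit originated on another cluster node

    Change(String revision, boolean external) {
        this.revision = revision;
        this.external = external;
    }
}

// Hypothetical per-listener queue that filters before enqueueing.
final class FilteringQueue {
    private final Queue<Change> queue = new ArrayDeque<>();
    private final boolean localOnly;

    FilteringQueue(boolean localOnly) {
        this.localOnly = localOnly;
    }

    /**
     * Queues the change unless the listener is local-only and the change
     * is external. Returns true if the change was queued.
     */
    boolean offer(Change change) {
        if (localOnly && change.external) {
            return false; // dropped up front, never occupies queue space
        }
        return queue.add(change);
    }

    int size() {
        return queue.size();
    }
}
```

Note that "local" here means "not external", i.e. committed on this cluster
node, which is different from session-local.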




2. Attach the calculated diff to the commit info which is attached to
the given change. This would eliminate the chance of a cache
miss altogether and would ensure observation is not delayed due to slow
processing of the diff. This can be done on a best-effort basis: if the diff
is too large, we do not attach it, and in that case we diff again.

3. For listeners which are only interested in local events we can use a
different queue size limit, i.e. allow larger queues for such listeners.

Later we can also look into using a journal (or persistent queue) for local
event processing.
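
The best-effort attachment in point 2 could look roughly like the sketch
below. The class name, the size threshold and the use of a plain String for
the diff are all assumptions made for illustration, not existing Oak code:

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of "attach the diff if it is small enough, else diff again".
final class BestEffortDiff {
    // Assumed threshold, in characters; the real limit would need tuning.
    static final int MAX_ATTACHED_DIFF_CHARS = 1024;

    private BestEffortDiff() {
    }

    /** Returns the diff to attach to the commit, or empty if it is too large. */
    static Optional<String> attachable(String diff) {
        return diff.length() <= MAX_ATTACHED_DIFF_CHARS
                ? Optional.of(diff)
                : Optional.empty();
    }

    /** Uses the attached diff when present, otherwise recomputes it. */
    static String resolve(Optional<String> attached, Supplier<String> recompute) {
        return attached.orElseGet(recompute);
    }
}
```

The consumer only pays for a second diff in the large case, while small
diffs never depend on a cache hit.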


Definitely something to try out. A few points to consider:

* There doesn't seem to be too much of a difference to me whether this
is routed via a cache or directly attached to commits. Either way it
adds additional memory requirements and churn, which need to be managed.


* When introducing persisted queuing we need to be careful not to just 
move the bottleneck to IO.


* An eventual implementation should not break the fundamental design. 
Either hide it in the implementation or find a clean way to put this 
into the overall design.


Michael


Re: Efficiently process observation event for local changes

2015-03-23 Thread Michael Dürig



On 23.3.15 11:03 , Stefan Egli wrote:

Going one step further, we could also discuss moving the handling of the
'observation queues' completely to an actual messaging system.
Whether this would be embedded in an Oak instance or
shared between instances in an Oak cluster might be a different question
(the embedded variant would have fewer implications for the overall Oak
model, especially timing-wise). But the observation model matches
publish-subscribe semantics quite exactly - to me it actually matches pub-sub
better than it fits into the 'cache semantics'.


Definitely something to try out, given someone finds the time for it. ;-)
Mind you that some time ago I implemented persisting events to Apache
Kafka [1], which wasn't greeted with great enthusiasm though...


OTOH the same concern regarding pushing the bottleneck to IO applies
here. Furthermore, filtering the persisted events through access control
is something we have yet to figure out, as AC is a) session scoped and
b) depends on the tree hierarchy.


Michael


[1] https://github.com/mduerig/oak-kafka



.. just saying ..


Re: Efficiently process observation event for local changes

2015-03-23 Thread Stefan Egli
Going one step further, we could also discuss moving the handling of the
'observation queues' completely to an actual messaging system.
Whether this would be embedded in an Oak instance or
shared between instances in an Oak cluster might be a different question
(the embedded variant would have fewer implications for the overall Oak
model, especially timing-wise). But the observation model matches
publish-subscribe semantics quite exactly - to me it actually matches pub-sub
better than it fits into the 'cache semantics'.
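
To illustrate the pub-sub shape in its simplest embedded form, here is a
minimal in-process sketch. ObservationTopic is a hypothetical name, delivery
is synchronous, and nothing here is an existing Oak or messaging-system API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-process topic: publishers push events, subscribers receive them.
final class ObservationTopic<E> {
    private final List<Consumer<E>> subscribers = new ArrayList<>();

    /** Registers a subscriber that will receive every event published afterwards. */
    void subscribe(Consumer<E> subscriber) {
        subscribers.add(subscriber);
    }

    /** Delivers the event to all current subscribers, in registration order. */
    void publish(E event) {
        for (Consumer<E> subscriber : subscribers) {
            subscriber.accept(event);
        }
    }
}
```

A real messaging system would add buffering, persistence and possibly
cross-instance delivery on top of this shape.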

.. just saying ..
