Re: [observation] pure internal or external listeners

2016-09-02 Thread Stefan Egli
On 02/09/16 13:41, "Stefan Egli"  wrote:

>On 02/09/16 13:26, "Chetan Mehrotra"  wrote:
>
>>Listener for local Change
>>--
>>
>>Such a listener is more particular about the type of change and makes
>>some persisted state change, e.g. registering a job or invoking some
>>third-party service to update a value. This listener is only
>>interested in local changes, as it knows the same listener is also active
>>on the other cluster nodes (homogeneous cluster setup), so if a node gets
>>added it only needs to react on the cluster node where it got added.
>
>One thing this reminds me of is a use-case where you have, say, 3 cluster
>nodes, each one handling mainly local events. All fine. Then 1
>node crashes while its (local) observation queue likely wasn't entirely
>empty. Those events would then probably not get handled by anyone (and
>that node wouldn't necessarily be restarted as the cluster continues
>normally; it could be restarted with a new clusterNodeId). So maybe
>there's an issue there.

I think this should be handled the same as today, where (non-journaled)
listeners lose events on any crash: either upon restart or when an
instance leaves the cluster (which can be noticed e.g. via Sling's Discovery
API), someone (preferably the leader) should handle this and do a
repository scan for whatever of interest the crashed instance might have
stored. Lacking journaled observation, that's probably the way to go.

Cheers,
Stefan




Re: [observation] pure internal or external listeners

2016-09-02 Thread Stefan Egli
Hi Chetan,

(see below)

On 02/09/16 13:26, "Chetan Mehrotra"  wrote:

>On Fri, Sep 2, 2016 at 4:00 PM, Stefan Egli  wrote:
>> If we
>> separate listeners into purely internal vs external, then a queue as a
>>whole
>> is either purely internal or external and we no longer have this issue.
>
>Not sure how this would work. The observation queue is made up
>of ContentChange entries, each a tuple of [root NodeState, CommitInfo
>(null for external)]
>
>--- NS1-L---NS2-L--NS3---NS4-L---NS5-L ---NS6-L
>
>--- a  /a/b  - /a/c --- /a/c
> /a/b /a/b
>/a/d
>
>So if we dedicate a queue to local changes only, what would happen?
>
>If we drop NS3, then while diffing [NS2-L, NS4-L], /a/c would be
>reported as "added" and "local". Now say we have a listener which listens
>for locally added nt:file nodes so that it can start some processing job
>for them. Such a listener would then think it's a locally added node and
>would start a duplicate job.

Good point. We could probably fix this though by storing not just 1 root
NodeState per ContentChange but 2: a 'from' and a 'to' (the 'from'
is currently implicit, as it's taken from the previous entry, but if we
skip entries, then it needs to be re-added). With that, we could safely
drop external changes as 'uninterested' and diffing would still report the
correct thing.
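
A minimal sketch of the 'from'/'to' idea, using simplified stand-ins rather
than Oak's actual classes: a root state is modeled as a plain set of paths,
`Change` is a hypothetical replacement for ContentChange that carries both
roots, and `addedLocally` is an invented helper. The point it tries to show
is that once each entry has its own 'from', skipping an external entry no
longer makes its additions look local.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for ContentChange: carries both roots explicitly.
// A root NodeState is simplified to the set of paths it contains.
final class Change {
    final Set<String> from;
    final Set<String> to;
    final boolean local; // assumption: true when a CommitInfo was present

    Change(Set<String> from, Set<String> to, boolean local) {
        this.from = from;
        this.to = to;
        this.local = local;
    }
}

final class LocalOnlyQueue {
    // Collects the paths added by local changes only.
    // External entries are skipped; since every entry has its own 'from',
    // skipping one leaves no gap that a later diff would have to span.
    static Set<String> addedLocally(List<Change> queue) {
        Set<String> added = new TreeSet<>();
        for (Change c : queue) {
            if (!c.local) {
                continue; // drop the external change as 'uninterested'
            }
            for (String path : c.to) {
                if (!c.from.contains(path)) {
                    added.add(path);
                }
            }
        }
        return added;
    }
}
```

With an external NS2 -> NS3 entry that adds /a/c followed by a local
NS3 -> NS4 entry that adds /a/d (paths chosen for illustration), only /a/d
would be reported as locally added.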

>
>In general I believe
>
>Listener for external Change
>--
>Listeners for external changes maintain some state and purge/refresh
>it upon detecting a change in the paths they are interested in.
>They would work fine if multiple content changes are merged:
>
>[NS4-L, NS5-L] + [NS5-L,NS6-L] = [NS4, NS6] (external) as they would
>still detect the change
>
>An example of this is LuceneIndexObserver, which sets the queue size to 5
>and does not care whether a change is local or not. It is just interested
>in whether the index node is updated.
>
>Listener for local Change
>--
>
>Such a listener is more particular about the type of change and makes
>some persisted state change, e.g. registering a job or invoking some
>third-party service to update a value. This listener is only
>interested in local changes, as it knows the same listener is also active
>on the other cluster nodes (homogeneous cluster setup), so if a node gets
>added it only needs to react on the cluster node where it got added.

One thing this reminds me of is a use-case where you have, say, 3 cluster
nodes, each one handling mainly local events. All fine. Then 1
node crashes while its (local) observation queue likely wasn't entirely
empty. Those events would then probably not get handled by anyone (and
that node wouldn't necessarily be restarted as the cluster continues
normally; it could be restarted with a new clusterNodeId). So maybe
there's an issue there.

>
>So for such listeners it needs to be ensured that mixed content changes
>are not compacted. That is:
>
>[NS4-L, NS5-L] + [NS5-L,NS6-L] = [NS4, NS6] (can be treated as
>local with loss of user identity which caused the change)
>[NS2-L, NS3]+ [NS3, NS4-L] = [NS2-L, NS4-L] (cannot be treated as
>local)

I think keeping the 'from/to' tuple instead of just 1 root NodeState would
make the above picture simpler.

Cheers,
Stefan

>
>Just thinking out loud here to understand the problem space better :)
>
>Chetan Mehrotra




Re: [observation] pure internal or external listeners

2016-09-02 Thread Carsten Ziegeler
That's an interesting approach. I kind of like it, especially as it
provides a path to get rid of external listeners one day completely.

Carsten

> Hi,
> 
> As you're probably aware there are currently several different issues being
> worked upon related to the observation queue limit problem ([0], epic [1]).
> I wanted to discuss yet another improvement and first ask what the list
> thinks.
> 
> What about requiring observation listeners to consume either only internal
> or only external events, but never both together? We wouldn't support the
> latter anymore. (And if you're in a cluster you want to be very careful with
> consuming external events in the first place - but that's another topic)
> 
> The root problem of the 'queue hitting the limit' as of today is that it
> throws away the CommitInfo, thus doesn't know anymore if it's an internal or
> an external event (besides actually losing the CommitInfo details). If we
> separate listeners into purely internal vs external, then a queue as a whole
> is either purely internal or external and we no longer have this issue. We
> could continue to throw away the CommitInfo (or avoid that using a persisted
> obs queue ([2])), but we could then still say with certainty if it's an
> internal or an external event.
> 
> A user that would want to receive both internal and external events could
> simply create two listeners. Those would both receive events as expected.
> The only difference would be that the two streams of events would not be in
> sync - but I doubt that this would be a big loss.
> 
> We could thus introduce 'ExcludeInternal' and demand in
> ObservationManager.addEventListener that the listener is flagged with one of
> ExcludeInternal or ExcludeExternal.
> 
> Wdyt?
> 
> Cheers,
> Stefan
> --
> [0] - https://issues.apache.org/jira/browse/OAK-2683
> [1] - https://issues.apache.org/jira/browse/OAK-4614
> [2] - https://issues.apache.org/jira/browse/OAK-4581
> 
> 
> 
> 


 

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziege...@apache.org



Re: [observation] pure internal or external listeners

2016-09-02 Thread Chetan Mehrotra
On Fri, Sep 2, 2016 at 4:00 PM, Stefan Egli  wrote:
> If we
> separate listeners into purely internal vs external, then a queue as a whole
> is either purely internal or external and we no longer have this issue.

Not sure how this would work. The observation queue is made up
of ContentChange entries, each a tuple of [root NodeState, CommitInfo
(null for external)]

--- NS1-L---NS2-L--NS3---NS4-L---NS5-L ---NS6-L

--- a  /a/b  - /a/c --- /a/c
 /a/b /a/b
/a/d

So if we dedicate a queue to local changes only, what would happen?

If we drop NS3, then while diffing [NS2-L, NS4-L], /a/c would be
reported as "added" and "local". Now say we have a listener which listens
for locally added nt:file nodes so that it can start some processing job
for them. Such a listener would then think it's a locally added node and
would start a duplicate job.

In general I believe

Listener for external Change
--
Listeners for external changes maintain some state and purge/refresh
it upon detecting a change in the paths they are interested in.
They would work fine if multiple content changes are merged:

[NS4-L, NS5-L] + [NS5-L,NS6-L] = [NS4, NS6] (external) as they would
still detect the change

An example of this is LuceneIndexObserver, which sets the queue size to 5
and does not care whether a change is local or not. It is just interested
in whether the index node is updated.

Listener for local Change
--

Such a listener is more particular about the type of change and makes
some persisted state change, e.g. registering a job or invoking some
third-party service to update a value. This listener is only
interested in local changes, as it knows the same listener is also active
on the other cluster nodes (homogeneous cluster setup), so if a node gets
added it only needs to react on the cluster node where it got added.

So for such listeners it needs to be ensured that mixed content changes
are not compacted. That is:

[NS4-L, NS5-L] + [NS5-L,NS6-L] = [NS4, NS6] (can be treated as
local with loss of user identity which caused the change)
[NS2-L, NS3]+ [NS3, NS4-L] = [NS2-L, NS4-L] (cannot be treated as local)
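
The compaction rule above could be sketched roughly as follows. This is an
illustration only, not how Oak's queue is implemented: `Span` and
`Compactor` are hypothetical names, root states are reduced to their labels,
and the `local` flag stands in for "a CommitInfo was present for this
change".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified queue entry: root states are identified by label only.
final class Span {
    final String from;
    final String to;
    final boolean local; // assumption: true when the change came from a local commit

    Span(String from, String to, boolean local) {
        this.from = from;
        this.to = to;
        this.local = local;
    }
}

final class Compactor {
    // Merges neighbouring entries only when both have the same origin and
    // are contiguous, so a mixed pair (external followed by local, or the
    // reverse) is never collapsed into one entry.
    static List<Span> compact(List<Span> queue) {
        List<Span> out = new ArrayList<>();
        for (Span next : queue) {
            int last = out.size() - 1;
            if (last >= 0
                    && out.get(last).local == next.local
                    && out.get(last).to.equals(next.from)) {
                Span prev = out.get(last);
                // The merged entry keeps the origin but, as noted above,
                // the identity of the individual commits is lost.
                out.set(last, new Span(prev.from, next.to, prev.local));
            } else {
                out.add(next);
            }
        }
        return out;
    }
}
```

Applied to the changes NS2 -> NS3 (external), NS3 -> NS4, NS4 -> NS5 and
NS5 -> NS6 (all local), this sketch keeps the external entry separate and
merges the three local ones into a single NS3 -> NS6 entry.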

Just thinking out loud here to understand the problem space better :)

Chetan Mehrotra


Re: [observation] pure internal or external listeners

2016-09-02 Thread Stefan Egli
Perhaps for backwards compatibility we could auto-create 2 listeners for
the case where a listener is registered without ExcludeInternal or
ExcludeExternal - and issue a corresponding, loud, WARN.
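
A rough sketch of what that auto-split might look like. All names here
(`ListenerRegistry`, `Origin`, `Registration`) are hypothetical and this is
not the ObservationManager API; events are plain strings and the two boolean
parameters merely mirror the proposed ExcludeInternal / ExcludeExternal
flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Logger;

// Hypothetical names throughout; not Oak's or JCR's actual API.
enum Origin { INTERNAL, EXTERNAL }

final class Registration {
    final Origin origin; // the only kind of event this registration receives
    final Consumer<String> listener;

    Registration(Origin origin, Consumer<String> listener) {
        this.origin = origin;
        this.listener = listener;
    }
}

final class ListenerRegistry {
    private static final Logger LOG = Logger.getLogger("ListenerRegistry");
    private final List<Registration> registrations = new ArrayList<>();

    // A listener flagged with neither exclusion is split into two
    // registrations, one per origin, and a warning is logged.
    void register(Consumer<String> listener, boolean excludeInternal, boolean excludeExternal) {
        if (!excludeInternal && !excludeExternal) {
            LOG.warning("Listener registered without ExcludeInternal or ExcludeExternal;"
                    + " splitting it into two listeners whose event streams are not in sync");
        }
        if (!excludeInternal) {
            registrations.add(new Registration(Origin.INTERNAL, listener));
        }
        if (!excludeExternal) {
            registrations.add(new Registration(Origin.EXTERNAL, listener));
        }
    }

    // Delivers an event only to registrations of the matching origin.
    void dispatch(Origin origin, String event) {
        for (Registration r : registrations) {
            if (r.origin == origin) {
                r.listener.accept(event);
            }
        }
    }

    int size() {
        return registrations.size();
    }
}
```

In a real implementation each registration would presumably get its own
queue, which is what makes each queue purely internal or purely external.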

On 02/09/16 12:30, "Stefan Egli"  wrote:

>Hi,
>
>As you're probably aware there are currently several different issues
>being
>worked upon related to the observation queue limit problem ([0], epic
>[1]).
>I wanted to discuss yet another improvement and first ask what the list
>thinks.
>
>What about requiring observation listeners to consume either only internal
>or only external events, but never both together? We wouldn't support the
>latter anymore. (And if you're in a cluster you want to be very careful with
>consuming external events in the first place - but that's another topic)
>
>The root problem of the 'queue hitting the limit' as of today is that it
>throws away the CommitInfo, thus doesn't know anymore if it's an internal
>or
>an external event (besides actually losing the CommitInfo details). If we
>separate listeners into purely internal vs external, then a queue as a
>whole
>is either purely internal or external and we no longer have this issue. We
>could continue to throw away the CommitInfo (or avoid that using a
>persisted
>obs queue ([2])), but we could then still say with certainty if it's an
>internal or an external event.
>
>A user that would want to receive both internal and external events could
>simply create two listeners. Those would both receive events as expected.
>The only difference would be that the two streams of events would not be in
>sync - but I doubt that this would be a big loss.
>
>We could thus introduce 'ExcludeInternal' and demand in
>ObservationManager.addEventListener that the listener is flagged with one
>of
>ExcludeInternal or ExcludeExternal.
>
>Wdyt?
>
>Cheers,
>Stefan
>--
>[0] - https://issues.apache.org/jira/browse/OAK-2683
>[1] - https://issues.apache.org/jira/browse/OAK-4614
>[2] - https://issues.apache.org/jira/browse/OAK-4581
>
>
>