Re: contextCSN of subordinate syncrepl DBs

2009-11-22 Thread Howard Chu

Rein Tollevik wrote:

Howard Chu wrote:


It appears that the current code is sending newCookie messages pretty much all
the time. It's definitely too chatty now, and it appears that it's breaking
test050 sometimes, though I still haven't identified exactly why. I thought it
was because the consumer was accepting the new cookie values unconditionally,
but even after filtering out old values test050 still failed. #if'ing out the
relevant code in syncprov.c makes test050 run fine though. (syncprov.c:1675
thru 1723.)


The newCookie messages should only be sent if the local csn set is
updated without being accompanied by a real modification. And in an MMR
setup like test050 that should never happen except maybe during the
initial replication of the databases.

Any occurrence of a newCookie message should be a symptom of another bug,


Yes, you're right. I was finally able to trace one of the occurrences and fix 
it in HEAD. In this particular case, an entry was Added and that event was 
pushed into the syncprov response queue. Then the entry was Modified and again 
the response was queued. But when the queue was processed, it was retrieving 
the entry from the DB at that point in time - so the update was sent out with 
the Add's CSN in the sync control, but the entry's entryCSN attribute already 
had the Mod's stamp in it. That's why during the updateCookie step there was a 
missing CSN...


Anyway, this race condition has been fixed in HEAD by enqueuing the dup'd 
entry, so that the outbound updates all have consistent state.
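The enqueue-a-duplicate fix can be sketched with a toy model (hypothetical Python, not the actual syncprov.c code): snapshotting the entry at enqueue time keeps the sync-control CSN and the entryCSN attribute consistent, whereas re-reading the live entry at dequeue time reproduces the race described above.

```python
import copy
from collections import deque

class Provider:
    """Toy model of a syncprov-style response queue (illustrative only)."""
    def __init__(self):
        self.db = {}          # dn -> entry dict, including 'entryCSN'
        self.queue = deque()  # pending outbound sync responses

    def modify_buggy(self, dn, entry):
        # Buggy variant: queue only the DN; the entry is re-read at send time,
        # so a later Modify can change entryCSN underneath a queued Add.
        self.db[dn] = entry
        self.queue.append(('dn-only', dn, entry['entryCSN']))

    def modify_fixed(self, dn, entry):
        # Fixed variant: queue a duplicate of the entry as it was at this
        # point in time, so the sync-control CSN and entryCSN always agree.
        self.db[dn] = entry
        self.queue.append(('snapshot', copy.deepcopy(entry), entry['entryCSN']))

    def send_next(self):
        # Returns (CSN carried in the sync control, entryCSN actually sent).
        kind, payload, ctrl_csn = self.queue.popleft()
        entry = self.db[payload] if kind == 'dn-only' else payload
        return ctrl_csn, entry['entryCSN']

p = Provider()
p.modify_buggy('cn=a', {'entryCSN': 'csn-1'})   # Add with CSN 1
p.modify_buggy('cn=a', {'entryCSN': 'csn-2'})   # Modify with CSN 2
assert p.send_next() == ('csn-1', 'csn-2')      # mismatch: the race

p2 = Provider()
p2.modify_fixed('cn=a', {'entryCSN': 'csn-1'})
p2.modify_fixed('cn=a', {'entryCSN': 'csn-2'})
assert p2.send_next() == ('csn-1', 'csn-1')     # consistent snapshot
```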



and I do believe one such race condition exists in syncrepl.  One
possible scenario, with 4 or more hosts:

server1 makes two or more changes to the db, with csn n and n+1.
server2 receives both, and starts replicating them to server3.
server3 receives and starts processing the first change from server1. It
updates cs_pvals in the syncrepl structure with the csn n of the first
modification.  Then, the same modification is received from server2, but
is rejected as being too old.  The second modification is received from
server2, this time being accepted.  This second modification is tagged
with csn n+1, which gets stored in the db by syncrepl_updateCookie and
picked up by syncprov. syncprov on server3 replicates the second change
with csn n+1 to server4.
server4 accepts the second modification from server3, without having
received the first change.  And when that arrives from server1 or 2 it
will be rejected as being too old.


Cannot happen. Every server sends its changes out in order; server 4 cannot 
receive csn n+1 from server 3 unless it has already received csn n from server 3.



If the second modify operation is received and processed by server3
after it has added csn n to the csn queue, but before it is committed,
the second modification will be tagged with csn n.  The csn being
written to the db is still csn n+1 though, which will be picked up by
syncprov and trigger a newCookie message.  Even without this, the csns
stored in the db on server3 are invalid and will result in an incomplete
db should server3 fail before the first modification completes.


Shouldn't happen now; the cs_pmutex will prevent a new sync op from starting. 
Likewise syncprov_op_response will prevent a new mod from completing.



The csns for any given sid are sent by the originating server in order,
so I think the fix should be to always process them in the same order in
syncrepl.  For each sid in the csn set there should be one mutex, and
modifications with any given sid should only take place in the thread
holding the mutex.  To avoid stalling too long it must be possible for
the other syncrepl stanzas to note that a csn is too old without waiting
on the mutex for the csn sid.


That sounds ok.


I don't think it is correct for syncrepl to fetch csn values from
syncprov either. The only csn syncprov can update is the one with the
local sid, and syncrepl should simply ignore modifications tagged with
csn values carrying its own sid.  That is, provided syncrepl starts the
replication phase with a csn value carrying its own sid.  The latter is
to cover the case where a server is being reinitialized from one of its
peers; it should then accept any changes that originated on the local
server before it was reinitialized.  Upon completing the initial
replication phase it will receive a csn set that may include its own
sid, and it should then start ignoring modifications with that sid.


Makes sense.


Neither of these changes should interfere with ordinary multi-master
configurations where syncrepl and syncprov are both used on the same
(glue) database.


Having spent the last 12 hours prodding at test050 I find that whenever I have
it working well, test058 "breaks" with contextCSN mismatches. At this point I
really have to question the rationale behind test058. First and foremost,
syncprov should not be sending gratuitous New Cookie messages to consumers
whose search terms are outside the scope of the update. I.e., if the actual
data update didn't go to the consumer, then the following cookie update should
not either. In such an asymmetric configuration, it should be expected that
the contextCSNs will not match across all the servers, and forcing them all to
match is beginning to look like an error, to me.

Re: contextCSN of subordinate syncrepl DBs

2009-11-22 Thread Rein Tollevik

Howard Chu wrote:


It appears that the current code is sending newCookie messages pretty much all
the time. It's definitely too chatty now, and it appears that it's breaking
test050 sometimes, though I still haven't identified exactly why. I thought it
was because the consumer was accepting the new cookie values unconditionally,
but even after filtering out old values test050 still failed. #if'ing out the
relevant code in syncprov.c makes test050 run fine though. (syncprov.c:1675
thru 1723.)


The newCookie messages should only be sent if the local csn set is 
updated without being accompanied by a real modification. And in an MMR 
setup like test050 that should never happen except maybe during the 
initial replication of the databases.


Any occurrence of a newCookie message should be a symptom of another bug,
and I do believe one such race condition exists in syncrepl.  One
possible scenario, with 4 or more hosts:


server1 makes two or more changes to the db, with csn n and n+1.
server2 receives both, and starts replicating them to server3.
server3 receives and starts processing the first change from server1. It 
updates cs_pvals in the syncrepl structure with the csn n of the first 
modification.  Then, the same modification is received from server2, but 
is rejected as being too old.  The second modification is received from 
server2, this time being accepted.  This second modification is tagged 
with csn n+1, which gets stored in the db by syncrepl_updateCookie and 
picked up by syncprov. syncprov on server3 replicates the second change 
with csn n+1 to server4.
server4 accepts the second modification from server3, without having 
received the first change.  And when that arrives from server1 or 2 it 
will be rejected as being too old.
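The per-sid "too old" check at the heart of this scenario can be sketched as follows (a toy model with a simplified timestamp#sid CSN format, not OpenLDAP's real CSN syntax):

```python
def parse_csn(csn):
    """Split a 'timestamp#sid' CSN into (timestamp, sid).
    Toy format, not the real OpenLDAP CSN syntax."""
    ts, sid = csn.split('#')
    return int(ts), sid

class Consumer:
    """Per-sid freshness check used when deciding to accept an update."""
    def __init__(self):
        self.cookie = {}  # sid -> newest timestamp accepted so far

    def accept(self, csn):
        ts, sid = parse_csn(csn)
        if ts <= self.cookie.get(sid, -1):
            return False          # "too old": already saw a newer CSN
        self.cookie[sid] = ts
        return True

# The scenario above: server4 sees csn n+1 (via server3) before csn n.
s4 = Consumer()
assert s4.accept('2#s1') is True    # n+1 accepted first
assert s4.accept('1#s1') is False   # n now rejected as too old: lost update
```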


If the second modify operation is received and processed by server3
after it has added csn n to the csn queue, but before it is committed,
the second modification will be tagged with csn n.  The csn being
written to the db is still csn n+1 though, which will be picked up by
syncprov and trigger a newCookie message.  Even without this, the csns
stored in the db on server3 are invalid and will result in an incomplete
db should server3 fail before the first modification completes.


The csns for any given sid are sent by the originating server in order,
so I think the fix should be to always process them in the same order in
syncrepl.  For each sid in the csn set there should be one mutex, and
modifications with any given sid should only take place in the thread 
holding the mutex.  To avoid stalling too long it must be possible for 
the other syncrepl stanzas to note that a csn is too old without waiting 
on the mutex for the csn sid.
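The proposed per-sid serialization can be sketched like this (a minimal sketch with hypothetical names, not the actual syncrepl code): one mutex per sid, plus a cheap staleness pre-check so other syncrepl stanzas can drop a too-old CSN without queuing behind the mutex.

```python
import threading

class SidSerializer:
    """Sketch of per-sid serialization of incoming modifications."""
    def __init__(self):
        self.locks = {}          # sid -> mutex serializing mods for that sid
        self.newest = {}         # sid -> newest committed timestamp
        self.meta = threading.Lock()

    def _lock_for(self, sid):
        with self.meta:
            return self.locks.setdefault(sid, threading.Lock())

    def is_stale(self, sid, ts):
        # Cheap pre-check: a stanza can note a CSN is too old without
        # waiting on the sid's mutex.
        return ts <= self.newest.get(sid, -1)

    def apply(self, sid, ts, mod):
        if self.is_stale(sid, ts):
            return False
        with self._lock_for(sid):          # mods for one sid run in order
            if self.is_stale(sid, ts):     # re-check under the mutex
                return False
            mod()
            self.newest[sid] = ts
            return True

s = SidSerializer()
log = []
assert s.apply('s1', 1, lambda: log.append('n')) is True
assert s.apply('s1', 1, lambda: log.append('dup')) is False  # duplicate dropped
assert s.apply('s1', 2, lambda: log.append('n+1')) is True
assert log == ['n', 'n+1']
```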


I don't think it is correct for syncrepl to fetch csn values from
syncprov either. The only csn syncprov can update is the one with the
local sid, and syncrepl should simply ignore modifications tagged with
csn values carrying its own sid.  That is, provided syncrepl starts the
replication phase with a csn value carrying its own sid.  The latter is
to cover the case where a server is being reinitialized from one of its
peers; it should then accept any changes that originated on the local
server before it was reinitialized.  Upon completing the initial
replication phase it will receive a csn set that may include its own
sid, and it should then start ignoring modifications with that sid.
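The proposed own-sid handling could look roughly like this (an illustrative toy model, not actual code): accept changes carrying the local sid during the initial refresh, then ignore them once the refresh completes.

```python
class SyncreplSession:
    """Toy model of the proposed own-sid handling during and after
    the initial replication (refresh) phase."""
    def __init__(self, own_sid):
        self.own_sid = own_sid
        self.refreshing = True   # reinitializing from a peer

    def refresh_done(self):
        self.refreshing = False

    def should_apply(self, csn_sid):
        if csn_sid == self.own_sid and not self.refreshing:
            return False   # our own change, echoed back: ignore
        return True

s = SyncreplSession(own_sid='s3')
assert s.should_apply('s3') is True    # re-init: our pre-crash changes return
s.refresh_done()
assert s.should_apply('s3') is False   # afterwards: ignore own sid
assert s.should_apply('s1') is True    # other sids still applied
```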


Neither of these changes should interfere with ordinary multi-master 
configurations where syncrepl and syncprov are both used on the same
(glue) database.


Having spent the last 12 hours prodding at test050 I find that whenever I have
it working well, test058 "breaks" with contextCSN mismatches. At this point I
really have to question the rationale behind test058. First and foremost,
syncprov should not be sending gratuitous New Cookie messages to consumers
whose search terms are outside the scope of the update. I.e., if the actual
data update didn't go to the consumer, then the following cookie update should
not either. In such an asymmetric configuration, it should be expected that
the contextCSNs will not match across all the servers, and forcing them all to
match is beginning to look like an error, to me.


Whenever the provider makes a local change that should not be replicated
to the consumer, the consumer's database state continues to be in sync.
Yet, its csn set indicates that it isn't, and it will always start out
replicating all changes made after the oldest csn it holds, which can
be quite a lot.  The only way to fix this is to send the newCookie messages.


Rein


Re: contextCSN of subordinate syncrepl DBs

2009-11-21 Thread Howard Chu
Rein Tollevik wrote:
> I've been trying to figure out why syncrepl used on a backend that is 
> subordinate to a glue database with the syncprov overlay should save the 
> contextCSN in the suffix of the glue database rather than the suffix of 
> the backend where syncrepl is used.  But all I come up with are reasons 
> why this should not be the case.  So, unless anyone can enlighten me as 
> to what I'm missing, I suggest that this be changed.
> 
> The problem with the current design is that it makes it impossible to 
> reliably replicate more than one subordinate db from the same remote 
> server, as there are now race conditions where one of the subordinate 
> backends could save an updated contextCSN value that is picked up by the 
> other before it has finished its synchronization. An example of a 
> configuration where more than one subordinate db replicated from the 
> same server might be necessary is the central master described in my 
> previous posting in 
> http://www.openldap.org/lists/openldap-devel/200806/msg00041.html
> 
> My idea as to how this race condition could be verified was to add 
> enough entries to one of the backends (while the consumer was stopped) 
> to make it possible to restart the consumer after the first backend had 
> saved the updated contextCSN but before the second has finished its 
> synchronization.  But I was able to produce it by a simple add or delete 
> of an entry in one of the backends before starting the consumer.  Far too 
> often, the backend without any changes was able to pick up and save the 
> updated contextCSN from the producer before syncrepl on the second 
> backend fetched its initial value.  I.e. it started with an updated 
> contextCSN and didn't receive the changes that had taken place on the 
> producer.  If syncrepl stored the values in the suffix of their own 
> database then they wouldn't interfere with each other like this.
> 
> There is a similar problem in syncprov, as it must use the lowest 
> contextCSN value (with a given sid) saved by the syncrepl backends 
> configured within the subtree where syncprov is used.  But to do that it 
> also needs to distinguish the contextCSN values of each syncrepl 
> backend, which it can't do when they all save them in the glue suffix.
> This also implies that syncprov must ignore contextCSN updates from 
> syncrepl until all syncrepl backends have saved a value, and that 
> syncprov on the provider must send newCookie sync info messages when it 
> updates its contextCSN value when the changed entry isn't being 
> replicated to a consumer.  I.e. as outlined in the message referred to above.

It appears that the current code is sending newCookie messages pretty much all
the time. It's definitely too chatty now, and it appears that it's breaking
test050 sometimes, though I still haven't identified exactly why. I thought it
was because the consumer was accepting the new cookie values unconditionally,
but even after filtering out old values test050 still failed. #if'ing out the
relevant code in syncprov.c makes test050 run fine though. (syncprov.c:1675
thru 1723.)

> Neither of these changes should interfere with ordinary multi-master 
> configurations where syncrepl and syncprov are both used on the same 
> (glue) database.

Having spent the last 12 hours prodding at test050 I find that whenever I have
it working well, test058 "breaks" with contextCSN mismatches. At this point I
really have to question the rationale behind test058. First and foremost,
syncprov should not be sending gratuitous New Cookie messages to consumers
whose search terms are outside the scope of the update. I.e., if the actual
data update didn't go to the consumer, then the following cookie update should
not either. In such an asymmetric configuration, it should be expected that
the contextCSNs will not match across all the servers, and forcing them all to
match is beginning to look like an error, to me.

> I'll volunteer to implement and test the necessary changes if this is 
> the right solution.  But to know whether my analysis is correct or not I 
> need feedback.  So, comments please?

-- 
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/


Re: contextCSN of subordinate syncrepl DBs

2009-11-17 Thread Howard Chu
Rein Tollevik wrote:
> I've been trying to figure out why syncrepl used on a backend that is 
> subordinate to a glue database with the syncprov overlay should save the 
> contextCSN in the suffix of the glue database rather than the suffix of 
> the backend where syncrepl is used.  But all I come up with are reasons 
> why this should not be the case.  So, unless anyone can enlighten me as 
> to what I'm missing, I suggest that this be changed.
> 
> The problem with the current design is that it makes it impossible to 
> reliably replicate more than one subordinate db from the same remote 
> server, as there are now race conditions where one of the subordinate 
> backends could save an updated contextCSN value that is picked up by the 
> other before it has finished its synchronization. An example of a 
> configuration where more than one subordinate db replicated from the 
> same server might be necessary is the central master described in my 
> previous posting in 
> http://www.openldap.org/lists/openldap-devel/200806/msg00041.html
> 
> My idea as to how this race condition could be verified was to add 
> enough entries to one of the backends (while the consumer was stopped) 
> to make it possible to restart the consumer after the first backend had 
> saved the updated contextCSN but before the second has finished its 
> synchronization.  But I was able to produce it by a simple add or delete 
> of an entry in one of the backends before starting the consumer.  Far too 
> often, the backend without any changes was able to pick up and save the 
> updated contextCSN from the producer before syncrepl on the second 
> backend fetched its initial value.  I.e. it started with an updated 
> contextCSN and didn't receive the changes that had taken place on the 
> producer.  If syncrepl stored the values in the suffix of their own 
> database then they wouldn't interfere with each other like this.

OK.

> There is a similar problem in syncprov, as it must use the lowest 
> contextCSN value (with a given sid) saved by the syncrepl backends 
> configured within the subtree where syncprov is used.  But to do that it 
> also needs to distinguish the contextCSN values of each syncrepl 
> backend, which it can't do when they all save them in the glue suffix.
> This also implies that syncprov must ignore contextCSN updates from 
> syncrepl until all syncrepl backends have saved a value, and that 
> syncprov on the provider must send newCookie sync info messages when it 
> updates its contextCSN value when the changed entry isn't being 
> replicated to a consumer.  I.e. as outlined in the message referred to above.

Then (at least) at server startup time syncprov must retrieve the contextCSNs
from all of its subordinate DBs. Perhaps a subtree search with filter
"(contextCSN=*)" would suffice; this would of course require setting a
presence index on this attribute to run reasonably. (Or we can add a glue
function to return a list of the subordinate suffixes or DBs...)
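The per-sid minimum that syncprov would need to compute over the subordinate DBs' saved contextCSN sets can be sketched as follows (a toy model using (timestamp, sid) tuples instead of real CSN strings):

```python
def provider_csn_set(subordinate_csns):
    """For each sid, take the lowest contextCSN value saved by any
    subordinate syncrepl backend; that is the set syncprov can safely
    advertise for the glued tree. Toy model, illustrative only."""
    merged = {}
    for suffix, csns in subordinate_csns.items():
        for ts, sid in csns:
            if sid not in merged or ts < merged[sid]:
                merged[sid] = ts
    return sorted((ts, sid) for sid, ts in merged.items())

csns = {
    'dc=a,dc=example': [(10, 's1'), (7, 's2')],
    'dc=b,dc=example': [(8, 's1'), (9, 's2')],
}
# sid s1: min(10, 8) = 8; sid s2: min(7, 9) = 7
assert provider_csn_set(csns) == [(7, 's2'), (8, 's1')]
```

Taking the minimum per sid is what makes it safe: a consumer syncing from the glued tree then never starts ahead of the slowest subordinate backend.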

By the way, please use "subordinate database" and "superior database" when
discussing these things; "glue database" is too ambiguous.
-- 
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/


Re: contextCSN of subordinate syncrepl DBs

2008-06-18 Thread Rein Tollevik

Howard Chu wrote:

There are only two supported modes of operation intended here. In one 
case, the glued databases each have their own syncprov overlay, and 
replication does not cross glue boundaries. In the other case, there is 
a single syncprov overlay for the entire glued tree, and the boundaries 
between glued DBs are ignored. In this config, all of the contextCSNs 
must be saved in the glue DB so that the single syncprov overlay can 
stay informed about any underlying changes.


I understand that syncprov needs to be informed about changes in the 
subordinate DBs, and it is my intention that it shall stay that way. 
syncrepl on the subordinate db must continue to write through the glue 
database so that syncprov sees all changes, including the update to the 
contextCSN.  It is already being specially informed about the contextCSN 
updates, to exclude them from replication.  Syncprov must itself update 
the contextCSN values it manages in its own suffix when it receives 
these updates from syncrepl.  I.e. the contextCSN values would end up 
being stored in the suffix of both the glue and the subordinate DB.  And 
as syncprov in some situations must advertise an older csn value (for a 
given sid) than syncrepl on the subordinate DBs, this seems correct to do.


My suggested change would add support for the kind of configuration I 
have outlined, without harming the currently supported configurations. 
It should be a fairly simple change, so I still suggest that it is made.


Rein


Re: contextCSN of subordinate syncrepl DBs

2008-06-18 Thread Howard Chu

Rein Tollevik wrote:

I've been trying to figure out why syncrepl used on a backend that is
subordinate to a glue database with the syncprov overlay should save the
contextCSN in the suffix of the glue database rather than the suffix of
the backend where syncrepl is used.  But all I come up with are reasons
why this should not be the case.  So, unless anyone can enlighten me as
to what I'm missing, I suggest that this be changed.



The problem with the current design is that it makes it impossible to
reliably replicate more than one subordinate db from the same remote
server, as there are now race conditions where one of the subordinate
backends could save an updated contextCSN value that is picked up by the
other before it has finished its synchronization. An example of a
configuration where more than one subordinate db replicated from the
same server might be necessary is the central master described in my
previous posting in
http://www.openldap.org/lists/openldap-devel/200806/msg00041.html


There are only two supported modes of operation intended here. In one case, 
the glued databases each have their own syncprov overlay, and replication does 
not cross glue boundaries. In the other case, there is a single syncprov 
overlay for the entire glued tree, and the boundaries between glued DBs are 
ignored. In this config, all of the contextCSNs must be saved in the glue DB 
so that the single syncprov overlay can stay informed about any underlying 
changes.

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/