One thing I just noticed while testing replication with 3 servers on my
laptop: during a refresh, the provider gets blocked waiting to write to
the consumers after sending about 4000 entries. I.e., the consumers
aren't processing fast enough to keep up with the search running on the
provider.
(That's actually not too surprising since reads are usually faster than
writes anyway.)
The consumer code has lots of problems as it is; I'm just adding this
note to the pile.
I'm considering adding an option to the consumer to write its entries
with dbnosync during the refresh phase. The rationale being that there's
nothing to lose anyway if the refresh is interrupted. I.e., the consumer
can't update its contextCSN until the very end of the refresh, so any
partial refresh that gets interrupted is wasted effort - the consumer
will always have to start over from the beginning on its next refresh
attempt. As such, there's no point in safely/synchronously writing any
of the received entries - they're useless until the final contextCSN update.
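
To make the tradeoff concrete, here's a rough standalone sketch using the
LMDB API as it exists today (env-wide MDB_NOSYNC plus an explicit
mdb_env_sync(), rather than the per-transaction flag proposed below).
This is not syncrepl code and the function/names are just placeholders:

/* Standalone LMDB sketch of the idea (not syncrepl code): commit the bulk
 * of the refreshed entries without fsync, then flush explicitly before
 * the one write that actually needs durability (the contextCSN update in
 * the real consumer). */
#include <lmdb.h>

int refresh_sketch(const char *dbpath)
{
    MDB_env *env;
    MDB_txn *txn;
    MDB_dbi dbi;
    int rc;

    if ((rc = mdb_env_create(&env)) != 0)
        return rc;
    /* MDB_NOSYNC: skip fsync on commit.  Acceptable here because an
     * interrupted refresh is discarded and restarted from scratch anyway. */
    if ((rc = mdb_env_open(env, dbpath, MDB_NOSYNC, 0600)) != 0)
        goto done;

    if ((rc = mdb_txn_begin(env, NULL, 0, &txn)) != 0)
        goto done;
    if ((rc = mdb_dbi_open(txn, NULL, 0, &dbi)) != 0) {
        mdb_txn_abort(txn);
        goto done;
    }

    /* ... mdb_put() each refreshed entry here ... */

    if ((rc = mdb_txn_commit(txn)) != 0)
        goto done;

    /* Force everything to disk before recording the checkpoint that
     * marks the refresh as complete. */
    if ((rc = mdb_env_sync(env, 1)) != 0)
        goto done;

    /* ... then write the contextCSN / completion marker normally ... */

done:
    mdb_env_close(env);
    return rc;
}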
The implementation approach would be to define a new control, e.g. "fast
write", for the consumer to pass to the underlying backend on any write
op. We would also have to add, e.g., an MDB_TXN_NOSYNC flag to
mdb_txn_begin() (BDB already has an equivalent flag).
This would only be used for writes that are part of a refresh phase. In
persist mode the provider's and consumers' write speeds should be more
closely matched, so it wouldn't be necessary or useful.
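
Roughly, the call site in the backend might then look something like the
following. MDB_TXN_NOSYNC is the hypothetical flag proposed above (it
doesn't exist in liblmdb yet), and fast_write stands in for however the
"fast write" control gets detected on the op:

/* Hypothetical sketch only: MDB_TXN_NOSYNC is the proposed flag and does
 * not exist in liblmdb yet; fast_write stands in for the presence of the
 * proposed "fast write" control on the incoming write op. */
#include <lmdb.h>

int backend_write_sketch(MDB_env *env, int fast_write)
{
    MDB_txn *txn;
    unsigned int flags = 0;
    int rc;

    if (fast_write)
        flags |= MDB_TXN_NOSYNC;    /* proposed: skip fsync on commit,
                                       analogous to BDB's DB_TXN_NOSYNC */

    if ((rc = mdb_txn_begin(env, NULL, flags, &txn)) != 0)
        return rc;

    /* ... mdb_put() the entry, then ... */

    return mdb_txn_commit(txn);
}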
Comments?
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/