[Issue 10293] Log operations generated by syncrepl at STATS level

openldap-its Thu, 20 Nov 2025 00:46:55 -0800

https://bugs.openldap.org/show_bug.cgi?id=10293

--- Comment #10 from Ondřej Kuzník <[email protected]> ---
On Wed, Nov 19, 2025 at 05:05:03PM +0000, [email protected] wrote:
> And this gets to the heart of what "I NEED" (on behalf of the customer). For
> each update request handled by any producer the same update is performed on
> every consumer (replicated). I MAY have included irrelevant SRCH and CMP ops in
> my original work but the ADD, MOD, MODDN, and DEL changes are reflected and
> processed as received. I only suggest that those operations are invisible to
> all but those processing the SYNC level logs and deserve to be represented in
> the STATS level log as a courtesy to the users of simplified log reduction
> packages like SPLUNK and DataDog.

Hi Marty,
I'm trying to say that in syncrepl, those operations *do not exist*.
syncrepl is a search: you only get entries and a few other types of
messages, all outlined in what you responded to below. There are no
ADD/MOD/DEL/MODDN changes as such, or where something equivalent exists
it's already covered by what I mentioned. (There is a better explanation
nearer the end of this message.)

Whatever processing the consumer does to adapt its own database based
on that information just isn't traceable the way you ask for. However,
I do believe that the proposal delivers pretty much what you need even
if it doesn't look like what you expected.

>>> Furthermore, the monitoring tools would be required to understand the more
>>> complex SYNC logging formats and content. Most customers rely at best on AWS
>>> Cloudwatch (a Grafana based system) and/or batch log reduction programs like
>>> SPLUNK and DataDog which lack the more complex processing capabilities to
>>> understand SYNC logging. The best of our customers have no idea what the total
>>> "load" being presented to their OpenLDAP servers and are unable to diagnose
>>> performance issues without it.
>> 
>> Instead of flooding the STATS level with yet another firehose of data,
>> wouldn't exposing a cn=monitor counter like "total time spent on
>> processing messages from this provider" be better? That's the exact
>> contribution you're looking for, no expensive log processing needed
>> *and* fairly easy to provide.
> 
> For the reasons above about the lack of actual customer use of cn=monitor I
> would respond with NO. cn=monitor is only valuable processed into a time-series
> database and through analysis tools like Grafana and Prometheus. And it lacks
> information like the etime (already hidden in the SYNC log) and rid from which
> the update was replicated.

Sure, you don't want cn=monitor, noted.

>> If not, let's be concrete then, have a look at the below and see if it
>> covers what you're after. "etime=" are seconds since we started
>> processing the message, so to the extent you consider current etime=
>> logging to represent "load", it should too. There is no "hidden"
>> processing time except overheads that would have been hidden from
>> current STATS messages as well. And certainly no double-counting which
>> your previous suggestions would have been prone to.

Please reread the proposal as it was written: as the syncrepl search
unfolds, there are certain types of messages the consumer receives from
the provider. The proposal lists all of them, outlines their purpose (or
you can watch my last LDAPCon talk explaining the protocol if you want
to see how it all fits together) and suggests a corresponding message to
show up in the log for each.

That way all replication-related work gets counted for its processing
cost, which is what I understand you're asking for, without the risk of
double-counting (which would make the exercise moot) or leaking
irrelevant details. One message = one line in the log. And before you
seize on this again: 90% of the time one message = one update on the
consumer, but it could be anything between 0 (relatively common) and an
unbounded number of tasks, some of which are entry updates that
seemingly have no relation to the message that was received.
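
To make "one message = one line" concrete for a log reduction tool, here
is a rough sketch of how the lines quoted further down could be parsed
and their etime summed per rid. The line shape is only the one proposed
in this thread, nothing released emits it, and the function names
(parse_sync_line, etime_per_rid) and sample values are made up for
illustration.

```python
import re
from collections import defaultdict

# Hypothetical line shape, copied from the proposal in this thread.
# No released slapd writes these lines; treat the format as an assumption.
HEAD = re.compile(r'\brid=(\d+) SYNC (\w+)(.*)$')
# key=value pairs; a value is either a double-quoted string or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_sync_line(line):
    """Return a dict for one proposed SYNC line, or None if it is not one."""
    m = HEAD.search(line)
    if m is None:
        return None
    rec = {"rid": m.group(1), "type": m.group(2)}
    for key, value in PAIR.findall(m.group(3)):
        rec[key] = value.strip('"')
    return rec


def etime_per_rid(lines):
    """Sum etime per rid; each line is one received message, counted once."""
    totals = defaultdict(float)
    for line in lines:
        rec = parse_sync_line(line)
        if rec is not None and "etime" in rec:
            totals[rec["rid"]] += float(rec["etime"])
    return dict(totals)


# Made-up sample lines in the proposed shape.
sample = [
    'rid=001 SYNC NEW_COOKIE cookie=abc etime=0.123',
    'rid=201 SYNC ENTRY dn="cn=a b,dc=example,dc=com" state=1 etime=0.25 result=processed',
    'rid=201 SYNC ENTRY dn="cn=c,dc=example,dc=com" state=3 etime=0.5 result=skipped',
]
totals = etime_per_rid(sample)
```

Since every line stands for exactly one received message, summing etime
this way cannot count the same work twice.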

>> RFC4533 which we're processing defines the following message types (I
>> won't touch on other replication protocols we support e.g. dirsync):
>> 
>> Intermediate "newcookie" message:
>> rid=001 SYNC NEW_COOKIE cookie=<cookie value> etime=0.123
> 
> Right ... that was actually an addition we made at least partially to earlier
> suggestions/pressure from my requests. No?
> 
>> For a refreshDelete/refreshPresent Intermediate message:
>> rid=012 SYNC REFRESH_DELETE refreshDone=0 cookie=<cookie value if sent>
>> etime=0.012
>> rid=012 SYNC REFRESH_PRESENT refreshDone=0 cookie=<cookie value if sent>
>> etime=0.012
> 
> At no time did I ask for that.
> 
>> For a syncIdSet (present/delete phase contents) Intermediate message:
>> rid=123 SYNC ID_SET delete=0|1 cookie=<cookie value if sent> etime=0.234
>> result=<"processed"/"failed">
> 
> Or that.
> 
>> For a search entry message regardless of how we ended up interpreting
>> it:
>> rid=201 SYNC ENTRY dn="<dn as received>" state=<"state" field from the
>> control: 0|1|2|3> cookie=<cookie value if sent> etime=0.123
>> result=<"processed"/"skipped"/"failed">
> 
> I wonder what the producer would be searching for directly via the replication
> process. I was under the impression this was part of something like delta-sync
> and not in reference to recording the occurrence of an update replicated down. I
> would be happy to better understand how this fits such that it
> could/should/would be part of what I think the customer wants/needs.

The provider receives a search operation. Always: that's what the
replication protocol is modeled on. So yes, the provider searches for
what the consumer asked for, taking into account that if an entry could
not have changed based on what the consumer said, it doesn't need to be
sent again and vice versa. If you look at it this way, you'll see why
all the above and below exist and why logging them would make sense:
any of them could trigger vast changes to the replica.

>> For a search result message (end of refreshOnly):
>> rid=321 SYNC RESULT err=<resultCode as received> delete=0|1 cookie=<value if
>> sent> etime=0.... result=<"processed"/"failed">
> 
> See previous comment about search in the SYNC information log.
> 
>> I believe this is a little too much information to put into the logs but
>> this is what might fit the description above. It might also not be
>> feasible to provide "result=skipped" messages and they might show
>> "processed" instead, can't tell right now.
> 
> I hope taking all but the actual updates as requested through the original
> producer off the table it is less of a firehose and more reasonable.

And again, there are no updates sent to the consumer, it's all just a
"special search request" on the protocol level. It's a consumer's job to
make sense of it however it wants.
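
As a deliberately simplified illustration of that last point: the
"state" field of a search entry message takes the four values defined in
RFC 4533 (present, add, modify, delete), and what the consumer does
locally depends on its own database as much as on that value. The
local_action function below is hypothetical and is not how slapd
actually decides; it only shows why one message can map to no update or
to a different kind of update than its state suggests.

```python
from enum import IntEnum


class SyncState(IntEnum):
    # Values of the "state" field in the RFC 4533 Sync State Control.
    PRESENT = 0
    ADD = 1
    MODIFY = 2
    DELETE = 3


def local_action(state, exists_locally):
    """Hypothetical, simplified consumer decision; not slapd's real logic."""
    state = SyncState(state)
    if state is SyncState.PRESENT:
        # Entry is unchanged; it is only noted as still present.
        return "none"
    if state is SyncState.DELETE:
        # Nothing to do if the replica never had the entry.
        return "delete" if exists_locally else "none"
    # ADD and MODIFY both carry the entry; the local result depends on
    # whether the replica already holds it.
    return "modify" if exists_locally else "add"
```

In this sketch a message with state=1 (add) turns into a local modify
when the entry already exists, and a state=3 (delete) message for an
entry the replica never had results in no update at all.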

Regards,

-- 
You are receiving this mail because:
You are on the CC list for the issue.
