Hi Ondřej,

Thanks for getting back. I have the logs from a previous replication
stall and will capture them again the next time it happens. I checked
those logs and don't see any abandoned connections.

aaa-prod-aws-12:1636
# requesting: contextCSN
contextCSN: 20250102015911.702871Z#000000#000#000000

All the relevant logs and info:

dn: cn=Consumer 152,cn=Database 1,cn=Databases,cn=Monitor
structuralObjectClass: olmSyncReplInstance
creatorsName:
modifiersName:
createTimestamp: 20241209130653Z
modifyTimestamp: 20241209130653Z
olmSRProviderURIList: ldaps://aaa-master-1.uis.georgetown.edu:636/
olmSRConnection: IP=172.20.86.12:49880
olmSRSyncPhase: Persist
olmSRNextConnect: 00000101000000Z
olmSRLastConnect: 20241229203510Z
olmSRLastContact: 20250102015934Z
olmSRLastCookieRcvd: rid=152,csn=20250102015911.702871Z#000000#000#000000
olmSRLastCookieSent: rid=152,csn=20241229202835.459483Z#000000#000#000000
entryDN: cn=Consumer 152,cn=Database 1,cn=Databases,cn=Monitor
subschemaSubentry: cn=Subschema
hasSubordinates: FALSE

*Consumer:*
netstat -an | grep 49880
tcp        0      0 172.20.86.12:49880      172.17.21.52:636        ESTABLISHED

*Master:*
netstat -an | grep 172.20.86.12
tcp        0      0 172.17.21.52:636        172.20.86.12:49880      ESTABLISHED

*Master logs:*
Jan  1 20:59:18 aaa-prod-master-1 slapd[3281130]: conn=1035 op=1 syncprov_sendresp: cookie=rid=152,csn=20250102015911.686467Z#000000#000#000000
Jan  1 20:59:18 aaa-prod-master-1 slapd[3281130]: conn=1035 op=1 syncprov_sendresp: cookie=rid=152,csn=20250102015911.702871Z#000000#000#000000

*Nothing about rid=152 is logged after the above.*
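
For anyone pulling the same kind of excerpt out of a large syslog file,
here is a minimal sketch of the approach described in the quoted message
below: find a line carrying cookie=rid=<rid>, read the conn=/op= pair
from it, then keep only the lines for that connection. The function
names are hypothetical, and it assumes each syslog record sits on a
single line, as it does in the original log files.

```python
import re
import sys

# slapd "stats" log lines carry a "conn=N op=M" pair identifying the operation.
CONN_OP = re.compile(r"\bconn=(\d+) op=(\d+)\b")


def find_session(lines, rid):
    """Return (conn, op) from the first line that mentions cookie=rid=<rid>
    and also carries a conn=/op= pair, or None if there is no such line."""
    needle = f"cookie=rid={rid},"
    for line in lines:
        if needle in line:
            match = CONN_OP.search(line)
            if match:
                return int(match.group(1)), int(match.group(2))
    return None


def session_lines(lines, conn, op=None):
    """Keep lines for one connection; if op is given, only that operation.
    With op=None, connection-level lines without an op= are kept too."""
    conn_re = re.compile(rf"\bconn={conn}\b")
    op_re = re.compile(rf"\bop={op}\b") if op is not None else None
    return [
        line for line in lines
        if conn_re.search(line) and (op_re is None or op_re.search(line))
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python session.py <provider-log-file> 152
    with open(sys.argv[1], errors="replace") as fh:
        log = fh.read().splitlines()
    session = find_session(log, int(sys.argv[2]))
    if session:
        print("\n".join(session_lines(log, session[0])))
```

Filtering on the connection alone (op=None) also keeps the accept and
close records for that connection, which is usually what shows how the
session ended.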

*Consumer logs:*
Jan  1 20:59:34 aaa-prod-aws-12 slapd[1229307]: do_syncrep2: rid=152 cookie=rid=152,csn=20250102015911.702871Z#000000#000#000000
Jan  1 20:59:34 aaa-prod-aws-12 slapd[1229307]: syncrepl_entry: rid=152 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY) csn=20250102015911.702871Z#000000#000#000000 tid 0x7f7a753fc640
Jan  1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_queue_csn: queueing 0x7f7a687c6190 20250102015911.702871Z#000000#000#000000
Jan  1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_graduate_commit_csn: removing 0x7f7a687c6190 20250102015911.702871Z#000000#000#000000
Jan  1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_queue_csn: queueing 0x7f7a6877d9b0 20250102015911.702871Z#000000#000#000000
Jan  1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_graduate_commit_csn: removing 0x7f7a6877d9b0 20250102015911.702871Z#000000#000#000000


*Nothing about replication is logged after the above.*
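
One thing to keep in mind when lining up the CSNs with the syslog
timestamps: the time part of a CSN is UTC, while the syslog lines above
appear to be in local time. Below is a small sketch for converting
between the two. The name parse_csn is hypothetical, and the UTC-5
offset is an assumption inferred from the 01:59 (CSN) versus 20:59
(syslog) difference in the excerpts.

```python
from datetime import datetime, timedelta, timezone


def parse_csn(csn):
    """Split a CSN of the form time#count#sid#mod into
    (UTC datetime, change count, server ID, modification number).
    The three trailing fields are hexadecimal."""
    stamp, count, sid, mod = csn.split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    when = when.replace(tzinfo=timezone.utc)
    return when, int(count, 16), int(sid, 16), int(mod, 16)


# Assumption: the hosts log in a zone five hours behind UTC.
LOCAL = timezone(timedelta(hours=-5))

when, count, sid, mod = parse_csn("20250102015911.702871Z#000000#000#000000")
print(when.astimezone(LOCAL).isoformat())  # 2025-01-01T20:59:11.702871-05:00
```

Read that way, the last CSN was generated at 20:59:11 local time, the
master sent it at 20:59:18, and the consumer logged it at 20:59:34.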

From the last coredump:

Thread 1 (Thread 0x7f85243fa640 (LWP 192314)):
#0  connection_abandon (c=0x7f9eb4ad0078) at connection.c:714
#1  0x00000000004460d5 in connection_closing (c=0x7f9eb4ad0078, why=0x5db380 <conn_lost_str> "connection lost") at connection.c:785
#2  0x0000000000447d18 in connection_read (s=31, cri=0x7f85243f99a0) at connection.c:1453
#3  0x000000000044741b in connection_read_thread (ctx=0x7f85243f99f0, argv=0x1f) at connection.c:1260
#4  0x00007f9ecd406bed in ldap_int_thread_pool_wrapper (xpool=0xac8080) at tpool.c:1059
#5  0x00007f9ecca89c02 in start_thread () from /lib64/libc.so.6
#6  0x00007f9eccb0ec40 in clone3 () from /lib64/libc.so.6
No core file now.


Thanks,

Suresh


On Tue, Mar 4, 2025 at 6:12 AM Ondřej Kuzník <[email protected]> wrote:

> On Mon, Jan 13, 2025 at 10:42:58AM -0500, Suresh Veliveli wrote:
> > Hi Ondřej,
> >
> > Attached is the file from the last crash for "thread apply all bt full". I
> > built it from the src (openldap.org). The installation is prefixed to
> > /var/services/openldap directory. I do have "stats sync" log level enabled.
> > Our logs are huge, I could get the necessary info if you can tell what I
> > need to look for.
>
> Hi Suresh,
> as I mentioned, you want to see what the provider was doing with the
> session and the decisions it took along the way. To see that, you want
> to find where the session starts (where you find this "cookie=rid=..."
> message) and *then* use the "conn=xxx op=yyy" you find in this message
> to isolate the messages that correlate with it. That's the first thing
> you'll need to track down what eventually happened to the session.
>
> If it's related to the crash in any way, it might also show us if
> something went wrong if we're lucky.
>
> Also just out of interest, are there any Abandon/Cancel requests in the
> logs?
>
> Thanks,
>
> --
> Ondřej Kuzník
> Senior Software Engineer
> Symas Corporation                       http://www.symas.com
> Packaged, certified, and supported LDAP solutions powered by OpenLDAP
>


-- 
Suresh Veliveli
Sr. UNIX Systems Engineer
Georgetown University
University Information Services | Security Infrastructure and
Policy-Identity and Collaboration
202-262-6676 (cell) | 202-687-3108 (work)
