Hi Ondřej,

Thanks for getting back to me. I do have the logs from a previous replication stall, and I'll capture the logs again the next time it happens. I checked the logs and I don't see any abandoned connections.
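For reference, a check like this can be scripted. The sketch below assumes the "stats" log level records an Abandon request as `conn=N op=M ABANDON msg=X` and a Cancel request as the extended operation `conn=N op=M EXT oid=1.3.6.1.1.8` (the Cancel OID from RFC 3909). The function name `find_abandon_cancel` and any log path you pass to it are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list Abandon and Cancel requests found in a slapd "stats" log.
# Assumes lines like "conn=N op=M ABANDON msg=X" (Abandon) and
# "conn=N op=M EXT oid=1.3.6.1.1.8" (Cancel extended operation).
# find_abandon_cancel is a hypothetical helper name.
set -u

find_abandon_cancel() {
    local log=$1
    # Extended regex: match either an Abandon op or the Cancel extop.
    # The trailing group stops the OID from matching longer OIDs.
    grep -E 'conn=[0-9]+ op=[0-9]+ (ABANDON msg=[0-9]+|EXT oid=1\.3\.6\.1\.1\.8([^0-9.]|$))' "$log"
}

# Run directly with a log file as the first argument (path is up to you).
if [ "$#" -ge 1 ]; then
    find_abandon_cancel "$1"
fi
```

Piping the result through `wc -l` gives a quick count; an empty result means neither request type was logged at that level.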
aaa-prod-aws-12:1636
# requesting: contextCSN
contextCSN: 20250102015911.702871Z#000000#000#000000

All the relevant logs and info:

dn: cn=Consumer 152,cn=Database 1,cn=Databases,cn=Monitor
structuralObjectClass: olmSyncReplInstance
creatorsName:
modifiersName:
createTimestamp: 20241209130653Z
modifyTimestamp: 20241209130653Z
olmSRProviderURIList: ldaps://aaa-master-1.uis.georgetown.edu:636/
olmSRConnection: IP=172.20.86.12:49880
olmSRSyncPhase: Persist
olmSRNextConnect: 00000101000000Z
olmSRLastConnect: 20241229203510Z
olmSRLastContact: 20250102015934Z
olmSRLastCookieRcvd: rid=152,csn=20250102015911.702871Z#000000#000#000000
olmSRLastCookieSent: rid=152,csn=20241229202835.459483Z#000000#000#000000
entryDN: cn=Consumer 152,cn=Database 1,cn=Databases,cn=Monitor
subschemaSubentry: cn=Subschema
hasSubordinates: FALSE

*Consumer:*
netstat -an | grep 49880
tcp 0 0 172.20.86.12:49880 172.17.21.52:636 ESTABLISHED

*Master:*
netstat -an | grep 172.20.86.12
tcp 0 0 172.17.21.52:636 172.20.86.12:49880 ESTABLISHED

*Master logs:*
Jan 1 20:59:18 aaa-prod-master-1 slapd[3281130]: conn=1035 op=1 syncprov_sendresp: cookie=rid=152,csn=20250102015911.686467Z#000000#000#000000
Jan 1 20:59:18 aaa-prod-master-1 slapd[3281130]: conn=1035 op=1 syncprov_sendresp: cookie=rid=152,csn=20250102015911.702871Z#000000#000#000000

*Nothing about rid=152 is logged after the above.*

*Consumer logs:*
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: do_syncrep2: rid=152 cookie=rid=152,csn=20250102015911.702871Z#000000#000#000000
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: syncrepl_entry: rid=152 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY) csn=20250102015911.702871Z#000000#000#000000 tid 0x7f7a753fc640
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_queue_csn: queueing 0x7f7a687c6190 20250102015911.702871Z#000000#000#000000
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_graduate_commit_csn: removing 0x7f7a687c6190 20250102015911.702871Z#000000#000#000000
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_queue_csn: queueing 0x7f7a6877d9b0 20250102015911.702871Z#000000#000#000000
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_graduate_commit_csn: removing 0x7f7a6877d9b0 20250102015911.702871Z#000000#000#000000

*Nothing about replication is logged after the above.*

From the last coredump:

Thread 1 (Thread 0x7f85243fa640 (LWP 192314)):
#0  connection_abandon (c=0x7f9eb4ad0078) at connection.c:714
#1  0x00000000004460d5 in connection_closing (c=0x7f9eb4ad0078, why=0x5db380 <conn_lost_str> "connection lost") at connection.c:785
#2  0x0000000000447d18 in connection_read (s=31, cri=0x7f85243f99a0) at connection.c:1453
#3  0x000000000044741b in connection_read_thread (ctx=0x7f85243f99f0, argv=0x1f) at connection.c:1260
#4  0x00007f9ecd406bed in ldap_int_thread_pool_wrapper (xpool=0xac8080) at tpool.c:1059
#5  0x00007f9ecca89c02 in start_thread () from /lib64/libc.so.6
#6  0x00007f9eccb0ec40 in clone3 () from /lib64/libc.so.6

There is no core file now.

Thanks,
Suresh

On Tue, Mar 4, 2025 at 6:12 AM Ondřej Kuzník <[email protected]> wrote:
> On Mon, Jan 13, 2025 at 10:42:58AM -0500, Suresh Veliveli wrote:
> > Hi Ondřej,
> >
> > Attached is the file from the last crash for "thread apply all bt full". I
> > built it from the src (openldap.org). The installation is prefixed to
> > /var/services/openldap directory. I do have "stats sync" log level enabled.
> > Our logs are huge, I could get the necessary info if you can tell what I
> > need to look for.
>
> Hi Suresh,
> as I mentioned, you want to see what the provider was doing with the
> session and the decisions it took along the way. To see that, you want
> to find where the session starts (where you find this "cookie=rid=..."
> message) and *then* use the "conn=xxx op=yyy" you find in this message
> to isolate the messages that correlate with it. That's the first thing
> you'll need to track down what eventually happened to the session.
>
> If it's related to the crash in any way, it might also show us if
> something went wrong if we're lucky.
>
> Also just out of interest, are there any Abandon/Cancel requests in the
> logs?
>
> Thanks,
>
> --
> Ondřej Kuzník
> Senior Software Engineer
> Symas Corporation http://www.symas.com
> Packaged, certified, and supported LDAP solutions powered by OpenLDAP
>

--
Suresh Veliveli
Sr. UNIX Systems Engineer
Georgetown University
University Information Services | Security Infrastructure and Policy-Identity and Collaboration
202-262-6676 (cell) | 202-687-3108 (work)
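Ondřej's suggestion (find the provider message carrying "cookie=rid=...", then follow the "conn=xxx op=yyy" from that message) can be sketched as a small shell function. This is a minimal sketch assuming plain-text syslog output from slapd at the "stats sync" log level and a `grep` that supports `-o` (GNU and BSD both do). The name `trace_sync_session` is hypothetical. It picks the most recent cookie line for the given rid, on the assumption that the stalled session is the latest one, and prints every line logged under the same conn/op.

```shell
#!/usr/bin/env bash
# Sketch: isolate all provider log lines belonging to one syncrepl session.
# 1. Find the most recent line containing "cookie=rid=<rid>,".
# 2. Extract its "conn=N op=M" prefix.
# 3. Print every line carrying that exact conn/op pair.
# trace_sync_session is a hypothetical helper name.
set -u

trace_sync_session() {
    local log=$1 rid=$2 connop
    # Most recent cookie line for this rid, reduced to "conn=N op=M".
    connop=$(grep "cookie=rid=${rid}," "$log" | tail -n 1 \
             | grep -oE 'conn=[0-9]+ op=[0-9]+' | head -n 1)
    if [ -z "$connop" ]; then
        echo "no cookie=rid=${rid} message found in $log" >&2
        return 1
    fi
    # Require a non-digit (or end of line) after the op number so that
    # "conn=1035 op=1" does not also match "conn=1035 op=12".
    grep -E "${connop}([^0-9]|\$)" "$log"
}

# Run directly as: script.sh <logfile> <rid>
if [ "$#" -ge 2 ]; then
    trace_sync_session "$1" "$2"
fi
```

Run on the provider's log with rid 152, this should return the two `conn=1035 op=1 syncprov_sendresp` lines quoted earlier, together with anything else that operation logged, such as the initial search request.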
