Re: (ITS#9098) assert fails in meta_back_search in some cases after reconnect

2020-02-17 Thread clement . oudot
Hello,

We just had the same crash with OpenLDAP 2.4.49. I think this issue
should be reopened as it was not fixed.

The core dump given by Maxime can be used to dig into the bug. We can
provide other information if needed.

-- 
Clément Oudot | Identity Solutions Manager

clement.ou...@worteks.com

Worteks | https://www.worteks.com






Re: (ITS#9098) assert fails in meta_back_search in some cases after reconnect

2020-01-10 Thread maxime . besson
Hi,

After a couple of months without any issues using 2.4.48, we suddenly
encountered a crash again, but this time on 2.4.48. It was the exact
same symptom, and the same assert failing as in my original message. It
appears that the issue happens a lot more rarely in 2.4.48 compared to
2.4.47, so ITS#8841 possibly had an effect on my issue, but did not
solve it completely.

I have a core dump I can run tests against, if you need.

-- 
Maxime Besson
Expert Infrastructure - Worteks
maxime.bes...@worteks.com





Re: (ITS#9098) assert fails in meta_back_search in some cases after reconnect

2019-12-11 Thread maxime . besson


> I suggest upgrading to 2.4.48 to pick up this fix:
> 
>    Fixed slapd-meta assertion when network interface goes down (ITS#8841)
> 


While reviewing the changelog for 2.4.48 it had seemed to me that this
fix was unrelated to the problem I had, because it was a different assert.

However, after upgrading to 2.4.48 as suggested, and waiting about a month
for safety, the problem seems to have disappeared entirely, so it was
the same issue after all. Sorry about that, I think you can close this
issue.

-- 
Maxime Besson
Expert Infrastructure - Worteks
maxime.bes...@worteks.com





Re: (ITS#9098) assert fails in meta_back_search in some cases after reconnect

2019-10-16 Thread quanah



--On Wednesday, October 16, 2019 8:37 PM + maxime.bes...@worteks.com 
wrote:

> Full_Name: Maxime Besson
> Version: 2.4.47
> OS: Debian Jessie
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (2a01:cb00:802:8400:2cbe:3c60:fca6:e50b)

I suggest upgrading to 2.4.48 to pick up this fix:

Fixed slapd-meta assertion when network interface goes down (ITS#8841)


--Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:






(ITS#9098) assert fails in meta_back_search in some cases after reconnect

2019-10-16 Thread maxime . besson
Full_Name: Maxime Besson
Version: 2.4.47
OS: Debian Jessie
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (2a01:cb00:802:8400:2cbe:3c60:fca6:e50b)


I am running a meta-directory (version 2.4.47, LTB build on Ubuntu 16.04) with
the following DB configuration:

dn: olcDatabase={1}meta,cn=config
objectClass: olcMetaConfig
objectClass: olcDatabaseConfig
objectClass: olcConfig
objectClass: top
olcDatabase: {1}meta
olcSuffix: dc=com
olcAccess: {0}to * by * read
olcRootDN: cn=admin,dc=com

dn: olcMetaSub={0}uri,olcDatabase={1}meta,cn=config
objectClass: olcMetaTargetConfig
objectClass: olcConfig
objectClass: top
olcMetaSub: {0}uri
olcDbURI: ldap://1.2.3.4/dc=example,dc=com
olcDbIDAssertBind: mode=legacy flags=non-prescriptive,proxy-authz-non-critical bindmethod=simple binddn="cn=admin,dc=example,dc=com" credentials="X"
olcDbTimeout: 5
olcDbNetworkTimeout: 3
olcDbNretries: never
olcDbRebindAsUser: true

... 

(There are 8 backends in total)


Timeouts were added in order to avoid blocking OpenLDAP completely when one
server becomes completely unavailable. However, since I added them, the slapd
process started crashing every now and then (from a couple hours to a couple of
days), usually during small network interruptions that affect all backends: I
see plenty of reconnect logs shortly before the crashes.

The crash is always immediately preceded by the following log message:

meta_search_dobind_init[{i}]: retrying URI="{url}" DN="{DN}"

{i} is never the same, and {url} and {DN} are the correct settings for backend
i.

The crash itself is an ABRT at the following assert in back-meta/search.c
(lines 1957-1958):

    assert( candidates[ i ].sr_msgid >= 0
        || candidates[ i ].sr_msgid == META_MSGID_CONNECTING );


I have analyzed several core dumps, and found that every single time slapd
crashes, sr_msgid has a value of -1 (META_MSGID_IGNORE), which indeed causes the
assert to fail. 

I found that candidates[i].sr_flags has a value of 3 (META_CANDIDATE +
META_BINDING).
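
The two observations above can be checked against the assert's condition with
a small standalone sketch. It is not OpenLDAP code: the helper names are
hypothetical, and the constants are assumptions. The value -1 for
META_MSGID_IGNORE and the flag sum 3 come from this report; the value -3 for
META_MSGID_CONNECTING and the split of 3 into 0x1 and 0x2 are recalled from
back-meta.h and should be verified against the source tree being debugged.

```c
#include <assert.h>

/* Assumed values; verify against back-meta.h before relying on them. */
#define SKETCH_MSGID_IGNORE      (-1)  /* value seen in the core dumps */
#define SKETCH_MSGID_CONNECTING  (-3)  /* assumption, not from the report */
#define SKETCH_CANDIDATE         0x1u  /* assumption */
#define SKETCH_BINDING           0x2u  /* assumption */

/* Hypothetical helper: the same predicate as the failing assert.
 * Returns 1 when the assert would pass, 0 when it would abort. */
int msgid_passes_assert(int sr_msgid)
{
    return sr_msgid >= 0 || sr_msgid == SKETCH_MSGID_CONNECTING;
}

/* Hypothetical helper: 1 when both flags observed in the dump are set. */
int is_binding_candidate(unsigned sr_flags)
{
    return (sr_flags & SKETCH_CANDIDATE) && (sr_flags & SKETCH_BINDING);
}
```

Under these assumptions, msgid_passes_assert(-1) returns 0, which matches the
abort, while is_binding_candidate(3) returns 1: the candidate is still marked
as binding although its message ID has already been set to the "ignore" value.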

And the msc_mscflags in mc->mc_conns[ i ] are

* 0x100081 for all connections before the one that triggers the crash
* 0x100010 for the candidate that crashes the server
* 0x100080 for all connections after it
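
To compare the three flag words bit by bit, a generic decomposition such as
the sketch below can help. The helper names are hypothetical, and the sketch
deliberately does not name the bits: the symbolic names have to be looked up
in the back-ldap and back-meta headers of the version being debugged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store the set bits of `flags` in `out`, lowest bit
 * first, writing at most `max` entries. Returns the total number of set
 * bits, which may exceed `max`. */
size_t split_flag_bits(unsigned long flags, unsigned long *out, size_t max)
{
    size_t n = 0;
    unsigned long bit;

    /* Unsigned shift wraps to 0 after the top bit, which ends the loop. */
    for (bit = 1; bit != 0; bit <<= 1) {
        if (flags & bit) {
            if (n < max) {
                out[n] = bit;
            }
            n++;
        }
    }
    return n;
}

/* Hypothetical helper: bits that are set in `a` but not in `b`. */
unsigned long bits_only_in(unsigned long a, unsigned long b)
{
    return a & ~b;
}
```

Applied to the values above: 0x100081 = 0x100000 | 0x80 | 0x01, 0x100010 =
0x100000 | 0x10, and 0x100080 = 0x100000 | 0x80. The crashing connection is
therefore the only one with bit 0x10 set, and the only one without bit 0x80.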


I am having trouble reproducing this in a test environment, but it happens
regularly in production. I have tried changing the timeouts, adding a
non-default bind timeout, and disabling retries (they were originally allowed),
but the crashes keep happening. Note that disabling retries (olcDbNretries:
never) still seems to lead to retries in meta_search_dobind_init, since the log
message is still there.

I cannot share the core dumps due to the sensitive information inside them.
However, I would gladly extract more information from them if it can help solve
this.