[Dbmail-dev] [DBMail 0001081]: DBmail ABEND'ing upon LDAP access error.

2016-10-06 Thread Mantis Bug Tracker
ed error,
dbmail.20160930-1642.err .  The service still ABEND's. 

-- 
 (0003754) PeterS (reporter) - 01-Oct-16 00:18
 http://dbmail.org/mantis/view.php?id=1081#c3754 
-- 
I've patched a GIT clone (git clone git://git.dbmail.eu/paul/dbmail) from
today with dbmail_src_modules_authldap.c-20160930.diff and receive the
attached error, dbmail.20160930-1721.err. The service still ABENDs.

-- 
 (0003755) alan (reporter) - 01-Oct-16 15:36
 http://dbmail.org/mantis/view.php?id=1081#c3755 
-- 
Ok, latest patch src_modules_authldap.c-20161001.diff increases the time to
check for server gone away to 2 minutes.

There is an edge case where after binding, the connection instantly goes
away. Also the daemon exits if the ldap server doesn't reappear after two
minutes, similar to not starting if not available.

There should be warnings such as the following, after which the daemon
recovers when the server returns:
Oct 01 14:06:38 lully.p-o.co.uk dbmail-imapd[96422]: [0x805c0a800]
Warning:[auth] ldap_con_get(+142): LDAP gone away: Can't contact LDAP
server. Trying to reconnect(1/5).
Oct 01 14:06:39 lully.p-o.co.uk dbmail-imapd[96422]: [0x805c0a800]
Warning:[auth] ldap_con_get(+142): LDAP gone away: Can't contact LDAP
server. Trying to reconnect(2/5).
Oct 01 14:06:49 lully.p-o.co.uk dbmail-imapd[96422]: [0x805c0a800]
Warning:[auth] ldap_con_get(+142): LDAP gone away: Can't contact LDAP
server. Trying to reconnect(119/5).

Does this get closer to addressing your issue? 
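
For readers following the thread, the behaviour described above (keep
retrying for a bounded period, then give up) can be sketched roughly as
below. This is only an illustration and not the code from the patch: the
names reconnect_with_retries, connect_fn, wait_fn, fake_server and
fake_connect are hypothetical, and the one-second delay and 120 attempts
(about two minutes) are assumptions based on the description and the log
timestamps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical callback: returns true once the directory server answers. */
typedef bool (*connect_fn)(void *ctx);
/* Hypothetical callback: waits between attempts (e.g. wraps sleep()). */
typedef void (*wait_fn)(unsigned seconds);

/*
 * Try to connect up to max_attempts times, waiting delay_s seconds after
 * each failed attempt except the last. Returns the 1-based number of the
 * attempt that succeeded, or 0 if every attempt failed (the caller would
 * then decide whether to exit the daemon).
 */
static unsigned reconnect_with_retries(connect_fn try_connect, void *ctx,
                                       wait_fn wait, unsigned max_attempts,
                                       unsigned delay_s)
{
    assert(try_connect != NULL);
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_connect(ctx))
            return attempt;
        /* The counter never exceeds max_attempts in this sketch. */
        fprintf(stderr, "LDAP gone away. Trying to reconnect(%u/%u).\n",
                attempt, max_attempts);
        if (wait != NULL && attempt < max_attempts)
            wait(delay_s);
    }
    return 0;
}

/* Stand-in for a real bind: fails until failures_left reaches zero. */
struct fake_server {
    unsigned failures_left;
};

static bool fake_connect(void *ctx)
{
    struct fake_server *srv = ctx;
    if (srv->failures_left == 0)
        return true;
    srv->failures_left--;
    return false;
}
```

In a real daemon the connect callback would wrap the actual bind and the
wait callback would sleep; passing NULL for the wait callback, as a test
might, skips the delay entirely.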

-- 
 (0003756) PeterS (reporter) - 03-Oct-16 16:54
 http://dbmail.org/mantis/view.php?id=1081#c3756 
-- 
I've patched the Red Hat RHEL 7 (EPEL 7) version of DBmail
(dbmail-3.2.3-1.el7.x86_64, dbmail-auth-ldap-3.2.3-1.el7.x86_64, and
dbmail-debuginfo-3.2.3-1.el7.x86_64) with
src_modules_authldap.c-20161001.diff and receive the attached error,
dbmail.20161003-0956.err. This time I've cut the last parts of the full
dbmail.err file, which consist of the last threads 0x7f86e84e8000 and
0x7f86e81d1400. The service still ABENDs.

I'll now try with today's GIT version and the patch. 

-- 
 (0003757) PeterS (reporter) - 03-Oct-16 19:57
 http://dbmail.org/mantis/view.php?id=1081#c3757 
-- 
I've patched a GIT clone (git clone git://git.dbmail.eu/paul/dbmail) from
today with src_modules_authldap.c-20161001.diff and receive the attached
error, dbmail.20161003-1055.err (dbmail.20161003-1055.err.xz). This time
I've cut the last parts of the full dbmail.err file, which consist of the
last threads 0x1466450 and 0xde7400. The service still ABENDs.

 

-- 
 (0003758) alan (reporter) - 04-Oct-16 17:54
 http://dbmail.org/mantis/view.php?id=1081#c3758 
-- 
Patch src_modules_authldap.c-20161004.diff adds slightly more debugging
information, plus binding and search retries.

The error appears to be transient and localised in authldap_search; this
patch should address the issue.

FYI, there is a copy at https://github.com/alan-hicks/dbmail 

-- 
 (0003759) PeterS (reporter) - 04-Oct-16 21:47
 http://dbmail.org/mantis/view.php?id=1081#c3759 
-- 
I pulled down a clone of your https://github.com/alan-hicks/dbmail and had
it running. This version does not ABEND but rather stops communicating
(which caused my client to segfault in the middle of doing IMAPS things).

Please see the attached dbmail.20161004-1348.err
(dbmail.20161004-1348.err.xz) for the full error listing.

Correction: it did ABEND when I restarted my IMAPS client and attempted to
connect to the DBmail instance.  The same "dbmail-imapd: dm_config.c:134:
config_get_value_once: Assertion `config_dict' failed." error as before is
the last thing logged.

 

-- 
 (0003760) alan (reporter) - 06-Oct-16 15:57
 http://dbmail.org/mantis/view.php?id=1081#c3760 
-- 
I'm struggling to find what causes dbmail to be unable to contact the
server.
Patch src_modules_authldap.c-20161006.diff bumps the search timeout to 60
seconds.
The error is that the ldap server can't be contacted. Although there is
only a single connection, nothing suggests it's blocked, out of memory or
any of the normal errors. As the
