[Dbmail-dev] [DBMail 0001081]: DBmail ABEND'ing upon LDAP access error.
1552]: [0x25a0630] Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP server. Trying again(519/3600).
Oct 14 13:53:37 swlx143.swmed.e dbmail-imapd[1552]: [0x25a04a0] Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP server. Trying again(429/3600).
Oct 14 13:53:38 swlx143.swmed.e dbmail-imapd[1552]: [0x25a0630] Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP server. Trying again(520/3600).
Oct 14 13:53:38 swlx143.swmed.e dbmail-imapd[1552]: [0x25a04a0] Warning:[auth] authldap_search(+293): LDAP gone away: Can't contact LDAP server. Trying again(430/3600).

A TCPDUMP of all traffic for the destination of the LDAPS server that is configured reveals NO TRAFFIC being generated. Am I misunderstanding what "Trying again" means perhaps?

--
(0003767) alan (reporter) - 14-Oct-16 22:40
http://dbmail.org/mantis/view.php?id=1081#c3767
--
That part of the code is trying to send a request, something in dbmail is blocking it. Are existing sessions still OK, is it possible to start a new session? Any clues to what else might be waiting or stalled?

--
(0003768) alan (reporter) - 17-Oct-16 10:34
http://dbmail.org/mantis/view.php?id=1081#c3768
--
Default timeout of 4000 appears excessive, I'm successfully running production with 300 (5 mins), perhaps 60 might be useful for testing and may illuminate a stuck thread.

--
(0003769) PeterS (reporter) - 17-Oct-16 21:07
http://dbmail.org/mantis/view.php?id=1081#c3769
--
Here is the latest. dbmail.20161017-1031.err.xz (dbmail.20161017-1031.err) and dbmail.20161017-1031.extra.txt (dbmail.20161017-1031.extra.txt.xz)

Updated configuration for your suggested timeout, "query_timeout = 60". Still ABENDs, with the last line being:

"dbmail-imapd: dm_config.c:134: config_get_value_once: Assertion `config_dict' failed."
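
Editorial illustration of the "Trying again(n/3600)" behaviour discussed above: the retry counter advances while the packet capture shows no traffic. The following is only a minimal sketch, not DBMail's code. The names `conn_t`, `open_fn`, `search_fn`, `search_with_reconnect`, `demo_open` and `demo_search` are all hypothetical, and the hooks stand in for real LDAP calls. It shows a retry loop that discards its handle after every failure and re-opens it before the next attempt, so that each retry would be expected to produce a visible connection attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection handle; stands in for a real LDAP handle. */
typedef struct {
    bool alive;  /* false once the peer is considered gone */
    int  opens;  /* how many times the handle has been (re)opened */
} conn_t;

/* Hypothetical backend hooks, so the loop can run without a server. */
typedef bool (*open_fn)(conn_t *c);
typedef bool (*search_fn)(conn_t *c);

/*
 * Try a search up to max_tries times. After every failed search the
 * handle is marked dead, and it is re-opened before the next attempt,
 * so a retry never reuses a handle that has already failed.
 * Returns the 1-based attempt number that succeeded, or 0 if none did.
 */
static int search_with_reconnect(conn_t *c, open_fn do_open,
                                 search_fn do_search, int max_tries)
{
    assert(c != NULL && max_tries > 0);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (!c->alive && !do_open(c))
            continue;          /* reconnect failed; counts as one attempt */
        if (do_search(c))
            return attempt;
        c->alive = false;      /* drop the handle instead of reusing it */
    }
    return 0;
}

/* Demo hooks: the initial connection is stale; a re-opened one works. */
static bool demo_open(conn_t *c)
{
    c->opens++;
    c->alive = true;
    return true;
}

static bool demo_search(conn_t *c)
{
    return c->alive && c->opens >= 2;
}
```

With the demo hooks, a handle that starts out stale (`alive = true`, `opens = 1`) fails its first search, is re-opened once, and then succeeds. In a real client the open hook would also need its own connect timeout so that a dead server cannot stall the loop.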
--
(0003770) alan (reporter) - 18-Oct-16 15:05
http://dbmail.org/mantis/view.php?id=1081#c3770
--
Just pushed a commit so the connection should show an error but not ABEND. The imap connection should fail and require re-authentication; depending on how the client connects, it may need to be restarted. All should be ok if the other threads continue and either the affected thread dies with the connection or fully recovers after 'unrecoverable error while talking to ldap server'.

--
(0003771) PeterS (reporter) - 18-Oct-16 19:52
http://dbmail.org/mantis/view.php?id=1081#c3771
--
The update is a bit better as it does not ABEND any longer. However, once it loses its connection to LDAP it cannot recover and infinitely attempts to connect. See sanitized log, dbmail.20161018-1122.err (dbmail.20161018-1122.err.xz). I have watched for any traffic outbound from the system to LDAP or LDAPS and it is not actually attempting or sending anything, so the connection attempts are bogus.

--
(0003772) alan (reporter) - 19-Oct-16 18:58
http://dbmail.org/mantis/view.php?id=1081#c3772
--
LDAP call updated to remove a deprecated function. It appears timeouts don't work for synchronous calls:
http://www.openldap.org/lists/openldap-technical/201311/msg00266.html

--
(0003773) PeterS (reporter) - 20-Oct-16 23:44
http://dbmail.org/mantis/view.php?id=1081#c3773
--
It still is not ABEND'ing, which is great. However, it still appears to be attempting to re-use a long-dead LDAPS connection (0x1049d90) provided in this case. Please see attached sanitized log parts, dbmail.20161020-1644.imap_session_handle_auth.err (dbmail.20161020-1644.imap_session_handle_auth.err.xz), dbmail.20161020-1644.ldap_tcpdump.txt (dbmail.20161020-1644.ldap_tcpdump.txt.xz), and dbmail.20161020-1644.thread_0x1049d90.err (dbmail.20161020-1644.thread_0x1049d90.err.xz).

The provided text-only TCPDUMP/Wireshark capture is actually still running right now, and shows no traffic at all to the configured LDAP/LDAPS server since 2016-10-20 11:42:05. I focused on thread ID 0x1049d90 since that seems to be the one that it chose to re-use after having not us