RE: sync_client fails to exit when manual replication and rolling replication are combined (2.3.16-8)
On Saturday, October 16, 2010 12:49 AM, Bron Gondwana wrote:

On Fri, Oct 15, 2010 at 03:42:21PM -0400, Simpson, John R wrote:

However, if I have run sync_client manually while rolling replication is enabled the rolling replication instance will not exit. Instead, it appears to start spawning subprocesses and throwing database errors. The change in database errors (below) appears to coincide with the completion of "Exporting cyrus-imapd databases". The critical DB error messages continue until sync_client is killed.

[ ... ]

Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR db4: PANIC: fatal region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR: critical database situation
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR db4: PANIC: fatal region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR: critical database situation
... continue until sync_client is killed ...

Nothing magic about sync_client itself here - it's something with the hand run sync_client and attaching/detaching from the environment.

This has been on the TODO list at FastMail for a while - and your information may actually help us narrow down the cause. We don't use BDB, but the log messages annoy us too!

If there's anything else I can do to help track this down, please let me know. It was interesting to see that the errors were coming from the original, rolling replication sync_client process, not a manually initiated sync_client that didn't exit properly.

The reason I'm running sync_client manually is to seed the replica with the existing users and mailboxes on the master server, as described in http://www.cyrusimap.org/docs/cyrus-imapd/2.3.16/install-replication.php. Would it be better to use rsync?

Is there any reason not to add code to clean up any remaining sync_client processes to the stop function in /etc/rc.d/init.d/cyrus-imapd?
Yes, that could get a little tricky because the init script has multi-instance support, so you would have to not only identify sync_client processes running outside master but also identify which instance they belong to. Besides that, init scripts usually _only_ terminate services they have started, not anything else.

I am pretty sure we're not using BDB either, but I found a log file, /var/lib/imap/db/log.01, that appears to be a Berkeley DB log file.

Right, as long as BDBless builds are not possible, we will always see BDB being initialized, even if not used :(

BTW, I can't help you much with sync_client because I have never used it. However, I'm quite sure shutting down cyrus-imapd while sync_client is running may do bad things with your databases. The init script tries to convert all BDB databases to skiplist and then cleans up the BDB environment. I guess that's not good while sync_client is running.

Simon

Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
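As a rough illustration of the first half of the problem Simon describes (finding sync_client processes at all), the sketch below filters `ps` output for them. The function name `list_sync_clients` is hypothetical and is not part of the cyrus-imapd init script. It does not attempt the harder part, deciding which instance a process belongs to or whether the init script started it.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the cyrus-imapd init script.
# Reads `ps -eo pid,ppid,args` style output on stdin (header line first)
# and prints "PID PPID command line" for every process whose executable
# is named sync_client.
list_sync_clients() {
    awk 'NR > 1 && $3 ~ /(^|\/)sync_client$/ { $1 = $1; print }'
}
```

A possible use before stopping the service would be `ps -eo pid,ppid,args | list_sync_clients`, leaving the decision about what to do with each process to the administrator.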
Re: sync_client fails to exit when manual replication and rolling replication are combined (2.3.16-8)
Hi,

On Tue, 19 Oct 2010 14:22:06 +0200
Simon Matter simon.mat...@invoca.ch said:

I am pretty sure we're not using BDB either, but I found a log file, /var/lib/imap/db/log.01, that appears to be a Berkeley DB log file.

simon> Right, as long as BDBless builds are not possible, we will always see BDB
simon> being initialized, even if not used :(

I believe recent Cyrus IMAPd can be built without Berkeley DB by giving --with-bdb=no to configure. Since 2.4.X doesn't use Berkeley DB by default, I'm building it without Berkeley DB, and I don't see /var/imap/db/log.* anymore.

Sincerely,

--
Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
u...@mahoroba.org u...@{,jp.}FreeBSD.org
http://www.imasy.org/~ume/
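The observation above (no db/log.* files once Berkeley DB is out of the build) can be checked with a small shell function. This is only a sketch: the function name `has_bdb_logs` is hypothetical, and the configuration directory is passed in because it differs between installations (/var/imap and /var/lib/imap both appear in this thread).

```shell
#!/bin/sh
# Build step mentioned in the message (run in the Cyrus source tree):
#   ./configure --with-bdb=no

# Hypothetical helper: return 0 if the given Cyrus configuration directory
# contains Berkeley DB log files (db/log.*), 1 otherwise.
has_bdb_logs() {
    confdir=$1
    for f in "$confdir"/db/log.*; do
        # If the glob matched nothing, $f is the literal pattern and
        # the -e test fails.
        [ -e "$f" ] && return 0
    done
    return 1
}
```

For example, `has_bdb_logs /var/lib/imap && echo "BDB environment still in use"`.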
RE: sync_client fails to exit when manual replication and rolling replication are combined (2.3.16-8)
On Saturday, October 16, 2010 12:49 AM, Bron Gondwana wrote:

On Fri, Oct 15, 2010 at 03:42:21PM -0400, Simpson, John R wrote:

However, if I have run sync_client manually while rolling replication is enabled the rolling replication instance will not exit. Instead, it appears to start spawning subprocesses and throwing database errors. The change in database errors (below) appears to coincide with the completion of "Exporting cyrus-imapd databases". The critical DB error messages continue until sync_client is killed.

[ ... ]

Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR db4: PANIC: fatal region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR: critical database situation
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR db4: PANIC: fatal region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR: critical database situation
... continue until sync_client is killed ...

Nothing magic about sync_client itself here - it's something with the hand run sync_client and attaching/detaching from the environment.

This has been on the TODO list at FastMail for a while - and your information may actually help us narrow down the cause. We don't use BDB, but the log messages annoy us too!

If there's anything else I can do to help track this down, please let me know. It was interesting to see that the errors were coming from the original, rolling replication sync_client process, not a manually initiated sync_client that didn't exit properly.

The reason I'm running sync_client manually is to seed the replica with the existing users and mailboxes on the master server, as described in http://www.cyrusimap.org/docs/cyrus-imapd/2.3.16/install-replication.php. Would it be better to use rsync?

Is there any reason not to add code to clean up any remaining sync_client processes to the stop function in /etc/rc.d/init.d/cyrus-imapd?

I am pretty sure we're not using BDB either, but I found a log file, /var/lib/imap/db/log.01, that appears to be a Berkeley DB log file.

Thank you,
John

Regards,
Bron.
sync_client fails to exit when manual replication and rolling replication are combined (2.3.16-8)
Greetings all,

With the help of this list, I've successfully upgraded our lab 2.3.7 (RHEL/CentOS packaged) server to 2.3.16-8 and tested rolling replication, manual replication by user, and manual replication by mailbox. Everything was going better than expected until I shut down cyrus-imapd and /var/log/maillog started filling up with DB errors.

If I shut down cyrus-imapd with rolling replication enabled and have not run sync_client manually, both Cyrus and sync_client shut down cleanly. However, if I have run sync_client manually while rolling replication is enabled the rolling replication instance will not exit. Instead, it appears to start spawning subprocesses and throwing database errors. The change in database errors (below) appears to coincide with the completion of "Exporting cyrus-imapd databases". The critical DB error messages continue until sync_client is killed. I've run ctl_cyrusdb -r as suggested by the "run recovery" message.

Below are the steps that reproduce the problem, /var/log/maillog, the most relevant portions of imapd.conf and cyrus.conf, and the packages installed on the system.

cyrus-imapd-2.3.16-8 was built with rpmbuild -ba on CentOS 5.4 64-bit using http://www.invoca.ch/pub/packages/cyrus-imapd/cyrus-imapd-2.3.16-8.src.rpm. The cyrus-sasl and db4 packages are from CentOS.

Please let me know if any other information would be useful. Thank you for your help.

Best regards,
John

# /usr/lib/cyrus-imapd/sync_client -v -u testu...@testdomain.net
USER testu...@testdomain.net
ADDSUB testu...@testdomain.net INBOX

# date ; service cyrus-imapd stop
Fri Oct 15 14:51:34 EDT 2010
Shutting down cyrus-imapd: [ OK ]
Exporting cyrus-imapd databases: [ OK ]

Oct 15 14:50:58 eml-store04 sync_client[23742]: USER received NO response: IMAP_MAILBOX_NONEXISTENT
Failed to access inbox for testu...@testdomain.net: Mailbox does not exist

NOTE: Despite this message, the user appears identical on the master and replica when checked with ctl_mboxlist -d.
Oct 15 14:51:35 eml-store04 master[22922]: attempting clean shutdown on SIGQUIT
Oct 15 14:51:35 eml-store04 master[22922]: process 22950 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22949 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22948 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22947 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22946 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22945 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22944 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22943 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22939 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22938 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22937 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22936 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22935 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22934 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22933 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22932 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22931 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: All children have exited, closing down
Oct 15 14:51:35 eml-store04 sync_client[23914]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23916]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23919]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23925]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23929]: DBERROR db4: region 1 (environment): reference count went negative
... many more ...
Oct 15 14:51:41 eml-store04 sync_client[25331]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:41 eml-store04 sync_client[25332]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR db4: PANIC: fatal region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR: critical database situation
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR db4: PANIC: fatal region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR: critical database situation
... continue until sync_client is killed ...

From /etc/cyrus.conf:

START {
  # do not delete this entry!
  recover cmd=ctl_cyrusdb -r

  # this is only necessary if using idled for IMAP IDLE
  idled cmd=idled

  syncclient
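The log excerpt shows DBERROR lines from a steadily increasing series of sync_client PIDs, which is what suggests that subprocesses are being spawned. One way to put a number on that is to count the distinct PIDs, as in this minimal sketch. The function name `count_dberror_pids` is hypothetical, and the pattern assumes the syslog line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print how many
# distinct sync_client PIDs logged at least one DBERROR message.
count_dberror_pids() {
    # Extract the PID from lines such as
    #   "... sync_client[25333]: DBERROR db4: PANIC: ..."
    sed -n 's/.*sync_client\[\([0-9][0-9]*\)]: DBERROR.*/\1/p' |
        sort -u |
        wc -l |
        tr -d ' '
}
```

On the system described here it could be run as `count_dberror_pids < /var/log/maillog`.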
Re: sync_client fails to exit when manual replication and rolling replication are combined (2.3.16-8)
On Fri, Oct 15, 2010 at 03:42:21PM -0400, Simpson, John R wrote:

However, if I have run sync_client manually while rolling replication is enabled the rolling replication instance will not exit. Instead, it appears to start spawning subprocesses and throwing database errors. The change in database errors (below) appears to coincide with the completion of "Exporting cyrus-imapd databases". The critical DB error messages continue until sync_client is killed.

[ ... ]

Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR db4: PANIC: fatal region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR: critical database situation
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR db4: PANIC: fatal region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR: critical database situation
... continue until sync_client is killed ...

Nothing magic about sync_client itself here - it's something with the hand run sync_client and attaching/detaching from the environment.

This has been on the TODO list at FastMail for a while - and your information may actually help us narrow down the cause. We don't use BDB, but the log messages annoy us too!

Regards,
Bron.