RE: sync_client fails to exit when manual replication and rolling replication are combined (2.3.16-8)

2010-10-19 Thread Simon Matter
 On Saturday, October 16, 2010 12:49 AM, Bron Gondwana wrote (2.3.16-8)

 On Fri, Oct 15, 2010 at 03:42:21PM -0400, Simpson, John R wrote:
  However, if I have run sync_client manually while rolling replication is
 enabled the rolling replication instance will not exit.  Instead, it
 appears to start spawning subprocesses and throwing database errors.  The
 change in database errors (below) appears to coincide with the completion
 of Exporting cyrus-imapd databases.  The critical DB error messages
 continue until sync_client is killed.

 [ ... ]

  Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR db4: PANIC:
 fatal region error detected; run recovery
  Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR: critical
 database situation
  Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR db4: PANIC:
 fatal region error detected; run recovery
  Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR: critical
 database situation
  ... continue until sync_client is killed ...

 Nothing magic about sync_client itself here - it's something with the hand
 run sync_client and attaching/detaching from the environment.  This has
 been on the TODO list at FastMail for a while - and your information may
 actually help us narrow down the cause.  We don't use BDB, but the log
 messages annoy us too!

 If there's anything else I can do to help track this down, please let me
 know.  It was interesting to see that the errors were coming from the
 original, rolling replication sync_client process, not a manually
 initiated sync_client that didn't exit properly.

 The reason I'm running sync_client manually is to seed the replica with
 the existing users and mailboxes on the master server, as described in
 http://www.cyrusimap.org/docs/cyrus-imapd/2.3.16/install-replication.php.
 Would it be better to use rsync?

 Is there any reason not to add code to clean up any remaining sync_client
 processes to the stop function in /etc/rc.d/init.d/cyrus-imapd?

Yes, that could get a little tricky because the init script has
multi-instance support, so you would not only have to identify sync_client
processes running outside master but also work out which instance they
belong to. Besides that, init scripts usually _only_ terminate services
they have started, not anything else.
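
A rough sketch of the first half of that problem (finding sync_client
processes that are not direct children of a given master) might look like
the following. It is only an illustration, not code from the init script:
the function name stray_sync_clients and the PID file path in the usage
comment are assumptions, and it does nothing to work out which instance a
stray process belongs to.

```shell
#!/bin/sh
# Hypothetical helper, not part of the cyrus-imapd init script.
# Reads "PID PPID COMMAND" lines on stdin and prints the PID of every
# sync_client whose parent is NOT the given master PID.
stray_sync_clients() {
    master_pid=$1
    awk -v master="$master_pid" \
        '$3 == "sync_client" && $2 != master { print $1 }'
}

# Usage (the PID file location is an assumption; adjust per instance):
#   ps -eo pid=,ppid=,comm= | stray_sync_clients "$(cat /var/run/cyrus-master.pid)"
```

Note that children spawned by the rolling sync_client itself would also
show up here, since their parent is sync_client rather than master.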


 I am pretty sure we're not using BDB either, but I found a log file,
 /var/lib/imap/db/log.01, that appears to be a Berkeley DB log
 file.

Right, as long as BDBless builds are not possible, we will always see BDB
being initialized, even if not used :(

BTW, I can't help you much with sync_client because I have never used it.
However, I'm quite sure shutting down cyrus-imapd while sync_client is
running may do bad things with your databases. The init script tries to
convert all BDB databases to skiplist and then clean up the BDB
environment. I guess that's not good while sync_client is running.

Simon


Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/


Re: sync_client fails to exit when manual replication and rolling replication are combined (2.3.16-8)

2010-10-19 Thread Hajimu UMEMOTO
Hi,

 On Tue, 19 Oct 2010 14:22:06 +0200
 Simon Matter simon.mat...@invoca.ch said:

 I am pretty sure we're not using BDB either, but I found a log file,
 /var/lib/imap/db/log.01, that appears to be a Berkeley DB log
 file.

simon> Right, as long as BDBless builds are not possible, we will always see BDB
simon> being initialized, even if not used :(

I believe recent Cyrus IMAPd can be built without Berkeley DB by
giving --with-bdb=no to configure.
Since 2.4.X doesn't use Berkeley DB by default, I'm building it
without Berkeley DB, and I don't see /var/imap/db/log.* anymore.
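
As a small sketch of how one might check this after rebuilding: the
configure flag in the comment is the one mentioned above, while the helper
name has_bdb_logs is made up for illustration and the directory passed to
it depends on your configdirectory.

```shell
#!/bin/sh
# Build step, as described above (run in the Cyrus source tree):
#   ./configure --with-bdb=no

# Hypothetical helper: succeed if DIR contains any files named log.*,
# which is how the Berkeley DB log files in this thread are named.
has_bdb_logs() {
    for f in "$1"/log.*; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Usage (the path is an assumption; use the db directory of your install):
#   has_bdb_logs /var/imap/db && echo "BDB log files still present"
```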

Sincerely,

--
Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
u...@mahoroba.org  u...@{,jp.}FreeBSD.org
http://www.imasy.org/~ume/



RE: sync_client fails to exit when manual replication and rolling replication are combined (2.3.16-8)

2010-10-18 Thread Simpson, John R
On Saturday, October 16, 2010 12:49 AM, Bron Gondwana wrote (2.3.16-8)
 
 On Fri, Oct 15, 2010 at 03:42:21PM -0400, Simpson, John R wrote:
  However, if I have run sync_client manually while rolling replication is
 enabled the rolling replication instance will not exit.  Instead, it
 appears to start spawning subprocesses and throwing database errors.  The
 change in database errors (below) appears to coincide with the completion
 of Exporting cyrus-imapd databases.  The critical DB error messages
 continue until sync_client is killed.
 
 [ ... ]
 
  Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR db4: PANIC:
 fatal region error detected; run recovery
  Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR: critical
 database situation
  Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR db4: PANIC:
 fatal region error detected; run recovery
  Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR: critical
 database situation
  ... continue until sync_client is killed ...
 
 Nothing magic about sync_client itself here - it's something with the hand
 run sync_client and attaching/detaching from the environment.  This has
 been on the TODO list at FastMail for a while - and your information may
 actually help us narrow down the cause.  We don't use BDB, but the log
 messages annoy us too!

If there's anything else I can do to help track this down, please let me know.  
It was interesting to see that the errors were coming from the original, 
rolling replication sync_client process, not a manually initiated sync_client 
that didn't exit properly.

The reason I'm running sync_client manually is to seed the replica with the 
existing users and mailboxes on the master server, as described in 
http://www.cyrusimap.org/docs/cyrus-imapd/2.3.16/install-replication.php.  
Would it be better to use rsync?

Is there any reason not to add code to clean up any remaining sync_client 
processes to the stop function in /etc/rc.d/init.d/cyrus-imapd?

I am pretty sure we're not using BDB either, but I found a log file, 
/var/lib/imap/db/log.01, that appears to be a Berkeley DB log file.  

Thank you,

John

 
 Regards,
 
 Bron.



sync_client fails to exit when manual replication and rolling replication are combined (2.3.16-8)

2010-10-15 Thread Simpson, John R
Greetings all,

With the help of this list, I've successfully upgraded our lab 2.3.7 
(RHEL/CentOS packaged) server to 2.3.16-8 and tested rolling replication, 
manual replication by user, and manual replication by mailbox.  Everything was 
going better than expected until I shut down cyrus-imapd and /var/log/maillog 
started filling up with DB errors.

If I shut down cyrus-imapd with rolling replication enabled and have not run 
sync_client manually, both Cyrus and sync_client shut down cleanly.

However, if I have run sync_client manually while rolling replication is 
enabled the rolling replication instance will not exit.  Instead, it appears to 
start spawning subprocesses and throwing database errors.  The change in 
database errors (below) appears to coincide with the completion of Exporting 
cyrus-imapd databases.  The critical DB error messages continue until 
sync_client is killed.

I've run ctl_cyrusdb -r as suggested by the run recovery message.

Below are the steps that reproduce the problem, /var/log/maillog, the most 
relevant portions of imapd.conf and cyrus.conf, and the packages installed on 
the system.  cyrus-imapd-2.3.16-8 was built with rpmbuild -ba on CentOS 5.4 
64-bit using 
http://www.invoca.ch/pub/packages/cyrus-imapd/cyrus-imapd-2.3.16-8.src.rpm.  
The cyrus-sasl and db4 packages are from CentOS.  Please let me know if any 
other information would be useful.

Thank you for your help.

Best regards,

John


# /usr/lib/cyrus-imapd/sync_client -v -u testu...@testdomain.net
USER testu...@testdomain.net
ADDSUB testu...@testdomain.net INBOX
# date ; service cyrus-imapd stop
Fri Oct 15 14:51:34 EDT 2010
Shutting down cyrus-imapd: [  OK  ]
Exporting cyrus-imapd databases:   [  OK  ]

Oct 15 14:50:58 eml-store04 sync_client[23742]: USER received NO response: 
IMAP_MAILBOX_NONEXISTENT Failed to access inbox for testu...@testdomain.net: 
Mailbox does not exist

  NOTE: Despite this message, the user appears identical on the 
  master and replica when checked with ctl_mboxlist -d.

Oct 15 14:51:35 eml-store04 master[22922]: attempting clean shutdown on SIGQUIT
Oct 15 14:51:35 eml-store04 master[22922]: process 22950 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22949 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22948 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22947 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22946 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22945 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22944 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22943 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22939 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22938 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22937 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22936 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22935 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22934 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22933 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22932 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22931 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: All children have exited, closing 
down
Oct 15 14:51:35 eml-store04 sync_client[23914]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23916]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23919]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23925]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23929]: DBERROR db4: region 1 
(environment): reference count went negative
... many more ...
Oct 15 14:51:41 eml-store04 sync_client[25331]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:41 eml-store04 sync_client[25332]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR db4: PANIC: fatal 
region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR: critical database 
situation
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR db4: PANIC: fatal 
region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR: critical database 
situation
... continue until sync_client is killed ...
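
One way to put a number on the "spawning subprocesses" impression from a
log like the one above is to count how many distinct sync_client PIDs
logged a DBERROR. This is only a sketch; the function name
count_dberror_pids is made up, and it assumes the syslog line format shown
in this message.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print the number of
# distinct sync_client PIDs that logged at least one DBERROR line.
count_dberror_pids() {
    sed -n 's/.*sync_client\[\([0-9][0-9]*\)\]: DBERROR.*/\1/p' |
        sort -u | wc -l | tr -d ' '
}

# Usage:
#   count_dberror_pids < /var/log/maillog
```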


From /etc/cyrus.conf:
START {
  # do not delete this entry!
  recover   cmd=ctl_cyrusdb -r
  # this is only necessary if using idled for IMAP IDLE
  idled cmd=idled
  syncclient

Re: sync_client fails to exit when manual replication and rolling replication are combined (2.3.16-8)

2010-10-15 Thread Bron Gondwana
On Fri, Oct 15, 2010 at 03:42:21PM -0400, Simpson, John R wrote:
 However, if I have run sync_client manually while rolling replication is 
 enabled the rolling replication instance will not exit.  Instead, it appears 
 to start spawning subprocesses and throwing database errors.  The change in 
 database errors (below) appears to coincide with the completion of Exporting 
 cyrus-imapd databases.  The critical DB error messages continue until 
 sync_client is killed.

[ ... ]

 Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR db4: PANIC: fatal 
 region error detected; run recovery
 Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR: critical database 
 situation
 Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR db4: PANIC: fatal 
 region error detected; run recovery
 Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR: critical database 
 situation
 ... continue until sync_client is killed ...

Nothing magic about sync_client itself here - it's something with the hand
run sync_client and attaching/detaching from the environment.  This has
been on the TODO list at FastMail for a while - and your information may
actually help us narrow down the cause.  We don't use BDB, but the log
messages annoy us too!

Regards,

Bron.
