Re: Duplicate delivery DB (was Unexpected database recovery)

2003-12-04 Thread Henrique de Moraes Holschuh
On Wed, 03 Dec 2003, Rob Siemborski wrote:
 On Wed, 3 Dec 2003, Christian Schulte wrote:
  Richard Gilbert schrieb:
   The reason why I am restarting the server is to deal with the odd
   IOERROR: reading message: unexpected end of file errors, which I am
   still getting, which first led me to the lock_flock patch (archive message
   18705).
 
  This is interesting. I am also seeing these entries from time to time,
  maybe once a month. Is this lock_flock patch currently used in 2.2 ?
 
 No -- under high lock contention it performs very very poorly.

That it does, that it does...

 We are looking at an alternate patch to break deadlocks (should they
 occur):
 
 https://bugzilla.andrew.cmu.edu/show_bug.cgi?id=1177
 
 But no one has confirmed that the patch works.

I know it works, in the sense that it operates correctly on Linux 2.4.2x with
glibc 2.3.2, but I don't have performance data.

I _can_ try to obtain that performance data if someone sends me scenarios
that will test the performance (scripts are MORE than welcome BTW :P)
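Henrique asks for scripts that exercise lock contention. As a starting point, here is a minimal Python sketch. It assumes a Unix system where flock() locks belong to the open file description, so threads that each open() the file contend with one another (as on Linux). It does not drive Cyrus at all. It only compares a blocking flock() with a poll-and-sleep retry of the kind Rob describes for the lock_flock patch. The names acquire, contention_run and demo are hypothetical.

```python
import fcntl
import os
import tempfile
import threading
import time


def acquire(fd, poll_interval=None):
    """Take an exclusive flock() on fd.

    poll_interval=None blocks in the kernel until the lock is free; a number
    retries with LOCK_NB and sleeps that many seconds after each miss."""
    if poll_interval is None:
        fcntl.flock(fd, fcntl.LOCK_EX)
        return
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return
        except BlockingIOError:  # someone else holds the lock
            time.sleep(poll_interval)


def contention_run(path, workers=4, rounds=5, hold=0.01, poll_interval=None):
    """Let `workers` threads each take the lock on `path` `rounds` times.

    Returns (acquisitions, elapsed_seconds)."""
    acquisitions = 0
    counter_guard = threading.Lock()

    def worker():
        nonlocal acquisitions
        fd = os.open(path, os.O_RDWR)  # separate open file description per thread
        try:
            for _ in range(rounds):
                acquire(fd, poll_interval)
                time.sleep(hold)  # pretend to do work while holding the lock
                with counter_guard:
                    acquisitions += 1
                fcntl.flock(fd, fcntl.LOCK_UN)
        finally:
            os.close(fd)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acquisitions, time.monotonic() - start


def demo(poll_interval=None, **kwargs):
    """Run one contention test against a throwaway lock file."""
    with tempfile.NamedTemporaryFile() as tmp:
        return contention_run(tmp.name, poll_interval=poll_interval, **kwargs)
```

Comparing demo() with demo(poll_interval=1.0) on the same machine gives a rough measure of what a one-second back-off costs under contention; workers, rounds and hold are arbitrary knobs, not a model of real lmtpd traffic.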

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Duplicate delivery DB (was Unexpected database recovery)

2003-12-03 Thread Richard Gilbert
  ... I applied John Wade's lock_flock patch to the version of Cyrus
  imapd we were already running, i.e. 2.1.14 and rebuilt and reinstalled.
  cyrus-imapd was restarted at 5 am this morning to minimise inconvenience
  to users.  I was surprised to find that the system was unavailable until
  about 08:39 because of database recovery.
 
  Nov 19 05:00:11 impala master[9697]: [...] process started
  Nov 19 05:00:11 impala ctl_cyrusdb[9698]: [...] recovering cyrus databases
  Nov 19 05:05:10 impala ctl_mboxlist[10854]: [...] skiplist: recovered
  /var/imap/mailboxes.db (61786 records, 4909724 bytes) in 9 seconds
  Nov 19 08:38:54 impala ctl_cyrusdb[9698]: [...] done recovering cyrus databases
  Nov 19 08:38:54 impala master[9697]: [...] ready for work
  Nov 19 08:38:54 impala ctl_cyrusdb[22419]: [...] checkpointing cyrus databases

 The lock_flock patch has serious performance implications (namely, if you
 don't get a lock on the first try, you have to wait an entire second to
 try again), and given that this happened just after you changed the
 locking mechanism, it seems suspicious.

 However, I can't think what would be causing the recovery process to lose
 out on getting the locks it needs (nothing else should be running at
 that time).

My logs don't go back any further (without hassling the sys admins) but
the database recovery took ~90 mins two days before the lock_flock patch
was applied and then ~3h 38m immediately after it was applied:

Nov 17 05:00:10 impala master[19332]: [ID 965400 local6.notice] process started
Nov 17 06:30:51 impala master[19332]: [ID 139525 local6.notice] ready for work
Nov 19 05:00:11 impala master[9697]: [ID 965400 local6.notice] process started
Nov 19 08:38:54 impala master[9697]: [ID 139525 local6.notice] ready for work

I then reduced the -E parameter to ctl_deliver from 3 to 1 and at the next
restart a few days later, still running with the lock_flock patch, it
took 20 seconds and at the next restart 22 seconds:

Nov 23 04:00:10 impala master[1871]: [ID 965400 local6.notice] process started
Nov 23 04:00:30 impala master[1871]: [ID 139525 local6.notice] ready for work
Nov 28 04:00:10 impala master[22770]: [ID 965400 local6.notice] process started
Nov 28 04:00:32 impala master[22770]: [ID 139525 local6.notice] ready for work
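The restart times quoted above can be checked mechanically from the "process started" and "ready for work" lines. A minimal Python sketch, assuming log lines in exactly the format shown (month, day, time, host, master[pid]:, message); the pairing logic is illustrative and not a Cyrus tool.

```python
from datetime import datetime, timedelta

# Syslog lines copied from the message above.
SAMPLE = """\
Nov 17 05:00:10 impala master[19332]: [ID 965400 local6.notice] process started
Nov 17 06:30:51 impala master[19332]: [ID 139525 local6.notice] ready for work
Nov 19 05:00:11 impala master[9697]: [ID 965400 local6.notice] process started
Nov 19 08:38:54 impala master[9697]: [ID 139525 local6.notice] ready for work
Nov 23 04:00:10 impala master[1871]: [ID 965400 local6.notice] process started
Nov 23 04:00:30 impala master[1871]: [ID 139525 local6.notice] ready for work
Nov 28 04:00:10 impala master[22770]: [ID 965400 local6.notice] process started
Nov 28 04:00:32 impala master[22770]: [ID 139525 local6.notice] ready for work
"""


def restart_durations(lines):
    """Pair 'process started' with the next 'ready for work' from the same
    master pid; return a list of (pid, seconds)."""
    started = {}
    durations = []
    for line in lines:
        month, day, clock, _host, proc = line.split()[:5]
        # Syslog omits the year; it is not needed for these short intervals.
        stamp = datetime.strptime(f"{month} {day} {clock}", "%b %d %H:%M:%S")
        pid = proc[proc.index("[") + 1 : proc.index("]")]
        if line.endswith("process started"):
            started[pid] = stamp
        elif line.endswith("ready for work") and pid in started:
            elapsed = stamp - started.pop(pid)
            durations.append((pid, int(elapsed.total_seconds())))
    return durations


for pid, seconds in restart_durations(SAMPLE.splitlines()):
    print(f"master[{pid}]: {timedelta(seconds=seconds)}")
```

For the sample it prints 1:30:41, 3:38:43, 0:00:20 and 0:00:22, matching the ~90 minutes, ~3h 38m, 20 seconds and 22 seconds quoted.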

The reason why I am restarting the server is to deal with the odd
IOERROR: reading message: unexpected end of file errors, which I am
still getting, which first led me to the lock_flock patch (archive message
18705).

Today for the first time I am seeing the constantly repeated message
DBERROR: mydelete: error deleting [EMAIL PROTECTED]:
DB_NOTFOUND: No matching key/data pair found.  I haven't seen much on the
list about this so I will be stopping the server in the early hours and
deleting the duplicate.db before restarting.  ([EMAIL PROTECTED] said he ran
reconstruct -f after deleting the database but reconstruct doesn't seem to
be relevant to the duplicate delivery database so I won't bother.)

The duplicate delivery database (Berkeley DB3) seems to be at the root of
the few problems I have with Cyrus and yet I get the impression from the
discussions on this list that disabling duplicatesuppression won't
actually make things any better because the database is still maintained.

Thank you in anticipation of any advice or comments.

Richard
--
Richard Gilbert
Corporate Information and Computing Services
University of Sheffield, Sheffield, S10 2TN, UK
Phone: +44 114 222 3028   Fax: +44 114 222 3040


Re: Duplicate delivery DB (was Unexpected database recovery)

2003-12-03 Thread Rob Siemborski
On Wed, 3 Dec 2003, Christian Schulte wrote:

 Richard Gilbert schrieb:
  The reason why I am restarting the server is to deal with the odd
  IOERROR: reading message: unexpected end of file errors, which I am
  still getting, which first led me to the lock_flock patch (archive message
  18705).

 This is interesting. I am also seeing these entries from time to time,
 maybe once a month. Is this lock_flock patch currently used in 2.2 ?

No -- under high lock contention it performs very very poorly.

We are looking at an alternate patch to break deadlocks (should they
occur):

https://bugzilla.andrew.cmu.edu/show_bug.cgi?id=1177

But no one has confirmed that the patch works.

-Rob

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Rob Siemborski * Andrew Systems Group * Cyert Hall 207 * 412-268-7456
Research Systems Programmer * /usr/contributed Gatekeeper



Re: Unexpected database recovery

2003-11-20 Thread Henrique de Moraes Holschuh
On Thu, 20 Nov 2003, Philipp Sacha wrote:
 I could reproduce the stuck behaviour of lmtpd by starting cyrus with 5
 preforked lmtpd -a processes. When I kill those processes manually
 and then try to telnet to the lmtp port on the mail server, I get a stuck
 lmtpd: the lmtp port is open but no prompt appears.

That's a different bug, which has been solved. You need the new cyrus master
code that does pid tracking, which has been committed to 2.2 upstream.

Is that the bug you always had?  If so, you applied the wrong patch :(

There is a patch for the child pid tracking/child morgue in bugzilla (I don't
recall the number; its state is CLOSED, I believe). You can apply that to 2.1,
but you also need to compare it against 2.2, as I believe one minor mistake
has been fixed since then.

IMHO all that pid tracking code should be added to 2.1 as well.

 How can I verify that the patch is working? I have seen that
 setsigalrm logs to syslog. Is it therefore sufficient to grep for
 SIGALRM in the cyrus log?

I think so.

The way I usually test for locks is to run The Mad Postman against a postfix
that delivers to cyrus, and tell postfix to do something akin to 200
lmtp deliveries in parallel.
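The Mad Postman plus postfix setup is not reproduced here. For the narrower symptom Philipp describes (the lmtp port accepts connections but no prompt appears), a hypothetical Python probe could open many connections in parallel and count the ones that never receive a greeting. It assumes the server greets with a reply beginning 220, as LMTP servers do (RFC 2033 reuses the SMTP greeting); greets and count_silent are invented names, and the host, port and connection count are placeholders. It checks the greeting only and delivers no mail.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def greets(host, port, timeout=5.0):
    """Return True if the server sends a reply starting with '220' after connect.

    A stuck lmtpd accepts the TCP connection but never sends its greeting,
    so the recv() below times out and we return False."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(512)
    except OSError:  # refused, reset or timed out (socket.timeout is an OSError)
        return False
    return banner.startswith(b"220")


def count_silent(host, port, connections=200, timeout=5.0):
    """Open `connections` probes in parallel; return how many got no greeting."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        results = pool.map(lambda _: greets(host, port, timeout), range(connections))
        return sum(1 for ok in results if not ok)
```

A non-zero count_silent() while the server is otherwise idle would point at the hung-listener problem rather than at database locking.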



Re: Unexpected database recovery

2003-11-20 Thread Richard Gilbert
  Yesterday I applied John Wade's lock_flock patch to the version of Cyrus
  imapd we were already running, i.e. 2.1.14 and rebuilt and reinstalled.
  cyrus-imapd was restarted at 5 am this morning to minimise inconvenience
  to users.  I was surprised to find that the system was unavailable until
  about 08:39 because of database recovery.
 
  My question is: was this database recovery caused by the system realising
  that the software had changed, or was it a complete coincidence?  We
  restart the system three times a week at 5am and this has not happened
  before, as far as I know.

 The lock_flock patch has serious performance implications (namely, if you
 don't get a lock on the first try, you have to wait an entire second to
 try again),

Thank you very much, Rob, for your swift response. I suppose that explains
why the patch has never been incorporated in the distribution. :-)  A few
times yesterday colleagues said that there was a problem with IMAP when
it appeared to be fine in general.  I guess these could have been caused
by temporary performance problems.
[more below]

 and given that this happened just after you changed the
 locking mechanism, it seems suspicious.

 However, I can't think what would be causing the recovery process to lose
 out on getting the locks it needs (nothing else should be running at
 that time).

 FWIW, database recovery is necessary every time you restart cyrus to
 ensure that the databases are in a consistent state before data is served.

Thank you for pointing that out.  I checked and found that the recovery
was already taking ~90 mins before the patch, but no-one seemed to notice!

If I don't use the patch I expect the problem with LMTP delivery to return
with the associated ramp up of the number of db3 lockers reported.  The
only database which is using Berkeley DB is the duplicate delivery
database, so logically this must be the source of the db3 locking problem.
The database was very large (138 MBytes) and pruning on this was set to 3
days so I will change this to 1 day to reduce the size and consequent
recovery time.  However, I am beginning to wonder whether I should stop
using the duplicate delivery database as the simplest way of avoiding db3
locking problems.  Would this mean that a single message delivered to 50
users would start to appear as 50 separate copies rather than one file
with 50 links?
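Richard's reasoning about pruning can be made concrete with a toy model. Assuming a roughly steady delivery rate, the number of live entries is proportional to the retention window, so going from 3 days to 1 should leave about a third as many records. The Python class below is hypothetical and only models the bookkeeping (keyed here by message-id and mailbox, which is an assumption for illustration); it is not Cyrus's on-disk format, and a Berkeley DB file may not shrink on disk just because records were deleted.

```python
import time

DAY = 24 * 60 * 60  # seconds


class DuplicateDB:
    """Toy in-memory model of a duplicate-delivery table (not Cyrus's format).

    Maps (message_id, mailbox) -> time of delivery."""

    def __init__(self):
        self._seen = {}

    def seen(self, message_id, mailbox):
        """True if this message was already delivered to this mailbox."""
        return (message_id, mailbox) in self._seen

    def mark(self, message_id, mailbox, when=None):
        self._seen[(message_id, mailbox)] = time.time() if when is None else when

    def expire(self, days, now=None):
        """Drop entries older than `days` days; return how many were removed."""
        now = time.time() if now is None else now
        cutoff = now - days * DAY
        stale = [key for key, when in self._seen.items() if when < cutoff]
        for key in stale:
            del self._seen[key]
        return len(stale)

    def __len__(self):
        return len(self._seen)
```

In this model, expire(3) keeps three days of deliveries and expire(1) keeps one, which is the proportionality behind the expected size reduction.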

(Cyrus is running on a Solaris 8 system with about 28,000 users.)

TYIA

Richard


Re: Unexpected database recovery

2003-11-20 Thread Rob Siemborski
On Thu, 20 Nov 2003, Henrique de Moraes Holschuh wrote:

 IMHO all that pid tracking code should be added to 2.1 as well.

2.1 is going to be basically end-of-life (barring currently unforeseen
issues) when I release 2.1.16 later today, and as such we've been trying
to avoid significant code changes to it.

The pid tracking stuff definitely qualifies as significant, and we've only
seen the problem locally maybe once since we deployed 2.x (during normal
operation, not by explicitly killing processes in accept()).

-Rob




Unexpected database recovery

2003-11-19 Thread Richard Gilbert
Yesterday I applied John Wade's lock_flock patch to the version of Cyrus
imapd we were already running, i.e. 2.1.14 and rebuilt and reinstalled.
cyrus-imapd was restarted at 5 am this morning to minimise inconvenience
to users.  I was surprised to find that the system was unavailable until
about 08:39 because of database recovery.

Nov 19 05:00:11 impala master[9697]: [...] process started
Nov 19 05:00:11 impala ctl_cyrusdb[9698]: [...] recovering cyrus databases
Nov 19 05:05:10 impala ctl_mboxlist[10854]: [...] skiplist: recovered
/var/imap/mailboxes.db (61786 records, 4909724 bytes) in 9 seconds
Nov 19 08:38:54 impala ctl_cyrusdb[9698]: [...] done recovering cyrus databases
Nov 19 08:38:54 impala master[9697]: [...] ready for work
Nov 19 08:38:54 impala ctl_cyrusdb[22419]: [...] checkpointing cyrus databases

My question is: was this database recovery caused by the system realising
that the software had changed, or was it a complete coincidence?  We
restart the system three times a week at 5am and this has not happened
before, as far as I know.

It's a bit early to say, but the number of lockers in the "DBERROR db3: N
lockers" messages is staying very low today: rarely anything other than 2.
I'm touching wood and crossing my fingers even as I type!
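Watching the locker counts over time is easier with a tally than by eye. A minimal Python sketch; it assumes only that each message contains the fragment "DBERROR db3: N lockers" as quoted above and ignores the rest of the syslog line. The name locker_histogram is hypothetical.

```python
import re
from collections import Counter

# Matches the fragment quoted in the message; the syslog prefix is ignored.
LOCKERS = re.compile(r"DBERROR db3: (\d+) lockers")


def locker_histogram(lines):
    """Tally how often each 'DBERROR db3: N lockers' value appears."""
    counts = Counter()
    for line in lines:
        match = LOCKERS.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

A day whose histogram is dominated by 2, compared with earlier days showing a long tail of higher values, would support the "staying very low" observation with numbers.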

Richard


Re: Unexpected database recovery

2003-11-19 Thread Rob Siemborski
On Wed, 19 Nov 2003, Richard Gilbert wrote:

 Yesterday I applied John Wade's lock_flock patch to the version of Cyrus
 imapd we were already running, i.e. 2.1.14 and rebuilt and reinstalled.
 cyrus-imapd was restarted at 5 am this morning to minimise inconvenience
 to users.  I was surprised to find that the system was unavailable until
 about 08:39 because of database recovery.

 Nov 19 05:00:11 impala master[9697]: [...] process started
 Nov 19 05:00:11 impala ctl_cyrusdb[9698]: [...] recovering cyrus databases
 Nov 19 05:05:10 impala ctl_mboxlist[10854]: [...] skiplist: recovered
   /var/imap/mailboxes.db (61786 records, 4909724 bytes) in 9 seconds
 Nov 19 08:38:54 impala ctl_cyrusdb[9698]: [...] done recovering cyrus databases
 Nov 19 08:38:54 impala master[9697]: [...] ready for work
 Nov 19 08:38:54 impala ctl_cyrusdb[22419]: [...] checkpointing cyrus databases

 My question is: was this database recovery caused by the system realising
 that the software had changed, or was it a complete coincidence?  We
 restart the system three times a week at 5am and this has not happened
 before, as far as I know.

The lock_flock patch has serious performance implications (namely, if you
don't get a lock on the first try, you have to wait an entire second to
try again), and given that this happened just after you changed the
locking mechanism, it seems suspicious.
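The cost Rob describes comes from the retry pattern itself: a waiter that misses the lock sleeps a full second before trying again. The patch is not quoted in this thread, so the following is only a hypothetical Python illustration of that pattern using fcntl.flock with LOCK_NB; lock_by_polling and its parameters are invented names.

```python
import fcntl
import time


def lock_by_polling(fd, retry_interval=1.0, max_attempts=None, sleep=time.sleep):
    """Take an exclusive flock() by retrying LOCK_NB, sleeping between tries.

    Returns the number of attempts made. A waiter that misses the lock by a
    microsecond still sleeps the full retry_interval before trying again,
    so under contention every wait is rounded up to whole intervals."""
    attempts = 0
    while True:
        attempts += 1
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return attempts
        except BlockingIOError:  # lock held elsewhere
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(retry_interval)
```

A blocking fcntl.flock(fd, fcntl.LOCK_EX) would instead be woken by the kernel as soon as the holder releases, which is why polling with a long interval degrades so badly when many processes want the same lock.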

However, I can't think what would be causing the recovery process to lose
out on getting the locks it needs (nothing else should be running at
that time).

FWIW, database recovery is necessary every time you restart cyrus to
ensure that the databases are in a consistent state before data is served.

-Rob




Re: Unexpected database recovery

2003-11-19 Thread Henrique de Moraes Holschuh
Anyone with deadlock problems who is thinking of using the flock patch should
read the discussion in https://bugzilla.andrew.cmu.edu/show_bug.cgi?id=1177

The POSIX alarm fix for the timeout/deadlocks stuff is working just fine
here.  Unfortunately Philipp Sacha hasn't replied yet to give us a second
testimony on whether it works or not...
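The fix in bug 1177 is described in this thread only as a POSIX alarm fix (Philipp mentions that setsigalrm logs to syslog); its code is not shown here. The general technique, arming a timer before a blocking lock call so that SIGALRM interrupts a wait that has deadlocked, can be sketched in Python as follows. LockTimeout and lock_with_alarm are hypothetical names, and this is not the patch.

```python
import fcntl
import signal


class LockTimeout(Exception):
    """The alarm fired before the lock was granted."""


def lock_with_alarm(fd, seconds):
    """Block in flock() for an exclusive lock, giving up after `seconds`.

    Unix only; must be called from the main thread, where Python runs
    signal handlers."""
    def on_alarm(signum, frame):
        # Raising here aborts the interrupted flock() call.
        raise LockTimeout(f"no lock after {seconds}s")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

A real implementation would also have to cover the narrow window where the alarm fires just after flock() succeeds; in this sketch the lock would then be held even though LockTimeout is raised.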

Philipp? Did it work?



Re: database recovery...

2003-09-10 Thread Scott Adkins
Well, I am not sure that it is something bizarre going on with the mmap()
method that configure chose at compile time.  I still have to do some
testing, but I am not really convinced that the 27MB/28MB sizes are tied
in to mailboxes.db being nearly the same size.
So, what kind of things would cause the process to grow?  If a user makes
a connection to IMAP, it starts out small.  Does the memory footprint grow
as they open and close folders, reading in the various cyrus files listed
in that particular folder?  If somebody had a really big folder (like the
many around here who never delete their mail), could that drive the
footprint up a bit?
With process reuse, especially with 250 connections per process, I can
imagine that the older processes will be the ones that are much bigger.
Anyways, it is really hard to pinpoint on the system.
Scott

--On Wednesday, September 10, 2003 9:28 AM -0400 Rob Siemborski 
[EMAIL PROTECTED] wrote:

On Wed, 10 Sep 2003, Scott Adkins wrote:

So, with 3000+ cyrus processes averaging about 20MB each, it consumed
pretty much all our real RAM (we have 8GB on each cluster member).  I
would say about 6GB of memory was consumed in just Cyrus processes.
This sounds like something bizarre is going on with what cyrus chose for
its mmap() method.  (Or the Tru64 mmap is doing something silly in terms
of memory allocation).
Since this didn't really change between 2.0 and 2.1, I don't offhand know
what to blame (though perhaps a change of database formats could do this
also--since skiplist will grow much larger than flat for the same data).
-Rob



--
+---+
 Scott W. Adkins   http://www.cns.ohiou.edu/~sadkins/
  UNIX Systems Engineer  mailto:[EMAIL PROTECTED]
   ICQ 7626282 Work (740)593-9478 Fax (740)593-1944
+---+
PGP Public Key available at http://www.cns.ohiou.edu/~sadkins/pgp/



Re: database recovery...

2003-09-10 Thread Rob Siemborski
On Wed, 10 Sep 2003, Scott Adkins wrote:

 Well, I am not sure that it is something bizarre going on with the mmap()
 method that configure chose at compile time.  I still have to do some
 testing, but I am not really convinced that the 27MB/28MB sizes are tied
 in to mailboxes.db being nearly the same size.

Well, since they are all consistently the same size, and it is something
that is close, it is pretty suspect to me, but...

 So, what kind of things would cause the process to grow?  If a user makes
 a connection to IMAP, it starts out small.  Does the memory footprint grow
 as they open and close folders, reading in the various cyrus files listed
 in that particular folder?  If somebody had a really big folder (like the
 many around here who never delete their mail), could that drive the
 footprint up a bit?

A large folder would have a large uid-seq# map, and threading operations
(along with some others) would take more memory.  But unless you have one
obviously huge folder, I can't explain why all the large processes are the
same size with this method.

-Rob




Re: database recovery...

2003-09-10 Thread Rob Siemborski
On Wed, 10 Sep 2003, Igor Brezac wrote:

 My installation of cyrus 2.2-CVS on Solaris 9 shows the same memory
 footprint per process.  20M is taken by mmap(), the rest is shared
 physical memory and about 1M of 'private' physical memory per process
 (imapd, pop3d, lmtpd, etc).

This is consistent with our 2.1 installation as well.

That's why I'm led to suspect either mmap() wasn't detected properly by
cyrus (and you're therefore using map_nommap), or the mmap implementation
itself is doing something silly.
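Rob's hypothesis is that configure may not have detected a working mmap(), leaving Cyrus on its map_nommap fallback. The difference between the two strategies can be sketched in Python as an analogy (this is not Cyrus's map_* code; load_shared and load_private are hypothetical names). A shared read-only mapping lets every process use the same page-cache pages, while reading the file into a private buffer gives each process its own copy.

```python
import mmap


def load_shared(path):
    """Map the file read-only. Processes mapping the same file can share the
    page-cache pages, so this costs little private memory per process.
    (mmap cannot map an empty file; that case raises ValueError.)"""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def load_private(path):
    """Copy the whole file into a buffer owned by this process, roughly what
    a no-mmap fallback has to do: every process pays for its own copy."""
    with open(path, "rb") as f:
        return f.read()
```

If the private-copy path were in use, a ~27MB mailboxes.db held by each process would fit the 27MB/28MB resident sizes Scott reported; that is a hypothesis to test, not a diagnosis.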

-Rob




Re: database recovery...

2003-09-10 Thread Jure Pecar
On Wed, 10 Sep 2003 10:48:34 -0400 (EDT)
Igor Brezac [EMAIL PROTECTED] wrote:

 My installation of cyrus 2.2-CVS on Solaris 9 shows the same memory
 footprint per process.  20M is taken by mmap(), the rest is shared
 physical memory and about 1M of 'private' physical memory per process
 (imapd, pop3d, lmtpd, etc).


2.2.1 looks OK on linux ... process sizes are around 1.5mb.


--

Jure Pecar


Re: database recovery...

2003-09-10 Thread Rob Siemborski
On Wed, 10 Sep 2003, Scott Adkins wrote:

 So, you guys are seeing around a 20MB virtual size and a 1MB resident size
 when looking at the process table, right?

Yes.

-Rob




Re: database recovery...

2003-09-10 Thread Scott Adkins
So, you guys are seeing around a 20MB virtual size and a 1MB resident size
when looking at the process table, right?
Scott

--On Wednesday, September 10, 2003 11:09 AM -0400 Rob Siemborski 
[EMAIL PROTECTED] wrote:

On Wed, 10 Sep 2003, Igor Brezac wrote:

My installation of cyrus 2.2-CVS on Solaris 9 shows the same memory
footprint per process.  20M is taken by mmap(), the rest is shared
physical memory and about 1M of 'private' physical memory per process
(imapd, pop3d, lmtpd, etc).
This is consistent with our 2.1 installation as well.

That's why I'm led to suspect either mmap() wasn't detected properly by
cyrus (and you're therefore using map_nommap), or the mmap implementation
itself is doing something silly.
-Rob





Re: database recovery...

2003-09-10 Thread Igor Brezac

On Wed, 10 Sep 2003, Scott Adkins wrote:

 So, you guys are seeing around a 20MB virtual size and a 1MB resident size
 when looking at the process table, right?

Yes.  There is also 5-6MB shared 'resident' size (libs, etc).


 Scott

 --On Wednesday, September 10, 2003 11:09 AM -0400 Rob Siemborski
 [EMAIL PROTECTED] wrote:

  On Wed, 10 Sep 2003, Igor Brezac wrote:
 
  My installation of cyrus 2.2-CVS on Solaris 9 shows the same memory
  footprint per process.  20M is taken by mmap(), the rest is shared
  physical memory and about 1M of 'private' physical memory per process
  (imapd, pop3d, lmtpd, etc).
 
  This is consistent with our 2.1 installation as well.
 
  That's why I'm led to suspect either mmap() wasn't detected properly by
  cyrus (and you're therefore using map_nommap), or the mmap implementation
  itself is doing something silly.
 
  -Rob
 

-- 
Igor


database recovery...

2003-09-09 Thread Scott Adkins
We are running a Tru64 TruCluster system.  We have 2 members in the cluster
and run Cyrus IMAP 2.2.1b.  We typically ran the system with Cyrus being
CAA'd and only running on one member at a time.  The stuff would relocate
to the other cluster member if for some reason it could not run on the first
one or we had to take it down for maintenance or whatever.
Well, it appears that this new version uses a lot more memory than the 2.0.16
version did, with a lot of the processes settling on 27MB or 28MB of resident
memory in use (not virtual memory, which the processes always indicate has
more in use, but real memory in use).  On Tru64, there is no way to determine
exactly where that memory is going, unlike Solaris where you can run any of
the proc tools, like pmap, to get a breakdown of what memory is shared, what
is stored in the heap and what is consumed by the stack.  Running lsof
doesn't help, as they all show the same thing... Interestingly enough, our
mailboxes.db file is about 27MB in size, but I can find a lot of processes
that are only a couple of megabytes in size that also have that file open,
so I think it is just a coincidence.  Has anyone else noticed the larger
memory footprint?

So, with 3000+ cyrus processes averaging about 20MB each, it consumed pretty
much all our real RAM (we have 8GB on each cluster member).  I would say
about 6GB of memory was consumed in just Cyrus processes.
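A quick sanity check on those numbers: 3000 processes at ~20MB each would be about 60GB, more than seven times the 8GB of RAM, so most of each process's reported size must be pages shared with other processes rather than private memory. The Python below only does that arithmetic; the 2MB-private/27MB-shared split in the last line is a hypothetical illustration, not a measurement.

```python
def naive_total_mb(processes, per_process_mb):
    """Sum of per-process sizes, as if nothing were shared."""
    return processes * per_process_mb


def shared_aware_total_mb(processes, private_mb, shared_mb):
    """Private pages counted per process, shared pages counted once."""
    return processes * private_mb + shared_mb


RAM_MB = 8 * 1024  # 8GB per cluster member, from the message above

print(naive_total_mb(3000, 20), "MB naive vs", RAM_MB, "MB of RAM")
# Hypothetical split: ~2MB private per process plus one ~27MB shared mapping.
print(shared_aware_total_mb(3000, 2, 27), "MB if the big mapping is shared")
```

The first line prints 60000 MB against 8192 MB, and the hypothetical split gives 6027 MB, the same order as the ~6GB observed; the point is only that per-process sizes cannot simply be added up.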
We decided to run Cyrus on both cluster members at the same time.  Since
we are using a cluster file system which uses flock() to keep things working
properly, it shouldn't be a problem.  For those not familiar with Tru64's
cluster file system, this is not NFS.  It is basically a local file system
as far as each member is concerned, but it is shared like NFS on all the
members.
Anyways, as Cyrus starts up on each member, it runs the ctl_cyrusdb -r
command.  The problem with that is that it runs it on each member at the
same time (if I start them at the same time), so mailboxes.db has two
of these processes hitting it at the same time.  Worse, one member may
finish faster than the other and start accepting connections before the
other member has completed the recovery process.
This doesn't appear to cause any side effects, but I would like to know
if there would be any from this... especially if users are hitting the
file while a recovery is in progress.
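Since the cluster file system honours flock() across members, one way to stop both members running recovery at the same time is to wrap the recovery command in a lock. A minimal Python sketch under that assumption; run_serialized and the lock-file path in the usage note are hypothetical, and whether master can be pointed at such a wrapper instead of running ctl_cyrusdb -r directly needs checking against the local cyrus.conf.

```python
import fcntl
import os
import subprocess


def run_serialized(lock_path, argv):
    """Run argv while holding an exclusive flock() on lock_path.

    A second caller (e.g. the other cluster member) blocks in flock() until
    the first command exits, so the two runs never overlap."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        return subprocess.run(argv, check=False).returncode
    finally:
        os.close(fd)  # closing the descriptor releases the flock
```

For example, run_serialized("/var/imap/recover.lock", ["ctl_cyrusdb", "-r"]) on each member would make the second recovery wait for the first. It does not stop the member that finishes first from accepting connections while the other is still recovering, which is the other concern raised above.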
Also, it takes a really *really* long time for the recovery process to
run, which means even a simple restart is felt by all, as it takes several
minutes for it to complete.  In 2.0.16 with a flat file database, there
was no wait at all for the restart to occur, and most people may not even
notice it, since their email clients would silently reopen IMAP connections
that were closed on them.
Is there any way to shorten the duration of the recovery process?  For
instance, increasing the frequency of checkpoints considerably is one idea
I have... would that help?  Is there a point that I could do the recovery
process on a schedule (like once a night) instead of running it at startup
time to cut down on the overhead?
Anyways, I am looking for some insight into this process...

Thanks,
Scott


Database recovery

2002-10-06 Thread Norbert Warmuth

Hi,

assuming the following configuration:
(linux) cluster with two nodes and shared scsi
db3 as database backends

Will cyrus be able to recover databases on failover and get a
consistent state? 

Does cyrus recover the databases automatically or do you need to do
manual recovery? The documentation mentions automatic recovery on
startup but it does not mention the database backends for which this
is true. 

Glancing at the code, it seems the following database backends do automatic
recovery:
skiplist
db3
db3_nosync

Is this true?

Is cyrus able to detect corrupted databases?

Thanks in advance,
Norbert