Re: Another 2.4 upgrade horror story

2012-09-30 Thread Bryan Hill

On Sep 25, 2012, at 11:57 AM, Deniss cy...@sad.lv wrote:

 
 
 ...
 
 migration process from 2.3 to 2.4 took ~ one year for our installation. 
 we converted ~200Tb of users data.
 first step we did - spread data on many nodes using cyrus replication.
 next we started converting nodes one by one at weekends nights to 
 minimize IO load generated by users.
 in fact cyrus read all data from disk to generate new indexes, so 
 convert is limited by disk IO mainly while CPU is pretty cheap nowadays.
 we got around 500Gb in 8 hours rate for forced reindex with 100% disk load.
 we started forced reindex with most active users meanwhile allowing 
 users to login and trigger reindex of their mailboxes
 
 


Sorry for hijacking this thread, but I'm curious: what is the preferred method of 
forcing a reindex on a mailbox?  I know the upgrade triggers when a user logs in 
and accesses the mailbox.  I would like to divide up our users and perform the 
reindex in chunks.  

Thanks,
Bryan

---
Bryan D. Hill
UCSD Physics Computing Facility
CTBP Systems Support

9500 Gilman Dr.  # 0319
La Jolla, CA 92093
+1-858-534-5538
bh...@ucsd.edu
AIM:  pozvibesd
Web:  http://www.physics.ucsd.edu/pcf


Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus

Re: Another 2.4 upgrade horror story

2012-09-30 Thread Eric Luyten
On Sun, September 30, 2012 6:47 pm, Bryan Hill wrote:

...
 Sorry for hi-jacking this thread, but I'm curious as to the preferred method
 of forcing a reindex on a mailbox?  I know it triggers when a user logs in
 and accesses the mailbox.  I would like to divide up users and perform the
 reindex in chunks.


Bryan,


We found out that a Cyrus quota fix (quota -f ...) only regenerates
metadata for old-format mailboxes, whereas reconstruct -r does them
all, including the already-converted ones.
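A minimal sketch (not from the thread) of driving that reconstruct-based reindex in batches rather than waiting for user logins. The users.txt format (one local user per line) and the chunk/parallelism numbers are assumptions; RECONSTRUCT is made overridable only so the sketch can be dry-run with "echo":

```shell
#!/bin/sh
# reindex_in_chunks USERFILE CHUNK JOBS
# Splits a list of users into fixed-size batches and runs one
# "reconstruct -r user/<name>" per mailbox, JOBS at a time.
reindex_in_chunks() {
    userfile="$1"; chunk="$2"; jobs="$3"
    split -l "$chunk" "$userfile" batch.
    for f in batch.*; do
        # reconstruct -r recurses into the user's sub-mailboxes as well
        sed 's|^|user/|' "$f" | xargs -n1 -P "$jobs" "${RECONSTRUCT:-reconstruct}" -r
    done
    rm -f batch.*
}

# e.g.: reindex_in_chunks users.txt 500 4
```

Ordering users.txt by mailbox size or activity first (as several posters in this thread did) lets the slowest upgrades start earliest.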


Cheers,
Eric Luyten, Computing Centre VUB/ULB.






Re: Another 2.4 upgrade horror story

2012-09-30 Thread Bron Gondwana
On Sun, Sep 30, 2012, at 09:46 PM, Eric Luyten wrote:
 On Sun, September 30, 2012 6:47 pm, Bryan Hill wrote:
 
 ...
  Sorry for hi-jacking this thread, but I'm curious as to the preferred method
  of forcing a reindex on a mailbox?  I know it triggers when a user logs in
  and accesses the mailbox.  I would like to divide up users and perform the
  reindex in chunks.
 
 
 Bryan,
 
 
 We found out that a Cyrus quota fix (quota -f ...) only regenerates
 metadata for old format mailboxes, whereas reconstruct -r does 'em
 all, also the already-converted.

cyr_expire touches all the mailboxes in order...
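Since cyr_expire walks every mailbox in turn, a scheduled run would also visit (and so upgrade) mailboxes nobody has logged into yet. A hypothetical cyrus.conf EVENTS entry along those lines; the -E value and the 04:00 start time are placeholders, not from this thread:

```
EVENTS {
  # existing entries (checkpoints etc.) stay as they are; this nightly
  # delprune run makes cyr_expire open every mailbox in order
  delprune   cmd="cyr_expire -E 3" at=0400
}
```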

Bron.
-- 
  Bron Gondwana
  br...@fastmail.fm




Re: Another 2.4 upgrade horror story

2012-09-30 Thread Bryan Hill

On Sep 30, 2012, at 1:36 PM, Bron Gondwana br...@fastmail.fm wrote:

 ...
 cyr_expire touches all the mailboxes in order...
 
 Bron.
 -- 
  Bron Gondwana
  br...@fastmail.fm
 

Ah, nice.  I'll look at this too.  Thanks Bron!

Thanks,
Bryan

---
Bryan D. Hill
UCSD Physics Computing Facility
CTBP Systems Support

9500 Gilman Dr.  # 0319
La Jolla, CA 92093
+1-858-534-5538
bh...@ucsd.edu
AIM:  pozvibesd
Web:  http://www.physics.ucsd.edu/pcf



Re: Another 2.4 upgrade horror story

2012-09-27 Thread Sebastian Hagedorn

Hi Eric,

--On 25. September 2012 14:28:03 +0200 Eric Luyten eric.luy...@vub.ac.be 
wrote:



Thank you for sharing your experiences.

As a site willing/needing to upgrade from 2.3.16 to 2.4.X this fall, we
are interested in learning about your storage backend characteristics.

What read/write IOPS rates were you registering before/during/after your
upgrade process ?

I'd understand your reluctance to share this information in a public
forum. No offence taken whatsoever !


no problem, it just took me a while to gather the information. Our backends 
are IBM DS4300s. Some of the disks are 73 GB 15k RPM Fibre Channel disks 
(RAID 5), others are 146 GB 10k RPM Fibre Channel disks. The SAN 
controllers are IBM SVCs (model 2145 8F4). The load is balanced (though not 
evenly) over four disk controllers. According to our storage guy we see 600 
IOPS on average, with peaks up to 2,000-3,000 under normal circumstances. 
During the migration we saw 10,000 IOPS per controller, 40,000 in total.


Hope this helps

Sebastian
--
.:.Sebastian Hagedorn - RZKR-W (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.



Re: Another 2.4 upgrade horror story

2012-09-26 Thread Bron Gondwana
On Wed, Sep 26, 2012, at 01:25 AM, Wolfgang Breyha wrote:
 On 2012-09-25 19:05, Simon Beale wrote:
  The only gotcha I experienced was I forgot that cyrus was configured to
  hardlink mail, which of course was no longer the case after each mailbox
  was migrated, so my disk usage exploded. (But easily fixed/restored once
  identified).
 
 What did you use for restoring the hardlinks? freedup as well?
 
 I'm asking because I found a bug in freedup causing dataloss. I already
 sent a patch fixing it to the author of freedup last november, but he
 didn't release a new version yet.

I have a script for doing it - though only within a single user...

This is the core of the link logic:

print "fixing up files for $guid ($srcname)\n";
foreach my $file (@others) {
  my $tmpfile = $file . "tmp";
  print "link error $tmpfile\n" unless link($srcname, $tmpfile);
  chown($uid, $gid, $tmpfile);
  chmod(0600, $tmpfile);
  print "rename error $file\n" unless rename($tmpfile, $file);
}

I suspect your fixup is similar :)

Bron.
-- 
  Bron Gondwana
  br...@fastmail.fm




Re: Another 2.4 upgrade horror story

2012-09-26 Thread Janne Peltonen
Hi!

On Tue, Sep 25, 2012 at 05:55:27PM +0200, Wolfgang Breyha wrote:
 Time was not the limiting factor for us. Availability and safety of our
 mailboxes was. Nobody can guarantee you that the migration works out
 flawlessly. If moving one mailbox fails I have troubles with exactly one
 mailbox. If reconstruction of one mailbox fails while migrating hard from 2.3
 to 2.4 your system is down even longer. And we had a couple of problematic
 mailboxes as partly documented on this mailinglist;-)

Um. Isn't it still only that one mailbox that couldn't be reconstructed that's
inaccessible, with the system as a whole up and running and everything else
accessible and working? Or am I missing something? I mean, each mailbox gets
reconstructed separately. And having tested the migration, we had reason to
believe most if not all of the mailboxes would be reconstructed OK, so we
didn't expect that any significant number of users would see any trouble except
maybe slowness for a short while.

As it happened, we didn't have any trouble with the reconstruction of any
mailboxes; the process went flawlessly - it was just slow. It caused
observable slowness only because at first I hadn't reconstructed the mailboxes
systematically but had trusted the system to do it on demand. After the
business day and daily peak usage were over, I set up as many reconstruction
processes as the system could handle without choking, and everything
(~60k mailboxes) was reconstructed before the next morning.

 In detail I
 *) check for active sessions on the mailbox
 *) lock the mailbox by denying access via userdeny.db
 *) move the mailbox
 *) unlock it

Yeah, that process is identical to the one we use to move mailboxes between
backends to balance their disk usage. (The mailbox-moving part of the process,
that is; the algorithm that decides how many mailboxes to move from which
backend to which is another matter.)


--Janne
-- 
Janne Peltonen janne.pelto...@helsinki.fi PGP Key ID: 0x9CFAC88B
Please consider membership of the Hospitality Club 
(http://www.hospitalityclub.org)



Re: Another 2.4 upgrade horror story

2012-09-26 Thread Sebastian Hagedorn
--On 26. September 2012 09:42:17 +0300 Janne Peltonen 
janne.pelto...@helsinki.fi wrote:



As it happened, we didn't have any trouble with the reconstruction of any
mailboxes, the process went flawlessly - it was just slow.


Same here.


Also, it did
cause observable slowness only because at first, I hadn't reconstructed
the mailboxes systematically but had trusted the system to do it on
demand. After the business day and daily peak usage was over, I set up
such a number of reconstruction processes that the system didn't choke on
them, and everything (~60k mailboxes) was reconstructed before next
morning.


We did it systematically from the start, but that wasn't enough.
--
.:.Sebastian Hagedorn - RZKR-W (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.



Re: Re: Another 2.4 upgrade horror story

2012-09-26 Thread Sebastian Hagedorn

Hi,

I've got questions regarding the procedure you describe. I'm trying to wrap 
my head around the various possible approaches to replication and 
clustering.


--On 25. September 2012 21:57:49 +0300 Deniss cy...@sad.lv wrote:


migration process from 2.3 to 2.4 took ~ one year for our installation.
we converted ~200Tb of users data.
first step we did - spread data on many nodes using cyrus replication.


The official documentation for replication seems to be this one:

http://cyrusimap.web.cmu.edu/docs/cyrus-imapd/2.4.0/install-replication.php

The way I read that, replication is all or nothing. So did each of the 
nodes have the whole 200 TB? If not, how did you achieve that? Did you have 
a murder with multiple backends to begin with?



next we started converting nodes one by one at weekends nights to
minimize IO load generated by users.


How does replication work across Cyrus versions? I assume it wouldn't have 
been possible to create a new 2.4 replica from an existing 2.3 master?


Thanks, Sebastian
--
.:.Sebastian Hagedorn - RZKR-W (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.



Re: Another 2.4 upgrade horror story

2012-09-26 Thread Deniss

On 2012.09.26. 10:24, Sebastian Hagedorn wrote:
 Hi,
 
 I've got questions regarding the procedure you describe. I'm trying to
 wrap my head around the various possible approaches to replication and
 clustering.
 
 --On 25. September 2012 21:57:49 +0300 Deniss cy...@sad.lv wrote:
 
 migration process from 2.3 to 2.4 took ~ one year for our installation.
 we converted ~200Tb of users data.
 first step we did - spread data on many nodes using cyrus replication.
 
 The official documentation for replication seems to be this one:
 
 http://cyrusimap.web.cmu.edu/docs/cyrus-imapd/2.4.0/install-replication.php
 
 
 The way I read that, replication is all or nothing. So did each of the
 nodes have the whole 200 TB? If not, how did you achieve that? Did you
 have a murder with multiple backends to begin with?

Our system's design allows us to seamlessly move mailboxes across Cyrus
backends one by one using sync_client. We have no murder.
Each node had a relatively small list of mailboxes when we started the convert
on it. After the convert we aggregated the mailboxes back.

 
 next we started converting nodes one by one at weekends nights to
 minimize IO load generated by users.
 
 How does replication work across Cyrus versions? I assume it wouldn't
 have been possible to create a new 2.4 replica from an existing 2.3 master?

It doesn't work.

 
 Thanks, Sebastian



Re: Another 2.4 upgrade horror story

2012-09-26 Thread Wolfgang Breyha
Simon Matter wrote, on 26.09.2012 06:58:
 I have not used freedup for restoring the hardlinks but I'm interested in
 the patch. If it's not big could you post it here?

Sure! attached.

I built a RPM based on
http://pkgs.repoforge.org/freedup/freedup-1.5.3-1.rf.src.rpm
and latest source.
But building based on
http://www.freedup.org/freedup-1.6-2.src.rpm
should work as well.

Greetings, Wolfgang
-- 
Wolfgang Breyha wbre...@gmx.net | http://www.blafasel.at/
Vienna University Computer Center | Austria

--- freedup-1.6/freedup.c.orig	2011-02-04 08:22:15.0 +0100
+++ freedup-1.6/freedup.c	2011-11-11 10:52:24.788733835 +0100
@@ -613,7 +613,7 @@
 	 */
 	if( mktemp(tmpfilename) == NULL )
 	{
-	perror("There is no unique temporory file name.");
+	perror("There is no unique temporary file name.");
 	}
 	if( dirmtime!=0 )
 	{
@@ -628,7 +628,7 @@
 	if( lstat(tmpfilename,&tstat) != 0 )
 	{
 	/*
-	 * The errror needs not to be catched, since it is wanted
+	 * The error needs not to be catched, since it is wanted
 	 * that no file exists with the target name
 	 */
 	rename( bname, tmpfilename );
@@ -643,12 +643,23 @@
 	}
 	if( lnk( symaname, bname ) != 0 )
 	{
-	perror("Linking failed.");
-	}
-	if( unlink( tmpfilename ) != 0 )
-	{
-	perror("Unlink failed.");
-	}
+	// linking failed! try to move original in place again and
+	// log that fact
+	fprintf(stderr, "Linking failed. Trying roleback: \"%s\"", bname);
+	if ( rename( tmpfilename, bname ) != 0 )
+	{
+		// moving old file in place again failed.
+		// at least log that -v
+		fprintf(stderr, "unable to rename: \"%s\"", tmpfilename);
+	}
+	}
+	else
+	// unlink renamed original only of linking was successful
+	if( unlink( tmpfilename ) != 0 )
+	{
+		// unlinking failed! log that -v
+		fprintf(stderr, "Unlink failed: \"%s\"", tmpfilename);
+	}
 	if( (dirmtime!=0) && (gotdirtime!=0) )
 	{
 	utimecache.actime  = dstat.st_atime;

Another 2.4 upgrade horror story

2012-09-25 Thread Sebastian Hagedorn

Hi,

about three weeks ago we upgraded our Cyrus installation from 2.3.x to 
2.4.16. We were aware of the reindexing issue, so we took precautionary 
measures, but they didn't help a lot. We've got about 7 TB of mail data for 
almost 200,000 mailboxes. We did the upgrade on a Sunday and had told our 
users that mail access wouldn't be possible for the whole day. After the 
actual software upgrade we ran distributed scripts that triggered the index 
upgrades. We started with the largest mailboxes. The idea was that after 
those that took the longest had been upgraded, the rest should be OK 
overnight and early Monday. However, even though our storage infrastructure 
was kept at 99 % I/O saturation, progress was much slower than anticipated.


Ultimately the server was virtually unusable for the whole Monday and 
parts of Tuesday. The last mailbox was finally upgraded on Thursday, 
although on Wednesday most things were already working normally.


I realize that some of our problems were caused by infrastructure that's 
not up to current standards, but nonetheless I would really urge you to 
never again use an upgrade mechanism like that. Give admins a chance to 
upgrade indexes in the background and over time.


Sebastian
--
.:.Sebastian Hagedorn - RZKR-W (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.



Re: Another 2.4 upgrade horror story

2012-09-25 Thread Eric Luyten
On Tue, September 25, 2012 2:01 pm, Sebastian Hagedorn wrote:
 ...


+1


Sebastian,


Thank you for sharing your experiences.

As a site willing/needing to upgrade from 2.3.16 to 2.4.X this fall, we
are interested in learning about your storage backend characteristics.

What read/write IOPS rates were you registering before/during/after your
upgrade process?

I'd understand your reluctance to share this information in a public forum.
No offence taken whatsoever!


Kind regards,
Eric Luyten, Computing Centre VUB/ULB, eric.luy...@vub.ac.be






Re: Another 2.4 upgrade horror story

2012-09-25 Thread Ben Carter
Can you tell us more about your storage configuration?

Ben

-- 
Ben Carter
University of Pittsburgh/CSSD
b...@pitt.edu
412-624-6470



Re: Another 2.4 upgrade horror story

2012-09-25 Thread Wolfgang Breyha
Sebastian Hagedorn wrote, on 25.09.2012 14:01:
 I realize that some of our problems were caused by infrastructure that's 
 not up to current standards, but nonetheless I would really urge you to 
 never again use an upgrade mechanism like that. Give admins a chance to 
 upgrade indexes in the background and over time.

There is such an upgrade path: use a murder environment and move mailboxes
between backends. We used that for our 150k user infrastructure and had no I/O
headaches at all. It was a good moment to update the distribution, filesystems,
hardware, etc. as well.

I tested conversion speed from 2.3 to 2.4 on about 100 mailboxes and it was
pretty obvious that touching all our mailboxes in one shot would be
impossible without unreasonable downtime.

Greetings, Wolfgang
-- 
Wolfgang Breyha wbre...@gmx.net | http://www.blafasel.at/
Vienna University Computer Center | Austria




Re: Another 2.4 upgrade horror story

2012-09-25 Thread Janne Peltonen
Hi!

On Tue, Sep 25, 2012 at 03:25:45PM +0200, Wolfgang Breyha wrote:
 Sebastian Hagedorn wrote, on 25.09.2012 14:01:
  I realize that some of our problems were caused by infrastructure that's 
  not up to current standards, but nonetheless I would really urge you to 
  never again use an upgrade mechanism like that. Give admins a chance to 
  upgrade indexes in the background and over time.
 
 There is such an upgrade path using a murder environment and moving mailboxes
 between backends. We used that for our 150k user infrastructure and had no IO
 headaches at all. It was a good moment to update distribution, filesystems,
 hardware,  as well.

Could you elaborate on that? I considered that option, but seeing as moving
even a couple dozen users from one backend to another using RENAME takes hours,
and one backend contains thousands of users, I decided to just live with the ~1
day of unbearable slowness. Or do you know of a fast way?


--Janne
-- 
Janne Peltonen janne.pelto...@helsinki.fi PGP Key ID: 0x9CFAC88B



Re: Another 2.4 upgrade horror story

2012-09-25 Thread Wolfgang Breyha
Janne Peltonen wrote, on 25.09.2012 15:34:
 Could you elaborate on that? I considered that option, but seeing as moving
 even a couple dozen users from a backend to another using RENAME takes hours
 and one backend contains thousands of users, I decided to just live with the 
 ~1
 day of unbearable slowness. Or do you know of a fast way?

Time was not the limiting factor for us. Availability and safety of our
mailboxes was. Nobody can guarantee you that the migration works out
flawlessly. If moving one mailbox fails, I have trouble with exactly one
mailbox. If reconstruction of one mailbox fails while migrating hard from 2.3
to 2.4, your system is down even longer. And we had a couple of problematic
mailboxes, as partly documented on this mailing list ;-)

We had 5 backend stores and have 6 now. Complete migration of our 150k users
with ~22 TB took about 3 or 4 months, including building the machines,
requesting storage, etc...

But you can't have less headache than moving mailboxes from backend to backend.

In detail, I:
*) check for active sessions on the mailbox
*) lock the mailbox by denying access via userdeny.db
*) move the mailbox
*) unlock it
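The four steps above could be sketched roughly as follows for a single mailbox. The command names are assumptions on my part: cyr_deny (the 2.4-era helper that writes userdeny.db; -m sets the deny message, -a re-allows) and cyradm's xfermailbox for the murder transfer. The active-session check from the first step is left out, and CYR_DENY/CYRADM are overridable only to make the sketch dry-runnable:

```shell
#!/bin/sh
# move_user USER DEST_BACKEND
move_user() {
    user="$1"; dest="$2"
    # lock: refuse new logins while the move is in flight
    "${CYR_DENY:-cyr_deny}" -m "mailbox move in progress" "$user" || return 1
    # move: hand the mailbox over to the destination backend
    echo "xfermailbox user/$user $dest" | "${CYRADM:-cyradm}" --user admin localhost
    # unlock: lift the deny entry again
    "${CYR_DENY:-cyr_deny}" -a "$user"
}

# e.g.: move_user jdoe backend6.example.org   (hypothetical names)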

Greetings, Wolfgang
-- 
Wolfgang Breyha wbre...@gmx.net | http://www.blafasel.at/
Vienna University Computer Center | Austria




Re: Another 2.4 upgrade horror story

2012-09-25 Thread Simon Beale
 Hi!

 On Tue, Sep 25, 2012 at 03:25:45PM +0200, Wolfgang Breyha wrote:
 Sebastian Hagedorn wrote, on 25.09.2012 14:01:
  I realize that some of our problems were caused by infrastructure
 that's
  not up to current standards, but nonetheless I would really urge you
 to
  never again use an upgrade mechanism like that. Give admins a chance
 to
  upgrade indexes in the background and over time.

 There is such an upgrade path using a murder environment and moving
 mailboxes
 between backends. We used that for our 150k user infrastructure and had
 no IO
 headaches at all. It was a good moment to update distribution,
 filesystems,
 hardware,  as well.

 Could you elaborate on that? I considered that option, but seeing as
 moving
 even a couple dozen users from a backend to another using RENAME takes
 hours
 and one backend contains thousands of users, I decided to just live with
 the ~1
 day of unbearable slowness. Or do you know of a fast way?

I did the migration by moving mailboxes between backends of 1 TB each,
having scripted it to only move employees when it was 00:00-06:00 in
their local timezone, and left the script running on each v2.3 backend for
a few days. It took a few weeks in all to migrate and upgrade all our
backends in turn, but no one experienced any downtime.
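The timezone gate described above could be sketched like this. The users.tsv format ("user" tab "tzname") and the queue_move helper are hypothetical, not something described in the thread:

```shell
#!/bin/sh
# in_window TZNAME START_HOUR END_HOUR: succeed if the wall-clock hour
# in that timezone is within [START_HOUR, END_HOUR)
in_window() {
    h=$(TZ="$1" date +%H)
    [ "$h" -ge "$2" ] && [ "$h" -lt "$3" ]
}

# e.g. one pass of a nightly mover (queue_move is a placeholder):
#   while IFS="$(printf '\t')" read -r user tz; do
#       in_window "$tz" 0 6 && queue_move "$user"
#   done < users.tsv
```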

The only gotcha I experienced was that I forgot Cyrus was configured to
hardlink mail, which of course was no longer the case after each mailbox
was migrated, so my disk usage exploded. (But it was easily fixed/restored
once identified.)

It comes down to having spare backend(s) to move people on to, and time to
do it patiently.

Simon




Re: Re: Another 2.4 upgrade horror story

2012-09-25 Thread Deniss


On 25.09.2012 15:28, Eric Luyten wrote:
 ...


The migration process from 2.3 to 2.4 took about one year for our installation; 
we converted ~200 TB of user data.
The first step was to spread the data across many nodes using Cyrus replication.
Next we started converting the nodes one by one on weekend nights, to 
minimize the I/O load seen by users.
In effect Cyrus reads all data from disk to generate the new indexes, so the 
convert is limited mainly by disk I/O, while CPU is pretty cheap nowadays.
We saw a rate of around 500 GB per 8 hours for a forced reindex at 100% disk load.
We started the forced reindex with the most active users, meanwhile allowing 
users to log in and trigger the reindex of their own mailboxes.
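A back-of-envelope check on those numbers (an estimate, not a benchmark) shows why the migration stretched over months even before replication and scheduling logistics are counted:

```shell
#!/bin/sh
# 500 GB per 8 h of forced reindex against ~200 TB of user data
GB_PER_HOUR=$((500 / 8))              # ~62 GB/h at 100% disk load
TOTAL_GB=$((200 * 1024))              # ~200 TB of user data
HOURS=$((TOTAL_GB / GB_PER_HOUR))
echo "$HOURS hours"                   # ~3300 h of pure reindex I/O alone
```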




Re: Another 2.4 upgrade horror story

2012-09-25 Thread Wolfgang Breyha
On 2012-09-25 19:05, Simon Beale wrote:
 The only gotcha I experienced was I forgot that cyrus was configured to
 hardlink mail, which of course was no longer the case after each mailbox
 was migrated, so my disk usage exploded. (But easily fixed/restored once
 identified).

What did you use for restoring the hardlinks? freedup as well?

I'm asking because I found a bug in freedup causing data loss. I already
sent a patch fixing it to the author of freedup last November, but he
hasn't released a new version yet.

If cyr_expire is running while freedup tries to hardlink files, it is
possible to lose both the source freedup wants to link to and the copy
freedup still removes on error. Running cyr_expire and freedup (up to
1.6-2) together is a really bad idea.

If it's of interest I can provide my patch here, too.

Greetings, Wolfgang
-- 
Wolfgang Breyha wbre...@gmx.net | http://www.blafasel.at/
Vienna University Computer Center | Austria



Re: Another 2.4 upgrade horror story

2012-09-25 Thread Simon Matter
 On 2012-09-25 19:05, Simon Beale wrote:
 The only gotcha I experienced was I forgot that cyrus was configured to
 hardlink mail, which of course was no longer the case after each mailbox
 was migrated, so my disk usage exploded. (But easily fixed/restored once
 identified).

 What did you use for restoring the hardlinks? freedup as well?

 I'm asking because I found a bug in freedup causing dataloss. I already
 sent a patch fixing it to the author of freedup last november, but he
 didn't release a new version yet.

 In case cyr_expire is running while freedup tries to hardlink files it is
 possible to loose both the source freedup wants to link to and the copy
 freedup still removes on error. Running cyr_expire and freedup (up to
 1.6-2) together is a really bad idea.

 If it's of interest I can provide my patch here, too.

Hi,

I have not used freedup for restoring the hardlinks but I'm interested in
the patch. If it's not big could you post it here?

Thanks,
Simon

