Re: High Availability
On Apr 22, 2015, 12:51 AM, Bron Gondwana br...@fastmail.fm wrote:

> On Wed, Apr 22, 2015, at 02:27 PM, Ciro Iriarte wrote:
> > Interesting, is the use of several instances needed because Cyrus cannot scale with threads in a single-instance scenario?
>
> There are two interesting reasons:
>
> 1) global locks. There are some - mailboxes.db for example. If you have multiple instances on a single machine, then a lock never blocks up the entire machine.
>
> 2) replication and load spreading - right now there's no support for a partial replica - a Cyrus instance replicates every mailbox to its replica.
>
> The second one is the kicker. If we replicated everything from one machine to another machine, then we'd have 100% user load on one machine and nothing on the other - not an efficient use of resources, because the second one needs to have the capacity to run at 100% in a failover situation too.
>
> Our first thought was to run two instances per machine and pair them - so there was a master on one and a replica on the other. At least then we're running equally in the general situation, and only in a failover situation are we loaded at 100%. But it's still nasty - you go from 50% load to 100% load.
>
> So we have about 10 different replicas for each machine, and every machine is running at 50% capacity. If we need to take one machine down, then 10 other machines run at 55% capacity instead for that time. The load change is much less.
>
> (As of about a year ago, we're fully paired odd-host-number to even-host-number, and odd and even are in different cabinets, so we can shut down an entire cabinet by raising the load on its replicas.)
>
> Bron.
> --
> Bron Gondwana br...@fastmail.fm

Hi Bron, it makes sense from that perspective, although it seems to imply a management nightmare. Do you use any management/automation ("webscale", if you want) framework?

Regards,

Ciro
Re: High Availability
On Wed, Apr 22, 2015, at 11:32 PM, Ciro Iriarte wrote:
> Hi Bron, it makes sense from that perspective, although it seems to imply a management nightmare. Do you use any management/automation ("webscale", if you want) framework?

Less than you might imagine :)

We have a single file (production.dat) which contains all the layout information, mapping from machines to slot numbers and slot numbers to disks, for example:

    i30 30 t15 0 1000 e 10.202.80.1

Which says that slots sloti30t01 through sloti30t15 are on server number 30, they have a zero-sized meta drive (all meta is on the SSD) and a 1000 Mb sized data drive running an ext4 filesystem, and IP addresses from 10.202.80.1 through 80.15.

And then a store based on that is:

    store23 n 0 90 sloti30t01 sloti15t03 slotti5t02 slotsi2d2t01

That's where my br...@fastmail.fm user lives - it has replicas on imap15 (New York), timap5 (Iceland) and simap2 (Amsterdam). The 'n' says that the master should live in New York, the '0' is a bit bogus actually, as we'll see in a sec, and the 90 says that it has a target maximum disk usage of 90%.

    store254 n future 0 sloti30t15 sloti29t15 slotti1t06 slotsi1d2t40

This is a testing store; only one real user lives here, and that's my personal non-work account. All the other users are test users. The 'future' says that it's running on the future branch of Cyrus, which is where we try out experimental code. This means that all the commands which find the correct binary for tools will look in the correct paths, like this:

    [brong@imap30 ~]$ cyr store254
    Store: store254
      Master:  sloti30t15 (imap30) 10.202.80.15
      Primary: sloti30t15 (imap30) 10.202.80.15
      This:    sloti30t15 (imap30) 10.202.80.15
      Other:   sloti29t15 (imap29) 10.202.79.15
      Other:   slotsi1d2t40 (simap1) 10.206.51.80
      Other:   slotti1t06 (timap1) 10.205.161.6
    sudo -u cyrus /usr/cyrus-future/bin/cyr_dbtool -C /etc/cyrus/imapd-sloti30t15.conf /mnt/ssd30/sloti30t15/store254/conf/mailboxes.db twoskip
    sudo -u cyrus /usr/cyrus-future/bin/reconstruct -C /etc/cyrus/imapd-sloti30t15.conf
    sudo -u cyrus /usr/cyrus-future/bin/dav_reconstruct -C /etc/cyrus/imapd-sloti30t15.conf
    sudo -u cyrus /usr/cyrus-future/bin/cyr_synclog -C /etc/cyrus/imapd-sloti30t15.conf -v
    sudo -u cyrus /usr/cyrus-future/bin/ctl_conversationsdb -C /etc/cyrus/imapd-sloti30t15.conf
    sudo -u cyrus /usr/cyrus-future/bin/squatter -C /etc/cyrus/imapd-sloti30t15.conf -v -i
    sudo -u cyrus /usr/cyrus-future/bin/sync_client -C /etc/cyrus/imapd-sloti30t15.conf -n sloti29t15 -v
    sudo -u cyrus /usr/cyrus-future/bin/sync_client -C /etc/cyrus/imapd-sloti30t15.conf -n slotsi1d2t40 -v
    sudo -u cyrus /usr/cyrus-future/bin/sync_client -C /etc/cyrus/imapd-sloti30t15.conf -n slotti1t06 -v

So I can even run 'cyr br...@fastmail.fm' and it will give me the correct commands to run for my user. If it wasn't heavily automated, it would be a pain. Configuration files are built with Perl Template-Toolkit using Makefiles and data from the production.dat file.

What we don't have so much yet is automated user moves or disk layout building, though it's semi-automated. I have a script which can be told "make config for 5 new stores" and it will find the least used machines, within the constraints we have for placing slots, and pick out empty slots on them. For moving users, 'MultiMove.pl' knows about disk usage on backends and can pick random users on busy backends to move.

Our MoveServer.pl script is very smart; it does what Ken at CMU and now Ellie have done in the upstream branch with sync-based XFER, but externally. It runs sync_client 3 times, plus squatter, plus cyr_expire for archiving, and locks out users in the DB, fiddles caches, etc. The upshot is that the user gets about a 3-second pause and their connections drop, then they keep on working as if nothing happened.

Bron.

--
Bron Gondwana
br...@fastmail.fm
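[Editor's note: for readers who want to see the shape of such automation, here is a minimal, purely hypothetical sketch of a 'cyr'-style wrapper. The paths, the replicas.dat mapping file, and the slot naming are invented for illustration - this is not FastMail's actual tooling.]

    #!/bin/sh
    # Hypothetical "cyr"-style wrapper: given a slot name, print the
    # per-instance commands with the right -C config path. The layout
    # (config paths, binary dir, replicas.dat format) is assumed here.
    SLOT="$1"
    CONF="/etc/cyrus/imapd-${SLOT}.conf"
    BIN="/usr/cyrus/bin"    # a 'future' store would point at /usr/cyrus-future/bin
    echo "sudo -u cyrus $BIN/reconstruct -C $CONF"
    echo "sudo -u cyrus $BIN/squatter -C $CONF -v -i"
    # one sync_client invocation per replica listed for this slot,
    # where replicas.dat has lines of the form: <slot> <replica-slot>
    awk -v s="$SLOT" '$1 == s { print $2 }' /etc/cyrus/replicas.dat |
    while read -r REPLICA; do
        echo "sudo -u cyrus $BIN/sync_client -C $CONF -n $REPLICA -v"
    done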
Re: High Availability
On Apr 20, 2015, 3:55 AM, Bron Gondwana br...@fastmail.fm wrote:

> (taking it back to the list in case it's useful to others)
>
> On Mon, Apr 20, 2015, at 05:45 PM, Lalot Dominique wrote:
> > Hello Bron. Unfortunately I wouldn't be able to go to The Hague..
>
> Oh well :)
>
> > Just as a simple question, the only drawback of not using it is that you won't be able to share folders?
>
> That's the only drawback we have. You can only share folders with users on the same server. For our family/business accounts, we just make sure all users are on the same backend. We run hundreds of servers, and use nginx as a proxy in front of them so we can move users without them knowing or having to update settings.
>
> > Can we have several imap servers without using murder?
>
> Sure. We do a thing we call slots and stores, where we split each machine up into up to 40 separate instances of Cyrus with 1Tb of storage each, replicating to different machines.
>
> > I had only used a simple setup, one imap server with several spools
>
> That works fine, but it won't give you high availability.
>
> > Is there some more information somewhere?
>
> Not much, unfortunately. We've written about our setup many times, most recently here:
>
> http://blog.fastmail.com/2014/12/04/standalone-mail-servers/
>
> and in more detail here:
>
> https://www.fastmail.com/help/technical/architecture.html
>
> But they don't give you quite enough configuration detail to just plug and play. Our plan with the Cyrus Foundation and developing Cyrus 3.0 is to have pre-configured Docker images which you can just run and add storage, and they will work in a cluster. It's very ambitious, and we might not have it fully stable by July when we launch 3.0, but it's definitely the eventual goal. What is your timeframe for setting up this new system?
>
> Regards,
>
> Bron.
> --
> Bron Gondwana br...@fastmail.fm

Interesting, is the use of several instances needed because Cyrus cannot scale with threads in a single-instance scenario?

Regards,

Ciro
Re: High Availability
On Wed, Apr 22, 2015, at 02:27 PM, Ciro Iriarte wrote:
> Interesting, is the use of several instances needed because Cyrus cannot scale with threads in a single-instance scenario?

There are two interesting reasons:

1) global locks. There are some - mailboxes.db for example. If you have multiple instances on a single machine, then a lock never blocks up the entire machine.

2) replication and load spreading - right now there's no support for a partial replica - a Cyrus instance replicates every mailbox to its replica.

The second one is the kicker. If we replicated everything from one machine to another machine, then we'd have 100% user load on one machine and nothing on the other - not an efficient use of resources, because the second one needs to have the capacity to run at 100% in a failover situation too.

Our first thought was to run two instances per machine and pair them - so there was a master on one and a replica on the other. At least then we're running equally in the general situation, and only in a failover situation are we loaded at 100%. But it's still nasty - you go from 50% load to 100% load.

So we have about 10 different replicas for each machine, and every machine is running at 50% capacity. If we need to take one machine down, then 10 other machines run at 55% capacity instead for that time. The load change is much less.

(As of about a year ago, we're fully paired odd-host-number to even-host-number, and odd and even are in different cabinets, so we can shut down an entire cabinet by raising the load on its replicas.)

Bron.

--
Bron Gondwana
br...@fastmail.fm
Re: High Availability
(taking it back to the list in case it's useful to others)

On Mon, Apr 20, 2015, at 05:45 PM, Lalot Dominique wrote:
> Hello Bron. Unfortunately I wouldn't be able to go to The Hague..

Oh well :)

> Just as a simple question, the only drawback of not using it is that you won't be able to share folders?

That's the only drawback we have. You can only share folders with users on the same server. For our family/business accounts, we just make sure all users are on the same backend. We run hundreds of servers, and use nginx as a proxy in front of them so we can move users without them knowing or having to update settings.

> Can we have several imap servers without using murder?

Sure. We do a thing we call slots and stores, where we split each machine up into up to 40 separate instances of Cyrus with 1Tb of storage each, replicating to different machines.

> I had only used a simple setup, one imap server with several spools

That works fine, but it won't give you high availability.

> Is there some more information somewhere?

Not much, unfortunately. We've written about our setup many times, most recently here:

http://blog.fastmail.com/2014/12/04/standalone-mail-servers/

and in more detail here:

https://www.fastmail.com/help/technical/architecture.html

But they don't give you quite enough configuration detail to just plug and play. Our plan with the Cyrus Foundation and developing Cyrus 3.0 is to have pre-configured Docker images which you can just run and add storage, and they will work in a cluster. It's very ambitious, and we might not have it fully stable by July when we launch 3.0, but it's definitely the eventual goal. What is your timeframe for setting up this new system?

Regards,

Bron.

--
Bron Gondwana
br...@fastmail.fm
High Availability
Hi,

We used Cyrus for many years and switched to a proprietary system. We are just now looking back at Cyrus. I would like to know the status of Cyrus and HA. This documentation seems to consider that replication is still bleeding edge:

http://cyrusimap.org/docs/cyrus-imapd/2.4.9/install-replication.php

and it was written in 2007:

"Note that Cyrus replication is still relatively young in the grand scheme of things, and if you choose to deploy you are doing so at your own risk."

Is there documentation somewhere, a howto for HA (proxies, murder and replication)?

Thanks

Dom

--
Dominique LALOT
Ingénieur Systèmes et Réseaux
http://annuaire.univ-amu.fr/showuser.php?uid=lalot
Re: High Availability
On Monday, 20 April 2015, 08:32:52, Lalot Dominique wrote:
> I would like to know the status of Cyrus and HA. This documentation seems to consider that replication is still bleeding edge: http://cyrusimap.org/docs/cyrus-imapd/2.4.9/install-replication.php and it was written in 2007.

Cyrus is a product mainly developed for large-scale / ISP / enterprise-level applications, and in good old internet terms anything is "edge" which has not been deployed over many years in such highly demanding environments with a lot of experience around. But very few commercial / proprietary solutions really had more experience and productive field testing behind them when their marketing called them "stable"...

> "Note that Cyrus replication is still relatively young in the grand scheme of things, and if you choose to deploy you are doing so at your own risk."
>
> Is there documentation somewhere, a howto for HA (proxies, murder and replication)?

For HA alone you are not required to use the new Cyrus-internal technologies - there are still many large-scale Cyrus installations which built their own HA infrastructure / logic with standard or less standard tools / techniques.

But yes, some of the docs are a bit edgy, though over the last years the situation has been improving step by step. I.e. see for murder:

https://cyrusimap.org/docs/cyrus-imapd/2.4.6/install-murder.php
https://cyrusimap.org/mediawiki/index.php/Cyrus_Murder_Design

and even well-known computer magazines have written about setup details (sorry for the German version, but it may exist in the English version of LM too):

http://www.linux-magazin.de/Ausgaben/2007/11/Mailvertreter

hth a bit

cheerioh,
Niels.

--
---
Niels Dettenbach
Syndicat IT Internet
http://www.syndicat.com
PGP: https://syndicat.com/pub_key.asc
---
Re: High Availability
On Mon, Apr 20, 2015, at 04:32 PM, Lalot Dominique wrote:
> Hi, We used Cyrus for many years and switched to a proprietary system. We are just now looking back at Cyrus. I would like to know the status of Cyrus and HA. This documentation seems to consider that replication is still bleeding edge: http://cyrusimap.org/docs/cyrus-imapd/2.4.9/install-replication.php and it was written in 2007: "Note that Cyrus replication is still relatively young in the grand scheme of things, and if you choose to deploy you are doing so at your own risk."

Yeah, that's pretty ancient.

> Is there documentation somewhere, a howto for HA (proxies, murder and replication)? Thanks

So I'll be talking about this in a couple of weeks, if you want to make your way over to The Hague :)

https://conference.kolab.org/kolab-summit/sessions/cyrus-imapd-past-current-and-future

The short version: replication in 2.4/2.5 is very stable. We're using it at FastMail and have been in production for about 5 years now. It doesn't integrate very well with murder yet, though. We don't use murder at FastMail. My plan in the short to medium term is to merge replication/murder into a generally better HA system. I'd be very interested in having test cases :) (as well as FastMail moving to it)

Bron.

--
Bron Gondwana
br...@fastmail.fm
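[Editor's note: for readers who have not set up 2.4-era replication before, a minimal master/replica configuration looks roughly like the sketch below. Hostnames and the sync user are placeholders; check the option set against the install-replication documentation for your version.]

    # master imapd.conf (sketch; hostnames/credentials are placeholders)
    sync_log: 1                        # log changes for rolling replication
    sync_host: replica.example.com
    sync_authname: repluser
    sync_password: secret

    # master cyrus.conf, START section: run a rolling sync client
    syncclient    cmd="sync_client -r"

    # replica cyrus.conf, SERVICES section: accept sync connections
    syncserver    cmd="sync_server" listen="csync"

    # replica imapd.conf: let the sync user administer mailboxes
    admins: repluser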
Re: high-availability Cyrus (i.e. glusterfs)?
On Wed, 29 Sep 2010, Tomasz Chmielewski wrote:

> Hmm - I added this to imapd.conf:
>
>   annotation_db: skiplist
>   duplicate_db: skiplist
>   mboxlist_db: skiplist
>   ptscache_db: skiplist
>   quota_db: skiplist
>   seenstate_db: skiplist
>   tlscache_db: skiplist
>
> When starting cyrus, I have this:
>
>   Sep 29 02:53:48 omega cyrus/master[1089]: process started
>   Sep 29 02:53:48 omega cyrus/ctl_cyrusdb[1090]: recovering cyrus databases
>   Sep 29 02:53:48 omega cyrus/ctl_cyrusdb[1090]: done recovering cyrus databases
>   Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: DBERROR db4: Program version 4.2 doesn't match environment version
>   Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: DBERROR: dbenv->open '/shared/var/lib/cyrus/db' failed: Invalid argument
>   Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: DBERROR: init() on berkeley
>   Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: duplicate_prune: pruning back 3 days
>   Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: duplicate_prune: purged 0 out of 0 entries
>   Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: expunged 0 out of 0 messages from 0 mailboxes
>   Sep 29 02:53:49 omega cyrus/tls_prune[1092]: tls_prune: purged 0 out of 0 entries
>   Sep 29 02:53:49 omega cyrus/master[1089]: ready for work
>   Sep 29 02:53:49 omega cyrus/ctl_cyrusdb[1093]: checkpointing cyrus databases
>   Sep 29 02:53:49 omega cyrus/ctl_cyrusdb[1093]: done checkpointing cyrus databases
>
>   # file /shared/var/lib/cyrus/db/*
>   /shared/var/lib/cyrus/db/__db.001: data
>   /shared/var/lib/cyrus/db/__db.002: data
>   /shared/var/lib/cyrus/db/__db.003: data
>   /shared/var/lib/cyrus/db/__db.004: data
>   /shared/var/lib/cyrus/db/__db.005: data
>   /shared/var/lib/cyrus/db/log.01: Berkeley DB (Log, version 8, native byte-order)
>   /shared/var/lib/cyrus/db/skipstamp: data
>
> The error and the Berkeley DB log file are there even if I empty this directory and start Cyrus. Did I miss some value in imapd.conf?

Cyrus is always linked with Berkeley DB, so it always tries to init the Berkeley DB environment. Even with all your backends set to skiplist, you'll still see the Berkeley DB log files in {configdir}/db/. You can safely ignore them.

I'm not sure why you still get Berkeley DB errors when starting Cyrus. I have converted everything to skiplist, and I do not get those errors.

Andy
Re: high-availability Cyrus (i.e. glusterfs)?
On 28 Sep 2010, at 08:50, Tomasz Chmielewski wrote:

> Sep 28 01:10:10 omega cyrus/ctl_cyrusdb[21728]: DBERROR db4: Program version 4.2 doesn't match environment version

Are you sure that on each node the _SAME_ Cyrus version, linked to the _SAME_ bdb libs, is running?

And - just a little side note - you can dump bdb in favor of skiplist... I bet you'll have much less trouble in your cluster environment setup.

Pascal
Re: high-availability Cyrus (i.e. glusterfs)?
--On 28 September 2010 08:50:00 +0200 Tomasz Chmielewski man...@wpkg.org wrote:

> How do you manage your Cyrus installations highly-available?

Check the archives. There have been many discussions regarding this.

> I thought a minimal example could be like below:
>
>   internet
>      |
>   server1 - server2
>
> There would be Heartbeat/Pacemaker running on both servers. Its role would be:
>
> - assign the Cyrus IP to a given server,
> - start Cyrus where the Cyrus IP is up.
>
> Still, we need to have the Cyrus database and mail storage accessible for both servers. I thought using glusterfs for it would be a good idea (assuming Cyrus only runs on one of the servers at a given time).

We use a similar setup with standard ext3 file systems that are mounted and unmounted as needed; in our case that's done by the RHEL 3 Cluster Suite. That's been working great for almost 6 years now.

--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
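[Editor's note: a failover stack of this shape is usually expressed as a Pacemaker resource group. The following crm-shell sketch is illustrative only - the IP, device, mount point and init script name are placeholders, not either poster's configuration.]

    # Pacemaker "crm configure" sketch for active/passive Cyrus failover.
    # All names, addresses and devices here are placeholders.
    primitive cyrus_ip ocf:heartbeat:IPaddr2 \
        params ip=192.0.2.10 cidr_netmask=24
    primitive cyrus_fs ocf:heartbeat:Filesystem \
        params device=/dev/sdb1 directory=/var/spool/cyrus fstype=ext3
    primitive cyrus_svc lsb:cyrus-imapd
    # start in order: filesystem, then IP, then the daemon; stop in reverse
    group cyrus_grp cyrus_fs cyrus_ip cyrus_svc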
Re: high-availability Cyrus (i.e. glusterfs)?
On 28.09.2010 09:13, Pascal Gienger wrote:
> On 28 Sep 2010, at 08:50, Tomasz Chmielewski wrote:
> > Sep 28 01:10:10 omega cyrus/ctl_cyrusdb[21728]: DBERROR db4: Program version 4.2 doesn't match environment version
>
> Are you sure that on each node the _SAME_ Cyrus version, linked to the _SAME_ bdb libs, is running?

100% sure. If I copy everything off glusterfs to a local filesystem, Cyrus doesn't report any errors.

> And - just a little side note - you can dump bdb in favor of skiplist... I bet you'll have much less trouble in your cluster environment setup.

Yep, I found it could more or less be some mmap problem with BDB.

Is there a way to convert the existing BDB databases to skiplist? Or to initialize empty skiplist databases for Cyrus?

--
Tomasz Chmielewski
http://wpkg.org
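[Editor's note: Cyrus ships a cvt_cyrusdb tool for exactly this conversion. A typical invocation, with Cyrus stopped, looks something like the sketch below; the paths and database file names are placeholders that must be adjusted to your layout and imapd.conf.]

    # stop cyrus first, then convert each database
    cvt_cyrusdb /var/lib/cyrus/mailboxes.db berkeley \
                /var/lib/cyrus/mailboxes.db.skiplist skiplist
    mv /var/lib/cyrus/mailboxes.db.skiplist /var/lib/cyrus/mailboxes.db
    # repeat for annotations.db, deliver.db, tls_sessions.db, etc.,
    # then set the matching *_db: skiplist entries in imapd.conf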
Re: high-availability Cyrus (i.e. glusterfs)?
Quoting Tomasz Chmielewski man...@wpkg.org:

> How do you manage your Cyrus installations highly-available? I thought a minimal example could be like below:
>
>   internet
>      |
>   server1 - server2
>
> There would be Heartbeat/Pacemaker running on both servers. Its role would be:
>
> - assign the Cyrus IP to a given server,
> - start Cyrus where the Cyrus IP is up.
>
> Still, we need to have the Cyrus database and mail storage accessible for both servers. I thought using glusterfs for it would be a good idea (assuming Cyrus only runs on one of the servers at a given time).

Cyrus depends on locks and mmap, so your fs must support them. I had written a summary of the discussions about Cyrus and HA in the old wiki, but the wiki was replaced by the new wiki. I will have a look to see if I have a copy.

If you plan to run in active-passive mode, did you consider Cyrus replication? You will need twice the disk space, but you remove a single point of failure (glusterfs).

Regards,

    Michael Menge

--
M. Menge                        Tel.: (49) 7071/29-70316
Universität Tübingen            Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung   mail: michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen
Re: high-availability Cyrus (i.e. glusterfs)?
On 28.09.2010 10:56, Michael Menge wrote:
> Cyrus depends on locks and mmap, so your fs must support them. I had written a summary of the discussions about Cyrus and HA in the old wiki, but the wiki was replaced by the new wiki. I will have a look to see if I have a copy.

I would be grateful.

> If you plan to run in active-passive mode, did you consider Cyrus replication? You will need twice the disk space, but you remove a single point of failure (glusterfs).

Glusterfs is there to avoid a SPOF - the filesystem sits on two servers. So assuming I won't do "rm -rf /gluster-filesystem", it should be quite safe. And it too needs twice the disk space, since it's replicated with glusterfs on both servers.

However, I'm of course open to better alternatives.

I'm running Debian Lenny, which ships with Cyrus 2.2.13 - not sure if Cyrus replication is possible there? I'd like to stick with distro packages, but if a newer Cyrus version provides features which let you do HA without too many hackarounds, I'll consider upgrading.

--
Tomasz Chmielewski
http://wpkg.org
Re: high-availability Cyrus (i.e. glusterfs)?
Quoting Tomasz Chmielewski man...@wpkg.org:

> On 28.09.2010 10:56, Michael Menge wrote:
> > Cyrus depends on locks and mmap, so your fs must support them. I had written a summary of the discussions about Cyrus and HA in the old wiki, but the wiki was replaced by the new wiki. I will have a look to see if I have a copy.
>
> I would be grateful.

I didn't find the wiki text, but here is the thread that was the basis of the wiki text:

http://www.irbs.net/internet/info-cyrus/0611/0279.html

> > If you plan to run in active-passive mode, did you consider Cyrus replication? You will need twice the disk space, but you remove a single point of failure (glusterfs).
>
> Glusterfs is there to avoid a SPOF - the filesystem sits on two servers. So assuming I won't do "rm -rf /gluster-filesystem", it should be quite safe. And it too needs twice the disk space, since it's replicated with glusterfs on both servers.

So there is no difference in disk space, whether glusterfs keeps two copies of each file or you have two Cyrus servers. But with Cyrus replication you don't have the problems with mmap and locking.

It may help not to use BDB for the databases. But I don't know how good skiplist is in 2.2.13 - many skiplist bugs have been fixed in 2.3.x.

> However, I'm of course open to better alternatives. I'm running Debian Lenny, which ships with Cyrus 2.2.13 - not sure if Cyrus replication is possible there? I'd like to stick with distro packages, but if a newer Cyrus version provides features which let you do HA without too many hackarounds, I'll consider upgrading.

Replication was introduced in 2.3.x. There are other features in 2.3.x I don't want to live without (e.g. delayed expunge). There was a discussion on the lists about Debian wanting to upgrade Cyrus; the main problem is the upgrade path (update of the BDB databases).

--
M. Menge                        Tel.: (49) 7071/29-70316
Universität Tübingen            Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung   mail: michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen
Re: high-availability Cyrus (i.e. glusterfs)?
On 28.09.2010 11:55, Michael Menge wrote:
> Replication was introduced in 2.3.x. There are other features in 2.3.x I don't want to live without (e.g. delayed expunge). There was a discussion on the lists about Debian wanting to upgrade Cyrus; the main problem is the upgrade path (update of the BDB databases).

Assuming I start with an empty mail pool (no accounts) - how can I trigger the creation of the Cyrus databases (in skiplist format - I assume adding the relevant skiplist info to the config file is not enough)?

--
Tomasz Chmielewski
http://wpkg.org
Re: high-availability Cyrus (i.e. glusterfs)?
On Tue, Sep 28, 2010 at 12:13:14PM +0200, Tomasz Chmielewski wrote:
> On 28.09.2010 11:55, Michael Menge wrote:
> > Replication was introduced in 2.3.x. There are other features in 2.3.x I don't want to live without (e.g. delayed expunge). There was a discussion on the lists about Debian wanting to upgrade Cyrus; the main problem is the upgrade path (update of the BDB databases).
>
> Assuming I start with an empty mail pool (no accounts) - how can I trigger the creation of the Cyrus databases (in skiplist format - I assume adding the relevant skiplist info to the config file is not enough)?

All databases will be created automatically upon use. Just set the type in the config file.

Bron.
Re: high-availability Cyrus (i.e. glusterfs)?
> Still, we need to have the Cyrus database and mail storage accessible for both servers. I thought using glusterfs for it would be a good idea (assuming Cyrus only runs on one of the servers at a given time).

IMO, don't use glusterfs for this. I found it to not even be sufficient for a PHP session store; it'll certainly fall over with IMAP loads.

John

--
John Madden
Sr UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmad...@ivytech.edu
Re: high-availability Cyrus (i.e. glusterfs)?
On 28.09.2010 15:01, John Madden wrote:
> > Still, we need to have the Cyrus database and mail storage accessible for both servers. I thought using glusterfs for it would be a good idea (assuming Cyrus only runs on one of the servers at a given time).
>
> IMO, don't use glusterfs for this. I found it to not even be sufficient for a PHP session store; it'll certainly fall over with IMAP loads.

Any other suggestions? There are alternatives like Ceph [1], but it is just too new (and can potentially have some edge cases). DRBD + GFS/OCFS2 just seems too complex for such a setup.

Other than that, I use glusterfs in several setups, and I don't have any dramatic performance problems with it (still slower than bare metal, of course) - it will depend on workload and expected performance, of course.

[1] http://ceph.newdream.net/about/

--
Tomasz Chmielewski
http://wpkg.org
Re: high-availability Cyrus (i.e. glusterfs)?
> Any other suggestions? There are alternatives like Ceph [1], but it is just too new (and can potentially have some edge cases). DRBD + GFS/OCFS2 just seems too complex for such a setup.

If you're doing failover, you don't need a cluster filesystem. You can use just plain DRBD+ext4 if you don't have real shared storage.

John

--
John Madden
Sr UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmad...@ivytech.edu
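[Editor's note: for reference, a two-node DRBD resource of the kind John describes is only a few lines of drbd.conf. This is a generic 8.x-era sketch with placeholder hostnames, devices and addresses, not a tested configuration.]

    # /etc/drbd.conf sketch: one synchronously replicated block device
    # backing the Cyrus spool; format it with ext4 and let the cluster
    # manager mount it on whichever node is primary.
    resource cyrus {
      protocol C;                # fully synchronous replication
      on server1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
      }
      on server2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }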
Re: high-availability Cyrus (i.e. glusterfs)?
Quoting Tomasz Chmielewski man...@wpkg.org:

> On 28.09.2010 15:01, John Madden wrote:
> > IMO, don't use glusterfs for this. I found it to not even be sufficient for a PHP session store; it'll certainly fall over with IMAP loads.
>
> Any other suggestions? There are alternatives like Ceph [1], but it is just too new (and can potentially have some edge cases). DRBD + GFS/OCFS2 just seems too complex for such a setup. Other than that, I use glusterfs in several setups, and I don't have any dramatic performance problems with it (still slower than bare metal, of course) - it will depend on workload and expected performance, of course.
>
> [1] http://ceph.newdream.net/about/

Most cluster / shared filesystems are good with a few big files. But because of the metadata handling, these filesystems all lose performance if you have many small files - and Cyrus has many small files.

--
M. Menge                        Tel.: (49) 7071/29-70316
Universität Tübingen            Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung   mail: michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen
Re: high-availability Cyrus (i.e. glusterfs)?
Hello,

AFAIK, Cyrus needs POSIX file locks and mmap support. GlusterFS needs FUSE, and FUSE only supports writable mmap'ed files after kernel 2.6.26 or so. Therefore, you need a recent kernel and recent FUSE. Further, you need to tune your configuration extremely finely, as even the most robust clustered filesystems suffer under load over small files. It is their Achilles heel... and Cyrus uses small files and hot-spot files.

We have been evaluating clustered filesystems like GlusterFS, GFS, OCFS2 (and other shared/mirrored alternatives) since 2007, and they are not there yet for such a heavy load profile (small files). GlusterFS is the most elegant, flexible and promising of them. But clustered filesystems are only worth their performance penalty if you need active-active servers.

For such an active-active setup, you may consider using Dovecot, which was designed taking clustered filesystems, shared storage and multiple servers into account. It has four file-locking methods to choose from, for best suitability to a given storage method, and even SQL backends for the mailer-internal DB (not for messages). But Dovecot does not yet support shared folders across multiple backends as Cyrus does. And *this* is a killer feature for us.

If you want an active-passive configuration, it is best to stay away from any clustered filesystem, so as not to pay the heavy performance cost for small files (and another layer of bugs) without REALLY needing the active-active fs sharing. Keep it simple.

Maybe you do not even need real-time, up-to-the-microsecond replication/mirroring or sharing. This allows even simpler and/or more reliable, recoverable or less resource-hungry solutions, as more sync delay is accepted. Low-level (byte or even file) solutions will replicate crashes like BDB corruptions and will slow down your app. Byte, block and file replication also needs REALLY FAST and EXTREMELY LOW LATENCY networks, notably for small files. Answer for yourself: what do you desire? What do you actually need?

You may consider it worthwhile to read some articles to bring some light to the subject. Also, remember that GlusterFS has evolved since these articles were written, and newer versions use somewhat different confs and tuning, which depend on YOUR infrastructure. You will need a translation service for the articles in Brazilian Portuguese - look for the "Translate this page" link near the bottom of each page.

Good luck.

Andre Felipe Machado

[0] http://www.techforce.com.br/news/linux_blog/glusterfs_tuning_small_files
[1] http://www.techforce.com.br/news/linux_blog/lvm_raid_xfs_ext3_tuning_for_small_files_parallel_i_o_on_debian
[2] http://www.techforce.com.br/news/linux_blog/storage_space_for_debian_on_ibm_ds_8300
[3] http://www.techforce.com.br/news/linux_blog/how_to_configure_multipath_debian_centos_for_ibm_ds8300
[4] http://www.techforce.com.br/news/linux_blog/postgresql_ha_p1_5_com_glusterfs
[5] http://www.techforce.com.br/news/linux_blog/postgresql_ha_p1_com_glusterfs
[6] http://www.techforce.com.br/news/media/multimedia/video_1_da_palestra_postgresql_em_alta_disponibilidade_parte_1_usando_sistema_de_arquivos_distribuido_glusterfs
[7] http://www.techforce.com.br/news/linux_blog/red_hat_cluster_suite_debian_etch
[8] http://www.techforce.com.br/news/linux_blog/virtualizacao_e_servico_de_arquivos_em_cluster_ha_com_debian_etch_parte_1
[9] http://www.techforce.com.br/news/linux_blog/virtualizacao_e_servico_de_arquivos_em_cluster_ha_com_debian_etch_parte_2
[10] http://www.techforce.com.br/news/linux_blog/virtualizacao_e_servico_de_arquivos_em_cluster_ha_com_debian_etch_parte_3
[11] http://www.techforce.com.br/news/linux_blog/postgresql_ha_p1_5_com_glusterfs
[12] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=595370
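[Editor's note: a quick way to sanity-check the writable-mmap requirement on a candidate filesystem is a small probe along these lines; it is purely illustrative, and the test path is a placeholder for a file on the filesystem under test.]

    /* Check that a filesystem supports writable shared mmap (a Cyrus
     * prerequisite). Compile with: cc -o mmap_probe mmap_probe.c */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "/mnt/test/mmap_probe";
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }
        if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }
        char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }  /* old FUSE fails here */
        memcpy(map, "ok", 2);                    /* write through the mapping */
        if (msync(map, 4096, MS_SYNC) < 0) { perror("msync"); return 1; }
        munmap(map, 4096);
        close(fd);
        puts("writable shared mmap works");
        return 0;
    }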
Re: high-availability Cyrus (i.e. glusterfs)?
On 28.09.2010 12:55, Bron Gondwana wrote:
> On Tue, Sep 28, 2010 at 12:13:14PM +0200, Tomasz Chmielewski wrote:
> > Assuming I start with an empty mail pool (no accounts) - how can I trigger the creation of the Cyrus databases (in skiplist format - I assume adding the relevant skiplist info to the config file is not enough)?
>
> All databases will be created automatically upon use. Just set the type in the config file.

Hmm - I added this to imapd.conf:

  annotation_db: skiplist
  duplicate_db: skiplist
  mboxlist_db: skiplist
  ptscache_db: skiplist
  quota_db: skiplist
  seenstate_db: skiplist
  tlscache_db: skiplist

When starting cyrus, I have this:

  Sep 29 02:53:48 omega cyrus/master[1089]: process started
  Sep 29 02:53:48 omega cyrus/ctl_cyrusdb[1090]: recovering cyrus databases
  Sep 29 02:53:48 omega cyrus/ctl_cyrusdb[1090]: done recovering cyrus databases
  Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: DBERROR db4: Program version 4.2 doesn't match environment version
  Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: DBERROR: dbenv->open '/shared/var/lib/cyrus/db' failed: Invalid argument
  Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: DBERROR: init() on berkeley
  Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: duplicate_prune: pruning back 3 days
  Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: duplicate_prune: purged 0 out of 0 entries
  Sep 29 02:53:49 omega cyrus/cyr_expire[1091]: expunged 0 out of 0 messages from 0 mailboxes
  Sep 29 02:53:49 omega cyrus/tls_prune[1092]: tls_prune: purged 0 out of 0 entries
  Sep 29 02:53:49 omega cyrus/master[1089]: ready for work
  Sep 29 02:53:49 omega cyrus/ctl_cyrusdb[1093]: checkpointing cyrus databases
  Sep 29 02:53:49 omega cyrus/ctl_cyrusdb[1093]: done checkpointing cyrus databases

  # file /shared/var/lib/cyrus/db/*
  /shared/var/lib/cyrus/db/__db.001: data
  /shared/var/lib/cyrus/db/__db.002: data
  /shared/var/lib/cyrus/db/__db.003: data
  /shared/var/lib/cyrus/db/__db.004: data
  /shared/var/lib/cyrus/db/__db.005: data
  /shared/var/lib/cyrus/db/log.01: Berkeley DB (Log, version 8, native byte-order)
  /shared/var/lib/cyrus/db/skipstamp: data

The error and the Berkeley DB log file are there even if I empty this directory and start Cyrus. Did I miss some value in imapd.conf?

--
Tomasz Chmielewski
http://wpkg.org
High Availability approaches for Cyrus
Greetings all,

I've spent a good deal of time searching the Info-Cyrus archives (and various googled articles) to identify the recommended ways to improve Cyrus availability and reduce disaster recovery time. The two main approaches appear to be Cyrus replication and file system replication using DRBD with Heartbeat/Pacemaker/RHCS. Cyrus replication appears to be the preferred approach, since with DRBD a corrupted file system on the master would be replicated to the slave. I have a few questions.

- Am I missing something? Is there a third approach that is better than Cyrus or file system replication?
- Cyrus replication seems to be used in conjunction with manual failover procedures. Is anyone using Heartbeat, etc. with Cyrus replication?
- We have three Cyrus servers, each with a single large mailstore. Would there be a significant advantage to splitting them into multiple smaller mailstores? We're using Perdition but not Murder / Aggregator.
- Are there any situations where DRBD would be preferred to Cyrus replication?

Thank you for your time.

John

John Simpson
Senior Software Engineer, I.T. Engineering and Operations
Re: High Availability approaches for Cyrus
> - We have three Cyrus servers, each with a single large mailstore. Would there be a significant advantage to splitting them into multiple smaller mailstores? We're using Perdition but not Murder / Aggregator.

Murder rocks, IMO - well worth the learning curve of the setup. If you're going to take the extra step of doing HA for your storage nodes, I think Murder makes even more sense. We deployed our Murder cluster back in November and recently cut off access to our old Cyrus (single-instance, multiple-spool) system, and 6 nodes with FC meta partitions and SATA storage partitions plus a single frontend absolutely rocks for our over 450,000 users (2.6m mailboxes). We don't do HA, but Murder makes it easy to do if needed.

John

--
John Madden
Sr UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmad...@ivytech.edu
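[Editor's note: for orientation, the moving parts of a Murder are one mupdate master plus per-backend and per-frontend settings in imapd.conf. The sketch below uses placeholder hostnames and credentials and should be checked against the install-murder documentation for your version.]

    # on the mupdate master, cyrus.conf SERVICES section:
    mupdate       cmd="mupdate -m" listen=3905 prefork=1

    # on each backend, imapd.conf: push mailbox changes to the master
    mupdate_server: mupdate.example.edu
    mupdate_authname: mupdateuser
    mupdate_password: secret
    proxyservers: proxyuser          # account the frontends log in as

    # on each frontend, imapd.conf: proxy logins to the right backend
    mupdate_server: mupdate.example.edu
    mupdate_authname: mupdateuser
    mupdate_password: secret
    proxy_authname: proxyuser
    proxy_password: secret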
Re: High Availability approaches for Cyrus
Quoting Simpson, John R john_simp...@reyrey.com:

> Greetings all, I've spent a good deal of time searching the Info-Cyrus archives (and various googled articles) to identify the recommended ways to improve Cyrus availability and reduce disaster recovery time. The two main approaches appear to be Cyrus replication and file system replication using DRBD with Heartbeat/Pacemaker/RHCS. Cyrus replication appears to be the preferred approach, since with DRBD a corrupted file system on the master would be replicated to the slave. I have a few questions.
>
> - Am I missing something? Is there a third approach that is better than Cyrus or file system replication?

I don't know of any other.

> - Cyrus replication seems to be used in conjunction with manual failover procedures. Is anyone using Heartbeat, etc. with Cyrus replication?

You could write scripts to do the failover with Heartbeat, but IMHO the reaction time you win by using Heartbeat does not outweigh the risk of a Heartbeat running amok (e.g. split brain).

> - We have three Cyrus servers, each with a single large mailstore. Would there be a significant advantage to splitting them into multiple smaller mailstores? We're using Perdition but not Murder / Aggregator.

Running two active instances of Cyrus would allow you to share the load of a failed server between the two others, instead of one server doing the work of two.

> - Are there any situations where DRBD would be preferred to Cyrus replication?

Cyrus replication is very new, so you have to use a recent version of Cyrus. If you have to use an older version of Cyrus, DRBD might be the only option.

> Thank you for your time.
>
> John
>
> John Simpson
> Senior Software Engineer, I.T. Engineering and Operations

--
M. Menge                        Tel.: (49) 7071/29-70316
Universität Tübingen            Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung   mail: michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen
RE: High Availability approaches for Cyrus
-----Original Message-----
From: John Madden [mailto:jmad...@ivytech.edu]
Sent: Monday, March 15, 2010 2:07 PM
To: Simpson, John R
Cc: info-cyrus@lists.andrew.cmu.edu
Subject: Re: High Availability approaches for Cyrus

> > - We have three Cyrus servers, each with a single large mailstore. Would there be a significant advantage to splitting them into multiple smaller mailstores? We're using Perdition but not Murder / Aggregator.
>
> Murder rocks, IMO - well worth the learning curve of the setup. If you're going to take the extra step of doing HA for your storage nodes, I think Murder makes even more sense. We deployed our Murder cluster back in November and recently cut off access to our old Cyrus (single-instance, multiple-spool) system, and 6 nodes with FC meta partitions and SATA storage partitions plus a single frontend absolutely rocks for our over 450,000 users (2.6m mailboxes). We don't do HA, but Murder makes it easy to do if needed.
>
> John

Thank you, John. I'm not a Cyrus expert, but I'll be working with our Cyrus team on this project. I'll definitely bring up Murder.

> --
> John Madden
> Sr UNIX Systems Engineer
> Ivy Tech Community College of Indiana
> jmad...@ivytech.edu
Re: High availability email server...
Well, as far as I know, the mailboxes.db and other databases are only opened and modified by the master process. But I'm not sure here. As your assumption sounds correct, and because this seems to work with a cluster (and I fully believe you here, no question), your assumption regarding the DBs must be somewhat correct.

Thanks! I would be glad if some list member who has in-depth knowledge here could comment!

Best,
Daniel

Andrew Morgan schrieb:
> On Tue, 1 Aug 2006, Daniel Eckl wrote:
> > Well, I don't have cluster knowledge, and so of course I simply believe you that a good cluster system will never have file locking problems. I already stated this below! But how will the cluster affect application-level database locking? That was my primary question, and you didn't address it at all.
> >
> > A database file which is in use is practically always inconsistent until it is closed by the database application. That's why databases can be corrupt after an application crash and have to be reconstructed. When you have two applications changing the same database file, you have a never-ending fight, because every application thinks the database is inconsistent when it's just in use by another application. And every app will try to reconstruct it and so break it for the other app(s). It's like letting two cyrus masters run on the same single node! It will break, in my opinion.
> >
> > Can you shed some light on this subject?
>
> I think the point here is that the situation you describe already occurs all the time on a stand-alone Cyrus server. There are multiple imapd processes accessing the mailboxes.db database concurrently. If you are using Berkeley DB, it has an API to manage concurrent access. I assume the same is true of skiplist and the other backend formats.
>
> I don't know enough about the Berkeley DB internals to explain how it actually works, but it does. :)
>
> Andy
Re: High availability email server...
Hi Scott!

Your statements cannot be correct, for logical reasons. While on the file-locking level you are fully right, Cyrus heavily depends on critical database access where you need application-level database locking. As only one master process can lock the database, a second one either cannot lock the database or just crashes it with simultaneous write access. I didn't try it myself, for obvious reasons...

If that didn't occur to you, then you had incredible luck that there was no situation where both processes wanted to change the same db file simultaneously.

Best,
Daniel

Scott Adkins schrieb:
> Okay, okay, I just can't *NOT* say something here :)
>
> First, I disagree with all the statements below. Cyrus CAN run in an Active/Active mode, you CAN have multiple servers reading and writing to the same files, and clustering IS a good way to achieve HA/DR/BC in a Cyrus environment. Why do I say that? I say that because we have been doing it for many many years. The key is to have a GOOD clustered filesystem technology that does proper file locking while still providing good performance.
>
> For years, we have been running our Cyrus IMAP system on a Tru64 Alpha TruCluster system. Our cluster has 4 members in it, 2 of which serve up Cyrus IMAP/IMSP, and the other 2 accept and deliver mail via Sendmail. All of the servers have access to the full set of files across the cluster and everything just works.
>
> I would like to address a few specific comments listed below:
>
> > GFS, Lustre, and other cluster filesystems do file-level locking; in order to properly read and write to the BDB backend, you'd need DB-level locking, which is not possible from a filesystem.
>
> I wrote a lengthy response to this, but when I got to the conclusion, it all came down to a really simple point. How is having multiple servers any different than a single server? You still have tons of different processes all trying to acquire read/write locks on the same files. There is no one process in Cyrus that opens the database and shares it with all the other processes running under Cyrus. How is this different from an IMAP process running on one server and a different IMAP process running on a different server? It isn't. The end result is that file locking is the most important feature Cyrus has to rely upon... what if you are using the flat file format for your mailboxes.db file? At that point, that is the ONLY thing you can rely upon...
>
> > IMAP is also a stateful connection; depending on how you set up your cluster, some clients might not handle it gracefully (e.g., Pine).
>
> True, true... stateful it is... but at the same time, what kind of problems do you see? When an IMAP connection is opened, the client auths and starts doing stuff with it. When it closes the connection, it is done with it. That is it. If another connection is opened, presumably it is because of a user initiating a new action, and the client will simply do the same thing again... auth and work. Most clients keep at least one connection open to the server at all times. Even if the client has more than one connection, and one connection is on one server and another connection is on another server, there still shouldn't be any problems. The data is the same on the other end.
>
> Incidentally, we have Pine users in our environment that do not have problems with our multi-server clustered Cyrus environment. In fact, we have not seen any client have problems with it. Webmail-based clients are a different animal. It isn't because of the fact that we are running multiple servers in the environment, it is because of the non-stateful nature of the client. Users don't have problems with anything from a data-consistency standpoint, it is simply a problem with performance. It is the same issue faced in a single-server environment. Using some kind of middleware piece to cache IMAP connections is usually how this problem is solved.
>
> > As already said in this thread: Cyrus cannot share its spool. No 2 cyrus instances can use the same spool, databases and lockfiles.
>
> This simply isn't true. However, I must say, there is a reason why NFS shouldn't be used... it doesn't do proper file locking (though I am going to watch for the responses on the NFSv4 thread that somebody asked about). Without proper file locking, even a single Cyrus server on the backend is jeopardized by multiple IMAP processes wanting to write to a single DB at the same time.
>
> > Clustered filesystems don't make any sense for Cyrus, since the application itself doesn't allow simultaneous read/write.
>
> I completely disagree... Clustered filesystems (if they implement proper file-locking techniques) actually SIMPLIFY your setup significantly. You don't have to have a complex Murder/Perdition environment with replication, failover, etc. You simply run 2 or more servers on the clustered filesystem and run things as you would normally expect. Surprisingly, it runs
Re: High availability email server...
Daniel Eckl wrote:
> Hi Scott!
>
> Your statements cannot be correct, for logical reasons. While on the file-locking level you are fully right, Cyrus heavily depends on critical database access where you need application-level database locking. As only one master process can lock the database, a second one either cannot lock the database or just crashes it with simultaneous write access. I didn't try it myself, for obvious reasons...
>
> If that didn't occur to you, then you had incredible luck that there was no situation where both processes wanted to change the same db file simultaneously.

Hi Daniel,

Scott is not just lucky, he's using clustering technology that works. When using a cluster filesystem that works, the locking semantics across cluster nodes will be the same as those on a single-node filesystem. What you say above is simply not correct.

The University of Pittsburgh is also running a 4-node active/active cluster using Veritas Cluster Filesystem, and it works very well. The performance is incredible, and as Scott pointed out, you don't need the complexity of murder or application-level replication. Using a cluster instead of Cyrus murder gives you both scalability and redundancy. The big tradeoff is that Veritas Cluster Filesystem costs money, while Cyrus does not.

Thanks,

Dave
Re: High availability email server...
Well, I don't have cluster knowledge, and so of course I simply believe you that a good cluster system will never have file locking problems. I already stated this below! But how will the cluster affect application-level database locking? That was my primary question, and you didn't address it at all.

A database file which is in use is practically always inconsistent until it is closed by the database application. That's why databases can be corrupt after an application crash and have to be reconstructed. When you have two applications changing the same database file, you have a never-ending fight, because every application thinks the database is inconsistent when it's just in use by another application. And every app will try to reconstruct it and so break it for the other app(s). It's like letting two cyrus masters run on the same single node! It will break, in my opinion.

Can you shed some light on this subject?

Best,
Daniel

Dave McMurtrie schrieb:
> Daniel Eckl wrote:
> > Hi Scott! Your statements cannot be correct, for logical reasons. While on the file-locking level you are fully right, Cyrus heavily depends on critical database access where you need application-level database locking. As only one master process can lock the database, a second one either cannot lock the database or just crashes it with simultaneous write access. I didn't try it myself, for obvious reasons... If that didn't occur to you, then you had incredible luck that there was no situation where both processes wanted to change the same db file simultaneously.
>
> Hi Daniel,
>
> Scott is not just lucky, he's using clustering technology that works. When using a cluster filesystem that works, the locking semantics across cluster nodes will be the same as those on a single-node filesystem. What you say above is simply not correct.
>
> The University of Pittsburgh is also running a 4-node active/active cluster using Veritas Cluster Filesystem, and it works very well. The performance is incredible, and as Scott pointed out, you don't need the complexity of murder or application-level replication. Using a cluster instead of Cyrus murder gives you both scalability and redundancy. The big tradeoff is that Veritas Cluster Filesystem costs money, while Cyrus does not.
>
> Thanks,
>
> Dave
Re: High availability email server...
On Tue, 1 Aug 2006, Daniel Eckl wrote:

> Well, I don't have cluster knowledge, and so of course I simply believe you that a good cluster system will never have file locking problems. I already stated this below! But how will the cluster affect application-level database locking? That was my primary question, and you didn't address it at all.
>
> A database file which is in use is practically always inconsistent until it is closed by the database application. That's why databases can be corrupt after an application crash and have to be reconstructed. When you have two applications changing the same database file, you have a never-ending fight, because every application thinks the database is inconsistent when it's just in use by another application. And every app will try to reconstruct it and so break it for the other app(s). It's like letting two cyrus masters run on the same single node! It will break, in my opinion.
>
> Can you shed some light on this subject?

I think the point here is that the situation you describe already occurs all the time on a stand-alone Cyrus server. There are multiple imapd processes accessing the mailboxes.db database concurrently. If you are using Berkeley DB, it has an API to manage concurrent access. I assume the same is true of skiplist and the other backend formats.

I don't know enough about the Berkeley DB internals to explain how it actually works, but it does. :)

Andy
Re: High availability email server...
Michael--

One of the major problems you'd run into is /var/lib/imap, the config directory. It contains, among other things, a Berkeley DB of information about the mail store. GFS, Lustre, and other cluster filesystems do file-level locking; in order to properly read and write to the BDB backend, you'd need DB-level locking, which is not possible from a filesystem. If you tried putting /var/lib/imap on shared storage, you'd have data corruption and loss in no time.

IMAP is also a stateful connection; depending on how you set up your cluster, some clients might not handle it gracefully (e.g., Pine).

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University

On Sat, 29 Jul 2006, Michael Menge wrote:

> Hi,
>
> Quoting Pascal Gienger [EMAIL PROTECTED]:
> > I would NEVER suggest mounting the cyrus mail spool via NFS; locking is important, and for these crucial things I like to have a real block device with a real filesystem, so SANs are OK to me.
>
> Does someone use Lustre as a cyrus mail spool? Would it be possible to run cyrus on 2 or more systems with a shared spool, for load balancing and HA, with Lustre?
>
> Michael
RE: High availability email server...
At 11:49 PM +0200 7/28/06, Pascal Gienger wrote: In the Apple case we need to distinguish the Apple XSAN hard disk chassis and the XSAN software. The XSAN software seems to give you a special filesystem for SAN issues (at least I read this on their webpage). Let me dissect this a bit. The Xserve RAID is Apple's RAID appliance box, two non-redundant/failover 7-disk controllers in one box (14 disks total), each with FC connectors connecting to (whatever). The administrative application is a Java app. No Mac OS necessary, no special Apple goo. A group here is using it as raw storage for VMWare (VMFS). Xsan is Apple's licensed implementation of ADIC's StorNext file system. It's its own file system, *NOT* HFS+. StorNext requires a dedicated, private Ethernet network for communicating metadata information between the nodes and controllers. This topology is where it falls down with lots of little files -- lots of little files means more metadata flying between the nodes and controllers; it's inherent to the StorNext design. On the flip side, Apple's Xsan product is a very cheap way to implement a StorNext controller; Apple's client licenses are (relatively) cheap as well, but ADIC will happily sell clients for Linux/AIX/Solaris/other that all work with Xsan. -- Andrew Laurence [EMAIL PROTECTED] Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
At 4:18 PM -0400 7/28/06, John Madden wrote: Sorry, please bear with my ignorance, I'm not very informed about NFS, but what's wrong with locking against a real block device? NFS is a file sharing protocol that doesn't provide full locking semantics the way block devices do. Has Cyrus been tested against NFSv4? My understanding is that NFSv4 fixes all the locking issues. The legends of NFS's insufficient locking continue, but I can only assume they refer to NFSv3 and earlier. Thanks, -Andrew -- Andrew Laurence [EMAIL PROTECTED] Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
On Fri, 2006-07-28 at 15:33 -0700, Andrew Morgan wrote: On Fri, 28 Jul 2006, Rich Graves wrote: My question: So is *anyone* here happy with Cyrus on ext3? We're a small site, only 3200 users, 246GB mail. I'd really rather not try anything more exotic for supportability reasons, but I'm getting worried that our planned move from Solaris 9/VxFS to RHEL4/ext3 on significantly newer and faster hardware is going to be a downgrade. We run Cyrus on ext3 under Debian Linux without complaints here. We have approximately 35000 mailboxes/users split between 2 backend servers. Each backend server is connected to an EMC Cx500 SAN (no shared access or anything fancy) with 800GB of mail spool each. The commands used to build the filesystems were: mkfs -t ext3 -j -m 1 -O dir_index /dev/sdb1 tune2fs -c 0 -i 0 /dev/sdb1 The filesystem is mounted like so: /dev/sdb1 /private ext3 defaults,data=ordered,noatime 0 2 If you want more information, just ask. :) How big is your journal? I have instructions for determining the size here, because it's non-obvious: http://nakedape.cc/wiki/PlatformNotes_2fLinuxNotes (BTW, you can drop the 'defaults' from the entry in your fstab; 'defaults' exists to fill the column in the table when nothing else is there.) Wil -- Wil Cooley [EMAIL PROTECTED] Naked Ape Consulting, Ltd Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
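The journal-size check Wil alludes to can be done with debugfs: on ext3 the journal lives in reserved inode 8, so dumping that inode shows its size. A hedged sketch (the device name is borrowed from Andy's mkfs line above; debugfs opens read-only by default, so this is safe on a mounted filesystem):

    # print the journal inode; the Size: field is the journal size in bytes
    debugfs -R "stat <8>" /dev/sdb1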
Re: High availability email server...
On Mon, 31 Jul 2006, Wil Cooley wrote: How big is your journal? I have instructions for determining the size here, because it's non-obvious: http://nakedape.cc/wiki/PlatformNotes_2fLinuxNotes (BTW, you can drop the 'defaults' from the entry in your fstab; 'defaults' exists to fill the column in the table when nothing else is there.) Those tools are a little scary, but here is what it reported: Inode: 8 Type: regular Mode: 0600 Flags: 0x0 Generation: 0 User: 0 Group: 0 Size: 33554432 ... Performance has been okay for me so far. Do you have any feeling for whether it is worth changing the journal size? Andy Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
On Mon, 31 Jul 2006, Wil Cooley wrote: Well, 32MB is small for a write-heavy filesystem. But if you're not seeing any problems with kjournald stalling while it flushes, then it might not be worth the trouble of re-creating the journal at a larger size. It's unlikely to hurt anything, but I wouldn't make it a huge priority. Did you also read the LOPSA post from Ted Ts'o that I linked to in the section above the instructions? Yeah, I guess this is something I should do when we have a downtime window. The only performance problem I've noticed is that things get pretty sluggish when I unplug half of the power feed to the SAN (we were doing a power upgrade in our data center). The write cache is disabled in this situation and things got really bad for a while. :) Increasing the size of the journal may have helped during that time, but who knows. In general, I think an IMAP server does more reads than writes, but every little bit helps! Andy Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
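Re-creating the journal at a larger size is a short offline operation. A hedged sketch, assuming the same /dev/sdb1 device; the 128MB target and the mount point are illustrative, not from the thread:

    # the filesystem must be unmounted to change the journal
    umount /var/spool/imap
    tune2fs -O ^has_journal /dev/sdb1   # remove the existing 32MB journal
    e2fsck -f /dev/sdb1                 # a full fsck is required after dropping the journal
    tune2fs -j -J size=128 /dev/sdb1    # add a new journal, 128MB
    mount /var/spool/imap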
Re: High availability email server...
On Mon, 2006-07-31 at 15:40 -0700, Andrew Morgan wrote: On Mon, 31 Jul 2006, Wil Cooley wrote: How big is your journal? I have instructions for determining the size here, because it's non-obvious: http://nakedape.cc/wiki/PlatformNotes_2fLinuxNotes (BTW, you can drop the 'defaults' from the entry in your fstab; 'defaults' exists to fill the column in the table when nothing else is there.) Those tools are a little scary, but here is what it reported: Yeah, but debugfs opens the filesystem read-only w/o '-w'. Inode: 8 Type: regular Mode: 0600 Flags: 0x0 Generation: 0 User: 0 Group: 0 Size: 33554432 ... Performance has been okay for me so far. Do you have any feeling for whether it is worth changing the journal size? Well, 32MB is small for a write-heavy filesystem. But if you're not seeing any problems with kjournald stalling while it flushes, then it might not be worth the trouble of re-creating the journal at a larger size. It's unlikely to hurt anything, but I wouldn't make it a huge priority. Did you also read the LOPSA post from Ted Ts'o that I linked to in the section above the instructions? Wil -- Wil Cooley [EMAIL PROTECTED] Naked Ape Consulting, Ltd Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
Kinda surprising, but it DOES have something to do with Cyrus. Caspur did their case study on cluster filesystems with their e-mail environment. It used Cyrus IMAP and some kind of SMTP server (I think it was Postfix, but I'm not sure). Their paper talks about Maildir. If you connect to mailbox.caspur.it:993 you'll see a Courier-IMAP greeting. Maildir is specifically designed not to need anything more fine-grained than filesystem metadata-level locking. In theory, it's even NFSv2-safe, though user-visible performance on typical IMAP loads sucks (see recent Thunderbird rants, and double them). Cyrus usually expects more advanced locking facilities, but it depends on the db backend. I obviously can't argue with your own positive experiences with Tru64 clustering (as with other former DEC products, I've heard only good things about Tru64, except that it's been mismanaged and mismarketed into irrelevance), but those of us on crankier OSes need to worry about such things. Sendmail 8.12.5 release notes: NOTE: Linux appears to have broken flock() again; UW-IMAP FAQ http://www.washington.edu/imap/documentation/FAQ.html#6.11, etc. Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
Hi, Quoting Pascal Gienger [EMAIL PROTECTED]: I would NEVER suggest to mount the cyrus mail spool via NFS, locking is important and for these crucial things I like to have a real block device with a real filesystem, so SANs are ok to me. Does someone use Lustre as a cyrus mail spool? Would it be possible to run cyrus on 2 or more systems with a shared spool for load balancing and HA with Lustre? Michael Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
Hi Michael! As already said in this thread: Cyrus cannot share its spool. No 2 cyrus instances can use the same spool, databases and lockfiles. For load balancing you can use a murder setup, and for HA you can use replication. Best, Daniel Michael Menge wrote: Hi, Quoting Pascal Gienger [EMAIL PROTECTED]: I would NEVER suggest to mount the cyrus mail spool via NFS, locking is important and for these crucial things I like to have a real block device with a real filesystem, so SANs are ok to me. Does someone use Lustre as a cyrus mail spool? Would it be possible to run cyrus on 2 or more systems with a shared spool for load balancing and HA with Lustre? Michael Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
Pascal Gienger wrote: David Korpiewski [EMAIL PROTECTED] wrote: I spent about 6 months fighting with Apple XSAN and Apple OSX mail to try to create a redundant cyrus mail cluster. First of all, don't try it, it is a waste of time. Apple states that mail on an XSAN is not supported. The reason is that it simply won't run. The Xsan can't handle the large number of small files and will do things like disconnecting or corrupting the file system. STOP! The capability to handle small files efficiently is related to the filesystem carrying the files and NOT to the physical and logical storage media (block device) under it. Apple is the one confusing people. Xsan is the name of the Apple cluster file system. So you configure a couple of hosts on your SAN, with a shared volume, and then run Xsan. Xsan is more similar to Linux GFS (Global File System). So I believe the original poster is right: Xsan is crap for lots of small files. That is not surprising. It is really hard to come up with a shared file system that doesn't suck. The nodes have to lock for meta-data updates, so shared file systems can be pretty slow too. Tom Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
Chad-- We've put /var/lib/imap and /var/spool/imap on a SAN and have two machines -- one active, and one hot backup. If the active server fails, the other mounts the storage and takes over. This is not yet in production, but it's a pretty simple setup and can be done without running any bleeding edge software, and it appears that it will work fine. There's no need to use a SAN, either -- you could share your mail storage out via NFS with the same effect. We're going production with this in mid-August; if you'd like to know how everything goes, drop me a note in a month or so. Chris St. Pierre Unix Systems Administrator Nebraska Wesleyan University On Thu, 27 Jul 2006, Chad A. Prey wrote: OK...I'm searching for strategies to have a realtime email backup in the event of backend failure. We've been running cyrus-imap for about a year and a half with incredible success. Our failures have all been due to using junky storage. One idea is to have a continuous rsync of the cyrus /var/spool/imap and /var/lib/imap to another server. I've also considered delivering email to two discrete email backends and keeping the /var/lib/imap files sync'd. I don't think I can use murder to do this. Is anyone out there using RHEL in a cluster that would like to share their architecture? Any contractors out there that want to get paid to help us implement? Chad P. [EMAIL PROTECTED] Salk Institute for Biological Studies -- Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
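For reference, the takeover Chris describes usually boils down to a couple of commands wrapped in a cluster agent. A hedged manual sketch -- the device names, mount points and service IP are invented for illustration, not Chris's actual setup:

    # on the standby node, once the primary is confirmed down
    mount /dev/san/imap-conf  /var/lib/imap
    mount /dev/san/imap-spool /var/spool/imap
    ip addr add 192.0.2.25/24 dev eth0    # take over the service address
    /etc/init.d/cyrus-imapd start

The one rule that must hold: the primary can never still have these filesystems mounted when the standby mounts them, or (as noted elsewhere in this thread) the data is gone.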
Re: High availability email server...
I spent about 6 months fighting with Apple XSAN and Apple OSX mail to try to create a redundant cyrus mail cluster. First of all, don't try it, it is a waste of time. Apple states that mail on an XSAN is not supported. The reason is that it simply won't run. The Xsan can't handle the large number of small files and will do things like disconnecting or corrupting the file system. So my suggestion is to make sure that your SAN can handle the large number of files that a mail server will be reading and dumping. The Xsan had severe problems to the point of file system corruption while trying to deal with them, so make sure your SAN can handle it. Our final solution, which we are still in the process of finishing, is to create a linux cluster with two servers, each with their own XRAID backend. Using replication, we mirror one server to the other. Using some sophisticated software we wrote, we can validate the mailstore ages and then create a fully see-sawable cluster system. Good luck! David Chris St. Pierre wrote: Chad-- We've put /var/lib/imap and /var/spool/imap on a SAN and have two machines -- one active, and one hot backup. If the active server fails, the other mounts the storage and takes over. This is not yet in production, but it's a pretty simple setup and can be done without running any bleeding edge software, and it appears that it will work fine. There's no need to use a SAN, either -- you could share your mail storage out via NFS with the same effect. We're going production with this in mid-August; if you'd like to know how everything goes, drop me a note in a month or so. Chris St. Pierre Unix Systems Administrator Nebraska Wesleyan University On Thu, 27 Jul 2006, Chad A. Prey wrote: OK...I'm searching for strategies to have a realtime email backup in the event of backend failure. We've been running cyrus-imap for about a year and a half with incredible success. Our failures have all been due to using junky storage. One idea is to have a continuous rsync of the cyrus /var/spool/imap and /var/lib/imap to another server. I've also considered delivering email to two discrete email backends and keeping the /var/lib/imap files sync'd. I don't think I can use murder to do this. Is anyone out there using RHEL in a cluster that would like to share their architecture? Any contractors out there that want to get paid to help us implement? Chad P. [EMAIL PROTECTED] Salk Institute for Biological Studies -- David Korpiewski Phone: 413-545-4319 Software Specialist I Fax: 413-577-2285 Department of Computer Science ICQ: 7565766 University of Massachusetts Amherst Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
David Korpiewski wrote: I spent about 6 months fighting with Apple XSAN and Apple OSX mail to try to create a redundant cyrus mail cluster. First of all, don't try it, it is a waste of time. Apple states that mail on an XSAN is not supported. The reason is that it simply won't run. The Xsan can't handle the large number of small files and will do things like disconnecting or corrupting the file system. So my suggestion is to make sure that your SAN can handle the large number of files that a mail server will be reading and dumping. The Xsan had severe problems to the point of file system corruption while trying to deal with them, so make sure your SAN can handle it. Our final solution, which we are still in the process of finishing, is to create a linux cluster with two servers, each with their own XRAID backend. Using replication, we mirror one server to the other. Using some sophisticated software we wrote, we can validate the mailstore ages and then create a fully see-sawable cluster system. Good luck! David Chris St. Pierre wrote: Chad-- We've put /var/lib/imap and /var/spool/imap on a SAN and have two machines -- one active, and one hot backup. If the active server fails, the other mounts the storage and takes over. This is not yet in production, but it's a pretty simple setup and can be done without running any bleeding edge software, and it appears that it will work fine. There's no need to use a SAN, either -- you could share your mail storage out via NFS with the same effect. One note on this. The SAN configuration sounds great and is a very common backup/failover solution; however, do not try NFS. It is well documented that Cyrus does not play nice with NFS. We're going production with this in mid-August; if you'd like to know how everything goes, drop me a note in a month or so. Chris St. Pierre Unix Systems Administrator Nebraska Wesleyan University On Thu, 27 Jul 2006, Chad A. Prey wrote: OK...I'm searching for strategies to have a realtime email backup in the event of backend failure. We've been running cyrus-imap for about a year and a half with incredible success. Our failures have all been due to using junky storage. One idea is to have a continuous rsync of the cyrus /var/spool/imap and /var/lib/imap to another server. I've also considered delivering email to two discrete email backends and keeping the /var/lib/imap files sync'd. I don't think I can use murder to do this. Is anyone out there using RHEL in a cluster that would like to share their architecture? Any contractors out there that want to get paid to help us implement? Chad P. [EMAIL PROTECTED] Salk Institute for Biological Studies -- Kevin Baker Mission Vi Inc. [EMAIL PROTECTED] 858.454.5532 Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
Chris St. Pierre wrote: We've put /var/lib/imap and /var/spool/imap on a SAN and have two machines -- one active, and one hot backup. If the active server fails, the other mounts the storage and takes over. This is not yet Also consider /var/spool/{mqueue,clientmqueue,postfix}. Depending on how your server fails, there could be important incoming mail stuck in those queues, and silently failing over to a server without them could be bad. Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
RE: High availability email server...
Hi, you can also use DRBD for replication at the block level. Then you need no SAN and have a shared-nothing architecture. You will need a high-speed link between the sites (GBit). An alternative is a SAN with replication. You can also use md for this purpose (host-based SAN RAID). Two cheap MSA 1000/1500s will do just fine. We have had an installation with 2 MSA 1500 SAN arrays mirrored with md at the host level in production for a year now; it runs without a problem. There are approx. 3500 active users on the servers. Exim and other services (e.g. LDAP) are also protected in the cluster. We used SteelEye LifeKeeper as cluster software. Contact me if you need more information. Regards, Robert Heinzmann COMPUTER CONCEPT CC Computersysteme und Kommunikationstechnik GmbH Robert Heinzmann Wiener Str. 114 - 116, 01219 Dresden Email: [EMAIL PROTECTED] Telefon: +49 (0)351/8 76 92-0 Telefax: +49 (0)351/8 76 92-99 Internet: http://www.cc-dresden.de -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris St. Pierre Sent: Friday, July 28, 2006 15:12 To: Chad A. Prey Cc: info-cyrus@lists.andrew.cmu.edu Subject: Re: High availability email server... Chad-- We've put /var/lib/imap and /var/spool/imap on a SAN and have two machines -- one active, and one hot backup. If the active server fails, the other mounts the storage and takes over. This is not yet in production, but it's a pretty simple setup and can be done without running any bleeding edge software, and it appears that it will work fine. There's no need to use a SAN, either -- you could share your mail storage out via NFS with the same effect. We're going production with this in mid-August; if you'd like to know how everything goes, drop me a note in a month or so. Chris St. Pierre Unix Systems Administrator Nebraska Wesleyan University On Thu, 27 Jul 2006, Chad A. Prey wrote: Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
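A minimal sketch of the DRBD piece Robert mentions, following the classic drbd.conf resource format; the hostnames, devices and addresses are invented for illustration:

    # /etc/drbd.conf -- one replicated volume for the cyrus spool
    resource r0 {
      protocol C;                    # synchronous: a write completes on both nodes
      on mail1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;         # local backing store
        address   10.0.0.1:7788;     # dedicated replication link
        meta-disk internal;
      }
      on mail2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }

Only the current primary mounts /dev/drbd0; the cluster software (LifeKeeper, heartbeat, or similar) promotes the peer and mounts it there on failover.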
Re: High availability email server...
Pascal Gienger wrote: There are techniques to handle these situations - for xfs (as an example) consider having *MUCH* RAM in your machine and always mount it with logbufs=8. Is XFS so RAM intensive? I would NEVER suggest to mount the cyrus mail spool via NFS, locking is important and for these crucial things I like to have a real block device with a real filesystem, so SANs are ok to me. Sorry, please bear with my ignorance, I'm not very informed about NFS, but what's wrong with locking against a real block device? We have a RAID device here with 1.5 TB which is shared between 2 mail nodes and 2 test nodes. The switch can be done manually (10 seconds downtime) and - if you wish - via Heartbeat HA software. The only dangerous thing is to ensure that NEVER, really NEVER, a second node mounts your SAN partition while another has it mounted already. Kernel halts and data loss are the immediate result. There are file systems like GFS that have been written for that, even if they are pretty CPU and I/O intensive (I use it for multimedia sharing - a lot of images that need to be shared across 4 nodes, with Apache serving them). Fabio Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
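For Pascal's XFS tip, logbufs simply goes on the mount. A hedged fstab sketch -- the device and mount point are invented:

    # more in-memory log buffers smooths metadata-heavy loads like a mail spool
    /dev/sdc1  /var/spool/imap  xfs  logbufs=8,noatime  0 2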
Re: High availability email server...
Sorry, please bear with my ignorance, I'm not very informed about NFS, but what's wrong with locking against a real block device? NFS is a file sharing protocol that doesn't provide full locking semantics the way block devices do. There are file systems like GFS that have been written for that, even if they are pretty CPU and I/O intensive (I use it for multimedia sharing - a lot a lot of images that needs to be shared across 4 nodes having Apache to serve them). GFS is a *cluster* filesystem. We're talking about high availability here, not clustering. GFS could certainly be used in this case, but would be overkill. John -- John Madden Sr. UNIX Systems Engineer Ivy Tech Community College of Indiana [EMAIL PROTECTED] Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
On Jul 28, 2006, at 1:40 PM, Pascal Gienger wrote: So if Apple says that Xsan does not handle many files, they admit that their HFS+ file system is crap for many small files. This is completely untrue. Xsan, although branded by Apple, is not completely an Apple product. ADIC makes StorNext, which is what Xsan is based on. Basically, think of Xsan as StorNext with a pretty interface. I just spent an hour on the phone with the ADIC guys talking about using their product on a few Linux servers. Their concern is that their product doesn't work well with a lot of small files. It has absolutely nothing to do with HFS, jfs, xfs, or anything else. It's basically the design of their software. It's better suited for the management of large files like images, audio and video clips, 3D renderings, etc. If you have a lot of small mail files, it might not be the best solution. That is, unless your users spend most of their email time sending 15MB files around to people and almost none of it responding with thanks. I've used HFS+ to store mail using cyrus. I've had exactly zero problems using it. I've also used jfs and had exactly zero problems. -Michael --- There will always be those who dare to take great risks. Rather than mourn their loss, we should value their contributions. --Jesse Brown Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
Clustered filesystems don't make any sense for Cyrus, since the application itself doesn't allow simultaneous read/write. Just use a normal journaling filesystem and fail over by mounting the FS on the backup server. Consider replication such as DRBD or proprietary SAN replication if you feel you must physically mirror the storage. My question: So is *anyone* here happy with Cyrus on ext3? We're a small site, only 3200 users, 246GB mail. I'd really rather not try anything more exotic for supportability reasons, but I'm getting worried that our planned move from Solaris 9/VxFS to RHEL4/ext3 on significantly newer and faster hardware is going to be a downgrade. Anyway, it has nothing to do with Cyrus, but if anyone does have another application that wants lots of small files on a clustered FS: http://web.caspur.it/Files/2005/01/10/1105354214692.pdf http://polyserve.com/pdf/Caspur_CS.pdf -- Rich Graves [EMAIL PROTECTED] Sr UNIX and Security Administrator Ofc 507-646-7079 Cell 952-292-6529 Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
Rich Graves wrote: Clustered filesystems don't make any sense for Cyrus, since the application itself doesn't allow simultaneous read/write. Just use a normal journaling filesystem and fail over by mounting the FS on the backup server. Consider replication such as DRBD or proprietary SAN replication if you feel you must physically mirror the storage. That means forget about cyrus being active/active? Sounds like a BIG limitation to me, especially when we talk about horizontal scalability. Anyway, it has nothing to do with Cyrus, but if anyone does have another application that wants lots of small files on a clustered FS: http://web.caspur.it/Files/2005/01/10/1105354214692.pdf http://polyserve.com/pdf/Caspur_CS.pdf Thanks very much for pointing out those documents. Regarding your questions, I've never tried to do comparisons between VxFS and ext3, but as far as I know the former performs better. If it is available for Solaris 10 x86, consider an AMD64 architecture; it should be pretty cheap compared to SPARC and perform very well. Fabio Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
RE: High availability email server...
David S. Madole [EMAIL PROTECTED] wrote: That's just not true as a general statement. SAN is a broad term that applies to much more than just farming out block devices. Some of the more sophisticated SANs are filesystem-based, not block-based. This allows them to implement more advanced functionality like cross-platform sharing of volumes, simultaneous mounts of volumes from different hosts, backups (and single-file restores) performed by the SAN system, pooling of free space, transparent migration to offline storage, etc., etc., etc. In my classical view a SAN is a network used for storage applications to give a view onto shareable block devices. There are hardware applications giving access to the same filesystem in a shareable manner (such as GFS or OCFS), but this is software logic at the filesystem and firmware level and not in the classical SAN components like JBOD arrays, RAID controllers and FC or IP switches. In the Apple case we need to distinguish the Apple XSAN hard disk chassis and the XSAN software. The XSAN software seems to give you a special filesystem for SAN issues (at least I read this on their webpage). So if Apple says that this is not well suited for many small files, I would not use it for that. Another instance of a SAN filesystem that I do happen to be familiar with is IBM's: http://www-03.ibm.com/servers/storage/software/virtualization/sfs/index.html Also this filesystem lives above the FCP (Fibre Channel) protocol, forming a filesystem including multipathing elements and concurrent access strategies. Still you have to distinguish the block-level access to SAN devices and the filesystems built above them. It is true that SAN is marketing speech for all kinds of things. Pascal Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
Fabio Corazza wrote: Rich Graves wrote: Clustered filesystems don't make any sense for Cyrus, since the application itself doesn't allow simultaneous read/write. Just use a normal journaling filesystem and fail over by mounting the FS on the backup server. Consider replication such as DRBD or proprietary SAN replication if you feel you must physically mirror the storage. That means forget about cyrus being active/active? Sounds like a BIG limitation to me, especially when we talk about horizontal scalability. No. You scale horizontally with Murder. Or front-end with another proxy like perdition, or have the clients connect to other servers directly by using ACAP (mostly dead), IMAP referrals (mostly unimplemented), or simply telling users which server to use (historically, universities would advertise user-specific load-balancing hostnames like rgraves.imap.carleton.edu). You get active/active N+1 redundancy by allowing failover server(s) to mount other servers' filesystems in the SAN. Anyway, it's not Exchange 5.5. It doesn't crash every week. And when you perform 10 times better than the competition, you have 1/10 the need for horizontal scalability. Regarding your questions, I've never tried to do comparisons between VxFS and ext3, but as far as I know the former performs better. If it is available for Solaris 10 x86, consider an AMD64 architecture; it should be pretty cheap compared to SPARC and perform very well. If we were going to stay in the Solaris game I think we'd be looking at ZFS. Interesting. http://www.sun.com/software/whitepapers/solaris10/fs_performance.pdf suggests that ext3 is better than reiserfs for their test workload. Just goes to show you that benchmarks are entirely parameter-dependent. Anyone have postmark parameters that they feel accurately reflect Cyrus's needs? -- Rich Graves [EMAIL PROTECTED] Sr UNIX and Security Administrator Ofc 507-646-7079 Cell 952-292-6529 Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
On Fri, 28 Jul 2006, Rich Graves wrote: My question: So is *anyone* here happy with Cyrus on ext3? We're a small site, only 3200 users, 246GB mail. I'd really rather not try anything more exotic for supportability reasons, but I'm getting worried that our planned move from Solaris 9/VxFS to RHEL4/ext3 on significantly newer and faster hardware is going to be a downgrade. We run Cyrus on ext3 under Debian Linux without complaints here. We have approximately 35000 mailboxes/users split between 2 backend servers. Each backend server is connected to an EMC Cx500 SAN (no shared access or anything fancy) with 800GB of mail spool each. The commands used to build the filesystems were: mkfs -t ext3 -j -m 1 -O dir_index /dev/sdb1 tune2fs -c 0 -i 0 /dev/sdb1 The filesystem is mounted like so: /dev/sdb1 /private ext3 defaults,data=ordered,noatime 0 2 If you want more information, just ask. :) Andy Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
-- Rich Graves [EMAIL PROTECTED] is rumored to have mumbled on 28 July 2006 15:52:17 -0500 regarding Re: High availability email server...: My question: So is *anyone* here happy with Cyrus on ext3? Yes. We use it on a SAN with an 800 GB partition for /var/spool/imap. -- Sebastian Hagedorn - RZKR-R1 (Flachbau), Zi. 18, Robert-Koch-Str. 10 Zentrum für angewandte Informatik - Universitätsweiter Service RRZK Universität zu Köln / Cologne University - Tel. +49-221-478-5587 Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High availability email server...
Andrew Morgan wrote: On Fri, 28 Jul 2006, Rich Graves wrote: My question: So is *anyone* here happy with Cyrus on ext3? We're a small site, only 3200 users, 246GB mail. I'd really rather not try anything more exotic for supportability reasons, but I'm getting worried that our planned move from Solaris 9/VxFS to RHEL4/ext3 on significantly newer and faster hardware is going to be a downgrade. We run Cyrus on ext3 under Debian Linux without complaints here. We have approximately 35000 mailboxes/users split between 2 backend servers. Each backend server is connected to an EMC Cx500 SAN (no shared access or anything fancy) with 800GB of mail spool each. The commands used to build the filesystems were: mkfs -t ext3 -j -m 1 -O dir_index /dev/sdb1 tune2fs -c 0 -i 0 /dev/sdb1 The filesystem is mounted like so: /dev/sdb1 /private ext3 defaults,data=ordered,noatime 0 2 If you want more information, just ask. :) Andy We also use ext3, not because I think it's the fastest or has the most features, but because it just works. We do volume management with EVMS, and I had a lot of trouble getting XFS and other file systems to snapshot correctly under heavy load without the box eventually running into a situation where all processes started to hang waiting for IO, eventually causing a system crash. Ext3 worked every time, so the choice was obvious. I figured if it would survive a snapshot while I'm hitting it very hard with postal, then the odds of having problems in prod are going to be pretty slim. One thing that ext3 does have going for it is the fact that it is the most tested and most common file system on linux. schu Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
High availability email server...
OK...I'm searching for strategies to have a realtime email backup in the event of backend failure. We've been running cyrus-imap for about a year and a half with incredible success. Our failures have all been due to using junky storage. One idea is to have a continuous rsync of the cyrus /var/spool/imap and /var/lib/imap to another server. I've also considered delivering email to two discrete email backends and keeping the /var/lib/imap files sync'd. I don't think I can use murder to do this. Is anyone out there using RHEL in a cluster that would like to share their architecture? Any contractors out there that want to get paid to help us implement? Chad P. [EMAIL PROTECTED] Salk Institute for Biological Studies -- Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
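A hedged sketch of Chad's rsync idea (the standby hostname is invented; -H matters because Cyrus hard-links identical message files):

    # mirror spool and config dirs to a warm standby
    rsync -aH --delete /var/spool/imap/ standby:/var/spool/imap/
    rsync -aH --delete /var/lib/imap/   standby:/var/lib/imap/

The caveat, as the rest of this thread explains, is that the databases under /var/lib/imap can be caught mid-write by the copy and may need recovery or reconstruction on the standby before Cyrus is started there.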
Re: High availability email server...
Chad A. Prey wrote: OK...I'm searching for strategies to have a realtime email backup in the event of backend failure. We've been running cyrus-imap for about a year and a half with incredible success. Our failures have all been due to using junky storage. One idea is to have a continuous rsync of the cyrus /var/spool/imap and /var/lib/imap to another server The newest version of Cyrus supports replication. I'd suggest looking into this. http://cyrusimap.web.cmu.edu/imapd/install-replication.html I've also considered delivering email to two discrete email backends and keeping the /var/lib/imap files sync'd. I don't think I can use murder to do this. Is anyone out there using RHEL in a cluster that would like to share their architecture? Any contractors out there that want to get paid to help us implement? Chad P. [EMAIL PROTECTED] Salk Institute for Biological Studies -- Kevin Baker Mission Vi Inc. [EMAIL PROTECTED] 858.454.5532 Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
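For orientation, the replication described at that link is configured with a handful of sync_* options. A hedged sketch assuming the Cyrus 2.3-style names from that document; the hostname and credentials are invented:

    # imapd.conf on the master: log changes and name the replica
    sync_log: 1
    sync_host: replica.example.com
    sync_authname: repluser
    sync_password: secret

    # cyrus.conf on the master (START section): run rolling replication
    syncclient    cmd="sync_client -r"

    # cyrus.conf on the replica (SERVICES section): accept sync connections
    syncserver    cmd="sync_server" listen="csync"

With sync_log enabled, mailbox changes are appended to a log that the rolling sync_client replays against the replica continuously, rather than replicating everything in one batch.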
Re: High-Availability IMAP server
Scott Adkins wrote: --On Monday, September 26, 2005 6:45 PM +0200 David [EMAIL PROTECTED] wrote: Hello, I have a 'pseudo' High Availability SMTP system consisting of two servers running cyrus 2.2.5. The main problem I have is that only one of the two nodes can access the mailboxes in order to keep the integrity of the cyrus databases, even though the filesystem (GFS) supports two different servers accessing it in R/W mode. I am curious about this statement... What kind of locking is being used on GFS that prevents two nodes from accessing mailboxes without destroying the integrity of the cyrus database? Yeah, that's what I asked myself too. Unfortunately I haven't yet had the chance to test such a setup with multiple cyrus instances on the same shared GFS filesystem. Both Cyrus instances would use the same databases; only the lock files for the imapd's and popd's would need to be different, I expect. Has anybody ever had such a setup in production use using Linux and GFS? I haven't even tried it, but am very interested in it. What integrity problem do you mean exactly, David? Have you already experienced a problem? regards -- Wolfgang Powisch [EMAIL PROTECTED] www.powo.priv.at Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
On Tue, 27 Sep 2005, Patrick Radtke wrote: We made great use of it Monday morning when one of our backend machines failed. Switching to the replica was quite simple and relatively fast (maybe 5 to 10 minutes from deciding to switch to the replica before the replica was fully in action) We use the replication engine all the time to move users back and forth between systems so that we can patch and upgrade operating systems and/or Cyrus without any user-visible downtime. There have also been a number of forced failovers because of hardware problems, specifically some dodgy RAID controller firmware that we were running for a few months until we got a fix. It's worked very nicely for us, but it is important that people don't just trust the software blindly. We maintain and constantly regenerate a database of MD5 checksums for all of the messages and cache entries across the cluster. It's been a long time now since this has turned up errors, but I still check it religiously. I consider the code to be stable, though on occasion strange things happen Which is not really my definition of stable :). (e.g. when a user renames user.INBOX to user.saved.INBOX) and you have to restart the replication process (no downtime to Cyrus involved). This one is odd behaviour on the part of mboxlist_renamemailbox(): it does special magic when running as a non-admin user. There's actually a more serious underlying bug in Cyrus here which I believe Ken is working on. Again we don't see this one. Partly because our replication engine doesn't run as an admin user (I'm afraid you don't have that option), partly because of overenthusiastic hacking on my part in other parts of the Cyrus code. -- David Carter Email: [EMAIL PROTECTED] University Computing Service, Phone: (01223) 334502 New Museums Site, Pembroke Street, Fax: (01223) 334679 Cambridge UK. CB2 3QH. Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
On Tue, 2005-09-27 at 08:51 +0800, Ow Mun Heng wrote: On Mon, 2005-09-26 at 10:03 -0700, Aaron Glenn wrote: On 9/26/05, David [EMAIL PROTECTED] wrote: Is there any way to achieve this goal using cyrus? Which is the best approach to this scenario? Run daily imapsync via cron and a Load Balancer forward the requests to the active one? Any help would be appreciated. There is replication code in the 2.3 branch; though from what I can tell it hasn't been touched in a few months and makes me wonder if it's being actively developed still. Nevertheless, in my exhaustive search for any and all information on IMAP replication, I came across a few list posts detailing the 2.3 replication code in production, without many issues, for over a year. I would be eternally grateful if someone on the list more knowledgeable detailed their experiences with replication. I would be very interested in this solution as well. I would also be interested in replication advice. Thanks, -- Brad Crotchett, RHCE [EMAIL PROTECTED] Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
brad wrote: On Tue, 2005-09-27 at 08:51 +0800, Ow Mun Heng wrote: On Mon, 2005-09-26 at 10:03 -0700, Aaron Glenn wrote: On 9/26/05, David [EMAIL PROTECTED] wrote: Is there any way to achieve this goal using cyrus? Which is the best approach to this scenario? Run daily imapsync via cron and a Load Balancer forward the requests to the active one? Any help would be appreciated. There is replication code in the 2.3 branch; though from what I can tell it hasn't been touched in a few months and makes me wonder if it's being actively developed still. Nevertheless, in my exhaustive search for any and all information on IMAP replication, I came across a few list posts detailing the 2.3 replication code in production, without many issues, for over a year. I would be eternally grateful if someone on the list more knowledgeable detailed their experiences with replication. I would be very interested in this solution as well. I would also be interested in replication advice. Thanks, I too am very interested in this replication solution. Where can I get the src and documentation ? Regards, João Assad Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
On Wed, 2005-09-28 at 09:02 +0100, David Carter wrote: We use the replication engine all the time to move users back and forth between systems so that we can patch and upgrade operating systems and/or Cyrus without any user visible downtime. I read the documentation on replication and am interested in trying it. I have several servers that run a single domain, but are using virtdomain anyway. I would like to have one virtdomain replica server that serves as a hot spare to all of these servers. In other words server A would replicate domain A mailboxes to the replica and server B would replicate domain B to the replica. If server A fails then I could bring up the replica server for domain A but not domain B (just by not pointing domain B to the replica server). Is this possible? Thanks, -- Brad Crotchett, RHCE [EMAIL PROTECTED] Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
On Wed, 2005-09-28 at 12:41 -0300, João Assad wrote: I too am very interested in this replication solution. Where can I get the src and documentation ? Regards, João Assad This is a good start: http://www-uxsup.csx.cam.ac.uk/~dpc22/cyrus/replication.html Thanks, -- Brad Crotchett, RHCE [EMAIL PROTECTED] Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
João Assad wrote: brad wrote: On Tue, 2005-09-27 at 08:51 +0800, Ow Mun Heng wrote: On Mon, 2005-09-26 at 10:03 -0700, Aaron Glenn wrote: On 9/26/05, David [EMAIL PROTECTED] wrote: Is there any way to achieve this goal using cyrus? Which is the best approach to this scenario? Run daily imapsync via cron and a Load Balancer forward the requests to the active one? Any help would be appreciated. There is replication code in the 2.3 branch; though from what I can tell it hasn't been touched in a few months and makes me wonder if it's being actively developed still. Nevertheless, in my exhaustive search for any and all information on IMAP replication, I came across a few list posts detailing the 2.3 replication code in production, without many issues, for over a year. I would be eternally grateful if someone on the list more knowledgeable detailed their experiences with replication. I would be very interested in this solution as well. I would also be interested in replication advice. Thanks, I too am very interested in this replication solution. Where can I get the src and documentation? It's in the 2.3 branch of Cyrus CVS (tag cyrus-imapd-2_3) -- Kenneth Murchison Oceana Matrix Ltd. Software Engineer 21 Princeton Place 716-662-8973 x26 Orchard Park, NY 14127 --PGP Public Key--http://www.oceana.com/~ken/ksm.pgp Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
David Carter wrote: On Wed, 28 Sep 2005, brad wrote: I read the documentation on replication and am interested in trying it. I have several servers that run a single domain, but are using virtdomain anyway. I would like to have one virtdomain replica server that serves as a hot spare to all of these servers. In other words, server A would replicate domain A mailboxes to the replica and server B would replicate domain B to the replica. If server A fails then I could bring up the replica server for domain A but not domain B (just by not pointing domain B to the replica server). Is this possible? We typically run with half the accounts on a given server as masters and the other half as replicas to reduce fallout from a single server failing. We don't use virtual domains. I'm afraid I don't know if the code in 2.3 supports virtual domains. Ken? I haven't tried it, but I've done nothing to purposely break replication of virtdomains. Note that the code in CVS does NOT yet allow load balancing the mailboxes across servers as David's does. Shared mailboxes make this a more difficult nut to crack. -- Kenneth Murchison Oceana Matrix Ltd. Software Engineer 21 Princeton Place 716-662-8973 x26 Orchard Park, NY 14127 --PGP Public Key--http://www.oceana.com/~ken/ksm.pgp Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
On Wed, 2005-09-28 at 13:45 -0400, Ken Murchison wrote: I haven't tried it, but I've done nothing to purposely break replication of virtdomains. I might give it a try and report back then. Thanks, -- Brad Crotchett, RHCE [EMAIL PROTECTED] Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
On Mon, 26 Sep 2005, Aaron Glenn wrote: There is replication code in the 2.3 branch; though from what I can tell it hasn't been touched in a few months and makes me wonder if it's being actively developed still. Nevertheless, in my exhaustive search for any and all information on IMAP replication, I came across a few list posts detailing the 2.3 replication code in production, without many issues, for over a year. I wrote the code which was eventually merged into the 2.3 branch back in Autumn 2002. We've been using it on our production systems for a little over two years now, and all of our users have been replicated (rolling replication to a hot spare system, plus nightly replication to a tape spooling array) for about 18 months. The last significant change to my code base was November last year. That's not a sign of neglect; the code just does everything we need right now. Ken merged the code into 2.3 at the start of this year. He put in quite a lot of work to merge the code properly into Cyrus (I had deliberately left it to one side to make updates easier) and add support for features such as shared mailboxes and annotations that we just don't need right now. It's still conceptually the same code and the same design. Invariably, working on code introduces new bugs (including a particularly exciting one caused by a stray semicolon). People are also pushing the code in new and interesting ways. Ken fixed a bug involving account renaming (user.xxx -> user.yyy) a couple of weeks back: that's something our nightly useradmin scripts just never try to do. Looking back at features which were introduced into earlier versions of Cyrus, I imagine that people will start to test the new code seriously when Cyrus 2.3 is released, and that there will be a period of fixing bugs which will tail off as 2.3 stabilises. The complication is that there doesn't appear to be anyone left at CMU to release new versions of Cyrus at the moment. Poor Jeffrey Eaton seems to be the last man standing there. My own experience of running things single-handed is that it doesn't leave much time for development work. -- David Carter Email: [EMAIL PROTECTED] University Computing Service, Phone: (01223) 334502 New Museums Site, Pembroke Street, Fax: (01223) 334679 Cambridge UK. CB2 3QH. Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
David Carter wrote: The complication is that there doesn't appear to be anyone left at CMU to release new versions of Cyrus at the moment. Poor Jeffrey Eaton seems to be the last man standing there. My own experience of running things single handed is that it doesn't leave much time for development work. Jeff will have development help real soon now. -- Kenneth Murchison Oceana Matrix Ltd. Software Engineer 21 Princeton Place 716-662-8973 x26 Orchard Park, NY 14127 --PGP Public Key--http://www.oceana.com/~ken/ksm.pgp Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
We are running the replication code in production at Columbia. We made great use of it Monday morning when one of our backend machines failed. Switching to the replica was quite simple and relatively fast (maybe 5 to 10 minutes from deciding to switch to the replica before the replica was fully in action). I consider the code to be stable, though on occasion strange things happen (e.g. when a user renames user.INBOX to user.saved.INBOX) and you have to restart the replication process (no downtime to Cyrus involved). -Patrick Radtke On Sep 27, 2005, at 8:24 AM, Ken Murchison wrote: David Carter wrote: The complication is that there doesn't appear to be anyone left at CMU to release new versions of Cyrus at the moment. Poor Jeffrey Eaton seems to be the last man standing there. My own experience of running things single-handed is that it doesn't leave much time for development work. Jeff will have development help real soon now. -- Kenneth Murchison Oceana Matrix Ltd. Software Engineer 21 Princeton Place 716-662-8973 x26 Orchard Park, NY 14127 --PGP Public Key--http://www.oceana.com/~ken/ksm.pgp Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
High-Availability IMAP server
Hello, I have a 'pseudo' High Availability SMTP system consisting of two servers running cyrus 2.2.5. The main problem I have is that only one of the two nodes can access the mailboxes in order to keep the integrity of the cyrus databases, even though the filesystem (GFS) supports two different servers accessing it in R/W mode. I've read about cyrus-murder, which allows you to distribute mailboxes across different servers, but if the server that has the mailbox for [EMAIL PROTECTED] goes offline, this mailbox is not available. With the maildir/mailbox format, there is no additional integrity mechanism, so any server with R/W access to the filesystem can provide the mailbox via POP3/IMAP, etc. Is there any way to achieve this goal using cyrus? Which is the best approach to this scenario? Run daily imapsync via cron and have a Load Balancer forward the requests to the active one? Any help would be appreciated. Regards, David Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
On 9/26/05, David [EMAIL PROTECTED] wrote: Is there any way to achieve this goal using cyrus? Which is the best approach to this scenario? Run daily imapsync via cron and a Load Balancer forward the requests to the active one? Any help would be appreciated. There is replication code in the 2.3 branch; though from what I can tell it hasn't been touched in a few months and makes me wonder if it's being actively developed still. Nevertheless, in my exhaustive search for any and all information on IMAP replication, I came across a few list posts detailing the 2.3 replication code in production, without many issues, for over a year. I would be eternally grateful if someone on the list more knowledgeable detailed their experiences with replication. regards, aaron.glenn Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: High-Availability IMAP server
--On Monday, September 26, 2005 6:45 PM +0200 David [EMAIL PROTECTED] wrote: [snip]

I am curious about this statement... What kind of locking is being used on GFS that prevents two nodes from accessing mailboxes without destroying the integrity of the Cyrus database? In our environment we have a cluster of four Alpha machines: two ES40s and two ES80s. They run Tru64 5.1 (TruCluster) and are attached to an HA SAN using AdvFS. All members of the cluster can see all the filesystems and can access all the files and directories. We are currently only running Cyrus on the two ES80 machines, but we could easily run it on all four cluster members if we wanted to... we don't, because we also run other things (i.e. Sendmail) and it is better not to mix Cyrus and Sendmail on the same machines in our environment. That being said, the mailboxes are all available from the Cyrus servers running on any cluster member. We don't see any integrity issues and it seems to run pretty well. Since Tru64 and Alphas are on their way out the door, we are looking for a future solution that gives us as many of the capabilities of our current environment as possible. This will most likely involve Linux, which means we need to find a suitable cluster filesystem to replace AdvFS; that could be GFS. Anyway, I am interested in the shortcomings you have encountered with reliability and integrity when trying to run an HA Cyrus server... Thanks, Scott

-- Scott W. Adkins, UNIX Systems Engineer, http://www.cns.ohiou.edu/~sadkins/
Re: High-Availability IMAP server
Is there any way to achieve this goal using Cyrus? What is the best approach to this scenario? Run a daily imapsync via cron and have a load balancer forward requests to the active node?

Here's my approach: set up heartbeat with two Ethernet heartbeat links and shared storage (SAN), and pray hard that split-brain doesn't happen. :) John

-- John Madden, UNIX Systems Engineer, Ivy Tech Community College of Indiana
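For concreteness, a sketch of what John's setup might look like with Linux-HA heartbeat v1-style configuration files. Node names, interfaces, the floating IP, and device paths are hypothetical; this is the general shape, not his actual config.

    # /etc/ha.d/ha.cf -- two independent Ethernet heartbeat paths
    bcast eth1              # dedicated crossover link
    bcast eth2              # second, independent heartbeat path
    auto_failback on
    node imap-a.example.org
    node imap-b.example.org

    # /etc/ha.d/haresources -- preferred node, floating service IP,
    # SAN filesystem, then the cyrus init script, in takeover order
    imap-a.example.org IPaddr::192.0.2.10 Filesystem::/dev/san/imap::/var/spool/imap::ext3 cyrus

Two heartbeat paths reduce, but do not eliminate, the split-brain risk John mentions; fencing (STONITH) is the usual extra safeguard when both nodes can mount the same LUN.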
Re: High-Availability IMAP server
On Mon, 2005-09-26 at 10:03 -0700, Aaron Glenn wrote: [snip] I would be eternally grateful if someone more knowledgeable on the list detailed their experiences with replication.

I would be very interested in this solution as well.

-- Ow Mun Heng
Re: high-availability again
Hi, OK, the solution with a SAN seems good, but has anyone tried this with the Linux Virtual Server (LVS)?

Dave McMurtrie wrote: [snip]
Re: high-availability again
Hi, could you give me some more explanation of what "the stage. files used during LMTP delivery have unique filenames" means? If I understand what you are saying: if the stage. files used during LMTP delivery are the same for all the nodes of the cluster sharing the same SAN, then there won't be any problem? Thanks

Ben Carter wrote: [snip]
Re: high-availability again
Amos wrote: So y'all are doing active/active? What version of Cyrus?

Yes. We're running 2.1.17. Thanks, Dave
Re: high-availability again
zorg wrote: [snip]

In imap/append.c (at least in our Cyrus version) there is a function called append_newstage, which lmtpd uses for mail delivery. The temporary file created in this code has a name of the form pid-timestampseconds, which of course is not guaranteed to be unique across the cluster, so we changed this code to create a filename of the form clusternode-pid-timestampseconds. (If you truss/strace lmtpd during message delivery, you'll understand this right away.)

What should probably also happen in the standard code, if cluster support is officially added, is that master should take some exclusive lock when it starts, to guarantee that a sysadmin's typo in imapd.conf can't let two cluster nodes share a node ID (assuming the technique of configuring a cluster node ID in imapd.conf is used). And a node ID should probably always be required (our code requires one). When we get a chance, we're going to talk to Derrick about getting some cluster support into the standard code. Ben

-- Ben Carter, University of Pittsburgh/CSSD
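A shell rendering of the naming rule and startup guard Ben describes (the real change is C code in append_newstage); CLUSTERNODE stands in for the imapd.conf parameter he mentions, and the lock directory is a hypothetical location on the shared filesystem.

    # Stage-file name as described: clusternode-pid-seconds, so two
    # nodes delivering at the same instant can never collide.
    CLUSTERNODE=node1                    # from the 'clusternode' setting
    STAGEFILE="${CLUSTERNODE}-$$-$(date +%s)"
    echo "stage file would be: $STAGEFILE"

    # Startup guard: claim this node ID on the shared filesystem with
    # an atomic symlink; refuse to start if another host already owns it.
    LOCKDIR=/var/spool/imap/cluster-locks    # hypothetical path
    mkdir -p "$LOCKDIR"
    if ! ln -s "$(hostname)" "$LOCKDIR/$CLUSTERNODE" 2>/dev/null; then
        owner=$(readlink "$LOCKDIR/$CLUSTERNODE")
        if [ "$owner" != "$(hostname)" ]; then
            echo "node ID $CLUSTERNODE already claimed by $owner" >&2
            exit 1
        fi
    fi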
Re: high-availability again
Ben Carter wrote: When we get a chance, we're going to talk to Derrick about getting some cluster support into the std. code.

That would be most impressive. I wonder how much Ken's work with 2.3 would fit in with this?
Re: high-availability again
Amos wrote: That would be most impressive. I wonder how much Ken's work with 2.3 would fit in with this?

My code in 2.3 uses the Murder code to keep local copies of mailboxes.db on each node in the cluster.

-- Kenneth Murchison, Software Engineer, Oceana Matrix Ltd.
high-availability again
Hi, I've seen a lot of discussion about availability on this list, but none of it seems to give a complete answer. I have been asked to build high availability for 5000 users, and I was wondering what the best solution actually is.

Using Murder: I don't really understand if it can help me; its purpose is load balancing. But some people on this list seem to use it for availability like this:

- Server A - active accounts 1-100 - replicates accounts 101-200 from Server B
- Server B - active accounts 101-200 - replicates accounts 1-100 from Server A

If B goes down, A takes over the accounts it had replicated from B. Can someone explain the details of this configuration? What tool is used to replicate? What MUPDATE configuration makes it switch the users from server B to server A?

Replication with rsync seems too slow for 5000 users.

Cluster with a block device: but if you end up with a heavily corrupted filesystem, you are stuck, and recovery can be long.

Using a SAN: connect your two servers to a SAN and store all of Cyrus' data on one LUN which both servers have access to, then set your cluster software to mount the filesystem automatically before starting Cyrus. But again, if you end up with a heavily corrupted filesystem, you are stuck, and recovery can be long.
Re: high-availability again
zorg wrote: Using Murder: I don't really understand if it can help me; its purpose is load balancing.

Murder, by itself, does not give you high availability. It does give you scalability.

zorg wrote: But some people on this list seem to use it for availability... can someone explain the details of this configuration?

I'm not familiar with this.

zorg wrote: Replication with rsync seems too slow for 5000 users.

It'd be tough to do this in real time. We used to have a setup where we'd rsync to a standby server each night. The plan was to use it as a warm standby in case the primary server happened to fail. Fortunately that never happened.

zorg wrote: Cluster with a block device: but if you end up with a heavily corrupted filesystem, you are stuck, and recovery can be long.

I'm not sure exactly what you mean here, but I think it's safe to say that any time you have a corrupted filesystem it's bad, whether it's a clustered filesystem or not.

zorg wrote: Using a SAN: connect your two servers to a SAN and store all of Cyrus' data on one LUN which both servers have access to...

We're doing this. We have a 4-node Veritas cluster with all IMAP data residing on a SAN. Overall it's working quite well. We had to make some very minor Cyrus code changes so it would get along well with Veritas' cluster filesystem. This setup gives us high availability and scalability. And again, yes, it would be bad if we had a corrupt filesystem. Thanks, Dave
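The nightly warm-standby copy Dave mentions can be as simple as the following shape; the standby hostname and paths are illustrative, and since rsync of a live spool is not atomic, Cyrus should be quiesced or the copy taken from a snapshot.

    # Nightly rsync of the mail spool and Cyrus metadata to a warm
    # standby. -a preserves ownership/times, -H preserves the hard
    # links Cyrus uses for single-instance store, --delete mirrors
    # removals on the standby.
    rsync -aH --delete /var/spool/imap/ standby:/var/spool/imap/
    rsync -aH --delete /var/imap/       standby:/var/imap/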
Re: high-availability again
We're doing this. We have a 4-node Veritas cluster with all IMAP data residing on a SAN... [snip]

What sort of changes did you have to make? We're planning on doing something similar with Sun Cluster. Why Sun Cluster? Well, it's the only cluster environment that WebCT supports, and managing two completely different clustering technologies would be a bit much for our little brains. ;-) We're not planning on doing anything particularly sophisticated, just failover when one node fails. Amos
Re: high-availability again
Amos wrote: What sort of changes did you have to make?

We just had to change map_refresh() to call mmap() with MAP_PRIVATE instead of MAP_SHARED. Since mmap() is being called with PROT_READ anyway, this doesn't affect the operation of the application, because the mapped region is never written through the mapping. Veritas CFS was not very efficient about maintaining cache coherency across all cluster nodes when we were using MAP_SHARED. Everything worked, but under heavy load it became extremely slow.

Amos wrote: We're planning on doing something similar with Sun Cluster... [snip]

The folks who ran the upgrade project here tested both Sun Cluster and Veritas Cluster, and chose Veritas Cluster. We have a few other systems here running Sun Cluster (not related to our Cyrus installation). Personally, I haven't been overwhelmingly impressed by Sun Cluster, and I really like Veritas Cluster. I have to admit that I try to avoid the systems here that have Sun Cluster on them, so my opinion might be unfairly biased. I can also say that we received unmatched technical support from Veritas when we did encounter problems. Thanks, Dave
Re: high-availability again
Dave McMurtrie wrote: We just had to change map_refresh() to call mmap() with MAP_PRIVATE instead of MAP_SHARED... [snip]

Actually, the important code change for any active/active cluster configuration is to make sure the stage. files used during LMTP delivery have unique filenames across the cluster. There are some other setup differences related to the same issue, such as symlinking /var/imap/proc, /var/imap/socket, and (if you care) /var/imap/log to local filesystem space on each cluster node; you could make these names unique across the cluster with code changes too, if you wanted to make code changes for these as well. We added a clusternode parameter to imapd.conf to accomplish this for the LMTP stage. files. Otherwise, it just worked. Ben

-- Ben Carter, University of Pittsburgh/CSSD
Re: high-availability again
Ben Carter wrote: [snip]

So y'all are doing active/active? What version of Cyrus? Amos
Re: Funding Cyrus High Availability
David Carter wrote: [David Lang had written: 5. Active/Active: designate one of the boxes as primary and identify all items in the datastore that absolutely must not be subject to race conditions between the two boxes (message UUIDs, for example). In addition to implementing the replication needed for #1, modify all functions that need to update these critical pieces of data to update them on the master and let the master update the other box.] We may be talking at cross purposes (and it's entirely likely that I've got the wrong end of the stick!), but I consider active-active to be the case where there is no primary: users can make changes to either system, and if the two systems lose touch with each other they have to resolve their differences when contact is re-established.

I'd go for #5 as well. Since this is a setup where there is no primary at all, I suppose it is quite a different design from the #1-4 solutions. Because of that, I would think it rather useless to complete those steps in order to get #5 right, but I may well be wrong. I would be most happy if the work started on #5. Personally I don't care that much at the moment about #6, but I can imagine that this is different for others.

But then, if the design is that every machine tracks changes and has them propagated (actively or passively) to n hosts, there is no risk of missing things or failing to recover, I guess. (It's not so hard to keep track of that: once all hosts have a given change, drop it. It's only possible that a slave is out of sync for a very short time, and why would that be so wrong? And if it is so wrong, then maybe fix that later, since that would make the work easier.) This could be the task of the cyrus daemon, but it could just as well be the job of murder, as Jure suggests (or both?). I'm not entirely sure that is what we want, but it could be done if it fits nicely (and it can be assured that there is always a murder to talk to).

If there is a problem with UID selection, I don't see a problem in making one of the servers responsible for that task. We don't even need an election system for that: you could define a fixed preference order for the servers; if the server with the highest preference is down, the next one takes over its job. It's just that to the users all machines should appear active (and in case of failover the remaining machines should stay active, not go read-only or require manual intervention). Paul
Re: Funding Cyrus High Availability
On Sun, 19 Sep 2004, David Lang wrote: here is the problem: you have a new message created on both servers at the same time. How do you allocate the UID without any possibility of stepping on each other?

With a new UIDVALIDITY you can choose any ordering you like. Of course one of the two servers has to make that choice, and the potential for race conditions here and elsewhere in an active-active solution is amusing.

-- David Carter, University Computing Service, Cambridge
Re: Funding Cyrus High Availability
On Sun, 19 Sep 2004, David Lang wrote: assuming that the simplest method would cost ~$3000 to code, I would make a wild guess that the ballpark figures would be:

1. active/passive without automatic failover: $3k
2. active/passive with automatic failover (limited to two nodes or within a murder cluster): $4k
3. active/passive with updates pushed to the master: $5k
4. #3 with auto failover (failover not limited to two nodes or a single murder cluster): $7k
5. active/active (limited to a single geographic location): $10k
6. active/active/active (no limits): $30k

In addition, automatically re-merging things after a split-brain has happened would probably be another $5k.

I think that you are missing a zero (or at least a fairly substantial multiplier!) from 5. Numbers 1-4 can be done without substantial changes to the Cyrus core code, and Ken would be able to use my code as a reference implementation, even if he wanted to recode everything from scratch. 5 and 6 would require a much more substantial redesign, and I suspect quite a lot of trial and error, as this is unexplored territory for IMAP servers.

-- David Carter, University Computing Service, Cambridge
Re: Funding Cyrus High Availability
On Mon, 20 Sep 2004, David Carter wrote: I think that you are missing a zero (or at least a fairly substantial multiplier!) from 5. [snip]

Thanks, this is exactly the type of feedback that I was hoping to get. So you are saying that #5 is more like $50k-$100k, and #6 goes up from there. OK folks, how much are you really willing to pay for this? And, since the amount of work involved translates fairly directly into both cost and time, how long are you willing to go with nothing? David Lang
Re: Funding Cyrus High Availability
On Mon, 20 Sep 2004, Paul Dekkers wrote: I'd go for #5 as well. Since this is a setup where there is no primary at all, I suppose it is quite a different design from the #1-4 solutions. [snip]

Actually, I think most of the work necessary for #1 is also needed for #5-6. For #1 you need the ability for a system to report all its changes to a daemon, and the ability for a system to read in changes and implement them. #5 needs the same abilities, plus the ability to resolve conflicts. The HA steps of #2 and #4 don't gain that much, but they can also be done externally to Cyrus, so it's not a problem to skip them. #3 involves changes to the update code to have Cyrus take special actions with some types of updates; there would need to be changes in the same area for #5, but they would be different. David Lang
Re: Funding Cyrus High Availability
On Mon, 20 Sep 2004, David Lang wrote: Thanks, this is exactly the type of feedback that I was hoping to get. So you are saying that #5 is more like $50k-$100k, and #6 goes up from there.

If anyone could implement active-active for Cyrus from scratch in 100 to 150 hours it would be Ken, but I think that it's a tall order. Sorry.

-- David Carter, University Computing Service, Cambridge
RE: Funding Cyrus High Availability
On Fri, 17 Sep 2004 [EMAIL PROTECTED] wrote: From: David Lang [mailto:[EMAIL PROTECTED]] Mike, one of the problems with this is that different databases have different interfaces and capabilities. If you design it to work on Oracle and then try to make it work on MySQL, there are going to be quite a few things you need to change. --snip-- Another issue in all this is the maintenance of the resulting code. If this code can be used in many different situations, then more people will use it (probably including CMU) and it will be maintained as a side effect of any other changes. However, if it's tailored to a very narrow situation, then only the people who have that particular problem will use it, and it's likely to have issues with new changes.

I'd actually figured something like ODBC would be used, with prepared statements. /shrug. Abstract the whole interface issue.

Unfortunately there are a few problems with this. To start with, ODBC is not readily available on all platforms. Secondly, ODBC can't cover up the fact that different database engines have vastly differing capabilities. If you don't use any of those capabilities you don't run into this pitfall, but if you want to use them, you will. I really wish that ODBC lived up to its hype, but in practice only the most trivial database users can transparently switch from database to database by changing the ODBC config. David Lang
Re: Funding Cyrus High Availability
There are many ways of doing high availability. This is an attempt to outline the various methods with their advantages and disadvantages. Ken and David (and anyone else who has thoughts on this), please feel free to add to this. I'm attempting to outline them roughly in order of complexity.

1. Active-slave replication with manual failover

This is where you configure one machine to output all changes to a local daemon and another machine to implement the changes that are read from a local daemon.

Pro: the simplest implementation. Since it makes no assumptions about how you are going to use it, it also sets no limits on how it is used. This is the basic functionality that all other variations will need, so it's not wasted work no matter what is done later. It allows multiple slaves from a single master, and allows the propagation traffic pattern to be defined by the sysadmin (either master directly to all slaves, or a tree-like propagation to save on WAN bandwidth when multiple slaves are co-located). By involving a local daemon at each server there is a lot of flexibility in exactly how the replication takes place. For example (see the sketch after this message), you could: use netcat as your daemon for instant transmission of the messages; have a daemon that caches the messages so that if the link drops the messages are saved; have a daemon that gets an acknowledgement from the far side that the message got through; have a daemon that batches the messages up and compresses them for more efficient transport; have a daemon that delays all messages by a given time period, to give you a way to recover from logical corruption without having to go to a backup; have a daemon that filters the messages (say, one that updates everything except that it won't delete any messages, so you have a known safe archive of all messages); etc.

Con: since it makes no assumptions about how you are going to use it, it also gives you no help in using it in any particular way.

2. Active-slave replication with automatic failover

This takes #1, limits it to a pair of boxes, and through changes to murder or other parts of Cyrus swaps the active/slave status of the two boxes.

Pro: makes setting up an HA pair of boxes easier; increases availability by decreasing downtime.

Con: this functionality can be duplicated without changes to Cyrus by using an external HA/cluster software package. Since this now assumes a particular mode of operation, it starts to limit other uses (for example, if this is implemented as part of murder, it won't help much if you are trying to replicate to a DR datacenter several thousand miles away). Split-brain conditions become Cyrus' responsibility to prevent or resolve, and those are fundamentally hard problems to get right in all cases.

3. Active-slave replication with the slave able to accept client connections

This takes #1 and further modifies the slave so that requests that would change the contents of things get relayed to the active box, and the results of the change get propagated back down before they are visible to the client.

Pro: simulates active/active operation, although it does cause longer delays when clients issue some commands. Use of slaves for local access can reduce the load on the master, resulting in higher performance. It can be cascaded to multiple slaves and multiple tiers of slaves as needed. In case of problems on the master, the slaves can continue to operate as read-only servers, providing degraded service while the master is fixed. Depending on the problem with the master, this may be much preferable to having to re-sync the master or recover from a split-brain situation.

Con: more extensive modifications are needed to trap all changes and propagate them up to the master. How does the slave know when the master has implemented a change (so that it can give the result to the client)? It raises questions about the requirement to confirm all updates before the slave can respond to the client (for example, if a client on a slave reads a message that is flagged as new, should the slave wait until the master confirms that it knows the message has been read before giving it to the client, or should it hand over the message and not worry if the update fails on the master?). Since the slave needs to send updates to the master, the latency of the link between them can become a limiting factor in the performance that clients see when connecting to the slave.

4. #3 with automatic failover

Since #3 supports multiple slaves, the number of failover scenarios grows significantly: you have multiple machines that could be the new master, and you have the split-brain scenario to watch out for.

Pro: increased availability by decreasing failover time; potentially easier to set up than with external clustering software.

Con: increased complexity.
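As a concrete (and deliberately naive) rendering of option #1's daemon idea, here is the netcat variant mentioned above. The change-log path, the port, and the apply-change script are hypothetical, and traditional-netcat option syntax is assumed.

    # Master side: stream each change record to the slave as it is
    # appended to a hypothetical replication log.
    tail -F /var/imap/replication.log | nc slave.example.org 2005

    # Slave side: listen on the agreed port (traditional netcat
    # syntax; some nc variants want 'nc -l 2005') and hand each
    # record to a hypothetical apply script.
    nc -l -p 2005 | while read -r record; do
        /usr/local/bin/apply-change "$record"
    done

Any of the fancier daemons in the list (caching, acknowledging, batching, delaying, filtering) would slot into the same pipe in place of plain nc.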
Re: Funding Cyrus High Availability
On Sun, 19 Sep 2004, David Lang wrote: 5. Active/Active: designate one of the boxes as primary and identify all items in the datastore that absolutely must not be subject to race conditions between the two boxes (message UUIDs, for example). In addition to implementing the replication needed for #1, modify all functions that need to update these critical pieces of data to update them on the master and let the master update the other box.

We may be talking at cross purposes (and it's entirely likely that I've got the wrong end of the stick!), but I consider active-active to be the case where there is no primary: users can make changes to either system, and if the two systems lose touch with each other they have to resolve their differences when contact is re-established.

UUIDs aren't a problem (each machine in a cluster owns its own fraction of the address space). Message UIDs are a big problem. I guess in the case of conflict you could bump the UIDVALIDITY value on a mailbox and reassign UIDs for all the messages, using timestamps to determine the eventual ordering of messages. Now that I think about it, maybe that's not a totally absurd idea. It would involve a lot of work, though.

David Lang wrote: Pro: best use of available hardware, as the load is split almost evenly between the boxes; best availability, because if there is a failure half of the clients won't see it at all.

Actually, this is what I do right now by having two live mailstores. Half the mailboxes on each system are active, the remainder are passive.

-- David Carter, University Computing Service, Cambridge
Re: Funding Cyrus High Availability
On Sun, 19 Sep 2004, David Carter wrote: [snip] UUIDs aren't a problem (each machine in a cluster owns its own fraction of the address space). Message UIDs are a big problem. I guess in the case of conflict you could bump the UIDVALIDITY value on a mailbox and reassign UIDs for all the messages, using timestamps to determine the eventual ordering of messages. Now that I think about it, maybe that's not a totally absurd idea. It would involve a lot of work, though.

The problem is that when both are up, you have to have one of them allocate the message UIDs, or you have to change the UIDVALIDITY for every new message that arrives. Here is the problem: you have a new message created on both servers at the same time. How do you allocate the UID without any possibility of stepping on each other? The only way to do this is to have some sort of locking so that only one machine at a time can allocate UIDs. You can shuffle this responsibility back and forth between machines, but there's a significant amount of overhead in doing that, so the usual answer is just to have one machine issue the numbers and the other ask the first for a number when it needs one.

Changing UIDVALIDITY while recovering from a split-brain is probably going to be needed, but as you say it's a lot of work (which is why I'm advocating that the simpler options get released first :-)

David Carter wrote: Actually, this is what I do right now by having two live mailstores. Half the mailboxes on each system are active, the remainder are passive.

Right, but what this would allow is sharing the load on individual mailboxes. Usually this won't matter, but I could see it for shared mailboxes. David Lang
Re: Funding Cyrus High Availability
On Sun, 19 Sep 2004 00:52:08 -0700 (PDT), David Lang [EMAIL PROTECTED] wrote: Nice review of the replication ABC :) Here are my thoughts:

1. Active-slave replication with manual failover

This is really the simplest way to do it. Rsync (and friends) does 90% of the required job here; the only thing it lacks is the concept of the mailbox as a unit. It would be nice if our daemon here did its job in an atomic way. A few days ago someone was asking for an event notification system that could call some program when a certain action happened on a mailbox; something like that would come in handy here, I think. :)

2. Active-slave replication with automatic failover

2 is really just 1 + your heartbeat package of choice and some scripts to tie it all together.

3. Active-slave replication with the slave able to accept client connections

Here it would be good to start thinking about the app itself and define connections better. Cyrus has three kinds of connections that modify a mailbox: LMTP, which puts new mail into the mailbox; POP, which (generally) retrieves (and deletes) mail; and IMAP, which does both plus more (folder operations and moving mail around). Now, if you decide that it does not hurt if the slave is a bit out of date when it accepts a connection (though I guess most of us would find that unacceptable), you can ditch some of the complexity; but you'd still want the changes made on the slave in that connection to propagate up to the master. I don't really like this, because the concepts of master and slave get blurred here and things can easily end in a mess. Once you have mailstores synchronizing each other in a way that is not very well defined, you'll end up with conflicts sooner or later. There are unpredictable factors like network latency that can easily lead you into unexpected situations.

4. #3 with automatic failover

Another level of mess over 3. :)

5. Active/Active [snip]

Exactly. This is the atomicity I was mentioning above. I'd say this is going to be the larger part of the job.

6. active/active/active/...

This is what most of us would want. Despite everything you've said, I still think this *can* be done in a relatively simple way. See my previous mail where I was dreaming about the whole HA concept in a RAID way. There I assumed murder as the only agent through which clients would be able to access their mailboxes. If you think of murder handling all the jobs of your daemon in 1-4, one thing you gain immediately is much simpler synchronization of actions between the mailstore machines. If you start empty, or with exactly the same data on two machines, all that murder needs to do is take care that both receive the same commands and data in the same order. Also, if you put all the logic in one place, the backend mailstores need not be taught any special tricks and can remain pretty much as they are today. Or am I missing something?

David Lang wrote: personally I would like to see #1 (with a sample daemon or two to provide basic functionality and leave the doors open for more creative uses) followed by #3, while people try to figure out all the problems with #5 and #6.

And I would like to see us come to a conclusion here about what kind of HA setup would be best for everyone, and focus our energy on a single implementation. I have enough old hardware here (and I'm getting more in about a month) that I can set up a nice little test environment. It also looks like I'll have plenty of time from February to June 2005, so I can volunteer to be a tester.

David Lang wrote: there are a lot of scenarios that are possible with #1 or #3 that are not possible with #5.

One, I think, is a slave-of-a-slave-of-a-slave (...) kind of setup. Does anybody really need such a setup for mail? I understand it for LDAP, for example, and there are even some cases where it is useful for an SQL database, but I see no reason to have it for a mail server.

-- Jure Pecar, http://jure.pecar.org/