Re: choosing a file system
Thanks, everybody. That was an interesting thread. Nobody seems to use a NetApp appliance, maybe because of NFS architecture problems. I think I'll look at ext4, which seems to be available in the latest kernel, and also at Solaris, but there are not enough of us to support another OS.

Dom

And Happy New Year!

2008/12/31 Bron Gondwana br...@fastmail.fm

> On Tue, Dec 30, 2008 at 02:43:14PM -0700, Shawn Nock wrote:
>> Bron and the fastmail guys could tell you more about reiserfs... we've used RHSuSE/reiserfs/EMC for quite a while and we are very happy.
>
> Yeah, sure could :)
>
> You can probably find plenty of stuff from me in the archives about our setup - the basic things are:
>
> * separate metadata on RAID1 10kRPM (or 15kRPM in the new boxes) drives.
> * data files on RAID5 big slow drives - data IO isn't a limiting factor
> * 300GB slots with 15GB associated meta drives, like this:
>
> /dev/sdb6   14016208   8080360  5935848  58% /mnt/meta6
> /dev/sdb7   14016208   8064848  5951360  58% /mnt/meta7
> /dev/sdb8   14016208   8498812  5517396  61% /mnt/meta8
> /dev/sdd2  292959500 248086796 44872704  85% /mnt/data6
> /dev/sdd3  292959500 242722420 50237080  83% /mnt/data7
> /dev/sdd4  292959500 248840432 44119068  85% /mnt/data8
>
> As you can see, that balances out pretty nicely. We also store per-user bayes databases on the associated meta drives.
>
> We balance our disk usage by moving users between stores when usage reaches 88% on any partition. We get emailed if it goes above 92% and paged if it goes above 95%.
>
> Replication. We have multiple slots on each server, and since they are all the same size, we have replication pairs spread pretty randomly around the hosts, so the failure of any one drive unit (SCSI attached SATA) or imap server doesn't significantly overload any one other machine. By using Cyrus replication rather than, say, DRBD, a filesystem corruption should only affect a single partition, which won't take so long to fsck.
>
> Moving users is easy - we run a sync_server on the Cyrus master, and just create a custom config directory with symlinks into the tree on the real server and a rewritten piece of mailboxes.db so we can rename them during the move if needed. It's all automatic.
>
> We also have a CheckReplication perl module that can be used to compare two ends to make sure everything is the same. It does full per-message flags checks, random sha1 integrity checks, etc. It does require a custom patch to expose the GUID (as DIGEST.SHA1) via IMAP.
>
> I lost an entire drive unit on the 26th. It stopped responding. 8 x 1TB drives in it. I tried rebooting everything, then switched the affected stores over to their replicas. Total downtime for those users was about 15 minutes, because I tried the reboot first just in case (there's a chance that some messages were delivered and not yet replicated, so it's better not to bring up the replica uncleanly until you're sure there's no other choice).
>
> In the end I decided that it wasn't recoverable quickly enough to be viable, so I chose new replica pairs for the slots that had been on that drive unit (we keep some empty space on our machines for just this eventuality) and started up another handy little script, sync_all_users, which runs sync_client -u for every user, then starts the rolling sync_client again at the end. It took about 16 hours to bring everything back to fully replicated again.
>
> Bron.

--
Dominique LALOT
Systems and Network Engineer
http://annuaire.univmed.fr/showuser?uid=lalot

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
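For readers who want to try the usage thresholds Bron mentions (rebalance at 88%, email above 92%, page above 95%), here is a minimal Python sketch. It is not FastMail's tooling: the function names and the example mount point are made up for illustration, and the percentage is computed from `shutil.disk_usage`, which may differ slightly from the `Use%` column that df prints because df accounts for reserved blocks.

```python
import shutil

# Thresholds taken from the message above (percent of partition used).
REBALANCE_PCT = 88.0
EMAIL_PCT = 92.0
PAGE_PCT = 95.0


def classify_usage(used_pct: float) -> str:
    """Map a partition's usage percentage to the action described above."""
    if used_pct > PAGE_PCT:
        return "page"
    if used_pct > EMAIL_PCT:
        return "email"
    if used_pct >= REBALANCE_PCT:
        return "rebalance"
    return "ok"


def usage_percent(mount_point: str) -> float:
    """Return how full the filesystem holding mount_point is, in percent."""
    usage = shutil.disk_usage(mount_point)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Hypothetical mount point list; substitute your own partitions.
    for mount in ("/",):
        pct = usage_percent(mount)
        print(f"{mount}: {pct:.1f}% used -> {classify_usage(pct)}")
```

The classification is kept separate from the measurement so the thresholds can be checked without touching a real disk.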
Re: choosing a file system
On Wed, 2008-12-31 at 11:47 +0100, LALOT Dominique wrote:
> Thanks, everybody. That was an interesting thread. Nobody seems to use a NetApp appliance, maybe because of NFS architecture problems.

Personally, I'd never use NFS for anything. Over the years I've had way too many NFS-related problems on other things to ever want to try it again.

> I think I'll look at ext4, which seems to be available in the latest kernel, and also at Solaris, but there are not enough of us to support another OS.

We've used Cyrus on XFS for almost a year, no problems.

With regard to ext3, I'd pay attention to the vintage of problem reports and performance issues; the ext3 of several years ago is not the ext3 of today, and many improvements have been made. data=writeback mode can help performance quite a bit, as can enabling dir_index if it isn't enabled already (did it ever become the default?). The periodic fsck can also be disabled via tune2fs. I only point this out since, if you already have an ext3 setup, trying the above is painless and might buy you something.
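The three ext3 tweaks named above (dir_index, no periodic fsck, data=writeback) could be collected roughly as follows. This is only a sketch that builds the command lines and an fstab entry without running anything; the helper names and the device and mount point values are placeholders, and the `e2fsck -fD` step (which rebuilds existing directories so they use the index) must be run on an unmounted filesystem.

```python
from typing import List


def ext3_tuning_commands(device: str) -> List[List[str]]:
    """Build (but do not run) the tuning commands for one ext3 device."""
    return [
        # Enable hashed b-tree directory indexing for new directories.
        ["tune2fs", "-O", "dir_index", device],
        # Re-index existing directories; the filesystem must be unmounted.
        ["e2fsck", "-fD", device],
        # Disable both the mount-count and the time-based periodic fsck.
        ["tune2fs", "-c", "0", "-i", "0", device],
    ]


def fstab_line(device: str, mount_point: str) -> str:
    """Return an /etc/fstab entry that mounts the device with data=writeback."""
    return f"{device} {mount_point} ext3 defaults,data=writeback 0 2"


if __name__ == "__main__":
    # Placeholder device and mount point, for illustration only.
    for cmd in ext3_tuning_commands("/dev/sdX1"):
        print(" ".join(cmd))
    print(fstab_line("/dev/sdX1", "/var/spool/imap"))
```

Printing the commands rather than executing them leaves room to review each one first, which seems prudent for anything that touches a mail spool.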
Re: choosing a file system
On Dec 30, 2008, at 4:43 PM, Shawn Nock wrote:
> [...] a scripted rename of mailboxes to balance partition utilization when we add another partition.

Just curious - how do you stop people from accessing their mailboxes during the time they are being renamed and moved to another partition?

-nik

Information Technology
Systems Programming
Boston University
Re: choosing a file system
Hi,

I would not discount using reiserfs (v3) by any means. It's still by far a better choice of filesystem for Cyrus than Ext3 or Ext4. I haven't really seen anyone do any tests with Ext4, but I imagine it should be about par for the course for Ext3.

As for NFS... NFS isn't itself that bad; it's just that people tend to find ways to use NFS incorrectly, which only ends up leading to failure.

Scott

On Dec 31, 2008, at 2:47 AM, LALOT Dominique wrote:
> Thanks, everybody. That was an interesting thread. Nobody seems to use a NetApp appliance, maybe because of NFS architecture problems. I think I'll look at ext4, which seems to be available in the latest kernel, and also at Solaris, but there are not enough of us to support another OS.
>
> [...]
Re: choosing a file system
-- Nik Conwell n...@bu.edu is rumored to have mumbled on 31 December 2008 07:47:31 -0500 regarding Re: choosing a file system:

> Just curious - how do you stop people from accessing their mailboxes during the time they are being renamed and moved to another partition?

I just do a grep for the username in the proc directory - if there is no process for that user, I figure it's safe enough to move the mailbox. This approach has worked well so far. I also experimented with accessing a mailbox while it was being moved, and that seemed to be OK as well, i.e. the access simply failed while the operation was in progress.

--
Sebastian Hagedorn - RZKR-R1 (Flachbau), Zi. 18, Robert-Koch-Str. 10
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587
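The "grep the proc directory" check can be sketched in Python as below. Two things here are assumptions rather than anything stated in the thread: that the proc directory holds one file per server process, named after its PID, and that each file contains tab-separated fields with the user ID in the second position (client host, user, mailbox). Verify both against your own Cyrus version before relying on it; the function name is made up.

```python
from pathlib import Path
from typing import List


def active_pids_for_user(proc_dir: str, username: str) -> List[str]:
    """Return the names (PIDs) of proc files whose user field is username.

    Assumes each file holds "clienthost<TAB>userid<TAB>mailbox".
    """
    pids = []
    for entry in Path(proc_dir).iterdir():
        if not entry.is_file():
            continue
        try:
            fields = entry.read_text(errors="replace").strip().split("\t")
        except OSError:
            # The process may have exited between listing and reading.
            continue
        if len(fields) >= 2 and fields[1] == username:
            pids.append(entry.name)
    return sorted(pids)
```

An empty result would correspond to "no process for that user". As with the grep, a session could still start between the check and the move.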
Re: choosing a file system
-- Nik Conwell n...@bu.edu is rumored to have mumbled on 31 December 2008 07:47:31 -0500 regarding Re: choosing a file system:

> Just curious - how do you stop people from accessing their mailboxes during the time they are being renamed and moved to another partition?

I moved a few thousand mailboxes in a similar fashion (summer of 2007) and encountered no problems. New message deliveries were nicely frozen by Cyrus while the target Inbox was being renamed/moved.

Question: would it, stability-wise, make a difference if the mail data and metadata were split, allocating the metadata partitions on SAN-based LUNs and storing messages in NAS (NFS) space? In other words: are the Cyrus-over-NFS inconveniences confined to the cyrus.* files?

Rationale: NAS space can, typically, be grown more easily than SAN space. This could be an advantage with older server OSes and filesystems...

Eric Luyten, Brussels Free University Computing Centre
(Cyrus 2.2, 58k users, 2.3 TB)
cyrus-sasl pam mysql connections are not getting closed
I am using cyrus-sasl with pam mysql (on CentOS 5). The MySQL server is on a remote machine.

After some time I find that there are too many open connections to MySQL (as shown by netstat). I restart saslauthd, but they still don't go away.

How do I check what each MySQL connection is being used for, and how do I avoid them piling up?

Thanks
Ram
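One way to start narrowing this down is to tally the connections by TCP state. The sketch below parses text in the layout that `netstat -tn` prints on Linux (Proto, Recv-Q, Send-Q, local address, foreign address, state) and counts connections whose remote port is 3306, the MySQL default. The function name is invented and the port is an assumption about this setup. A growing number of CLOSE_WAIT entries would generally mean the remote end closed the connection but the local process never did.

```python
from collections import Counter


def count_mysql_connections(netstat_output: str, port: int = 3306) -> Counter:
    """Count TCP connections to the given remote port, grouped by state."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign_address = fields[4]
        if foreign_address.rsplit(":", 1)[-1] == str(port):
            states[fields[5]] += 1
    return states
```

Running netstat with `-p` as root adds the owning PID and program name, which should show whether the lingering connections belong to saslauthd or to something else.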
Re: choosing a file system
On Wed, Dec 31, 2008 at 07:38:21AM -0500, Adam Tauno Williams wrote:
> With regard to ext3, I'd pay attention to the vintage of problem reports and performance issues; the ext3 of several years ago is not the ext3 of today, and many improvements have been made. data=writeback mode can help performance quite a bit, as can enabling dir_index if it isn't enabled already (did it ever become the default?). The periodic fsck can also be disabled via tune2fs. I only point this out since, if you already have an ext3 setup, trying the above is painless and might buy you something.

I wouldn't call data=writeback painless. I had it on in the testing phase of our current Cyrus installation, and if the filesystem had to be forcibly unmounted for any reason (yes, there are reasons), the amount of corruption in those files that happened to be active during the unmount - well, it wasn't a nice sight. And the files weren't recoverable, except from backup.

I never really got the point of the data=writeback mode. Sure, it increases throughput, but so does disabling the journal completely, and it seems to me that the end result as far as data integrity is concerned is exactly the same.

--Janne
--
Janne Peltonen janne.pelto...@helsinki.fi PGP Key ID: 0x9CFAC88B
Please consider membership of the Hospitality Club (http://www.hospitalityclub.org)
Re: choosing a file system
On Tue, 2008-12-30 at 17:49 +0100, LALOT Dominique wrote:
> Once, there was a bad shutdown corrupting ext3fs and we spent 6 hours on an fsck.

Actually, I have used reiserfs with cyrus-imapd for over 2 years. It performs great even with a really large number of files in the IMAP spool folders. But I don't know how it will perform on EMC.

Four years ago I tried ext3. It was a disaster. Slow as hell.

I also used Reiser4 once; it did even better than reiserfs. But after 2 months of stable running it hit a kernel oops caused by the FS, and I switched back to reiserfs.

--
Teresa
Re: choosing a file system
On Wed, Dec 31, 2008 at 04:58:57AM -0800, Scott Likens wrote:
> I would not discount using reiserfs (v3) by any means. It's still by far a better choice of filesystem for Cyrus than Ext3 or Ext4. I haven't really seen anyone do any tests with Ext4, but I imagine it should be about par for the course for Ext3.

There are /lots/ of (comparative) tests done. The most recent I could find with a quick Google is here:

http://www.phoronix.com/scan.php?page=article&item=ext4_benchmarks

The problem with reiserfs is... well. The developers have explicitly stated that the development of v3 has come to its end, and there was the long argument between Hans Reiser and the kernel developers about whether v4 could be included in the kernel. When Hans Reiser was charged with murder (not the crow or Cyrus variant), his company gave assurances that the development (of v4) would continue, but the last time I tried to find out anything about the project, it appeared more or less dead.

Of course, the current reiserfs (v3) is very stable, but if you run into any issues, there really isn't a developer you can contact (or send patches to, if you figure out the bug).

--Janne
--
Janne Peltonen janne.pelto...@helsinki.fi PGP Key ID: 0x9CFAC88B
Please consider membership of the Hospitality Club (http://www.hospitalityclub.org)
Re: choosing a file system
On Wed, 2008-12-31 at 15:46 +0200, Janne Peltonen wrote:
> On Wed, Dec 31, 2008 at 07:38:21AM -0500, Adam Tauno Williams wrote:
>> With regard to ext3, I'd pay attention to the vintage of problem reports and performance issues; the ext3 of several years ago is not the ext3 of today, and many improvements have been made. data=writeback mode can help performance quite a bit, as can enabling dir_index if it isn't enabled already (did it ever become the default?). The periodic fsck can also be disabled via tune2fs. I only point this out since, if you already have an ext3 setup, trying the above is painless and might buy you something.
>
> I wouldn't call data=writeback painless. I had it on in the testing phase of our current Cyrus installation, and if the filesystem had to be forcibly unmounted for any reason (yes, there are reasons), the amount of corruption in those files that happened to be active during the unmount - well, it wasn't a nice sight. And the files weren't recoverable, except from backup.
>
> I never really got the point of the data=writeback mode. Sure, it increases throughput, but so does disabling the journal completely, and it seems to me that the end result as far as data integrity is concerned is exactly the same.

The *filesystem* is recoverable, as the metadata is journaled. The *contents* of files may be lost or corrupted. I'm fine with that, since a serious abend usually leaves the data in a questionable state anyway, for reasons other than the filesystem; I want something I can safely (and quickly) remount and investigate/restore. It is a trade-off.
Re: choosing a file system
Ah, the saga of Hans Reiser. That, unfortunately, is the downfall of Reiserfs. Yes, his company has disappeared, and his absence has left a void.

However, the Reiser4 patch set is current against Linux kernel 2.6.28 (see http://www.kernel.org/pub/linux/kernel/people/edward/reiser4/reiser4-for-2.6/).

That said, I think http://en.wikipedia.org/wiki/Reiser4 pretty much sums up the future of Reiser4.

...

I haven't really run into show-stopping bugs on Reiserfs3 in quite some time (with excellent hardware). Replace that with dodgy hardware, though, and things change.

I haven't looked at btrfs with Cyrus yet; perhaps I'll do that sometime soon.

On Dec 31, 2008, at 6:20 AM, Janne Peltonen wrote:
> [...]
Re: choosing a file system
On Wed, 31 Dec 2008, Adam Tauno Williams wrote:
> On Wed, 2008-12-31 at 11:47 +0100, LALOT Dominique wrote:
>> Thanks, everybody. That was an interesting thread. Nobody seems to use a NetApp appliance, maybe because of NFS architecture problems.
>
> Personally, I'd never use NFS for anything. Over the years I've had way too many NFS-related problems on other things to ever want to try it again.

NFS has some very interesting capabilities and limitations. It's really bad for multiple processes writing to the same file (the cyrus.* files, for example) and for atomic actions (writing the message files, for example).

There are ways to configure it that will work, but unless you already have a big NFS server you are probably much better off using a mechanism that makes the drives look more like local drives (SAN, iSCSI, etc.), or trying one of the cluster filesystems, which have different trade-offs than NFS does.

>> I think I'll look at ext4, which seems to be available in the latest kernel, and also at Solaris, but there are not enough of us to support another OS.
>
> We've used Cyrus on XFS for almost a year, no problems.
>
> With regard to ext3, I'd pay attention to the vintage of problem reports and performance issues; the ext3 of several years ago is not the ext3 of today, and many improvements have been made. data=writeback mode can help performance quite a bit, as can enabling dir_index if it isn't enabled already (did it ever become the default?). The periodic fsck can also be disabled via tune2fs. I only point this out since, if you already have an ext3 setup, trying the above is painless and might buy you something.

It's definitely worth testing different filesystems. I last did a test about two years ago and confirmed XFS as my choice. I have one instance of Cyrus still running on ext3 and, as a user, I definitely notice it in the performance.

David Lang
Re: choosing a file system
Nik Conwell wrote:
> On Dec 30, 2008, at 4:43 PM, Shawn Nock wrote:
>> [...] a scripted rename of mailboxes to balance partition utilization when we add another partition.
>
> Just curious - how do you stop people from accessing their mailboxes during the time they are being renamed and moved to another partition?

We don't really bother. We run the script overnight (over several nights) to minimize storage utilization and we haven't run into a problem. I haven't looked at the code in a while, but as I recall the rename operation is fairly atomic.

In short: it doesn't take long to move a box. The worst thing that I could imagine would be a momentary outage for a single user ("Mailbox does not exist" or similar). This sort of error (if it does occur in the wild) would clear almost immediately.

Shawn

--
Shawn Nock (OpenPGP: 0xFF7D08A3)
Unix Systems Group; UITS
University of Arizona
nock at email.arizona.edu
Re: choosing a file system
On Wed, Dec 31, 2008 at 07:47:31AM -0500, Nik Conwell wrote:
> On Dec 30, 2008, at 4:43 PM, Shawn Nock wrote:
>> [...] a scripted rename of mailboxes to balance partition utilization when we add another partition.
>
> Just curious - how do you stop people from accessing their mailboxes during the time they are being renamed and moved to another partition?

All access goes via an nginx proxy - we use the proc directory contents to detect currently active connections and terminate them after blocking all new logins in the authentication daemon. Once the users are fully moved, logins are enabled again.

Bron.
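The ordering Bron describes (block new logins, terminate active connections, move, re-enable logins) can be expressed as a small Python sketch. Every callable passed in is a hypothetical stand-in for site-specific tooling such as the authentication daemon hook or the proc-directory scan; none of them is a real Cyrus or nginx API.

```python
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator


@contextmanager
def logins_blocked(user: str,
                   block: Callable[[str], None],
                   unblock: Callable[[str], None]) -> Iterator[None]:
    """Block new logins for user and always re-enable them afterwards."""
    block(user)
    try:
        yield
    finally:
        unblock(user)


def move_user(user: str,
              block: Callable[[str], None],
              unblock: Callable[[str], None],
              find_connections: Callable[[str], Iterable[str]],
              terminate: Callable[[str], None],
              move: Callable[[str], None]) -> None:
    """Run the move with logins blocked and existing sessions terminated."""
    with logins_blocked(user, block, unblock):
        # New logins are already refused, so this list cannot grow.
        for connection in find_connections(user):
            terminate(connection)
        move(user)
```

One design choice worth noting: the `finally` clause re-enables logins even if the move raises an error. That is an assumption of this sketch, not something the message specifies; a site might prefer to keep the user locked out until someone has inspected a failed move.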