Re: GSSAPI authentication ceased working
Lars Hanke wrote:
> BTW: it's still not working. I've set it to PRI2, since the important ldapdb stuff is running. Kerberized IMAP is rarely used here, so people can do without. But I'd still like to understand what is happening.

Is the keytab readable by the cyrus user (the Unix uid)?

Thanks,

Dave

--
Dave McMurtrie, SPE Email Systems Team Leader
Carnegie Mellon University, Computing Services
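For reference, a quick way to check exactly that, assuming the keytab lives at /etc/krb5.keytab and the daemons run as the Unix user cyrus (both are assumptions; adjust to your install):

    # is the keytab readable by the cyrus uid?
    ls -l /etc/krb5.keytab
    # try reading it as that user; a permission error here means imapd
    # cannot read its service key either
    sudo -u cyrus klist -k /etc/krb5.keytab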
Re: Storage Sizing: IOPS per mailbox
Original Message
Subject: Storage Sizing: IOPS per mailbox
From: ram r...@netcore.co.in
To: info-cyrus info-cyrus@lists.andrew.cmu.edu
Date: Friday, January 02, 2009 10:40:17 PM

> When sizing a storage device for a large Cyrus server, the typical question asked by storage vendors is: what IOPS are required per mailbox? MS Exchange has this concept of IOPS, and they suggest 1.5 IOPS per mailbox (heavy users).
>
> If I use Postfix and Cyrus on my IMAP server (a pure IMAP server -- all spam filtering, outgoing mail, authentication etc. happens on different servers) and the storage is used only for IMAP storage, what is the typical IOPS requirement per user? We will probably assume 30-50 mails a day of 100k average size, and an email client checking for new mail every 5 minutes.

In my experience I would estimate 0.1 IOPS per user for heavy users (thousands of emails per day, checked every few minutes) and 0.01 IOPS per user for typical ISP accounts (under a dozen emails, checked a few times daily). Our systems use MySQL for authentication and account verification, and primarily skiplist databases within Cyrus. These figures may be on the safe side, as none of our systems do just postfix/imap.

--Blake
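As a rough sanity check on those figures, a back-of-the-envelope calculation (the per-delivery and per-poll I/O counts below are assumptions, not measurements, and caching will usually push the real number lower):

    # assume 40 deliveries/day at ~4 writes each (message file, cyrus.index,
    # cyrus.cache, quota) plus one poll every 5 minutes (~288/day) at ~2 reads
    echo "scale=4; (40*4 + 288*2) / 86400" | bc
    # => ~0.0085 IOPS per user, in the same ballpark as the 0.01 estimate above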
Re: choosing a file system
Hm.

ReiserFS: If I'm still following after reading through all this discussion, everyone who is actually using ReiserFS (v3) appears to be very content with it, even with very large installations. Apparently the fact that ReiserFS uses the BKL in places doesn't hurt performance too badly, even on multi-core systems? Another thing I don't recall being mentioned is fragmentation -- ext3 appears to have a problem with it under typical Cyrus usage, but how does ReiserFS compare? Also, the write barrier problem mentioned in response to my earlier post on ext3 would apparently be there with ReiserFS too, wouldn't it?

GFS: Nobody mentioned using GFS, which /is/ a clustered file system and as such probably overkill if it's only mounted on one node at a time, but I'm curious... the overhead of a clustered FS is that all metadata operations take a long time, because there is a lot of cluster-wide locking. But how many metadata operations are there, really, in Cyrus? Also, GFS is one of the two file systems available when using RH clustering...

Ext3: I'm using this happily, with 50k users and 24 distinct mailspools of 240G each. Full backups take quite a while to complete (~2 days), but normal usage is quite fast. There is the barrier problem, of course... I'm using noatime (implying nodiratime) and data=ordered, since data=writeback resulted in corrupted skiplist files on crash, while data=ordered mostly didn't. Also, ext3 is the other FS available when using RH clustering. (Of course, it isn't a clustered FS, so it is only available when using the cluster in active-passive mode.)

XFS: There was someone using this, too, and happy with it.

JFS: Mm, apparently no comments on this -- none positive, at least.

Future: Ext4 just went stable, so there is no real-world Cyrus user experience with it yet. Among other things, it contains an online defragmenter. Journal checksumming might also help work around the write barrier problem on LVM logical volumes, if I've understood correctly. Reiser4 might have a future; at least Andrew Morton's -mm patch set contains it and there are people developing it. But I don't know if it will ever be included in the standard kernel tree. Btrfs is in such early development that I don't know yet what to say about it, but ZFS being incompatible with the GPL might be mitigated by it.

Conclusion: I'm going to continue using ext3 for now, and probably ext4 when it's available from a certain commercial enterprise Linux vendor (personally, I'd be using Debian, but the department has an official policy of using RH / CentOS). I'm eagerly waiting for btrfs to appear... I probably /would/ switch to ReiserFS for now, if RH cluster supported ReiserFS FS resources. Hmm, maybe I should just start hacking... On the other hand, the upgrade path from ext3 to ext4 is quite easy, and I don't know yet which would be better, ReiserFS or ext4.

--
Janne Peltonen janne.pelto...@helsinki.fi PGP Key ID: 0x9CFAC88B
Please consider membership of the Hospitality Club (http://www.hospitalityclub.org)
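In case it helps anyone reproduce the ext3 setup described above, the mount options would look roughly like this (device and mount point are placeholders; note that data=ordered is also ext3's default journaling mode):

    # mount a Cyrus spool partition with the options mentioned above
    mount -t ext3 -o noatime,data=ordered /dev/vg0/sp1 /var/spool/imap/sp1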
Re: Storage Sizing: IOPS per mailbox
On 08 Jan 09, at 15:08, Blake Hudson wrote:

>> When sizing a storage device for a large Cyrus server, the typical question asked by storage vendors is: what IOPS are required per mailbox? MS Exchange has this concept of IOPS, and they suggest 1.5 IOPS per mailbox (heavy users). If I use Postfix and Cyrus on my IMAP server (a pure IMAP server -- all spam filtering, outgoing mail, authentication etc. happens on different servers) and the storage is used only for IMAP storage, what is the typical IOPS requirement per user? We will probably assume 30-50 mails a day of 100k average size, and an email client checking for new mail every 5 minutes.
>
> In my experience I would estimate 0.1 IOPS per user for heavy users (thousands of emails per day, checked every few minutes) and 0.01 IOPS per user for typical ISP accounts (under a dozen emails, checked a few times daily).

Our IMAP server has, as I type, 1020 imap connections up, representing most of our staff. The metadata (both /var/imap and per-mailbox) is in a ZFS pool configured as a two-way mirror of two-way stripes of SAS drives. The load on that is low. As an example, the zfs statistics on the metadata are:

    mailhost-new# zpool iostat 1
                   capacity     operations    bandwidth
    pool         used  avail   read  write   read  write
    ----------  -----  -----  -----  -----  -----  -----
    pool1       54.4G  23.6G     12     77   489K   499K
    pool1       54.4G  23.6G      0    785      0  3.74M
    pool1       54.4G  23.6G      0     14      0   135K
    pool1       54.4G  23.6G      0     24      0   877K
    pool1       54.4G  23.6G      0     33      0   242K
    pool1       54.4G  23.6G      0     10      0  43.6K
    pool1       54.4G  23.6G      0    417  1.48K  2.06M
    pool1       54.4G  23.6G      0     22      0   139K
    pool1       54.4G  23.6G      0      1      0  7.92K
    pool1       54.4G  23.6G      0      7      0  31.7K
    pool1       54.4G  23.6G      0     20      0  83.2K
    pool1       54.4G  23.6G      0    504   1013  1.93M
    pool1       54.4G  23.6G      0     23      0   574K
    pool1       54.4G  23.6G      2     17  96.5K   123K
    pool1       54.4G  23.6G      0     40      0   285K
    pool1       54.4G  23.6G      0     26      0   123K
    pool1       54.4G  23.6G      0    698  1.98K  3.41M
    pool1       54.4G  23.6G      0      3      0  15.8K
    pool1       54.4G  23.6G      0     24      0   744K
    pool1       54.4G  23.6G      0     16      0   713K
    pool1       54.4G  23.6G      3     15   209K   147K
    pool1       54.4G  23.6G      5    569   760K  2.71M
    pool1       54.4G  23.6G      0     16      0   222K
    ^C
    mailhost-new#

You can see the five-second sync. The first line's (average) figures aren't representative, because they of course include backup activity.

The actual messages are stored in the lowest (`archive') QoS band of a Pillar Axiom 500, in NAS mode. The load is very small on each of two 2TB-ish partitions (it's approaching four pm, so the business is going at close to full load):

    mailpool1
      I/O Operations
        Read I/Os per second:   58.677
        Write I/Os per second:   7.129
        Average Request Time:    4.475 ms
        Current MB per second:   1.003
      General Statistics
        Read/Write Cache Hit Percentage: 69%
        Read/Write I/O Ratio: 89:11

    mailpool2
      I/O Operations
        Read I/Os per second:   46.733
        Write I/Os per second:   9.467
        Average Request Time:    1.923 ms
        Current MB per second:   0.544
      General Statistics
        Read/Write Cache Hit Percentage: 56%
        Read/Write I/O Ratio: 83:17
Re: choosing a file system
> (Summary of filesystem discussion)

You left out ZFS.

Sometimes Linux admins remind me of Windows admins. I have adminned a half-dozen UNIX variants professionally, but keep running into admins who only do ONE and for whom every problem is solved with "how can I do this with one OS only?" I admin numerous Linux systems in our data center (a Perdition proxy in front of Cyrus, for one), but frankly, if you want me to go back into the filesystem Dark Ages now for terabytes of mail volume, I'd throw a professional fit. Even the idea that I need to tune my filesystem for inodes, and keep it from wanting to fsck on reboot #20 or whatever, seems like caveman discussion. Do any of them offer cheap and nearly-instant snapshots, or online scrubbing? No? Then why use them for large numbers of files of an important nature?

I love Linux, I surely do. Virtually everything of an appliance nature here will probably shift over to it in the long run, I think, and for good reasons. But the filesystem is one area where the bazaar model has fallen into a very deep rut and can't muster the energy to climb out.

So far ZFS is ticking along with no problems and low iostat numbers, with everything in one big pool. I have separate filesystems for data, imap, and mail, but haven't seen any need to carve the mail spool into chunks at all. There were initial problems noted here in the mailing lists way back in Solaris 10u3, but that was solved with the fsync patch and since then it's been like butter. Nobody ever needs to look at the mail-store systems, because it just works.
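For readers who haven't used ZFS, the snapshot and scrub features being referred to look roughly like this (the pool and filesystem names are made-up examples):

    # take a near-instant, copy-on-write snapshot of a mail filesystem
    zfs snapshot pool1/mail@nightly
    # start an online integrity scrub of the whole pool, then check on it
    zpool scrub pool1
    zpool status -v pool1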
Re: GSSAPI authentication ceased working
On 02 Jan 2009, at 11:19, Lars Hanke wrote:

> hermod: /var/log/auth.log
> Jan 2 17:07:54 hermod imtest: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Decrypt integrity check failed)
>
> hel: /var/log/syslog
> Jan 2 16:07:54 hel krb5kdc[1652]: TGS_REQ (7 etypes {18 17 16 23 1 3 2}) 172.16.6.5: PROCESS_TGS: authtime 0, unknown client for imap/hermod@mgr, Decrypt integrity check failed

As I read this, hel is saying that the TGT is bad. You're trying to obtain a service ticket for imap/hermod, but the TGT you're attempting to use is not accepted by the KDC. If you klist after running imtest, you have no imap/hermod ticket.

I've never seen an error like that. It suggests that your KDC is really broken :) Something like the key used to encrypt your TGT isn't valid for obtaining service tickets.

:wes
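One way to narrow this down independently of Cyrus, assuming MIT Kerberos tooling and an imap/hermod service principal (the hostname, realm and keytab path below are placeholders):

    # get a fresh TGT as an ordinary user, then request the service ticket directly
    kinit someuser
    kvno imap/hermod.example.com      # should print a key version number
    klist                             # the imap/ service ticket should now appear
    # on the server, compare the keytab's kvno and enctypes against the KDC's copy
    klist -ke /etc/krb5.keytab

If kvno fails with the same "Decrypt integrity check failed", the problem is most likely between the client's TGT and the KDC rather than in Cyrus or the service keytab.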
Re: choosing a file system
On Thu, Jan 08, 2009 at 05:20:00PM +0200, Janne Peltonen wrote:

> If I'm still following after reading through all this discussion, everyone who is actually using ReiserFS (v3) appears to be very content with it, even with very large installations. Apparently the fact that ReiserFS uses the BKL in places doesn't hurt performance too badly, even on multi-core systems? Another thing I don't recall being mentioned is fragmentation -- ext3 appears to have a problem with it under typical Cyrus usage, but how does ReiserFS compare?

Yeah, I'm surprised the BKL hasn't hurt us more.

Fragmentation -- yeah, it does hurt performance a bit. We run a patch which causes a skiplist checkpoint every time it runs a recovery, which includes every restart. We also tune skiplists to checkpoint more frequently in everyday use. This helps reduce meta fragmentation. For data fragmentation -- we don't care. Honestly. Data IO is so rare. The main time it matters is if someone does a body search. Which leaves... index files. The worst case is files that are only ever appended to, with never any records deleted. Each time you expunge a mailbox (even with delayed expunge) it causes a complete rewrite of the cyrus.index file.

I also wrote a filthy little script (attached) which can repack cyrus meta directories. I'm not 100% certain that it's problem free though, so I only run it on replicas. Besides, it's not protected like most of our auto-system functions, which check the database to see if the machine is reporting high load problems and choke themselves until the load drops back down again.

> I'm using this happily, with 50k users and 24 distinct mailspools of 240G each. Full backups take quite a while to complete (~2 days), but normal usage is quite fast. There is the barrier problem, of course... I'm using noatime (implying nodiratime) and data=ordered, since data=writeback resulted in corrupted skiplist files on crash, while data=ordered mostly didn't.

Yeah, full backups. Ouch. I think the last time we had to do that it took somewhat over a week. Mainly CPU limited on the backup server, which is doing a LOT of gzipping!

Our incremental backups take about 4 hours. We could probably speed this up a little more, but given that it's now down from about 12 hours two weeks ago, I'm happy. We were actually rate limited by Perl 'unpack' and hash creation, believe it or not! I wound up rewriting Cyrus::IndexFile to provide a raw interface, and unpacking just the fields that I needed. I also asserted index file version == 10 in the backup library so I can guarantee the offsets are correct.

I've described our backup system here before - it's _VERY_ custom, based on a deep understanding of the Cyrus file structures. In this case it's definitely worth it - it allows us to reconstruct partial mailbox recoveries with flags intact. Unfortunately, seen information is much trickier. I've been tempted for a while to patch Cyrus's seen support to store seen information for the user themselves in the cyrus.index file, and only seen information for unowned folders in the user.seen files. The way it works now seems optimised for the uncommon case at the expense of the common. That always annoys me!

> Ext4 just went stable, so there is no real-world Cyrus user experience with it yet. Among other things, it contains an online defragmenter. Journal checksumming might also help work around the write barrier problem on LVM logical volumes, if I've understood correctly.

Yeah, it's interesting. Local fiddling suggests it's worse for my Maildir performance than even btrfs, and btrfs feels more jerky than reiser3, so I stick with reiser3.

> Reiser4 might have a future; at least Andrew Morton's -mm patch set contains it and there are people developing it. But I don't know if it will ever be included in the standard kernel tree.

Yeah, and the mailing list isn't massively active at the moment either... I do keep an eye on it.

> Btrfs is in such early development that I don't know yet what to say about it, but ZFS being incompatible with the GPL might be mitigated by it.

Yeah, btrfs looks interesting. Especially with their work on improving locking - even on my little dual processor laptop (yay core processors) I would expect to see an improvement when they merge the new locking code.

> I'm going to continue using ext3 for now, and probably ext4 when it's available from a certain commercial enterprise Linux vendor (personally, I'd be using Debian, but the department has an official policy of using RH / CentOS). I'm eagerly waiting for btrfs to appear... I probably /would/ switch to ReiserFS for now, if RH cluster supported ReiserFS FS resources. Hmm, maybe I should just start hacking... On the other hand, the upgrade path from ext3 to ext4 is quite easy, and I don't know yet which would be better, ReiserFS or ext4.

Sounds sane. If vendor support matters, then ext4 is probably the immediate-future good choice.
Re: choosing a file system
On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:

> (Summary of filesystem discussion) You left out ZFS. Sometimes Linux admins remind me of Windows admins. I have adminned a half-dozen UNIX variants professionally, but keep running into admins who only do ONE and for whom every problem is solved with "how can I do this with one OS only?"

We run one zfs machine. I've seen it report issues on a scrub, only to not have them on the second scrub. While it looks shiny and great, it's also relatively new.

Besides, we had a disk _fail_ early on in our x4500 - Sun shipped a replacement drive, but the kernel was unable to recognise it:

---
Nothing odd about how it snaps in. We can see the connectors in the slot - they seem fine as far as we can tell. The drive's 'ok' light is on and the blue led lit. Which suggests the server thinks the drive is fine, but the dmesg data definitely suggests it isn't. I've also included the output of hdadm display below as well, which shows that it currently thinks the drive is not present, even though the last thing reported in the dmesg log is that the device was connected.

    Aug 14 21:59:13 backup1 SATA device attached at port 0
    Aug 14 21:59:13 backup1 sata: [ID 663010 kern.info] +/p...@2,0/pci1022,7...@8/pci11ab,1...@1 :

The output of hdadm display shows that the machine definitely thinks the drive is NOT connected.
---

Sun's response was to wait for the next kernel upgrade - there was a bug that made that channel unusable even after a reboot.

> So far ZFS is ticking along with no problems and low iostat numbers, with everything in one big pool. I have separate fs for data, imap, mail but haven't seen any need to carve the mail spool into chunks at all. There were initial problems noted here in the mailing lists way back in Solaris 10u3, but that was solved with the fsync patch and since then it's been like butter. Nobody ever needs to look at the mail-store systems, because it just works.

I'd sure hate to lose the entire basket, say due to an unknown bug in zfs. Besides, I _know_ Debian quite well. We don't have any Solaris experience in our team. The documentation looks quite good, but it's still a lot of things that work differently. I tell you what, maintaining Solaris and using the Solaris userland feels like going back 20 years - and the whole "need a sunsolve password and only get some patches - permission denied on others" crap. I don't need that.

So while I appreciate that ZFS has some advantages, I'd have to say that they need to be weighed up against the rest of the system, and the "all the eggs in a relatively new basket" argument. Also, the response we've had from Linus when we find kernel issues has been absolutely fantastic.

Bron ( Debian on the Solaris kernel would be interesting... )
Re: choosing a file system
On Jan 8, 2009, at 7:46 PM, Bron Gondwana wrote:

> We run one zfs machine. I've seen it report issues on a scrub, only to not have them on the second scrub. While it looks shiny and great, it's also relatively new.

Wait, weren't you just crowing about ext4? The filesystem that was marked GA in the Linux kernel release that happened just a few weeks ago? You also sound pretty enthusiastic, rather than cautious, when talking about btrfs and tux3.

ZFS, and anyone who even remotely seriously follows Solaris would know this, has been GA for 3 years now. For someone who doesn't have their nose buried in Solaris much, or with any serious attention span, I guess it could still seem new.

As for your x4500, I can't tell if those syslog lines you pasted were from Aug. 2008 or 2007, but certainly since 2007 the Marvell SATA driver has seen some huge improvements to work around some pretty nasty bugs in the Marvell chipset. If you still have that x4500 and have not applied the current patch for the marvell88sx driver, I highly suggest doing so. Problems with that chip are some of the reasons Sun switched to the LSI 1068E as the controller in the x4540.

/dale
Re: choosing a file system
On Thu, 08 Jan 2009 20:03 -0500, Dale Ghent da...@elemental.org wrote:

> On Jan 8, 2009, at 7:46 PM, Bron Gondwana wrote:
>> We run one zfs machine. I've seen it report issues on a scrub, only to not have them on the second scrub. While it looks shiny and great, it's also relatively new.
>
> Wait, weren't you just crowing about ext4? The filesystem that was marked GA in the Linux kernel release that happened just a few weeks ago? You also sound pretty enthusiastic, rather than cautious, when talking about btrfs and tux3.

I was saying I find it interesting. I wouldn't seriously consider using it for production mail stores just yet. But I have been testing it on my laptop, where I'm running an offlineimap-replicated copy of my mail. I wouldn't consider btrfs for production yet either, and tux3 isn't even on the radar. They're interesting to watch though, as is ZFS.

I also said (or at least meant) that if you have commercial support, ext4 is probably going to be the next evolutionary step from ext3.

> ZFS, and anyone who even remotely seriously follows Solaris would know this, has been GA for 3 years now. For someone who doesn't have their nose buried in Solaris much, or with any serious attention span, I guess it could still seem new.

Yeah, it's true - but I've heard anecdotes of people losing entire zpools due to bugs. Google turns up things like:

http://www.techcrunch.com/2008/01/15/joyent-suffers-major-downtime-due-to-zfs-bug/

which points to this thread:

http://www.opensolaris.org/jive/thread.jspa?threadID=49020&tstart=0

and finally this comment:

http://www.joyeur.com/2008/01/16/strongspace-and-bingodisk-update#c008480

Not something I would want happening to my entire universe, which is why having ~280 separate filesystems (at the moment) with our email spread across them means that a rare filesystem bug is only likely to affect a single store if it bites - and we can restore one store's worth of users a lot quicker than the whole system. It's the same reason we prefer Cyrus replication (and put a LOT of work into making it stable - check this mailing list from a couple of years ago; I wrote most of the patches that stabilised replication between 2.3.3 and 2.3.8).

If all your files are on a single filesystem, then a rare bug only has to hit once. A frequent bug, on the other hand - well, you'll know about those pretty fast... :) None of the filesystems mentioned have frequent bugs (except btrfs and probably tux3 - but they ship with big fat warnings all over).

> As for your x4500, I can't tell if those syslog lines you pasted were from Aug. 2008 or 2007, but certainly since 2007 the Marvell SATA driver has seen some huge improvements to work around some pretty nasty bugs in the Marvell chipset. If you still have that x4500 and have not applied the current patch for the marvell88sx driver, I highly suggest doing so. Problems with that chip are some of the reasons Sun switched to the LSI 1068E as the controller in the x4540.

I think it was 2007 actually. We haven't had any trouble with it for a while, but then it does pretty little. The big zpool is just used for backups, which are pretty much one .tar.gz and one .sqlite3 file per user - and the .sqlite3 file is just indexing the .tar.gz file, so we can rebuild it by reading the tar file if needed.

As a counterpoint to some of the above, we had an issue with Linux where there was a bug in 64-bit writev handling of mmaped space. If you were doing a writev with a mmaped space that crossed a page boundary and the following page wasn't mapped in, it would inject spurious zero bytes into the output where the start of the next page belonged. It took me a few days to prove it was the kernel and create a repeatable test case, and then, with some backwards and forwards with Linus and a couple of other developers, we fixed it and tested it _that_day_. I don't know anyone with even unobtanium-level support with a commercial vendor who has actually had that sort of turnaround.

This caused pretty massive file corruption, especially of our skiplist files, but bits of every other meta file too. Luckily, as per above, we had only upgraded one machine. We generally do that with new kernels or software versions - upgrade one production machine and watch it for a bit. We also test things on testbed machines first, but you always find something different on production. The mmap-over-boundaries case was pretty rare - only a few per day would actually cause a crash; the others were silent corruption that wasn't detected at the time.

If something like this had hit an only machine, we would have been seriously screwed. Since it only hit one machine, we could apply the fix and re-replicate all the damaged data from the other machine. No actual data loss.

Bron.
--
Bron Gondwana br...@fastmail.fm
Re: choosing a file system
On Jan 8, 2009, at 4:46 PM, Bron Gondwana wrote:

> On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
>> (Summary of filesystem discussion) You left out ZFS. Sometimes Linux admins remind me of Windows admins. I have adminned a half-dozen UNIX variants professionally, but keep running into admins who only do ONE and for whom every problem is solved with "how can I do this with one OS only?"
>
> We run one zfs machine. I've seen it report issues on a scrub, only to not have them on the second scrub. While it looks shiny and great, it's also relatively new.

You'd be surprised how unreliable disks, and the transport between the disk and host, can be. This isn't a ZFS problem, but a statistical certainty as we're pushing a large number of bits down the wire. With a large enough corpus you can have on-disk data corruption, or data corruption that appeared in flight to the disk, or in the controller, that your standard disk CRCs can't correct for. As we keep pushing the limits, data integrity checking at the filesystem layer -- before the information is presented for your application to consume -- has basically become a requirement.

BTW, the reason that the first scrub saw the error, and the second scrub didn't, is that the first scrub fixed it -- that's the job of a ZFS scrub.

-rob
Re: choosing a file system
On Thu, Jan 08, 2009 at 08:57:18PM -0800, Robert Banz wrote:

> On Jan 8, 2009, at 4:46 PM, Bron Gondwana wrote:
>> On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
>>> (Summary of filesystem discussion) You left out ZFS. Sometimes Linux admins remind me of Windows admins. I have adminned a half-dozen UNIX variants professionally, but keep running into admins who only do ONE and for whom every problem is solved with "how can I do this with one OS only?"

There's a significant upfront cost to learning a whole new system for one killer feature, especially if it comes along with significant regressions in lots of other features (like a non-sucky userland out of the box). Applying patches on Solaris seems to be a choice between incredibly low-level command-line tools or booting up a whole graphical environment on a machine in a datacentre on the other side of the world.

>> We run one zfs machine. I've seen it report issues on a scrub, only to not have them on the second scrub. While it looks shiny and great, it's also relatively new.
>
> You'd be surprised how unreliable disks, and the transport between the disk and host, can be. This isn't a ZFS problem, but a statistical certainty as we're pushing a large number of bits down the wire. With a large enough corpus you can have on-disk data corruption, or data corruption that appeared in flight to the disk, or in the controller, that your standard disk CRCs can't correct for. As we keep pushing the limits, data integrity checking at the filesystem layer -- before the information is presented for your application to consume -- has basically become a requirement.
>
> BTW, the reason that the first scrub saw the error, and the second scrub didn't, is that the first scrub fixed it -- that's the job of a ZFS scrub.

    # zpool status -v rpool
      pool: rpool
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
            entire pool from backup.
       see: http://www.sun.com/msg/ZFS-8000-8A
     scrub: scrub in progress for 0h0m, 0.69% done, 1h40m to go
    config:

            NAME          STATE     READ WRITE CKSUM
            rpool         ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                c5t0d0s0  ONLINE       0     0     0
                c5t4d0s0  ONLINE       0     0     0

    errors: Permanent errors have been detected in the following files:

            //dev/dsk

--- if that's an error that the scrub fixed, then it's a really badly written error message. The same error didn't exist on the next scrub, which was what confused me.

Bron.
Re: choosing a file system
On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:

> (Summary of filesystem discussion) You left out ZFS.

Just to come back to this - I should say that I'm a big fan of ZFS and what Sun have done with filesystem design. Despite the issues we've had with that machine, I know it's great for people who are using it...

BUT - if someone is asking "what's the best filesystem to use on Linux" and gets told "ZFS, and by the way you should switch operating systems and ditch all the rest of your custom setup/experience", then you're as bad as a Linux weenie saying "just use Cyrus on Linux" in a "how should I tune NTFS on my Exchange server" discussion.

From the original post:

Message-ID: 1617f8010812300849k1c7c878bl2f17e8d4287c1...@mail.gmail.com

> zfs (but we should switch to solaris or freebsd and throw away our costly SAN)

I'd love to do some load testing on a ZFS box with our setup at some point. There would be some advantages, though I suspect having one big mailboxes.db vs the lots of little ones we have would be a point of contention - and fine-grained skiplist locking is still very much a wishlist item. I'd want to take some time testing it before unleashing it on the world!

Bron.
Re: choosing a file system
Bron Gondwana wrote:

> BUT - if someone is asking "what's the best filesystem to use on Linux" and gets told "ZFS, and by the way you should switch operating systems and ditch all the rest of your custom setup/experience", then you're as bad as a Linux weenie saying "just use Cyrus on Linux" in a "how should I tune NTFS on my Exchange server" discussion.

Point taken. We can go around that circle all day long, but I *am* saying there are other UNIX OSes out there than just Linux, and quite frankly it blows my mind sometimes how people fall into ruts. Numerous times in my career I have had to switch some application from AIX to HP-UX, or IRIX to Linux. The differing flavors of UNIX are not so different to me as they perhaps are to others. Particularly when it's a single app on a dedicated server, I usually find it odd how people get stuck on something and won't change. Or they take the safe institutional path and never fight it. Collect your paycheck and go home at 4.

I sleep very well at night knowing the Cyrus mail-stores are on ZFS. Once in a while I run a scrub just for fun. No futzing around.

This was no cakewalk. I was pushing a boulder up a hill, particularly when we ran head-first into the ZFS fsync bottleneck at the start of Fall quarter. Managers said we needed a crash program to convert everything to Linux or Exchange or whatever. I dug into the bugs instead, Sun got us an interim patch to fix it, and we moved on. Now, as I said, it's like butter and one of those setups nobody thinks about. There are always excuses why you will stick with established practice even if it's antiquated and full of aches and pains, and I fought that and won.

It seems to me there is no bigger deal than having a RELIABLE filesystem for the mail-store, and this is where every other filesystem I have worked with since 1989 has been a frigging nightmare. Everything from bad controllers to double-disk failures in RAID-5 sets keeps me wondering whether I am paranoid ENOUGH.

I'll be all over btrfs when it hits beta. I'm not married to ZFS. But I'm quite unashamedly looking down my nose at any filesystem now that leaves me possibly looking at an fsck prompt. I've done enough of that in my career already; it's time to move beyond 30+ years' worth of cruft atop antique designs that seemed tolerable when a huge disk was 20 gigs.
Re: choosing a file system
> There's a significant upfront cost to learning a whole new system for one killer feature, especially if it comes along with significant regressions in lots of other features (like a non-sucky userland out of the box). ...

The non-sucky userland comment is simply a matter of preference, and bait for a religious war, which I'm not going to bite on. What I will say is that switching between Solaris, Linux, IRIX, Ultrix, FreeBSD, HP-UX, OSF/1 -- any *nix variant -- should not be considered a stumbling block. Your comment shows the narrow-mindedness of the current Linux culture; many of us were brought up supporting and using a collection of these platforms at any one time. (Notice I didn't mention AIX. I've got my standards ;)

Patching is always an issue on any OS, and you do have the choice of running X applications remotely (rather than booting an entire graphical environment!), and there are many other tools available, such as pca, to help you patch on Solaris, which provide many of the features you're used to.

-rob
32-bit to 64-bit migration seen flags
I am migrating mailboxes from a 32-bit Cyrus server (2.3.7) to a 64-bit Cyrus server (2.3.13). When I copy the mailbox seen flags (skiplist) from the 32-bit server to the 64-bit server, it does not work: all the mails are flagged as unseen on the new server.

Is there a way I can migrate the seen flags?

Thanks
Ram
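One approach worth trying (offered as a sketch, not as a confirmed fix for this case) is to avoid copying the binary skiplist file between architectures, and instead convert it to a flat file on the old server and rebuild it on the new one with cvt_cyrusdb. The paths below are examples for a user named "ram"; run the commands as the cyrus user so ownership stays correct:

    # on the 32-bit server: dump the seen state to an architecture-neutral flat file
    cvt_cyrusdb /var/imap/user/r/ram.seen skiplist /tmp/ram.seen.flat flat
    # copy /tmp/ram.seen.flat to the new server, then rebuild the skiplist there
    cvt_cyrusdb /tmp/ram.seen.flat flat /var/imap/user/r/ram.seen skiplist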