Re: improving concurrency/performance
At Mon, 7 Nov 2005 09:00:08 -0500 (EST), John Madden wrote:
> > Have you tried running something like postmark
> > http://packages.debian.org/stable/utils/postmark to benchmark your
> > filesystem?
>
> The disks are quite fast. bonnie++, for example, shows writes at over
> 300MB/s. What I'm finding though is that the processes aren't ever pegging
> them out -- nothing ever goes into iowait. The bottleneck is elsewhere...

The question was, though: have you tried running PostMark? The distinction
is extremely important. PostMark reliably and repeatably provides good
benchmark measurements for comparing filesystem tunings and hardware
configurations, and it does so in ways that closely mimic real-world
multi-user applications such as Cyrus IMAPd. Bonnie and Bonnie++ are very
simplistic in comparison, and rather useless for finding the bottlenecks of
real-world applications that may open many files at a time (unless you
script frameworks to wrap them with, in which case you are simply
re-inventing something like PostMark).

--
Greg A. Woods       H:+1 416 218-0098  W:+1 416 489-5852 x122
VE3TCP              RoboHack [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]    Secrets of the Weird [EMAIL PROTECTED]

---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: improving concurrency/performance (fwd)
On Wed, 09 Nov 2005, Joshua Schmidlkofer wrote:
> Does this mean that those of us using XFS should run some testing as well?

Yes. XFS doesn't journal data in any way, AFAIK, and I don't know how one
could go about speeding up fsync()s with it. What I *do* know is that I
don't trust spools to XFS, because anything not fsync()'ed at the time of a
crash WILL be lost, while the metadata will likely still be there, and it
is a total bitch to find out what has been damaged: you have to hunt over
the entire filesystem for files containing whole blocks of NULs.

--
One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie. -- The Silicon Valley Tarot
Henrique Holschuh
Re: improving concurrency/performance (fwd)
> This guy is having a problem with cyrus-imap and ext3 - when multiple
> processes are attempting to write to the one filesystem (but not the one
> file), performance drops to next to nothing when only five processes are
> writing. An strace shows most of the time is being spent in fdatasync and
> fsync.

Actually, the thread just got off topic quickly -- I'm running this on
reiserfs, not ext3. ...And I've got it mounted with data=writeback, too.
But thanks for the info, Andrew.

John
--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
[EMAIL PROTECTED]
Re: improving concurrency/performance
Quoting John Madden [EMAIL PROTECTED]:
> The disks are quite fast. bonnie++, for example, shows writes at over
> 300MB/s. What I'm finding though is that the processes aren't ever pegging
> them out -- nothing ever goes into iowait. The bottleneck is elsewhere...

This might seem dumb, but are there any issues with name resolution? Could
DNS queries be slowing things down?

David
Re: improving concurrency/performance
> This might seem dumb, but are there any issues with name resolution?
> Could DNS queries be slowing things down?

Nah, it's a good thought, but this is with an already-established session
running from localhost. Based on the strace, I can guess that this is
definitely something disk-based and that I'm just going to have to deal
with it from that angle.

John
--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
[EMAIL PROTECTED]
Re: improving concurrency/performance (fwd)
John Madden wrote:
> > This guy is having a problem with cyrus-imap and ext3 - when multiple
> > processes are attempting to write to the one filesystem (but not the
> > one file), performance drops to next to nothing when only five
> > processes are writing. An strace shows most of the time is being spent
> > in fdatasync and fsync.
>
> Actually, the thread just got off topic quickly -- I'm running this on
> reiserfs, not ext3. ...And I've got it mounted with data=writeback, too.
> But thanks for the info, Andrew.

I'll bet that the fakesync preload library will make a difference for you.

--
Sergio Bruder
Re: improving concurrency/performance (fwd)
> > This guy is having a problem with cyrus-imap and ext3 - when multiple
> > processes are attempting to write to the one filesystem (but not the
> > one file), performance drops to next to nothing when only five
> > processes are writing. An strace shows most of the time is being spent
> > in fdatasync and fsync.
>
> Actually, the thread just got off topic quickly -- I'm running this on
> reiserfs, not ext3. ...And I've got it mounted with data=writeback, too.
> But thanks for the info, Andrew.

Sorry, my confusion. But it might be worth asking the reiserfs guys. My
experience has been that if you are fsync'ing files, even modern disks only
get around 10 fsyncs per second (because not only does the file data get
written out, but typically also the inode, the directory entry, the free
block table and maybe even all the directory entries up to root).
Journalling can help, because the committed data is written sequentially to
the journal rather than being scattered all over the disk, but the
journalled operations still need to be applied to the filesystem sooner or
later.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
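Andrew's ~10 fsyncs/second figure is easy to sanity-check. Below is a
minimal C sketch (mine, not from the thread; the path and count arguments
are arbitrary) that times a burst of one-byte write()+fsync() pairs on the
filesystem under test:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

/* Time `count` write()+fsync() pairs on `path`; return fsyncs/second,
 * or -1.0 on error.  The file is created/truncated and left behind. */
double fsync_rate(const char *path, int count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    struct timeval t0, t1;

    if (fd < 0)
        return -1.0;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < count; i++) {
        if (write(fd, "x", 1) != 1) {
            close(fd);
            return -1.0;
        }
        fsync(fd);          /* push data and metadata to stable storage */
    }
    gettimeofday(&t1, NULL);
    close(fd);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    return secs > 0 ? count / secs : -1.0;
}
```

On a disk that honours fsync() this typically lands in the tens-per-second
range Andrew describes; a number in the thousands usually means a write
cache is absorbing the flushes.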
Re: improving concurrency/performance (fwd)
> Yes, on ext3, an fsync() syncs the entire filesystem. It has to, because
> all the metadata for each file is shared - it's just a string of
> journallable blocks. Similar story with the data, in ordered mode. So
> effectively, fsync()ing five files one time each is performing 25
> fsync()s.
>
> One fix (which makes the application specific to ext3 in ordered-data or
> journalled-data mode) is to perform a single fsync(), with the
> understanding that this has the side-effect of fsyncing all the other
> files. That's an ugly solution and is rather hard to do if the workload
> consists of five separate processes!
>
> So I'd recommend mounting the filesystem with the `-o data=writeback'
> mode. This way, each fsync(fd) will sync fd's data only. That's much
> better than the default data-ordered mode, wherein a single fsync() will
> sync all the other files' data too. In data=writeback mode it is still
> the case that fsync(fd) will sync the other files' metadata, but that's a
> single linear write to the journal and the additional cost should be low.
>
> Bottom line: please try data=writeback, let me know.

Does this mean that those of us using XFS should run some testing as well?

thanks,
  joshua
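For anyone wanting to try this, the mechanics look like the following. The
device and mount point are placeholders, not from the thread; note that
ext3 generally refuses to change the data journalling mode on a live
remount, and that data=writeback weakens post-crash file-content
guarantees:

```shell
# Unmount, then mount the spool with writeback journalling.
# /dev/sdb1 and /var/spool/imap are placeholders -- use your own.
umount /var/spool/imap
mount -t ext3 -o data=writeback /dev/sdb1 /var/spool/imap

# To make it permanent, set the option in /etc/fstab:
# /dev/sdb1  /var/spool/imap  ext3  defaults,data=writeback  0 2
```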
Re: improving concurrency/performance
> As expected, these are from locking operations. 0x8 is the file
> descriptor, which, if I read lsof output correctly, points to
> config/socket/imap-0.lock (what would that be?) and 0x7 is F_SETLKW,
> which reads as "set lock or wait for it to be released" in the manual
> page.

Yup, that's exactly the sort of thing I was suspecting -- the performance I
was seeing just didn't make sense. imap-0.lock is in /var/imap/socket for
me. I believe it's one of the lock files created when cyrus is started, so
it wouldn't make any sense for imapd to ever be spinning on it.

The delays I was seeing occurred when multiple imapd's were writing to the
spool at the same time. I do see a lot of this though:

  fcntl(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0

It looks like the lock to open a file in the target mailbox. But again,
very low actual throughput and still little or no iowait. However, adding
a -c to the strace, the top three syscalls are:

  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------
   52.68    0.514720        1243       414           fdatasync
   29.87    0.291830         846       345           fsync
    4.19    0.040898          27      1519           fcntl

Makes me wonder why the fsyncs are taking so long since the disk is
performing so well. Anyone know if that's actually typical?

Also interesting is the errors column for the open() call on this strace:

  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------
    1.07    0.019902          17       622       130 open

Why 130 errors? I assume if there's an error that the call is re-tried...

John
--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
[EMAIL PROTECTED]
Re: improving concurrency/performance
On Tue, 8 Nov 2005 09:25:54 -0500 (EST) John Madden [EMAIL PROTECTED]
wrote:
> The delays I was seeing occurred when multiple imapd's were writing to
> the spool at the same time. I do see a lot of this though:
>
>   fcntl(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
>
> It looks like the lock to open a file in the target mailbox. But again,
> very low actual throughput and still little or no iowait. However, adding
> a -c to the strace, the top three syscalls are:
>
>   % time     seconds  usecs/call     calls    errors syscall
>   ------ ----------- ----------- --------- --------- ----------
>    52.68    0.514720        1243       414           fdatasync
>    29.87    0.291830         846       345           fsync
>     4.19    0.040898          27      1519           fcntl
>
> Makes me wonder why the fsyncs are taking so long since the disk is
> performing so well. Anyone know if that's actually typical?

Hm. I'd definitely take a second look at your DS6800 configuration... How
is your write cache configured there? I can't really measure the percentage
of fdatasync, since on a live system most of the time is spent in select()
and read()...

> Also interesting is the errors column for the open() call on this strace:
>
>   % time     seconds  usecs/call     calls    errors syscall
>   ------ ----------- ----------- --------- --------- ----------
>     1.07    0.019902          17       622       130 open
>
> Why 130 errors? I assume if there's an error that the call is re-tried...

Probably many ENOENT when trying to open the msg/motd and msg/shutdown
files.

--
Jure Pečar
http://jure.pecar.org
Re: improving concurrency/performance
> Hm. I'd definitely take a second look at your ds6800 configuration ...
> How is your write cache configured there?

Let's just say they're not terribly clear on that. :)

--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
[EMAIL PROTECTED]
Re: improving concurrency/performance
On Tue, 2005-11-08 at 09:25 -0500, John Madden wrote:
> Makes me wonder why the fsync's are taking so long since the disk is
> performing so well. Anyone know if that's actually typical?

Some time ago I wrote a little LD_PRELOAD library that neutered fsync() and
related calls, intended for use with migration; maybe it'll help, maybe it
won't. At any rate, if you're doing practice migrations and aren't worried
too much about trashing your test system, try it and let me know.

http://haus.nakedape.cc/svn/public/trunk/small-projects/fakesync/

Wil
--
Wil Cooley [EMAIL PROTECTED]
Naked Ape Consulting, Ltd
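For the curious, the core of such a shim is tiny. This is a from-scratch
sketch of the technique, NOT Wil's actual library linked above: it replaces
fsync() and fdatasync() with no-ops, so a crash or power loss mid-run can
silently lose data -- strictly for throwaway migration/test runs.

```c
/* fakesync-sketch.c -- illustration of an LD_PRELOAD fsync-neutering shim.
 *
 * Build:  gcc -shared -fPIC -o fakesync.so fakesync-sketch.c
 * Use:    LD_PRELOAD=./fakesync.so your-migration-command
 */

int fsync(int fd)
{
    (void)fd;           /* pretend the data hit the platter */
    return 0;
}

int fdatasync(int fd)
{
    (void)fd;
    return 0;
}
```

Because the dynamic linker resolves symbols from the preloaded object
first, every fsync()/fdatasync() call in the target program returns
immediately with success.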
Re: improving concurrency/performance (fwd)
I forwarded John's message to Andrew Morton, the Linux kernel maintainer,
and this is his reply (it was cc'ed to the list, but, not being a
subscriber, I presume it bounced):

--- Forwarded Message

Date: Tue, 08 Nov 2005 15:21:31 -0800
From: Andrew Morton [EMAIL PROTECTED]
To: Andrew McNamara [EMAIL PROTECTED]
cc: John Madden [EMAIL PROTECTED], info-cyrus@lists.andrew.cmu.edu
Subject: Re: improving concurrency/performance (fwd)

Andrew McNamara [EMAIL PROTECTED] wrote:
> This guy is having a problem with cyrus-imap and ext3 - when multiple
> processes are attempting to write to the one filesystem (but not the one
> file), performance drops to next to nothing when only five processes are
> writing. An strace shows most of the time is being spent in fdatasync and
> fsync.
> ...

Yes, on ext3, an fsync() syncs the entire filesystem. It has to, because
all the metadata for each file is shared - it's just a string of
journallable blocks. Similar story with the data, in ordered mode. So
effectively, fsync()ing five files one time each is performing 25 fsync()s.

One fix (which makes the application specific to ext3 in ordered-data or
journalled-data mode) is to perform a single fsync(), with the
understanding that this has the side-effect of fsyncing all the other
files. That's an ugly solution and is rather hard to do if the workload
consists of five separate processes!

So I'd recommend mounting the filesystem with the `-o data=writeback' mode.
This way, each fsync(fd) will sync fd's data only. That's much better than
the default data-ordered mode, wherein a single fsync() will sync all the
other files' data too. In data=writeback mode it is still the case that
fsync(fd) will sync the other files' metadata, but that's a single linear
write to the journal and the additional cost should be low.

Bottom line: please try data=writeback, let me know.
--- Forwarded Message

Date: Tue, 08 Nov 2005 09:25:54 -0500
From: John Madden [EMAIL PROTECTED]
To: Jure Pečar [EMAIL PROTECTED]
cc: info-cyrus@lists.andrew.cmu.edu
Subject: Re: improving concurrency/performance

> As expected, these are from locking operations. 0x8 is the file
> descriptor, which, if I read lsof output correctly, points to
> config/socket/imap-0.lock (what would that be?) and 0x7 is F_SETLKW,
> which reads as "set lock or wait for it to be released" in the manual
> page.

Yup, that's exactly the sort of thing I was suspecting -- the performance I
was seeing just didn't make sense. imap-0.lock is in /var/imap/socket for
me. I believe it's one of the lock files created when cyrus is started, so
it wouldn't make any sense for imapd to ever be spinning on it.

The delays I was seeing occurred when multiple imapd's were writing to the
spool at the same time. I do see a lot of this though:

  fcntl(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0

It looks like the lock to open a file in the target mailbox. But again,
very low actual throughput and still little or no iowait. However, adding
a -c to the strace, the top three syscalls are:

  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------
   52.68    0.514720        1243       414           fdatasync
   29.87    0.291830         846       345           fsync
    4.19    0.040898          27      1519           fcntl

Makes me wonder why the fsyncs are taking so long since the disk is
performing so well. Anyone know if that's actually typical?

--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
[EMAIL PROTECTED]

--- End of Forwarded Message

--- End of Forwarded Message
Re: improving concurrency/performance
Hi,

I would just request that the tests and comments in this thread be added to
the Cyrus wiki.

Kind regards,
Tarjei

On Mon, 2005-11-07 at 02:46 -0200, Sergio Devojno Bruder wrote:
> David Lang wrote:
> > (..) I was recently doing some testing of lots of small files on the
> > various filesystems, and I ran into a huge difference (8x) depending on
> > what allocator was used for ext*. The default allocator changed between
> > ext2 and ext3 (you can override it with a mount option), and when
> > reading 1M files (10 dirs of 10 dirs of 10 dirs of 1000 1K files) the
> > time to read them went from ~5 min with the old allocator used in ext2
> > to 40 min with the one that's the default for ext3.
>
> (!!) Interesting. You said mount options? The mount man page only shows
> me data=journal, data=ordered, data=writeback, etcetera. How can I change
> that?
>
> --
> Sergio Bruder

--
Tarjei Huse [EMAIL PROTECTED]
Re: improving concurrency/performance
Hi,

Andrew Morgan wrote:
> On Sun, 6 Nov 2005, Michael Loftis wrote:
> > I'd also be VERY interested since our experience was quite the
> > opposite. ReiserFS was the fastest of the three, XFS trailing a dismal
> > third (it also had corruption issues) and ext3 second, or an even more
> > dismal third, depending on whether you ignored its wretched
> > large-directory performance or not. ReiserFS performed solidly and
> > predictably in all tests. The same could not be said for XFS and ext3.
> > This was about 2 yrs ago though.
>
> Make sure that you format ext3 partitions with dir_index, which improves
> large directory performance.

... but decreases read performance in general... at least that is what I
found under RH / Fedora! Look at:

http://www.surfnetters.nl/paul/fs/2/read.png
http://www.surfnetters.nl/paul/fs/tarcopy-read-ext3.png

... to see that reading from ext3 with dir_index enabled takes about 2h15
to read 20 GB of mail data, while...

http://www.surfnetters.nl/paul/fs/read-plainext3-reiserfs.png
http://www.surfnetters.nl/paul/fs/2/read2.png

... without dir_index it takes only 15 minutes!

ReiserFS was a bit slower for me with reads, but faster in writes. ReiserFS
was also predictable in writes, where ext3 was slow(er) on large
directories, but not that dramatically. (I have graphs of that too.)

http://www.surfnetters.nl/paul/fs/2/write.png

BTW: I found that changing dir_index with tune2fs didn't work as expected.
If I disabled the dir_indexes, even after a forced fsck, performance was
still slow. Enabling didn't give predictable results either: I had to
specify it with mkfs. I posted this to the RedHat list once; no-one
replied.

I decided that my tests satisfied my questions about what filesystem to use
under RedHat 4 for our NG mail platform... (ext3: supported, lots of
(coroner) tools, fast enough, available in the stock kernel (if you need
RH), ... and ReiserFS 3 already has a successor, ...)

Paul

P.S. I also compared FreeBSD's UFS2 under 5.3.
I should maybe try again with 6, since that release should have improved
filesystem and disk performance in general. We use FreeBSD now; it's a pity
we're moving to RH for this, but Dell hardware maybe says enough.

P.S. You can look at the graphs, including some comments, at
http://www.surfnetters.nl/paul/fs/2 and http://www.surfnetters.nl/paul/fs
(more rubbish)
Re: improving concurrency/performance
> Have you tried running something like postmark
> http://packages.debian.org/stable/utils/postmark to benchmark your
> filesystem?

The disks are quite fast. bonnie++, for example, shows writes at over
300MB/s. What I'm finding though is that the processes aren't ever pegging
them out -- nothing ever goes into iowait. The bottleneck is elsewhere...

John
--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
[EMAIL PROTECTED]
Re: improving concurrency/performance
On Mon, 7 Nov 2005 09:00:08 -0500 (EST) John Madden [EMAIL PROTECTED]
wrote:
> The disks are quite fast. bonnie++, for example, shows writes at over
> 300MB/s. What I'm finding though is that the processes aren't ever
> pegging them out -- nothing ever goes into iowait. The bottleneck is
> elsewhere...

It's situations like this that DTrace was made for. But on Linux we still
have to use some 'gut feeling' to figure it out...

So you say you have fast disks for bonnie, but still see slow imap copy
operations? What kind of SAN exactly do you have attached? Because fsync()
calls would still be my primary suspect here...

You say you're copying the mail spool from one box to another via imap. Is
the source box able to provide mails at a fast enough rate?

--
Jure Pečar
http://jure.pecar.org/
Re: improving concurrency/performance
On Mon, Nov 07, 2005 at 11:59:39AM +0100, Paul Dekkers wrote:
> > Make sure that you format ext3 partitions with dir_index which improves
> > large directory performance.
>
> ... but decreases read performance in general... at least that is what I
> found under RH / Fedora!

Yes, processing directory entries in the order returned by readdir() is
slower when dir_index is enabled:

https://listman.redhat.com/archives/ext3-users/2004-September/msg00029.html

Actually you'd get a similar slowdown if you create/delete a lot of files
in random order. The speed of readdir() with dir_index on a newly populated
directory should be similar to the speed of readdir() without dir_index on
a heavily used mail folder.

It would be interesting if you could repeat the measurement with
LD_PRELOAD'ing the readdir-sorting library posted in:

https://listman.redhat.com/archives/ext3-users/2004-September/msg00025.html

> BTW: I found that changing dir_index with tune2fs didn't work as
> expected. If I disabled the dir_indexes, even after a forced fsck,
> performance was still slow. Enabling didn't give predictable results
> either: I had to specify it with mkfs.

Disabling dir_index will not reorder existing directories. Again, the
result should be similar to what you'd get after creating/deleting a lot
of files in random order, so that the readdir() order no longer matches
the disk layout.

Also, this read benchmark only models the case when users download every
mail in a folder sequentially (like POP3). It does not tell anything about
the case when users randomly open files (by explicit filename instead of
using readdir()) inside a folder. I think Cyrus uses its own database
instead of doing a readdir() even for POP3, so this benchmark may not even
match POP3 usage.

Gabor
--
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
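The idea behind that readdir-sorting trick is simple to sketch. This is an
illustration of the technique, not the linked library's code: collect the
directory entries, then sort by inode number, which on ext2/ext3 roughly
approximates on-disk allocation order, before opening the files.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

struct entry { char name[256]; ino_t ino; };

static int by_inode(const void *a, const void *b)
{
    ino_t ia = ((const struct entry *)a)->ino;
    ino_t ib = ((const struct entry *)b)->ino;
    return (ia > ib) - (ia < ib);
}

/* Return a malloc'd array of entries sorted by inode; *n gets the count.
 * Dotfiles (including "." and "..") are skipped.  NULL on error. */
struct entry *entries_in_inode_order(const char *dirpath, size_t *n)
{
    DIR *d = opendir(dirpath);
    struct entry *v = NULL;
    struct dirent *de;

    *n = 0;
    if (d == NULL)
        return NULL;
    while ((de = readdir(d)) != NULL) {
        if (de->d_name[0] == '.')
            continue;
        struct entry *grown = realloc(v, (*n + 1) * sizeof *v);
        if (grown == NULL) {
            free(v);
            closedir(d);
            return NULL;
        }
        v = grown;
        snprintf(v[*n].name, sizeof v[*n].name, "%s", de->d_name);
        v[*n].ino = de->d_ino;
        (*n)++;
    }
    closedir(d);
    if (v != NULL)
        qsort(v, *n, sizeof *v, by_inode);   /* disk-order approximation */
    return v;
}
```

Reading a large mail folder in this order turns a scatter of seeks into
something closer to a sequential sweep, which is exactly what the
benchmarks above are sensitive to.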
Re: improving concurrency/performance
> It's situations like this Dtrace was made for. But on linux we still have
> to use some 'gut feeling' to figure it out ...

True. It's that sort of tool that I'm looking for, specifically to look
into concurrency on the skiplist db's, as the system load is so low that it
seems there's got to be a simple explanation for what's going on.

> So you say you have fast disks for bonnie, but still see slow imap copy
> operations? What kind of SAN exactly do you have attached? Because
> fsync() calls would still be my primary suspect here ...

It's an IBM DS-6800.

> You say you're copying mail spool from one box to another via imap. Is
> the source box able to provide mails at a fast enough rate?

Yeah, the load average there dropped to near zero, with no iowait and only
1-3MB/s coming off its disk.

Perhaps it's worth repeating: with a single imapcopy process, the whole
thing goes along pretty quickly, but throughput drops off significantly
with a second process and comes to basically a crawl with just 5 processes
running concurrently. I gambled that I could shorten my migration by
running more than one at a time, since one only seems to raise the load on
the box to 0.80. With 5, I'm only able to get it to around 2.5, and only
briefly, as the throughput starts to drop off.

With a little multi-threaded perl, I wrote a quick benchmark script that
(in parallel) grabs a random user, logs in, selects the INBOX, pulls out
the first message and logs out. I'm able to do about 230 of those per
second, so at least the read performance is more than acceptable. (And the
client box here, a 4-CPU Opteron 850, is definitely the bottleneck anyway.)

John
--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
[EMAIL PROTECTED]
Re: improving concurrency/performance
On Mon, 7 Nov 2005 12:41:03 -0500 (EST) John Madden [EMAIL PROTECTED]
wrote:
> Perhaps it's worth repeating: With a single imapcopy process, the whole
> thing goes along pretty quickly, but drops off significantly with a
> second process and comes to basically a crawl with just 5 processes
> running concurrently. I gambled that I could shorten my migration by
> running more than one at a time since one only seems to raise the load on
> the box to 0.80. With 5, I'm only able to get it to around 2.5 and only
> briefly as the throughput starts to drop off.

That is a start. Try strace -tt on all of the imapd processes running
concurrently and examine which syscalls most of the time is spent in. I
hope that will give you at least a lead...

For example, on my production system I see some suspiciously long pauses at
fcntl64(0x8, 0x7, 0xsomeaddr, 0xsomeotheraddr) calls... let's dig into what
this is.

--
Jure Pečar
http://jure.pecar.org/
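One way to run that across every imapd at once (a sketch; assumes root and
a stock strace/pgrep, and that the daemons are named `imapd`):

```shell
# Attach strace to each running imapd; -tt prints microsecond timestamps,
# -T appends the time spent inside each syscall.  One trace file per PID.
for pid in $(pgrep imapd); do
    strace -tt -T -p "$pid" -o "imapd-$pid.trace" &
done
wait    # Ctrl-C (or kill the straces) once you've captured enough

# Then pick out the suspect calls, sorted by the <duration> field -T adds:
grep -hE 'fsync|fdatasync|fcntl' imapd-*.trace | sort -t'<' -k2 -n | tail
```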
Re: improving concurrency/performance
On Mon, 7 Nov 2005 22:31:42 +0100 Jure Pečar [EMAIL PROTECTED] wrote:
> For example, on my production system I see some suspiciously long pauses
> at fcntl64(0x8, 0x7, 0xsomeaddr, 0xsomeotheraddr) calls... let's dig into
> what this is.

As expected, these are from locking operations. 0x8 is the file descriptor,
which, if I read lsof output correctly, points to config/socket/imap-0.lock
(what would that be?), and 0x7 is F_SETLKW, which reads as "set lock or
wait for it to be released" in the manual page.

I'm sure some cyrus expert (Ken? :) can explain immediately the role of
imap-0.lock and all the locking going on around it... and whether there's
anything we can do to speed it up...

--
Jure Pečar
http://jure.pecar.org/
Re: improving concurrency/performance
Jure Pečar wrote:
> On Sun, 06 Nov 2005 03:58:15 -0200 Sergio Devojno Bruder
> [EMAIL PROTECTED] wrote:
> > In our experience FS-wise, ReiserFS is the worst performer among ext3,
> > XFS and ReiserFS (with tailBLAH turned on or off) for a Cyrus backend
> > (1M mailboxes in 3 partitions per backend, 0.5TB each partition).
>
> Interesting ... can you provide some numbers, even from memory? I always
> thought that reiserfs is best suited for jobs like this. Also, I'm quite
> happy with it, but I haven't done any hard-core scientific measurements.

From memory: 2 backends, same hardware (Xeons), same storage, same number
of mailboxes (approx.). One with ext3 spools, the other with ReiserFS
spools. The ReiserFS one was handling half the simultaneous use of the
ext3 one.

--
Sergio Bruder
Re: improving concurrency/performance
On Sun, 06 Nov 2005 03:58:15 -0200 Sergio Devojno Bruder
[EMAIL PROTECTED] wrote:
> > In our experience FS-wise, ReiserFS is the worst performer among ext3,
> > XFS and ReiserFS (with tailBLAH turned on or off) for a Cyrus backend
> > (1M mailboxes in 3 partitions per backend, 0.5TB each partition).
>
> Interesting ... can you provide some numbers, even from memory? I always
> thought that reiserfs is best suited for jobs like this. Also, I'm quite
> happy with it, but I haven't done any hard-core scientific measurements.

One thing to keep in mind is that while ReiserFS is usually good at
handling a large number of small files, it eats up many more CPU cycles
than other filesystems like ext3 or XFS. So, if you're only running a
benchmark, it may not show up the same way as in a mixed-load test, where
CPU is also being used by other components of the system. At least that's
what showed up in my tests years ago.

Simon
Re: improving concurrency/performance
--On November 6, 2005 12:51:33 PM +0100 Jure Pečar [EMAIL PROTECTED]
wrote:
> On Sun, 06 Nov 2005 03:58:15 -0200 Sergio Devojno Bruder
> [EMAIL PROTECTED] wrote:
> > In our experience FS-wise, ReiserFS is the worst performer among ext3,
> > XFS and ReiserFS (with tailBLAH turned on or off) for a Cyrus backend
> > (1M mailboxes in 3 partitions per backend, 0.5TB each partition).
>
> Interesting ... can you provide some numbers, even from memory?

I'd also be VERY interested, since our experience was quite the opposite.
ReiserFS was the fastest of the three, XFS trailing a dismal third (it also
had corruption issues) and ext3 second, or an even more dismal third,
depending on whether you ignored its wretched large-directory performance
or not. ReiserFS performed solidly and predictably in all tests. The same
could not be said for XFS and ext3. This was about 2 yrs ago, though.

> I always thought that reiserfs is best suited for jobs like this. Also,
> I'm quite happy with it, but I haven't done any hard-core scientific
> measurements.
Re: improving concurrency/performance
On Sun, 6 Nov 2005, Michael Loftis wrote:
> I'd also be VERY interested since our experience was quite the opposite.
> ReiserFS was the fastest of the three, XFS trailing a dismal third (also
> had corruption issues) and ext3 second or even more dismal third,
> depending on if you ignored its wretched large directory performance or
> not. ReiserFS performed solidly and predictably in all tests. The same
> could not be said for XFS and ext3. This was about 2 yrs ago though.

Make sure that you format ext3 partitions with dir_index, which improves
large directory performance. You'll probably also want to increase the
number of inodes. Here is what I used:

  mkfs -t ext3 -j -m 1 -O dir_index /dev/sdb1
  tune2fs -c 0 -i 0 /dev/sdb1

This was on an 800GB Dell/EMC CX500 array.

Andy
Re: improving concurrency/performance
On Sun, 6 Nov 2005 14:20:03 -0800 (PST) Andrew Morgan [EMAIL PROTECTED]
wrote:
>   mkfs -t ext3 -j -m 1 -O dir_index /dev/sdb1
>   tune2fs -c 0 -i 0 /dev/sdb1

What about 1k blocks? I think they'd be more useful than 4k on mail
spools...

--
Jure Pečar
http://jure.pecar.org/
Re: improving concurrency/performance
On Mon, 7 Nov 2005, Jure Pečar wrote: On Sun, 6 Nov 2005 14:20:03 -0800 (PST) Andrew Morgan [EMAIL PROTECTED] wrote: mkfs -t ext3 -j -m 1 -O dir_index /dev/sdb1 tune2fs -c 0 -i 0 /dev/sdb1 What about 1k blocks? I think they'd be more useful than 4k on mail spools ... Maybe; it's a tradeoff between your users' average message size and their message count. Andy
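The tradeoff is easy to quantify: ext* allocates whole blocks, so the slack per message is the rounded-up allocation minus the message size. A quick sketch (the message sizes are made-up examples, not measured spool data):

```shell
# Slack (wasted bytes) per file for a given block size:
# blocks = ceil(size / bs); slack = blocks*bs - size
for size in 600 2000 8000; do
  for bs in 1024 4096; do
    awk -v s="$size" -v b="$bs" 'BEGIN {
      blocks = int((s + b - 1) / b)
      printf "%5d-byte file, %4d-byte blocks: %4d bytes slack\n",
             s, b, blocks * b - s
    }'
  done
done
```

For a spool dominated by sub-kilobyte messages, 1k blocks save real space (424 vs 3496 bytes slack on a 600-byte file); for larger averages the difference shrinks. A tail-packing filesystem like ReiserFS avoids this slack entirely, which is relevant to the comparison elsewhere in this thread.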
Re: improving concurrency/performance
On Mon, 7 Nov 2005, Jure Pečar wrote: On Sun, 6 Nov 2005 14:20:03 -0800 (PST) Andrew Morgan [EMAIL PROTECTED] wrote: mkfs -t ext3 -j -m 1 -O dir_index /dev/sdb1 tune2fs -c 0 -i 0 /dev/sdb1 What about 1k blocks? I think they'd be more useful than 4k on mail spools ... I was recently doing some testing of lots of small files on the various filesystems, and I ran into a huge difference (8x) depending on which allocator was used for ext*. The default allocator changed between ext2 and ext3 (you can override it with a mount option), and when reading 1M files (10 dirs of 10 dirs of 10 dirs of 1000 1K files) the time to read them went from ~5 min with the old allocator used in ext2 to 40 min with the one that's the default for ext3. David Lang -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare
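The allocator David refers to is selected at mount time: the 2.6-era Documentation/filesystems/ext3.txt lists `orlov` (the newer default) and `oldalloc` (the ext2-era behavior) as mount options. A sketch, with the device and mount point as placeholder examples:

```shell
# Revert to the old (ext2-style) block/inode allocator:
mount -t ext3 -o oldalloc /dev/sdb1 /var/spool/imap

# Or persistently, via /etc/fstab:
# /dev/sdb1  /var/spool/imap  ext3  defaults,oldalloc  0 2
```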
Re: improving concurrency/performance
In our experience FS-wise, ReiserFS is the worst performer among ext3, XFS and ReiserFS (with tail packing turned on or off) for a Cyrus backend (1M mailboxes in 3 partitions per backend, 0.5TB per partition). Interesting ... can you provide some numbers, even from memory? I'd also be VERY interested, since our experience was quite the opposite: ReiserFS was the fastest of the three, XFS a dismal third (it also had corruption issues) and ext3 second, or an even more dismal third, depending on whether you ignored its wretched large-directory performance. ReiserFS performed solidly and predictably in all tests; the same could not be said for XFS and ext3. This was about 2 years ago, though. This was also our experience: ReiserFS was the fastest, most stable, and most predictable of the 3. The concept of predictability is an interesting one. Basically we were doing lots of tests, including a bunch of simultaneous load tests (do some Cyrus tests, and at the same time do a bunch of other things that caused lots of IO on the system), and what we found was that while ext3 in particular seemed to jump around a lot performance-wise (it would strangely allocate a lot of IO to Cyrus for a while, then slow to a crawl, then speed up again, etc.), ReiserFS performed very consistently during the entire test. No idea what caused this, but it was an interesting observation. My previous post on the filesystem topic as well... http://permalink.gmane.org/gmane.mail.imap.cyrus/15683 Rob -- [EMAIL PROTECTED] Sign up at http://fastmail.fm for fast, ad free, IMAP accessible email
Re: improving concurrency/performance
Michael Loftis wrote: Interesting ... can you provide some numbers, even from memory? I'd also be VERY interested, since our experience was quite the opposite: ReiserFS was the fastest of the three, XFS a dismal third (it also had corruption issues) and ext3 second, or an even more dismal third, depending on whether you ignored its wretched large-directory performance. ReiserFS performed solidly and predictably in all tests; the same could not be said for XFS and ext3. This was about 2 years ago, though. Our production Cyrus has one difference from stock Cyrus that I almost forgot: we tweaked the directory hash functions. We use a 2-level-deep hash, and that can make a lot of difference, especially when comparing filesystems. We tweaked our hash function specifically to guarantee that in the vast majority of cases our users' directories occupy only one block on ext3 (4k). -- Sergio Devojno Bruder
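Stock Cyrus hashes the spool one directory level deep; a two-level scheme like the one Sergio describes might look like the following sketch. The use of MD5 and the exact path layout are my own illustration, not his actual function:

```shell
# Hypothetical two-level hashed spool path: the first two hex
# digits of an MD5 of the mailbox owner pick the two directory
# levels, spreading users over 16 x 16 = 256 buckets.
user="jsmith"
hash=$(printf '%s' "$user" | md5sum)
h1=$(printf '%s' "$hash" | cut -c1)
h2=$(printf '%s' "$hash" | cut -c2)
path="/var/spool/imap/$h1/$h2/user/$user"
echo "$path"
```

Spreading mailboxes over more, smaller directories is exactly what keeps each directory within a single 4k ext3 block, which in turn keeps directory lookups to one disk read.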
Re: improving concurrency/performance
On Mon, 7 Nov 2005, Sergio Devojno Bruder wrote: David Lang wrote: (..) I was recently doing some testing of lots of small files on the various filesystems, and I ran into a huge difference (8x) depending on which allocator was used for ext*. The default allocator changed between ext2 and ext3 (you can override it with a mount option), and when reading 1M files (10 dirs of 10 dirs of 10 dirs of 1000 1K files) the time to read them went from ~5 min with the old allocator used in ext2 to 40 min with the one that's the default for ext3. David Lang (!!) Interesting. You said mount options? The mount man page only shows me data=journal, data=ordered, data=writeback, etcetera. How can I change that? I found more things listed under /usr/src/linux/Documentation/filesystems; the ext2.txt and ext3.txt files there list all the options available. Note that in my test all the files were created in order; it may be that if the files were created in a random order things would be different, so further testing is warranted. I was testing how long it took to tar/untar these files (with the tarball on a different drive). Here are the notes I made at the time. This was either 2.6.8.1 or 2.6.13.4 (I upgraded about that time, but I'm not sure of the exact timing). Note that on my Cyrus server I actually use XFS with very large folders (20,000 mails in one folder) and it seems lightning fast; I haven't reconciled that observed behavior with the tests listed below. The fact that on ext* filesystems my tests ranged from 5 min to 80 min is somewhat scary. I did make sure to clear memory (by reading a file larger than available RAM and doing a sync) between tests. David Lang
ext2: reading the tarball takes 53 seconds, creating the tar takes 10m, untarring it takes 4 min, copying it between drives on different controllers takes 62 seconds.
XFS: looks bad for small files (13 min to untar, 9:41 to tar), but good for large files (47 sec to read the tarball).
reiserfs: reading the tar 43 sec, 4:50 to tar, 2:06 to untar (it was designed for tiny files and it appears to handle them well). A couple of tests I ran on reiserfs that I hadn't thought to run on the others: untarring on top of an existing directory took 7m, ls -lR took 2:40, ls -flR (unsorted) took 2:40, find . -print took 21 sec, rm -r took 3m.
jfs: 57 sec to read the tarball, 15:30 to untar; no other tests run.
ext3: untar 3:30, read 64 sec, tar 5:46, untarring on top of an existing directory 5:20, ls -lR 53 sec, ls -flR 47 sec, find . -print 7 sec.
ext3 with dir_index enabled: read 36 sec, ls -flR 57 sec, ls -lR 61 sec, find 25 sec, tar 81m!!!
Turning off dir_index and removing the journal (effectively turning it into ext2 again) and mounting noatime, the tar drops to 34 min. Mounting with oldalloc,noatime: untar is 4:45, tar is 5:51.
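David's methodology (build a deep tree of small files, flush the page cache, then time tar/untar) can be reproduced at a smaller scale. This sketch uses 10 dirs of 10 dirs of 100 1 KB files, 1000x smaller than his test, and omits the cache flush between runs, which today would be `sync; echo 3 > /proc/sys/vm/drop_caches` as root (David instead read a file larger than RAM):

```shell
# Build a small tree of 1 KB files (10 x 10 x 100 = 10,000 files):
mkdir -p /tmp/fstest && cd /tmp/fstest
for a in $(seq 0 9); do
  for b in $(seq 0 9); do
    d="tree/$a/$b"
    mkdir -p "$d"
    for f in $(seq 1 100); do
      head -c 1024 /dev/zero > "$d/$f"
    done
  done
done

# Time packing and unpacking the tree (ideally the tarball lives
# on a different drive, as in David's test):
time tar cf tree.tar tree
mv tree tree.orig
time tar xf tree.tar

# Sanity check: the same number of files came back out
find tree -type f | wc -l
```

Run once per filesystem/mount-option combination on a scratch partition to get comparable numbers; without the cache flush, the second and later runs mostly measure the page cache rather than the disk.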
Re: improving concurrency/performance
David Lang wrote: (..) I was recently doing some testing of lots of small files on the various filesystems, and I ran into a huge difference (8x) depending on which allocator was used for ext*. The default allocator changed between ext2 and ext3 (you can override it with a mount option), and when reading 1M files (10 dirs of 10 dirs of 10 dirs of 1000 1K files) the time to read them went from ~5 min with the old allocator used in ext2 to 40 min with the one that's the default for ext3. David Lang (!!) Interesting. You said mount options? The mount man page only shows me data=journal, data=ordered, data=writeback, etcetera. How can I change that? -- Sergio Bruder
Re: improving concurrency/performance
John Madden wrote: I've had great experience with the performance of Cyrus thus far, but I'm testing a migration at the moment (via imapcopy) and I'm having some pretty stinky results. There's no iowait (4 stripes on a 2Gbps SAN), no CPU usage, nothing waiting on the network, and still I'm seeing terrible performance. I assume this points to something internal, such as concurrency on the db files. I've converted everything to skiplist already, I've tweaked ReiserFS's mount options, and what little Berkeley DB is still used appears to be OK (no waiting on locks and such), so I'm at a loss. Is there a general checklist of things to look at? Are there tools to look at the metrics of the skiplist DBs (such as Berkeley's db_stat)? Am I doomed to suffer sub-par performance as long as IMAP writes are happening? Migration's coming on the 24th. I'm now officially sweating. :) Thanks, John In our experience FS-wise, ReiserFS is the worst performer among ext3, XFS and ReiserFS (with tail packing turned on or off) for a Cyrus backend (1M mailboxes in 3 partitions per backend, 0.5TB per partition). -- Sergio Bruder
improving concurrency/performance
I've had great experience with the performance of Cyrus thus far, but I'm testing a migration at the moment (via imapcopy) and I'm having some pretty stinky results. There's no iowait (4 stripes on a 2Gbps SAN), no CPU usage, nothing waiting on the network, and still I'm seeing terrible performance. I assume this points to something internal, such as concurrency on the db files. I've converted everything to skiplist already, I've tweaked ReiserFS's mount options, and what little Berkeley DB is still used appears to be OK (no waiting on locks and such), so I'm at a loss. Is there a general checklist of things to look at? Are there tools to look at the metrics of the skiplist DBs (such as Berkeley's db_stat)? Am I doomed to suffer sub-par performance as long as IMAP writes are happening? Migration's coming on the 24th. I'm now officially sweating. :) Thanks, John -- John Madden UNIX Systems Engineer Ivy Tech Community College of Indiana [EMAIL PROTECTED]
Re: improving concurrency/performance
On 11/4/05, John Madden [EMAIL PROTECTED] wrote: I've had great experience with the performance of Cyrus thus far, but I'm testing a migration at the moment (via imapcopy) and I'm having some pretty stinky results. There's no iowait (4 stripes on a 2Gbps SAN), no CPU usage, nothing waiting on the network, and still I'm seeing terrible performance. I assume this points to something internal, such as concurrency on the db files. Have you tried running something like postmark http://packages.debian.org/stable/utils/postmark to benchmark your filesystem? -- Huaqing Zheng Beer and Code Wrangler at Large
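For anyone who wants to try the suggestion: PostMark is driven by a short command file of `set` directives followed by `run`. This is a hedged sketch only; the parameter values are arbitrary examples (the file-size range in particular should be tuned to match your real message-size distribution), and the scratch location is hypothetical:

```shell
# Hypothetical PostMark run against a scratch directory on the
# mail-spool filesystem; all values are illustrative only.
cat > pm.cfg <<'EOF'
set location /var/spool/imap/stresstest
set number 10000
set size 500 16384
set transactions 50000
run
quit
EOF
postmark pm.cfg
```

Because PostMark creates, reads, appends to, and deletes many small files concurrently, its numbers track a loaded Cyrus spool far more closely than a streaming benchmark like bonnie++ does.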
Re: improving concurrency/performance
How bad is your performance with imapcopy? I've never had 'fast' performance with IMAP. -Patrick On Fri, 4 Nov 2005, John Madden wrote: I've had great experience with the performance of Cyrus thus far, but I'm testing a migration at the moment (via imapcopy) and I'm having some pretty stinky results. There's no iowait (4 stripes on a 2Gbps SAN), no CPU usage, nothing waiting on the network, and still I'm seeing terrible performance. I assume this points to something internal, such as concurrency on the db files. I've converted everything to skiplist already, I've tweaked ReiserFS's mount options, and what little Berkeley DB is still used appears to be OK (no waiting on locks and such), so I'm at a loss. Is there a general checklist of things to look at? Are there tools to look at the metrics of the skiplist DBs (such as Berkeley's db_stat)? Am I doomed to suffer sub-par performance as long as IMAP writes are happening? Migration's coming on the 24th. I'm now officially sweating. :) Thanks, John -- John Madden UNIX Systems Engineer Ivy Tech Community College of Indiana [EMAIL PROTECTED]