[zfs-discuss] UC Davis Cyrus Incident September 2007
Does anyone on this mailing list have an idea what went wrong with ZFS and Cyrus IMAP? Here's an excerpt that explains the problem:

> About a week before classes actually start is when all the kids start moving back into town and mailing all their buds. We saw process numbers go from 500-ish to as high as 5,000. Load would climb radically after passing 2,000 processes and systems became slow to respond.

Here's a suggestion on the cause:

> The root problem seems to be an interaction between Solaris' concept of global memory consistency and the fact that Cyrus spawns many processes that all memory map (mmap) the same file. Whenever any process updates any part of a memory mapped file, Solaris freezes all of the processes that have that file mmaped, updates their memory tables, and then re-schedules the processes to run. When we have problems we see the load average go extremely high and no useful work gets done by Cyrus.

I'm concerned because I'm also using Cyrus IMAP with ZFS. So far, it's been extremely well behaved. Snapshots are one of the best parts of this system.

--
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] UC Davis Cyrus Incident September 2007
On Thu, 2007-10-18 at 08:04 -0500, Gary Mills wrote:
> Here's a suggestion on the cause:
>
> The root problem seems to be an interaction between Solaris' concept of global memory consistency and the fact that Cyrus spawns many processes that all memory map (mmap) the same file. Whenever any process updates any part of a memory mapped file, Solaris freezes all of the processes that have that file mmaped, updates their memory tables, and then re-schedules the processes to run. When we have problems we see the load average go extremely high and no useful work gets done by Cyrus.

that sounds like a somewhat mangled description of the cross-calls done to invalidate the TLB on other processors when a page is unmapped. (it certainly doesn't happen on *every* update to a mapped file).

from grepping the source code it looks like cyrus is both multithreaded and a heavy user of munmap.

- Bill
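For readers unfamiliar with the pattern Bill describes, here is a minimal Python sketch of a process mapping a file, reading it, and unmapping it again. It is not taken from Cyrus (which is written in C); the helper name read_via_mmap and the temporary file are assumptions made purely for illustration. On Unix-like systems, closing the mapping releases it via munmap, and that unmap step is what can require TLB invalidation on other CPUs.

```python
import mmap
import os
import tempfile


def read_via_mmap(path):
    """Map a non-empty file read-only, copy its bytes out, then unmap it.

    Hypothetical helper for illustration only. Mapping an empty file
    with length 0 raises ValueError, so callers must pass a non-empty file.
    """
    with open(path, "rb") as f:
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return mapping[:]  # slicing copies the mapped bytes
        finally:
            mapping.close()  # releases the mapping (munmap on Unix)


if __name__ == "__main__":
    # Repeated map/unmap cycles on one file, the access pattern under discussion.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"mailbox index data")
        os.close(fd)
        for _ in range(3):
            print(read_via_mmap(path))
    finally:
        os.remove(path)
```

A single-threaded script like this will not reproduce the load problem by itself; it only shows where the map and unmap steps sit in the code path.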
Re: [zfs-discuss] UC Davis Cyrus Incident September 2007
On 10/18/07, Bill Sommerfeld wrote:
> that sounds like a somewhat mangled description of the cross-calls done to invalidate the TLB on other processors when a page is unmapped. (it certainly doesn't happen on *every* update to a mapped file).

I've seen systems running Veritas Cluster Oracle Cluster Ready Services idle at about 10% sys due to the huge number of monitoring scripts that kept firing. This was on a 12 - 16 CPU 25k domain. A quite similar configuration on T2000's had negligible overhead.

Lesson learned: cross-calls (and thread migrations, and ...) are much cheaper on systems with lower latency between CPUs.

--
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] UC Davis Cyrus Incident September 2007
On Thu, Oct 18, 2007 at 10:16:52AM -0400, Bill Sommerfeld wrote:
> On Thu, 2007-10-18 at 08:04 -0500, Gary Mills wrote:
> > Here's a suggestion on the cause:
> >
> > The root problem seems to be an interaction between Solaris' concept of global memory consistency and the fact that Cyrus spawns many processes that all memory map (mmap) the same file. Whenever any process updates any part of a memory mapped file, Solaris freezes all of the processes that have that file mmaped, updates their memory tables, and then re-schedules the processes to run. When we have problems we see the load average go extremely high and no useful work gets done by Cyrus.
>
> that sounds like a somewhat mangled description of the cross-calls done to invalidate the TLB on other processors when a page is unmapped. (it certainly doesn't happen on *every* update to a mapped file).
>
> from grepping the source code it looks like cyrus is both multithreaded and a heavy user of munmap.

Here's a process summary of our Cyrus back-end which uses ZFS for its mail store:

10:08am  up 38 day(s), 12:11,  1 user,  load average: 2.36, 2.02, 1.89
%CPU  NUM COMM
 5.2 1788 imapd
 0.7   11 pop3d
 0.6   43 lmtpd
 0.2    2 /usr/local/cyrus/bin/master
 0.1    1 ps
 0.1    1 idled
 0.1    1 fsflush
 0.1    1 /usr/sbin/syslogd
 0.1    1 /opt/local/mysql/libexec/mysqld

The imapd, pop3d, and lmtpd processes are single-threaded. There's one for each client connection. `master' on the other hand is supposed to be multi-threaded, but `prstat' shows only one thread now. There are two because one is the mupdate master and the other is the back-end master.

What's the command to show cross calls?

--
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
Re: [zfs-discuss] UC Davis Cyrus Incident September 2007
> What's the command to show cross calls?

mpstat(1M)

Example output:

$ mpstat 1
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl  usr  sys   wt  idl
  0   16   0    0  416  316  485   16    0    0    0   618    7    3    0   90
  0    6   0    0  425  324  488    2    0    0    0   579    4    2    0   94
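To pull just the xcal column out of output like the sample above, a small parser along these lines could be used. This is only a sketch, assuming whitespace-separated columns and a header line that starts with CPU; xcalls_per_cpu is a hypothetical helper name and is not part of mpstat.

```python
def xcalls_per_cpu(mpstat_text):
    """Return a list of (cpu, xcal) pairs parsed from mpstat output.

    Assumes whitespace-separated columns and a header row beginning
    with "CPU". Rows whose field count differs from the header are
    skipped rather than guessed at.
    """
    rows = []
    header = None
    for line in mpstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            header = fields  # mpstat repeats the header every interval
            continue
        if header is None or len(fields) != len(header):
            continue
        record = dict(zip(header, fields))
        rows.append((int(record["CPU"]), int(record["xcal"])))
    return rows


# Sample taken from the mpstat output quoted in this message.
SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 16 0 0 416 316 485 16 0 0 0 618 7 3 0 90
0 6 0 0 425 324 488 2 0 0 0 579 4 2 0 94
"""

if __name__ == "__main__":
    print(xcalls_per_cpu(SAMPLE))  # [(0, 0), (0, 0)]
```

Skipping rows with an unexpected field count is a deliberate choice here: output that has lost its column spacing cannot be parsed reliably, so it is safer to drop those rows than to misattribute values.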
Re: [zfs-discuss] UC Davis Cyrus Incident September 2007
On 10/18/07, Gary Mills wrote:
> What's the command to show cross calls?

mpstat will show it on a system basis. xcallsbypid.d from the DTraceToolkit (ask google) will tell you which PID is responsible.

--
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] UC Davis Cyrus Incident September 2007
On Thu, 18 Oct 2007, Mike Gerdts wrote:
> On 10/18/07, Bill Sommerfeld wrote:
> > that sounds like a somewhat mangled description of the cross-calls done to invalidate the TLB on other processors when a page is unmapped. (it certainly doesn't happen on *every* update to a mapped file).
>
> I've seen systems running Veritas Cluster Oracle Cluster Ready Services idle at about 10% sys due to the huge number of monitoring scripts that kept firing. This was on a 12 - 16 CPU 25k domain. A

Monitoring scripts and mmap users ... URGH :(

That runs into procfs' notorious keenness on locking the address spaces of inspected processes. Even as much as an ls -l /proc/PID/ is acquiring address space locks on that process, and I can see how/why this leads to CPU spikes when you have an application that heavily uses mmap()/munmap().

One could say, if you want this workload to perform well, trust it to perform well and restrain the urge to watch it all the time ...

> quite similar configuration on T2000's had negligible overhead. Lesson learned: cross-calls (and thread migrations, and ...) are much cheaper on systems with lower latency between CPUs.

And quantum theory tells us: If you hadn't looked, that cat might still be living happily ever after ...

/proc isn't for free.

FrankH.

> --
> Mike Gerdts
> http://mgerdts.blogspot.com/
Re: [zfs-discuss] UC Davis Cyrus Incident September 2007
Gary Mills wrote:
> What's the command to show cross calls?

mpstat

--
Michael Schuster    Sun Microsystems, Inc.
recursion, n: see 'recursion'
Re: [zfs-discuss] UC Davis Cyrus Incident September 2007
On Thu, Oct 18, 2007 at 10:32:58AM -0500, Mike Gerdts wrote:
> On 10/18/07, Gary Mills wrote:
> > What's the command to show cross calls?
>
> mpstat will show it on a system basis.

Thanks. This is on our T2000 Cyrus IMAP server with ZFS. It's the second listing from `mpstat 5'. How do I recognize when there are too many cross calls?

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 29 0 2591 779 139 10378 78 2141 7852 10 0 87
1 12 0 1225 1750 2501 63 950 2572 2 0 96
22 0 427830 1280 35 590 1240 1 0 99
33 0 432660 1210 32 440 1541 1 0 98
4 13 0 1231 1590 3380 20 830 2811 3 0 96
5 11 0 991 1642 3371 22 690 2991 2 0 97
63 0 512 1010 2050 18 510 1850 4 0 96
78 0 598 1190 2460 20 680 2111 1 0 97
89 0 947 1190 2451 20 520 2641 1 0 98
92 0 258 1100 2290 19 580770 4 0 95
10 13 0 2083 1530 3110 20 710 1510 4 0 96
11 16 0 1434 1190 2621 21 530 3851 2 0 97
12 20 0 1093 1450 3081 21 750 3811 2 0 97
13 12 0 2163 1540 3182 23 680 2111 2 0 97
14 22 0 2955 1260 2691 22 490 6662 3 0 95
154 0 313 1320 2731 20 550 1270 1 0 99
160 0 256 2870 5820 20 600530 2 0 98
170 0 138 1450 2950 20 540 1160 1 0 98
181 0 806 2830 5741 19 620 1500 3 0 96
197 0 2290 2347 2181 3472 23 1050 1691 7 0 92
201 0 671 605 496 2260 18 610 1400 2 0 98
217 0 1205 128 26 2030 16 510 1521 7 0 93
22 15 0 1045 1070 2181 18 560 2430 2 0 98
23 27 0 2887 1410 3062 19 530 5943 2 0 95
24 55 3 991 1280 2721 22 560 3881 2 0 97
25 21 0 1857 1240 2641 19 590 4611 2 0 97
26 16 0 835 1720 3580 19 930 2671 2 0 97
27 10 0 1132 1830 3831 20 660 2891 3 0 96
28 14 0 2761 1030 2251 20 530 3791 3 0 95
295 0 618990 2120 19 560 1970 3 0 97
30 16 0 538870 1781 18 470 1851 1 0 98
310 0 661 11040 23193 24 2060780 8 0 91

--
-Gary Mills--Unix Support--U of M Academic Computing and Networking-