[zfs-discuss] UC Davis Cyrus Incident September 2007

2007-10-18 Thread Gary Mills
Does anyone on this mailing list have an idea what went wrong with
ZFS and Cyrus IMAP?  Here's an excerpt that explains the problem:

  About a week before classes actually start is when all the kids start
  moving back into town and mailing all their buds.  We saw process
  numbers go from 500-ish to as high as 5,000.  Load would climb
  radically after passing 2,000 processes and systems became slow to
  respond.

Here's a suggestion on the cause:

  The root problem seems to be an interaction between Solaris' concept
  of global memory consistency and the fact that Cyrus spawns many
  processes that all memory map (mmap) the same file.  Whenever any
  process updates any part of a memory mapped file, Solaris freezes all
  of the processes that have that file mmaped, updates their memory
  tables, and then re-schedules the processes to run.  When we have
  problems we see the load average go extremely high and no useful work
  gets done by Cyrus.

I'm concerned because I'm also using Cyrus IMAP with ZFS.  So far,
it's been extremely well behaved.  Snapshots are one of the best parts of
this system.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] UC Davis Cyrus Incident September 2007

2007-10-18 Thread Bill Sommerfeld
On Thu, 2007-10-18 at 08:04 -0500, Gary Mills wrote:
> Here's a suggestion on the cause:
>
>   The root problem seems to be an interaction between Solaris' concept
>   of global memory consistency and the fact that Cyrus spawns many
>   processes that all memory map (mmap) the same file.  Whenever any
>   process updates any part of a memory mapped file, Solaris freezes all
>   of the processes that have that file mmaped, updates their memory
>   tables, and then re-schedules the processes to run.  When we have
>   problems we see the load average go extremely high and no useful work
>   gets done by Cyrus.

that sounds like a somewhat mangled description of the cross-calls done
to invalidate the TLB on other processors when a page is unmapped.
(it certainly doesn't happen on *every* update to a mapped file).

from grepping the source code it looks like cyrus is both multithreaded
and a heavy user of munmap.  

- Bill
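
For readers unfamiliar with the pattern described above, here is a minimal
Python sketch of a process that repeatedly maps and unmaps one file.  It is
an illustration only, not Cyrus code: the names `map_read_unmap` and `demo`,
the temporary file, and the cycle count are all hypothetical.  Per the
description above, the unmap step is the one that can require TLB
invalidation on other processors.

```python
import mmap
import os
import tempfile


def map_read_unmap(path, cycles):
    """Map the file, touch its first byte, and unmap it, `cycles` times.

    Returns the number of completed map/unmap cycles.
    """
    done = 0
    with open(path, "rb") as f:
        for _ in range(cycles):
            # Map the whole file read-only (length 0 means "entire file").
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            _ = m[0]      # touch the first page so it is really mapped in
            m.close()     # release the mapping (munmap() on Unix)
            done += 1
    return done


def demo():
    # Hypothetical stand-in for a shared database file.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 8192)
        os.close(fd)
        return map_read_unmap(path, 100)
    finally:
        os.remove(path)
```

Running `demo()` while watching `mpstat` in another terminal is one way to
see whether such a loop shows up in the xcal column on a given machine.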




Re: [zfs-discuss] UC Davis Cyrus Incident September 2007

2007-10-18 Thread Mike Gerdts
On 10/18/07, Bill Sommerfeld [EMAIL PROTECTED] wrote:
> that sounds like a somewhat mangled description of the cross-calls done
> to invalidate the TLB on other processors when a page is unmapped.
> (it certainly doesn't happen on *every* update to a mapped file).

I've seen systems running Veritas Cluster and Oracle Cluster Ready
Services idle at about 10% sys due to the huge number of monitoring
scripts that kept firing.  This was on a 12 - 16 CPU 25k domain.  A
quite similar configuration on T2000's had negligible overhead.
Lesson learned: cross-calls (and thread migrations, and ...) are much
cheaper on systems with lower latency between CPUs.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] UC Davis Cyrus Incident September 2007

2007-10-18 Thread Gary Mills
On Thu, Oct 18, 2007 at 10:16:52AM -0400, Bill Sommerfeld wrote:
> On Thu, 2007-10-18 at 08:04 -0500, Gary Mills wrote:
> > Here's a suggestion on the cause:
> >
> >   The root problem seems to be an interaction between Solaris' concept
> >   of global memory consistency and the fact that Cyrus spawns many
> >   processes that all memory map (mmap) the same file.  Whenever any
> >   process updates any part of a memory mapped file, Solaris freezes all
> >   of the processes that have that file mmaped, updates their memory
> >   tables, and then re-schedules the processes to run.  When we have
> >   problems we see the load average go extremely high and no useful work
> >   gets done by Cyrus.
>
> that sounds like a somewhat mangled description of the cross-calls done
> to invalidate the TLB on other processors when a page is unmapped.
> (it certainly doesn't happen on *every* update to a mapped file).
>
> from grepping the source code it looks like cyrus is both multithreaded
> and a heavy user of munmap.

Here's a process summary of our Cyrus back-end which uses ZFS for its
mail store:

   10:08am  up 38 day(s), 12:11,  1 user,  load average: 2.36, 2.02, 1.89
  %CPU  NUM COMM
   5.2 1788 imapd
   0.7   11 pop3d
   0.6   43 lmtpd
   0.2    2 /usr/local/cyrus/bin/master
   0.1    1 ps
   0.1    1 idled
   0.1    1 fsflush
   0.1    1 /usr/sbin/syslogd
   0.1    1 /opt/local/mysql/libexec/mysqld

The imapd, pop3d, and lmtpd processes are single-threaded.  There's
one for each client connection.  `master' on the other hand is
supposed to be multi-threaded, but `prstat' shows only one thread now.
There are two because one is the mupdate master and the other is the
back-end master.

What's the command to show cross calls?

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-


Re: [zfs-discuss] UC Davis Cyrus Incident September 2007

2007-10-18 Thread Pramod Batni


> What's the command to show cross calls?


mpstat(1M)

Example output:
$ mpstat 1
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 16 0 0 416 316 485 16 0 0 0 618 7 3 0 90
0 6 0 0 425 324 488 2 0 0 0 579 4 2 0 94
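
To read the xcal column programmatically rather than by eye, a short Python
sketch such as the following could parse captured mpstat output.  The
function names `parse_mpstat` and `total_xcalls` are hypothetical, and the
sketch assumes whitespace-separated integer columns under a header line
starting with "CPU", as in the sample above.

```python
def parse_mpstat(text):
    """Parse mpstat output into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            # Header line; mpstat repeats it before each report.
            header = fields
            continue
        if header and len(fields) == len(header):
            rows.append({name: int(value)
                         for name, value in zip(header, fields)})
    return rows


def total_xcalls(rows):
    """Sum the cross-call (xcal) column over all parsed rows."""
    return sum(row["xcal"] for row in rows)


# The two sample rows from the mpstat output above.
SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 16 0 0 416 316 485 16 0 0 0 618 7 3 0 90
0 6 0 0 425 324 488 2 0 0 0 579 4 2 0 94
"""
```

Rows whose field count does not match the header are skipped, so a report
with fused or missing columns is ignored rather than misread.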



Re: [zfs-discuss] UC Davis Cyrus Incident September 2007

2007-10-18 Thread Mike Gerdts
On 10/18/07, Gary Mills [EMAIL PROTECTED] wrote:
> What's the command to show cross calls?

mpstat will show it on a system basis.

xcallsbypid.d from the DTraceToolkit (ask google) will tell you which
PID is responsible.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] UC Davis Cyrus Incident September 2007

2007-10-18 Thread Frank Hofmann
On Thu, 18 Oct 2007, Mike Gerdts wrote:

> On 10/18/07, Bill Sommerfeld [EMAIL PROTECTED] wrote:
> > that sounds like a somewhat mangled description of the cross-calls done
> > to invalidate the TLB on other processors when a page is unmapped.
> > (it certainly doesn't happen on *every* update to a mapped file).
>
> I've seen systems running Veritas Cluster and Oracle Cluster Ready
> Services idle at about 10% sys due to the huge number of monitoring
> scripts that kept firing.  This was on a 12 - 16 CPU 25k domain.  A

Monitoring scripts and mmap users ... URGH :(

That runs into procfs' notorious keenness on locking the address spaces of 
inspected processes. Even as much as an ls -l /proc/PID/ is acquiring 
address space locks on that process, and I can see how/why this leads to 
CPU spikes when you have an application that heavily uses mmap()/munmap().

One could say, if you want this workload to perform well, trust it to 
perform well and restrain the urge to watch it all the time ...

> quite similar configuration on T2000's had negligible overhead.
> Lesson learned: cross-calls (and thread migrations, and ...) are much
> cheaper on systems with lower latency between CPUs.

And quantum theory tells us: If you hadn't looked, that cat might still be 
living happily ever after ... /proc isn't for free.

FrankH.




Re: [zfs-discuss] UC Davis Cyrus Incident September 2007

2007-10-18 Thread michael schuster
Gary Mills wrote:

> What's the command to show cross calls?

mpstat

-- 
Michael Schuster        Sun Microsystems, Inc.
recursion, n: see 'recursion'


Re: [zfs-discuss] UC Davis Cyrus Incident September 2007

2007-10-18 Thread Gary Mills
On Thu, Oct 18, 2007 at 10:32:58AM -0500, Mike Gerdts wrote:
> On 10/18/07, Gary Mills [EMAIL PROTECTED] wrote:
> > What's the command to show cross calls?
>
> mpstat will show it on a system basis.

Thanks.  This is on our T2000 Cyrus IMAP server with ZFS.  It's
the second listing from `mpstat 5'.  How do I recognize when there
are too many cross calls?

  CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
    0   29   0 2591   779  139 1037    8   78  214    1   785    2  10   0  87
    1   12   0 1225   175    0  250    1   63   95    0   257    2   2   0  96
    2    2   0  427    83    0  128    0   35   59    0   124    0   1   0  99
    3    3   0  432    66    0  121    0   32   44    0   154    1   1   0  98
    4   13   0 1231   159    0  338    0   20   83    0   281    1   3   0  96
    5   11   0  991   164    2  337    1   22   69    0   299    1   2   0  97
    6    3   0  512   101    0  205    0   18   51    0   185    0   4   0  96
    7    8   0  598   119    0  246    0   20   68    0   211    1   1   0  97
    8    9   0  947   119    0  245    1   20   52    0   264    1   1   0  98
    9    2   0  258   110    0  229    0   19   58    0    77    0   4   0  95
   10   13   0 2083   153    0  311    0   20   71    0   151    0   4   0  96
   11   16   0 1434   119    0  262    1   21   53    0   385    1   2   0  97
   12   20   0 1093   145    0  308    1   21   75    0   381    1   2   0  97
   13   12   0 2163   154    0  318    2   23   68    0   211    1   2   0  97
   14   22   0 2955   126    0  269    1   22   49    0   666    2   3   0  95
   15    4   0  313   132    0  273    1   20   55    0   127    0   1   0  99
   16    0   0  256   287    0  582    0   20   60    0    53    0   2   0  98
   17    0   0  138   145    0  295    0   20   54    0   116    0   1   0  98
   18    1   0  806   283    0  574    1   19   62    0   150    0   3   0  96
   19    7   0 2290  2347 2181  347    2   23  105    0   169    1   7   0  92
   20    1   0  671   605  496  226    0   18   61    0   140    0   2   0  98
   21    7   0 1205   128   26  203    0   16   51    0   152    1   7   0  93
   22   15   0 1045   107    0  218    1   18   56    0   243    0   2   0  98
   23   27   0 2887   141    0  306    2   19   53    0   594    3   2   0  95
   24   55   3  991   128    0  272    1   22   56    0   388    1   2   0  97
   25   21   0 1857   124    0  264    1   19   59    0   461    1   2   0  97
   26   16   0  835   172    0  358    0   19   93    0   267    1   2   0  97
   27   10   0 1132   183    0  383    1   20   66    0   289    1   3   0  96
   28   14   0 2761   103    0  225    1   20   53    0   379    1   3   0  95
   29    5   0  618    99    0  212    0   19   56    0   197    0   3   0  97
   30   16   0  538    87    0  178    1   18   47    0   185    1   1   0  98
   31    0   0  661  1104    0 2319    3   24  206    0    78    0   8   0  91

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-