[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

2012-06-06 Thread Sašo Kiselkov
So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
LSI 9200 controllers and MPxIO) running OpenIndiana 151a4 and I'm
occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100,000
xcalls a second). The machine is pretty much idle, only receiving a
bunch of multicast video streams and dumping them to the drives (at a
rate of ~40MB/s). At an interval of roughly 1-2 minutes I get a storm of
xcalls that completely eat one of the CPUs, so the mpstat line for the
CPU looks like:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
 31    0   0 102191 1000    0    0    0    0    0    0     0    0 100   0   0

100% busy in the system processing cross-calls. When I tried dtracing
this issue, I found that this is the most likely culprit:

dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'
   unix`xc_call+0x46
   unix`hat_tlb_inval+0x283
   unix`x86pte_inval+0xaa
   unix`hat_pte_unmap+0xed
   unix`hat_unload_callback+0x193
   unix`hat_unload+0x41
   unix`segkmem_free_vn+0x6f
   unix`segkmem_zio_free+0x27
   genunix`vmem_xfree+0x104
   genunix`vmem_free+0x29
   genunix`kmem_slab_destroy+0x87
   genunix`kmem_slab_free+0x2bb
   genunix`kmem_magazine_destroy+0x39a
   genunix`kmem_depot_ws_reap+0x66
   genunix`taskq_thread+0x285
   unix`thread_start+0x8
3221701

This happens in the sched (pid 0) process. My fsstat output looks like this:

# fsstat /content 1
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0   664     0    952     0     0     0   664 38.0M /content
    0     0     0   658     0    935     0     0     0   656 38.6M /content
    0     0     0   660     0    946     0     0     0   659 37.8M /content
    0     0     0   677     0    969     0     0     0   676 38.5M /content

What's even more puzzling is that this apparently happens entirely
due to some factor other than userland: I see no change in the CPU
usage of any process in prstat(1M) when this xcall storm happens, only
a loadavg increase of +1.00 (the busy CPU).
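
For completeness, here's roughly how I've been watching where the
xcalls land (a quick sketch of mine using the stock sysinfo provider;
the tick interval is arbitrary):

dtrace -n 'sysinfo:::xcalls { @[cpu] = count(); } tick-1sec { printa(@); trunc(@); }'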

I Googled and found that
http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
seems to have been an issue identical to mine; however, it was left
unresolved at the time, which worries me about putting this kind of
machine into production use.

Could some ZFS guru please tell me what's going on in segkmem_zio_free?
When I disable the writers to the /content filesystem, this issue goes
away, so it obviously has something to do with disk I/O. Thanks!
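
In case the sizes matter, I can also quantize what's being freed
(a sketch, assuming segkmem_zio_free's third argument is the buffer
size):

dtrace -n 'fbt::segkmem_zio_free:entry { @["free size"] = quantize(arg2); }'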

Cheers,
--
Saso


Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

2012-06-06 Thread Richard Elling
On Jun 6, 2012, at 12:48 AM, Sašo Kiselkov wrote:

 So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
 to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
 LSI 9200 controllers and MPxIO) running OpenIndiana 151a4 and I'm
 occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100,000
 xcalls a second).

That isn't much of a storm, I've seen >1M xcalls in some cases...

 The machine is pretty much idle, only receiving a
 bunch of multicast video streams and dumping them to the drives (at a
 rate of ~40MB/s). At an interval of roughly 1-2 minutes I get a storm of
 xcalls that completely eat one of the CPUs, so the mpstat line for the
 CPU looks like:
 
 CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  31    0   0 102191 1000    0    0    0    0    0    0     0    0 100   0   0
 
 100% busy in the system processing cross-calls. When I tried dtracing
 this issue, I found that this is the most likely culprit:
 
 dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'
   unix`xc_call+0x46
   unix`hat_tlb_inval+0x283
   unix`x86pte_inval+0xaa
   unix`hat_pte_unmap+0xed
   unix`hat_unload_callback+0x193
   unix`hat_unload+0x41
   unix`segkmem_free_vn+0x6f
   unix`segkmem_zio_free+0x27
   genunix`vmem_xfree+0x104
   genunix`vmem_free+0x29
   genunix`kmem_slab_destroy+0x87
   genunix`kmem_slab_free+0x2bb
   genunix`kmem_magazine_destroy+0x39a
   genunix`kmem_depot_ws_reap+0x66
   genunix`taskq_thread+0x285
   unix`thread_start+0x8
 3221701
 
 This happens in the sched (pid 0) process. My fsstat output looks like this:
 
 # fsstat /content 1
  new  name   name  attr  attr lookup rddir  read read  write write
  file remov  chng   get   set    ops   ops   ops bytes   ops bytes
     0     0     0   664     0    952     0     0     0   664 38.0M /content
     0     0     0   658     0    935     0     0     0   656 38.6M /content
     0     0     0   660     0    946     0     0     0   659 37.8M /content
     0     0     0   677     0    969     0     0     0   676 38.5M /content
 
 What's even more puzzling is that this apparently happens entirely
 due to some factor other than userland: I see no change in the CPU
 usage of any process in prstat(1M) when this xcall storm happens, only
 a loadavg increase of +1.00 (the busy CPU).

What exactly is the workload doing?

Local I/O, iSCSI, NFS, or CIFS?

 I Googled and found that
 http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
 seems to have been an issue identical to mine; however, it was left
 unresolved at the time, which worries me about putting this kind of
 machine into production use.
 
 Could some ZFS guru please tell me what's going on in segkmem_zio_free?

It is freeing memory.

 When I disable the writers to the /content filesystem, this issue goes
 away, so it obviously has something to do with disk I/O. Thanks!

Not directly related to disk I/O bandwidth. Can be directly related to other
use, such as deletions -- something that causes frees.

Depending on the cause, there can be some tuning that applies for large
memory machines, where large is >= 96 GB.
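
Before tuning, it's worth confirming whether the ARC is actually
shrinking when the storms hit; a quick check against the stock
arcstats kstats would be something like this (the 5 is just a
sampling interval):

kstat -p zfs:0:arcstats:size zfs:0:arcstats:c 5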
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422


Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

2012-06-06 Thread Sašo Kiselkov
On 06/06/2012 04:55 PM, Richard Elling wrote:
 On Jun 6, 2012, at 12:48 AM, Sašo Kiselkov wrote:
 
 So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
 to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
 LSI 9200 controllers and MPxIO) running OpenIndiana 151a4 and I'm
 occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100,000
 xcalls a second).
 
 That isn't much of a storm, I've seen >1M xcalls in some cases...

Well, it does make one of the cores 100% busy for around 10-15 seconds,
so it is processing at the maximum rate the core can manage. I'd call
that a sign of something bad(tm) going on.

 The machine is pretty much idle, only receiving a
 bunch of multicast video streams and dumping them to the drives (at a
 rate of ~40MB/s). At an interval of roughly 1-2 minutes I get a storm of
 xcalls that completely eat one of the CPUs, so the mpstat line for the
 CPU looks like:

 CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  31    0   0 102191 1000    0    0    0    0    0    0     0    0 100   0   0

 100% busy in the system processing cross-calls. When I tried dtracing
 this issue, I found that this is the most likely culprit:

 dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'
   unix`xc_call+0x46
   unix`hat_tlb_inval+0x283
   unix`x86pte_inval+0xaa
   unix`hat_pte_unmap+0xed
   unix`hat_unload_callback+0x193
   unix`hat_unload+0x41
   unix`segkmem_free_vn+0x6f
   unix`segkmem_zio_free+0x27
   genunix`vmem_xfree+0x104
   genunix`vmem_free+0x29
   genunix`kmem_slab_destroy+0x87
   genunix`kmem_slab_free+0x2bb
   genunix`kmem_magazine_destroy+0x39a
   genunix`kmem_depot_ws_reap+0x66
   genunix`taskq_thread+0x285
   unix`thread_start+0x8
 3221701

 This happens in the sched (pid 0) process. My fsstat output looks like this:
 
 # fsstat /content 1
  new  name   name  attr  attr lookup rddir  read read  write write
  file remov  chng   get   set    ops   ops   ops bytes   ops bytes
     0     0     0   664     0    952     0     0     0   664 38.0M /content
     0     0     0   658     0    935     0     0     0   656 38.6M /content
     0     0     0   660     0    946     0     0     0   659 37.8M /content
     0     0     0   677     0    969     0     0     0   676 38.5M /content

 What's even more puzzling is that this apparently happens entirely
 due to some factor other than userland: I see no change in the CPU
 usage of any process in prstat(1M) when this xcall storm happens, only
 a loadavg increase of +1.00 (the busy CPU).
 
 What exactly is the workload doing?

As I wrote above, just receiving multicast video streams and writing
them to disk files, nothing else. The fsstat lines above show that -
pure write load.

 Local I/O, iSCSI, NFS, or CIFS?

Purely local I/O via the two LSI SAS controllers, nothing else.

 I Googled and found that
 http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
 seems to have been an issue identical to mine; however, it was left
 unresolved at the time, which worries me about putting this kind of
 machine into production use.

 Could some ZFS guru please tell me what's going on in segkmem_zio_free?
 
 It is freeing memory.

Yes, but why is this causing a ton of cross-calls?

 When I disable the writers to the /content filesystem, this issue goes
 away, so it obviously has something to do with disk I/O. Thanks!
 
 Not directly related to disk I/O bandwidth. Can be directly related to other
 use, such as deletions -- something that causes frees.

When I'm not writing to disk it doesn't happen, so my guess is that it
indeed has something to do with (perhaps) ZFS freeing txg buffers or
something...

 Depending on the cause, there can be some tuning that applies for large
 memory machines, where large is >= 96 GB.
  -- richard

I'll try and load the machine with dd(1) to the max to see if access
patterns of my software have something to do with it.

Cheers,
--
Saso


Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

2012-06-06 Thread Sašo Kiselkov
On 06/06/2012 05:01 PM, Sašo Kiselkov wrote:
 I'll try and load the machine with dd(1) to the max to see if access
 patterns of my software have something to do with it.

Tried and tested: any and all write I/O to the pool causes this xcall
storm issue, and writing more data to it only exacerbates it (i.e. it
occurs more often). I still get storms of over 100k xcalls completely
draining one CPU core, but now they happen at 20-30s intervals rather
than every 1-2 minutes. Writing to the rpool, however, does not, so I
suspect it has something to do with MPxIO and how ZFS is pumping data
into the twin LSI 9200 controllers. Each controller is attached to a
different CPU I/O bridge (since the system has two Opterons, it has two
I/O bridges, each handling roughly half of the PCI-e links). I did this
in the hope of improving performance (since the HT links to the I/O
bridges would be more evenly loaded). Any idea whether this might be
the cause of the issue?

The whole system diagram is:

CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)--,
 |                                              \
(ht)                                             == JBOD
 |                                              /
CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)--'
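
For the record, the dd load was nothing fancier than something along
these lines (paths and sizes illustrative, not the exact commands):

dd if=/dev/zero of=/content/ddtest bs=1024k count=10240
dd if=/dev/zero of=/rpool/ddtest bs=1024k count=10240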

Cheers,
--
Saso


Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

2012-06-06 Thread Richard Elling
On Jun 6, 2012, at 8:01 AM, Sašo Kiselkov wrote:

 On 06/06/2012 04:55 PM, Richard Elling wrote:
 On Jun 6, 2012, at 12:48 AM, Sašo Kiselkov wrote:
 
 So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
 to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
 LSI 9200 controllers and MPxIO) running OpenIndiana 151a4 and I'm
 occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100,000
 xcalls a second).
 
 That isn't much of a storm, I've seen >1M xcalls in some cases...
 
 Well, it does make one of the cores 100% busy for around 10-15 seconds,
 so it is processing at the maximum rate the core can manage. I'd call
 that a sign of something bad(tm) going on.

It is unpleasant, for sure...

 The machine is pretty much idle, only receiving a
 bunch of multicast video streams and dumping them to the drives (at a
 rate of ~40MB/s). At an interval of roughly 1-2 minutes I get a storm of
 xcalls that completely eat one of the CPUs, so the mpstat line for the
 CPU looks like:
 
 CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  31    0   0 102191 1000    0    0    0    0    0    0     0    0 100   0   0
 
 100% busy in the system processing cross-calls. When I tried dtracing
 this issue, I found that this is the most likely culprit:
 
 dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'
  unix`xc_call+0x46
  unix`hat_tlb_inval+0x283
  unix`x86pte_inval+0xaa
  unix`hat_pte_unmap+0xed
  unix`hat_unload_callback+0x193
  unix`hat_unload+0x41
  unix`segkmem_free_vn+0x6f
  unix`segkmem_zio_free+0x27
  genunix`vmem_xfree+0x104
  genunix`vmem_free+0x29
  genunix`kmem_slab_destroy+0x87
  genunix`kmem_slab_free+0x2bb
  genunix`kmem_magazine_destroy+0x39a
  genunix`kmem_depot_ws_reap+0x66
  genunix`taskq_thread+0x285
  unix`thread_start+0x8
 3221701
 
 This happens in the sched (pid 0) process. My fsstat output looks like this:
 
 # fsstat /content 1
  new  name   name  attr  attr lookup rddir  read read  write write
  file remov  chng   get   set    ops   ops   ops bytes   ops bytes
     0     0     0   664     0    952     0     0     0   664 38.0M /content
     0     0     0   658     0    935     0     0     0   656 38.6M /content
     0     0     0   660     0    946     0     0     0   659 37.8M /content
     0     0     0   677     0    969     0     0     0   676 38.5M /content
 
 What's even more puzzling is that this apparently happens entirely
 due to some factor other than userland: I see no change in the CPU
 usage of any process in prstat(1M) when this xcall storm happens, only
 a loadavg increase of +1.00 (the busy CPU).
 
 What exactly is the workload doing?
 
 As I wrote above, just receiving multicast video streams and writing
 them to disk files, nothing else. The fsstat lines above show that -
 pure write load.
 
 Local I/O, iSCSI, NFS, or CIFS?
 
 Purely local I/O via the two LSI SAS controllers, nothing else.
 
 I Googled and found that
 http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
 seems to have been an issue identical to mine; however, it was left
 unresolved at the time, which worries me about putting this kind of
 machine into production use.
 
 Could some ZFS guru please tell me what's going on in segkmem_zio_free?
 
 It is freeing memory.
 
 Yes, but why is this causing a ton of cross-calls?

To better understand xcalls, see Adam's excellent blog on the topic
https://blogs.oracle.com/ahl/entry/debugging_cross_calls_on_opensolaris

The scenario described there probably isn't your problem, but the
explanation of xcalls is quite good.

 When I disable the writers to the /content filesystem, this issue goes
 away, so it obviously has something to do with disk I/O. Thanks!
 
 Not directly related to disk I/O bandwidth. Can be directly related to other
 use, such as deletions -- something that causes frees.
 
 When I'm not writing to disk it doesn't happen, so my guess is that it
 indeed has something to do with (perhaps) ZFS freeing txg buffers or
 something...
 
 Depending on the cause, there can be some tuning that applies for large
 memory machines, where large is >= 96 GB.
 -- richard
 
 I'll try and load the machine with dd(1) to the max to see if access
 patterns of my software have something to do with it.

OK, next step is to determine if this is part of the normal ARC resizing or not.
If it is, then tuning can help. I've got a few scripts to do this that
aren't posted. Fortunately, Brendan posted his, so look for the
arcreap.d script at
http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/

If you are seeing lots of xcall activity related to arc_shrink, then this can 
be tuned.
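
To correlate, a sketch (assuming the stock fbt provider can probe
arc_shrink; the 10s tick is arbitrary):

dtrace -n 'fbt::arc_shrink:entry { @ = count(); } tick-10s { printa(@); trunc(@); }'

If arc_shrink turns out to be the trigger, its granularity can be
adjusted via /etc/system; the value below is purely illustrative, not
a recommendation:

set zfs:arc_shrink_shift = 11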
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422


Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

2012-06-06 Thread Richard Elling

On Jun 6, 2012, at 8:22 AM, Sašo Kiselkov wrote:

 On 06/06/2012 05:01 PM, Sašo Kiselkov wrote:
 I'll try and load the machine with dd(1) to the max to see if access
 patterns of my software have something to do with it.
 
 Tried and tested: any and all write I/O to the pool causes this xcall
 storm issue, and writing more data to it only exacerbates it (i.e. it
 occurs more often). I still get storms of over 100k xcalls completely
 draining one CPU core, but now they happen at 20-30s intervals rather
 than every 1-2 minutes. Writing to the rpool, however, does not, so I
 suspect it has something to do with MPxIO and how ZFS is pumping data
 into the twin LSI 9200 controllers. Each controller is attached to a
 different CPU I/O bridge (since the system has two Opterons, it has two
 I/O bridges, each handling roughly half of the PCI-e links). I did this
 in the hope of improving performance (since the HT links to the I/O
 bridges would be more evenly loaded). Any idea whether this might be
 the cause of the issue?

No, this is definitely not the cause of the issue. Looks like a
reasonable config to me.
  -- richard

 
 The whole system diagram is:
 
 CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)--,
  |                                              \
 (ht)                                             == JBOD
  |                                              /
 CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)--'
 
 Cheers,
 --
 Saso

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422


Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

2012-06-06 Thread Jim Mauro

I can't help but be curious about something, which perhaps you verified but
did not post.

What the data here shows is:
- CPU 31 is buried in the kernel (100% sys).
- CPU 31 is handling a moderate-to-high rate of xcalls.

What the data does not prove empirically is that the 100% sys time of
CPU 31 is in xcall handling.

What's the hot stack when this occurs and you run this:

dtrace -n 'profile-997hz /cpu == 31/ { @[stack()] = count(); }'
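
A bounded variant of the same probe, if you want it to stop by itself
(the 30s window is arbitrary):

dtrace -n 'profile-997hz /cpu == 31/ { @[stack()] = count(); } tick-30s { exit(0); }'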


On Jun 6, 2012, at 3:48 AM, Sašo Kiselkov wrote:

 So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
 to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
 LSI 9200 controllers and MPxIO) running OpenIndiana 151a4 and I'm
 occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100,000
 xcalls a second). The machine is pretty much idle, only receiving a
 bunch of multicast video streams and dumping them to the drives (at a
 rate of ~40MB/s). At an interval of roughly 1-2 minutes I get a storm of
 xcalls that completely eat one of the CPUs, so the mpstat line for the
 CPU looks like:
 
 CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  31    0   0 102191 1000    0    0    0    0    0    0     0    0 100   0   0
 
 100% busy in the system processing cross-calls. When I tried dtracing
 this issue, I found that this is the most likely culprit:
 
 dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'
   unix`xc_call+0x46
   unix`hat_tlb_inval+0x283
   unix`x86pte_inval+0xaa
   unix`hat_pte_unmap+0xed
   unix`hat_unload_callback+0x193
   unix`hat_unload+0x41
   unix`segkmem_free_vn+0x6f
   unix`segkmem_zio_free+0x27
   genunix`vmem_xfree+0x104
   genunix`vmem_free+0x29
   genunix`kmem_slab_destroy+0x87
   genunix`kmem_slab_free+0x2bb
   genunix`kmem_magazine_destroy+0x39a
   genunix`kmem_depot_ws_reap+0x66
   genunix`taskq_thread+0x285
   unix`thread_start+0x8
 3221701
 
 This happens in the sched (pid 0) process. My fsstat output looks like this:
 
 # fsstat /content 1
  new  name   name  attr  attr lookup rddir  read read  write write
  file remov  chng   get   set    ops   ops   ops bytes   ops bytes
     0     0     0   664     0    952     0     0     0   664 38.0M /content
     0     0     0   658     0    935     0     0     0   656 38.6M /content
     0     0     0   660     0    946     0     0     0   659 37.8M /content
     0     0     0   677     0    969     0     0     0   676 38.5M /content
 
 What's even more puzzling is that this apparently happens entirely
 due to some factor other than userland: I see no change in the CPU
 usage of any process in prstat(1M) when this xcall storm happens, only
 a loadavg increase of +1.00 (the busy CPU).
 
 I Googled and found that
 http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
 seems to have been an issue identical to mine; however, it was left
 unresolved at the time, which worries me about putting this kind of
 machine into production use.
 
 Could some ZFS guru please tell me what's going on in segkmem_zio_free?
 When I disable the writers to the /content filesystem, this issue goes
 away, so it obviously has something to do with disk I/O. Thanks!
 
 Cheers,
 --
 Saso


Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

2012-06-06 Thread Sašo Kiselkov
On 06/06/2012 09:43 PM, Jim Mauro wrote:
 
 I can't help but be curious about something, which perhaps you verified but
 did not post.
 
 What the data here shows is:
 - CPU 31 is buried in the kernel (100% sys).
 - CPU 31 is handling a moderate-to-high rate of xcalls.
 
 What the data does not prove empirically is that the 100% sys time of
 CPU 31 is in xcall handling.
 
 What's the hot stack when this occurs and you run this:
 
 dtrace -n 'profile-997hz /cpu == 31/ { @[stack()] = count(); }'
 

Thanks for pointing this out. I ran the probe you specified and attached
are the results (I had to chase the xcalls around a bit, because they
were jumping around cores as I was trying to insert the probes). As I
suspected, the most numerous stack trace was the one causing cross-calls
via the segkmem_zio_free+0x27 code path. While this was
going on, I was getting between 80k and 300k xcalls on the core in question.

The next most common stack was the one ending in mach_cpu_idle, so I'm
not sure why the CPU reported 100% busy (perhaps the xcalls were very
expensive in CPU time compared with the 1273 idle samples).
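
Next I'll watch the ARC while a storm is in progress, per Richard's
ARC-resizing theory; nothing fancier than a quick look with the stock
mdb dcmd:

echo ::arc | mdb -k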

Cheers,
--
Saso


xc_call.txt.bz2
Description: application/bzip