Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/06/2012 09:43 PM, Jim Mauro wrote:
> I can't help but be curious about something, which perhaps you
> verified but did not post.
>
> What the data here shows is:
> - CPU 31 is buried in the kernel (100% sys).
> - CPU 31 is handling a moderate-to-high rate of xcalls.
>
> What the data does not prove empirically is that the 100% sys time of
> CPU 31 is spent in xcall handling.
>
> What's the hot stack when this occurs and you run this:
>
> dtrace -n 'profile-997hz /cpu == 31/ { @[stack()] = count(); }'

Thanks for pointing this out. I ran the probe you specified and the
results are attached (I had to chase the xcalls around a bit, because
they kept jumping between cores as I was trying to insert the probes).
As I suspected, the most numerous stack trace is the one that causes
cross-calls via the segkmem_zio_free+0x27 code path. While this was
going on, I was getting between 80k and 300k xcalls on the core in
question. The next most common stack was the one ending in
mach_cpu_idle, so I'm not sure why the CPU reported 100% busy (perhaps
the xcalls were very expensive in CPU time compared with the 1273 idle
samples).

Cheers,
--
Saso

[Attachment: xc_call.txt.bz2 (application/bzip)]
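A variant of Jim's one-liner avoids having to chase the storm across
cores: sample every CPU and key the aggregation by CPU id as well as by
stack, so the hot stack shows up wherever the storm lands. A minimal
sketch (the kernel-only predicate, the 10s window, and the truncation
depth are arbitrary choices here):

    #!/usr/sbin/dtrace -s

    profile-997hz
    /arg0/                          /* arg0 != 0 => sample taken in kernel mode */
    {
            @hot[cpu, stack()] = count();
    }

    tick-10s
    {
            trunc(@hot, 20);        /* keep only the 20 hottest CPU/stack pairs */
            printa(@hot);
            exit(0);
    }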
Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
I can't help but be curious about something, which perhaps you verified
but did not post.

What the data here shows is:
- CPU 31 is buried in the kernel (100% sys).
- CPU 31 is handling a moderate-to-high rate of xcalls.

What the data does not prove empirically is that the 100% sys time of
CPU 31 is spent in xcall handling.

What's the hot stack when this occurs and you run this:

dtrace -n 'profile-997hz /cpu == 31/ { @[stack()] = count(); }'

On Jun 6, 2012, at 3:48 AM, Sašo Kiselkov wrote:

> So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
> to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
> LSI 9200 controllers and MPxIO) running OpenIndiana 151a4, and I'm
> occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100k
> xcalls a second). The machine is pretty much idle, only receiving a
> bunch of multicast video streams and dumping them to the drives (at a
> rate of ~40MB/s). At intervals of roughly 1-2 minutes I get a storm of
> xcalls that completely eats one of the CPUs, so the mpstat line for
> that CPU looks like:
>
> CPU minf mjf xcal   intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>  31    0   0 102191 1000    0    0    0    0    0    0     0    0 100   0   0
>
> 100% busy in the system processing cross-calls. When I tried dtracing
> this issue, I found that this is the most likely culprit:
>
> dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'
>
>   unix`xc_call+0x46
>   unix`hat_tlb_inval+0x283
>   unix`x86pte_inval+0xaa
>   unix`hat_pte_unmap+0xed
>   unix`hat_unload_callback+0x193
>   unix`hat_unload+0x41
>   unix`segkmem_free_vn+0x6f
>   unix`segkmem_zio_free+0x27
>   genunix`vmem_xfree+0x104
>   genunix`vmem_free+0x29
>   genunix`kmem_slab_destroy+0x87
>   genunix`kmem_slab_free+0x2bb
>   genunix`kmem_magazine_destroy+0x39a
>   genunix`kmem_depot_ws_reap+0x66
>   genunix`taskq_thread+0x285
>   unix`thread_start+0x8
>   3221701
>
> This happens in the sched (pid 0) process. My fsstat output looks like
> this:
>
> # fsstat /content 1
>  new  name   name  attr  attr lookup rddir  read  read write  write
> file remov   chng   get   set    ops   ops   ops bytes   ops  bytes
>    0     0      0   664     0    952     0     0     0   664  38.0M /content
>    0     0      0   658     0    935     0     0     0   656  38.6M /content
>    0     0      0   660     0    946     0     0     0   659  37.8M /content
>    0     0      0   677     0    969     0     0     0   676  38.5M /content
>
> What's even more puzzling is that this happens apparently entirely
> because of some factor other than userland, since I see no changes to
> the CPU usage of processes in prstat(1M) when this xcall storm happens,
> only an increase in loadavg of +1.00 (the busy CPU).
>
> I Googled and found that
> http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
> seems to have been an issue identical to mine; however, it remained
> unresolved at that time, and it worries me about putting this kind of
> machine into production use.
>
> Could some ZFS guru please tell me what's going on in segkmem_zio_free?
> When I disable the writers to the /content filesystem, this issue goes
> away, so it obviously has something to do with disk I/O. Thanks!
>
> Cheers,
> --
> Saso
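Jim's question can also be answered by timing rather than sampling:
measure how many nanoseconds per second each CPU actually spends inside
the cross-call service routine. A sketch, assuming the fbt provider
exposes unix`xc_serv (the xcall handler) under that name on this build;
the symbol can differ between releases, so check with
dtrace -l -f xc_serv first:

    #!/usr/sbin/dtrace -s

    fbt::xc_serv:entry
    {
            self->ts = timestamp;
    }

    fbt::xc_serv:return
    /self->ts/
    {
            /* accumulate per-CPU nanoseconds spent servicing xcalls */
            @svc[cpu] = sum(timestamp - self->ts);
            self->ts = 0;
    }

    tick-1s
    {
            printa("  cpu %3d: %@d ns in xc_serv\n", @svc);
            trunc(@svc);
    }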
Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On Jun 6, 2012, at 8:22 AM, Sašo Kiselkov wrote:

> On 06/06/2012 05:01 PM, Sašo Kiselkov wrote:
>> I'll try and load the machine with dd(1) to the max to see if access
>> patterns of my software have something to do with it.
>
> Tried and tested: any and all write I/O to the pool causes this xcall
> storm issue, and writing more data to it only exacerbates it (i.e. it
> occurs more often). I still get storms of over 100k xcalls completely
> draining one CPU core, but now they happen at 20-30s intervals rather
> than every 1-2 minutes. Writing to the rpool, however, does not, so I
> suspect it has something to do with MPxIO and how ZFS is pumping data
> into the twin LSI 9200 controllers. Each is attached to a different
> CPU I/O bridge (since the system has two Opterons, it has two I/O
> bridges, each handling roughly half of the PCI-e links). I did this in
> the hope of improving performance (since the HT links to the I/O
> bridges will be more evenly loaded). Any idea if this might be the
> cause of the issue?

No, this is definitely not the cause of the issue. Looks like a
reasonable config to me.
 -- richard

> The whole system diagram is:
>
> CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)-,
>  |                                            \
> (ht)                                           == JBOD
>  |                                            /
> CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)-'
>
> Cheers,
> --
> Saso

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On Jun 6, 2012, at 8:01 AM, Sašo Kiselkov wrote:

> On 06/06/2012 04:55 PM, Richard Elling wrote:
>> On Jun 6, 2012, at 12:48 AM, Sašo Kiselkov wrote:
>>
>>> So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
>>> to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
>>> LSI 9200 controllers and MPxIO) running OpenIndiana 151a4, and I'm
>>> occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100k
>>> xcalls a second).
>>
>> That isn't much of a storm, I've seen > 1M xcalls in some cases...
>
> Well, it does make one of the cores 100% busy for around 10-15 seconds,
> so it is processing at the maximum rate the core can manage. I'd call
> that a sign of "something bad(tm)" going on.

It is unpleasant, for sure...

>>> The machine is pretty much idle, only receiving a
>>> bunch of multicast video streams and dumping them to the drives (at a
>>> rate of ~40MB/s). At intervals of roughly 1-2 minutes I get a storm of
>>> xcalls that completely eats one of the CPUs, so the mpstat line for
>>> that CPU looks like:
>>>
>>> CPU minf mjf xcal   intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>>  31    0   0 102191 1000    0    0    0    0    0    0     0    0 100   0   0
>>>
>>> 100% busy in the system processing cross-calls. When I tried dtracing
>>> this issue, I found that this is the most likely culprit:
>>>
>>> dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'
>>>
>>>   unix`xc_call+0x46
>>>   unix`hat_tlb_inval+0x283
>>>   unix`x86pte_inval+0xaa
>>>   unix`hat_pte_unmap+0xed
>>>   unix`hat_unload_callback+0x193
>>>   unix`hat_unload+0x41
>>>   unix`segkmem_free_vn+0x6f
>>>   unix`segkmem_zio_free+0x27
>>>   genunix`vmem_xfree+0x104
>>>   genunix`vmem_free+0x29
>>>   genunix`kmem_slab_destroy+0x87
>>>   genunix`kmem_slab_free+0x2bb
>>>   genunix`kmem_magazine_destroy+0x39a
>>>   genunix`kmem_depot_ws_reap+0x66
>>>   genunix`taskq_thread+0x285
>>>   unix`thread_start+0x8
>>>   3221701
>>>
>>> This happens in the sched (pid 0) process. My fsstat output looks
>>> like this:
>>>
>>> # fsstat /content 1
>>>  new  name   name  attr  attr lookup rddir  read  read write  write
>>> file remov   chng   get   set    ops   ops   ops bytes   ops  bytes
>>>    0     0      0   664     0    952     0     0     0   664  38.0M /content
>>>    0     0      0   658     0    935     0     0     0   656  38.6M /content
>>>    0     0      0   660     0    946     0     0     0   659  37.8M /content
>>>    0     0      0   677     0    969     0     0     0   676  38.5M /content
>>>
>>> What's even more puzzling is that this happens apparently entirely
>>> because of some factor other than userland, since I see no changes to
>>> the CPU usage of processes in prstat(1M) when this xcall storm
>>> happens, only an increase in loadavg of +1.00 (the busy CPU).
>>
>> What exactly is the workload doing?
>
> As I wrote above, just receiving multicast video streams and writing
> them to disk files, nothing else. The fsstat lines above show that --
> pure write load.
>
>> Local I/O, iSCSI, NFS, or CIFS?
>
> Purely local I/O via the two LSI SAS controllers, nothing else.
>
>>> I Googled and found that
>>> http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
>>> seems to have been an issue identical to mine; however, it remained
>>> unresolved at that time, and it worries me about putting this kind of
>>> machine into production use.
>>>
>>> Could some ZFS guru please tell me what's going on in segkmem_zio_free?
>>
>> It is freeing memory.
>
> Yes, but why is this causing a ton of cross-calls?

To better understand xcalls, see Adam's excellent blog on the topic:
https://blogs.oracle.com/ahl/entry/debugging_cross_calls_on_opensolaris
The scenario described there probably isn't your problem, but the
explanation of xcalls is quite good.

>>> When I disable the writers to the /content filesystem, this issue
>>> goes away, so it obviously has something to do with disk I/O. Thanks!
>>
>> Not directly related to disk I/O bandwidth. Can be directly related to
>> other use, such as deletions -- something that causes frees.
>
> When I'm not writing to disk it doesn't happen, so my guess is that it
> indeed has something to do with (perhaps) ZFS freeing txg buffers or
> something...
>
>> Depending on the cause, there can be some tuning that applies for
>> large memory machines, where large is >= 96 GB.
>> -- richard
>
> I'll try and load the machine with dd(1) to the max to see if access
> patterns of my software have something to do with it.

OK, next step is to determine if this is part of the normal ARC
resizing or not. If it is, then tuning can help. I've got a few scripts
to do this that aren't posted. Fortunately, Brendan posted his, so look
for the arcreap.d script at
http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/

If you are seeing lots of xcall activity related to arc_shrink, then
this can be tuned.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
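In the spirit of arcreap.d, a minimal sketch that lines reap activity
up against the xcall rate over time. It assumes arc_shrink,
arc_kmem_reap_now, and kmem_depot_ws_reap are visible to the fbt
provider under those names on this build (fbt probe names track the
implementation, so verify with dtrace -l first):

    #!/usr/sbin/dtrace -s

    fbt::arc_shrink:entry,
    fbt::arc_kmem_reap_now:entry,
    fbt::kmem_depot_ws_reap:entry
    {
            @calls[probefunc] = count();
    }

    sysinfo:::xcalls
    {
            @xc[cpu] = count();
    }

    tick-5s
    {
            printf("\n%Y", walltimestamp);
            printa("  %s called %@d times\n", @calls);
            printa("  cpu %d: %@d xcalls\n", @xc);
            trunc(@calls);
            trunc(@xc);
    }

If the arc_* counts spike in the same intervals as the xcall bursts,
the ARC-tuning route Richard mentions is the one to pursue.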
Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/06/2012 05:01 PM, Sašo Kiselkov wrote:
> I'll try and load the machine with dd(1) to the max to see if access
> patterns of my software have something to do with it.

Tried and tested: any and all write I/O to the pool causes this xcall
storm issue, and writing more data to it only exacerbates it (i.e. it
occurs more often). I still get storms of over 100k xcalls completely
draining one CPU core, but now they happen at 20-30s intervals rather
than every 1-2 minutes. Writing to the rpool, however, does not, so I
suspect it has something to do with MPxIO and how ZFS is pumping data
into the twin LSI 9200 controllers. Each is attached to a different CPU
I/O bridge (since the system has two Opterons, it has two I/O bridges,
each handling roughly half of the PCI-e links). I did this in the hope
of improving performance (since the HT links to the I/O bridges will be
more evenly loaded). Any idea if this might be the cause of the issue?

The whole system diagram is:

CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)-,
 |                                            \
(ht)                                           == JBOD
 |                                            /
CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)-'

Cheers,
--
Saso
Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/06/2012 04:55 PM, Richard Elling wrote:
> On Jun 6, 2012, at 12:48 AM, Sašo Kiselkov wrote:
>
>> So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
>> to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
>> LSI 9200 controllers and MPxIO) running OpenIndiana 151a4, and I'm
>> occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100k
>> xcalls a second).
>
> That isn't much of a storm, I've seen > 1M xcalls in some cases...

Well, it does make one of the cores 100% busy for around 10-15 seconds,
so it is processing at the maximum rate the core can manage. I'd call
that a sign of "something bad(tm)" going on.

>> The machine is pretty much idle, only receiving a
>> bunch of multicast video streams and dumping them to the drives (at a
>> rate of ~40MB/s). At intervals of roughly 1-2 minutes I get a storm of
>> xcalls that completely eats one of the CPUs, so the mpstat line for
>> that CPU looks like:
>>
>> CPU minf mjf xcal   intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>  31    0   0 102191 1000    0    0    0    0    0    0     0    0 100   0   0
>>
>> 100% busy in the system processing cross-calls. When I tried dtracing
>> this issue, I found that this is the most likely culprit:
>>
>> dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'
>>
>>   unix`xc_call+0x46
>>   unix`hat_tlb_inval+0x283
>>   unix`x86pte_inval+0xaa
>>   unix`hat_pte_unmap+0xed
>>   unix`hat_unload_callback+0x193
>>   unix`hat_unload+0x41
>>   unix`segkmem_free_vn+0x6f
>>   unix`segkmem_zio_free+0x27
>>   genunix`vmem_xfree+0x104
>>   genunix`vmem_free+0x29
>>   genunix`kmem_slab_destroy+0x87
>>   genunix`kmem_slab_free+0x2bb
>>   genunix`kmem_magazine_destroy+0x39a
>>   genunix`kmem_depot_ws_reap+0x66
>>   genunix`taskq_thread+0x285
>>   unix`thread_start+0x8
>>   3221701
>>
>> This happens in the sched (pid 0) process. My fsstat output looks
>> like this:
>>
>> # fsstat /content 1
>>  new  name   name  attr  attr lookup rddir  read  read write  write
>> file remov   chng   get   set    ops   ops   ops bytes   ops  bytes
>>    0     0      0   664     0    952     0     0     0   664  38.0M /content
>>    0     0      0   658     0    935     0     0     0   656  38.6M /content
>>    0     0      0   660     0    946     0     0     0   659  37.8M /content
>>    0     0      0   677     0    969     0     0     0   676  38.5M /content
>>
>> What's even more puzzling is that this happens apparently entirely
>> because of some factor other than userland, since I see no changes to
>> the CPU usage of processes in prstat(1M) when this xcall storm
>> happens, only an increase in loadavg of +1.00 (the busy CPU).
>
> What exactly is the workload doing?

As I wrote above, just receiving multicast video streams and writing
them to disk files, nothing else. The fsstat lines above show that --
pure write load.

> Local I/O, iSCSI, NFS, or CIFS?

Purely local I/O via the two LSI SAS controllers, nothing else.

>> I Googled and found that
>> http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
>> seems to have been an issue identical to mine; however, it remained
>> unresolved at that time, and it worries me about putting this kind of
>> machine into production use.
>>
>> Could some ZFS guru please tell me what's going on in segkmem_zio_free?
>
> It is freeing memory.

Yes, but why is this causing a ton of cross-calls?

>> When I disable the writers to the /content filesystem, this issue goes
>> away, so it obviously has something to do with disk I/O. Thanks!
>
> Not directly related to disk I/O bandwidth. Can be directly related to
> other use, such as deletions -- something that causes frees.

When I'm not writing to disk it doesn't happen, so my guess is that it
indeed has something to do with (perhaps) ZFS freeing txg buffers or
something...

> Depending on the cause, there can be some tuning that applies for large
> memory machines, where large is >= 96 GB.
> -- richard

I'll try and load the machine with dd(1) to the max to see if access
patterns of my software have something to do with it.

Cheers,
--
Saso
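One way to test the txg/free theory is to watch what is actually handed
back through segkmem_zio_free and when. A sketch, assuming
segkmem_zio_free takes (vmem_t *, void *, size_t) like its sibling
segkmem_free in the illumos source, so arg2 is the size of each free:

    #!/usr/sbin/dtrace -s

    fbt::segkmem_zio_free:entry
    {
            /* distribution and running total of ZIO segment frees */
            @dist["bytes per segkmem_zio_free"] = quantize(arg2);
            @total["total bytes freed"] = sum(arg2);
    }

    tick-5s
    {
            printf("\n%Y", walltimestamp);
            printa(@total);
            printa(@dist);
            trunc(@total);
            trunc(@dist);
    }

Bursts of many large frees every few minutes, lining up with the
storms, would fit kmem reaping of ZIO buffers after txg syncs.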
Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On Jun 6, 2012, at 12:48 AM, Sašo Kiselkov wrote:

> So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
> to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
> LSI 9200 controllers and MPxIO) running OpenIndiana 151a4, and I'm
> occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100k
> xcalls a second).

That isn't much of a storm, I've seen > 1M xcalls in some cases...

> The machine is pretty much idle, only receiving a
> bunch of multicast video streams and dumping them to the drives (at a
> rate of ~40MB/s). At intervals of roughly 1-2 minutes I get a storm of
> xcalls that completely eats one of the CPUs, so the mpstat line for
> that CPU looks like:
>
> CPU minf mjf xcal   intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>  31    0   0 102191 1000    0    0    0    0    0    0     0    0 100   0   0
>
> 100% busy in the system processing cross-calls. When I tried dtracing
> this issue, I found that this is the most likely culprit:
>
> dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'
>
>   unix`xc_call+0x46
>   unix`hat_tlb_inval+0x283
>   unix`x86pte_inval+0xaa
>   unix`hat_pte_unmap+0xed
>   unix`hat_unload_callback+0x193
>   unix`hat_unload+0x41
>   unix`segkmem_free_vn+0x6f
>   unix`segkmem_zio_free+0x27
>   genunix`vmem_xfree+0x104
>   genunix`vmem_free+0x29
>   genunix`kmem_slab_destroy+0x87
>   genunix`kmem_slab_free+0x2bb
>   genunix`kmem_magazine_destroy+0x39a
>   genunix`kmem_depot_ws_reap+0x66
>   genunix`taskq_thread+0x285
>   unix`thread_start+0x8
>   3221701
>
> This happens in the sched (pid 0) process. My fsstat output looks like
> this:
>
> # fsstat /content 1
>  new  name   name  attr  attr lookup rddir  read  read write  write
> file remov   chng   get   set    ops   ops   ops bytes   ops  bytes
>    0     0      0   664     0    952     0     0     0   664  38.0M /content
>    0     0      0   658     0    935     0     0     0   656  38.6M /content
>    0     0      0   660     0    946     0     0     0   659  37.8M /content
>    0     0      0   677     0    969     0     0     0   676  38.5M /content
>
> What's even more puzzling is that this happens apparently entirely
> because of some factor other than userland, since I see no changes to
> the CPU usage of processes in prstat(1M) when this xcall storm happens,
> only an increase in loadavg of +1.00 (the busy CPU).

What exactly is the workload doing?
Local I/O, iSCSI, NFS, or CIFS?

> I Googled and found that
> http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
> seems to have been an issue identical to mine; however, it remained
> unresolved at that time, and it worries me about putting this kind of
> machine into production use.
>
> Could some ZFS guru please tell me what's going on in segkmem_zio_free?

It is freeing memory.

> When I disable the writers to the /content filesystem, this issue goes
> away, so it obviously has something to do with disk I/O. Thanks!

Not directly related to disk I/O bandwidth. Can be directly related to
other use, such as deletions -- something that causes frees.

Depending on the cause, there can be some tuning that applies for large
memory machines, where large is >= 96 GB.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
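Whether the ARC is resizing around these events can be read straight
from the arcstats kstats, e.g. sampling the current size and the target
size (c) once per second:

    # kstat -p zfs:0:arcstats:size zfs:0:arcstats:c 1

A target that drops just before each storm, followed by the size
falling to meet it, points at ARC reclaim as the trigger.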
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
LSI 9200 controllers and MPxIO) running OpenIndiana 151a4, and I'm
occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100k
xcalls a second). The machine is pretty much idle, only receiving a
bunch of multicast video streams and dumping them to the drives (at a
rate of ~40MB/s). At intervals of roughly 1-2 minutes I get a storm of
xcalls that completely eats one of the CPUs, so the mpstat line for
that CPU looks like:

CPU minf mjf xcal   intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
 31    0   0 102191 1000    0    0    0    0    0    0     0    0 100   0   0

100% busy in the system processing cross-calls. When I tried dtracing
this issue, I found that this is the most likely culprit:

dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'

  unix`xc_call+0x46
  unix`hat_tlb_inval+0x283
  unix`x86pte_inval+0xaa
  unix`hat_pte_unmap+0xed
  unix`hat_unload_callback+0x193
  unix`hat_unload+0x41
  unix`segkmem_free_vn+0x6f
  unix`segkmem_zio_free+0x27
  genunix`vmem_xfree+0x104
  genunix`vmem_free+0x29
  genunix`kmem_slab_destroy+0x87
  genunix`kmem_slab_free+0x2bb
  genunix`kmem_magazine_destroy+0x39a
  genunix`kmem_depot_ws_reap+0x66
  genunix`taskq_thread+0x285
  unix`thread_start+0x8
  3221701

This happens in the sched (pid 0) process. My fsstat output looks like
this:

# fsstat /content 1
 new  name   name  attr  attr lookup rddir  read  read write  write
file remov   chng   get   set    ops   ops   ops bytes   ops  bytes
   0     0      0   664     0    952     0     0     0   664  38.0M /content
   0     0      0   658     0    935     0     0     0   656  38.6M /content
   0     0      0   660     0    946     0     0     0   659  37.8M /content
   0     0      0   677     0    969     0     0     0   676  38.5M /content

What's even more puzzling is that this happens apparently entirely
because of some factor other than userland, since I see no changes to
the CPU usage of processes in prstat(1M) when this xcall storm happens,
only an increase in loadavg of +1.00 (the busy CPU).

I Googled and found that
http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
seems to have been an issue identical to mine; however, it remained
unresolved at that time, and it worries me about putting this kind of
machine into production use.

Could some ZFS guru please tell me what's going on in segkmem_zio_free?
When I disable the writers to the /content filesystem, this issue goes
away, so it obviously has something to do with disk I/O. Thanks!

Cheers,
--
Saso
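A useful first measurement here is the shape of the storm over time:
per-second, per-CPU xcall counts rather than one cumulative stack
aggregation. A minimal sketch:

    #!/usr/sbin/dtrace -s

    sysinfo:::xcalls
    {
            /* one firing per bump of the cross-call statistic */
            @[cpu] = count();
    }

    tick-1s
    {
            printf("\n%Y", walltimestamp);
            printa("  cpu %3d: %@8d xcalls\n", @);
            trunc(@);
    }

Run alongside mpstat 1; matching one-second spikes pinned to a single
CPU confirm the storms are periodic bursts rather than a sustained
background rate.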