On Jun 6, 2012, at 8:01 AM, Sašo Kiselkov wrote:

> On 06/06/2012 04:55 PM, Richard Elling wrote:
>> On Jun 6, 2012, at 12:48 AM, Sašo Kiselkov wrote:
>> 
>>> So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
>>> to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
>>> LSI 9200 controllers and MPxIO) running OpenIndiana 151a4 and I'm
>>> occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100000
>>> xcalls a second).
>> 
>> That isn't much of a storm; I've seen > 1M xcalls in some cases...
> 
> Well, it does make one of the cores 100% busy for around 10-15 seconds,
> so it is processing xcalls at the maximum rate the core can manage. I'd
> call that a sign of "something bad(tm)" going on.

It is unpleasant, for sure...

>>> The machine is pretty much idle, only receiving a
>>> bunch of multicast video streams and dumping them to the drives (at a
>>> rate of ~40MB/s). At an interval of roughly 1-2 minutes I get a storm of
>>> xcalls that completely eat one of the CPUs, so the mpstat line for the
>>> CPU looks like:
>>> 
>>> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>> 31    0   0 102191    1    0    0    0    0    0    0     0    0 100   0   0
>>> 
>>> 100% busy in the system processing cross-calls. When I tried dtracing
>>> this issue, I found that this is the most likely culprit:
>>> 
>>> dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'
>>>  unix`xc_call+0x46
>>>  unix`hat_tlb_inval+0x283
>>>  unix`x86pte_inval+0xaa
>>>  unix`hat_pte_unmap+0xed
>>>  unix`hat_unload_callback+0x193
>>>  unix`hat_unload+0x41
>>>  unix`segkmem_free_vn+0x6f
>>>  unix`segkmem_zio_free+0x27
>>>  genunix`vmem_xfree+0x104
>>>  genunix`vmem_free+0x29
>>>  genunix`kmem_slab_destroy+0x87
>>>  genunix`kmem_slab_free+0x2bb
>>>  genunix`kmem_magazine_destroy+0x39a
>>>  genunix`kmem_depot_ws_reap+0x66
>>>  genunix`taskq_thread+0x285
>>>  unix`thread_start+0x8
>>> 3221701
>>> 
>>> This happens in the sched (pid 0) process. My fsstat one looks like this:
>>> 
>>> # fsstat /content 1
>>> new  name   name  attr  attr lookup rddir  read read  write write
>>> file remov  chng   get   set    ops   ops   ops bytes   ops bytes
>>>   0     0     0   664     0    952     0     0     0   664 38.0M /content
>>>   0     0     0   658     0    935     0     0     0   656 38.6M /content
>>>   0     0     0   660     0    946     0     0     0   659 37.8M /content
>>>   0     0     0   677     0    969     0     0     0   676 38.5M /content
>>> 
>>> What's even more puzzling is that this happens apparently entirely
>>> because of some factor other than userland, since I see no changes to
>>> CPU usage of processes in prstat(1M) when this xcall storm happens, only
>>> an increase in loadavg of +1.00 (the busy CPU).
>> 
>> What exactly is the workload doing?
> 
> As I wrote above, just receiving multicast video streams and writing
> them to disk files, nothing else. The fsstat lines above show that -
> pure write load.
> 
>> Local I/O, iSCSI, NFS, or CIFS?
> 
> Purely local I/O via the two LSI SAS controllers, nothing else.
> 
>>> I Googled and found that
>>> http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
>>> seems to have been an issue identical to mine; however, it was never
>>> resolved, which makes me worry about putting this kind of machine into
>>> production use.
>>> 
>>> Could some ZFS guru please tell me what's going on in segkmem_zio_free?
>> 
>> It is freeing memory.
> 
> Yes, but why is this causing a ton of cross-calls?

To better understand xcalls, see Adam's excellent blog on the topic:
        https://blogs.oracle.com/ahl/entry/debugging_cross_calls_on_opensolaris

The scenario described there probably isn't your problem, but the explanation
of xcalls is quite good.
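
If you want to see who is generating them and how the rate varies over time,
a slightly expanded version of the one-liner you already ran (just a sketch;
it uses nothing beyond the standard sysinfo provider) is:

  #!/usr/sbin/dtrace -s
  /* Count cross-calls, broken down by the executable and kernel stack
   * of the thread that initiated them; dump and reset every 10s. */
  sysinfo:::xcalls
  {
          @[execname, stack()] = count();
  }

  tick-10s
  {
          printa(@);
          trunc(@);
  }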

>>> When I disable the writers to the /content filesystem, this issue goes
>>> away, so it obviously has something to do with disk IO. Thanks!
>> 
>> Not directly related to disk I/O bandwidth. Can be directly related to other
>> use, such as deletions -- something that causes frees.
> 
> When I'm not writing to disk it doesn't happen, so my guess is that it
> indeed has something to do with (perhaps) ZFS freeing txg buffers or
> something...
> 
>> Depending on the cause, there can be some tuning that applies for large
>> memory machines, where large is >= 96 GB.
>> -- richard
> 
> I'll try and load the machine with dd(1) to the max to see if access
> patterns of my software have something to do with it.
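
Something along these lines should give you a steady pure-write load for
comparison (the path, block size, and count are just placeholders, of course):

  # hypothetical sequential writer on the same filesystem; adjust to taste
  dd if=/dev/zero of=/content/ddtest.bin bs=1024k count=100000 &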

OK, next step is to determine if this is part of the normal ARC resizing or not.
If it is, then tuning can help. I've got a few scripts to do this that aren't 
posted.
Fortunately, Brendan posted his, so look for the arcreap.d script at
http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/
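
The gist of that script is a few fbt probes on the ARC reclaim path. A
minimal sketch in the same spirit (the function names are from the illumos
ARC code of that vintage, so check that they exist in your build) is:

  #!/usr/sbin/dtrace -s
  /* Timestamp each ARC shrink/reap event so it can be lined up
   * with the xcall storms seen in mpstat. */
  fbt::arc_shrink:entry,
  fbt::arc_kmem_reap_now:entry
  {
          printf("%Y %s\n", walltimestamp, probefunc);
  }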

If you are seeing lots of xcall activity related to arc_shrink, then this can 
be tuned.
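
For example (purely as an illustration, since which knob applies, or whether
to cap anything at all, depends on what the tracing shows), capping the ARC
via /etc/system on a box like yours would look like:

  * Illustrative only: cap the ARC at 96 GB on this 128 GB machine,
  * leaving headroom for the rest of the system (requires a reboot).
  set zfs:zfs_arc_max = 0x1800000000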
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422