On 06/06/2012 05:01 PM, Sašo Kiselkov wrote:
> I'll try and load the machine with dd(1) to the max to see if access
> patterns of my software have something to do with it.

Tried and tested: any and all write I/O to the pool causes this xcall
storm issue, and writing more data to it only exacerbates it (i.e. it
occurs more often). I still get storms of over 100k xcalls completely
draining one CPU core, but now they happen at 20-30s intervals rather
than every 1-2 minutes. Writing to the rpool, however, does not trigger
them, so I suspect it has something to do with MPxIO and how ZFS is
pumping data into the
twin LSI 9200 controllers. Each is attached to a different CPU I/O
bridge (since the system has two Opterons, it has two I/O bridges, each
handling roughly half of the PCI-e links). I did this in the hope of
improving performance (since the HT links to the I/O bridges will be
more evenly loaded). Any idea if this might be the cause of this issue?

The whole system diagram is:

CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)-,
 |                                            \
(ht)                                           == JBOD
 |                                            /
CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)-'
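
For anyone wanting to quantify how often the storms hit, here is a
minimal sketch (not something from the original test runs) that scans
captured mpstat(1M) interval output and flags CPUs whose xcal column
exceeds a threshold. The function name, the 100k default (taken from the
figure quoted above) and the sample rows are all hypothetical; it only
assumes the usual Solaris/illumos mpstat header with "CPU" and "xcal"
columns.

```python
def find_xcall_storms(mpstat_text, threshold=100_000):
    """Return (cpu_id, xcal) pairs whose xcal value exceeds threshold.

    Assumes each report starts with a header line containing the
    column names "CPU" and "xcal", as Solaris/illumos mpstat prints.
    """
    storms = []
    cpu_idx = None
    xcal_idx = None
    for line in mpstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Header line: remember where the CPU and xcal columns are.
        if fields[0] == "CPU" and "xcal" in fields:
            cpu_idx = fields.index("CPU")
            xcal_idx = fields.index("xcal")
            continue
        # Skip data seen before any header, or rows that are too short.
        if xcal_idx is None or len(fields) <= xcal_idx:
            continue
        try:
            cpu = int(fields[cpu_idx])
            xcal = int(fields[xcal_idx])
        except ValueError:
            continue  # not a numeric data row
        if xcal > threshold:
            storms.append((cpu, xcal))
    return storms


# Hypothetical sample data, made up for illustration only.
SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0  312 1200  300 900   10   20   15   0  4000   2   5  0  93
  1    0   0 120000 800 100 400    5   10   30   0  1000   0 100  0   0
"""

if __name__ == "__main__":
    for cpu, xcal in find_xcall_storms(SAMPLE):
        print(f"CPU {cpu}: {xcal} xcalls")
```

Feeding it the saved output of something like "mpstat 1" taken during a
dd(1) run would show which core is being drained and how frequently.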

