Hello,


Sorry for the (very) long subject, but I've pinpointed the problem to this exact
situation.

I know about the other threads related to hangs, but in my case there was no
"zfs destroy" involved, nor any compression or deduplication.



To make a long story short, when

- a disk contains 2 partitions (p1 = 32 GB, p2 = 1800 GB), and

- p1 is used as part of a ZFS mirror of rpool, and

- p2 is used as part of a raidz (tested raidz1 and raidz2) of tank, and

- some serious work is underway on tank (tested write, copy, scrub),

then physically removing the disk makes ZFS partially hang. Putting the
physical disk back does not help.



For the long story:



About the hardware:

1 x Intel X25-E (64 GB SSD), 15 x 2TB SATA drives (7 x WD, 8 x Hitachi), 2 x quad-core
Xeon, 12 GB RAM, 2 x Areca ARC-1680 (8-port SAS controllers), Tyan S7002 mainboard.



About the software / firmware:

OpenSolaris b130 installed on the SSD drive, on the first 32 GB.

The Areca cards are configured as JBODs and are running the latest release
firmware.



Initial setup:

We created a 32 GB partition on each of the 2TB drives and mirrored the system
partition onto them, giving us a 16-way rpool mirror.

The rest of each 2TB drive's space was put in a second partition and used for a
raidz2 pool (named tank).
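
For reference, the pools were built roughly as follows. I'm reconstructing the
commands from memory, so the exact c#t#d# device names and slice numbers are
illustrative, not copied from the machine:

   # attach the 32 GB slice (s0) of each 2TB drive to the rpool mirror,
   # one attach per drive (first argument is the existing SSD slice)
   zpool attach rpool c4t0d0s0 c4t0d1s0
   ...
   # build tank as a raidz2 of the large (~1800 GB) slices (s1)
   zpool create tank raidz2 c4t0d1s1 c4t0d2s1 ... c5t0d7s1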



Problem:

Whenever we physically removed a disk from its tray while doing some speed 
testing on the tank pool, the system hung.

At that time I hadn't read all the threads about zfs hangs and couldn't
determine whether the system was hung or just ZFS.

In order to pinpoint the problem, we made another setup.



Second setup:

I reduced the number of partitions in the rpool mirror down to three (p1 from the
SSD, p1 from a 2TB drive on the same controller as the SSD, and p1 from a 2TB
drive on the other controller).
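
In practice that meant detaching the other thirteen 32 GB slices from rpool,
along the lines of (device names again illustrative):

   # repeated once per slice leaving the mirror
   zpool detach rpool c4t0d2s0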



Problem:

When the system is quiet, I am able to physically remove any disk, plug it back 
and resilver it.

When I am putting some load on the tank pool, I can remove any disk that does
*not* hold part of the rpool mirror (I can plug it back and resilver it while the
load keeps running, without noticeable performance impact).
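
The recovery sequence after plugging such a disk back was roughly the following
(a sketch from memory; the devfsadm step and the device name are what I believe
we used, not a verified transcript):

   devfsadm                     # rescan so the re-inserted disk shows up again
   zpool online tank c4t0d3s1   # bring the tank slice back online
   zpool status -x              # watch the resilver progress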



When I am putting some load on the tank pool and I physically remove a disk that
also holds part of the rpool mirror, ZFS partially hangs.

When I say partially, I mean that:

- zpool iostat -v tank 5 freezes

- if I run any zpool command related to rpool, I'm stuck (for example zpool clear
rpool c4t0d7s0, or zpool status rpool)



I can't launch new programs, but already-launched programs continue to run (at
least in an ssh session; GNOME becomes more and more frozen as you move from
window to window).



From ssh sessions:



- prstat shows that only gnome-system-monitor, Xorg, ssh, bash and various
*stat utilities (prstat, fsstat, iostat, mpstat) are consuming some CPU.



- zpool iostat -v tank 5 is frozen (it froze when I issued a zpool clear rpool
c4t0d7s0 in another session)



- iostat -xn is not stuck but has shown all zeroes since the very moment zpool
iostat froze (which is quite strange if you look at the fsstat output hereafter).
NB: when I say all zeroes, I really mean it; it's not zero dot something, it's
zero dot zero.



- mpstat shows normal activity (almost nothing, since this is a test machine, so
only a few percent are used, but it still shows some activity and refreshes
correctly):

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  125   428  109  113    1    4    0    0   251    2   0   0  98
  1    0   0   20    56   16   44    2    2    1    0   277   11   1   0  88
  2    0   0  163   152   13  309    1    3    0    0  1370    4   0   0  96
  3    0   0   19   111   41   90    0    4    0    0    80    0   0   0 100
  4    0   0   69   192   17   66    0    3    0    0    20    0   0   0 100
  5    0   0   10    61    7   92    0    4    0    0   167    0   0   0 100
  6    0   0   96   191   25   74    0    4    1    0     5    0   0   0 100
  7    0   0   16    58    6   63    0    3    1    0    59    0   0   0 100



- fsstat -F 5 shows all zeroes except for the zfs line (the figures hereunder
stay almost the same over time):

 new  name   name  attr  attr lookup rddir  read read  write write
file remov  chng   get   set    ops   ops   ops bytes   ops bytes
   0     0     0 1,25K     0  2,51K     0   803 11,0M   473 11,0M zfs



- disk LEDs show no activity



- I cannot run any other command (neither from ssh nor from GNOME)



- I cannot open another ssh session (I don't even get the login prompt in PuTTY)



- I can successfully ping the machine



- I cannot establish a new CIFS session (the login prompt should not appear
since the machine is in an Active Directory domain, but when the system is stuck
the prompt appears and I cannot authenticate; I guess whatever LDAP or Kerberos
data lives on rpool cannot be read), but an already-active session stays open
(last time I even managed to create a text file with a few lines in it that was
still there after I hard-rebooted).



- after some time (an hour or so) my ssh sessions eventually got stuck too.



Having worked for quite some time with OpenSolaris/ZFS (though not with this
many disks), I doubted the problem came from OpenSolaris, and I had already
opened a case with Areca's tech support, who are trying (at least they told me
so) to reproduce the problem. That was before I read about the zfs hang issues.



We've found a workaround: we're going to put in one internal 2.5'' disk to mirror
rpool and dedicate the whole of the 2TB disks to tank. But:

- I thought it might somehow be related to the other hang issues (and so it
might help developers to hear about other similar cases)

- I would really like to rule out an OpenSolaris bug so that I can bring proof
to Areca that their driver has a problem, and either request that they correct
it or request that my supplier replace the cards with working ones.



I think their driver is at fault because I found a message in /var/adm/messages
saying "WARNING: arcmsr duplicate scsi_hba_pkt_comp(9F) on same scsi_pkt(9S)"
and, immediately after that, "WARNING: kstat rcnt == 0 when exiting runq, please
check". Then the system was hung. The comments in the code that introduced these
warnings tell us that drivers behaving this way could panic the system (or at
least that's how I understood them).



I've reached my limits here.

If anyone has ideas on how to determine what's going on, other information I
should publish, or other things to run or check, I'll be happy to try them.
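
For what it's worth, next time it hangs I plan to capture some kernel-side state
from an already-open root shell, assuming mdb itself doesn't get stuck on rpool
(suggestions on better dcmds welcome):

   echo "::threadlist -v" | mdb -k > /tmp/threads.out   # stack of every kernel thread
   echo "::spa -v" | mdb -k                             # state of each pool's spa
   fmdump -eV                                           # FMA error events around the removal

Since /tmp is swap-backed, writing there should still work even if rpool is
wedged.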



If I can be of any help in troubleshooting the zfs hang problems that others are
experiencing, I'd be happy to give a hand.



Thanks and have a nice day,



Regards,

Arnaud Brand










