From: Richard Elling [mailto:[email protected]]
Sent: Saturday, January 23, 2016 2:55
To: [email protected]
Cc: illumos-developer
Subject: Re: [smartos-discuss] Is zfs dead man timer tunable?
On Jan 21, 2016, at 10:18 PM, Fred Liu <[email protected]> wrote:

Comment below.

From: Richard Elling [mailto:[email protected]]
Sent: Friday, January 22, 2016 13:51
To: [email protected]
Subject: Re: [smartos-discuss] Is zfs dead man timer tunable?

answer far below...

On Jan 21, 2016, at 8:44 PM, Fred Liu <[email protected]> wrote:

-----Original Message-----
From: Richard Elling [mailto:[email protected]]
Sent: Friday, January 22, 2016 12:02
To: [email protected]
Subject: Re: [smartos-discuss] Is zfs dead man timer tunable?

On Jan 21, 2016, at 4:25 AM, Fred Liu <[email protected]> wrote:

[Richard] The zfs deadman timer is tunable, but if you hit it, you've got problems that tuning the deadman won't fix. The tunable is zfs_deadman_synctime_ms, in milliseconds. For example, on a test machine here:

[root@elvis ~]# echo zfs_deadman_synctime_ms/D | mdb -k
zfs_deadman_synctime_ms:
zfs_deadman_synctime_ms:        1000000

FYI, you can check on the state of I/Os in the ZIO pipeline, and how long they've been there, using the ::zio_state dcmd. Elvis is not currently busy or broken, but here is an example:

[root@elvis ~]# echo ::zio_state | mdb -k
ADDRESS          TYPE  STAGE            WAITER           TIME_ELAPSED
ffffff01ada853e0 NULL  OPEN             -                -
ffffff01ada85b10 NULL  OPEN             -                -

If you see a large TIME_ELAPSED, you can track down the zio in question for more debugging.
 -- richard

[Fred]: Richard, many thanks! You have always been helpful since I first touched ZFS many years ago. I am trying Intel P3600 NVMe SSDs on ZFS. I have been getting several random server reboots every day, and I just captured the panic message "I/O to pool 'zones' appears to be hung" from the console. I suspect it is related to the NVMe driver or the SSD firmware.

[Richard] This is disturbing, to me. sd should sit between ZFS and the NVMe driver. sd manages timeouts, retries, and resets, but it is still somewhat at the mercy of the NVMe driver (or mptsas). more below...

[Fred]: I have now switched to joyent_20160121T174331Z (the latest for now).
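For reference, the 1000000 printed above is the default (1,000,000 ms, i.e. the 1000-second deadman). If someone really did want to change it persistently, the usual illumos mechanism is a module tunable in /etc/system — a sketch only; the 2,000,000 value here is just an illustrative doubling, and, as Richard says, tuning the deadman does not fix whatever is hanging the I/O:

```
* /etc/system fragment (takes effect on next boot):
* raise the zfs deadman from the default 1,000,000 ms (1000 s)
* to 2,000,000 ms (2000 s)
set zfs:zfs_deadman_synctime_ms = 2000000
```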
In my burn-in test (scrubbing + compiling SmartOS), I can see checksum errors like the following:

"fffff086b921f3b0 NULL CHECKSUM_VERIFY fffff003d1fc5c40 - "

If I don't issue "zpool clear", all the vdevs accumulate checksum errors in "zpool status", and then the whole pool hangs for 1000s before rebooting. That means the zfs deadman really works as designed! In versions before joyent_20160121T174331Z, there was no 1000s hang, just an immediate reboot. I checked the changelog; it should be some bug fixes in zfs.

- SmartOS Live Image v0.147+ build: 20151001T070028Z

[root@pluto ~]# echo zfs_deadman_synctime_ms/D | mdb -k
zfs_deadman_synctime_ms:
zfs_deadman_synctime_ms:        1000000

[root@pluto ~]# echo ::zio_state | mdb -k
ADDRESS          TYPE  STAGE            WAITER           TIME_ELAPSED
fffff08576c15028 NULL  OPEN             -                -
fffff08576c153a8 NULL  OPEN             -                -
fffff08576c15728 NULL  OPEN             -                -
fffff08576c15aa8 NULL  OPEN             -                -
fffff08576c15e28 NULL  OPEN             -                -
fffff08576c161a8 NULL  OPEN             -                -
fffff08576c16528 NULL  OPEN             -                -
fffff08576c168a8 NULL  OPEN             -                -
fffff08576c16c28 NULL  OPEN             -                -
... ...

Is it possible to trigger a core dump? I can't get anything from /var/adm/messages.

[Richard] If you hit the zfs deadman timer, you should get a core dump. The ::zio_state dcmd is very helpful for debugging kernel dumps, too :-)

[Fred]: The dump device is zones/dump, which is a zvol. How do I get the core-dump file?

[Richard] You should be able to pull it out using "savecore -v /some/directory/with/space". Then you can run mdb on the resulting dump files. From there you should be able to see which zios are stuck and dive down toward the cause.
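The check-and-clear cycle Fred describes above can be sketched as the following console fragment (the pool name 'zones' comes from the thread; the exact counters and vdev layout will differ per system):

```
# per-vdev READ/WRITE/CKSUM error counters, scrub progress,
# and any files with unrecoverable errors (-v)
zpool status -v zones

# zero the error counters once the underlying cause is addressed;
# this does not repair anything, it only resets the counts
zpool clear zones
```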
[Fred]:
[root@pluto ~]# savecore -v /zones/debug
savecore: System dump time: Mon Jan 25 19:24:52 2016
savecore: Saving compressed system crash dump in /zones/debug/vmdump.0
savecore: Copying /dev/zvol/dsk/zones/dump to /zones/debug/vmdump.0
savecore: Decompress the crash dump with 'savecore -vf /zones/debug/vmdump.0'
1:08 dump copy is done

[root@pluto /zones/debug]# ls -la
total 4818285
drwxr-xr-x   2 root     root           5 Jan 26 14:16 .
drwxr-xr-x  16 root     root          19 Jan 25 19:21 ..
-rw-r--r--   1 root     root           2 Jan 26 14:15 bounds
-rw-r--r--   1 root     root        1056 Jan 26 14:15 METRICS.csv
-rw-r--r--   1 root     root  4305518592 Jan 26 14:15 vmdump.0

[root@pluto /zones/debug]# mdb -f vmdump.0
> echo ::zio_state
mdb: failed to dereference symbol: operation not supported by target
> ::status
debugging file 'vmdump.0' (object file)
[root@pluto /zones/debug]# echo "::zio_state" | mdb -f vmdump.0
invalid command '::zio_state': unknown dcmd name

It looks like I can't find much useful info here.

[Richard] That said, it is unusual to hit the zfs deadman without another failure of some sort that isn't handled by sd. However, there has been such a bug in the mptsas driver in the past year. The bug has to do with a deadlock in the driver during reset conditions, which can occur if the IOC decides the target needs to be reset. It should be noted in the FMA ereport log, since resets tend to be preceded by timeouts, which are logged. You might also see some syslog messages around the same time: 1,000 seconds prior to the zfs deadman timeout. If this is the case, then we can take a look at your release and see if it contains the fix. See also https://www.illumos.org/issues/6256

[Fred]: These NVMe SSDs talk directly to the CPUs via the PCIe bus, so there are no HBAs; that case may not apply. Many thanks!
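The "unknown dcmd" failure above is expected: vmdump.0 is still compressed, so mdb opened it as a plain object file and never loaded the kernel modules that provide ::zio_state. A sketch of the usual illumos workflow, following the hint savecore itself printed (paths taken from the thread):

```
# decompress vmdump.0 into unix.0 + vmcore.0 in the current directory
cd /zones/debug
savecore -vf vmdump.0

# open the decompressed pair; kernel dcmds such as ::zio_state now work
mdb unix.0 vmcore.0
> ::status
> ::zio_state
> ::stacks -m zfs      # kernel thread stacks in the zfs module
```

Separately, `fmdump -e` (or `fmdump -eV` for full detail) lists the FMA ereports Richard mentions, which is where any timeout/reset events preceding the deadman panic would show up.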
Fred
