From: Richard Elling [mailto:[email protected]]
Sent: Saturday, January 23, 2016 2:55
To: [email protected]
Cc: illumos-developer
Subject: Re: [smartos-discuss] Is zfs dead man timer tunable?
On Jan 21, 2016, at 10:18 PM, Fred Liu <[email protected]> wrote:

Comment below.

From: Richard Elling [mailto:[email protected]]
Sent: Friday, January 22, 2016 13:51
To: [email protected]
Subject: Re: [smartos-discuss] Is zfs dead man timer tunable?

answer far below...

On Jan 21, 2016, at 8:44 PM, Fred Liu <[email protected]> wrote:

-----Original Message-----
From: Richard Elling [mailto:[email protected]]
Sent: Friday, January 22, 2016 12:02
To: [email protected]
Subject: Re: [smartos-discuss] Is zfs dead man timer tunable?

On Jan 21, 2016, at 4:25 AM, Fred Liu <[email protected]> wrote:

[Richard] The zfs deadman timer is tunable, but if you hit it, you've got problems that tuning the deadman won't fix. The tunable is zfs_deadman_synctime_ms, in milliseconds. For example, on a test machine here:

[root@elvis ~]# echo zfs_deadman_synctime_ms/D | mdb -k
zfs_deadman_synctime_ms:
zfs_deadman_synctime_ms:        1000000

FYI, you can check on the state of I/Os in the ZIO pipeline, and how long they've been there, using the ::zio_state dcmd. Elvis is not currently busy or broken, but here is an example:

[root@elvis ~]# echo ::zio_state | mdb -k
ADDRESS          TYPE  STAGE            WAITER           TIME_ELAPSED
ffffff01ada853e0 NULL  OPEN             -                -
ffffff01ada85b10 NULL  OPEN             -                -

If you see a large TIME_ELAPSED, you can track down the zio in question for more debugging.
 -- richard

[Fred]: Richard, many thanks! You have always been helpful since I first touched ZFS many years ago. I am trying Intel P3600 NVMe SSDs on ZFS. I have been getting several random server reboots every day, and I just captured the panic message "I/O to pool 'zones' appears to be hung" from the console. I suspect it is related to the NVMe driver or the SSD firmware.

[Richard] This is disturbing, to me. sd should sit between ZFS and the NVMe driver. sd manages timeouts, retries, and resets, but it is still somewhat at the mercy of the NVMe driver (or mptsas). more below...

[Fred]: I have now switched to joyent_20160121T174331Z (the latest for now).
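For reference, the 1000000 printed above is the default (1,000,000 ms, i.e. the 1000-second deadman). If someone really did want to change it persistently, the usual illumos mechanism is a module tunable in /etc/system — a sketch only; the 2,000,000 value here is just an illustrative doubling, and, as Richard says, tuning the deadman does not fix whatever is hanging the I/O:

```
* /etc/system fragment (takes effect on next boot):
* raise the zfs deadman from the default 1,000,000 ms (1000 s)
* to 2,000,000 ms (2000 s)
set zfs:zfs_deadman_synctime_ms = 2000000
```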
In my burn-in test (scrubbing + compiling SmartOS), I can see checksum errors like the following:

"fffff086b921f3b0 NULL CHECKSUM_VERIFY fffff003d1fc5c40 - "

If I don't issue "zpool clear", all the vdevs accumulate checksum errors in "zpool status", and then the whole pool hangs for 1000s before rebooting. That means the zfs deadman really works as designed! In versions before joyent_20160121T174331Z, there was no 1000s hang, just an immediate reboot. I checked the changelog; it should be some bug fixes in zfs.

- SmartOS Live Image v0.147+ build: 20151001T070028Z

[root@pluto ~]# echo zfs_deadman_synctime_ms/D | mdb -k
zfs_deadman_synctime_ms:
zfs_deadman_synctime_ms:        1000000

[root@pluto ~]# echo ::zio_state | mdb -k
ADDRESS          TYPE  STAGE            WAITER           TIME_ELAPSED
fffff08576c15028 NULL  OPEN             -                -
fffff08576c153a8 NULL  OPEN             -                -
fffff08576c15728 NULL  OPEN             -                -
fffff08576c15aa8 NULL  OPEN             -                -
fffff08576c15e28 NULL  OPEN             -                -
fffff08576c161a8 NULL  OPEN             -                -
fffff08576c16528 NULL  OPEN             -                -
fffff08576c168a8 NULL  OPEN             -                -
fffff08576c16c28 NULL  OPEN             -                -
... ...

Is it possible to trigger a core dump? I can't get anything from /var/adm/messages.

[Richard] If you hit the zfs deadman timer, you should get a core dump. The ::zio_state dcmd is very helpful for debugging kernel dumps, too :-)

[Fred]: The dump device is zones/dump, which is a zvol. How do I get the core-dump file?

[Richard] You should be able to pull it out using "savecore -v /some/directory/with/space". Then you can run mdb on the resulting dump files. From there you should be able to see which zios are stuck and dive down toward the cause.
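The check-and-clear cycle Fred describes above can be sketched as the following console fragment (the pool name 'zones' comes from the thread; the exact counters and vdev layout will differ per system):

```
# per-vdev READ/WRITE/CKSUM error counters, scrub progress,
# and any files with unrecoverable errors (-v)
zpool status -v zones

# zero the error counters once the underlying cause is addressed;
# this does not repair anything, it only resets the counts
zpool clear zones
```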
[Fred]:
[root@pluto ~]# savecore -v /zones/debug
savecore: System dump time: Mon Jan 25 19:24:52 2016
savecore: Saving compressed system crash dump in /zones/debug/vmdump.0
savecore: Copying /dev/zvol/dsk/zones/dump to /zones/debug/vmdump.0
savecore: Decompress the crash dump with 'savecore -vf /zones/debug/vmdump.0'
1:08 dump copy is done

[root@pluto /zones/debug]# ls -la
total 4818285
drwxr-xr-x   2 root     root           5 Jan 26 14:16 .
drwxr-xr-x  16 root     root          19 Jan 25 19:21 ..
-rw-r--r--   1 root     root           2 Jan 26 14:15 bounds
-rw-r--r--   1 root     root        1056 Jan 26 14:15 METRICS.csv
-rw-r--r--   1 root     root  4305518592 Jan 26 14:15 vmdump.0

[root@pluto /zones/debug]# mdb -f vmdump.0
> echo ::zio_state
mdb: failed to dereference symbol: operation not supported by target
> ::status
debugging file 'vmdump.0' (object file)
[root@pluto /zones/debug]# echo "::zio_state" | mdb -f vmdump.0
invalid command '::zio_state': unknown dcmd name

It looks like I can't find much useful info here.

[Richard] That said, it is unusual to hit the zfs deadman without another failure of some sort that isn't handled by sd. However, there has been such a bug in the mptsas driver in the past year. The bug has to do with a deadlock in the driver during reset conditions, which can occur if the IOC decides the target needs to be reset. It should be noted in the FMA ereport log, since resets tend to be preceded by timeouts, which are logged. You might also see some syslog messages around the same time: 1,000 seconds prior to the zfs deadman timeout. If this is the case, then we can take a look at your release and see if it contains the fix. See also https://www.illumos.org/issues/6256

[Fred]: These NVMe SSDs talk directly to the CPUs via the PCIe bus, so there are no HBAs; that case may not apply. Many thanks!
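The "unknown dcmd" failure above is expected: vmdump.0 is still compressed, so mdb opened it as a plain object file and never loaded the kernel modules that provide ::zio_state. A sketch of the usual illumos workflow, following the hint savecore itself printed (paths taken from the thread):

```
# decompress vmdump.0 into unix.0 + vmcore.0 in the current directory
cd /zones/debug
savecore -vf vmdump.0

# open the decompressed pair; kernel dcmds such as ::zio_state now work
mdb unix.0 vmcore.0
> ::status
> ::zio_state
> ::stacks -m zfs      # kernel thread stacks in the zfs module
```

Separately, `fmdump -e` (or `fmdump -eV` for full detail) lists the FMA ereports Richard mentions, which is where any timeout/reset events preceding the deadman panic would show up.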
Fred
