Re: ZFS/zpool command blocks ... locking up all terminals

2013-12-20 Thread Allan Jude
On 2013-12-20 05:55, O. Hartmann wrote:
 I have a faulty pool with an ambiguous label and I tried to resolve
 that problem. ZFS is at the moment highly active copying data from
 several volumes to another.

 Operating system:

 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r259522: Tue Dec 17 19:02:10 CET
 2013 amd64

 In one terminal I exported the pool in question and tried to list it
 via zpool import. But this command sequence has locked up the terminal
 for over an hour!

 In another terminal I tried to issue the command zpool status to watch
 the status of the pools (I have several). But this terminal is also
 locked up right now!

 What is wrong here? I had such an issue in 10.0-CURRENT as well. It
 seems ZFS is locking everything up and can only be brought back by a
 hard reset! What is going on? Why does zpool lock up when trying to
 display a label-scrambled pool, and why does zpool status then lock
 up as well, when the latter is only supposed to show the status of the
 other, healthy pools? This reminds me of single-threaded tools that
 block every operation issued after the blocking command.

 How is this to be solved?

 Oliver
Can you press Ctrl+T or otherwise send SIGINFO to see what run
'state' zpool is in?

This usually consists of the ZFS function name; it is often very
revealing and gives a starting point for investigation.
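
A rough sketch of how the same check might look from a second terminal
that is not blocked, assuming a FreeBSD userland. Ctrl+T in the blocked
terminal prints a status line with the wait channel in brackets; the
wait channel is also available through the wchan keyword of ps. The
stuck_zfs helper below is hypothetical (it is not part of any FreeBSD
tool) and only filters ps output down to zpool/zfs processes in
uninterruptible sleep.

```sh
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): read "pid state wchan comm"
# lines on stdin and print "pid wchan" for zpool/zfs processes that are
# in uninterruptible sleep (state starts with D).
stuck_zfs() {
    awk '$2 ~ /^D/ && ($4 == "zpool" || $4 == "zfs") { print $1, $3 }'
}

# Intended use on FreeBSD, from a terminal that is not blocked:
#   ps -axo pid,state,wchan,comm | stuck_zfs
```

If several zpool processes report the same wait channel, that name is a
reasonable first thing to search for in the ZFS sources.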

-- 
Allan Jude






Re: ZFS/zpool command blocks ... locking up all terminals

2013-12-20 Thread Alan Somers
Sounds like a deadlock.  Did the zpool export complete successfully?
Did the pool become suspended at any point?  Can you get to the
kernel debugger?  Most importantly, can you reproduce it?  If you can,
you'll probably need a WITNESS-enabled kernel to get any useful info.
When I find a deadlock, I usually go into the kernel debugger and
issue the following commands.  They produce about a megabyte of
output, so use screen or tmux or something to capture it.

x/s version
show msgbuf
ps
alltrace
show alllocks  # You need witness for this one

-Alan


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: ZFS/zpool command blocks ... locking up all terminals

2013-12-20 Thread O. Hartmann
On Fri, 20 Dec 2013 11:23:25 -0700
Alan Somers asom...@freebsd.org wrote:

 Sounds like a deadlock.  Did the zpool export complete successfully?

No, it didn't; it has now been stuck for ~8 hours, as has zpool
status.

  Did the pool become suspended at any point?  Can you get to the

The pools that are not exported (two further pools) are under heavy
load at the moment. The exported pool cannot be checked: I can't query
its status because the command is blocking.

 kernel debugger?  Most importantly, can you reproduce it?  If you can,
 you'll probably need a WITNESS enabled kernel to get any useful info.

I regret that I have no debugging kernel on this machine. Whether the
problem is reproducible remains unanswered, since I have no chance at
the moment to repeat the procedure under the very same conditions. I
observed the same behaviour once in 10.0-CURRENT three months ago, but
I do not recall the exact conditions.
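
Without a debugging kernel, part of the picture can possibly still be
collected from userland. A minimal sketch, assuming procstat -kk is
usable on the running kernel (not verified for this build): the
hypothetical zfs_stacks helper below keeps only those threads whose
kernel stack contains a frame with a ZFS-looking prefix. The prefix
list (zfs, spa, zio, dmu, txg) is an assumption and is not exhaustive.

```sh
#!/bin/sh
# Hypothetical helper: filter `procstat -kk` output (stdin) down to
# threads whose kernel stack contains a frame such as spa_foo+0x1a.
# The prefix list is an assumption and is not exhaustive.
zfs_stacks() {
    grep -E '(^|[[:space:]])(zfs|spa|zio|dmu|txg)_[A-Za-z0-9_]*\+'
}

# Intended use on FreeBSD, as root:
#   procstat -kk -a | zfs_stacks
```

This is no substitute for the WITNESS lock information, but the stacks
of the blocked zpool processes may already show which lock they are
waiting for.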

What I do recall is that once all operations on every pool had
finished, the deadlock was released. At the moment I am copying ~4 TB
of data from a pool (RAIDZ-0) to an external drive (via USB 3.0, also a
ZFS pool). That takes hours, and I suspect the deadlock will last
until the copying is finished.

But it is scary that a single faulty command can block all further
ZFS/zpool operations, even on different pools.

 When I find a deadlock, I usually go into the kernel debugger and
 issue the following commands.  It results in about a megabyte of
 output, so use screen or tmux or something to capture the output
 
 x/s version
 show msgbuf
 ps
 alltrace
 show alllocks  # You need witness for this one

I will try this later, after the backup has gone through. Thank you
very much.

Oliver

 



