Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-11-21 Thread Karl Denninger
On 10/17/2016 18:32, Steven Hartland wrote: > > On 17/10/2016 22:50, Karl Denninger wrote: >> I will make some effort on the sandbox machine to see if I can come up >> with a way to replicate this. I do have plenty of spare larger drives >> laying around that used to be in service and were

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-18 Thread Karl Denninger
On 10/17/2016 18:32, Steven Hartland wrote: > > > On 17/10/2016 22:50, Karl Denninger wrote: >> I will make some effort on the sandbox machine to see if I can come up >> with a way to replicate this. I do have plenty of spare larger drives >> laying around that used to be in service and were

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-18 Thread Andriy Gapon
On 18/10/2016 00:43, Steven Hartland wrote: > On 17/10/2016 20:52, Andriy Gapon wrote: >> On 17/10/2016 21:54, Steven Hartland wrote: >>> You're hitting stack exhaustion, have you tried increasing the kernel stack >>> pages? >>> It can be changed from /boot/loader.conf >>> kern.kstack_pages="6"

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Steven Hartland
On 17/10/2016 22:50, Karl Denninger wrote: I will make some effort on the sandbox machine to see if I can come up with a way to replicate this. I do have plenty of spare larger drives laying around that used to be in service and were obsolesced due to capacity -- but what I don't know if

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Karl Denninger
I will make some effort on the sandbox machine to see if I can come up with a way to replicate this. I do have plenty of spare larger drives laying around that used to be in service and were obsolesced due to capacity -- but what I don't know is whether the system will misbehave if the source is

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Steven Hartland
On 17/10/2016 20:52, Andriy Gapon wrote: On 17/10/2016 21:54, Steven Hartland wrote: You're hitting stack exhaustion, have you tried increasing the kernel stack pages? It can be changed from /boot/loader.conf kern.kstack_pages="6" Default on amd64 is 4 IIRC Steve, perhaps you can think of a

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Steven Hartland
Setting those values will only affect what's queued to the device, not what's actually outstanding. On 17/10/2016 21:22, Karl Denninger wrote: Since I cleared it (by setting TRIM off on the test machine, rebooting, importing the pool and noting that it did not panic -- pulled drives,

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Karl Denninger
Since I cleared it (by setting TRIM off on the test machine, rebooting, importing the pool and noting that it did not panic -- pulled drives, re-inserted into the production machine and ran backup routine -- all was normal) it may be a while before I see it again (a week or so is usual.) It

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Steven Hartland
Be good to confirm it's not an infinite loop by giving it a good bump first. On 17/10/2016 19:58, Karl Denninger wrote: I can certainly attempt setting that higher but is that not just hiding the problem rather than addressing it? On 10/17/2016 13:54, Steven Hartland wrote: You're hitting

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Andriy Gapon
On 17/10/2016 21:54, Steven Hartland wrote: > You're hitting stack exhaustion, have you tried increasing the kernel stack > pages? > It can be changed from /boot/loader.conf > kern.kstack_pages="6" > > Default on amd64 is 4 IIRC Steve, perhaps you can think of a more proper fix? :-)

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Karl Denninger
I can certainly attempt setting that higher but is that not just hiding the problem rather than addressing it? On 10/17/2016 13:54, Steven Hartland wrote: > You're hitting stack exhaustion, have you tried increasing the kernel > stack pages? > It can be changed from /boot/loader.conf >

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Steven Hartland
You're hitting stack exhaustion, have you tried increasing the kernel stack pages? It can be changed from /boot/loader.conf kern.kstack_pages="6" Default on amd64 is 4 IIRC On 17/10/2016 19:08, Karl Denninger wrote: The target (and devices that trigger this) are a pair of 4Gb 7200RPM SATA
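The tunable suggested above goes in /boot/loader.conf and takes effect on the next boot. A minimal sketch, following the value proposed in the message (the amd64 default of 4 pages is as stated there):

```
# /boot/loader.conf
# Raise the kernel stack from the amd64 default of 4 pages to 6
# to give deep ZFS call chains more headroom.
kern.kstack_pages="6"
```

The current value can be inspected at runtime with `sysctl kern.kstack_pages`; it cannot be changed on a running system, which is why the loader tunable is needed.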

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Karl Denninger
The target (and devices that trigger this) are a pair of 4Gb 7200RPM SATA rotating rust drives (zmirror) with each provider geli-encrypted (that is, the actual devices used for the pool create are the .eli's) The machine generating the problem has both rotating rust devices *and* SSDs, so I can't
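The layout described (a ZFS mirror whose providers are the GELI .eli devices rather than the raw disks) might be built roughly as follows; the device names ada1/ada2, the 4K sector size, and the pool name "backup" are hypothetical illustrations, not taken from the thread:

```
# Hypothetical devices; adjust to the actual disks.
geli init -s 4096 /dev/ada1     # one-time: write GELI metadata
geli init -s 4096 /dev/ada2
geli attach /dev/ada1           # creates /dev/ada1.eli
geli attach /dev/ada2           # creates /dev/ada2.eli

# Create the mirror on the encrypted providers, not the raw devices.
zpool create backup mirror /dev/ada1.eli /dev/ada2.eli
```

With this arrangement every pool I/O, including TRIM/UNMAP requests, passes through the geli(4) layer before reaching the disk, which is why the encrypted-provider detail is relevant to the panic being discussed.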

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Warner Losh
what's your underlying media? Warner On Mon, Oct 17, 2016 at 10:02 AM, Karl Denninger wrote: > Update from my test system: > > Setting vfs.zfs.vdev_trim_max_active to 10 (from default 64) does *not* > stop the panics. > > Setting vfs.zfs.vdev.trim.enabled = 0 (which

Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Karl Denninger
Update from my test system: Setting vfs.zfs.vdev_trim_max_active to 10 (from default 64) does *not* stop the panics. Setting vfs.zfs.vdev.trim.enabled = 0 (which requires a reboot) DOES stop the panics. I am going to run a scrub on the pack, but I suspect the pack itself (now that I can
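For reference, the two knobs mentioned above are, using the OID names as given in the message (exact names can vary between FreeBSD releases): a runtime sysctl for the per-vdev TRIM queue depth, and a loader tunable for disabling TRIM entirely, which is why the latter required a reboot. A sketch:

```
# Runtime sysctl (did NOT stop the panics here):
sysctl vfs.zfs.vdev_trim_max_active=10

# /boot/loader.conf -- takes effect at next boot (DID stop the panics):
vfs.zfs.vdev.trim.enabled=0
```

Disabling TRIM is a workaround rather than a fix: it masks whatever code path the TRIM requests were exercising, at the cost of losing TRIM's benefits on SSD-backed vdevs.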

Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

2016-10-17 Thread Karl Denninger
This is a situation I've had happen before, and reported -- it appeared to be a kernel stack overflow, and it has gotten materially worse on 11.0-STABLE. The issue occurs after some period of time (normally a week or so.) The system has a mirrored pair of large drives used for backup purposes to