Re: FreeBSD 10.2-RELEASE #0 r286666: Panic and crash

2017-02-10 Thread Shawn Bakhtiar
Well

It happened again today.

I found a few instances on the web of others reporting similar issues, and also 
ran across this bug.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211037

This is eerily similar to what is happening to me, save that in this case it's 
happening with a USB drive. Should I add my crash report to that bug?

I've disabled the local backup to the USB (just doing the remote one to the 
share drive).

Any help would be greatly appreciated.


On Feb 6, 2017, at 1:01 PM, Shaheen Bakhtiar <shashan...@hotmail.com> wrote:

Hi all!

http://pastebin.com/niXrjF0D

Please refer to full output from crash above.

This morning our IMAP server decided to go belly up. I could not remote in, and 
the machine would not respond to any pings.

Checking the physical console I had the following worrisome messages on screen:

• g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
• g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16
• /mnt/USBBD: got error 16 while accessing filesystem
• panic: softdep_deallocate_dependencies: unrecovered I/O error
• cpuid = 5

/mnt/USBBD is a MyBook USB 8TB drive that we use for daily backups of the IMAP 
data using rsync. Everything so far has worked without issue.

I also noticed a bunch of:

• fstat: can't read file 2 at 0x41f
• fstat: can't read file 4 at 0x78
• fstat: can't read file 5 at 0x6
• fstat: can't read file 1 at 0x27f
• fstat: can't read file 2 at 0x41f
• fstat: can't read file 4 at 0x78
• fstat: can't read file 5 at 0x6


but I have no idea what these are from.

df -h output:
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/da0p2    1.8T    226G    1.5T    13%    /
devfs         1.0K    1.0K      0B   100%    /dev
/dev/da1p1    7.0T    251G    6.2T     4%    /mnt/USBBD


da0p2 is a RAID 5 volume on an HP Smart Array controller.

Here is the output of dmesg after the reboot:
http://pastebin.com/rHVjgZ82

Obviously neither the RAID volume nor the USB drive came through the crash 
cleanly. Should I run fsck on both from single-user mode at this point to 
verify and clean them up? My concern is the
WARNING: /: mount pending error: blocks 0 files 26
message when mounting /dev/da0p2.

For some reason I was under the impression that fsck was run automatically on 
reboot.

Any help in this matter would be greatly appreciated. I'm a little concerned 
that a backup strategy that has worked for us for many MANY years could so 
easily throw the OS into a panic. If an I/O error occurs on the USB drive, I 
would frankly expect it to just back out without a panic. Or am I missing 
something?

Any recommendations / insights would be most welcome.
Shawn

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: FreeBSD 10.2-RELEASE #0 r286666: Panic and crash

2017-02-06 Thread Karl Denninger
On 2/6/2017 15:01, Shawn Bakhtiar wrote:
> Hi all!
>
> http://pastebin.com/niXrjF0D
>
> Please refer to full output from crash above.
>
> This morning our IMAP server decided to go belly up. I could not remote in, 
> and the machine would not respond to any pings.
>
> Checking the physical console I had the following worrisome messages on 
> screen:
>
> • g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
> • g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16
> • /mnt/USBBD: got error 16 while accessing filesystem
> • panic: softdep_deallocate_dependencies: unrecovered I/O error
> • cpuid = 5
>
> /mnt/USBBD is a MyBook USB 8TB drive that we use for daily backups of the 
> IMAP data using rsync. Everything so far has worked without issue.
>
> I also noticed a bunch of:
>
> • fstat: can't read file 2 at 0x41f
> • fstat: can't read file 4 at 0x78
> • fstat: can't read file 5 at 0x6
> • fstat: can't read file 1 at 0x27f
> • fstat: can't read file 2 at 0x41f
> • fstat: can't read file 4 at 0x78
> • fstat: can't read file 5 at 0x6
>
>
> but I have no idea what these are from.
>
> df -h output:
> Filesystem    Size    Used   Avail Capacity  Mounted on
> /dev/da0p2    1.8T    226G    1.5T    13%    /
> devfs         1.0K    1.0K      0B   100%    /dev
> /dev/da1p1    7.0T    251G    6.2T     4%    /mnt/USBBD
>
>
> da0p2 is a RAID 5 volume on an HP Smart Array controller.
>
> Here is the output of dmesg after the reboot:
> http://pastebin.com/rHVjgZ82
>
> Obviously neither the RAID volume nor the USB drive came through the crash 
> cleanly. Should I run fsck on both from single-user mode at this point to 
> verify and clean them up? My concern is the
> WARNING: /: mount pending error: blocks 0 files 26
> message when mounting /dev/da0p2.
>
> For some reason I was under the impression that fsck was run automatically on 
> reboot.
>
> Any help in this matter would be greatly appreciated. I'm a little concerned 
> that a backup strategy that has worked for us for many MANY years could so 
> easily throw the OS into a panic. If an I/O error occurs on the USB drive, I 
> would frankly expect it to just back out without a panic. Or am I missing 
> something?
>
> Any recommendations / insights would be most welcome.
> Shawn
>
>
The "mount pending error" is normal on a disk that has softupdates
turned on; fsck runs in the background after boot, and this is
"safe" because of how the metadata and data writes are ordered.  In
other words, the filesystem in this situation may be missing uncommitted
data, but its on-disk state is consistent.  As a result the system
can mount root read-write without running fsck first, and the
background cleanup can proceed without risk to disk consistency.

The panic itself appears to have been triggered by an I/O error that
caused an operation on the disk to fail.
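The g_vfs_done() console lines quoted above carry the GEOM provider, the operation, a byte offset into that provider and an errno value. A minimal Python sketch that decodes them, assuming the line format is exactly as quoted in this thread; decode() and its regex are hypothetical helpers, and errno names are looked up in the table of whatever host runs the script (5 and 16 are EIO and EBUSY on both FreeBSD and Linux).

```python
import errno
import re

# Console lines as quoted in the original report.
LINES = [
    "g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5",
    "g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16",
]

G_VFS = re.compile(
    r"g_vfs_done\(\):(?P<prov>\w+)\[(?P<op>\w+)\(offset=(?P<off>\d+), "
    r"length=(?P<len>\d+)\)\]error = (?P<err>\d+)"
)

TIB = 2 ** 40  # bytes per TiB


def decode(line):
    """Split a g_vfs_done() line into its fields, or return None if it does not match."""
    m = G_VFS.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "provider": m.group("prov"),              # GEOM provider, here a partition
        "op": m.group("op"),                      # READ or WRITE
        "offset_tib": int(m.group("off")) / TIB,  # byte offset within the provider
        "length": int(m.group("len")),            # request size in bytes
        # Name from the host's errno table; fall back to the raw number.
        "errno": errno.errorcode.get(err, str(err)),
    }


for line in LINES:
    d = decode(line)
    print(d["provider"], d["op"], f"{d['offset_tib']:.2f} TiB", d["length"], d["errno"])
```

Run as-is, it places both failing requests about 6.61 TiB into da1p1 (which df reports as 7.0T), with the read failing with EIO and the write with EBUSY.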

I took part in a thread on this in 2016, which you can find here:
https://lists.freebsd.org/pipermail/freebsd-stable/2016-July/084944.html

The basic problem is that the softupdates code cannot deal with a hard
I/O error on write because it no longer can guarantee filesystem
integrity if it continues.  I argued in that thread that the superior
solution would be to forcibly detach the volume, which would leave you with
a "dirty" filesystem and a failed operation but not a panic.  The
file(s) involved in the write error might be lost, but the integrity of
the filesystem is recoverable (as it is in the panic case) -- at least
it is if the fsck doesn't require writing to a block that *also* errors out.

The decision in the code is to panic rather than detach the volume,
however, so panic it is.  This one has bitten me with SD cards in small
embedded-style machines (where turning off softupdates makes things VERY
slow), and at some point I may look into developing a patch to
forcibly detach the volume instead.  That obviously won't help you if
the system volume is the one the error happens on (you would have just
forcibly detached the root filesystem, which gets you an immediate
panic anyway), but in the event of a data disk it would prevent the
system from crashing.
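A toy Python model of the two policies contrasted here, panic versus forced detach, may make the trade-off concrete. This is a hypothetical sketch, not FreeBSD kernel code: Panic, Volume and on_unrecoverable_write_error() are invented names, and raising Panic only stands in for a real kernel panic.

```python
from dataclasses import dataclass


class Panic(Exception):
    """Stands in for a kernel panic in this toy model."""


@dataclass
class Volume:
    mountpoint: str
    is_root: bool
    attached: bool = True
    dirty: bool = False  # would need fsck before the next mount


def on_unrecoverable_write_error(vol, policy):
    """React to a hard write error on a soft-updates volume under the given policy.

    "panic":  stop the whole system (the current behaviour described in the thread).
    "detach": the proposed alternative; drop the volume and leave it dirty.
    """
    if policy == "panic":
        raise Panic(f"unrecovered I/O error on {vol.mountpoint}")
    if policy == "detach":
        vol.attached = False
        vol.dirty = True
        if vol.is_root:
            # Losing the root filesystem still brings the system down.
            raise Panic(f"root filesystem {vol.mountpoint} forcibly detached")
        return vol
    raise ValueError(f"unknown policy: {policy!r}")


backup = Volume("/mnt/USBBD", is_root=False)
on_unrecoverable_write_error(backup, "detach")
print(backup)  # Volume(mountpoint='/mnt/USBBD', is_root=False, attached=False, dirty=True)
```

Under the detach policy only a root volume still ends in a panic; a data volume such as the USB backup disk is merely left detached and dirty.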

-- 
Karl Denninger
k...@denninger.net 
/The Market Ticker/
/[S/MIME encrypted email preferred]/




FreeBSD 10.2-RELEASE #0 r286666: Panic and crash

2017-02-06 Thread Shawn Bakhtiar
Hi all!

http://pastebin.com/niXrjF0D

Please refer to full output from crash above.

This morning our IMAP server decided to go belly up. I could not remote in, and 
the machine would not respond to any pings.

Checking the physical console I had the following worrisome messages on screen:

• g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
• g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16
• /mnt/USBBD: got error 16 while accessing filesystem
• panic: softdep_deallocate_dependencies: unrecovered I/O error
• cpuid = 5

/mnt/USBBD is a MyBook USB 8TB drive that we use for daily backups of the IMAP 
data using rsync. Everything so far has worked without issue.

I also noticed a bunch of:

• fstat: can't read file 2 at 0x41f
• fstat: can't read file 4 at 0x78
• fstat: can't read file 5 at 0x6
• fstat: can't read file 1 at 0x27f
• fstat: can't read file 2 at 0x41f
• fstat: can't read file 4 at 0x78
• fstat: can't read file 5 at 0x6


but I have no idea what these are from.

df -h output:
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/da0p2    1.8T    226G    1.5T    13%    /
devfs         1.0K    1.0K      0B   100%    /dev
/dev/da1p1    7.0T    251G    6.2T     4%    /mnt/USBBD


da0p2 is a RAID 5 volume on an HP Smart Array controller.

Here is the output of dmesg after the reboot:
http://pastebin.com/rHVjgZ82

Obviously neither the RAID volume nor the USB drive came through the crash 
cleanly. Should I run fsck on both from single-user mode at this point to 
verify and clean them up? My concern is the
WARNING: /: mount pending error: blocks 0 files 26
message when mounting /dev/da0p2.

For some reason I was under the impression that fsck was run automatically on 
reboot.

Any help in this matter would be greatly appreciated. I'm a little concerned 
that a backup strategy that has worked for us for many MANY years could so 
easily throw the OS into a panic. If an I/O error occurs on the USB drive, I 
would frankly expect it to just back out without a panic. Or am I missing 
something?

Any recommendations / insights would be most welcome.
Shawn