Re: ZFS i/o error in recent 12.0

2018-03-20 Thread Allan Jude
On 2018-03-20 10:29, Andriy Gapon wrote:
> On 20/03/2018 09:09, Trond Endrestøl wrote:
>> This step has been a big no-no in the past. Never leave your
>> bootpool/rootpool in an exported state if you intend to boot from it.
>> For all I know, this advice might be superstition for the present
>> versions of FreeBSD.
> 
> Yes, it is.  That does not matter at all now.
> 
>> From what I can tell from the above, you never created a new 
>> zpool.cache and copied it to its rightful place.
> 
> For the _root_ pool zpool.cache does not matter either.
> It matters only for auto-import of additional pools, if any.
> 

As I mentioned previously, the error reported by the user occurs before it
is even possible to read zpool.cache, so zpool.cache is definitely not the
source of the problem.

-- 
Allan Jude


Re: Strange ARC/Swap/CPU on yesterday's -CURRENT

2018-03-20 Thread Thomas Steen Rasmussen


On 03/11/2018 09:43 PM, Jeff Roberson wrote:

> Also, if you could try going back to r328953 or r326346 and let me
> know if the problem exists in either.  That would be very helpful.  If
> anyone is willing to debug this with me contact me directly and I will
> send some test patches or debugging info after you have done the above
> steps.
>

Hello,

I am seeing this issue (high swap use, ARC not backing down) on two
jail/bhyve hosts running 11-STABLE r325275 and r325235 - which sounds
like it predates the two revisions you mention?
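
(For reference, stepping a source tree back to one of those revisions would
be roughly the following - just a sketch, assuming /usr/src is a Subversion
checkout and a GENERIC kernel, which may not match your setup:

  svn update -r328953 /usr/src
  cd /usr/src && make buildkernel installkernel KERNCONF=GENERIC
  shutdown -r now

and similarly for r326346.)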

The two machines are at 98 and 138 days of uptime, both are currently
using more than 90% of their swap, and I've had to shut down non-critical
services because I was getting out-of-swap errors.
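
Roughly how I am checking this on the hosts, in case anyone wants to compare
numbers (a generic sketch of commands, output omitted):

  swapinfo -h                              # swap device usage
  sysctl kstat.zfs.misc.arcstats.size      # current ARC size in bytes
  sysctl vfs.zfs.arc_max                   # configured ARC ceiling
  top -b -o res | head -20                 # largest resident processes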

Just wanted to let everyone know, since I haven't seen any revisions as
early as r325275 in the "me too" posts here.

More information available on request.

Best regards,

Thomas Steen Rasmussen



Re: ZFS i/o error in recent 12.0

2018-03-20 Thread Andriy Gapon
On 20/03/2018 09:09, Trond Endrestøl wrote:
> This step has been a big no-no in the past. Never leave your
> bootpool/rootpool in an exported state if you intend to boot from it.
> For all I know, this advice might be superstition for the present
> versions of FreeBSD.

Yes, it is.  That does not matter at all now.

> From what I can tell from the above, you never created a new 
> zpool.cache and copied it to its rightful place.

For the _root_ pool zpool.cache does not matter either.
It matters only for auto-import of additional pools, if any.
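
For completeness, regenerating the cache file for such an additional pool
from a rescue environment is roughly the following (a sketch; the pool name
"tank" and the /mnt altroot are placeholders, not from this thread):

  zpool import -fR /mnt tank
  zpool set cachefile=/tmp/zpool.cache tank
  cp /tmp/zpool.cache /mnt/boot/zfs/zpool.cache

so that the pool is auto-imported from /boot/zfs/zpool.cache on the next boot.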

-- 
Andriy Gapon


Re: ZFS i/o error in recent 12.0

2018-03-20 Thread Toomas Soome


> On 20 Mar 2018, at 09:50, Markus Wild  wrote:
> 
> Hi there,
> 
>> I've encountered a sudden death on a full-ZFS-volume
>> machine (r330434) about 10 days after installation[1]:
>> 
>> ZFS: i/o error - all block copies unavailable
>> ZFS: can't read MOS of pool zroot
>> gptzfsboot: failed to mount default pool zroot
>> 
> 
>>268847104  30978715648  4  freebsd-zfs  (14T)
> 
> ^^^
> 
> 
> I had faced the exact same issue on an HP Microserver G8 with 8TB disks
> and a 16TB zpool on FreeBSD 11 about a year ago. My conclusion was that
> over time (and with kernel updates) the blocks for the kernel file were
> reallocated to a later spot on the disks, and that however the loader
> fetches those blocks, it now failed to do so (perhaps a 2/4TB limit/bug
> in the BIOS of that server? Unfortunately there was no UEFI support for
> it; I don't know whether that has changed in the meantime). The pool
> always imported fine from the USB stick; the problem was only with the
> boot loader. I worked around it by stealing space from the swap
> partitions on two disks to build a "zboot" pool containing just the
> /boot directory, having the boot loader load the kernel from there, and
> then still mounting the real root pool to run the system off, using
> loader variables in loader.conf of the boot pool. It's a hack, but it
> has been working fine since (the server is used as a backup repository).
> This is what I have in the "zboot" boot/loader.conf:
> 
> # zfs boot kludge due to buggy bios
> vfs.root.mountfrom="zfs:zroot/ROOT/fbsd11"
> 
> 
> If you're facing the same problem, you might give this a shot? You seem
> to have plenty of swap to cannibalize as well ;)
> 

Please check with lsdev -v from the loader OK prompt whether the reported
disk/partition sizes make sense. Another thing: even if you do update to the
current build, you want to make sure the installed boot blocks are updated as
well - otherwise you will have a new binary in the /boot directory, but it is
not installed in the boot block area…
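
For a GPT disk booting via gptzfsboot, updating the boot blocks is roughly
the following (a sketch; the disk name mfid0 and the freebsd-boot partition
index 2 are taken from the gpart output in this thread and may differ on
other systems):

  gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 mfid0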

rgds,
toomas



Re: ZFS i/o error in recent 12.0

2018-03-20 Thread Markus Wild
Hi there,

> I've encountered a sudden death on a full-ZFS-volume
> machine (r330434) about 10 days after installation[1]:
> 
> ZFS: i/o error - all block copies unavailable
> ZFS: can't read MOS of pool zroot
> gptzfsboot: failed to mount default pool zroot
> 

> 268847104  30978715648  4  freebsd-zfs  (14T)

^^^


I had faced the exact same issue on an HP Microserver G8 with 8TB disks and a
16TB zpool on FreeBSD 11 about a year ago. My conclusion was that over time
(and with kernel updates) the blocks for the kernel file were reallocated to
a later spot on the disks, and that however the loader fetches those blocks,
it now failed to do so (perhaps a 2/4TB limit/bug in the BIOS of that server?
Unfortunately there was no UEFI support for it; I don't know whether that has
changed in the meantime). The pool always imported fine from the USB stick;
the problem was only with the boot loader. I worked around it by stealing
space from the swap partitions on two disks to build a "zboot" pool containing
just the /boot directory, having the boot loader load the kernel from there,
and then still mounting the real root pool to run the system off, using
loader variables in loader.conf of the boot pool. It's a hack, but it has
been working fine since (the server is used as a backup repository). This is
what I have in the "zboot" boot/loader.conf:

# zfs boot kludge due to buggy bios
vfs.root.mountfrom="zfs:zroot/ROOT/fbsd11"
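
In rough outline, the setup looks something like this (an illustrative
sketch with made-up partition names ada0p5/ada1p5, not the exact commands
that were used):

  # small mirrored pool on the space reclaimed from the swap partitions
  zpool create -m /zboot zboot mirror ada0p5 ada1p5
  cp -RPp /boot /zboot/
  # point the kernel at the real root pool from the zboot pool's loader.conf
  echo 'vfs.root.mountfrom="zfs:zroot/ROOT/fbsd11"' >> /zboot/boot/loader.conf

How the boot blocks end up picking zboot first depends on the disk and
partition layout, so treat this only as a starting point.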


If you're facing the same problem, you might give this a shot? You seem to
have plenty of swap to cannibalize as well ;)

Cheers,
Markus






Re: ZFS i/o error in recent 12.0

2018-03-20 Thread Trond Endrestøl
On Tue, 20 Mar 2018 08:00+0900, KIRIYAMA Kazuhiko wrote:

> Hi,
> 
> I've encountered a sudden death on a full-ZFS-volume
> machine (r330434) about 10 days after installation[1]:
> 
> ZFS: i/o error - all block copies unavailable
> ZFS: can't read MOS of pool zroot
> gptzfsboot: failed to mount default pool zroot
> 
> FreeBSD/x86 boot
> ZFS: i/o error - all block copies unavailable
> ZFS: can't find dataset u
> Default: zroot/<0x0>:
> boot: 
> 
> The partitioning is below:
> 
> # gpart show /dev/mfid0
> =>          40  31247564720  mfid0  GPT  (15T)
>             40       409600      1  efi  (200M)
>         409640         1024      2  freebsd-boot  (512K)
>         410664          984         - free -  (492K)
>         411648    268435456      3  freebsd-swap  (128G)
>      268847104  30978715648      4  freebsd-zfs  (14T)
>    31247562752         2008         - free -  (1.0M)
> 
> # 
> 
> But nothing had beed happend in old current ZFS full volume
> machine(r327038M). According to [2] the reason is boot/zfs/zpool.cache
> inconsistent. I've tried to cope with this by repairing
> /boot [3] from rescue bootable USB as follows:
> 
> # kldload zfs
> # zpool import 
>    pool: zroot
>      id: 17762298124265859537
>   state: ONLINE
>  action: The pool can be imported using its name or numeric identifier.
>  config:
> 
>         zroot       ONLINE
>           mfid0p4   ONLINE
> # zpool import -fR /mnt zroot
> # df -h
> Filesystem                Size    Used   Avail Capacity  Mounted on
> /dev/da0p2                 14G    1.6G     11G    13%    /
> devfs                     1.0K    1.0K      0B   100%    /dev
> zroot/.dake                14T     18M     14T     0%    /mnt/.dake
> zroot/ds                   14T     96K     14T     0%    /mnt/ds
> zroot/ds/backup            14T     88K     14T     0%    /mnt/ds/backup
> zroot/ds/backup/kazu.pis   14T     31G     14T     0%    /mnt/ds/backup/kazu.pis
> zroot/ds/distfiles         14T    7.9M     14T     0%    /mnt/ds/distfiles
> zroot/ds/obj               14T     10G     14T     0%    /mnt/ds/obj
> zroot/ds/packages          14T    4.0M     14T     0%    /mnt/ds/packages
> zroot/ds/ports             14T    1.3G     14T     0%    /mnt/ds/ports
> zroot/ds/src               14T    2.6G     14T     0%    /mnt/ds/src
> zroot/tmp                  14T     88K     14T     0%    /mnt/tmp
> zroot/usr/home             14T    136K     14T     0%    /mnt/usr/home
> zroot/usr/local            14T     10M     14T     0%    /mnt/usr/local
> zroot/var/audit            14T     88K     14T     0%    /mnt/var/audit
> zroot/var/crash            14T     88K     14T     0%    /mnt/var/crash
> zroot/var/log              14T    388K     14T     0%    /mnt/var/log
> zroot/var/mail             14T     92K     14T     0%    /mnt/var/mail
> zroot/var/ports            14T     11M     14T     0%    /mnt/var/ports
> zroot/var/tmp              14T    6.0M     14T     0%    /mnt/var/tmp
> zroot/vm                   14T    2.8G     14T     0%    /mnt/vm
> zroot/vm/tbedfc            14T    1.6G     14T     0%    /mnt/vm/tbedfc
> zroot                      14T     88K     14T     0%    /mnt/zroot
> # zfs list
> NAME                       USED  AVAIL  REFER  MOUNTPOINT
> zroot                     51.1G  13.9T    88K  /mnt/zroot
> zroot/.dake               18.3M  13.9T  18.3M  /mnt/.dake
> zroot/ROOT                1.71G  13.9T    88K  none
> zroot/ROOT/default        1.71G  13.9T  1.71G  /mnt/mnt
> zroot/ds                  45.0G  13.9T    96K  /mnt/ds
> zroot/ds/backup           30.8G  13.9T    88K  /mnt/ds/backup
> zroot/ds/backup/kazu.pis  30.8G  13.9T  30.8G  /mnt/ds/backup/kazu.pis
> zroot/ds/distfiles        7.88M  13.9T  7.88M  /mnt/ds/distfiles
> zroot/ds/obj              10.4G  13.9T  10.4G  /mnt/ds/obj
> zroot/ds/packages         4.02M  13.9T  4.02M  /mnt/ds/packages
> zroot/ds/ports            1.26G  13.9T  1.26G  /mnt/ds/ports
> zroot/ds/src              2.56G  13.9T  2.56G  /mnt/ds/src
> zroot/tmp                   88K  13.9T    88K  /mnt/tmp
> zroot/usr                 10.4M  13.9T    88K  /mnt/usr
> zroot/usr/home             136K  13.9T   136K  /mnt/usr/home
> zroot/usr/local           10.2M  13.9T  10.2M  /mnt/usr/local
> zroot/var                 17.4M  13.9T    88K  /mnt/var
> zroot/var/audit             88K  13.9T    88K  /mnt/var/audit
> zroot/var/crash             88K  13.9T    88K  /mnt/var/crash
> zroot/var/log              388K  13.9T   388K  /mnt/var/log
> zroot/var/mail              92K  13.9T    92K  /mnt/var/mail
> zroot/var/ports           10.7M  13.9T  10.7M  /mnt/var/ports
> zroot/var/tmp             5.98M  13.9T  5.98M  /mnt/var/tmp
> zroot/vm                  4.33G  13.9T  2.75G  /mnt/vm
> zroot/vm/tbedfc           1.58G  13.9T  1.58G  /mnt/vm/tbedfc
> # zfs mount zroot/ROOT/default
> # cd /mnt/mnt/
> # mv boot boot.bak
> # cp -RPp boot.bak boot
> # gpart show /dev/mfid0
> =>          40  31247564720  mfid0  GPT  (15T)
>  

Re: Strange ARC/Swap/CPU on yesterday's -CURRENT

2018-03-20 Thread Peter Jeremy

On 2018-Mar-11 10:43:58 -1000, Jeff Roberson  wrote:
>Also, if you could try going back to r328953 or r326346 and let me know if 
>the problem exists in either.  That would be very helpful.  If anyone is 
>willing to debug this with me contact me directly and I will send some 
>test patches or debugging info after you have done the above steps.

I ran into this on 11-stable and tracked it to r326619 (MFC of r325851).
I initially got around the problem by reverting that commit but either
it or something very similar is still present in 11-stable r331053.

I've seen it on my main server (32GB RAM) but haven't managed to reproduce
it in smaller VBox guests - one difficulty I faced was artificially filling
the ARC.
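
A crude way to push data through the ARC in a test guest is simply to stream
a large tree through ZFS in a loop and watch the ARC size grow (a generic
sketch, not what was actually run here):

  while true; do
      tar -cf /dev/null /usr/src /usr/ports 2>/dev/null
      sysctl -n kstat.zfs.misc.arcstats.size
  done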

-- 
Peter Jeremy

