Re: Odd ZFS boot module issue on r332158

2018-05-01 Thread Andrew Gallatin

FWIW, I also updated to the latest BIOS available for this board,
Supermicro X10SRA, to version: 2.0c Release Date: 09/25/2017.

One thing I noticed is that even though the board supports UEFI,
there was no UEFI update procedure for it, and I had to flash via
a USB stick with a FreeDOS boot image.  Given that, I would
not be terribly... surprised... if the UEFI fw on the board had
issues with 2TB as well.  All the update seemed to do was
re-order my PCI devices, so I had to revisit my Xorg and
bhyve pptdev configurations.

Thankfully, the only module giving issues is nvidia, which is
just as well loaded post-boot.

FWIW, the error is

~~~
/boot/modules/nvidia.ko -
elf64_obj_loadimage: read failed

can't load file '/boot/modules/nvidia.ko': input/output error
~~~


Drew
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Odd ZFS boot module issue on r332158

2018-05-01 Thread Andrew Gallatin

On 05/01/18 13:14, Toomas Soome wrote:



On 1 May 2018, at 17:34, Andrew Gallatin wrote:


On 04/10/18 16:51, Andriy Gapon wrote:

On 10/04/2018 22:48, Andrew Gallatin wrote:

On 04/10/18 11:25, Andriy Gapon wrote:

On 10/04/2018 15:27, Andrew Gallatin wrote:

Is there something like tools/diag/prtblknos for ZFS?


zdb.

It has a manual page, but in the case like this you typically want to run

zdb -d[d*]  
Add d-s until you get all the information you want.

It looks like five d-s is needed to get individual blocks reported.



Thanks for the instructions!

How do I interpret this output:

[snip]
   0 L1  1:1f01016c000:1000 2L/1000P F=3 B=16769122/16769122
   0  L0 1:1f00f9e3000:2 2L/2P F=1 B=16769122/16769122
   2  L0 1:1f00fa03000:2 2L/2P F=1 B=16769122/16769122
   4  L0 1:1f00fa23000:2 2L/2P F=1 B=16769122/16769122

The first number is an offset within the file (hex); Lx is a block level where
L0 is a data block, L1 is an indirect block just above data blocks, etc; x:y:z
is a (top-level) vdev number, a block offset on disk (hex) and a block size on
disk(hex); the rest is not as important.
The quoted offsets appear to be just below 2TB.


Are these byte addresses?  Or do I need to multiply by the blocksize 
to determine the offset into the file?  From your "just below 2TB" I'm 
assuming byte addresses.


This is a Supermicro board X10SRA. They do have a f/w update,
but I suspect it is mainly just for new ucode.  Of course there is
no changelog.  I guess I'll try it if/when I'm totally unable to
boot into a new BE.

I just checked, and my EFI loader is ~1 year old, I should probably try
updating that too.

FWIW, I just updated to head again, and I see a problem with just one
module, which looks like the attached.



could you test https://reviews.freebsd.org/D15207 ?



Thank you so much!  I just tried that, and I'm afraid that it didn't
help, though by all rights I'd expect that it should.  I installed the
world into a new BE, and made sure to re-install boot1.efi into all
my EFI partitions.

Are you able to replicate this issue yourself?

Drew





Re: Odd ZFS boot module issue on r332158

2018-05-01 Thread Toomas Soome


> On 1 May 2018, at 17:34, Andrew Gallatin  wrote:
> 
> On 04/10/18 16:51, Andriy Gapon wrote:
>> On 10/04/2018 22:48, Andrew Gallatin wrote:
>>> On 04/10/18 11:25, Andriy Gapon wrote:
>>>> On 10/04/2018 15:27, Andrew Gallatin wrote:
>>>>> Is there something like tools/diag/prtblknos for ZFS?
>>>> 
>>>> zdb.
>>>> 
>>>> It has a manual page, but in the case like this you typically want to run
>>>> zdb -d[d*]  
>>>> Add d-s until you get all the information you want.
>>>> 
>>>> It looks like five d-s is needed to get individual blocks reported.
>>>> 
>>> 
>>> Thanks for the instructions!
>>> 
>>> How do I interpret this output:
>> [snip]
>>>    0 L1  1:1f01016c000:1000 2L/1000P F=3 B=16769122/16769122
>>>    0  L0 1:1f00f9e3000:2 2L/2P F=1 B=16769122/16769122
>>>    2  L0 1:1f00fa03000:2 2L/2P F=1 B=16769122/16769122
>>>    4  L0 1:1f00fa23000:2 2L/2P F=1 B=16769122/16769122
>> The first number is an offset within the file (hex); Lx is a block level where
>> L0 is a data block, L1 is an indirect block just above data blocks, etc; x:y:z
>> is a (top-level) vdev number, a block offset on disk (hex) and a block size on
>> disk(hex); the rest is not as important.
>> The quoted offsets appear to be just below 2TB.
> 
> Are these byte addresses?  Or do I need to multiply by the blocksize to 
> determine the offset into the file?  From your "just below 2TB" I'm assuming 
> byte addresses.
> 
> This is a Supermicro board X10SRA. They do have a f/w update,
> but I suspect it is mainly just for new ucode.  Of course there is
> no changelog.  I guess I'll try it if/when I'm totally unable to
> boot into a new BE.
> 
> I just checked, and my EFI loader is ~1 year old, I should probably try
> updating that too.
> 
> FWIW, I just updated to head again, and I see a problem with just one
> module, which looks like the attached.
> 

could you test https://reviews.freebsd.org/D15207 ?

rgds,
toomas



Re: Odd ZFS boot module issue on r332158

2018-05-01 Thread Andrew Gallatin

On 04/10/18 16:51, Andriy Gapon wrote:

On 10/04/2018 22:48, Andrew Gallatin wrote:

On 04/10/18 11:25, Andriy Gapon wrote:

On 10/04/2018 15:27, Andrew Gallatin wrote:

Is there something like tools/diag/prtblknos for ZFS?


zdb.

It has a manual page, but in the case like this you typically want to run
zdb -d[d*]  
Add d-s until you get all the information you want.

It looks like five d-s is needed to get individual blocks reported.



Thanks for the instructions!

How do I interpret this output:

[snip]

    0 L1  1:1f01016c000:1000 2L/1000P F=3 B=16769122/16769122
    0  L0 1:1f00f9e3000:2 2L/2P F=1 B=16769122/16769122
    2  L0 1:1f00fa03000:2 2L/2P F=1 B=16769122/16769122
    4  L0 1:1f00fa23000:2 2L/2P F=1 B=16769122/16769122


The first number is an offset within the file (hex); Lx is a block level where
L0 is a data block, L1 is an indirect block just above data blocks, etc; x:y:z
is a (top-level) vdev number, a block offset on disk (hex) and a block size on
disk(hex); the rest is not as important.

The quoted offsets appear to be just below 2TB.





Are these byte addresses?  Or do I need to multiply by the blocksize to 
determine the offset into the file?  From your "just below 2TB" I'm 
assuming byte addresses.


This is a Supermicro board X10SRA. They do have a f/w update,
but I suspect it is mainly just for new ucode.  Of course there is
no changelog.  I guess I'll try it if/when I'm totally unable to
boot into a new BE.

I just checked, and my EFI loader is ~1 year old, I should probably try
updating that too.

FWIW, I just updated to head again, and I see a problem with just one
module, which looks like the attached.

Drew


Re: Odd ZFS boot module issue on r332158

2018-04-10 Thread Andriy Gapon
On 10/04/2018 22:48, Andrew Gallatin wrote:
> On 04/10/18 11:25, Andriy Gapon wrote:
>> On 10/04/2018 15:27, Andrew Gallatin wrote:
>>> Is there something like tools/diag/prtblknos for ZFS?
>>
>> zdb.
>>
>> It has a manual page, but in the case like this you typically want to run
>> zdb -d[d*]  
>> Add d-s until you get all the information you want.
>>
>> It looks like five d-s is needed to get individual blocks reported.
>>
> 
> Thanks for the instructions!
> 
> How do I interpret this output:
[snip]
>    0 L1  1:1f01016c000:1000 2L/1000P F=3 B=16769122/16769122
>    0  L0 1:1f00f9e3000:2 2L/2P F=1 B=16769122/16769122
>    2  L0 1:1f00fa03000:2 2L/2P F=1 B=16769122/16769122
>    4  L0 1:1f00fa23000:2 2L/2P F=1 B=16769122/16769122

The first number is an offset within the file (hex); Lx is a block level where
L0 is a data block, L1 is an indirect block just above data blocks, etc; x:y:z
is a (top-level) vdev number, a block offset on disk (hex) and a block size on
disk(hex); the rest is not as important.

The quoted offsets appear to be just below 2TB.
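
To make the "just below 2TB" reading concrete, here is a rough Python sketch (not part of zdb) that decodes one vdev:offset:size triple from the output above and compares it with the 2 TiB mark. It treats the offset as a byte count in hex. The 4 MiB label reserve added in front of the DVA offset, the 512-byte sector size, and the partition start taken from the gpart output posted in this thread are assumptions, and the helper names are made up for illustration.

```python
# Sketch: decode a "vdev:offset:size" triple as printed by zdb and compare
# the resulting on-disk byte range with the 2 TiB mark.

TWO_TIB = 1 << 41          # 2 TiB in bytes
LABEL_RESERVE = 0x400000   # assumption: 4 MiB of labels/boot area precede DVA offset 0
SECTOR = 512               # assumption: 512-byte sectors
PART_START = 204840        # first sector of the freebsd-zfs partition (gpart output)


def parse_dva(dva):
    """Split 'vdev:offset:size' (offset and size in hex) into integers."""
    vdev, offset, size = dva.split(":")
    return int(vdev), int(offset, 16), int(size, 16)


def disk_byte_offset(dva_offset):
    """Byte offset from the start of the whole disk, under the assumptions above."""
    return PART_START * SECTOR + LABEL_RESERVE + dva_offset


def describe(dva):
    """Return where the block sits on disk and how far it is from 2 TiB."""
    vdev, offset, size = parse_dva(dva)
    start = disk_byte_offset(offset)
    end = start + size
    return {
        "vdev": vdev,
        "start": start,
        "end": end,
        "below_2tib": end <= TWO_TIB,
        "gap_to_2tib_gib": (TWO_TIB - end) / 2**30,
    }


if __name__ == "__main__":
    # The L1 indirect block from the zdb output quoted above.
    print(describe("1:1f01016c000:1000"))
```

Under those assumptions the L1 block ends some tens of GiB short of 2 TiB, so it is close to the boundary but not across it.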



-- 
Andriy Gapon


Re: Odd ZFS boot module issue on r332158

2018-04-10 Thread Andrew Gallatin

On 04/10/18 11:25, Andriy Gapon wrote:

On 10/04/2018 15:27, Andrew Gallatin wrote:

Is there something like tools/diag/prtblknos for ZFS?


zdb.

It has a manual page, but in the case like this you typically want to run
zdb -d[d*]  
Add d-s until you get all the information you want.

It looks like five d-s is needed to get individual blocks reported.



Thanks for the instructions!

How do I interpret this output:

<3:45pm>viserion/gallatin:~>ls -li /boot//kernel/vmm.ko
231484 -r-xr-xr-x  1 root  wheel  371168 Apr  7 15:05 /boot//kernel/vmm.ko*
<3:46pm>viserion/gallatin:src>sudo zdb -d 
tank/ROOT/12.0-CURRENT-20180407.104009 231484
Dataset tank/ROOT/12.0-CURRENT-20180407.104009 [ZPL], ID 260, cr_txg 
16768182, 19.9G, 292309 objects, rootbp DVA[0]=<0:4565c5a5000:1000> 
DVA[1]=<1:1f1ba52b000:1000> [L0 DMU objset] fletcher4 uncompressed LE 
contiguous unique double size=800L/800P birth=16821145L/16821145P 
fill=292309 cksum=f1c0930f9:12c49ea8e701:e036d6305e6ca:79eca698e0476d9


    Object  lvl   iblk   dblk  dsize  lsize   %full  type
    231484    2   128K   128K   392K   384K  100.00  ZFS plain file
                                               168   bonus  System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 2
        path    /boot/kernel/vmm.ko
        uid     0
        gid     0
        atime   Sat Apr  7 17:06:06 2018
        mtime   Sat Apr  7 15:05:48 2018
        ctime   Sat Apr  7 15:05:48 2018
        crtime  Sat Apr  7 14:08:21 2018
        gen     16768195
        mode    100555
        size    371168
        parent  229378
        links   1
        pflags  4080104
Indirect blocks:
   0 L1  1:1f01016c000:1000 2L/1000P F=3 B=16769122/16769122
   0  L0 1:1f00f9e3000:2 2L/2P F=1 B=16769122/16769122
   2  L0 1:1f00fa03000:2 2L/2P F=1 B=16769122/16769122
   4  L0 1:1f00fa23000:2 2L/2P F=1 B=16769122/16769122


segment [, 0006) size  384K



Thanks,

Drew


Re: Odd ZFS boot module issue on r332158

2018-04-10 Thread Toomas Soome


> On 10 Apr 2018, at 15:27, Andrew Gallatin  wrote:
> 
> On 04/09/18 23:33, Allan Jude wrote:
>> On 2018-04-09 19:11, Andrew Gallatin wrote:
>>> I updated my main amd64 workstation to r332158 from something much
>>> earlier (mid Jan).
>>> 
>>> Upon reboot, all seemed well.  However, I later realized that the vmm.ko
>>> module was not loaded at boot, because bhyve PCI passthru did not
>>> work.  My loader.conf looks like (I'm passing a USB interface through):
>>> 
>>> ###
>>> vmm_load="YES"
>>> opensolaris_load="YES"
>>> zfs_load="YES"
>>> nvidia_load="YES"
>>> nvidia-modeset_load="YES"
>>> 
>>> # Tune ZFS Arc Size - Change to adjust memory used for disk cache
>>> vfs.zfs.arc_max="4096M"
>>> hint.xhci.2.disabled="1"
>>> pptdevs="8/0/0"
>>> hw.dmar.enable="0"
>>> cuse_load="YES"
>>> ###
>>> 
>>> The problem seems "random".  I rebooted into single-user to
>>> see if somehow, vmm.ko was loaded at boot and something
>>> was unloading vmm.ko.  However, on this boot it was loaded.  I then
>>> ^D'ed and continued to multi-user, where X failed to start because
>>> this time, the nvidia modules were not loaded.  (but nvidia had
>>> been loaded on the 1st boot).
>>> 
>>> So it *seems* like different modules are randomly not loaded by the
>>> loader, at boot.   The ZFS config is:
>>> 
>>> config:
>>> 
>>>     NAME        STATE     READ WRITE CKSUM
>>>     tank        ONLINE       0     0     0
>>>       mirror-0  ONLINE       0     0     0
>>>         ada0p2  ONLINE       0     0     0
>>>         da3p2   ONLINE       0     0     0
>>>       mirror-1  ONLINE       0     0     0
>>>         ada1p2  ONLINE       0     0     0
>>>         da0p2   ONLINE       0     0     0
>>>     cache
>>>       da2s1d    ONLINE       0     0     0
>>> 
>>> The data drives in the pool are all exactly like this:
>>> 
>>> =>        34  9767541101  ada0  GPT  (4.5T)
>>>           34           6        - free -  (3.0K)
>>>           40      204800     1  efi  (100M)
>>>       204840  9763209216     2  freebsd-zfs  (4.5T)
>>>   9763414056     4096000     3  freebsd-swap  (2.0G)
>>>   9767510056       31079        - free -  (15M)
>>> 
>>> 
>>> There is about 1.44T used in the pool.  I have no idea
>>> how ZFS mirrors work, but I'm wondering if somehow this
>>> is a 2T problem, and there are issues with blocks on
>>> different sides of the mirror being across the 2T boundary.
>>> 
>>> Sorry to be so vague.. but this is the one machine I *don't* have
>>> a serial console on, so I don't have good logs.
>>> 
>>> Drew
>>> 
>> What makes you think it is related to ZFS?
>> Are there any error messages when the nvidia module did not load?
> 
> I think it is related to ZFS simply because I'm booting from ZFS and
> it is not working reliably.  Our systems at work, booting from UFS on
> roughly the same svn rev seem to still load modules reliably from
> the loader.  I know there has been a lot of work on the loader
> recently, and in a UEFI + UFS context, I've seen it fail to boot
> the right partition, etc.  However, I've never seen it fail to load
> just some modules.  The one difference between what I run at home
> and what we run at work is ZFS vs UFS.
> 
> Given that it is a glass console, I have no confidence in my ability
> to log error messages.   However, I could have sworn that I saw
> something like "io error" when it failed to load vmm.ko
> (I actually rebooted several times when I was diagnosing it..
> at first I thought xhci was holding on to the pass-thru device)
> 
> I vaguely remembered reading something about this recently.
> I just tracked it down to the "ZFS i/o error in recent 12.0"
> thread from last month, and this message in particular:
> 
> https://lists.freebsd.org/pipermail/freebsd-current/2018-March/068890.html
> 
> I'm booting via UEFI into a ZFS system with a FS that
> extends across 2TB..
> 
> Is there something like tools/diag/prtblknos for ZFS?
> 

run zpool scrub first, however, if you were able to load that module manually 
from OS, there is no reason to suspect the zfs corruption.

But if you really are getting IO errors, I would actually suspect that the 
firmware is buggy and can not really read past 2TB, so the obvious second 
suggestion is to check for firmware update. The ZFS reader code does try all 
block copies before giving up on the block, so the third option you can test is:

1. reboot
2. press esc when the boot menu is up to get to OK prompt
3. enter:  start

this would load the configured files and you will get the error messages. Also 
once you have kernel loaded, you can try to load modules manually with load 
command.
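
To know which modules to try by hand, the `*_load="YES"` entries from the loader.conf quoted in this thread can be listed. The Python sketch below is only an illustration: the function name is hypothetical, it handles just the simple `name_load="YES"` form seen in that file, and it does not try to guess where each module file lives.

```python
# Sketch: list the modules that a loader.conf asks the loader to load,
# so each one can be tried by hand at the OK prompt.
import re

# loader.conf contents as quoted in this thread.
LOADER_CONF = '''
###
vmm_load="YES"
opensolaris_load="YES"
zfs_load="YES"
nvidia_load="YES"
nvidia-modeset_load="YES"

# Tune ZFS Arc Size - Change to adjust memory used for disk cache
vfs.zfs.arc_max="4096M"
hint.xhci.2.disabled="1"
pptdevs="8/0/0"
hw.dmar.enable="0"
cuse_load="YES"
###
'''


def modules_to_load(text):
    """Return module names from lines of the form name_load="YES"."""
    modules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        match = re.fullmatch(r'([A-Za-z0-9_.-]+)_load="YES"', line)
        if match:
            modules.append(match.group(1))
    return modules


if __name__ == "__main__":
    for name in modules_to_load(LOADER_CONF):
        print(name)
```

Working through the resulting list one module at a time should show which file produces the read error on a given boot.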

If still nothing, the only way to ensure your data is below 2TB line is to 

Re: Odd ZFS boot module issue on r332158

2018-04-10 Thread Andriy Gapon
On 10/04/2018 15:27, Andrew Gallatin wrote:
> Is there something like tools/diag/prtblknos for ZFS?

zdb.

It has a manual page, but in the case like this you typically want to run
zdb -d[d*]  
Add d-s until you get all the information you want.

It looks like five d-s is needed to get individual blocks reported.

-- 
Andriy Gapon


Re: Re: Odd ZFS boot module issue on r332158

2018-04-10 Thread Andrew Gallatin

On 04/09/18 23:33, Allan Jude wrote:

On 2018-04-09 19:11, Andrew Gallatin wrote:

I updated my main amd64 workstation to r332158 from something much
earlier (mid Jan).

Upon reboot, all seemed well.  However, I later realized that the vmm.ko
module was not loaded at boot, because bhyve PCI passthru did not
work.  My loader.conf looks like (I'm passing a USB interface through):

###
vmm_load="YES"
opensolaris_load="YES"
zfs_load="YES"
nvidia_load="YES"
nvidia-modeset_load="YES"

# Tune ZFS Arc Size - Change to adjust memory used for disk cache
vfs.zfs.arc_max="4096M"
hint.xhci.2.disabled="1"
pptdevs="8/0/0"
hw.dmar.enable="0"
cuse_load="YES"
###

The problem seems "random".  I rebooted into single-user to
see if somehow, vmm.ko was loaded at boot and something
was unloading vmm.ko.  However, on this boot it was loaded.  I then
^D'ed and continued to multi-user, where X failed to start because
this time, the nvidia modules were not loaded.  (but nvidia had
been loaded on the 1st boot).

So it *seems* like different modules are randomly not loaded by the
loader, at boot.   The ZFS config is:

config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        ada0p2  ONLINE       0     0     0
        da3p2   ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        ada1p2  ONLINE       0     0     0
        da0p2   ONLINE       0     0     0
    cache
      da2s1d    ONLINE       0     0     0

The data drives in the pool are all exactly like this:

=>        34  9767541101  ada0  GPT  (4.5T)
          34           6        - free -  (3.0K)
          40      204800     1  efi  (100M)
      204840  9763209216     2  freebsd-zfs  (4.5T)
  9763414056     4096000     3  freebsd-swap  (2.0G)
  9767510056       31079        - free -  (15M)


There is about 1.44T used in the pool.  I have no idea
how ZFS mirrors work, but I'm wondering if somehow this
is a 2T problem, and there are issues with blocks on
different sides of the mirror being across the 2T boundary.

Sorry to be so vague.. but this is the one machine I *don't* have
a serial console on, so I don't have good logs.

Drew



What makes you think it is related to ZFS?

Are there any error messages when the nvidia module did not load?



I think it is related to ZFS simply because I'm booting from ZFS and
it is not working reliably.  Our systems at work, booting from UFS on
roughly the same svn rev seem to still load modules reliably from
the loader.  I know there has been a lot of work on the loader
recently, and in a UEFI + UFS context, I've seen it fail to boot
the right partition, etc.  However, I've never seen it fail to load
just some modules.  The one difference between what I run at home
and what we run at work is ZFS vs UFS.

Given that it is a glass console, I have no confidence in my ability
to log error messages.   However, I could have sworn that I saw
something like "io error" when it failed to load vmm.ko
(I actually rebooted several times when I was diagnosing it..
at first I thought xhci was holding on to the pass-thru device)

I vaguely remembered reading something about this recently.
I just tracked it down to the "ZFS i/o error in recent 12.0"
thread from last month, and this message in particular:

https://lists.freebsd.org/pipermail/freebsd-current/2018-March/068890.html

I'm booting via UEFI into a ZFS system with a FS that
extends across 2TB..
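
For reference, the arithmetic behind "extends across 2TB" can be sketched from the gpart numbers above. This is a small Python illustration, assuming 512-byte sectors and assuming that a 2 TiB limit would show up as a 32-bit sector number overflowing; neither is confirmed in this thread, and the function names are hypothetical.

```python
# Sketch: does the freebsd-zfs partition from the gpart output cross the
# sector that corresponds to 2 TiB?

SECTOR = 512                 # assumption: 512-byte sectors
LBA32_LIMIT = 1 << 32        # first sector number that needs more than 32 bits
PART_START = 204840          # freebsd-zfs start sector (gpart output)
PART_SIZE = 9763209216       # freebsd-zfs size in sectors (gpart output)


def crosses_2tib(start, size):
    """True if the partition begins below and ends above the 2 TiB sector."""
    return start < LBA32_LIMIT < start + size


def fraction_below_2tib(start, size):
    """Share of the partition's sectors that lie below the 2 TiB sector."""
    below = max(0, min(start + size, LBA32_LIMIT) - start)
    return below / size


if __name__ == "__main__":
    assert LBA32_LIMIT * SECTOR == 1 << 41   # 2^32 sectors of 512 bytes = 2 TiB
    print("crosses 2 TiB:", crosses_2tib(PART_START, PART_SIZE))
    print("fraction below 2 TiB: %.2f" % fraction_below_2tib(PART_START, PART_SIZE))
```

With these numbers less than half of the partition sits below the 2 TiB sector, so with 1.44T used it is plausible, though not shown here, that some blocks land above it.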

Is there something like tools/diag/prtblknos for ZFS?

Drew



Re: Odd ZFS boot module issue on r332158

2018-04-09 Thread Allan Jude
On 2018-04-09 19:11, Andrew Gallatin wrote:
> I updated my main amd64 workstation to r332158 from something much
> earlier (mid Jan).
> 
> Upon reboot, all seemed well.  However, I later realized that the vmm.ko
> module was not loaded at boot, because bhyve PCI passthru did not
> work.  My loader.conf looks like (I'm passing a USB interface through):
> 
> ###
> vmm_load="YES"
> opensolaris_load="YES"
> zfs_load="YES"
> nvidia_load="YES"
> nvidia-modeset_load="YES"
> 
> # Tune ZFS Arc Size - Change to adjust memory used for disk cache
> vfs.zfs.arc_max="4096M"
> hint.xhci.2.disabled="1"
> pptdevs="8/0/0"
> hw.dmar.enable="0"
> cuse_load="YES"
> ###
> 
> The problem seems "random".  I rebooted into single-user to
> see if somehow, vmm.ko was loaded at boot and something
> was unloading vmm.ko.  However, on this boot it was loaded.  I then
> ^D'ed and continued to multi-user, where X failed to start because
> this time, the nvidia modules were not loaded.  (but nvidia had
> been loaded on the 1st boot).
> 
> So it *seems* like different modules are randomly not loaded by the
> loader, at boot.   The ZFS config is:
> 
> config:
> 
>     NAME        STATE     READ WRITE CKSUM
>     tank        ONLINE       0     0     0
>       mirror-0  ONLINE       0     0     0
>         ada0p2  ONLINE       0     0     0
>         da3p2   ONLINE       0     0     0
>       mirror-1  ONLINE       0     0     0
>         ada1p2  ONLINE       0     0     0
>         da0p2   ONLINE       0     0     0
>     cache
>       da2s1d    ONLINE       0     0     0
> 
> The data drives in the pool are all exactly like this:
> 
> =>        34  9767541101  ada0  GPT  (4.5T)
>           34           6        - free -  (3.0K)
>           40      204800     1  efi  (100M)
>       204840  9763209216     2  freebsd-zfs  (4.5T)
>   9763414056     4096000     3  freebsd-swap  (2.0G)
>   9767510056       31079        - free -  (15M)
> 
> 
> There is about 1.44T used in the pool.  I have no idea
> how ZFS mirrors work, but I'm wondering if somehow this
> is a 2T problem, and there are issues with blocks on
> different sides of the mirror being across the 2T boundary.
> 
> Sorry to be so vague.. but this is the one machine I *don't* have
> a serial console on, so I don't have good logs.
> 
> Drew
> 

What makes you think it is related to ZFS?

Are there any error messages when the nvidia module did not load?

-- 
Allan Jude


