Re: zfs problems after rebuilding system [SOLVED]

2018-03-13 Thread Pete French




I based my fix heavily on that patch from the PR, but I rewrote it
enough that I might've made any number of mistakes, so it needs fresh
testing. 


Ok, have been rebooting with the patch every ten minutes for 24 hours 
now, and it comes back up perfectly every time, so as far as I am 
concerned that's sufficient testing for me to say it's fixed and I would 
be very happy to have it merged into STABLE (and I'll then roll it out 
everywhere). Thanks!


-pete.
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: zfs problems after rebuilding system [SOLVED]

2018-03-12 Thread Ian Lepore
On Mon, 2018-03-12 at 17:21 +, Pete French wrote:
> 
> On 10/03/2018 23:48, Ian Lepore wrote:
> > 
> > I based my fix heavily on that patch from the PR, but I rewrote it
> > enough that I might've made any number of mistakes, so it needs fresh
> > testing.  The main change I made was to make it a lot less noisy while
> > waiting (it only mentions the wait once, unless bootverbose is set, in
> > which case it's once per second).  I also removed the logic that
> > limited the retries to nfs and zfs, because I think we can remove all
> > the old code related to waiting that only worked for ufs and let this
> > new retry be the way it waits for all filesystems.  But that's a bigger
> > change we can do separately; I didn't want to hold up this fix any
> > longer.
> Thanks for the patch, it is very much appreciated! I applied this 
> earlier today, and have been continuously rebooting the machine in Azure 
> ever since (every ten minutes). This has worked flawlessly, so I am very 
> happy that this fixes the issue for me. I am going to leave it running 
> though, just to see if anything happens. I haven't examined dmesg, but I 
> should be able to see the output from the patch there to verify that it's 
> waiting, yes ?
> 
> cheers,
> 
> -pete.

Yes, if the root filesystem isn't available on the first attempt, it
should emit a single line saying it will wait for up to N seconds for
it to arrive, where N is the vfs.mountroot.timeout value (3 seconds if
not set in loader.conf).
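
For anyone wanting to confirm the wait after the fact from a booted system, something like the following could pull the timeout value out of the boot messages. The sample message text and the sed pattern here are assumptions for illustration, not the kernel's exact wording; match them against your actual dmesg output:

```sh
# Sketch: parse a root-mount wait notice for its timeout value.
# The sample line below is hypothetical; in practice you would feed in
# the matching line from `dmesg`.
msg="Root mount waiting for up to 3 seconds for the root filesystem to arrive"
timeout=$(printf '%s\n' "$msg" | sed -n 's/.*up to \([0-9][0-9]*\) seconds.*/\1/p')
echo "$timeout"
```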

-- Ian


Re: zfs problems after rebuilding system [SOLVED]

2018-03-12 Thread Pete French



On 10/03/2018 23:48, Ian Lepore wrote:

I based my fix heavily on that patch from the PR, but I rewrote it
enough that I might've made any number of mistakes, so it needs fresh
testing.  The main change I made was to make it a lot less noisy while
waiting (it only mentions the wait once, unless bootverbose is set, in
which case it's once per second).  I also removed the logic that
limited the retries to nfs and zfs, because I think we can remove all
the old code related to waiting that only worked for ufs and let this
new retry be the way it waits for all filesystems.  But that's a bigger
change we can do separately; I didn't want to hold up this fix any
longer.


Thanks for the patch, it is very much appreciated! I applied this 
earlier today, and have been continuously rebooting the machine in Azure 
ever since (every ten minutes). This has worked flawlessly, so I am very 
happy that this fixes the issue for me. I am going to leave it running 
though, just to see if anything happens. I haven't examined dmesg, but I 
should be able to see the output from the patch there to verify that it's 
waiting, yes ?


cheers,

-pete.


Re: zfs problems after rebuilding system [SOLVED]

2018-03-10 Thread Ian Lepore
On Sat, 2018-03-10 at 23:42 +, Pete French wrote:
> > 
> > It looks like r330745 applies fine to stable-11 without any changes,
> > and there's plenty of value in testing that as well, if you're already
> > set up for that world.
> > 
> 
> I've been running the patch from the PR in production since the original 
> bug report and it works fine. I haven't looked at r330745 yet, but I can 
> replace the PR patch with that and give it a whirl; will take a look 
> Monday at what's possible.
> 
> -pete.
> 

I based my fix heavily on that patch from the PR, but I rewrote it
enough that I might've made any number of mistakes, so it needs fresh
testing.  The main change I made was to make it a lot less noisy while
waiting (it only mentions the wait once, unless bootverbose is set, in
which case it's once per second).  I also removed the logic that
limited the retries to nfs and zfs, because I think we can remove all
the old code related to waiting that only worked for ufs and let this
new retry be the way it waits for all filesystems.  But that's a bigger
change we can do separately; I didn't want to hold up this fix any
longer.

-- Ian


Re: zfs problems after rebuilding system [SOLVED]

2018-03-10 Thread Pete French



It looks like r330745 applies fine to stable-11 without any changes,
and there's plenty of value in testing that as well, if you're already
set up for that world.




I've been running the patch from the PR in production since the original 
bug report and it works fine. I haven't looked at r330745 yet, but I can 
replace the PR patch with that and give it a whirl; will take a look 
Monday at what's possible.


-pete.



Re: zfs problems after rebuilding system [SOLVED]

2018-03-10 Thread Ian Lepore
On Sat, 2018-03-10 at 23:08 +, Pete French wrote:
> Ah, thank you! I haven't run current before, but as this is such an issue 
> for us I'll set up an Azure machine running it and have it reboot every 
> five minutes or so to check it works OK. Unfortunately the error doesn't 
> show up consistently, as it's a race condition. Will let you know if it
> fails for any reason.
> 
> -pete. [time to take a dive into the exciting world of current]

It looks like r330745 applies fine to stable-11 without any changes,
and there's plenty of value in testing that as well, if you're already
set up for that world.

-- Ian


Re: zfs problems after rebuilding system [SOLVED]

2018-03-10 Thread Pete French
Ah, thank you! I haven't run current before, but as this is such an issue 
for us I'll set up an Azure machine running it and have it reboot every 
five minutes or so to check it works OK. Unfortunately the error doesn't 
show up consistently, as it's a race condition. Will let you know if it 
fails for any reason.

-pete. [time to take a dive into the exciting world of current]
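
A reboot soak test like the one described has to be driven from something that survives the reboot, such as cron, since any in-shell loop dies when the machine goes down. This is only a sketch; the ten-minute interval and the logger tag are arbitrary choices:

```sh
# Hypothetical /etc/crontab entry: reboot every ten minutes,
# noting each attempt in the system log first.
# */10 * * * * root logger -t soaktest "mountroot soak test reboot" && /sbin/shutdown -r now
```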



Re: zfs problems after rebuilding system [SOLVED]

2018-03-10 Thread Ian Lepore
On Sat, 2018-03-03 at 16:19 +, Pete French wrote:
> 
> > 
> > That won't work for the boot drive.
> > 
> > When no boot drive is detected early enough, the kernel goes to the
> > mountroot prompt.  That seems to hold a Giant lock which inhibits
> > further progress being made.  Sometimes progress can be made by
> > trying
> > to mount unmountable partitions on other drives, but this usually
> > goes
> > too fast, especially if the USB drive often times out.
> 
> 
> We have this problem in Azure with a ZFS root; it was fixed by the patch 
> in this bug report, which actually starts off being about USB.
> 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208882
> 
> You can then set the mountroot timeout as normal and it works.
> 
> I would really like this patch to be applied, but it seems to have 
> languished since last summer. We use this as standard on all our cloud 
> machines now, and it works very nicely.
> 
> -pete.

I've committed a fix to -current (r330745) based on that patch.  It
would be good if people running -current who've had this problem could
give it some testing.  I'd like to get it merged back to 11 before the
11.1 release (and back to 10-stable as well).

With r330745 in place, the only setting that should be needed if your
rootfs is on a device that is slow to arrive is vfs.mountroot.timeout=
in loader.conf; the value is the number of seconds to wait before
giving up and going to the mountroot prompt.
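
As a concrete example, an entry along these lines would give a slow cloud disk half a minute to show up; the value is illustrative, not a recommendation:

```sh
# /boot/loader.conf
# Wait up to 30 seconds for the root device before dropping
# to the mountroot prompt.
vfs.mountroot.timeout="30"
```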

-- Ian


Re: zfs problems after rebuilding system [SOLVED]

2018-03-05 Thread Mark Millard via freebsd-stable
Eugene Grosbein eugen at grosbein.net wrote on
Mon Mar 5 12:20:47 UTC 2018 :

> 05.03.2018 19:10, Dimitry Andric wrote:
> 
>>> When no boot drive is detected early enough, the kernel goes to the
>>> mountroot prompt.  That seems to hold a Giant lock which inhibits
>>> further progress being made.  Sometimes progress can be made by trying
>>> to mount unmountable partitions on other drives, but this usually goes
>>> too fast, especially if the USB drive often times out.
>> 
>> What I would like to know, is why our USB stack has such timeout issues
>> at all.  When I boot Linux on the same type of hardware, I never see USB
>> timeouts.  They must be doing something right, or maybe they just don't
>> bother checking some status bits that we are very strict about?
> 
> This is heavily hardware-dependent. You may have no issues with some
> software+hardware combination and long timeouts with same software
> but different hardware.

Dimitry's example is for changing the software for the same(?) hardware,
if I understand right. (FreeBSD vs. some Linux distribution.)

(?: He did say "type of".)

Perhaps that type of hardware can be used to figure out the difference.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)



Re: zfs problems after rebuilding system [SOLVED]

2018-03-05 Thread Eugene Grosbein
05.03.2018 19:10, Dimitry Andric wrote:

>> When no boot drive is detected early enough, the kernel goes to the
>> mountroot prompt.  That seems to hold a Giant lock which inhibits
>> further progress being made.  Sometimes progress can be made by trying
>> to mount unmountable partitions on other drives, but this usually goes
>> too fast, especially if the USB drive often times out.
> 
> What I would like to know, is why our USB stack has such timeout issues
> at all.  When I boot Linux on the same type of hardware, I never see USB
> timeouts.  They must be doing something right, or maybe they just don't
> bother checking some status bits that we are very strict about?

This is heavily hardware-dependent. You may have no issues with some
software+hardware combination and long timeouts with same software
but different hardware.







Re: zfs problems after rebuilding system [SOLVED]

2018-03-05 Thread Dimitry Andric
On 3 Mar 2018, at 13:56, Bruce Evans  wrote:
> 
> On Sat, 3 Mar 2018, tech-lists wrote:
>> On 03/03/2018 00:23, Dimitry Andric wrote:
...
>>> Whether this is due to some sort of BIOS handover trouble, or due to
>>> cheap and/or crappy USB-to-SATA bridges (even with brand WD and Seagate
>>> disks!), I have no idea.  I attempted to debug it at some point, but
>>> a well-placed "sleep 10" was an acceptable workaround... :)
>> 
>> That fixed it, thank you again :D
> 
> That won't work for the boot drive.
> 
> When no boot drive is detected early enough, the kernel goes to the
> mountroot prompt.  That seems to hold a Giant lock which inhibits
> further progress being made.  Sometimes progress can be made by trying
> to mount unmountable partitions on other drives, but this usually goes
> too fast, especially if the USB drive often times out.

What I would like to know, is why our USB stack has such timeout issues
at all.  When I boot Linux on the same type of hardware, I never see USB
timeouts.  They must be doing something right, or maybe they just don't
bother checking some status bits that we are very strict about?

-Dimitry





Re: zfs problems after rebuilding system [SOLVED]

2018-03-03 Thread Pete French




That won't work for the boot drive.

When no boot drive is detected early enough, the kernel goes to the
mountroot prompt.  That seems to hold a Giant lock which inhibits
further progress being made.  Sometimes progress can be made by trying
to mount unmountable partitions on other drives, but this usually goes
too fast, especially if the USB drive often times out.




We have this problem in Azure with a ZFS root; it was fixed by the patch in 
this bug report, which actually starts off being about USB.


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208882

You can then set the mountroot timeout as normal and it works.

I would really like this patch to be applied, but it seems to have 
languished since last summer. We use this as standard on all our cloud 
machines now, and it works very nicely.


-pete.


Re: zfs problems after rebuilding system [SOLVED]

2018-03-03 Thread Eugene Grosbein
03.03.2018 19:56, Bruce Evans wrote:

> On Sat, 3 Mar 2018, tech-lists wrote:
> 
>> On 03/03/2018 00:23, Dimitry Andric wrote:
>>> Indeed.  I have had the following for a few years now, due to USB drives
>>> with ZFS pools:
>>>
>>> --- /usr/src/etc/rc.d/zfs    2016-11-08 10:21:29.820131000 +0100
>>> +++ /etc/rc.d/zfs    2016-11-08 12:49:52.971161000 +0100
>>> @@ -25,6 +25,8 @@
>>>
>>>  zfs_start_main()
>>>  {
>>> +echo "Sleeping for 10 seconds to let USB devices settle..."
>>> +sleep 10
>>>  zfs mount -va
>>>  zfs share -a
>>>  if [ ! -r /etc/zfs/exports ]; then
>>>
>>> For some reason, USB3 (xhci) controllers can take a very, very long time
>>> to correctly attach mass storage devices: I usually see many timeouts
>>> before they finally get detected.  After that, the devices always work
>>> just fine, though.
> 
> I have one that works for an old USB hard drive but never works for a not
> so old USB flash drive and a new SSD in a USB dock (just to check the SSD
> speed when handicapped by USB).  Win7 has no problems with the xhci and
> USB flash drive combination, and FreeBSD has no problems with the drive
> on other systems.
> 
>>> Whether this is due to some sort of BIOS handover trouble, or due to
>>> cheap and/or crappy USB-to-SATA bridges (even with brand WD and Seagate
>>> disks!), I have no idea.  I attempted to debug it at some point, but
>>> a well-placed "sleep 10" was an acceptable workaround... :)
>>
>> That fixed it, thank you again :D
> 
> That won't work for the boot drive.
> 
> When no boot drive is detected early enough, the kernel goes to the
> mountroot prompt.  That seems to hold a Giant lock which inhibits
> further progress being made.  Sometimes progress can be made by trying
> to mount unmountable partitions on other drives, but this usually goes
> too fast, especially if the USB drive often times out.

In fact, we have enough loader.conf quirks for that:
 
kern.cam.boot_delay "Bus registration wait time" # milliseconds
vfs.mountroot.timeout   "Wait for root mount" # seconds
vfs.root_mount_always_wait "Wait for root mount holds even if the root device 
already exists" # boolean

No need for extra hacks in the zfs rc.d script.
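
In loader.conf form, the three knobs quoted above would look something like this; the values are illustrative only, to be tuned to the hardware in question:

```sh
# /boot/loader.conf
kern.cam.boot_delay="10000"      # wait up to 10 s for bus registration (milliseconds)
vfs.mountroot.timeout="30"       # wait up to 30 s for the root mount (seconds)
vfs.root_mount_always_wait="1"   # wait for root mount holds even if the device exists
```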



Re: zfs problems after rebuilding system [SOLVED]

2018-03-03 Thread tech-lists
On 03/03/2018 12:56, Bruce Evans wrote:
> That won't work for the boot drive.

In my case the workaround is fine because it's not a boot drive

-- 
J.


Re: zfs problems after rebuilding system [SOLVED]

2018-03-03 Thread Bruce Evans

On Sat, 3 Mar 2018, tech-lists wrote:


On 03/03/2018 00:23, Dimitry Andric wrote:

Indeed.  I have had the following for a few years now, due to USB drives
with ZFS pools:

--- /usr/src/etc/rc.d/zfs   2016-11-08 10:21:29.820131000 +0100
+++ /etc/rc.d/zfs   2016-11-08 12:49:52.971161000 +0100
@@ -25,6 +25,8 @@

 zfs_start_main()
 {
+   echo "Sleeping for 10 seconds to let USB devices settle..."
+   sleep 10
zfs mount -va
zfs share -a
if [ ! -r /etc/zfs/exports ]; then

For some reason, USB3 (xhci) controllers can take a very, very long time
to correctly attach mass storage devices: I usually see many timeouts
before they finally get detected.  After that, the devices always work
just fine, though.


I have one that works for an old USB hard drive but never works for a not
so old USB flash drive and a new SSD in a USB dock (just to check the SSD
speed when handicapped by USB).  Win7 has no problems with the xhci and
USB flash drive combination, and FreeBSD has no problems with the drive
on other systems.


Whether this is due to some sort of BIOS handover trouble, or due to
cheap and/or crappy USB-to-SATA bridges (even with brand WD and Seagate
disks!), I have no idea.  I attempted to debug it at some point, but
a well-placed "sleep 10" was an acceptable workaround... :)


That fixed it, thank you again :D


That won't work for the boot drive.

When no boot drive is detected early enough, the kernel goes to the
mountroot prompt.  That seems to hold a Giant lock which inhibits
further progress being made.  Sometimes progress can be made by trying
to mount unmountable partitions on other drives, but this usually goes
too fast, especially if the USB drive often times out.

Bruce


Re: zfs problems after rebuilding system [SOLVED]

2018-03-03 Thread tech-lists
On 03/03/2018 00:23, Dimitry Andric wrote:
> Indeed.  I have had the following for a few years now, due to USB drives
> with ZFS pools:
> 
> --- /usr/src/etc/rc.d/zfs 2016-11-08 10:21:29.820131000 +0100
> +++ /etc/rc.d/zfs 2016-11-08 12:49:52.971161000 +0100
> @@ -25,6 +25,8 @@
> 
>  zfs_start_main()
>  {
> + echo "Sleeping for 10 seconds to let USB devices settle..."
> + sleep 10
>   zfs mount -va
>   zfs share -a
>   if [ ! -r /etc/zfs/exports ]; then
> 
> For some reason, USB3 (xhci) controllers can take a very, very long time
> to correctly attach mass storage devices: I usually see many timeouts
> before they finally get detected.  After that, the devices always work
> just fine, though.
> 
> Whether this is due to some sort of BIOS handover trouble, or due to
> cheap and/or crappy USB-to-SATA bridges (even with brand WD and Seagate
> disks!), I have no idea.  I attempted to debug it at some point, but
> a well-placed "sleep 10" was an acceptable workaround... :)

That fixed it, thank you again :D
-- 
J.