Re: mountroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0

2017-04-09 Thread Julian Elischer

On 8/4/17 7:01 pm, Edward Tomasz Napierała wrote:

On 0313T1206, Pete French wrote:

I have a number of machines in Azure, all booting from ZFS and, until
the weekend, running 10.3 perfectly happily.

I started upgrading these to 11. The first went fine, the second would
not boot. Looking at the boot diagnostics, it is having problems finding the
root pool to mount. I see this in the diagnostic output:

storvsc0:  on vmbus0
Solaris: NOTICE: Cannot find the pool label for 'rpool'
Mounting from zfs:rpool/ROOT/default failed with error 5.
Root mount waiting for: storvsc
(probe0:blkvsc0:0:storvsc1: 0:0):  on 
vmbus0
storvsc scsi_status = 2
(da0:blkvsc0:0:0:0): UNMAPPED
(probe1:blkvsc1:0:1:0): storvsc scsi_status = 2
hvheartbeat0:  on vmbus0
da0 at blkvsc0 bus 0 scbus2 target 0 lun 0

As you can see, the drive da0 only appears after it has tried, and failed,
to mount the root pool.

Does the same problem still happen with recent 11-STABLE?


There is a fix for this floating around, which we applied at work.
Our systems are 10.3, but I think it wouldn't be a bad thing to add
generally, as it could (if we let it) solve the problem we sometimes
see with NFS as well as with Azure.

p4 diff2 -du //depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#1 //depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#3
==== //depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#1 (text) - //depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#3 (text) ==== content

@@ -126,8 +126,8 @@
 static int root_mount_mddev;
 static int root_mount_complete;

-/* By default wait up to 3 seconds for devices to appear. */
-static int root_mount_timeout = 3;
+/* By default wait up to 30 seconds for devices to appear. */
+static int root_mount_timeout = 30;
 TUNABLE_INT("vfs.mountroot.timeout", &root_mount_timeout);

 struct root_hold_token *
@@ -690,7 +690,7 @@
 char *errmsg;
 struct mntarg *ma;
 char *dev, *fs, *opts, *tok;
-int delay, error, timeout;
+int delay, error, timeout, err_stride;

 error = parse_token(conf, &tok);
 if (error)
@@ -727,11 +727,20 @@
 goto out;
 }

+/*
+ * For ZFS we can't simply wait for a specific device
+ * as we only know the pool name. To work around this,
+ * parse_mount() will retry the mount later on.
+ *
+ * While retrying for NFS could be implemented similarly
+ * it is currently not supported.
+ */
+delay = hz / 10;
+timeout = root_mount_timeout * hz;
+
 if (strcmp(fs, "zfs") != 0 && strstr(fs, "nfs") == NULL &&
 dev[0] != '\0' && !parse_mount_dev_present(dev)) {
 printf("mountroot: waiting for device %s ...\n", dev);
-delay = hz / 10;
-timeout = root_mount_timeout * hz;
 do {
 pause("rmdev", delay);
 timeout -= delay;
@@ -741,16 +750,34 @@
 goto out;
 }
 }
+/* Timeout keeps counting down */

-ma = NULL;
-ma = mount_arg(ma, "fstype", fs, -1);
-ma = mount_arg(ma, "fspath", "/", -1);
-ma = mount_arg(ma, "from", dev, -1);
-ma = mount_arg(ma, "errmsg", errmsg, ERRMSGL);
-ma = mount_arg(ma, "ro", NULL, 0);
-ma = parse_mountroot_options(ma, opts);
-error = kernel_mount(ma, MNT_ROOTFS);
+err_stride=0;
+do {
+ma = NULL;
+ma = mount_arg(ma, "fstype", fs, -1);
+ma = mount_arg(ma, "fspath", "/", -1);
+ma = mount_arg(ma, "from", dev, -1);
+ma = mount_arg(ma, "errmsg", errmsg, ERRMSGL);
+ma = mount_arg(ma, "ro", NULL, 0);
+ma = parse_mountroot_options(ma, opts);

+error = kernel_mount(ma, MNT_ROOTFS);
+/* UFS only does it once */
+if (strcmp(fs, "zfs") != 0)
+break;
+timeout -= delay;
+if (timeout > 0 && error) {
+if (err_stride <= 0 ) {
+printf("Mounting from %s:%s failed with error %d. "
+"%d seconds left. Retrying.\n", fs, dev, error,
+timeout / hz);
+}
+err_stride += 1;
+err_stride %= 50;
+pause("rmzfs", delay);
+}
+} while (timeout > 0 && error);
  out:
 if (error) {
 printf("Mounting from %s:%s failed with error %d",
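The retry logic in the patch above can be tried out in userland. The following is only a minimal sketch, not the kernel code: sim_mount(), sim_mount_root(), struct sim_disk, SIM_HZ and SIM_EIO are hypothetical stand-ins for kernel_mount(), parse_mount(), the late-arriving da0, hz and error 5, and the pause() call is simulated by advancing a tick counter. It keeps the three behaviours of the patch: ZFS is retried every hz / 10 ticks until the timeout budget runs out, other file systems get a single attempt, and the failure message is printed only on every 50th failed attempt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SIM_HZ  1000    /* hypothetical tick rate, stands in for the kernel's hz */
#define SIM_EIO 5       /* error 5, as reported in the boot log */

/* Hypothetical model of a disk that is probed some time after boot. */
struct sim_disk {
	int ticks_until_present;    /* ticks before the pool's disk shows up */
	int attempts;               /* mount attempts made so far */
};

/* Stand-in for kernel_mount(): fails until the disk is present. */
static int
sim_mount(struct sim_disk *d, int now_ticks)
{
	d->attempts++;
	return (now_ticks >= d->ticks_until_present ? 0 : SIM_EIO);
}

/*
 * Retry the mount every `delay` ticks until it succeeds or the timeout
 * budget is used up. Anything other than ZFS is attempted exactly once.
 */
static int
sim_mount_root(const char *fs, struct sim_disk *d, int timeout_sec)
{
	int delay = SIM_HZ / 10;
	int timeout = timeout_sec * SIM_HZ;
	int now = 0, err_stride = 0, error;

	assert(delay > 0);
	do {
		error = sim_mount(d, now);
		if (strcmp(fs, "zfs") != 0)
			break;                  /* e.g. UFS: single attempt */
		timeout -= delay;
		if (timeout > 0 && error) {
			/* Report only every 50th failure to keep the console quiet. */
			if (err_stride == 0)
				printf("Mounting failed with error %d. "
				    "%d seconds left. Retrying.\n",
				    error, timeout / SIM_HZ);
			err_stride = (err_stride + 1) % 50;
			now += delay;           /* simulates pause("rmzfs", delay) */
		}
	} while (timeout > 0 && error);
	return (error);
}
```

With a disk that only appears after 5 simulated seconds, calling sim_mount_root("zfs", &disk, 3) gives up with error 5, while the same call with a 30 second budget succeeds once the disk is present. That is the case the larger default timeout is meant to cover.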



___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"




Re: No USB?

2017-04-09 Thread Kevin Oberman
I have opened https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218513 on
this issue.

Bisection is a problem, as the update to LLVM a week ago breaks building
old kernels. Hoping I can buildworld with the current compiler and then
build the kernel. (Wonder if the new compiler could be the trigger for the
problem I'm seeing?)

Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683

On Sun, Apr 9, 2017 at 10:52 AM, Kevin Oberman  wrote:

> On Sat, Apr 8, 2017 at 1:55 PM, Kevin Oberman  wrote:
>
>> Today, for the first time in a couple of weeks, I plugged in a USB drive
>> to my 11-STABLE system (r316552). No device was created and usbconfig only
>> sees EHCI hubs:
>> ugen1.1:  at usbus1, cfg=0 md=HOST spd=HIGH
>> (480Mbps) pwr=SAVE (0mA)
>> ugen0.1:  at usbus0, cfg=0 md=HOST spd=HIGH
>> (480Mbps) pwr=SAVE (0mA)
>>
>> Seems like I should be seeing UHCI stuff, too. Even internal devices like
>> my webcam don't show up.
>>
>> I'm running a GENERIC kernel with the following exceptions:
>> nooptions SCHED_ULE   # ULE scheduler
>> options   SCHED_4BSD  # 4BSD scheduler
>> options   IEEE80211_DEBUG
>>
>> I tried updating my system and that made no difference. I booted up
>> Windows and it sees the USB drive just fine.
>>
>> Any things I should try or look at to try to figure out what is
>> happening? I really want to get an image of my system before moving in
>> three days.
>>
> This is looking more and more like a bug. I don't know why nobody else
> has seen it, but here is more information:
> Relevant lines from boot:
> ehci0:  mem 0xf252a000-0xf252a3ff irq
> 16 at device 26.0 on pci0
> usbus0: EHCI version 1.0
> usbus0 on ehci0
> ehci1:  mem 0xf2529000-0xf25293ff irq
> 23 at device 29.0 on pci0
> usbus1: EHCI version 1.0
> usbus1 on ehci1
> [...]
> usbus0: 480Mbps High Speed USB v2.0
> usbus1: 480Mbps High Speed USB v2.0
> ugen1.1:  at usbus1
> uhub0:  on usbus1
> ugen0.1:  at usbus0
> uhub1:  on usbus0
> uhub0: 3 ports with 3 removable, self powered
> uhub1: 3 ports with 3 removable, self powered
> usbus0: port reset timeout
> usbus1: port reset timeout
> uhub_reattach_port: port 1 reset failed, error=USB_ERR_TIMEOUT
> uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 1
> uhub_reattach_port: port 1 reset failed, error=USB_ERR_TIMEOUT
> uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 1
>
> usbconfig -d ugen1.1 reset produced:
> Apr  9 09:15:11 rogue kernel: uhub1: at usbus0, port 1, addr 1
> (disconnected)
> Apr  9 09:15:11 rogue kernel: uhub1:
> Apr  9 09:15:11 rogue kernel:  2.00/1.00, addr 1> on usbus0
> Apr  9 09:15:12 rogue kernel: uhub1: 3 ports with 3 removable, self powered
>
> Any ideas would be GREATLY appreciated as I can't back up or restore my
> system.
>
> I hope to boot a live version of 11-RELEASE if I can find one, and see if
> it works.
> --
> Kevin Oberman, Part time kid herder and retired Network Engineer
> E-mail: rkober...@gmail.com
> PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
>


Re: No USB?

2017-04-09 Thread Kevin Oberman
On Sat, Apr 8, 2017 at 1:55 PM, Kevin Oberman  wrote:

> Today, for the first time in a couple of weeks, I plugged in a USB drive
> to my 11-STABLE system (r316552). No device was created and usbconfig only
> sees EHCI hubs:
> ugen1.1:  at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps)
> pwr=SAVE (0mA)
> ugen0.1:  at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps)
> pwr=SAVE (0mA)
>
> Seems like I should be seeing UHCI stuff, too. Even internal devices like
> my webcam don't show up.
>
> I'm running a GENERIC kernel with the following exceptions:
> nooptions SCHED_ULE   # ULE scheduler
> options   SCHED_4BSD  # 4BSD scheduler
> options   IEEE80211_DEBUG
>
> I tried updating my system and that made no difference. I booted up
> Windows and it sees the USB drive just fine.
>
> Any things I should try or look at to try to figure out what is happening?
> I really want to get an image of my system before moving in three days.
>

This is looking more and more like a bug. I don't know why nobody else has
seen it, but here is more information:
Relevant lines from boot:
ehci0:  mem 0xf252a000-0xf252a3ff irq 16
at device 26.0 on pci0
usbus0: EHCI version 1.0
usbus0 on ehci0
ehci1:  mem 0xf2529000-0xf25293ff irq 23
at device 29.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci1
[...]
usbus0: 480Mbps High Speed USB v2.0
usbus1: 480Mbps High Speed USB v2.0
ugen1.1:  at usbus1
uhub0:  on usbus1
ugen0.1:  at usbus0
uhub1:  on usbus0
uhub0: 3 ports with 3 removable, self powered
uhub1: 3 ports with 3 removable, self powered
usbus0: port reset timeout
usbus1: port reset timeout
uhub_reattach_port: port 1 reset failed, error=USB_ERR_TIMEOUT
uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 1
uhub_reattach_port: port 1 reset failed, error=USB_ERR_TIMEOUT
uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 1

usbconfig -d ugen1.1 reset produced:
Apr  9 09:15:11 rogue kernel: uhub1: at usbus0, port 1, addr 1
(disconnected)
Apr  9 09:15:11 rogue kernel: uhub1:
Apr  9 09:15:11 rogue kernel:  on usbus0
Apr  9 09:15:12 rogue kernel: uhub1: 3 ports with 3 removable, self powered

Any ideas would be GREATLY appreciated as I can't back up or restore my
system.

I hope to boot a live version of 11-RELEASE if I can find one, and see if
it works.
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683