Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0
On 8/4/17 7:01 pm, Edward Tomasz Napierała wrote:
> On 0313T1206, Pete French wrote:
>> I have a number of machines in Azure, all booting from ZFS and, until
>> the weekend, running 10.3 perfectly happily.
>>
>> I started upgrading these to 11. The first went fine, the second would
>> not boot. Looking at the boot diagnostics, it is having problems finding
>> the root pool to mount. I see this in the diagnostic output:
>>
>> storvsc0: on vmbus0
>> Solaris: NOTICE: Cannot find the pool label for 'rpool'
>> Mounting from zfs:rpool/ROOT/default failed with error 5.
>> Root mount waiting for: storvsc
>> (probe0:blkvsc0:0:storvsc1: 0:0): on vmbus0
>> storvsc scsi_status = 2
>> (da0:blkvsc0:0:0:0): UNMAPPED
>> (probe1:blkvsc1:0:1:0): storvsc scsi_status = 2
>> hvheartbeat0: on vmbus0
>> da0 at blkvsc0 bus 0 scbus2 target 0 lun 0
>>
>> As you can see, the drive da0 only appears after it has tried, and
>> failed, to mount the root pool.
>
> Does the same problem still happen with recent 11-STABLE?

There is a fix for this floating around, which we applied at work. Our systems are 10.3, but I think it wouldn't be a bad thing to add generally, as it could (if we let it) solve the problem we sometimes see with NFS as well as with Azure.

p4 diff2 -du //depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#1 //depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#3
//depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#1 (text) - //depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#3 (text) content
@@ -126,8 +126,8 @@
 static int root_mount_mddev;
 static int root_mount_complete;
 
-/* By default wait up to 3 seconds for devices to appear. */
-static int root_mount_timeout = 3;
+/* By default wait up to 30 seconds for devices to appear. */
+static int root_mount_timeout = 30;
 TUNABLE_INT("vfs.mountroot.timeout", &root_mount_timeout);
 
 struct root_hold_token *
@@ -690,7 +690,7 @@
 	char *errmsg;
 	struct mntarg *ma;
 	char *dev, *fs, *opts, *tok;
-	int delay, error, timeout;
+	int delay, error, timeout, err_stride;
 
 	error = parse_token(conf, );
 	if (error)
@@ -727,11 +727,20 @@
 		goto out;
 	}
 
+	/*
+	 * For ZFS we can't simply wait for a specific device
+	 * as we only know the pool name. To work around this,
+	 * parse_mount() will retry the mount later on.
+	 *
+	 * While retrying for NFS could be implemented similarly
+	 * it is currently not supported.
+	 */
+	delay = hz / 10;
+	timeout = root_mount_timeout * hz;
+
 	if (strcmp(fs, "zfs") != 0 && strstr(fs, "nfs") == NULL &&
 	    dev[0] != '\0' && !parse_mount_dev_present(dev)) {
 		printf("mountroot: waiting for device %s ...\n", dev);
-		delay = hz / 10;
-		timeout = root_mount_timeout * hz;
 		do {
 			pause("rmdev", delay);
 			timeout -= delay;
@@ -741,16 +750,34 @@
 			goto out;
 		}
 	}
+	/* Timeout keeps counting down */
 
-	ma = NULL;
-	ma = mount_arg(ma, "fstype", fs, -1);
-	ma = mount_arg(ma, "fspath", "/", -1);
-	ma = mount_arg(ma, "from", dev, -1);
-	ma = mount_arg(ma, "errmsg", errmsg, ERRMSGL);
-	ma = mount_arg(ma, "ro", NULL, 0);
-	ma = parse_mountroot_options(ma, opts);
-	error = kernel_mount(ma, MNT_ROOTFS);
+	err_stride=0;
+	do {
+		ma = NULL;
+		ma = mount_arg(ma, "fstype", fs, -1);
+		ma = mount_arg(ma, "fspath", "/", -1);
+		ma = mount_arg(ma, "from", dev, -1);
+		ma = mount_arg(ma, "errmsg", errmsg, ERRMSGL);
+		ma = mount_arg(ma, "ro", NULL, 0);
+		ma = parse_mountroot_options(ma, opts);
+		error = kernel_mount(ma, MNT_ROOTFS);
+		/* UFS only does it once */
+		if (strcmp(fs, "zfs") != 0)
+			break;
+		timeout -= delay;
+		if (timeout > 0 && error) {
+			if (err_stride <= 0 ) {
+				printf("Mounting from %s:%s failed with error %d. "
+				    "%d seconds left. Retrying.\n", fs, dev, error,
+				    timeout / hz);
+			}
+			err_stride += 1;
+			err_stride %= 50;
+			pause("rmzfs", delay);
+		}
+	} while (timeout > 0 && error);
 
 out:
 	if (error) {
 		printf("Mounting from %s:%s failed with error %d",

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
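For anyone who wants to experiment with the retry logic outside the kernel, here is a rough userland sketch of the loop the patch adds. It is not the kernel code: `fake_pool`, `fake_mount()` and `mount_with_retry()` are hypothetical stand-ins (for the pool, for `kernel_mount()`, and for the patched part of `parse_mount()` respectively), `ticks_per_sec` plays the role of `hz`, and the sleep that `pause("rmzfs", delay)` would perform is only noted in a comment.

```c
#include <stdio.h>

/* Hypothetical stand-in for a root pool whose disk shows up late. */
struct fake_pool {
	int attempts;     /* mount attempts made so far */
	int ready_after;  /* attempt number at which the mount succeeds */
};

/* Hypothetical stand-in for kernel_mount(): 0 on success, 5 (EIO) otherwise. */
static int
fake_mount(struct fake_pool *p)
{
	p->attempts++;
	return (p->attempts >= p->ready_after ? 0 : 5);
}

/*
 * Retry the mount every tenth of a second until it succeeds or the
 * timeout budget runs out. A message is printed only on every 50th
 * failure so the console is not flooded. Returns the last error.
 */
static int
mount_with_retry(struct fake_pool *p, int timeout_sec, int ticks_per_sec,
    int *messages)
{
	int delay, error, timeout, err_stride;

	delay = ticks_per_sec / 10;
	if (delay <= 0)
		delay = 1;	/* guard so the budget always shrinks */
	timeout = timeout_sec * ticks_per_sec;
	err_stride = 0;
	do {
		error = fake_mount(p);
		timeout -= delay;
		if (timeout > 0 && error) {
			if (err_stride <= 0) {
				printf("Mount failed with error %d. "
				    "%d seconds left. Retrying.\n",
				    error, timeout / ticks_per_sec);
				(*messages)++;
			}
			err_stride = (err_stride + 1) % 50;
			/* The kernel would pause("rmzfs", delay) here. */
		}
	} while (timeout > 0 && error);
	return (error);
}
```

With a pool that becomes ready on the third attempt, `mount_with_retry(&p, 30, 1000, &msgs)` returns 0 after three attempts and prints a single retry message. With a pool that never appears and a 3 second budget, it gives up after 30 attempts and returns the last error.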
Re: No USB?
I have opened https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218513 on this issue.

Bisection is a problem, as the update to LLVM a week ago breaks building old kernels. I am hoping I can buildworld with the current compiler and then build the kernel. (I wonder if the new compiler could be the trigger for the problem I'm seeing?)

Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683

On Sun, Apr 9, 2017 at 10:52 AM, Kevin Oberman wrote:
> On Sat, Apr 8, 2017 at 1:55 PM, Kevin Oberman wrote:
>
>> Today, for the first time in a couple of weeks, I plugged in a USB drive
>> to my 11-STABLE system (r316552). No device was created and usbconfig only
>> sees EHCI hubs:
>> ugen1.1: at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
>> ugen0.1: at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
>>
>> Seems like I should be seeing UHCI stuff, too. Even internal devices like
>> my webcam don't show up.
>>
>> I'm running a GENERIC kernel with the following exceptions:
>> nooptions SCHED_ULE # ULE scheduler
>> options SCHED_4BSD # 4BSD scheduler
>> options IEEE80211_DEBUG
>>
>> I tried updating my system and that made no difference. I booted up
>> Windows and it sees the USB drive just fine.
>>
>> Is there anything I should try or look at to figure out what is
>> happening? I really want to get an image of my system before moving in
>> three days.
>>
> This is looking more and more like a bug. I don't know why nobody else
> has seen it, but here is more information:
> Relevant lines from boot:
> ehci0: mem 0xf252a000-0xf252a3ff irq 16 at device 26.0 on pci0
> usbus0: EHCI version 1.0
> usbus0 on ehci0
> ehci1: mem 0xf2529000-0xf25293ff irq 23 at device 29.0 on pci0
> usbus1: EHCI version 1.0
> usbus1 on ehci1
> [...]
> usbus0: 480Mbps High Speed USB v2.0
> usbus1: 480Mbps High Speed USB v2.0
> ugen1.1: at usbus1
> uhub0: on usbus1
> ugen0.1: at usbus0
> uhub1: on usbus0
> uhub0: 3 ports with 3 removable, self powered
> uhub1: 3 ports with 3 removable, self powered
> usbus0: port reset timeout
> usbus1: port reset timeout
> uhub_reattach_port: port 1 reset failed, error=USB_ERR_TIMEOUT
> uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 1
> uhub_reattach_port: port 1 reset failed, error=USB_ERR_TIMEOUT
> uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 1
>
> usbconfig -d ugen1.1 reset produced:
> Apr 9 09:15:11 rogue kernel: uhub1: at usbus0, port 1, addr 1 (disconnected)
> Apr 9 09:15:11 rogue kernel: uhub1:
> Apr 9 09:15:11 rogue kernel: 2.00/1.00, addr 1> on usbus0
> Apr 9 09:15:12 rogue kernel: uhub1: 3 ports with 3 removable, self powered
>
> Any ideas would be GREATLY appreciated as I can't back up or restore my
> system.
>
> I hope to boot a live version of 11-RELEASE, if I can find one, and see if
> it works.
> --
> Kevin Oberman, Part time kid herder and retired Network Engineer
> E-mail: rkober...@gmail.com
> PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
Re: No USB?
On Sat, Apr 8, 2017 at 1:55 PM, Kevin Oberman wrote:

> Today, for the first time in a couple of weeks, I plugged in a USB drive
> to my 11-STABLE system (r316552). No device was created and usbconfig only
> sees EHCI hubs:
> ugen1.1: at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
> ugen0.1: at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
>
> Seems like I should be seeing UHCI stuff, too. Even internal devices like
> my webcam don't show up.
>
> I'm running a GENERIC kernel with the following exceptions:
> nooptions SCHED_ULE # ULE scheduler
> options SCHED_4BSD # 4BSD scheduler
> options IEEE80211_DEBUG
>
> I tried updating my system and that made no difference. I booted up
> Windows and it sees the USB drive just fine.
>
> Is there anything I should try or look at to figure out what is happening?
> I really want to get an image of my system before moving in three days.
>

This is looking more and more like a bug. I don't know why nobody else has seen it, but here is more information:

Relevant lines from boot:
ehci0: mem 0xf252a000-0xf252a3ff irq 16 at device 26.0 on pci0
usbus0: EHCI version 1.0
usbus0 on ehci0
ehci1: mem 0xf2529000-0xf25293ff irq 23 at device 29.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci1
[...]
usbus0: 480Mbps High Speed USB v2.0
usbus1: 480Mbps High Speed USB v2.0
ugen1.1: at usbus1
uhub0: on usbus1
ugen0.1: at usbus0
uhub1: on usbus0
uhub0: 3 ports with 3 removable, self powered
uhub1: 3 ports with 3 removable, self powered
usbus0: port reset timeout
usbus1: port reset timeout
uhub_reattach_port: port 1 reset failed, error=USB_ERR_TIMEOUT
uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 1
uhub_reattach_port: port 1 reset failed, error=USB_ERR_TIMEOUT
uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 1

usbconfig -d ugen1.1 reset produced:
Apr 9 09:15:11 rogue kernel: uhub1: at usbus0, port 1, addr 1 (disconnected)
Apr 9 09:15:11 rogue kernel: uhub1:
Apr 9 09:15:11 rogue kernel: on usbus0
Apr 9 09:15:12 rogue kernel: uhub1: 3 ports with 3 removable, self powered

Any ideas would be GREATLY appreciated as I can't back up or restore my system.

I hope to boot a live version of 11-RELEASE, if I can find one, and see if it works.
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683