r324870 breaks boot on amd64 with WITNESS (was "svn commit: r324870 - in head/sys: amd64/include kern")
All, I highly advise not upgrading to this revision if you use WITNESS. Please see the attached message for more details and reply to the commit log. Cheers, -Ngie > Begin forwarded message: > > From: "Ngie Cooper (yaneurabeya)" > Subject: Re: svn commit: r324870 - in head/sys: amd64/include kern > Date: October 22, 2017 at 17:19:32 PDT > To: Mateusz Guzik > Cc: src-committers, svn-src-...@freebsd.org, > svn-src-h...@freebsd.org > > >> On Oct 22, 2017, at 13:43, Mateusz Guzik wrote: >> >> Author: mjg >> Date: Sun Oct 22 20:43:50 2017 >> New Revision: 324870 >> URL: https://svnweb.freebsd.org/changeset/base/324870 >> >> Log: >> Make the sleepq chain hash size configurable per-arch and bump on amd64. >> >> While here cache-align chains. >> >> This shortens longest found chain during poudriere -j 80 from 32 to 16. >> >> Pushing this higher up will probably require allocation on boot. > > Hi Mateusz, > This change causes the Jenkins VMs to panic at boot with "panic: > witness_init: pending locks list is too small, increase WITNESS_PENDLIST" > when WITNESS is enabled: > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/4781/console . > Please fix or revert. > Thanks, > -Ngie
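One plausible reading of the panic, sketched below as a toy model: every sleepq chain carries its own lock, those locks are created before WITNESS is fully initialized, and such early locks are parked in a fixed-size pending list. Growing the chain table can therefore overflow that list. All sizes and names here (`PENDLIST_SIZE`, `pend_lock`, `init_chains`, the 64-byte cache line) are hypothetical; `SC_TABLESIZE` and the shift-and-xor hash are only modelled on the style of `subr_sleepqueue.c`, not copied from it.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy sizes for illustration only. An arch header could predefine
   SC_TABLESIZE to override the default (the per-arch knob the commit
   describes). */
#ifndef SC_TABLESIZE
#define SC_TABLESIZE 16             /* must be a power of two */
#endif
#define SC_MASK   (SC_TABLESIZE - 1)
#define SC_SHIFT  8
#define PENDLIST_SIZE 8             /* hypothetical stand-in for WITNESS_PENDLIST */
#define CACHE_LINE 64               /* assumed cache-line size */

/* One hash chain. Aligning the first member aligns the whole struct,
   so each chain (and its lock) sits on its own cache line. */
struct chain {
    alignas(CACHE_LINE) const char *lock_name; /* stand-in for the chain mutex */
    int nqueues;
};

static struct chain chains[SC_TABLESIZE];
static const void *pending[PENDLIST_SIZE];
static size_t npending;

/* Map a wait-channel address to a chain index. */
size_t chain_index(const void *wchan)
{
    uintptr_t w = (uintptr_t)wchan;
    return (size_t)(((w >> SC_SHIFT) ^ w) & SC_MASK);
}

/* Locks created before the lock-order checker is ready are parked in a
   fixed-size array; returning false marks where the kernel would panic. */
bool pend_lock(const void *lk)
{
    assert(lk != NULL);
    if (npending >= PENDLIST_SIZE)
        return false;
    pending[npending++] = lk;
    return true;
}

/* Early boot: every chain gets its own lock, so the pending list needs
   room for SC_TABLESIZE entries on top of any other early locks. */
bool init_chains(void)
{
    npending = 0;
    for (size_t i = 0; i < SC_TABLESIZE; i++) {
        chains[i].lock_name = "sleepq chain";
        chains[i].nqueues = 0;
        if (!pend_lock(&chains[i]))
            return false;
    }
    return true;
}
```

With 16 chains and room for only 8 pending locks, `init_chains()` fails, which is the shape of the reported panic. Under this reading, the fix is either a larger pending list or fewer early locks.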
Re: Segfault in _Unwind_* code called from pthread_exit
On 22.10.17 02:18, Tijl Coosemans wrote: On Sat, 21 Oct 2017 22:02:38 +0200 Andreas Tobler wrote: On 26.08.17 20:40, Konstantin Belousov wrote: On Sat, Aug 26, 2017 at 08:28:13PM +0200, Tijl Coosemans wrote: On Sat, 26 Aug 2017 02:44:42 +0300 Konstantin Belousov wrote: How does the llvm unwinder detect that the return address is garbage? It just stops unwinding when it can't find frame information (stored in .eh_frame sections). The GCC unwinder doesn't give up yet and checks if the return address points to the signal trampoline (which means the current frame is that of a signal handler). It has built-in knowledge of how to unwind to the signal trampoline frame. So llvm just gives up on signal frames? A noreturn attribute isn't enough. You can still unwind such functions. They are allowed to throw exceptions, for example. Ok. I did consider using a CFI directive (see patch below) and it works, but it's architecture specific and it's inserted after the function prologue, so there's still a window of a few instructions where a stack unwinder will try to use the return address.

Index: lib/libthr/thread/thr_create.c
===
--- lib/libthr/thread/thr_create.c	(revision 322802)
+++ lib/libthr/thread/thr_create.c	(working copy)
@@ -251,6 +251,7 @@ create_stack(struct pthread_attr *pattr)
 static void
 thread_start(struct pthread *curthread)
 {
+	__asm(".cfi_undefined %rip");
 	sigset_t set;
 
 	if (curthread->attr.suspend == THR_CREATE_SUSPENDED)

I like this approach much more than the previous patch. What can be done is to provide an asm trampoline which calls thread_start(). There you can add the .cfi_undefined right at the entry. It is somewhat more work than just setting the return address on the kernel-constructed pseudo stack frame, but I believe this is ultimately the correct way. You can still do it only on some arches, if you do not have the incentive to code asm for all of them. Also crt1 probably should get the same treatment, even though we already set %rbp to zero AFAIR.
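Konstantin's suggestion, an asm trampoline that carries `.cfi_undefined` from its very first instruction and then calls the real entry routine, might look roughly like the x86-64 sketch below. The name `entry_tramp` and its `(fn, arg)` signature are hypothetical and not part of libthr. Unlike a real thread entry, this version returns, so it can be exercised from ordinary code. On non-x86-64 or non-ELF targets it falls back to a plain C call with no special CFI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trampoline: calls fn(arg) from a frame whose return
   address is marked undefined for unwinders. */
void entry_tramp(void (*fn)(void *), void *arg);

#if defined(__x86_64__) && defined(__ELF__)
__asm__(
    ".pushsection .text\n"
    ".globl entry_tramp\n"
    ".type entry_tramp, @function\n"
    "entry_tramp:\n"
    ".cfi_startproc\n"
    /* In effect from the first instruction: no prologue window. */
    ".cfi_undefined %rip\n"
    /* Realign %rsp to 16 bytes for the call and keep the CFA in sync. */
    "pushq %rbp\n"
    ".cfi_adjust_cfa_offset 8\n"
    "movq %rdi, %rax\n"          /* fn */
    "movq %rsi, %rdi\n"          /* arg becomes fn's first argument */
    "call *%rax\n"
    "popq %rbp\n"
    ".cfi_adjust_cfa_offset -8\n"
    "ret\n"                      /* a real thread entry would never get here */
    ".cfi_endproc\n"
    ".size entry_tramp, .-entry_tramp\n"
    ".popsection\n");
#else
/* Portable fallback: same behaviour, without the special CFI. */
void entry_tramp(void (*fn)(void *), void *arg)
{
    fn(arg);
}
#endif

/* Example entry routine; records that it ran. */
void mark_started(void *arg)
{
    assert(arg != NULL);
    *(int *)arg = 1;
}
```

The directive precedes every instruction, so the few-instruction window Tijl describes does not arise. An unwinder that honours an undefined return-address column should stop at this frame instead of following a garbage return address. A libthr version would enter `thread_start()` with the kernel-supplied argument and never return. Writing one such stub per architecture is the extra work Konstantin mentions.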
Did some commit result from this discussion, or is this subject still under investigation? I'm curious because I got this gcc PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82635 If I add the above to lib/libthr/thread/thr_create.c, the mentioned PR works. Sorry, but I didn't and won't have time to work on this. Np. Ideally I think there should be a function attribute to mark functions as entry points. The compiler would add ".cfi_undefined %rip" to such functions (and maybe optimise the function prologue because there are no caller registers that need to be preserved). If you have connections in the GCC community maybe you could discuss that with them. Well, from my understanding I'd have to teach every compiler to do so, right? (Besides, I do not know how to.) I think we need another solution to find out if an unwind context is garbage. I'll take a look at how llvm does this w/o segfaulting. Thx, Andreas ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: host, bhyve vm and ntpd
> 22.10.2017 01:15, Ian Lepore wrote: > > On Sat, 2017-10-21 at 17:07 -0400, Michael Voorhis wrote: > >> Ian Lepore writes: > >>> > >>> Beyond that, I'm not sure what else to try. It might be necessary to > >>> get some bhyve developers involved (I know almost nothing about it). > >> NTPD behaves more normally on uniprocessor VMs. > >> > >> A FreeBSD bhyve-guest running on a freebsd host will select a > >> different timecounter depending on whether it is a multiprocessor or a > >> uniprocessor. My uniprocessor bhyve-vm selected TSC-low as the best > >> timecounter in a uniprocessor. NTP functions there as expected. > >> > >> kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) HPET(950) i8254(0) > >> dummy(-100) > >> kern.timecounter.hardware: TSC-low > >> > >> The very same VM, when given two total CPUs, selected HPET (if I > >> recall) and the timekeeping with NTPD was unreliable, with many > >> step-resets to the clock. > >> > > > > Hmm, I just had a glance at the code in sys/amd64/vmm/io/vhpet.c and it > > looks right. I wonder if this is just a simple roundoff error in > > converting between 10.0MHz and SBT units? If so, that could be wished > > away easily by using a power-of-2 frequency for the virtual HPET. I > > wonder if the attached patch is all that's needed? > I've tried the patch (at bhyve guest) and nothing has changed. Should > the patched system be tested at bhyve guest or bhyve host? I believe the suggested patch would have to be made to the bhyve host. Also, on the host and guest, what are the values of

sysctl kern.timecounter.tc.HPET
sysctl kern.timecounter.tc.i8254

Getting good ntpd behavior in a VM guest of any kind is sometimes a non-trivial thing to do. -- Rod Grimes rgri...@freebsd.org
Re: /sys/boot compile broken
On Sun, Oct 22, 2017 at 10:53 AM, Gary Jennejohn wrote: > On Sun, 22 Oct 2017 08:59:52 -0600 > Warner Losh wrote: > > > On Sun, Oct 22, 2017 at 1:39 AM, Gary Jennejohn > > wrote: > > > > > On Sat, 21 Oct 2017 09:33:41 -0600 > > > Warner Losh wrote: > > > > > > > On Oct 21, 2017 8:02 AM, "Allan Jude" wrote: > > > > > > > > On 2017-10-21 02:41, Gary Jennejohn wrote: > > > > > SVN for HEAD source at 324810. > > > > > > > > > > Compiling /sys/boot is totally screwed up. The failure is that > > > > > geliboot.c cannot be found. > > > > > > > > > > This prevents a successful ``make buildworld''. > > > > > > > > > > This error occurs despite the fact that I have > LOADER_NO_GELI_SUPPORT > > > > > set to yes in src.conf. > > > > > > > > > > Looking at the various Makefiles this option is supposed to prevent > > > > > using GELI. > > > > > > > > > > Even if the user wanted to use GELI the compile of the boot code > > > > > would probably fail. > > > > > > > > > > imp@ has had his fingers in the boot code lately. > > > > > > > > > > > > > Some of the boot code has been changed over to LOADER_GELI_SUPPORT=no > > > > > > > > And so you ended up with some code not guarded. Add the additional > > > > src.conf knob for now, and Warner will get it fixed up > > > > > > > > > > > > I fly back from legoland today and will touch this up. I'm in the > process > > > > of changing them into real config knobs from the weird things that > have > > > > leaked out... sys/boot is likely moving up to boot as well in the > near > > > > future. > > > > > > > > > > Thanks for the info. > > > > > > I see that this change was made fairly late in my time zone. > > > > > > mine too :) I was going to reply to this thread, but it was too late... > > > > Please let me know if it works for you. I was able to do builds both > ways. > > > > It worked. It was good to see that you put an entry in UPDATING.

People noticing was a good clue I'd been remiss in the initial round of commits.
Warner
Re: /sys/boot compile broken
On Sun, 22 Oct 2017 08:59:52 -0600 Warner Losh wrote: > On Sun, Oct 22, 2017 at 1:39 AM, Gary Jennejohn > wrote: > > > On Sat, 21 Oct 2017 09:33:41 -0600 > > Warner Losh wrote: > > > > > On Oct 21, 2017 8:02 AM, "Allan Jude" wrote: > > > > > > On 2017-10-21 02:41, Gary Jennejohn wrote: > > > > SVN for HEAD source at 324810. > > > > > > > > Compiling /sys/boot is totally screwed up. The failure is that > > > > geliboot.c cannot be found. > > > > > > > > This prevents a successful ``make buildworld''. > > > > > > > > This error occurs despite the fact that I have LOADER_NO_GELI_SUPPORT > > > > set to yes in src.conf. > > > > > > > > Looking at the various Makefiles this option is supposed to prevent > > > > using GELI. > > > > > > > > Even if the user wanted to use GELI the compile of the boot code > > > > would probably fail. > > > > > > > > imp@ has had his fingers in the boot code lately. > > > > > > > > > > Some of the boot code has been changed over to LOADER_GELI_SUPPORT=no > > > > > > And so you ended up with some code not guarded. Add the additional > > > src.conf knob for now, and Warner will get it fixed up > > > > > > > > > I fly back from legoland today and will touch this up. I'm in the process > > > of changing them into real config knobs from the weird things that have > > > leaked out... sys/boot is likely moving up to boot as well in the near > > > future. > > > > > > > Thanks for the info. > > > > I see that this change was made fairly late in my time zone. > > > mine too :) I was going to reply to this thread, but it was too late... > > Please let me know if it works for you. I was able to do builds both ways. > It worked. It was good to see that you put an entry in UPDATING. -- Gary Jennejohn
Re: host, bhyve vm and ntpd
22.10.2017 19:02, Ian Lepore wrote: > On Sun, 2017-10-22 at 11:31 +0300, Boris Samorodov wrote: >> 22.10.2017 01:15, Ian Lepore wrote: >>> >>> On Sat, 2017-10-21 at 17:07 -0400, Michael Voorhis wrote: Ian Lepore writes: > > > Beyond that, I'm not sure what else to try. It might be necessary to > get some bhyve developers involved (I know almost nothing about it). NTPD behaves more normally on uniprocessor VMs. A FreeBSD bhyve-guest running on a freebsd host will select a different timecounter depending on whether it is a multiprocessor or a uniprocessor. My uniprocessor bhyve-vm selected TSC-low as the best timecounter in a uniprocessor. NTP functions there as expected. kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) HPET(950) i8254(0) dummy(-100) kern.timecounter.hardware: TSC-low The very same VM, when given two total CPUs, selected HPET (if I recall) and the timekeeping with NTPD was unreliable, with many step-resets to the clock. >>> Hmm, I just had a glance at the code in sys/amd64/vmm/io/vhpet.c and it >>> looks right. I wonder if this is just a simple roundoff error in >>> converting between 10.0MHz and SBT units? If so, that could be wished >>> away easily by using a power-of-2 frequency for the virtual HPET. I >>> wonder if the attached patch is all that's needed? >> I've tried the patch (at bhyve guest) and nothing has changed. Should >> the patched system be tested at bhyve guest or bhyve host? >> > > Oh, I'm sorry, I should have mentioned that's for the host side. NP, that's OK. However, the host is busy now, and I'll only have an opportunity to test the host tomorrow evening. Ian, thank you for your help! -- WBR, bsam
Re: host, bhyve vm and ntpd
On Sun, 2017-10-22 at 11:31 +0300, Boris Samorodov wrote: > 22.10.2017 01:15, Ian Lepore wrote: > > > > On Sat, 2017-10-21 at 17:07 -0400, Michael Voorhis wrote: > > > > > > Ian Lepore writes: > > > > > > > > > > > > Beyond that, I'm not sure what else to try. It might be necessary to > > > > get some bhyve developers involved (I know almost nothing about it). > > > NTPD behaves more normally on uniprocessor VMs. > > > > > > A FreeBSD bhyve-guest running on a freebsd host will select a > > > different timecounter depending on whether it is a multiprocessor or a > > > uniprocessor. My uniprocessor bhyve-vm selected TSC-low as the best > > > timecounter in a uniprocessor. NTP functions there as expected. > > > > > > kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) HPET(950) i8254(0) > > > dummy(-100) > > > kern.timecounter.hardware: TSC-low > > > > > > The very same VM, when given two total CPUs, selected HPET (if I > > > recall) and the timekeeping with NTPD was unreliable, with many > > > step-resets to the clock. > > > > > Hmm, I just had a glance at the code in sys/amd64/vmm/io/vhpet.c and it > > looks right. I wonder if this is just a simple roundoff error in > > converting between 10.0MHz and SBT units? If so, that could be wished > > away easily by using a power-of-2 frequency for the virtual HPET. I > > wonder if the attached patch is all that's needed? > I've tried the patch (at bhyve guest) and nothing has changed. Should > the patched system be tested at bhyve guest or bhyve host? > Oh, I'm sorry, I should have mentioned that's for the host side. -- Ian
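Ian's roundoff hypothesis can be made concrete with a little arithmetic. FreeBSD's sbintime_t is a 32.32 fixed-point count of seconds, so one second is 2^32 units. At 10 MHz one tick is 2^32 / 10^7 = 429.4967296 units. At a power-of-two frequency such as 2^24 Hz (an example value, not necessarily the one in Ian's patch) a tick is exactly 256 units. The sketch below assumes, purely for illustration, that a per-tick increment is truncated to an integer. The helper names are hypothetical, and the thread does not establish that vhpet.c converts this way.

```c
#include <assert.h>
#include <stdint.h>

/* sbintime_t-style 32.32 fixed point: one second is 2^32 units. */
#define SBT_1S ((uint64_t)1 << 32)

/* Hypothetical: SBT units per counter tick, truncated to an integer. */
uint64_t sbt_per_tick_trunc(uint64_t freq_hz)
{
    assert(freq_hz > 0);
    return SBT_1S / freq_hz;
}

/* Relative rate error, in ppm, that such truncation would introduce. */
double trunc_error_ppm(uint64_t freq_hz)
{
    double exact = (double)SBT_1S / (double)freq_hz;
    double truncated = (double)sbt_per_tick_trunc(freq_hz);
    return (exact - truncated) / exact * 1e6;
}
```

Under that assumption the 10 MHz case would run about 1157 ppm slow. That is well beyond the 500 ppm ntpd can absorb by frequency correction, which would fit the repeated step-resets. With 2^24 Hz the error vanishes. A conversion that scales the whole tick count in 64-bit arithmetic would do far better, so this shows only what the naive scheme would cost. It is not a diagnosis.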
Re: host, bhyve vm and ntpd
22.10.2017 18:22, Rodney W. Grimes wrote: >> 22.10.2017 01:15, Ian Lepore wrote: >>> On Sat, 2017-10-21 at 17:07 -0400, Michael Voorhis wrote: Ian Lepore writes: > > Beyond that, I'm not sure what else to try. It might be necessary to > get some bhyve developers involved (I know almost nothing about it). NTPD behaves more normally on uniprocessor VMs. A FreeBSD bhyve-guest running on a freebsd host will select a different timecounter depending on whether it is a multiprocessor or a uniprocessor. My uniprocessor bhyve-vm selected TSC-low as the best timecounter in a uniprocessor. NTP functions there as expected. kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) HPET(950) i8254(0) dummy(-100) kern.timecounter.hardware: TSC-low The very same VM, when given two total CPUs, selected HPET (if I recall) and the timekeeping with NTPD was unreliable, with many step-resets to the clock. >>> >>> Hmm, I just had a glance at the code in sys/amd64/vmm/io/vhpet.c and it >>> looks right. I wonder if this is just a simple roundoff error in >>> converting between 10.0MHz and SBT units? If so, that could be wished >>> away easily by using a power-of-2 frequency for the virtual HPET. I >>> wonder if the attached patch is all that's needed? >> I've tried the patch (at bhyve guest) and nothing has changed. Should >> the patched system be tested at bhyve guest or bhyve host? > > I believe the suggested patch would have to be made to the bhyve > host. OK, I'll do it tomorrow and report back.
> Also on the host and guest what are the values of
> sysctl kern.timecounter.tc.HPET
> sysctl kern.timecounter.tc.i8254

Here they are:
---
bhyve-host% sysctl kern.timecounter.tc.HPET
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.counter: 2138094157
kern.timecounter.tc.HPET.mask: 4294967295
bhyve-host% sysctl kern.timecounter.tc.i8254
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 54883
kern.timecounter.tc.i8254.mask: 65535
---
bhyve-guest% sysctl kern.timecounter.tc.HPET
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 1000
kern.timecounter.tc.HPET.counter: 969429421
kern.timecounter.tc.HPET.mask: 4294967295
bhyve-guest% sysctl kern.timecounter.tc.i8254
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 39893
kern.timecounter.tc.i8254.mask: 65535
---

> Getting good ntpd behavior in a VM guest of any kind is sometimes a
> non-trivial thing to do.

As a side note, I have a CentOS-7 bhyve VM on the same host, and it was enough to run chronyd with the default config. It stepped twice and has been stable (no messages) for several days. Current log:
---
Oct 19 16:01:03 c.vpn systemd[1]: Starting NTP client/server...
Oct 19 16:01:03 c.vpn chronyd[27043]: chronyd version 3.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SECHASH +SIGND +ASYNCDNS +IPV6 +DEBUG)
Oct 19 16:01:03 c.vpn chronyd[27043]: Frequency 0.000 +/- 100.000 ppm read from /var/lib/chrony/drift
Oct 19 16:01:03 c.vpn systemd[1]: Started NTP client/server.
Oct 19 16:01:07 c.vpn chronyd[27043]: Selected source XX.XX.XX.1
Oct 19 16:01:07 c.vpn chronyd[27043]: System clock wrong by -44.392782 seconds, adjustment started
Oct 19 16:00:23 c.vpn chronyd[27043]: System clock was stepped by -44.392782 seconds
Oct 19 16:00:34 c.vpn chronyd[27043]: System clock was stepped by 0.01 seconds
---

-- WBR, bsam
Re: /sys/boot compile broken
On Sun, Oct 22, 2017 at 1:39 AM, Gary Jennejohn wrote: > On Sat, 21 Oct 2017 09:33:41 -0600 > Warner Losh wrote: > > > On Oct 21, 2017 8:02 AM, "Allan Jude" wrote: > > > > On 2017-10-21 02:41, Gary Jennejohn wrote: > > > SVN for HEAD source at 324810. > > > > > > Compiling /sys/boot is totally screwed up. The failure is that > > > geliboot.c cannot be found. > > > > > > This prevents a successful ``make buildworld''. > > > > > > This error occurs despite the fact that I have LOADER_NO_GELI_SUPPORT > > > set to yes in src.conf. > > > > > > Looking at the various Makefiles this option is supposed to prevent > > > using GELI. > > > > > > Even if the user wanted to use GELI the compile of the boot code > > > would probably fail. > > > > > > imp@ has had his fingers in the boot code lately. > > > > > > > Some of the boot code has been changed over to LOADER_GELI_SUPPORT=no > > > > And so you ended up with some code not guarded. Add the additional > > src.conf knob for now, and Warner will get it fixed up > > > > > > I fly back from legoland today and will touch this up. I'm in the process > > of changing them into real config knobs from the weird things that have > > leaked out... sys/boot is likely moving up to boot as well in the near > > future. > > > > Thanks for the info. > > I see that this change was made fairly late in my time zone. mine too :) I was going to reply to this thread, but it was too late... Please let me know if it works for you. I was able to do builds both ways. Warner
Re: host, bhyve vm and ntpd
22.10.2017 01:15, Ian Lepore wrote: > On Sat, 2017-10-21 at 17:07 -0400, Michael Voorhis wrote: >> Ian Lepore writes: >>> >>> Beyond that, I'm not sure what else to try. It might be necessary to >>> get some bhyve developers involved (I know almost nothing about it). >> NTPD behaves more normally on uniprocessor VMs. >> >> A FreeBSD bhyve-guest running on a freebsd host will select a >> different timecounter depending on whether it is a multiprocessor or a >> uniprocessor. My uniprocessor bhyve-vm selected TSC-low as the best >> timecounter in a uniprocessor. NTP functions there as expected. >> >> kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) HPET(950) i8254(0) >> dummy(-100) >> kern.timecounter.hardware: TSC-low >> >> The very same VM, when given two total CPUs, selected HPET (if I >> recall) and the timekeeping with NTPD was unreliable, with many >> step-resets to the clock. >> > > Hmm, I just had a glance at the code in sys/amd64/vmm/io/vhpet.c and it > looks right. I wonder if this is just a simple roundoff error in > converting between 10.0MHz and SBT units? If so, that could be wished > away easily by using a power-of-2 frequency for the virtual HPET. I > wonder if the attached patch is all that's needed? I've tried the patch (at bhyve guest) and nothing has changed. Should the patched system be tested at bhyve guest or bhyve host? -- WBR, bsam
Re: host, bhyve vm and ntpd
22.10.2017 00:07, Michael Voorhis wrote: > Ian Lepore writes: >> Beyond that, I'm not sure what else to try. It might be necessary to >> get some bhyve developers involved (I know almost nothing about it). > > NTPD behaves more normally on uniprocessor VMs. > > A FreeBSD bhyve-guest running on a freebsd host will select a > different timecounter depending on whether it is a multiprocessor or a > uniprocessor. My uniprocessor bhyve-vm selected TSC-low as the best > timecounter in a uniprocessor. NTP functions there as expected. > > kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) HPET(950) i8254(0) > dummy(-100) > kern.timecounter.hardware: TSC-low > > The very same VM, when given two total CPUs, selected HPET (if I > recall) and the timekeeping with NTPD was unreliable, with many > step-resets to the clock. Yep, the same here. I've switched to TSC-low on the bhyve guest and there has been no stepping in 24 hours. -- WBR, bsam
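The kern.timecounter.choice line lists each candidate with its quality in parentheses. As I understand kern_tc.c, the kernel automatically prefers the registered counter with the highest quality (ties broken by frequency) and never auto-selects one with negative quality. The sketch below models only the quality rule. `struct tc_stub` and `pick_timecounter` are hypothetical names. The test's demotion of TSC-low merely mimics what the SMP guest appears to do; the thread does not establish why.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a registered timecounter. */
struct tc_stub {
    const char *name;
    int quality;
};

/* Pick the highest-quality counter, skipping negative qualities;
   returns NULL when nothing is eligible. */
const struct tc_stub *pick_timecounter(const struct tc_stub *tcs, size_t n)
{
    const struct tc_stub *best = NULL;

    assert(tcs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tcs[i].quality < 0)
            continue; /* never chosen automatically */
        if (best == NULL || tcs[i].quality > best->quality)
            best = &tcs[i];
    }
    return best;
}
```

With the qualities from Michael's uniprocessor guest, this rule picks TSC-low. If TSC-low were ranked below 950, HPET would win, which matches the SMP behaviour described above. Selecting TSC-low by hand, as Boris did, sidesteps the ranking altogether.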
Re: /sys/boot compile broken
On Sat, 21 Oct 2017 09:33:41 -0600 Warner Losh wrote: > On Oct 21, 2017 8:02 AM, "Allan Jude" wrote: > > On 2017-10-21 02:41, Gary Jennejohn wrote: > > SVN for HEAD source at 324810. > > > > Compiling /sys/boot is totally screwed up. The failure is that > > geliboot.c cannot be found. > > > > This prevents a successful ``make buildworld''. > > > > This error occurs despite the fact that I have LOADER_NO_GELI_SUPPORT > > set to yes in src.conf. > > > > Looking at the various Makefiles this option is supposed to prevent > > using GELI. > > > > Even if the user wanted to use GELI the compile of the boot code > > would probably fail. > > > > imp@ has had his fingers in the boot code lately. > > > > Some of the boot code has been changed over to LOADER_GELI_SUPPORT=no > > And so you ended up with some code not guarded. Add the additional > src.conf knob for now, and Warner will get it fixed up > > > I fly back from legoland today and will touch this up. I'm in the process > of changing them into real config knobs from the weird things that have > leaked out... sys/boot is likely moving up to boot as well in the near > future. > Thanks for the info. I see that this change was made fairly late in my time zone. -- Gary Jennejohn
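As a stopgap until Warner's fix landed, Allan's advice amounts to setting both spellings of the knob in /etc/src.conf, roughly as below. Which spelling a given revision honours is exactly what was in flux at the time. Treat this as a sketch for the affected HEAD revisions only, not as a permanent setting.

```make
# /etc/src.conf (sketch): the knob named in the original report
LOADER_NO_GELI_SUPPORT=yes
# the additional knob suggested as a temporary workaround
LOADER_GELI_SUPPORT=no
```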