from:"Konstantin Belousov"

Re: Install of 13.0-RELEASE i386 with ZFS root hangs up

2021-05-08 Thread Konstantin Belousov

On Sat, May 08, 2021 at 06:33:02PM +0700, Eugene Grosbein wrote:
> 08.05.2021 2:52, Konstantin Belousov wrote:
> 
> > i386 kernel uses memory up to 24G since 13.0.
> > 
> > PAE only means that devices that can access full 64bit address are allowed
> > to avoid dma bouncing.
> 
> Maybe you could tell something on similar topic?
> 
> There is FreeBSD 12.2-STABLE r369567 Base12 amd64 running
> with Intel Atom CPU capable of long mode and addressing 8GB RAM,
> ASRock A330ION motherboard and two memory modules installed: 4G+2GB.
> Why so small "avail memory"?
> 
> FreeBSD clang version 10.0.1 (g...@github.com:llvm/llvm-project.git 
> llvmorg-10.0.1-0-gef32c611aa2)
> CPU: Intel(R) Atom(TM) CPU  330   @ 1.60GHz (1600.03-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x106c2  Family=0x6  Model=0x1c  Stepping=2
>   
> Features=0xbfe9fbff
>   Features2=0x40e31d
>   AMD Features=0x2800
>   AMD Features2=0x1
>   TSC: P-state invariant, performance statistics
> real memory  = 6442450944 (6144 MB)
> Physical memory chunk(s):
> 0x00010000 - 0x0009dfff, 581632 bytes (142 pages)
> 0x00103000 - 0x001fffff, 1036288 bytes (253 pages)
> 0x02b00000 - 0xd8709fff, 3586170880 bytes (875530 pages)
> avail memory = 3571384320 (3405 MB)
> 
> Also http://www.grosbein.net/freebsd/dmidecode.txt

Some necromancy revealed that this CPU does not have an on-chip memory
controller; it is a 2008 design where the MCH handles memory.  It is
up to the chipset and BIOS to configure the memory above 4G and report it
to the OS.  As you can clearly see from the SMAP printed above, the BIOS
does not report anything above 4G.

Perhaps look at the BIOS settings.  No other ideas.
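
The check against the SMAP can be written down as a small sketch.  It
assumes the three chunks from the quoted boot messages; the byte counts are
copied as printed, while the start addresses are an assumption, recomputed
as (end + 1 - bytes) because some digits did not survive the archive.  The
struct and function names are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define FOUR_GB	(UINT64_C(1) << 32)

/* One usable physical memory range, as listed in the boot messages. */
struct phys_chunk {
	uint64_t start;	/* first byte of the range */
	uint64_t bytes;	/* length of the range */
};

/* Start addresses are reconstructed (assumption), sizes are as printed. */
const struct phys_chunk chunks[] = {
	{ UINT64_C(0x00010000), UINT64_C(581632) },
	{ UINT64_C(0x00103000), UINT64_C(1036288) },
	{ UINT64_C(0x02b00000), UINT64_C(3586170880) },
};
#define NCHUNKS	(sizeof(chunks) / sizeof(chunks[0]))

/* Sum of all chunk lengths. */
uint64_t
total_bytes(const struct phys_chunk *c, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += c[i].bytes;
	return (sum);
}

/* Returns 1 if any chunk extends past the 4G boundary, else 0. */
int
reports_above_4g(const struct phys_chunk *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (c[i].start + c[i].bytes > FOUR_GB)
			return (1);
	return (0);
}
```

With these inputs no chunk crosses 4G, and the chunk total stays well below
the 6144 MB of "real memory", which is consistent with the BIOS not
reporting the remapped part of the RAM.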
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: Install of 13.0-RELEASE i386 with ZFS root hangs up

2021-05-07 Thread Konstantin Belousov

On Fri, May 07, 2021 at 09:48:07AM -0700, Freddie Cash wrote:
> On Fri, May 7, 2021 at 5:49 AM Yasuhiro Kimura  wrote:
> 
> > Does anyone succeed to install 13.0-RELEASE i386 with ZFS root?
> >
> > I tried this with VirtualBox and VMware Player on Windows with
> > following VM condition.
> >
> > * 4 CPUs
> > * 8GB memory
> > * 100GB disk
> > * Bridge mode NIC
> >
> > But in both cases, VM gets high CPU load and hangs up after I moved
> > to 'YES' at 'ZFS Configuration' menu and type return key.
> >
> > If I select UFS root installation completes successfully. So the
> > problem is specific to ZFS root.
> >
> 
> Running ZFS on 32-bit OSes is doable (although not recommended) but
> requires a lot of manual configuration and tweaking, especially around
> kernel memory and ARC usage.
> 
> You're limited to 4 GB of memory space, so you need to tune the ARC to use
> less than that.  The auto-tuning has improved a lot over the years, but you
> still need to limit the ARC size to around 2 GB (or less) to keep the
> system stable.  KVA memory space tuning shouldn't be needed anymore, but
> you can do research into that, just in case.
> 
> You can compile a custom kernel to enable PAE support, that will sometimes
> help with memory issues on i386 (and will allow you to use more than 4 GB
> of system RAM, although individual processes are still limited to 4 GB).
The i386 kernel uses memory up to 24G since 13.0.

PAE only means that devices that can access the full 64-bit address space
are allowed to avoid DMA bouncing.
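
To make the bouncing remark concrete, here is a deliberately simplified
sketch of the decision.  It is not the busdma implementation;
needs_bounce() and its parameters are hypothetical names chosen for this
example.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a FreeBSD API: decide whether a buffer must be
 * copied through a bounce buffer before a device can DMA to or from it.
 *
 * paddr    - physical address of the first byte of the buffer
 * len      - buffer length in bytes (must be > 0)
 * highaddr - highest physical address the device can generate
 */
int
needs_bounce(uint64_t paddr, uint64_t len, uint64_t highaddr)
{
	uint64_t last = paddr + len - 1;	/* last byte touched by the DMA */

	return (last > highaddr);
}
```

A page located at 5G needs a bounce copy for a device limited to 32-bit
addresses, but not for a device that can generate full 64-bit addresses.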


> 
> If you really need to, you can make ZFS work on i386.  If at all possible,
> though, you really should run it on amd64 instead.
> 
> -- 
> Freddie Cash
> fjwc...@gmail.com

Re: Filesystem operations slower in 13.0 than 12.2

2021-03-05 Thread Konstantin Belousov

On Sat, Mar 06, 2021 at 12:27:55AM +0200, Christos Chatzaras wrote:
> I did some more tests. Finally I see similar results (with the exception of 
> one "portsnap extract" test). Also with 13.0 I can't trigger a bug that I 
> describe here:
> 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=250576
> 
> --
> 
> Command: /usr/bin/time -l rm -fr /usr/ports /usr/src (these tests done with 
> exactly the same hardware - I upgrade 12.2p4 to 13.0-RC1 for the 2nd test)
> 
> FreeBSD 12.2p4
> 
>12.67 real 0.36 user 1.94 sys
>13.18 real 0.41 user 1.81 sys
>12.16 real 0.36 user 1.85 sys

> FreeBSD 13.0-RC1
> 
>16.71 real 0.63 user 3.02 sys
>14.53 real 0.48 user 2.98 sys
>13.97 real 0.70 user 2.85 sys
> 
> Command: /usr/bin/time -l tar xf src.tar (these tests done with 2 different 
> idle servers but with same 4TB HDDs models)
> 
> FreeBSD 12.2p4
> 
>37.35 real 1.03 user 3.34 sys
> 
> FreeBSD 13.0-RC1
> 
>44.97 real 1.15 user 3.34 sys
> 
> --
> 
> Command: /usr/bin/time -l tar xf ports.tar (these tests done with 2 different 
> idle servers but with same 4TB HDDs models)
> 
> FreeBSD 12.2p4
> 
>50.80 real 1.55 user 4.62 sys
> 
> FreeBSD 13.0-RC1
> 
>59.93 real 1.69 user 4.73 sys
> 
> --
> 
> 
> Command: /usr/bin/time -l portsnap extract (these tests done with 2 different 
> idle servers but with same 4TB HDDs models)
> 
> FreeBSD 12.2p4
> 
>99.45 real34.90 user59.63 sys
>   100.00 real34.91 user59.97 sys
>82.95 real35.98 user60.68 sys
> 
> FreeBSD 13.0-RC1
> 
>   217.43 real75.67 user   110.97 sys
>   125.50 real63.00 user96.47 sys
>   118.93 real62.91 user96.28 sys
I trimmed the data above to show the interesting numbers more compactly.
In the portsnap results for 13RC1, the variance is too high to conclude
anything, I think.
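
One rough way to put a number on that spread is (max - min) / mean over the
"real" times of each series.  This is only a sketch: the metric is a crude
choice made for this example, not something used in the thread, and three
samples per series are too few for real statistics.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
double
mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return (sum / (double)n);
}

/* (max - min) / mean: a crude, unitless measure of run-to-run spread. */
double
relative_range(const double *v, size_t n)
{
	double lo, hi;

	assert(n > 0);
	lo = hi = v[0];
	for (size_t i = 1; i < n; i++) {
		if (v[i] < lo)
			lo = v[i];
		if (v[i] > hi)
			hi = v[i];
	}
	return ((hi - lo) / mean(v, n));
}

/* "real" seconds for portsnap extract, copied from the quoted results. */
const double portsnap_12_2[] = { 99.45, 100.00, 82.95 };
const double portsnap_13rc1[] = { 217.43, 125.50, 118.93 };
```

For the quoted numbers the 13.0-RC1 series spreads over more than half of
its mean, while the 12.2 series stays under a fifth.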

There were (are) bugs in FreeBSD UFS SU < 13:
- some LoR existed in the SU code, where it needed to lock a containing
  directory to provide POSIX guarantees for fsync() while owning the vnode
  lock.  I do not believe it is observable in real-world use
- in some situations UFS SU in < 13 did not perform the necessary fsync()
  of the directory, related to the previous item
The end result was that after a successful fsync() followed by a system
failure, e.g. power loss or panic, the parent directory of the synced
vnode would not be synced and the vnode's dirent would not be written to
permanent storage. This violates the POSIX requirement that after fsync the
data can be read, since you plainly cannot open the file.
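
For comparison, this is roughly what an application has to do by hand on a
filesystem that does not make this guarantee: sync the file, then sync the
directory that holds its entry.  A minimal sketch; durable_create() is a
hypothetical helper, and the calls it uses are plain POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Create dir/name with the given contents and make both the file data and
 * its directory entry durable.  Returns 0 on success, -1 on any error.
 */
int
durable_create(const char *dir, const char *name, const void *buf, size_t len)
{
	char path[1024];
	int n, fd, dfd, error = -1;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return (-1);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (-1);
	/* A short write is treated as a failure to keep the sketch small. */
	if (write(fd, buf, len) != (ssize_t)len)
		goto out;
	if (fsync(fd) == -1)		/* file data and inode */
		goto out;

	dfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (dfd == -1)
		goto out;
	if (fsync(dfd) == 0)		/* the directory entry itself */
		error = 0;
	close(dfd);
out:
	close(fd);
	return (error);
}
```

The second fsync() is the step that UFS SU is described above as taking
care of internally.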

During the development of the patch to fix both the LoR and the related
omission of fsync, a mistake was made resulting in much more aggressive
syncing of directories. It was not exactly that, but approximately: on
most metadata operations that created or removed a directory entry,
the directory was fully synced. This resulted in a significant
slowdown, which was eliminated around BETA4..RC1. I.e. most of the fixes
went into BETA4, but minor parts were only discovered later and were ready
for RC1.

There are still more fsync(dir) calls in 13RC1 than in any 12, by the nature
of the bug and its fix, but the current belief is that all fsync calls left
in the flow are required for correctness.

Re: FreeBSD 13.0-BETA2 and slow IO

2021-02-16 Thread Konstantin Belousov

On Tue, Feb 16, 2021 at 12:54:55PM +0200, Christos Chatzaras wrote:
> 
> > On 16 Feb 2021, at 12:20, Christos Chatzaras  wrote:
> > 
> > I build a test system with 13.0-BETA2 and it's very slow with at least IO.
> > 
> > Doing "portsnap auto" takes much more time than 12.2.
> > 
> > Also when I do "rm -fr /usr/ports" with 12.2 takes 5 seconds and the same 
> > command with 13.0-BETA2 takes 100 seconds.
> > 
> > The disks are similar 4TB HDD drives on both systems.
> > 
> > Is this related to debug enabled in 13.0-BETA2?
> 
> I install 12.2 in the same system and "rm -fr /usr/ports" was fast. So it's 
> not related to hardware.
> 
> If I upgrade it to 13.0-BETA2 the same command is slow again.

Are you using UFS+SU or SU+J?  If yes, this is a known issue and a fix is
planned for BETA3 or BETA4.

Re: Microcode update prevents boot

2021-02-14 Thread Konstantin Belousov

On Sun, Feb 14, 2021 at 07:16:29PM -0500, Mark Johnston wrote:
> On Sun, Feb 14, 2021 at 02:01:14PM +0100, Leon Dietrich wrote:
> > Hi there,
> > 
> > I already worked around the issue myself. I'm just writing this here in
> > case someone else may have the same issue and is seeking an answer.
> > 
> > 
> > I recently upgraded the intel cpu microcode update package. Since then
> > the boot process hang at the stage where the other cpu cores where
> > enabled (shortly after enabling acpi). In order to resolve the issue one
> > has to boot in safe mode (not single user mode!) and comment (or remove)
> > the lines enabling the cpu microcode update on boot in
> > /boot/loader.conf. One can and should reboot then.
> > 
> > After making these changes the system boots again and all cores are
> > started and SMT works as well. One should note that one's not running
> > the newer microcode (including some security-) fixes. Having
> > microcode_update_enable="YES" in /etc/rc.conf doesn't prevent booting
> > and does not cause noticeable instability.
> > 
> > For reference: Im running FreeBSD 12.1 on a supermicro embedded board
> > with intel xeon E3-1585L v5 cpus.
> > 
> > 
> > I hope someone will find this info useful.
> 
> I see that r347931 was not merged to stable/12 branch, but the lockless
> delayed invalidation changes were indeed present in 12.1.  Could you see
> if the hang persists when boot-time ucode loading is enabled and
> vm.pmap.di_locked=1 is configured?  Note that you could apply both
> configurations at the loader prompt, i.e., without having to edit
> loader.conf and boot in safe mode to revert the change.

Please check whether this patch helps:

commit c0faf2999bfaad2fdcead26d59d60c9b9e01988a
Author: Konstantin Belousov 
Date:   Fri May 17 17:11:01 2019 +

    Free microcode memory later.

    (cherry picked from commit 8f7f38457f940798c149ae40b73e0d20672812de)

diff --git a/sys/x86/x86/ucode.c b/sys/x86/x86/ucode.c
index 93f82e37eb66..d8beeed68215 100644
--- a/sys/x86/x86/ucode.c
+++ b/sys/x86/x86/ucode.c
@@ -260,7 +260,7 @@ ucode_release(void *arg __unused)
 		goto restart;
 	}
 }
-SYSINIT(ucode_release, SI_SUB_KMEM + 1, SI_ORDER_ANY, ucode_release, NULL);
+SYSINIT(ucode_release, SI_SUB_SMP + 1, SI_ORDER_ANY, ucode_release, NULL);
 
 void
 ucode_load_ap(int cpu)
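
The effect of moving the SYSINIT can be illustrated with a small userland
model of startup ordering: hooks run sorted by subsystem, then by order
within the subsystem.  The numeric values below are assumptions made for
this sketch and only preserve the relative order (KMEM before SMP); the
real constants live in sys/kernel.h.  The model assumes that the APs are
started in the SMP stage and that the microcode image has to survive until
then.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-ins; only KMEM < SMP matters (assumed values). */
enum { SUB_KMEM = 100, SUB_SMP = 900 };
enum { ORDER_FIRST = 0, ORDER_ANY = 1000 };

struct hook {
	const char *name;
	int subsystem;
	int order;
};

/* Sort by subsystem first, then by order within the subsystem. */
static int
hook_cmp(const void *a, const void *b)
{
	const struct hook *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);
}

/*
 * Returns 1 if a release hook registered at release_sub runs before the
 * hook that starts the APs (registered at SUB_SMP), 0 otherwise.
 */
int
release_runs_before_aps(int release_sub)
{
	struct hook hooks[] = {
		{ "start_aps", SUB_SMP, ORDER_FIRST },
		{ "ucode_release", release_sub, ORDER_ANY },
	};

	qsort(hooks, 2, sizeof(hooks[0]), hook_cmp);
	return (strcmp(hooks[0].name, "ucode_release") == 0);
}
```

In this model, registering the release at SUB_KMEM + 1 frees the image
before the APs start, while SUB_SMP + 1 frees it afterwards.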

Re: Page fault in _mca_init during startup

2021-02-08 Thread Konstantin Belousov

On Mon, Feb 08, 2021 at 10:48:46AM -0500, Mark Johnston wrote:
> On Mon, Feb 08, 2021 at 05:33:22PM +0200, Konstantin Belousov wrote:
> > On Mon, Feb 08, 2021 at 10:03:59AM -0500, Mark Johnston wrote:
> > > On Mon, Feb 08, 2021 at 12:18:12AM +0200, Konstantin Belousov wrote:
> > > > On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> > > > > Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P bit 
> > > > > on all
> > > > > processors.  I don't have strong opinions about whether we should 
> > > > > commit
> > > > > kib's patch too.  Kib, what do you think?
> > > > 
> > > > The patch causes some memory over-use.
> > > > 
> > > > If this issue is not too widely experienced, I prefer to not commit the 
> > > > patch.
> > > 
> > > Couldn't we short-circuit cmci_monitor() if the BSP did not allocate
> > > anything?
> > > 
> > > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > > index 03100e77d45..0619a41b128 100644
> > > --- a/sys/x86/x86/mca.c
> > > +++ b/sys/x86/x86/mca.c
> 
> > I think something should be printed in this case, at least once.
> > I believe printf() already works, because spin locks do.
> 
> Indeed, the printf() below should only fire on an AP during SI_SUB_SMP.
> Access to the static flag is synchronized by mca_lock.
> 
> diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> index 03100e77d45..8098bcfb4bd 100644
> --- a/sys/x86/x86/mca.c
> +++ b/sys/x86/x86/mca.c
> @@ -1065,11 +1065,26 @@ mca_setup(uint64_t mcg_cap)
>  static void
>  cmci_monitor(int i)
>  {
> + static bool first = true;
>   struct cmc_state *cc;
>   uint64_t ctl;
>  
>   KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
>  
> + /*
> +  * It is possible for some APs to report CMCI support even if the BSP
> +  * does not, apparently due to a BIOS bug.
> +  */
> + if (cmc_state == NULL) {
> + if (first) {
> + printf(
I would write if (bootverbose) printf() here.
It might also be useful to report the ACPI id/APIC id, since this data
will most likely be the basis for a BIOS bug report.

Otherwise fine with me.

> + "AP %d reports CMCI support but the BSP does not\n",
> + PCPU_GET(cpuid));
> + first = false;
> + }
> + return;
> + }
> +
>   ctl = rdmsr(MSR_MC_CTL2(i));
>   if (ctl & MC_CTL2_CMCI_EN)
>   /* Already monitored by another CPU. */
> @@ -1114,6 +1129,10 @@ cmci_resume(int i)
>  
>   KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
>  
> + /* See cmci_monitor(). */
> + if (cmc_state == NULL)
> + return;
> +
>   /* Ignore banks not monitored by this CPU. */
>   if (!(PCPU_GET(cmci_mask) & 1 << i))
>   return;

Re: Page fault in _mca_init during startup

2021-02-08 Thread Konstantin Belousov

On Mon, Feb 08, 2021 at 10:03:59AM -0500, Mark Johnston wrote:
> On Mon, Feb 08, 2021 at 12:18:12AM +0200, Konstantin Belousov wrote:
> > On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> > > Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P bit on 
> > > all
> > > processors.  I don't have strong opinions about whether we should commit
> > > kib's patch too.  Kib, what do you think?
> > 
> > The patch causes some memory over-use.
> > 
> > If this issue is not too widely experienced, I prefer to not commit the 
> > patch.
> 
> Couldn't we short-circuit cmci_monitor() if the BSP did not allocate
> anything?
> 
> diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> index 03100e77d45..0619a41b128 100644
> --- a/sys/x86/x86/mca.c
> +++ b/sys/x86/x86/mca.c
> @@ -1070,6 +1070,13 @@ cmci_monitor(int i)
>  
>   KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
>  
> + /*
> +  * It is possible for some APs to report CMCI support even if the BSP
> +  * does not, apparently due to a BIOS bug.
> +  */
> + if (cmc_state == NULL)
> + return;
> +
>   ctl = rdmsr(MSR_MC_CTL2(i));
>   if (ctl & MC_CTL2_CMCI_EN)
>   /* Already monitored by another CPU. */
> @@ -1114,6 +1121,10 @@ cmci_resume(int i)
>  
>   KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
>  
> + /* See cmci_monitor(). */
> + if (cmc_state == NULL)
> + return;
> +
>   /* Ignore banks not monitored by this CPU. */
>   if (!(PCPU_GET(cmci_mask) & 1 << i))
>   return;
I think something should be printed in this case, at least once.
I believe printf() already works, because spin locks do.

Re: Page fault in _mca_init during startup

2021-02-07 Thread Konstantin Belousov

On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> On Fri, Feb 5, 2021 at 10:21 AM Konstantin Belousov 
> wrote:
> 
> > On Fri, Feb 05, 2021 at 09:01:26AM -0700, Alan Somers wrote:
> > > On Fri, Feb 5, 2021 at 7:41 AM Konstantin Belousov 
> > > wrote:
> > >
> > > > On Thu, Feb 04, 2021 at 07:53:09PM -0700, Alan Somers wrote:
> > > > > On Thu, Feb 4, 2021 at 7:40 PM Konstantin Belousov <
> > kostik...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > On Thu, Feb 04, 2021 at 07:01:30PM -0700, Alan Somers wrote:
> > > > > > > On Thu, Feb 4, 2021 at 5:59 PM Konstantin Belousov <
> > > > kostik...@gmail.com>
> > > > > > > wrote:
> > > > > > > > Do you have INVARIANTS enabled?  If not, I am curious if
> > enabling
> > > > them
> > > > > > > > would convert that rare page fault into rare "CPU %d has more
> > MC
> > > > banks"
> > > > > > > > assert.
> > > > > > > >
> > > > > > > > Also might be the output of the
> > > > > > > > # for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m
> > 0x179
> > > > > > > > /dev/cpuctl$x; done
> > > > > > > > command will show the issue (0x179 is the MCG_CAP MSR).
> > > > > > > > You need to load cpuctl(4) if it is not loaded yet.
> > > > > > > >
> > > > > > >
> > > > > > > I don't have INVARIANTS enabled, and I can't enable it on the
> > > > production
> > > > > > > servers.  However, I can turn those three KASSERTs into VERIFYs
> > and
> > > > see
> > > > > > > what happens.  Here is what your command shows on the server that
> > > > > > panicked:
> > > > > > > $ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m
> > > > 0x179
> > > > > > > /dev/cpuctl$x; done | uniq -c
> > > > > > >   16 MSR 0x179: 0x 0x0f000c14
> > > > > > >   16 MSR 0x179: 0x 0x0f000814
> > > > > >
> > > > > > It probably explains it, but it would be more telling if you left
> > the
> > > > > > output as is, so that we can see which CPUs have MCG_CMCI_P (10)
> > bit
> > > > set.
> > > > > >
> > > > >
> > > > > I didn't sort them, so the first 16 have bit 10 set and the second 16
> > > > > don't.
> > > > >
> > > > >
> > > > > >
> > > > > > I suspect that your machine has two sockets, and processor in one
> > > > socket
> > > > > > has CPUs reporting MCG_CMCI_P, while other processor does not.
> > Your SMP
> > > > > > is not quite symmetric, perhaps processors were from different
> > bins?
> > > >
> > >
> > > I found 2 other servers that exhibit the same problem: the first 16 cores
> > > have bit 10 set and the second 16 don't.  All 3 have dual Xeon Gold 6142
> > > CPUs and SuperMicro X11DPU motherboards with BIOS revision 5.12.  I have
> > > other examples of X11DPU motherboards that don't exhibit the problem, but
> > > they all have both different CPUs and different BIOS revisions.  So I
> > can't
> > > be sure whether the bug follows the CPU model or the BIOS version.
> > I looked at the full spec update errata list for the first gen Skylake
> > Xeons, but did not noticed anything relevant. EDS doc does not provide
> > much useful info on the MSR 0x179 bit 10 either, except rewording SDM
> > definition.
> >
> > In fact I am not sure but this bit might be writeable by software. Try
> > to flip the bit with cpucontrol(8). Might be it is a BIOS bug after all.
> >
> > If you have Intel representative contact, or Supermicro contact, try to
> > engage them.  I do not have any further ideas, since spec update does not
> > mention the problem.
> >
> > >
> > >
> > > > > >
> > > > >
> > > > > Could be.  Is there some MSR that reports a more specific version
> > number?
> > > > There are CPUID %eax=1 values returned in %eax, but then it requires
> > > > some interpretation.
> > > > # cpucontrol -i 1 /dev/cpuctl$x
> > > > for $x iterating over the cpus.
> > > >
> > >
> > > Apart from the Local APIC ID field, that returns the same value for all
> > > processors.
> > >
> > > Your second patch doesn't cause any obvious problems on my dev system.
> > I hope that you would confirm that the issue is solved by it, after some
> > time.
> >
> 
> Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P bit on all
> processors.  I don't have strong opinions about whether we should commit
> kib's patch too.  Kib, what do you think?

The patch causes some memory over-use.

If this issue is not too widely experienced, I prefer not to commit the patch.

Re: Page fault in _mca_init during startup

2021-02-05 Thread Konstantin Belousov

On Fri, Feb 05, 2021 at 09:01:26AM -0700, Alan Somers wrote:
> On Fri, Feb 5, 2021 at 7:41 AM Konstantin Belousov 
> wrote:
> 
> > On Thu, Feb 04, 2021 at 07:53:09PM -0700, Alan Somers wrote:
> > > On Thu, Feb 4, 2021 at 7:40 PM Konstantin Belousov 
> > > wrote:
> > >
> > > > On Thu, Feb 04, 2021 at 07:01:30PM -0700, Alan Somers wrote:
> > > > > On Thu, Feb 4, 2021 at 5:59 PM Konstantin Belousov <
> > kostik...@gmail.com>
> > > > > wrote:
> > > > > > Do you have INVARIANTS enabled?  If not, I am curious if enabling
> > them
> > > > > > would convert that rare page fault into rare "CPU %d has more MC
> > banks"
> > > > > > assert.
> > > > > >
> > > > > > Also might be the output of the
> > > > > > # for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m 0x179
> > > > > > /dev/cpuctl$x; done
> > > > > > command will show the issue (0x179 is the MCG_CAP MSR).
> > > > > > You need to load cpuctl(4) if it is not loaded yet.
> > > > > >
> > > > >
> > > > > I don't have INVARIANTS enabled, and I can't enable it on the
> > production
> > > > > servers.  However, I can turn those three KASSERTs into VERIFYs and
> > see
> > > > > what happens.  Here is what your command shows on the server that
> > > > panicked:
> > > > > $ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m
> > 0x179
> > > > > /dev/cpuctl$x; done | uniq -c
> > > > >   16 MSR 0x179: 0x 0x0f000c14
> > > > >   16 MSR 0x179: 0x 0x0f000814
> > > >
> > > > It probably explains it, but it would be more telling if you left the
> > > > output as is, so that we can see which CPUs have MCG_CMCI_P (10) bit
> > set.
> > > >
> > >
> > > I didn't sort them, so the first 16 have bit 10 set and the second 16
> > > don't.
> > >
> > >
> > > >
> > > > I suspect that your machine has two sockets, and processor in one
> > socket
> > > > has CPUs reporting MCG_CMCI_P, while other processor does not. Your SMP
> > > > is not quite symmetric, perhaps processors were from different bins?
> >
> 
> I found 2 other servers that exhibit the same problem: the first 16 cores
> have bit 10 set and the second 16 don't.  All 3 have dual Xeon Gold 6142
> CPUs and SuperMicro X11DPU motherboards with BIOS revision 5.12.  I have
> other examples of X11DPU motherboards that don't exhibit the problem, but
> they all have both different CPUs and different BIOS revisions.  So I can't
> be sure whether the bug follows the CPU model or the BIOS version.
I looked at the full spec update errata list for the first-gen Skylake
Xeons, but did not notice anything relevant. The EDS doc does not provide
much useful info on MSR 0x179 bit 10 either, beyond rewording the SDM
definition.

In fact, I am not sure, but this bit might be writeable by software. Try
to flip the bit with cpucontrol(8). It might be a BIOS bug after all.

If you have an Intel representative contact, or a Supermicro contact, try
to engage them.  I do not have any further ideas, since the spec update
does not mention the problem.

> 
> 
> > > >
> > >
> > > Could be.  Is there some MSR that reports a more specific version number?
> > There are CPUID %eax=1 values returned in %eax, but then it requires
> > some interpretation.
> > # cpucontrol -i 1 /dev/cpuctl$x
> > for $x iterating over the cpus.
> >
> 
> Apart from the Local APIC ID field, that returns the same value for all
> processors.
> 
> Your second patch doesn't cause any obvious problems on my dev system.
I hope you can confirm, after some time, that it solves the issue.

Re: Page fault in _mca_init during startup

2021-02-05 Thread Konstantin Belousov

On Thu, Feb 04, 2021 at 07:53:09PM -0700, Alan Somers wrote:
> On Thu, Feb 4, 2021 at 7:40 PM Konstantin Belousov 
> wrote:
> 
> > On Thu, Feb 04, 2021 at 07:01:30PM -0700, Alan Somers wrote:
> > > On Thu, Feb 4, 2021 at 5:59 PM Konstantin Belousov 
> > > wrote:
> > > > Do you have INVARIANTS enabled?  If not, I am curious if enabling them
> > > > would convert that rare page fault into rare "CPU %d has more MC banks"
> > > > assert.
> > > >
> > > > Also might be the output of the
> > > > # for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m 0x179
> > > > /dev/cpuctl$x; done
> > > > command will show the issue (0x179 is the MCG_CAP MSR).
> > > > You need to load cpuctl(4) if it is not loaded yet.
> > > >
> > >
> > > I don't have INVARIANTS enabled, and I can't enable it on the production
> > > servers.  However, I can turn those three KASSERTs into VERIFYs and see
> > > what happens.  Here is what your command shows on the server that
> > panicked:
> > > $ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m 0x179
> > > /dev/cpuctl$x; done | uniq -c
> > >   16 MSR 0x179: 0x 0x0f000c14
> > >   16 MSR 0x179: 0x 0x0f000814
> >
> > It probably explains it, but it would be more telling if you left the
> > output as is, so that we can see which CPUs have MCG_CMCI_P (10) bit set.
> >
> 
> I didn't sort them, so the first 16 have bit 10 set and the second 16
> don't.
> 
> 
> >
> > I suspect that your machine has two sockets, and processor in one socket
> > has CPUs reporting MCG_CMCI_P, while other processor does not. Your SMP
> > is not quite symmetric, perhaps processors were from different bins?
> >
> 
> Could be.  Is there some MSR that reports a more specific version number?
CPUID leaf 1 (%eax=1) returns the processor signature in %eax, but it
requires some interpretation.
# cpucontrol -i 1 /dev/cpuctl$x
for $x iterating over the cpus.
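
The interpretation amounts to unpacking a few bit fields.  A minimal
sketch, following the field layout of CPUID leaf 1 as documented in the
Intel SDM; struct cpu_sig and decode_cpuid1_eax() are names invented for
this example.

```c
#include <stdint.h>

struct cpu_sig {
	unsigned family;
	unsigned model;
	unsigned stepping;
};

/* Decode the processor signature returned in %eax by CPUID leaf 1. */
struct cpu_sig
decode_cpuid1_eax(uint32_t eax)
{
	struct cpu_sig s;
	unsigned base_family = (eax >> 8) & 0xf;
	unsigned base_model = (eax >> 4) & 0xf;
	unsigned ext_model = (eax >> 16) & 0xf;
	unsigned ext_family = (eax >> 20) & 0xff;

	s.stepping = eax & 0xf;
	/* The extended family is only added for base family 0xf. */
	s.family = base_family == 0xf ? base_family + ext_family : base_family;
	/* The extended model applies to base families 0x6 and 0xf. */
	s.model = (base_family == 0x6 || base_family == 0xf) ?
	    (ext_model << 4) | base_model : base_model;
	return (s);
}
```

As a cross-check, the Atom 330 boot messages quoted elsewhere in this
archive print Id=0x106c2 together with Family=0x6, Model=0x1c and
Stepping=2, which is what this decoding yields for 0x106c2.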

Re: Page fault in _mca_init during startup

2021-02-04 Thread Konstantin Belousov

On Thu, Feb 04, 2021 at 07:01:30PM -0700, Alan Somers wrote:
> On Thu, Feb 4, 2021 at 5:59 PM Konstantin Belousov 
> wrote:
> > Do you have INVARIANTS enabled?  If not, I am curious if enabling them
> > would convert that rare page fault into rare "CPU %d has more MC banks"
> > assert.
> >
> > Also might be the output of the
> > # for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m 0x179
> > /dev/cpuctl$x; done
> > command will show the issue (0x179 is the MCG_CAP MSR).
> > You need to load cpuctl(4) if it is not loaded yet.
> >
> 
> I don't have INVARIANTS enabled, and I can't enable it on the production
> servers.  However, I can turn those three KASSERTs into VERIFYs and see
> what happens.  Here is what your command shows on the server that panicked:
> $ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m 0x179
> /dev/cpuctl$x; done | uniq -c
>   16 MSR 0x179: 0x 0x0f000c14
>   16 MSR 0x179: 0x 0x0f000814

That probably explains it, but it would be more telling if you left the
output as is, so that we can see which CPUs have the MCG_CMCI_P bit (bit 10)
set.
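
Decoding the two low words from the quoted cpucontrol output shows the
difference directly.  A small sketch: the macros are defined locally for
the example and follow the MCG_CAP layout (bank count in bits 7:0, CMCI_P
in bit 10); the function names are made up.

```c
#include <stdint.h>

#define MCG_CAP_COUNT	0x000000ffu	/* bits 7:0, number of MC banks */
#define MCG_CAP_CMCI_P	0x00000400u	/* bit 10, CMCI supported */

/* Number of machine check banks reported by this CPU. */
unsigned
mcg_bank_count(uint64_t mcg_cap)
{
	return ((unsigned)(mcg_cap & MCG_CAP_COUNT));
}

/* Returns 1 if this CPU reports CMCI support, else 0. */
int
mcg_has_cmci(uint64_t mcg_cap)
{
	return ((mcg_cap & MCG_CAP_CMCI_P) != 0);
}
```

Both values report 20 banks (0x14); 0x0f000c14 has bit 10 set and
0x0f000814 does not.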

I suspect that your machine has two sockets, and the processor in one socket
has CPUs reporting MCG_CMCI_P, while the other processor does not. Your SMP
is not quite symmetric; perhaps the processors were from different bins?

If the BSP is selected on the reporting socket, everything boots well. If
the other socket wins the BSP selection race, CMCI is not initialized, but
when the per-CPU mca_init() sees the CMCI_P bit, it calls cmci_setup()
without an allocated cmc state, because the BSP did not need it.

If I am right, then unconditionally allocating the memory is probably the
only choice there.
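
The suspected failure mode can be modelled in userland.  This is a sketch
of the reasoning above, not kernel code: the function, its parameters and
the four-CPU sample are hypothetical, and the allocation is reduced to a
flag.

```c
#include <stddef.h>
#include <stdint.h>

#define CMCI_P	0x400u	/* MCG_CAP bit 10 */

/* Two CPUs per socket; only the first socket reports CMCI (sample data). */
const uint64_t sample_cap[4] = {
	0x0f000c14, 0x0f000c14, 0x0f000814, 0x0f000814
};

/*
 * cap[] holds one MCG_CAP value per CPU and bsp is the index of the boot
 * processor.  Returns the number of CPUs that would touch the CMC state
 * while it is still unallocated.
 */
int
count_unsafe_cpus(const uint64_t *cap, size_t ncpu, size_t bsp,
    int always_allocate)
{
	int allocated, unsafe = 0;

	/* BSP: allocate only if it reports CMCI, unless forced. */
	allocated = always_allocate || (cap[bsp] & CMCI_P) != 0;

	/* Every CPU: uses the state if its own MCG_CAP reports CMCI. */
	for (size_t i = 0; i < ncpu; i++)
		if ((cap[i] & CMCI_P) != 0 && !allocated)
			unsafe++;
	return (unsafe);
}
```

With the BSP on the reporting socket nothing goes wrong; with the BSP on
the other socket both reporting CPUs hit unallocated state, and allocating
unconditionally removes the problem in either case.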

commit 2e2c925ac3b626edc6492a57a80f6b87895801c2
Author: Konstantin Belousov 
Date:   Fri Feb 5 04:32:05 2021 +0200

    x86 mca: unconditionally allocate memory for cmc state

diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d455..dff3f7631f5c 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1047,7 +1047,7 @@ mca_setup(uint64_t mcg_cap)
"force_scan", CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, 0,
sysctl_mca_scan, "I", "Force an immediate scan for machine checks");
 #ifdef DEV_APIC
-   if (cmci_supported(mcg_cap))
+   if (cpu_vendor_id == CPU_VENDOR_INTEL)
 		cmci_setup();
 	else if (amd_thresholding_supported())
 		amd_thresholding_setup();

Re: Page fault in _mca_init during startup

2021-02-04 Thread Konstantin Belousov

On Thu, Feb 04, 2021 at 05:19:43PM -0700, Alan Somers wrote:
> On Thu, Feb 4, 2021 at 4:27 PM Mark Johnston  wrote:
> 
> > On Fri, Feb 05, 2021 at 12:58:34AM +0200, Konstantin Belousov wrote:
> > > On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> > > > On Thu, Feb 4, 2021 at 1:31 PM Alan Somers 
> > wrote:
> > > > >
> > > > > After upgrading a machine to FreeBSD, 12.2, it hit the following
> > panic on
> > > > > its first reboot.  I suspect that a few other servers have hit this
> > too,
> > > > > but since it happens before swap is mounted there are no core dumps,
> > and
> > > > > they usually reboot immediately.  The code in question hasn't
> > changed since
> > > > > 2018.  The panic happened in cmci_monitor at line 930.  Does anybody
> > have
> > > > > any suggestions for how I could debug further?  I can't readily
> > reproduce
> > > > > it, and I can't dump core, but I'd like to investigate it any way I
> > can.
> > > > > The server in question has dual Xeon Gold 6142 CPUs.
> > > > >
> > > Try this.
> > >
> > > I think that there is no other dependencies in the startup order, but
> > > cannot know it for sure.
> > >
> > > commit 19584e3d3e9606d591fa30999b370ed758960e8c
> > > Author: Konstantin Belousov 
> > > Date:   Fri Feb 5 00:56:09 2021 +0200
> > >
> > > x86: init mca before APs are started
> >
> > APs only call mca_init() after they have been released by the BSP
> > though, and that happens later in SI_SUB_SMP.
> >
> > > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > > index 03100e77d455..e2bf2673cf69 100644
> > > --- a/sys/x86/x86/mca.c
> > > +++ b/sys/x86/x86/mca.c
> > > @@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
> > >
> > >   mca_init();
> > >  }
> > > -SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
> > > +SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
> > >
> > >  /* Called when a machine check exception fires. */
> > >  void
> >
> 
> kib's patch causes a different problem, and this one is reproducible:
> 
>  Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x18
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0x8125762c
> stack pointer= 0x28:0x828dad90
> frame pointer= 0x28:0x828dad90
> code segment = base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = resume, IOPL = 0
> current process = 0 ()
> trap number = 12
> panic: page fault
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0x828daa50
> vpanic() at vpanic+0x17b/frame 0x828daaa0
> panic() at panic+0x43/frame 0x828dab00
> trap_fatal() at trap_fatal+0x391/frame 0x828dab60
> trap_pfault() at trap_pfault+0x4f/frame 0x828dabb0
> trap() at trap+0x286/frame 0x828dacc0
> calltrap() at calltrap+0x8/frame 0x828dacc0
> --- trap 0xc, rip = 0x8125762c, rsp = 0x828dad90, rbp =
> 0x828dad90 ---
> native_lapic_enable_cmc() at native_lapic_enable_cmc+0x1c/frame
> 0x828dad90
> _mca_init() at _mca_init+0x94c/frame 0x828dadd0
> mi_startup() at mi_startup+0xdf/frame 0x828dadf0
> btext() at btext+0x2c
> KDB: enter: panic
> [ thread pid 0 tid 0 ]
> Stopped at  kdb_enter+0x37: movq$0,0x12bc396(%rip)
> 
> If you're wondering, the panic happens at this point in
> native_lapic_enable_cmc:
> 
>     apic_id = PCPU_GET(apic_id);
>     KASSERT(lapics[apic_id].la_present,
>         ("%s: missing APIC %u", __func__, apic_id));
>     lapics[apic_id].la_lvts[APIC_LVT_CMCI].lvt_masked = 0;    <- panic here
>     lapics[apic_id].la_lvts[APIC_LVT_CMCI].lvt_active = 1;
>     if (bootverbose)
>         printf("lapic%u: CMCI unmasked\n", apic_id);
> }

Scratch this patch.

Do you have INVARIANTS enabled?  If not, I am curious whether enabling them
would convert that rare page fault into a rare "CPU %d has more MC banks"
assertion.

Also, the output of the following command might show the issue
(0x179 is the MCG_CAP MSR):
# for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m 0x179 /dev/cpuctl$x; done
You need to load cpuctl(4) if it is not loaded yet.
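
A minimal sketch of how the values printed by that command could be
compared, assuming the documented IA32_MCG_CAP layout (bits 7:0 hold the
bank count, bit 10 advertises CMCI).  The helper names below are
hypothetical and are not taken from sys/x86/x86/mca.c; the idea is only to
spot a CPU that reports a different number of MC banks than CPU 0.

```c
#include <stdint.h>

/* Bit layout of IA32_MCG_CAP (MSR 0x179). */
#define MCG_CAP_COUNT_MASK	0xffULL		/* bits 7:0: number of MC banks */
#define MCG_CAP_CMCI_P		(1ULL << 10)	/* bit 10: CMCI is supported */

/* Number of machine-check banks reported in an MCG_CAP value. */
static unsigned
mcg_cap_banks(uint64_t mcg_cap)
{
	return ((unsigned)(mcg_cap & MCG_CAP_COUNT_MASK));
}

/* Returns 1 if the MCG_CAP value advertises CMCI support, 0 otherwise. */
static int
mcg_cap_has_cmci(uint64_t mcg_cap)
{
	return ((mcg_cap & MCG_CAP_CMCI_P) != 0);
}

/*
 * Hypothetical helper: index of the first CPU whose bank count differs
 * from CPU 0, or -1 if every CPU agrees.
 */
static int
first_bank_mismatch(const uint64_t *caps, int ncpu)
{
	for (int i = 1; i < ncpu; i++)
		if (mcg_cap_banks(caps[i]) != mcg_cap_banks(caps[0]))
			return (i);
	return (-1);
}
```

Feeding it one MCG_CAP value per /dev/cpuctlN and checking for a result
other than -1 would show whether the CPUs disagree on the bank count.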

Re: Page fault in _mca_init during startup

2021-02-04 Thread Konstantin Belousov

On Thu, Feb 04, 2021 at 04:05:42PM -0700, Alan Somers wrote:
> On Thu, Feb 4, 2021 at 3:58 PM Konstantin Belousov 
> wrote:
> 
> > On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> > > On Thu, Feb 4, 2021 at 1:31 PM Alan Somers  wrote:
> > > >
> > > > > After upgrading a machine to FreeBSD 12.2, it hit the following panic on
> > > > > its first reboot.  I suspect that a few other servers have hit this too,
> > > > > but since it happens before swap is mounted there are no core dumps, and
> > > > > they usually reboot immediately.  The code in question hasn't changed since
> > > > > 2018.  The panic happened in cmci_monitor at line 930.  Does anybody have
> > > > > any suggestions for how I could debug further?  I can't readily reproduce
> > > > > it, and I can't dump core, but I'd like to investigate it any way I can.
> > > > > The server in question has dual Xeon Gold 6142 CPUs.
> > > >
> > >
> > > I can't actually help :( but I can add a +1  with similar hardware or
> > > equivalent specs. It's not frequent, but it's often enough to be
> > > annoying.
> > > -M
> > >
> > > > if (!(ctl & MC_CTL2_CMCI_EN))
> > > > /* This bank does not support CMCI. */
> > > > return;
> > > >
> > > > > cc = &cmc_state[PCPU_GET(cpuid)][i];    // <- panic here
> > > >
> > > > /* Determine maximum threshold. */
> > > >
> > > >
> > > > Fatal trap 12: page fault while in kernel mode
> > > > cpuid = 26; apic id = 34
> > > > fault virtual address = 0xd0
> > > > fault code = supervisor read data, page not present
> > > > instruction pointer = 0x20:0x8125a009
> > > > stack pointer= 0x28:0xfeb65f20
> > > > frame pointer= 0x28:0xfeb65f50
> > > > code segment = base 0x0, limit 0xf, type 0x1b
> > > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > processor eflags = resume, IOPL = 0
> > > > current process = 11 (idle: cpu26)
> > > > trap number = 12
> > > > panic: page fault
> > > > cpuid = 26
> > > > time = 1
> > > > KDB: stack backtrace:
> > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> > > > 0xfeb65be0
> > > > vpanic() at vpanic+0x17b/frame 0xfeb65c30
> > > > panic() at panic+0x43/frame 0xfeb65c90
> > > > trap_fatal() at trap_fatal+0x391/frame 0xfeb65cf0
> > > > trap_pfault() at trap_pfault+0x4f/frame 0xfeb65d40
> > > > trap() at trap+0x286/frame 0xfeb65e50
> > > > calltrap() at calltrap+0x8/frame 0xfeb65e50
> > > > --- trap 0xc, rip = 0x8125a009, rsp = 0xfeb65f20, rbp =
> > > > 0xfeb65f50 ---
> > > > _mca_init() at _mca_init+0x5d9/frame 0xfeb65f50
> > > > init_secondary_tail() at init_secondary_tail+0xfd/frame
> > 0xfeb65f80
> > > > init_secondary() at init_secondary+0x2d1/frame 0xfeb65ff0
> > > > KDB: enter: panic
> > > > [ thread pid 11 tid 100029 ]
> > > > Stopped at  kdb_enter+0x37: movq$0,0x12bc1f6(%rip)
> >
> > Try this.
> >
> > I think that there is no other dependencies in the startup order, but
> > cannot know it for sure.
> >
> > commit 19584e3d3e9606d591fa30999b370ed758960e8c
> > Author: Konstantin Belousov 
> > Date:   Fri Feb 5 00:56:09 2021 +0200
> >
> > x86: init mca before APs are started
> >
> > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > index 03100e77d455..e2bf2673cf69 100644
> > --- a/sys/x86/x86/mca.c
> > +++ b/sys/x86/x86/mca.c
> > @@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
> >
> > mca_init();
> >  }
> > -SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
> > +SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
> >
> >  /* Called when a machine check exception fires. */
> >  void
> >
> 
> I can test this patch on development servers, but so far I've only seen the
> crash on production servers.  Do you have any suggestions for how to force
> the crash, or how to test this patch besides simply making sure that my dev
> servers can boot?

The race, as I see it, is that we call mca_init() on the BSP too late, so
the malloc() that provides the storage for the cmc_state array might not
have run yet when one of the APs is IPIed for startup.

The patch ensures that the mca_init_bsp() SYSINIT has finished before we
go to start the APs.

I do not think there is any reliable way to trigger the panic while keeping
the patch usable, except to observe enough successful boots.
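
To illustrate the ordering the patch relies on, here is a small model of
how SYSINIT entries are sorted: by subsystem first, then by order within
the subsystem.  This is a sketch only.  The constants are illustrative
stand-ins (the real ones live in sys/kernel.h), and the struct and
function names are hypothetical.

```c
#include <stdlib.h>

/* Illustrative stand-ins for SI_SUB_* and SI_ORDER_* (assumed values). */
enum { SUB_CPU = 0x2100000, SUB_SMP = 0xf000000 };
enum { ORDER_FIRST = 0x0, ORDER_SECOND = 0x1, ORDER_THIRD = 0x2,
       ORDER_ANY = 0xfffffff };

/* Hypothetical model of one SYSINIT registration. */
struct sysinit_model {
	const char *name;
	unsigned subsystem;
	unsigned order;
};

/* Compare by subsystem first, then by order within the subsystem. */
static int
sysinit_cmp(const void *a, const void *b)
{
	const struct sysinit_model *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);
}

/* Sort the entries into the sequence in which they would be run. */
static void
sysinit_sort(struct sysinit_model *v, size_t n)
{
	qsort(v, n, sizeof(*v), sysinit_cmp);
}
```

In this model an entry registered with ORDER_SECOND sorts ahead of any
entry in the same subsystem that uses a later order, while an entry with
ORDER_ANY sorts last within its subsystem.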

Re: Page fault in _mca_init during startup

2021-02-04 Thread Konstantin Belousov

On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> On Thu, Feb 4, 2021 at 1:31 PM Alan Somers  wrote:
> >
> > After upgrading a machine to FreeBSD 12.2, it hit the following panic on
> > its first reboot.  I suspect that a few other servers have hit this too,
> > but since it happens before swap is mounted there are no core dumps, and
> > they usually reboot immediately.  The code in question hasn't changed since
> > 2018.  The panic happened in cmci_monitor at line 930.  Does anybody have
> > any suggestions for how I could debug further?  I can't readily reproduce
> > it, and I can't dump core, but I'd like to investigate it any way I can.
> > The server in question has dual Xeon Gold 6142 CPUs.
> >
> 
> I can't actually help :( but I can add a +1  with similar hardware or
> equivalent specs. It's not frequent, but it's often enough to be
> annoying.
> -M
> 
> > if (!(ctl & MC_CTL2_CMCI_EN))
> > /* This bank does not support CMCI. */
> > return;
> >
> > cc = &cmc_state[PCPU_GET(cpuid)][i];    // <- panic here
> >
> > /* Determine maximum threshold. */
> >
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 26; apic id = 34
> > fault virtual address = 0xd0
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0x8125a009
> > stack pointer= 0x28:0xfeb65f20
> > frame pointer= 0x28:0xfeb65f50
> > code segment = base 0x0, limit 0xf, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = resume, IOPL = 0
> > current process = 11 (idle: cpu26)
> > trap number = 12
> > panic: page fault
> > cpuid = 26
> > time = 1
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> > 0xfeb65be0
> > vpanic() at vpanic+0x17b/frame 0xfeb65c30
> > panic() at panic+0x43/frame 0xfeb65c90
> > trap_fatal() at trap_fatal+0x391/frame 0xfeb65cf0
> > trap_pfault() at trap_pfault+0x4f/frame 0xfeb65d40
> > trap() at trap+0x286/frame 0xfeb65e50
> > calltrap() at calltrap+0x8/frame 0xfeb65e50
> > --- trap 0xc, rip = 0x8125a009, rsp = 0xfeb65f20, rbp =
> > 0xfeb65f50 ---
> > _mca_init() at _mca_init+0x5d9/frame 0xfeb65f50
> > init_secondary_tail() at init_secondary_tail+0xfd/frame 0xfeb65f80
> > init_secondary() at init_secondary+0x2d1/frame 0xfffffeb65ff0
> > KDB: enter: panic
> > [ thread pid 11 tid 100029 ]
> > Stopped at  kdb_enter+0x37: movq$0,0x12bc1f6(%rip)

Try this.

I think that there are no other dependencies in the startup order, but
I cannot know it for sure.

commit 19584e3d3e9606d591fa30999b370ed758960e8c
Author: Konstantin Belousov 
Date:   Fri Feb 5 00:56:09 2021 +0200

x86: init mca before APs are started

diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d455..e2bf2673cf69 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
 
mca_init();
 }
-SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
+SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
 
 /* Called when a machine check exception fires. */
 void

Re: msdosfs umount problem

2020-12-04 Thread Konstantin Belousov

On Fri, Dec 04, 2020 at 04:39:43PM +0300, Özkan KIRIK wrote:
> # ps alxww | grep umount
>0 13556  1360  0  20  0 10716  2076 mntref   D+10:00.00 umount
> -vf /mnt
> 
> checkedout version:
> commit e7cae3d24a3e8d036c763f294bcd9883a180c36a (freebsd/stable/12)
> https://github.com/freebsd/freebsd/commits/stable/12 (still it's the last
> commit)
> 
> On Fri, Dec 4, 2020 at 4:26 PM Konstantin Belousov  wrote:
> 
> > On Fri, Dec 04, 2020 at 03:19:31PM +0300, Özkan KIRIK wrote:
> > > Hello,
> > >
> > > I've just checked out the stable/12 branch. Unmounting msdosfs blocks.
> > >
> > > # newfs_msdos -C 40M efiboot.img
> > > efiboot.img: 81800 sectors in 10225 FAT16 clusters (4096 bytes/cluster)
> > > BytesPerSec=512 SecPerClust=8 ResSectors=1 FATs=2 RootDirEnts=512
> > > Media=0xf0 FATsecs=40 SecPerTrack=63 Heads=255 HiddenSecs=0
> > > HugeSectors=81920
> > > # mdconfig ./efiboot.img
> > > md0
> > > # mount -t msdosfs /dev/md0 /mnt
> > > # umount -vf /mnt    (uninterruptible wait)
> > >
> > > On another terminal:
> > > # ps ax | grep umount
> > > 1333  0  D+ 0:00.00 umount -vf /mnt
> > >
> > > There is no such problem on 12.1 stable.
> > Which revision did you check out?
> > Show the wait channel of the umount process, like 'ps alxww | grep umount'.
> >

This should be fixed by r368339.
Thank you for the report.

Re: 12-STABLE try to init thead-using libraries before threads and program crashes

2020-11-27 Thread Konstantin Belousov

On Fri, Nov 27, 2020 at 06:03:13PM +0300, Lev Serebryakov wrote:
> 
>  I have locally-built net/samba413 port on 12-STABLE (r367937) which crashes 
> in library initialization code due to wrong library initialization order:
> 
> (No debugging symbols found in /usr/local/bin/testparm)
> (gdb) b  _libpthread_init
> Function "_libpthread_init" not defined.
> Make breakpoint pending on future shared library load? (y or [n]) y
> Breakpoint 1 (_libpthread_init) pending.
> (gdb) run
> Starting program: /usr/local/bin/testparm
> 
> Program received signal SIGSEGV, Segmentation fault.
> thr_malloc_lock (curthread=0x801e077d0) at 
> /usr/src/lib/libthr/thread/thr_malloc.c:66
> 66  curthread->locklevel++;
> (gdb) bt
> #0  thr_malloc_lock (curthread=0x801e077d0) at 
> /usr/src/lib/libthr/thread/thr_malloc.c:66
> #1  __thr_calloc (num=1, size=96) at 
> /usr/src/lib/libthr/thread/thr_malloc.c:88
> #2  0x000801474843 in mutex_init (mutex=0x801072008, 
> mutex_attr=, calloc_cb=) at 
> /usr/src/lib/libthr/thread/thr_mutex.c:295
> #3  __Tthr_mutex_init (mutex=0x801072008, mutex_attr=) at 
> /usr/src/lib/libthr/thread/thr_mutex.c:395
> #4  0x0008016d62fc in ?? () from /usr/local/lib/libgnutls.so.30
> #5  0x0008016cfcb3 in ?? () from /usr/local/lib/libgnutls.so.30
> #6  0x0008016d0077 in ?? () from /usr/local/lib/libgnutls.so.30
> #7  0x00080103730d in objlist_call_init (list=, 
> lockstate=) at /usr/src/libexec/rtld-elf/rtld.c:2823
> #8  0x00080103603d in _rtld (sp=0x7fffeb58, exit_proc=0x7fffeb20, 
> objp=0x7fffeb28) at /usr/src/libexec/rtld-elf/rtld.c:811
> #9  0x0008010338c9 in rtld_start () at 
> /usr/src/libexec/rtld-elf/amd64/rtld_start.S:39
> #10 0x in ?? ()
> (gdb)
> 
>  Please note, that `_libpthread_init` HAS BEEN NOT CALLED before 
> `_Tthr_mutex_init`.
> 
>  Looks like some corner-case problem in rtld?
> 
>  Link command for this program is:
> 
> [3517/3660] Linking bin/default/source3/utils/testparm
> runner ['cc', 'source3/utils/testparm.c.41.o', 
> '-o/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/source3/utils/testparm',
>  '-Wl,-Bstatic', '-Wl,-Bdynamic', 
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/source4/heimdal_build',
>  
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/source4/lib/events',
>  
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/lib/tdb_wrap',
>  
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/libcli/security',
>  '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/librpc', 
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/libcli/registry',
>  '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/lib', 
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/lib/dbwrap', 
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/lib/socket', 
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/lib/param', 
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/lib/messaging', 
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/lib/util', 
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/libcli/util',
>  
> '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/lib/replace',
>  '-L/wrkdirs/usr/ports/net/samba413/work/samba-4.13.1/bin/default/source3', 
> '-L/usr/local/lib', '-L/usr/local/lib', '-L/usr/local/lib', 
> '-L/usr/local/lib', '-L/usr/local/lib', '-L/usr/local/lib', 
> '-lpopt-samba3-samba4', '-lsmbconf', '-lreplace-samba4', '-lsamba-errors', 
> '-lcmdline-contexts-samba4', '-lsamba-util', '-lsamba3-util-samba4', 
> '-lmessages-dgm-samba4', '-lsys-rw-samba4', '-lmessages-util-samba4', 
> '-liov-buf-samba4', '-lsamba-hostconfig', '-lsocket-blocking-samba4', 
> '-linterfaces-samba4', '-ldbwrap-samba4', '-ltevent-util', 
> '-lsamba-sockets-samba4', '-lutil-reg-samba4', '-lutil-tdb-samba4', '-lndr', 
> '-ltalloc-report-printf-samba4', '-lserver-id-db-samba4', 
> '-lsamba-cluster-support-samba4', '-lCHARSET3-samba4', 
> '-lsamba-security-samba4', '-lsmbd-shim-samba4', '-lsamba-debug-samba4', 
> '-lgenrand-samba4', 
> '-ltime-basic-samba4', '-lutil-setid-samba4', '-lmsghdr-samba4', 
> '-lserver-role-samba4', '-ltdb-wrap-samba4', '-levents-samba4', '-lndr-nbt', 
> '-lroken-samba4', '-lexecinfo', '-ltevent', '-ltalloc', '-lpthread', 
> '-lutil', '-lunwind-generic', '-lunwind', '-liconv', '-lz', '-ltdb', 
> '-lpopt', '-lgnutls', '-ltalloc', '-fstack-protector-strong', 
> '-L/usr/local/lib', '-pie', '-Wl,-z,relro,-z,now', '-Wl,-no-undefined', 
> '-Wl,--export-dynamic']
> 

libthr is clearly linked too early; it should come after all of its consumers.
Anyway, try this.

diff --git a/lib/libthr/thread/thr_mutex.c b/lib/libthr/thread/thr_mutex.c
index 57984ef6d0e..303386db7fe 100644
--- a/lib/libthr/thread/thr_mutex.c
+++ b/lib/libthr/thread/thr_mutex.c
@@ -384,6 +384,8 @@

Re: mmap and MAP_STACK

2020-10-21 Thread Konstantin Belousov

On Wed, Oct 21, 2020 at 06:18:50PM +0300, Nick Kostirya via freebsd-stable 
wrote:
> On Wed, 21 Oct 2020 17:16:57 +0300
> Konstantin Belousov  wrote:
> 
> > On Wed, Oct 21, 2020 at 04:53:11PM +0300, Nick Kostirya via freebsd-stable 
> > wrote:
> > > Hello.
> > > I have question about mmap.
> > > 
> > > > void *OSMem::AllocateDataArea(size_t &space)
> > > > {
> > > >     // Round up to an integral number of pages.
> > > >     space = (space + pageSize-1) & ~(pageSize-1);
> > > >     int fd = -1; // This value is required by FreeBSD.  Linux doesn't care
> > > >     int flags = MAP_PRIVATE | MAP_ANON;
> > > > #ifdef MAP_STACK
> > > >     if (memUsage == UsageStack) flags |= MAP_STACK; // OpenBSD seems to require this
> > > > #endif
> > > >     void *result = mmap(0, space, PROT_READ|PROT_WRITE, flags, fd, 0);
> > > >     // Convert MAP_FAILED (-1) into NULL
> > > >     if (result == MAP_FAILED)
> > > >         return 0;
> > > >     return result;
> > > > }
> > > 
> > > 
> > > When MAP_STACK is used, "insufficient memory" error occurs.
> > > When MAP_STACK removed, it is all right.
> > > 
> > > Please tell me why.  
> > Show ktrace/kdump output of the mmap(2) without and with MAP_STACK.
> > 
> > Or provide a minimal self-contained C source that demonstrates your
> > issue.
> 
> kdump with MAP_STACK.
> 
>  87183 polyimport CALL  
> mmap(0,0x1000,0x3,0x1402,0x,0,0)
>  87183 polyimport RET   mmap -1 errno 22 Invalid argument
So it is anything but 'insufficient memory' (I suspected ENOMEM).
The EINVAL is there because the default value of the sysctl
security.bsd.stack_guard_page is 1, which means that at least one page of
the stack is reserved as a guard.  The kernel does not allow mapping a
stack that would have no data pages (all pages are guard).

Your mapping request is for one page, and that one page goes to the guard,
so you get EINVAL.  Generally MAP_STACK is magic and requires the caller
to know what it does.

> 
> 
> kdump without MAP_STACK.
> 
>  93712 polyimport CALL  
> mmap(0,0x1000,0x3,0x1002,0x,0,0)
>  93712 polyimport RET   mmap 547053568/0x209b6000
> 
> 
> 

Re: mmap and MAP_STACK

2020-10-21 Thread Konstantin Belousov

On Wed, Oct 21, 2020 at 04:53:11PM +0300, Nick Kostirya via freebsd-stable 
wrote:
> Hello.
> I have question about mmap.
> 
> void *OSMem::AllocateDataArea(size_t &space)
> {
>     // Round up to an integral number of pages.
>     space = (space + pageSize-1) & ~(pageSize-1);
>     int fd = -1; // This value is required by FreeBSD.  Linux doesn't care
>     int flags = MAP_PRIVATE | MAP_ANON;
> #ifdef MAP_STACK
>     if (memUsage == UsageStack) flags |= MAP_STACK; // OpenBSD seems to require this
> #endif
>     void *result = mmap(0, space, PROT_READ|PROT_WRITE, flags, fd, 0);
>     // Convert MAP_FAILED (-1) into NULL
>     if (result == MAP_FAILED)
>         return 0;
>     return result;
> }
> 
> 
> When MAP_STACK is used, "insufficient memory" error occurs.
> When MAP_STACK removed, it is all right.
> 
> Please tell me why.
Show ktrace/kdump output of the mmap(2) without and with MAP_STACK.

Or provide a minimal self-contained C source that demonstrates your
issue.

Re: How to free used Swap-Space? (from errno=8)

2020-09-22 Thread Konstantin Belousov

On Wed, Sep 23, 2020 at 12:03:32AM +0300, Konstantin Belousov wrote:
> On Tue, Sep 22, 2020 at 09:11:49PM +0200, Peter wrote:
> > So what happens then is this:
> > 
> > $ file scc.e
> > scc.e: ELF 32-bit LSB executable, Intel 80386, version 1
> > (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1,
> > for FreeBSD 9.3 (903504), stripped
> > 
> > $ ./scc.e
> > ELF interpreter /libexec/ld-elf.so.1 not found, error 8
> > Abort trap
> > 
> > And this will cost about some (hundred?) kB of swapspace every time it
> > happens. And they do not go away again, neither can the concerned jail
> > do fully die again.
> In what sense does it 'cost'?
> 
> Can you show the exact sequence of commands and outputs that demonstrate
> your point?  What type of filesystem do the binaries live on?
> 
> I want to reproduce it locally.

I suspect that https://reviews.freebsd.org/D26525 should fix it.

Re: How to free used Swap-Space? (from errno=8)

2020-09-22 Thread Konstantin Belousov

On Tue, Sep 22, 2020 at 09:11:49PM +0200, Peter wrote:
> So what happens then is this:
> 
> $ file scc.e
> scc.e: ELF 32-bit LSB executable, Intel 80386, version 1
> (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1,
> for FreeBSD 9.3 (903504), stripped
> 
> $ ./scc.e
> ELF interpreter /libexec/ld-elf.so.1 not found, error 8
> Abort trap
> 
> And this will cost about some (hundred?) kB of swapspace every time it
> happens. And they do not go away again, neither can the concerned jail
> do fully die again.
In what sense does it 'cost'?

Can you show the exact sequence of commands and outputs that demonstrate
your point?  What type of filesystem do the binaries live on?

I want to reproduce it locally.

Re: Commit 364003 causes immediate restart

2020-08-07 Thread Konstantin Belousov

On Fri, Aug 07, 2020 at 03:09:00PM +0200, peter.b...@bsd4all.org wrote:
> Hi,
> 
> After commit 364003 STABLE-12 reboots almost immediately. No error message, 
> not dump. Just a reboot.
> 
> Last working commit 364002.
> 
> Please let me know what is needed - acpidump or something like that.
Why did you not add the committer to Cc:?

Re: Laundry

2020-07-26 Thread Konstantin Belousov

On Sun, Jul 26, 2020 at 01:11:33PM -0700, Doug Hardie wrote:
> I have a production system (12.1-RELEASE-p6) that is showing around 1 GB of 
> Laundry pages.  There are over 6 Gb Inact and 1 Gb free.  I can understand 
> why the system would want to not prioritize laundering those pages as there 
> is plenty of available pages.  However, does that mean that I have about 1 GB 
> of updated files that have not been written back to disk?  If so, then there 
> is a significant issue with power failures and loss of data.
> 
Laundry keeps both file-backed (named) pages and swap-backed (anonymous)
pages. Most likely it means that you have 1G of anonymous dirty
mappings, for instance program data/bss and malloc'ed memory.

Re: svn commit: r362848 - in stable/12/sys: net netinet sys

2020-07-20 Thread Konstantin Belousov

On Tue, Jul 21, 2020 at 07:20:44AM +1000, Peter Jeremy wrote:
> On 2020-Jul-19 14:48:28 +0300, Konstantin Belousov  
> wrote:
> >On Sun, Jul 19, 2020 at 09:21:02PM +1000, Peter Jeremy wrote:
> >> I'm sending this to -stable, rather than the src groups because I
> >> don't believe the problem is the commit itself, rather the commit
> >> has uncovered a latent problem elsewhere.
> >> 
> >> On 2020-Jul-01 18:03:38 +, Michael Tuexen  wrote:
> >> >Author: tuexen
> >> >Date: Wed Jul  1 18:03:38 2020
> >> >New Revision: 362848
> >> >URL: https://svnweb.freebsd.org/changeset/base/362848
> >> >
> >> >Log:
> >> >  MFC r353480: Use event handler in SCTP
> >> 
> >> I have no idea how, but this update breaks booting amd64 for me (r362847
> >> works and this doesn't).  I have a custom kernel with ZFS but no SCTP so I
> >> have no real idea how this could break booting - presumably the
> >> eventhandler change has uncovered a bug somewhere else.
> >> 
> >> The symptoms are that I get:
> >> Mounting from zfs:zroot/ROOT/r363310 failed with error 6; retrying for 3 
> >> more seconds
> >> Mounting from zfs:zroot/ROOT/r363310 failed with error 6
> >> 
> >> (r363310 is where I was trying to update to and I didn't change the BE
> >> name as I was searching for the problem and error 6 is ENXIO).
> >> 
> >> I tried to reproduce the problem with GENERIC but it hangs after
> >> displaying the EFI framebuffer information (I've seen that before and
> >> suspect it is a loader problem but haven't dug into it).
> 
> I've confirmed that particular problem is bug 209821.  I've disabled
> EFI and GENERIC r362848 boots and runs successfully.
Did you mistype the PR number?  The referenced bug talks about a very
early hang, while your report said that the kernel boots up to the point
of mounting root.

> 
> >> Does anyone have any ideas?
> >
> >Did you checked that the physical devices where your ZFS pool is located,
> >are detected, and that kernel messages for their drivers are as usual ?
> >Overall, is there anything strange in the verbose dmesg ?
> 
> There's nothing obviously strange (in particular, I can see the physical
> boot/root disk) but the faulty kernel appears to have moved the msgbuf
> somewhere unexpected so it's not saved across reboots and I'm limited to
> eyeballing the messages via DDB.
> 
> Since GENERIC worked, I did some more experimenting and tracked the
> problem down to a lack of "options ACPI_DMAR" in my kernel config.
> That makes more sense, though I have no idea why it suddenly became
> mandatory for my system.
No, this does not make much sense either, since DMAR is disabled
by default.  Did you enable it?

BTW, you are using stable, right?  There were some code reorganization
commits in HEAD moving DMAR code around, but they were not merged to
stable.

Re: svn commit: r362848 - in stable/12/sys: net netinet sys

2020-07-19 Thread Konstantin Belousov

On Sun, Jul 19, 2020 at 09:21:02PM +1000, Peter Jeremy wrote:
> I'm sending this to -stable, rather than the src groups because I
> don't believe the problem is the commit itself, rather the commit
> has uncovered a latent problem elsewhere.
> 
> On 2020-Jul-01 18:03:38 +, Michael Tuexen  wrote:
> >Author: tuexen
> >Date: Wed Jul  1 18:03:38 2020
> >New Revision: 362848
> >URL: https://svnweb.freebsd.org/changeset/base/362848
> >
> >Log:
> >  MFC r353480: Use event handler in SCTP
> 
> I have no idea how, but this update breaks booting amd64 for me (r362847
> works and this doesn't).  I have a custom kernel with ZFS but no SCTP so I
> have no real idea how this could break booting - presumably the
> eventhandler change has uncovered a bug somewhere else.
> 
> The symptoms are that I get:
> Mounting from zfs:zroot/ROOT/r363310 failed with error 6; retrying for 3 more 
> seconds
> Mounting from zfs:zroot/ROOT/r363310 failed with error 6
> 
> (r363310 is where I was trying to update to and I didn't change the BE
> name as I was searching for the problem and error 6 is ENXIO).
> 
> I tried to reproduce the problem with GENERIC but it hangs after
> displaying the EFI framebuffer information (I've seen that before and
> suspect it is a loader problem but haven't dug into it).
> 
> Does anyone have any ideas?

Did you check that the physical devices where your ZFS pool is located
are detected, and that the kernel messages for their drivers are as usual?
Overall, is there anything strange in the verbose dmesg?

Re: Buildworld and buildkernel with very slow compilation, recently

2020-06-21 Thread Konstantin Belousov

On Sun, Jun 21, 2020 at 02:48:51PM +0200, Ronald Klop wrote:
> Building ports/pkgs is also significantly slower on 13 (new clang) than on 
> 12.1.
> http://thunderx1.nyi.freebsd.org/
> 13 = 140 hours, 12.1 = 103 hours, for roughly the same number of ports.
If you use the stock HEAD GENERIC kernel config and a stock-built libc,
it is expected.  Both the kernel and jemalloc have extensive debugging
turned on in HEAD.

> 
> Regards, Ronald
> 
> From: Michael Grimm 
> Date: 21 June 2020 14:12
> To: FreeBSD-STABLE Mailing List 
> Subject: Buildworld and buildkernel with very slow compilation, recently
> 
> > 
> > 
> > Hi,
> > 
> > I am following FreeBSD 12.1-STABLE.
> > 
> > Clang has been upgraded to version 10.0.0 on May, 1st, and ever since that 
> > time, I do observe a dramatic increase in compilation times of building 
> > world, kernel and ports. I didn't benchmark the exact times, but 
> > compilation times are at least increased by a factor of 1.5. Nothing has 
> > changed of the last month besides upgrading 12.1-Stable every other week.
> > 
> > Has anyone else been bitten by this?
> > 
> > Regards,
> > Michael
> > 
> > 
> > 
> > 

Re: Upgrading to 12.1S 362003 - a few issues

2020-06-13 Thread Konstantin Belousov

On Sat, Jun 13, 2020 at 12:34:46PM +1000, Dewayne Geraghty wrote:
> Hi Konstantin,
> I did try ktrace/kdump but kdump complained of "data too short".  Using
> your suggestion about LD_DEBUG nearly caused me to fall off my chair.  I
> think this is most relevant:
> 
> # setenv LD_DEBUG 1
> # ktrace -f /tmp/sq3.kt /usr/local/sbin/squid start
> /libexec/ld-elf.so.1 is initialized, base address = 0x8aded000
> RTLD dynamic = 0x8ae0f6c0
> RTLD pltgot  = 0
> initializing thread locks
> _rtld_thread_init: done
> processing main program's program header
> note osrel 1201517
> note fctl0 0
> note crt_no_init
> AT_EXECPATH 0xdfe0 /usr/bin/ktrace
> obj_main path /usr/bin/ktrace
> Filling in DT_DEBUG entry
> Ignoring d_tag 1879048186 = 0x6ffa
> /usr/bin/ktrace valid_hash_sysv 1 valid_hash_gnu 0 dynsymcount 30
> lm_init("(null)")
> loading LD_PRELOAD libraries
> loading needed objects
>  Searching for "libc.so.7"
> search_library_pathfds('libc.so.7', '(null)', fdp)
> lm_find("/usr/bin/ktrace", "/lib")
> lmp_find("/usr/bin/ktrace")
> lmp_find("$DEFAULT$")
>   Trying "/lib/libc.so.7"
>   Opened "/lib/libc.so.7", fd 3
> loading "/lib/libc.so.7"
> Ignoring d_tag 1879048186 = 0x6ffa
> /lib/libc.so.7 valid_hash_sysv 1 valid_hash_gnu 1 dynsymcount 3126
>   0x6d20f000 .. 0x6d3eafff: /lib/libc.so.7
> checking for required versions
> initializing initial thread local storage offsets
> relocating "/usr/bin/ktrace"
> reloc_jmpslot: *0x4b177c04 = 0x6d39bac0
> reloc_jmpslot: *0x4b177c08 = 0x6d3b6480
> ... [ Lots of these ]
> reloc_jmpslot: *0x9d4859fc = 0x9d46e560
> reloc_jmpslot: *0x9d485a00 = 0x9d47a428
> reloc_jmpslot: *0x9d485a04 = 0x9d47a448
> relocating "/usr/local/lib/heimdal/libasn1.so.8"
> relocating "/usr/local/lib/heimdal/libwind.so.0"
> relocating "/usr/local/lib/heimdal/libheimbase.so.1"
> relocating "/usr/local/lib/heimdal/libhx509.so.5"
> relocating "/usr/local/lib/heimdal/libhcrypto.so.4"
> relocating "/usr/local/lib/heimdal/libheimsqlite.so.0"
> relocating "/usr/local/lib/heimdal/libcom_err.so.1"
> relocating "/usr/local/lib/heimdal/libroken.so.18"
> relocating "/usr/local/lib/libintl.so.8"
> relocating "/usr/local/lib/heimdal/libheimntlm.so.0"
> doing copy relocations
> initializing initial thread local storage
> initializing key program variables
> "__progname": *0x73be38b8 <-- 0xdd44
> "environ": *0x73cbbc08 <-- 0xfffc69cc
> "__elf_aux_vector": *0x9d4a6a5c <-- 0xfffc6a4c
> resolving ifuncs
> reloc_jmpslot: *0x9d484d88 = 0x9d352cc0
> reloc_jmpslot: *0x9d484d8c = 0x9d352c70
> reloc_jmpslot: *0x9cb16e04 = 0x9d352c70
> reloc_jmpslot: *0x9cb16e0c = 0x9d352cc0
> calling init function for /lib/libc.so.7 at 0x9d47c440
> calling init function for /lib/libc.so.7 at 0x9d40fc20
> calling init function for /lib/libc.so.7 at 0x9d3f0f40
> calling init function for /lib/libthr.so.3 at 0x9cb15210
> calling init function for /lib/libthr.so.3 at 0x9cb10cb0
> _rtld_thread_init: done
> calling init function for /lib/libgcc_s.so.1 at 0x9bfad540
> calling init function for /lib/libgcc_s.so.1 at 0x9bfa3de0
> calling init function for /lib/libcxxrt.so.1 at 0x9b698480
> calling init function for /usr/lib/libc++.so.1 at 0x9ab43d90
> calling init function for /usr/lib/libc++.so.1 at 0x9aae3e40
> calling init function for /lib/libcrypt.so.5 at 0x965d3210
> calling init function for /usr/local/lib/libcrypto.so.11 at 0x96e9f420
> calling init function for /usr/local/lib/heimdal/libroken.so.18 at
> 0xa32b6f60
> calling init function for /usr/local/lib/libintl.so.8 at 0xa23cb104
> calling init function for /usr/local/lib/heimdal/libheimbase.so.1 at
> 0x9ed15de0
> calling init function for /usr/local/lib/heimdal/libcom_err.so.1 at
> 0x9fec3768
> calling init function for /usr/local/lib/heimdal/libasn1.so.8 at 0x9e23f054
> calling init function for /usr/local/lib/heimdal/libhcrypto.so.4 at
> 0x9fb5641c
> calling init function for /usr/local/lib/heimdal/libheimsqlite.so.0 at
> 0xa1b9603c
> calling init function for /usr/local/lib/heimdal/libwind.so.0 at 0x9f77a990
> calling init function for /usr/local/lib/heimdal/libhx509.so.5 at 0xa0c4e3b4
> calling init function for /usr/local/lib/heimdal/libkrb5.so.26 at 0x99527188
> calling init function for /usr/local/lib/heimdal/libheimntlm.so.0 at
> 0xa498c814
> calling init function for /usr/local/lib/heimdal/libgssapi.so.3 at
> 0x9a45bfd0
> calling init function for /usr/local/lib/libpcre.so.1 at 0x97550ca8
> calling init function for /usr/local/lib/libpcreposix.so.0 at 0x9728e518
> calling init function for /lib/libm.so.5 at 0x97139f10
> calling init function for /usr/local/lib/libssl.so.11 at 0x98579078
> calling init function for /usr/lib/libregex.so.1 at 0x95e8f0a0
> calling init function for /usr/lib/librt.so.1 at 0x953ca240
> loading filtees
> enforcing main obj relro
> transferring control to program entry point = 0x73757360
So the app code started executing.

> "atexit" in "squid" ==> 0x9d457ac0 in "libc.so.7"
> reloc_jmpslot: *0x73be32b4 = 0x9d457ac0
> "_ZSt13set_terminatePFvvE"

Re: Upgrading to 12.1S 362003 - a few issues

2020-06-12 Thread Konstantin Belousov

On Sat, Jun 13, 2020 at 10:24:49AM +1000, Dewayne Geraghty wrote:
> After upgrading to 12.1Stable as of June 11:
> 1) squid - fails with segmentation fault, ldd "Cannot load PIE binary"
> 2) gcc9 - suffers a cc1 internal compiler error
> 3) pkg-static - issues "failed" messages, unable to package or install
> 
> Environment Xeon E3, ufs2 only, previously running FreeBSD 12.1 dated
> 1st May (from kernel.old).
> Prior to the upgrade all ports were rebuilt without issue, but NOT
> installed as they were a fall-back, in the event that clang 10 caused
> issues (the concern).  There are multiple jails on this system, both
> amd64 and i386 - some for building, testing and production use.  One of
> the production i386 jails runs squid, unchanged since Sept 2019.
> 
> /etc/src.conf contains
> WITH_PIE=YES
> WITH_BIND_NOW=YES
> 
> Most of our 1400+ ports are built and run with relro, now, pie and where
> possible with noexecstack &/or no-common.  These functioned in an ASLR
> environment.  (ASLR is only disabled during builds (gcc9 complains), or
> when there's a problem, now).
> 
> Note: NONE of the ports were rebuilt after the upgrade. However as part
> of resolution, beep and squid were rebuilt.
> 
> === Sequence of things ===
> 
> Upgrade performed.  System rebooted without incident to
>   FreeBSD 12.1-STABLE #0 r362003M: Thu Jun 11 23:07:00 AEST 2020  i386
> hqdev-amd64-smp-vga 1201517 1201517
> but some port/application failures:
> 
> Problem 1
> -
> 
> i386 jail demonstrated:
> 
> # /usr/local/etc/rc.d/squid start
> Starting squid.
> Segmentation fault
A segmentation fault means that the image was activated and the kernel handed
control to usermode.  Try to debug it a bit, for instance use ktrace and
LD_DEBUG=1 to see how far things progressed.

> 
> # ldd /usr/local/sbin/squid
> /usr/local/sbin/squid:
> ldd: /usr/local/sbin/squid: Cannot load PIE binary /usr/local/sbin/squid
> as DSO
> /usr/local/sbin/squid: exit status 1
This is cosmetic: the problem is ldd(1) mis-detecting the PIE binary as a
DSO.  Before some changes in rtld this was harmless, but since rtld recently
started refusing to dlopen(3) PIE binaries, the method that ldd uses for
DSOs no longer works.

It will take some time to fix ldd, because it needs to start parsing the
dynamic segment for DT_FLAGS_1 (the DF_1_PIE flag).

Re: vt [was: Re: [Bug 235564] INDEX.keymaps for vt contains "from-" keymaps but the files are missing]

2020-03-09 Thread Konstantin Belousov

On Mon, Mar 09, 2020 at 10:37:52AM -0400, Ed Maste wrote:
> On Sun, 8 Mar 2020 at 17:13, Andy Farkas  wrote:
> >
> > Is anyone actually working on the vt(4) driver?  Will it ever
> > become feature-parity with the old sc(4) driver?
> 
> Yes, and yes. What specific missing functionality are you affected by?
> 
> > I've noticed some weird things happening on my console recently...
> > like psychedelicly-colour-coded kernel messages.. far out, man.
> 
> I don't think this is related to vt(4), but it would help if you
> provided more details (including the version you're running).
Take a look at r334530.

Re: Disabling speculative execution mitigations

2019-12-06 Thread Konstantin Belousov

On Fri, Dec 06, 2019 at 03:51:04PM +1030, O'Connor, Daniel wrote:
> Hi,
> I am trying to track down a performance drop with the ASPEED xorg video 
> driver between FreeBSD 11 and 12 (I'm not expecting miracles from it but it 
> was basically unusable..)
> 
> I wondered if some of the speculative execution mitigations could be causing 
> the problem so I did some digging and found these..
> 
> vm.pmap.pti="0"# Disable page table isolation
> hw.ibrs_disable="1"# Disable Indirect Branch Restricted Speculation
This line enables IBRS.

> hw.mds_disable="0" # Disable Microarchitectural Data Sampling flush
> hw.vmm.vmx="1" # Don't flush RSB on vmexit (presumably only affects 
> bhyve etc)
I have no idea what this line should configure.

> hw.lazy_fpu_switch="1" # Lazily flush FPU
> 
> Does anyone know of any others?
Did you read security(7) (on HEAD)?

> 
> I have 2 systems with the same motherboard (Supermicro X11SSH-F), one is 
> older and runs FreeBSD 11 (and had an older BIOS) and the newer runs FreeBSD 
> 12.
> 
> FWIW on FreeBSD 11 the performance (measured by a subset of x11perf 
> benchmarks) went down 40% after updating to the latest BIOS (2.2a). 
> Unfortunately on FreeBSD 12 rolling back to the original BIOS (2.2) did not 
> improve performance.
> 
> --
> Daniel O'Connor
> "The nice thing about standards is that there
> are so many of them to choose from."
>  -- Andrew Tanenbaum
> 
> 

Re: How can kill(-1, 0) return EPERM?

2019-12-01 Thread Konstantin Belousov

On Mon, Dec 02, 2019 at 02:11:14AM +0300, Dmitry Marakasov wrote:
> * Konstantin Belousov (kostik...@gmail.com) wrote:
> 
> > > > > > I'm helping to investigate some userspace issue [1], where kill(-1, 
> > > > > > SIGKILL)
> > > > > > fails with EPERM. I've managed to isolate this case in a small 
> > > > > > program:
> > > > > > 
> > > > > > 
> > > > > > ```
> > > > > > #include <err.h>
> > > > > > #include <errno.h>
> > > > > > #include <signal.h>
> > > > > > #include <stdio.h>
> > > > > > #include <string.h>
> > > > > > #include <unistd.h>
> > > > > > 
> > > > > > int main() {
> > > > > >     if (setuid(66) == -1)  // uucp, just for the test
> > > > > >         err(1, "setuid");
> > > > > > 
> > > > > >     int res = kill(-1, 0);  // <- fails with EPERM
> > > > > >     fprintf(stderr, "kill(-1, 0) result=%d, errno=%s\n", res,
> > > > > >         strerror(errno));
> > > > > > 
> > > > > >     return 0;
> > > > > > }
> > > > > > ```
> > > > > > 
> > > > > > when run from root on 12.1 kill call fails with EPERM. However I 
> > > > > > cannot
> > > > > > comprehend what it is caused by and how it's even possible: kill(2) 
> > > > > > manpage
> > > > > > says that with pid=-1 kill should only send (and in this case of 
> > > > > > sig=0,
> > > > > > /not/ send) signals to the processes belonging to the current uid, 
> > > > > > so there
> > > > > > should be no permission problems. I've also looked into the kernel 
> > > > > > code
> > > > > > (sys_kill, killpg1), and it matches to what manpage says, I see no 
> > > > > > way
> > > > > > for it to return EPERM: sys_kill() should fall through to the 
> > > > > > switch, call
> > > > > > killpg1() with all=1 and killpg1() if(all) branch may only set 
> > > > > > `ret` to
> > > > > > either 0 or ESRCH. Am I missing something, or is there a problem 
> > > > > > somewhere?
> > > > > 
> > > > > It looks like I have misread the `else if' path of this code.
> > > > > 
> > > > > if (all) {
> > > > >     /*
> > > > >      * broadcast
> > > > >      */
> > > > >     sx_slock(&allproc_lock);
> > > > >     FOREACH_PROC_IN_SYSTEM(p) {
> > > > >         if (p->p_pid <= 1 || p->p_flag & P_SYSTEM ||
> > > > >             p == td->td_proc || p->p_state == PRS_NEW) {
> > > > >             continue;
> > > > >         }
> > > > >         PROC_LOCK(p);
> > > > >         err = p_cansignal(td, p, sig);
> > > > >         if (err == 0) {
> > > > >             if (sig)
> > > > >                 pksignal(p, sig, ksi);
> > > > >             ret = err;
> > > > >         }
> > > > >         else if (ret == ESRCH)
> > > > >             ret = err;
> > > > >         PROC_UNLOCK(p);
> > > > >     }
> > > > >     sx_sunlock(&allproc_lock);
> > > > > } ...
> > > > > 
> > > > > so it's clear now where EPERM comes from. However it looks like the
> > > > > behavior contradicts the manpage - there are no signs of check that
> > > > > the signalled process has the same uid as the caller.
> > > > 
> > > > I am not sure what you mean by 'signs of check'.  Look at p_cansignal()
> > > > and cr_cansignal() implementation.
> > > 
> > > I've meant that according to the manpage
> > > 
> > >  If pid is -1:
> > >  If the user has super-user privileges, the signal is sent to 
> > > all
> > >  processes excluding system processes (with P_SYSTEM flag 
> > > set),
> > >  process with ID 1 (usually init(8)), and the process sending 
> > > the
> > >  signal.  If the user is not the super user, the signal is 
> > > sent to
> > >

Re: How can kill(-1, 0) return EPERM?

2019-12-01 Thread Konstantin Belousov

On Sun, Dec 01, 2019 at 03:24:11AM +0300, Dmitry Marakasov wrote:
> * Konstantin Belousov (kostik...@gmail.com) wrote:
> 
> > > > I'm helping to investigate some userspace issue [1], where kill(-1, 
> > > > SIGKILL)
> > > > fails with EPERM. I've managed to isolate this case in a small program:
> > > > 
> > > > 
> > > > ```
> > > > #include <err.h>
> > > > #include <errno.h>
> > > > #include <signal.h>
> > > > #include <stdio.h>
> > > > #include <string.h>
> > > > #include <unistd.h>
> > > > 
> > > > int main() {
> > > >     if (setuid(66) == -1)  // uucp, just for the test
> > > >         err(1, "setuid");
> > > > 
> > > >     int res = kill(-1, 0);  // <- fails with EPERM
> > > >     fprintf(stderr, "kill(-1, 0) result=%d, errno=%s\n", res,
> > > >         strerror(errno));
> > > > 
> > > >     return 0;
> > > > }
> > > > ```
> > > > 
> > > > when run from root on 12.1 kill call fails with EPERM. However I cannot
> > > > comprehend what it is caused by and how it's even possible: kill(2) 
> > > > manpage
> > > > says that with pid=-1 kill should only send (and in this case of sig=0,
> > > > /not/ send) signals to the processes belonging to the current uid, so 
> > > > there
> > > > should be no permission problems. I've also looked into the kernel code
> > > > (sys_kill, killpg1), and it matches to what manpage says, I see no way
> > > > for it to return EPERM: sys_kill() should fall through to the switch, 
> > > > call
> > > > killpg1() with all=1 and killpg1() if(all) branch may only set `ret` to
> > > > either 0 or ESRCH. Am I missing something, or is there a problem 
> > > > somewhere?
> > > 
> > > It looks like I have misread the `else if' path of this code.
> > > 
> > > if (all) {
> > >     /*
> > >      * broadcast
> > >      */
> > >     sx_slock(&allproc_lock);
> > >     FOREACH_PROC_IN_SYSTEM(p) {
> > >         if (p->p_pid <= 1 || p->p_flag & P_SYSTEM ||
> > >             p == td->td_proc || p->p_state == PRS_NEW) {
> > >             continue;
> > >         }
> > >         PROC_LOCK(p);
> > >         err = p_cansignal(td, p, sig);
> > >         if (err == 0) {
> > >             if (sig)
> > >                 pksignal(p, sig, ksi);
> > >             ret = err;
> > >         }
> > >         else if (ret == ESRCH)
> > >             ret = err;
> > >         PROC_UNLOCK(p);
> > >     }
> > >     sx_sunlock(&allproc_lock);
> > > } ...
> > > 
> > > so it's clear now where EPERM comes from. However it looks like the
> > > behavior contradicts the manpage - there are no signs of check that
> > > the signalled process has the same uid as the caller.
> > 
> > I am not sure what you mean by 'signs of check'.  Look at p_cansignal()
> > and cr_cansignal() implementation.
> 
> I've meant that according to the manpage
> 
>  If pid is -1:
>  If the user has super-user privileges, the signal is sent to all
>  processes excluding system processes (with P_SYSTEM flag set),
>  process with ID 1 (usually init(8)), and the process sending the
>  signal.  If the user is not the super user, the signal is sent to
>  all processes with the same uid as the user excluding the process
>  sending the signal.  No error is returned if any process could be
>  signaled.
> 
> IMO there should be an additional check in this condition:
> 
>  if (p->p_pid <= 1 || p->p_flag & P_SYSTEM ||
>  p == td->td_proc || p->p_state == PRS_NEW) {
>  continue;
>  }
> 
> E.g. something like
> 
>  if (p->p_pid <= 1 || p->p_flag & P_SYSTEM ||
>  p == td->td_proc || p->p_state == PRS_NEW ||
>  (td->td_ucred->cr_ruid != 0 &&
>   p->p_ucred->cr_ruid != td->td_ucred->cr_ruid)) {
>  continue;
>  }
> 
> e.g. it should not even attempt to signal processes with other uids.
Why ?  You are trying to outguess p_cansignal(), which can deny the
action for many more reasons, so you would still get EPERM, e.g. if the
target is suid.  Or p_cansignal() might allow sending the signal
even for mismatched uids; again, look at its code.

I would guess that your complaint is really about a different aspect
of it.  If you look at the POSIX description of the EPERM error from
kill(2) (really kill(3)), it says
[EPERM]  The process does not have permission to send the signal to
 any receiving process.
In other words, we should not return EPERM if we signalled at least one
of the processes.

Is this the problem ?


Re: How can kill(-1, 0) return EPERM?

2019-11-29 Thread Konstantin Belousov

On Fri, Nov 29, 2019 at 07:45:09PM +0300, Dmitry Marakasov wrote:
> * Dmitry Marakasov (amd...@amdmi3.ru) wrote:
> 
> > I'm helping to investigate some userspace issue [1], where kill(-1, SIGKILL)
> > fails with EPERM. I've managed to isolate this case in a small program:
> > 
> > 
> > ```
> > #include <err.h>
> > #include <errno.h>
> > #include <signal.h>
> > #include <stdio.h>
> > #include <string.h>
> > #include <unistd.h>
> > 
> > int main() {
> >     if (setuid(66) == -1)  // uucp, just for the test
> >         err(1, "setuid");
> > 
> >     int res = kill(-1, 0);  // <- fails with EPERM
> >     fprintf(stderr, "kill(-1, 0) result=%d, errno=%s\n", res,
> >         strerror(errno));
> > 
> >     return 0;
> > }
> > ```
> > 
> > when run from root on 12.1 kill call fails with EPERM. However I cannot
> > comprehend what it is caused by and how it's even possible: kill(2) manpage
> > says that with pid=-1 kill should only send (and in this case of sig=0,
> > /not/ send) signals to the processes belonging to the current uid, so there
> > should be no permission problems. I've also looked into the kernel code
> > (sys_kill, killpg1), and it matches to what manpage says, I see no way
> > for it to return EPERM: sys_kill() should fall through to the switch, call
> > killpg1() with all=1 and killpg1() if(all) branch may only set `ret` to
> > either 0 or ESRCH. Am I missing something, or is there a problem somewhere?
> 
> It looks like I have misread the `else if' path of this code.
> 
> if (all) {
>     /*
>      * broadcast
>      */
>     sx_slock(&allproc_lock);
>     FOREACH_PROC_IN_SYSTEM(p) {
>         if (p->p_pid <= 1 || p->p_flag & P_SYSTEM ||
>             p == td->td_proc || p->p_state == PRS_NEW) {
>             continue;
>         }
>         PROC_LOCK(p);
>         err = p_cansignal(td, p, sig);
>         if (err == 0) {
>             if (sig)
>                 pksignal(p, sig, ksi);
>             ret = err;
>         }
>         else if (ret == ESRCH)
>             ret = err;
>         PROC_UNLOCK(p);
>     }
>     sx_sunlock(&allproc_lock);
> } ...
> 
> so it's clear now where EPERM comes from. However it looks like the
> behavior contradicts the manpage - there are no signs of check that
> the signalled process has the same uid as the caller.

I am not sure what you mean by 'signs of check'.  Look at p_cansignal()
and cr_cansignal() implementation.

Re: Error building stable/12 (amd64) at r355087

2019-11-25 Thread Konstantin Belousov

On Mon, Nov 25, 2019 at 03:58:10AM -0800, David Wolfskill wrote:
> This is during a source-based update from r355048 to r355087, during
> "stage 4.3: building everything" (using META_MODE); meta file reads:
> 
> # Meta data file 
> /common/S3/obj/usr/src/amd64.amd64/usr.sbin/camdd/camdd.o.meta
> CMD cc -target x86_64-unknown-freebsd12.1 
> --sysroot=/common/S3/obj/usr/src/amd64.amd64/tmp 
> -B/common/S3/obj/usr/src/amd64.amd64/tmp/usr/bin  -O2 -pipe   -std=gnu99 
> -fstack-protector-strong -Wsystem-headers -Werror -Wall -Wno-format-y2k -W 
> -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes 
> -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow 
> -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs 
> -Wredundant-decls -Wold-style-definition -Wno-pointer-sign 
> -Wmissing-variable-declarations -Wno-empty-body -Wno-string-plus-int 
> -Wno-unused-const-variable  -Qunused-arguments  -c 
> /usr/src/usr.sbin/camdd/camdd.c -o camdd.o
> CMD 
> CWD /common/S3/obj/usr/src/amd64.amd64/usr.sbin/camdd
> TARGET camdd.o
> -- command output --
> In file included from /usr/src/usr.sbin/camdd/camdd.c:54:
> In file included from 
> /common/S3/obj/usr/src/amd64.amd64/tmp/usr/include/machine/bus.h:6:
> In file included from 
> /common/S3/obj/usr/src/amd64.amd64/tmp/usr/include/x86/bus.h:1043:
> In file included from 
> /common/S3/obj/usr/src/amd64.amd64/tmp/usr/include/machine/bus_dma.h:34:
> /common/S3/obj/usr/src/amd64.amd64/tmp/usr/include/x86/bus_dma.h:182:1: 
> error: unknown type name 'bool'
> bool bus_dma_dmar_set_buswide(device_t dev);
> ^
> /common/S3/obj/usr/src/amd64.amd64/tmp/usr/include/x86/bus_dma.h:182:31: 
> error: unknown type name 'device_t'
> bool bus_dma_dmar_set_buswide(device_t dev);
>   ^
> 2 errors generated.
> 
> *** Error code 1

I hope that this is fixed by r355089.  I did not track down how HEAD
was immune to the problem.

Re: lib not found and found at the same time ?

2019-11-14 Thread Konstantin Belousov

On Thu, Nov 14, 2019 at 08:46:13AM -0500, mike tancsa wrote:
> 
> On 11/13/2019 5:25 PM, Konstantin Belousov wrote:
> > On Wed, Nov 13, 2019 at 04:48:40PM -0500, mike tancsa wrote:
> >> I was trying to upgrade (failed) and then re-install the
> >> samba410-4.10.10 port on a RELENG12 box.  One of the Samba libs shows
> >> some output I dont understand on ldd
> >>
> >> ldd /usr/local/lib/nss_wins.so.1
> >> /usr/local/lib/nss_wins.so.1:
> >>     libwbclient.so.0 => /usr/local/lib/samba4/libwbclient.so.0
> >> (0x801003000)
> >>     libwinbind-client-samba4.so => not found (0)
> >>     libreplace-samba4.so => not found (0)
> >>     libcrypt.so.5 => /lib/libcrypt.so.5 (0x80066b000)
> >>     libc.so.7 => /lib/libc.so.7 (0x80024a000)
> >>     libwinbind-client-samba4.so =>
> >> /usr/local/lib/samba4/private/libwinbind-client-samba4.so (0x801213000)
> >>     libreplace-samba4.so =>
> >> /usr/local/lib/samba4/private/libreplace-samba4.so (0x801417000)
> >>
> >>
> >> There are 2 libs it says it cannot find, but then a few lines below it
> >> says it found them ?
> > First instance (not found) is probably the direct dependency, which is
> > probably not found because nss_wins.so does not have rpath recorded.
> > Then, I guess, some other library also depends on 
> > libwinbind-client-samba4.so,
> > but this library has rpath.
> >
> > You can check this with readelf, look for DT_NEEDED and DT_RPATH*
> > dynamic entries.
> 
> Thanks!
> 
>  readelf -d nss_wins.so.1
> 
> Dynamic section at offset 0x1d20 contains 32 entries:
>   Tag    Type  Name/Value
>  0x0001 NEEDED   Shared library: [libwbclient.so.0]
>  0x0001 NEEDED   Shared library:
> [libwinbind-client-samba4.so]
>  0x0001 NEEDED   Shared library:
> [libreplace-samba4.so]
>  0x0001 NEEDED   Shared library: [libcrypt.so.5]
>  0x0001 NEEDED   Shared library: [libc.so.7]
>  0x000e SONAME   Library soname: [nss_wins.so.1]
>  0x000f RPATH    Library rpath: [/usr/local/lib]
>  0x001d RUNPATH  Library runpath: [/usr/local/lib]
> 
> Looking at other libs, they have the settings
> 
>  0x000f RPATH    Library rpath:
> [/usr/local/lib/samba4/private:/usr/local/lib]
>  0x001d RUNPATH  Library runpath:
> [/usr/local/lib/samba4/private:/usr/local/lib]
> 
> What is the best way to fix this ? It seems I can do a quick libmap
> entry and it seems to correct it
> 
> 
> [/usr/local/lib/nss_wins.so.1]
> libwinbind-client-samba4.so 
> /usr/local/lib/samba4/private/libwinbind-client-samba4.so
> libreplace-samba4.so /usr/local/lib/samba4/private/libreplace-samba4.so
> 
>  ldd nss_wins.so.1
> nss_wins.so.1:
>     libwbclient.so.0 => /usr/local/lib/samba4/libwbclient.so.0
> (0x801003000)
>     libwinbind-client-samba4.so =>
> /usr/local/lib/samba4/private/libwinbind-client-samba4.so (0x801213000)
>     libreplace-samba4.so =>
> /usr/local/lib/samba4/private/libreplace-samba4.so (0x801417000)
>     libcrypt.so.5 => /lib/libcrypt.so.5 (0x80066b000)
>     libc.so.7 => /lib/libc.so.7 (0x80024a000)
> 
> This is something that needs to be fixed in the port at build time ?

Why is it a problem ?  If the library is loaded from a binary that is already
linked against the needed libraries, it would just work.

Otherwise yes, it is a port build issue, and it must not be papered over
with libmap or LD_LIBRARY_PATH.

Re: lib not found and found at the same time ?

2019-11-13 Thread Konstantin Belousov

On Wed, Nov 13, 2019 at 04:48:40PM -0500, mike tancsa wrote:
> I was trying to upgrade (failed) and then re-install the
> samba410-4.10.10 port on a RELENG12 box.  One of the Samba libs shows
> some output I dont understand on ldd
> 
> ldd /usr/local/lib/nss_wins.so.1
> /usr/local/lib/nss_wins.so.1:
>     libwbclient.so.0 => /usr/local/lib/samba4/libwbclient.so.0
> (0x801003000)
>     libwinbind-client-samba4.so => not found (0)
>     libreplace-samba4.so => not found (0)
>     libcrypt.so.5 => /lib/libcrypt.so.5 (0x80066b000)
>     libc.so.7 => /lib/libc.so.7 (0x80024a000)
>     libwinbind-client-samba4.so =>
> /usr/local/lib/samba4/private/libwinbind-client-samba4.so (0x801213000)
>     libreplace-samba4.so =>
> /usr/local/lib/samba4/private/libreplace-samba4.so (0x801417000)
> 
> 
> There are 2 libs it says it cannot find, but then a few lines below it
> says it found them ?
First instance (not found) is probably the direct dependency, which is
probably not found because nss_wins.so does not have rpath recorded.
Then, I guess, some other library also depends on libwinbind-client-samba4.so,
but this library has rpath.

You can check this with readelf, look for DT_NEEDED and DT_RPATH*
dynamic entries.

> 
> ldd -av /usr/local/lib/nss_wins.so.1
> /usr/local/lib/nss_wins.so.1:
>     libwbclient.so.0 => /usr/local/lib/samba4/libwbclient.so.0
> (0x801003000)
>     libwinbind-client-samba4.so => not found (0)
>     libreplace-samba4.so => not found (0)
>     libcrypt.so.5 => /lib/libcrypt.so.5 (0x800665000)
>     libc.so.7 => /lib/libc.so.7 (0x80024a000)
> /usr/local/lib/samba4/libwbclient.so.0:
>     libwinbind-client-samba4.so =>
> /usr/local/lib/samba4/private/libwinbind-client-samba4.so (0x801213000)
>     libreplace-samba4.so =>
> /usr/local/lib/samba4/private/libreplace-samba4.so (0x801417000)
>     libcrypt.so.5 => /lib/libcrypt.so.5 (0x800665000)
>     libc.so.7 => /lib/libc.so.7 (0x80024a000)
> /lib/libcrypt.so.5:
>     libc.so.7 => /lib/libc.so.7 (0x80024a000)
> /usr/local/lib/samba4/private/libwinbind-client-samba4.so:
>     libreplace-samba4.so =>
> /usr/local/lib/samba4/private/libreplace-samba4.so (0x801417000)
>     libcrypt.so.5 => /lib/libcrypt.so.5 (0x800665000)
>     libc.so.7 => /lib/libc.so.7 (0x80024a000)
> /usr/local/lib/samba4/private/libreplace-samba4.so:
>     libcrypt.so.5 => /lib/libcrypt.so.5 (0x800665000)
>     libc.so.7 => /lib/libc.so.7 (0x80024a000)
> 
>     ---Mike
> 
> 

Re: 12.1 release symbol incompatibility?

2019-09-19 Thread Konstantin Belousov

On Wed, Sep 18, 2019 at 03:05:34PM -0600, Sean Bruno wrote:
> If one installs 12.1 and tries to run a 12.0 release package (postgresql
> server for instance), it fails due to a missing symbol:
> 
>  # service postgresql start
> /usr/local/bin/pg_ctl: Undefined symbol "stat@FBSD_1.5"
> 
> I think this is a bug as we are supposed to support this kind of thing,
> right?

I do not think it is a bug in the base system.  You seem to have a
stable/11 libc installed.  How you got that libc is a different question.

The stat@FBSD_1.5 symbol was added during the 12-CURRENT lifecycle as part
of the ino64 work, and was there at the branch point for 12.  Of course
nobody has removed it from libc since then.

Re: ntpd doesn't like ASLR on stable/12 post-r350672

2019-09-10 Thread Konstantin Belousov

On Tue, Sep 10, 2019 at 10:50:33AM -0600, Ian Lepore wrote:
> On Sun, 2019-08-25 at 15:03 +0300, Konstantin Belousov wrote:
> > On Sun, Aug 25, 2019 at 12:40:22AM +0200, Trond Endrestøl wrote:
> > > On Sun, 25 Aug 2019 01:28+0300, Konstantin Belousov wrote:
> > > 
> > > > On Sun, Aug 25, 2019 at 12:19:43AM +0200, Trond Endrestøl wrote:
> > > > > On Sat, 24 Aug 2019 23:41+0300, Konstantin Belousov wrote:
> > > > > > > I tried changing command="/usr/sbin/${name}" to 
> > > > > > > command="/usr/bin/proccontrol -m aslr -s disable 
> > > > > > > /usr/sbin/${name}" in 
> > > > > > > /etc/rc.d/ntpd, but that didn't go well.
> > > > > > 
> > > > > > If you set kern.elf64.aslr.stack_gap to zero, does it help ?
> > > > > 
> > > > > That helped. Thank you again.
> > > > 
> > > > Can you verify is ntpd sets new rlimit(RLIMIT_STACK) for the main 
> > > > thread,
> > > > and if yes, what this new limit is ?
> > > 
> > > (gdb)
> > > > 5265        if (-1 == setrlimit(RLIMIT_STACK, &rl)) {
> > > (gdb) print rl
> > > $1 = {rlim_cur = 204800, rlim_max = 536870912}
> > 
> > So they set the stack limit to 200K, am I right ?  I suspect they do
> > that because ntpd wires entire process address space, so 512M blows off
> > all limits on wiring.
> > 
> > I do not have a good idea how to make this behaviour compatible with
> > the gap.  Might be we can change the gap sizing parameter to KBs instead
> > of percentage, and set the defaults in 64KB range.
> > 
> > > 
> > > > aslr.stack_gap is the percentage for the gap on that stack, and since
> > > > default size of the main stack limit is quite large 512M, even 3%
> > > > (default gap upper limit) are whole 15M. If the new limit is less than
> > > > 15M, there is a likely probability that only the gap is left after the
> > > > rlimit(2) call, leaving no space for the program frames.
> > > > 
> > > > At least this looks like a nice theory.
> 
> So is the problem here that before ntpd is running and has the chance
> to call setrlimit(), aslr has already created a large stack gap?  If
> so, it seems to me that aslr and setrlimit(RLIMIT_STACK, ...) are never
> going to work right together.  Even if the default stack gap were much
> smaller, code using RLIMIT_STACK is going to end up with a stack
> smaller than it asked for because the gap it has no way of knowing
> about uses up some part (or all of) the limited space.
Sort of, yes.  There is a UI problem with the control for the gap,
and I am not sure how to fix it.

> 
> If the default gap were 64K or less, things would be much more likely
> to work accidentally (and we might never have noticed this situtation),
> but they still wouldn't be working correctly.  Is it possible for the
> code on the kernel side to add the requested limit to the gap size to
> generate a result that gives the caller the usable stack size they
> asked for?
I do not see a way to account for the gap in the RLIMIT_STACK adjustment.
It should be handled in cooperation with the user code.

When a program adjusts RLIMIT_STACK in such a radical way, e.g. setting
it to 64k, it must know a lot about the execution environment.

Re: ntpd doesn't like ASLR on stable/12 post-r350672

2019-08-25 Thread Konstantin Belousov

On Sun, Aug 25, 2019 at 12:40:22AM +0200, Trond Endrestøl wrote:
> On Sun, 25 Aug 2019 01:28+0300, Konstantin Belousov wrote:
> 
> > On Sun, Aug 25, 2019 at 12:19:43AM +0200, Trond Endrestøl wrote:
> > > On Sat, 24 Aug 2019 23:41+0300, Konstantin Belousov wrote:
> > > > > I tried changing command="/usr/sbin/${name}" to 
> > > > > command="/usr/bin/proccontrol -m aslr -s disable /usr/sbin/${name}" 
> > > > > in 
> > > > > /etc/rc.d/ntpd, but that didn't go well.
> > > > 
> > > > If you set kern.elf64.aslr.stack_gap to zero, does it help ?
> > > 
> > > That helped. Thank you again.
> > 
> > Can you verify is ntpd sets new rlimit(RLIMIT_STACK) for the main thread,
> > and if yes, what this new limit is ?
> 
> (gdb)
> 5265        if (-1 == setrlimit(RLIMIT_STACK, &rl)) {
> (gdb) print rl
> $1 = {rlim_cur = 204800, rlim_max = 536870912}
So they set the stack limit to 200K, am I right ?  I suspect they do
that because ntpd wires its entire process address space, so the default
512M stack would blow through all limits on wiring.

I do not have a good idea how to make this behaviour compatible with
the gap.  Maybe we can change the gap sizing parameter to KBs instead
of a percentage, and set the default in the 64KB range.
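
To put numbers on the two sizing schemes, here is a rough sketch in Python.
It is illustrative only and is not kernel code: usable_stack and the
constants are hypothetical names, and it assumes the worst case in which
the gap reaches its upper limit.

```python
# Sketch only: compares a percentage-based stack gap with a fixed-size one.
# Names are hypothetical; the figures are the ones quoted in this thread.

KIB = 1024
MIB = 1024 * KIB


def usable_stack(limit_bytes, gap_bytes):
    """Bytes left for program frames once the gap is taken out of the limit."""
    return max(limit_bytes - gap_bytes, 0)


ntpd_limit = 204800                  # rlim_cur printed by gdb above
percent_gap = 512 * MIB * 3 // 100   # 3% of the default 512M stack limit
fixed_gap = 64 * KIB                 # a default in the suggested 64KB range

print(ntpd_limit // KIB)                            # 200
print(usable_stack(ntpd_limit, percent_gap))        # 0
print(usable_stack(ntpd_limit, fixed_gap) // KIB)   # 136
```

Under these assumptions the percentage-based gap can swallow the whole
200K limit, while a 64KB gap would still leave 136K for frames.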

> 
> > aslr.stack_gap is the percentage of the stack reserved for the gap, and since
> > the default size of the main stack limit is quite large, 512M, even 3%
> > (the default gap upper limit) is a whole 15M. If the new limit is less than
> > 15M, it is quite likely that only the gap is left after the
> > setrlimit(2) call, leaving no space for the program frames.
> > 
> > At least this looks like a nice theory.
> 
> -- 
> Trond.


Re: ntpd doesn't like ASLR on stable/12 post-r350672

2019-08-24 Thread Konstantin Belousov

On Sun, Aug 25, 2019 at 12:19:43AM +0200, Trond Endrestøl wrote:
> On Sat, 24 Aug 2019 23:41+0300, Konstantin Belousov wrote:
> > > I tried changing command="/usr/sbin/${name}" to 
> > > command="/usr/bin/proccontrol -m aslr -s disable /usr/sbin/${name}" in 
> > > /etc/rc.d/ntpd, but that didn't go well.
> > 
> > If you set kern.elf64.aslr.stack_gap to zero, does it help ?
> 
> That helped. Thank you again.

Can you verify if ntpd sets a new rlimit(RLIMIT_STACK) for the main thread,
and if yes, what this new limit is ?

aslr.stack_gap is the percentage of the stack reserved for the gap, and since
the default size of the main stack limit is quite large, 512M, even 3%
(the default gap upper limit) is a whole 15M. If the new limit is less than
15M, it is quite likely that only the gap is left after the
setrlimit(2) call, leaving no space for the program frames.

At least this looks like a nice theory.
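
The arithmetic behind the theory can be checked with a few lines of
Python.  This is a hedged sketch, not the kernel's algorithm: max_gap and
limit_survives_gap are hypothetical helpers, and the gap is assumed to hit
its upper limit.

```python
# Sketch only: upper bound of a percentage-based stack gap, and whether a
# lowered stack limit would still exceed it.  Helper names are hypothetical.

MIB = 1024 * 1024


def max_gap(stack_limit, percent):
    """Largest gap, in bytes, for a given stack limit and gap percentage."""
    return stack_limit * percent // 100


def limit_survives_gap(new_limit, stack_limit, percent):
    """True if the lowered limit is still larger than the largest gap."""
    return new_limit > max_gap(stack_limit, percent)


default_limit = 512 * MIB

print(max_gap(default_limit, 3) // MIB)                  # 15
print(limit_survives_gap(16 * MIB, default_limit, 3))    # True
print(limit_survives_gap(8 * MIB, default_limit, 3))     # False
```

Any new limit below roughly 15M can end up entirely inside the gap in this
model, which matches the symptom described.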

Re: ntpd doesn't like ASLR on stable/12 post-r350672

2019-08-24 Thread Konstantin Belousov

On Sat, Aug 24, 2019 at 10:04:49PM +0200, Trond Endrestøl wrote:
> Hi,
> 
> I'm running stable/12 with ASLR enabled in /etc/sysctl.conf:
> 
> kern.elf64.aslr.enable=1
> kern.elf64.aslr.pie_enable=1
> kern.elf32.aslr.enable=1
> kern.elf32.aslr.pie_enable=1
> 
> After upgrading to anything after r350672, now at r351450, ntpd 
> refuses to start at boot.
> 
> Aug 24 21:25:42  HOSTNAME ntpd[5618]: ntpd 4.2.8p12-a (1): 
> Starting
> Aug 24 21:25:43  HOSTNAME kernel: [406] pid 5619 (ntpd), jid 0, 
> uid 123: exited on signal 11
> 
> Disabling ASLR, kern.elf64.aslr.enable=0, before starting ntpd 
> manually is a workaround, but this is not viable in the long run.
Why ?

> 
> I tried changing command="/usr/sbin/${name}" to 
> command="/usr/bin/proccontrol -m aslr -s disable /usr/sbin/${name}" in 
> /etc/rc.d/ntpd, but that didn't go well.
If you set kern.elf64.aslr.stack_gap to zero, does it help ?

> 
> Running ntpd through gdb while ASLR was enabled, I narrowed it down to
> /usr/src/contrib/ntp/ntpd/ntpd.c:1001
> 
>   ntp_rlimit(RLIMIT_STACK, DFLT_RLIMIT_STACK * 4096, 4096, "4k");
> 
> which calls /usr/src/contrib/ntp/ntpd/ntp_config.c:5211 and proceeds 
> to /usr/src/contrib/ntp/ntpd/ntp_config.c:5254
> 
>   if (-1 == getrlimit(RLIMIT_STACK, &rl)) {
> 
> Single stepping from this point gave me:
> 
> 
> 
> (gdb) s
> _thr_rtld_set_flag (mask=1) at /usr/src/lib/libthr/thread/thr_rtld.c:171
> 171 {
> (gdb)
> 176 return (0);
> (gdb)
> _thr_rtld_rlock_acquire (lock=0x80180d200) at 
> /usr/src/lib/libthr/thread/thr_rtld.c:115
> 115 {
> (gdb)
> 120 curthread = _get_curthread();
> (gdb)
> _get_curthread () at /usr/src/lib/libthr/arch/amd64/include/pthread_md.h:97
> 97  return (TCB_GET64(tcb_thread));
> (gdb)
> _thr_rtld_rlock_acquire (lock=0x80180d200) at 
> /usr/src/lib/libthr/thread/thr_rtld.c:121
> 121 SAVE_ERRNO();
> (gdb)
> 124 THR_CRITICAL_ENTER(curthread);
> (gdb)
> _thr_rwlock_tryrdlock (rwlock=<optimized out>, flags=0) at 
> /usr/src/lib/libthr/thread/thr_umtx.h:192
> 192 (rwlock->rw_flags & URWLOCK_PREFER_READER) != 0)
> (gdb)
> 191 if ((flags & URWLOCK_PREFER_READER) != 0 ||
> (gdb)
> 197 while (!(state & wrflags)) {
> (gdb)
> 201 if (atomic_cmpset_acq_32(&rwlock->rw_state, state, 
> state + 1))
> (gdb)
> atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=1) at 
> /usr/obj/usr/src/amd64.amd64/tmp/usr/include/machine/atomic.h:220
> 220 ATOMIC_CMPSET(int);
> (gdb)
> _thr_rwlock_tryrdlock (rwlock=<optimized out>, flags=0) at 
> /usr/src/lib/libthr/thread/thr_umtx.h:201
> 201 if (atomic_cmpset_acq_32(&rwlock->rw_state, state, 
> state + 1))
> (gdb)
> _thr_rtld_rlock_acquire (lock=0x80180d200) at 
> /usr/src/lib/libthr/thread/thr_rtld.c:127
> 127 curthread->rdlock_count++;
> (gdb)
> 128 RESTORE_ERRNO();
> (gdb)
> 129 }
> (gdb)
> _thr_rtld_clr_flag (mask=1) at /usr/src/lib/libthr/thread/thr_rtld.c:181
> 181 {
> (gdb)
> 182 return (0);
> (gdb)
> _thr_rtld_lock_release (lock=0x80180d200) at 
> /usr/src/lib/libthr/thread/thr_rtld.c:150
> 150 {
> (gdb)
> _get_curthread () at /usr/src/lib/libthr/arch/amd64/include/pthread_md.h:97
> 97  return (TCB_GET64(tcb_thread));
> (gdb)
> _thr_rtld_lock_release (lock=0x80180d200) at 
> /usr/src/lib/libthr/thread/thr_rtld.c:157
> 157 SAVE_ERRNO();
> (gdb)
> 160 state = l->lock.rw_state;
> (gdb)
> 161 if (_thr_rwlock_unlock(&l->lock) == 0) {
> (gdb)
> _thr_rwlock_unlock (rwlock=0x80180d200) at 
> /usr/src/lib/libthr/thread/thr_umtx.h:249
> 249 state = rwlock->rw_state;
> (gdb)
> 250 if ((state & URWLOCK_WRITE_OWNER) != 0) {
> (gdb)
> 256 if 
> (__predict_false(URWLOCK_READER_COUNT(state) == 0))
> (gdb)
> 260 URWLOCK_READER_COUNT(state) == 1)) 
> {
> (gdb)
> 259 URWLOCK_READ_WAITERS)) != 0 &&
> (gdb)
> 262 state, state - 1))
> (gdb)
> 261 if (atomic_cmpset_rel_32(&rwlock->rw_state,
> (gdb)
> atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=0) at 
> /usr/obj/usr/src/amd64.amd64/tmp/usr/include/machine/atomic.h:220
> 220 ATOMIC_CMPSET(int);
> (gdb)
> _thr_rwlock_unlock (rwlock=0x80180d200) at 
> /usr/src/lib/libthr/thread/thr_umtx.h:261
> 261 if (atomic_cmpset_rel_32(&rwlock->rw_state,
> (gdb)
> _thr_rtld_lock_release (lock=<optimized out>) at 
> /usr/src/lib/libthr/thread/thr_rtld.c:162
> 162 if ((state & URWLOCK_WRITE_OWNER) == 0)
> (gdb)
> 163 curthread->rdlock_count--;
> (gdb)
> 164 THR_CRITICAL_LEAVE(curthread);
> (gdb)
> _thr_ast (curthread=0x80864b000) at /usr/src/lib/libthr/thread/thr_sig.c:271
> 271 if (!THR_IN_CRITICAL(curthread)) {
> (gdb)
> 272 check_deferred_signal(curthread);
> (gdb)
>

Re: GENERIC crash 11.3-PRERELEASE (i386)

2019-07-10 Thread Konstantin Belousov

On Wed, Jul 10, 2019 at 03:02:40PM +0200, Schuendehuette, Matthias (LDA IT PLM) 
wrote:
> Sorry, wrong link... :-(
> 
> See the verbose boot messages here...
> 
> https://www.dropbox.com/sh/buzxekimo2h2r67/AADpUvLndhm2SHa5t9s9Ckksa?dl=0
> 
> ...in file "Boot_verbose.jpg"

Can you try the following patch ?

Index: sys/x86/x86/cpu_machdep.c
===
--- sys/x86/x86/cpu_machdep.c   (revision 349890)
+++ sys/x86/x86/cpu_machdep.c   (working copy)
@@ -953,7 +953,6 @@
  * architectural state except possibly %rflags. Also, it is always
  * called with interrupts disabled.
  */
-void (*mds_handler)(void);
 void mds_handler_void(void);
 void mds_handler_verw(void);
 void mds_handler_ivb(void);
@@ -962,6 +961,7 @@
 void mds_handler_skl_avx(void);
 void mds_handler_skl_avx512(void);
 void mds_handler_silvermont(void);
+void (*mds_handler)(void) = mds_handler_void;
 
 static int
 sysctl_hw_mds_disable_state_handler(SYSCTL_HANDLER_ARGS)

Re: GENERIC crash 11.3-PRERELEASE (i386)

2019-07-05 Thread Konstantin Belousov

On Fri, Jul 05, 2019 at 11:12:29AM +0000, Schuendehuette, Matthias wrote:
> Hello Konstantin,
> 
> ***
> Obviously Outlook has destroyed my last reply - here again:
> ***
> 
> I did what you suggested: deleted the content of /usr/src and 'svn co'
> the 11-STABLE sources again.
> 
> I investigated the three
> source files mentioned below and confirmed the 'svn diff' results:
> 
> "hw_mds_recalculate();" has been removed from
> 
>   sys/amd64/amd64/initcpu.c   and
>   sys/i386/i386/initcpu.c
> 
> 
> and:
> 
> "static void
>  hw_mds_recalculate_boot(void *arg __unused)
>  {
> 
>hw_mds_recalculate();
>  }
>  SYSINIT(mds_recalc, SI_SUB_SMP, SI_ORDER_ANY, hw_mds_recalculate_boot, 
> NULL);"
> 
> has been inserted into 'sys/x86/x86/cpu_machdep.c'
> 
> 
> That's still the case for 'r349719'. Also remains true that a kernel of
> 'r349719' crashes as described earlier.
Ok, show me
1. svn st and svn info output of the checkout you use
2. Whole kernel messages with verbose boot enabled, for your machine and
   the kernel which fails to boot.
> 
> 
> 
> With best regards and have a nice weekend
> 
> Matthias Schuendehuette
> 
> 
> 
> 

Re: GENERIC crash 11.3-PRERELEASE (i386)

2019-07-03 Thread Konstantin Belousov

On Wed, Jul 03, 2019 at 08:42:21AM +0000, Schuendehuette, Matthias wrote:
> Hello Konstantin,
> 
> I did some research regarding the kernel crash with the following results:
> 
> 1) Last working kernel is:
> 
>   "FreeBSD 11.3-BETA1 (BLNN719X) #8 r348361: Wed Jul  3 09:30:17 CEST 
> 2019"
> 
> 1a) DDB-Backtrace of the crashing kernel r348362 can be seen on "Boot_BT.jpg"
>   in the dropbox directory 
>   
> "https://www.dropbox.com/sh/buzxekimo2h2r67/AADpUvLndhm2SHa5t9s9Ckksa?dl=0"
> 
> 
> 2) Source code revision is:
> 
> root@blnn719x - /usr/src
> 2056 # svn info
> Path: .
> Working Copy Root Path: /usr/src
> URL: https://svn.freebsd.org/base/stable/11
> Relative URL: ^/stable/11
> Repository Root: https://svn.freebsd.org/base
> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> Revision: 348361
> Node Kind: directory
> Schedule: normal
> Last Changed Author: jkim
> Last Changed Rev: 348343
> Last Changed Date: 2019-05-29 02:00:52 +0200 (Wed, 29 May 2019)
> 
> 
> 3) Diff to next revision:
> 
> root@blnn719x - /usr/src
> 2057 # svn diff -r 348362
> Index: sys/amd64/amd64/initcpu.c
> ===
> --- sys/amd64/amd64/initcpu.c   (revision 348362)
> +++ sys/amd64/amd64/initcpu.c   (working copy)
> @@ -247,6 +247,7 @@
> }
> hw_ibrs_recalculate();
> hw_ssb_recalculate(false);
> +   hw_mds_recalculate();
> switch (cpu_vendor_id) {
> case CPU_VENDOR_AMD:
> init_amd();
> Index: sys/i386/i386/initcpu.c
> ===
> --- sys/i386/i386/initcpu.c (revision 348362)
> +++ sys/i386/i386/initcpu.c (working copy)
> @@ -769,6 +769,7 @@
> elf32_nxstack = 1;
> }
>  #endif
> +   hw_mds_recalculate();
> if ((amd_feature & AMDID_RDTSCP) != 0 ||
> (cpu_stdext_feature2 & CPUID_STDEXT2_RDPID) != 0)
> wrmsr(MSR_TSC_AUX, PCPU_GET(cpuid));
> Index: sys/x86/x86/cpu_machdep.c
> ===
> --- sys/x86/x86/cpu_machdep.c   (revision 348362)
> +++ sys/x86/x86/cpu_machdep.c   (working copy)
> @@ -1118,14 +1118,6 @@
> }
>  }
> 
> -static void
> -hw_mds_recalculate_boot(void *arg __unused)
> -{
> -
> -   hw_mds_recalculate();
> -}
> -SYSINIT(mds_recalc, SI_SUB_SMP, SI_ORDER_ANY, hw_mds_recalculate_boot, NULL);
> -
>  static int
>  sysctl_mds_disable_handler(SYSCTL_HANDLER_ARGS)
>  {
> Index: .
> ===
> --- .   (revision 348362)
> +++ .   (working copy)
> 
> Property changes on: .
> ___
> Modified: svn:mergeinfo
> ## -0,1 +0,0 ##
>Reverse-merged /head:r348075
> 
> 
> 
> Somewhere here is the problem...
Definitely, there is some problem, but I doubt that it is due to the
revision in svn. The diff above is the reverse of stable/11
r348362, which was committed on 2019-05-29. Indeed, a missing (or
reverted) r348362 would cause exactly the symptoms you described, with
failing AP startup.

I have no idea why you have the change reverted, with mergeinfo, in
your sources.  Clean up and retry with a pristine tree.

> 
> 
> 
> 
> with best regards
> Matthias Schündehütte
> 
> Siemens AG
> Large Drives Applications
> Information Technology
> Information Technology Product Lifecycle Management
> LDA IT PLM
> Nonnendammallee 72
> 13629 Berlin, Deutschland
> Tel.: +49 30 386-29957
> Mobil: +49 170 8162912
> mailto:matthias.schuendehue...@siemens.com
> 
> www.siemens.com/ingenuityforlife
> 

Re: GENERIC crash 11.3-PRERELEASE (i386)

2019-06-27 Thread Konstantin Belousov

On Thu, Jun 27, 2019 at 07:11:40AM +0000, Schuendehuette, Matthias wrote:
> Hi,
> 
> the missing attachments can be found here now:
> 
> https://www.dropbox.com/sh/buzxekimo2h2r67/AADpUvLndhm2SHa5t9s9Ckksa?dl=0
> 
So your AP (Application Processor) seems to get a fault, most likely in the
trap handler.  There were absolutely no changes in stable/11 in the
area of SMP startup for quite a long time.

To get anywhere, you should perhaps add ddb to your kernel configuration
and get the backtrace.  The backtrace would be long, I am only interested
in the first several frames before the faults go into recursion.

But, since the kernel from 1 month earlier worked, and there were no changes,
this might indicate either failing hardware (your machine is quite old, it
is a Core2 Xeon, am I right ?) or problems with your build environment.

Re: ps -J0 broken?

2019-06-02 Thread Konstantin Belousov

On Sun, Jun 02, 2019 at 03:31:08PM +0200, Stefan Hegnauer wrote:
> 
> 
> On 02.06.2019 15:05, Konstantin Belousov wrote:
> > On Sun, Jun 02, 2019 at 02:30:49PM +0200, Stefan Hegnauer wrote:
> >> Hi,
> >>
> >> after a recent full update to 12.0-STABLE r348382 it seems that '/bin/ps
> >> -J 0' is broken: 'ps: Invalid jail id: 0'.
> >> It did work on stable for the last couple years prior to this update
> >> (last update without this error was about 5 weeks ago), and should still
> >> work according to ps(1):
> >>  -J  Display information about processes which match the specified
> >>      jail IDs.  This may be either the jid or name of the jail.  Use
> >>      -J 0 to display only host processes.  This flag implies -x by
> >>      default.
> >>
> >> My system runs several jails with JID's currently in the range 80-100.
> >> The source code of ps did not change for the last 7 month as far as I
> >> can tell. A fresh 'make clean & make & make install' of just ps did not
> >> help either, which was not really surprising to me.
> >> Any pointers where to look further?
> > Is your libjail up to date ?  Do you have r348297 ?
> Thanks for the quick reply. Seems so:
> 
>     # grep FBSDID /usr/src/lib/libjail/jail_getid.c
>     __FBSDID("$FreeBSD: stable/12/lib/libjail/jail_getid.c 348297
> 2019-05-27 02:18:33Z kevans $");
>     #
>     # ls -l /lib/libjail*
>     -r--r--r--  1 root  wheel  31520 May 30 09:12 /lib/libjail.so.1
> 
> My full update included a 'svnlite up /usr/src' followed by make
> buildworld & make kernel and later make installworld as per
> /usr/src/UPDATING. To the very letter, as I always do just to be safe. I
> do however use WITH_META_MODE="YES" in /etc/src-env.conf to speed up
> things. Anything else to look for?
Yes, rebuild without metamode, and remove your /usr/obj first.

Re: ps -J0 broken?

2019-06-02 Thread Konstantin Belousov

On Sun, Jun 02, 2019 at 02:30:49PM +0200, Stefan Hegnauer wrote:
> Hi,
> 
> after a recent full update to 12.0-STABLE r348382 it seems that '/bin/ps
> -J 0' is broken: 'ps: Invalid jail id: 0'.
> It did work on stable for the last couple years prior to this update
> (last update without this error was about 5 weeks ago), and should still
> work according to ps(1):
>  -J  Display information about processes which match the specified
>      jail IDs.  This may be either the jid or name of the jail.  Use
>      -J 0 to display only host processes.  This flag implies -x by
>      default.
> 
> My system runs several jails with JID's currently in the range 80-100.
> The source code of ps did not change for the last 7 month as far as I
> can tell. A fresh 'make clean & make & make install' of just ps did not
> help either, which was not really surprising to me.
> Any pointers where to look further?
Is your libjail up to date ?  Do you have r348297 ?

Re: efirtc causing panic (was Re: Panic booting 12-RC2 on amd64)

2019-05-31 Thread Konstantin Belousov

On Fri, May 31, 2019 at 04:19:57PM +0200, Jan Martin Mikkelsen wrote:
> Hi,
> 
> Christian has pointed me at this 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233534 which he raised 
> after his email. The workaround was to boot with “efi.rt.disabled=1”. 
> 
> I took a closer look at what is going on. The problem is that the EFI 
> rt_gettime call is faulting, and the fault is handled in efirt_support.S and 
> a failure is reported. These messages are in the kernel output:
> 
> kernel trap 12 with interrupts disabled
> kernel trap 12 with interrupts disabled
> EFI rt_gettime call faulted, error 14
> efirtc0: cannot read EFI realtime clock, error 14
> 
> So far, so good. The problem is that later in startup the 
> “smp_targeted_tlb_shootdown: interrupts disabled” panic occurs if SMP is 
> enabled. With SMP disabled this does not occur and the system runs.
> 
> I’m not sure whether this is a BIOS problem (seems likely) or something that 
> could be handled after dealing with the fault in efirt_support.S.
> 
> While looking I found the code below that looks wrong in efi_enter(), but 
> that is not the problem in this case.
> 
> Just adding this to the archive in case someone else looks more closely later.

Try this.  Only compile-time tested.

diff --git a/sys/amd64/amd64/efirt_support.S b/sys/amd64/amd64/efirt_support.S
index cd578eddcfb..b54b13b01fe 100644
--- a/sys/amd64/amd64/efirt_support.S
+++ b/sys/amd64/amd64/efirt_support.S
@@ -47,6 +47,9 @@ ENTRY(efi_rt_arch_call)
 	movq	%r13, EC_R13(%rdi)
 	movq	%r14, EC_R14(%rdi)
 	movq	%r15, EC_R15(%rdi)
+	pushfq
+	popq	%rax
+	movq	%rax, EC_RFLAGS(%rdi)
 	movq	PCPU(CURTHREAD), %rax
 	movq	%rdi, TD_MD+MD_EFIRT_TMP(%rax)
 	movq	PCPU(CURPCB), %rsi
@@ -98,6 +101,8 @@ efi_rt_arch_call_tail:
 	movq	EC_RBP(%rdi), %rbp
 	movq	EC_RSP(%rdi), %rsp
 	movq	EC_RBX(%rdi), %rbx
+	pushq	EC_RFLAGS(%rdi)
+	popfq
 
 	popq	%rbp
 	ret
diff --git a/sys/amd64/amd64/genassym.c b/sys/amd64/amd64/genassym.c
index de3969734a1..2e81b823262 100644
--- a/sys/amd64/amd64/genassym.c
+++ b/sys/amd64/amd64/genassym.c
@@ -272,3 +272,4 @@ ASSYM(EC_R12, offsetof(struct efirt_callinfo, ec_r12));
 ASSYM(EC_R13, offsetof(struct efirt_callinfo, ec_r13));
 ASSYM(EC_R14, offsetof(struct efirt_callinfo, ec_r14));
 ASSYM(EC_R15, offsetof(struct efirt_callinfo, ec_r15));
+ASSYM(EC_RFLAGS, offsetof(struct efirt_callinfo, ec_rflags));
diff --git a/sys/amd64/include/efi.h b/sys/amd64/include/efi.h
index 082223792ac..e630a338c17 100644
--- a/sys/amd64/include/efi.h
+++ b/sys/amd64/include/efi.h
@@ -72,6 +72,7 @@ struct efirt_callinfo {
 	register_t	ec_r13;
 	register_t	ec_r14;
 	register_t	ec_r15;
+	register_t	ec_rflags;
 };
 
 #endif /* __AMD64_INCLUDE_EFI_H_ */

Re: After 12.x upgrade: mysqld RET _umtx_op -1 errno 45 Operation not supported

2019-04-15 Thread Konstantin Belousov

On Mon, Apr 15, 2019 at 08:28:39AM +0000, Marcin Cieslak wrote:
> On Mon, 15 Apr 2019, Konstantin Belousov wrote:
> 
> > On Mon, Apr 15, 2019 at 08:01:27AM +0000, Marcin Cieslak wrote:
> > > 
> > >  50766 mysqld   CALL  _umtx_op(0x966eaa0,UMTX_OP_RESERVED0,0x18cab,0,0)
> > >  50766 mysqld   RET   _umtx_op -1 errno 45 Operation not supported
> 
> > I believe these ops were removed at r263318.
> 
> That was quick, thank you!
> Is there any way this could be fixed on the library level?
You probably can write a LD_PRELOAD'ed library which would intercept
_umtx_op and emulate the call.

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: After 12.x upgrade: mysqld RET _umtx_op -1 errno 45 Operation not supported

2019-04-15 Thread Konstantin Belousov

On Mon, Apr 15, 2019 at 08:01:27AM +0000, Marcin Cieslak wrote:
> Hello,
> 
> for archival purposes I am running an ancient (__FreeBSD_version 602100) 
> jail with mysqld inside.
> 
> This was working fine when the jail host was running 10.4. After upgrade to 
> 12.x
> (r345375) mysqld process starts but it hangs before PID is created and the
> socket opened.
> 
> Quick ktrace on the process shows this coming up all the time:
> 
>  50766 mysqld   CALL  _umtx_op(0x966eaa0,UMTX_OP_RESERVED0,0x18cab,0,0)
>  50766 mysqld   RET   _umtx_op -1 errno 45 Operation not supported
> 
> What has changed, can I somehow bring it up again? (I understood that
> we keep compatibility with very old userland).

I believe these ops were removed at r263318.

Re: hw.vga.acpi_ignore_no_vga=1 for installation media

2019-03-18 Thread Konstantin Belousov

On Mon, Mar 18, 2019 at 05:09:31AM +0700, Eugene Grosbein wrote:
> 18.03.2019 0:34, Konstantin Belousov wrote:
> 
> > Can anybody provide an example of machine where the flag is set but VGA
> > works ?  For me, it is set on headless NUC when there is no monitor
> > attached, and then BIOS does not configure framebuffer at all.
> 
> http://freebsd.1045724.x6.nabble.com/vt-4-related-hang-of-11-2-td6299125.html
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230172
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229235
All of them are about Silvermont/Airmont Atoms, which probably share the
reference Intel BIOS code.

As I noted above, the BIOS on my machine is somewhat smarter: it reports
NO_VGA only if the display was not connected at boot.

> 
> > > So the proposal is about reversing the set of broken machines, but only
> > in installer ?  In other words, if it worked for installer, the installed
> > system would be broken (again) ?
> 
> VGA-based installation session won't even start unless this is fixed.
> 
> It should be easy to make installer generate the knob for target machine
> if installer sees wrong ACPI flag with working VGA hardware.
Until the installer generates such a knob, it is out of the question to make
the config of the kernel booted from the installation media different
from the config of the installed system.

That said, did anybody consider ignoring the NO_VGA FACP flag on Silvermonts
only ?  Or even better, gathering SMBIOS identifications for affected BIOSes
and ignoring the flag for them ?

Re: hw.vga.acpi_ignore_no_vga=1 for installation media

2019-03-17 Thread Konstantin Belousov

On Sun, Mar 17, 2019 at 10:10:45AM -0600, Warner Losh wrote:
> I generally like this idea... But two caveats...
> 
> First, we'd need to update the docs so that folks doing serial installs can
> unset it Though serial installs are a weird beast
> Second, if it's really needed, we should have the installer generate it.
> alas, only vt can tell us that, but it should be easy to add a sysctl to it
> that says that it has done video by ignoring the absence of the vga node...
It is not about a VGA node (what is that ?).
It is about ignoring the FACP flag IAPC_BOOT_ARCH={NO_VGA}, and there are
machines which actually break if VGA hardware is accessed even though
the flag is set.
Can anybody provide an example of a machine where the flag is set but VGA
works ?  For me, it is set on a headless NUC when there is no monitor
attached, and then the BIOS does not configure the framebuffer at all.
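
For readers unfamiliar with the flag, the decision under discussion can be
sketched as below.  This is an illustration in Python, not the vt(4) code:
the bit positions follow the IAPC_BOOT_ARCH field of the ACPI FADT as I
understand it, while the constant and function names are made up here, and
ignore_no_vga stands in for the hw.vga.acpi_ignore_no_vga tunable.

```python
# Sketch only: how a NO_VGA bit in IAPC_BOOT_ARCH and an "ignore" knob
# could combine.  Names are hypothetical; this is not the vt(4) source.

LEGACY_DEVICES = 1 << 0   # legacy devices present
HAS_8042 = 1 << 1         # 8042 keyboard controller present
NO_VGA = 1 << 2           # firmware says VGA hardware must not be probed


def vga_probe_allowed(iapc_boot_arch, ignore_no_vga=False):
    """Return True if the console driver would touch VGA hardware."""
    if ignore_no_vga:
        # The knob overrides whatever the firmware reports.
        return True
    return (iapc_boot_arch & NO_VGA) == 0


print(vga_probe_allowed(LEGACY_DEVICES | HAS_8042))        # True
print(vga_probe_allowed(NO_VGA))                           # False
print(vga_probe_allowed(NO_VGA, ignore_no_vga=True))       # True
```

The last case is the risky one: with the knob set, a machine that reports
NO_VGA truthfully gets its VGA registers accessed anyway.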

So the proposal is about reversing the set of broken machines, but only
in the installer ?  In other words, if it worked for the installer, the
installed system would be broken (again) ?

> 
> Warner
> 
> On Sun, Mar 17, 2019 at 6:58 AM Leon Christopher Dietrich <
> dorali...@chaotikum.org> wrote:
> 
> > Sounds like a solid idea.
> >
> > A lot of systems out there lack a proper ACPI description for VGA and it
> > would definitely make the installation on such a system much easier.
> >
> > As far as I can tell it doesn't seem to break other things and even low
> > power systems without VGA (like a pcengines apu2) don't seem to suffer.
What does apu2 report in FACP flags ?  Do
acpidump -dt | grep IAPC_BOOT_ARCH

> >
> > On 17.03.19 13:00, freebsd-stable-requ...@freebsd.org wrote:
> > > Date: Sun, 17 Mar 2019 02:59:12 +0700
> > > From: Eugene Grosbein 
> > > To: FreeBSD stable 
> > > Subject: hw.vga.acpi_ignore_no_vga=1 for installation media
> > > Message-ID: <912fc95d-5a5e-012b-7385-0f43f50dc...@grosbein.net>
> > > Content-Type: text/plain; charset=koi8-r
> > >
> > > Hi!
> > >
> > > Since 11.2-RELEASE, default console driver vt(4) checks ACPI table for
> > presence of VGA in the system.
> > > It does not initialize console (no input, no output) if ACPI states
> > there is no VGA adapter.
> > >
> > > There are PRs describing many cases when VGA is present but ACPI lies
> > > and we have a regression compared with 11.1 and earlier:
> > > FreeBSD cannot be installed interactively onto such a system, leaving
> > aside serial console.
> > >
> > > vt(4) has loader knob to restore pre-11.2 behaviour and ignore ACPI:
> > >
> > > hw.vga.acpi_ignore_no_vga=1
> > >
> > > Should we add this unconditionally to the installation media designed
> > for interactive VGA-based installation?
> > >
> > >
> > > --
> > >
> >
> >

Re: FreeBSD 12.0 RELEASE i386 can not build a kernel?

2019-02-28 Thread Konstantin Belousov

On Thu, Feb 28, 2019 at 12:49:25AM -0800, Rodney W. Grimes wrote:
> > -- Start of PGP signed section.
> > > On 28 Feb 2019, at 00:37, Rodney W. Grimes 
> > >  wrote:
> > > > 
> > > > config CUSTOM
> > > > Kernel build directory is ../compile/CUSTOM
> > > > Don't forget to do ``make cleandepend && make depend''
> > > > fb-bld-120-i386.dnsmgr.net:root {200}# cd ../compile/CUSTOM
> > > > fb-bld-120-i386.dnsmgr.net:root {201}# (make cleandepend && make depend 
> > > > && make -j4 && make install) >
> > > > fb-bld-120-i386.dnsmgr.net:root {202}# more make.OUT
> > > > make: "../../../conf/../../../conf/kern.pre.mk" line 127: 
> > > > amd64/arm64/i386 kernel requires linker ifunc support
> > > 
> > > After ifunc support was introduced, you have to run at least
> > > "make kernel-toolchain" before "make buildkernel", or otherwise just run
> > > "make buildworld" first.  That will build the linker which supports the
> > > required functionality.
> > 
> > This is the -RELEASE, why is the release not built with the
> > proper toolchain in place?  This is not some upgrade or anything
> > odd, download 12.0-RELEASE i386 iso, install it with sources,
> > try to build a kernel.
> > 
> > I am running your suggested make kernel-toolchain now
> > to see if that fixes the problem (it should not, or
> > if it does we have a major issue with our release
> > building procedures.)
> 
> Sadly I have confirmed that "make kernel-toolchain" does in fact
> fix the above error.
> 
> Now the obvious question: why isn't the toolchain as shipped
> already properly built?
> 
> This is a stock FreeBSD-12.0-RELEASE-i386-disc1.iso install,
> with stock sources from the iso.  I was simply configuring
> a custom kernel, I should not need to build a toolchain
> to build a kernel when nothing has changed from the
> RELEASE, it should already be the correct toolchain.

Let me guess: you run 'make' in sys/i386/compile/YOUR_KERNEL?
Then you need to use 'LD=ld.lld make'.  The proper way, running
'make buildkernel' from the top-level src directory, handles it
automatically.

Re: stable/12 broken?

2019-02-22 Thread Konstantin Belousov

On Fri, Feb 22, 2019 at 07:53:57AM +0100, Antoine Brodin wrote:
> On Fri, Feb 22, 2019 at 7:39 AM Antoine Brodin  wrote:
> > Hi,
> >
> > For your information,  the stable/12 branch seems broken, at least on
> > i386, there is a segmentation fault when trying to run binaries and 0
> > package can be produced.
> > The regression happened between
> > SVN Revision: Jail stable/12 -> 344262
> > and
> > SVN Revision: Jail stable/12 -> 344454
> 
> The only relevant commit seems to be:
> 
> Author: kib
> Date: Thu Feb 21 12:13:27 2019
> New Revision: 344436
> URL: https://svnweb.freebsd.org/changeset/base/344436
> 
> Log:
>   MFC r344120:
>   Unify i386 and amd64 getcontextx.c, and use ifuncs while there.

Thank you, Antoine.
The commit was reverted; see the commit message of r344463 for an
explanation of the issue.

Re: libcrypto.so.111 linked binaries SIGSEGV (in bhyve guest)

2019-02-21 Thread Konstantin Belousov

On Thu, Feb 21, 2019 at 10:03:29AM +0100, Harry Schmalzbauer wrote:
> Am 21.02.2019 um 09:54 schrieb Konstantin Belousov:
> > On Thu, Feb 21, 2019 at 09:24:43AM +0100, Harry Schmalzbauer wrote:
> >> Am 20.02.2019 um 17:51 schrieb Harry Schmalzbauer:
> >>> Hello,
> >>>
> >> …
> >>> gdb shows:
> >>> Core was generated by `/usr/sbin/auditdistd'.
> >>> Program terminated with signal 11, Segmentation fault.
> >>> Reading symbols from /lib/libutil.so.9...Reading symbols from
> >>> /usr/lib/debug//lib/libutil.so.9.debug...done.
> >>> done.
> >>> Loaded symbols for /lib/libutil.so.9
> >>> Reading symbols from /libexec/ld-elf.so.1...Reading symbols from
> >>> /usr/lib/debug//libexec/ld-elf.so.1.debug...done.
> >>> done.
> >>> Loaded symbols for /libexec/ld-elf.so.1
> >>> #0  memset (dest=0x80056f790, c=0, len=)
> >>>      at
> >>> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:5624
> >>> 5624    ((char *)dest)[i] = c;
> >>> (gdb) bt
> >>> #0  memset (dest=0x80056f790, c=0, len=)
> >>>      at
> >>> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:5624
> >>> #1  0x000800235b07 in map_object (fd=3, path=0x800246140
> >>> "/lib/libcrypto.so.111",
> >>>      sb=0x7fffd4a8)
> >>>      at
> >>> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/map_object.c:249
> >>> #2  0x000800230806 in load_object (name=0x201dba
> >>> "libcrypto.so.111", fd_u=-1,
> >>>      refobj=0x800248000, flags=)
> >>>      at
> >>> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:2493
> >>> #3  0x000800229972 in _rtld (sp=,
> >>> exit_proc=0x7fffea30,
> >>>      objp=0x7fffea38)
> >>>      at
> >>> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:2315
> >>> #4  0x000800228019 in .rtld_start ()
> >>>      at
> >>> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/amd64/rtld_start.S:39
> >>> #5  0x in ?? ()
> >>> Current language:  auto; currently minimal
> >>>
> >>> Any help highly appreciated.
> >>>
> >>> This is with a live CD (amd64), compiled with stable/12 from today (so
> >>> clang 7.01).
> >>> The bhyve guest has 2GB hardwired and ran stable/11 beforehand, which
> >>> compiled the live CD.
> >>> bhyve host is 11.2.  But that shouldn't play a role, does it?
> >>
> >> I'm really interested what happens here.
> >> I built stable/11 in that bhyve guest and updated that guest to
> >> stable/11 from yesterday.
> >> To my surprise llvm 7.01 was also merged to stable/11.  Thank you for
> >> that great support!
> >> No problems with any binary in the stable/11 bhyve guest.
> >>
> >> Then I built stable/12 in that re-built stable/11 guest.
> >> As result, again all binaries linked to /lib/libcrypto.so.111 crash
> >> (signal 11) with the stable/12 iso in the same bhyve guest.
> >>
> >> Here the example from ntpq:
> >> Program terminated with signal 11, Segmentation fault.
> >> Reading symbols from /lib/libedit.so.7...Reading symbols from
> >> /usr/lib/debug//lib/libedit.so.7.debug...done.
> >> done.
> >> Loaded symbols for /lib/libedit.so.7
> >> Reading symbols from /lib/libm.so.5...Reading symbols from
> >> /usr/lib/debug//lib/libm.so.5.debug...done.
> >> done.
> >> Loaded symbols for /lib/libm.so.5
> >> Reading symbols from /libexec/ld-elf.so.1...Reading symbols from
> >> /usr/lib/debug//libexec/ld-elf.so.1.debug...done.
> >> done.
> >> #0  memset (dest=0x8005ef790, c=0, len=) at
> >> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:5624
> >> 5624    ((char *)dest)[i] = c;
> >> (gdb) bt
> >> #0  memset (dest=0x8005ef790, c=0, len=) at
> >> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:5624
> >> #1  0x00080025db07 in map_object (fd=3, path=0x80026e1a0
> >> "/lib/libcrypto.so.111", sb=0x7fffd4c8) at
> >> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/map_object.c:249
> >> #2  0x000800258806 in load_object (name=0x201b40 "libcrypto.so.111",
> >> fd_u=-1,

Re: Strange rtld-elf failure on stable/12 [Was: libcrypto.so.111 linked binaries SIGSEGV (in bhyve guest)]

2019-02-21 Thread Konstantin Belousov

On Thu, Feb 21, 2019 at 09:24:43AM +0100, Harry Schmalzbauer wrote:
> Am 20.02.2019 um 17:51 schrieb Harry Schmalzbauer:
> > Hello,
> >
> …
> > gdb shows:
> > Core was generated by `/usr/sbin/auditdistd'.
> > Program terminated with signal 11, Segmentation fault.
> > Reading symbols from /lib/libutil.so.9...Reading symbols from 
> > /usr/lib/debug//lib/libutil.so.9.debug...done.
> > done.
> > Loaded symbols for /lib/libutil.so.9
> > Reading symbols from /libexec/ld-elf.so.1...Reading symbols from 
> > /usr/lib/debug//libexec/ld-elf.so.1.debug...done.
> > done.
> > Loaded symbols for /libexec/ld-elf.so.1
> > #0  memset (dest=0x80056f790, c=0, len=)
> >     at 
> > /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:5624
> > 5624    ((char *)dest)[i] = c;
> > (gdb) bt
> > #0  memset (dest=0x80056f790, c=0, len=)
> >     at 
> > /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:5624
> > #1  0x000800235b07 in map_object (fd=3, path=0x800246140 
> > "/lib/libcrypto.so.111",
> >     sb=0x7fffd4a8)
> >     at 
> > /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/map_object.c:249
> > #2  0x000800230806 in load_object (name=0x201dba 
> > "libcrypto.so.111", fd_u=-1,
> >     refobj=0x800248000, flags=)
> >     at 
> > /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:2493
> > #3  0x000800229972 in _rtld (sp=, 
> > exit_proc=0x7fffea30,
> >     objp=0x7fffea38)
> >     at 
> > /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:2315
> > #4  0x000800228019 in .rtld_start ()
> >     at 
> > /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/amd64/rtld_start.S:39
> > #5  0x in ?? ()
> > Current language:  auto; currently minimal
> >
> > Any help highly appreciated.
> >
> > This is with a live CD (amd64), compiled with stable/12 from today (so 
> > clang 7.01).
> > The bhyve guest has 2GB hardwired and ran stable/11 beforehand, which 
> > compiled the live CD.
> > bhyve host is 11.2.  But that shouldn't play a role, does it?
> 
> I'm really interested what happens here.
> I built stable/11 in that bhyve guest and updated that guest to 
> stable/11 from yesterday.
> To my surprise llvm 7.01 was also merged to stable/11.  Thank you for 
> that great support!
> No problems with any binary in the stable/11 bhyve guest.
> 
> Then I built stable/12 in that re-built stable/11 guest.
> As result, again all binaries linked to /lib/libcrypto.so.111 crash 
> (signal 11) with the stable/12 iso in the same bhyve guest.
> 
> Here the example from ntpq:
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libedit.so.7...Reading symbols from 
> /usr/lib/debug//lib/libedit.so.7.debug...done.
> done.
> Loaded symbols for /lib/libedit.so.7
> Reading symbols from /lib/libm.so.5...Reading symbols from 
> /usr/lib/debug//lib/libm.so.5.debug...done.
> done.
> Loaded symbols for /lib/libm.so.5
> Reading symbols from /libexec/ld-elf.so.1...Reading symbols from 
> /usr/lib/debug//libexec/ld-elf.so.1.debug...done.
> done.
> #0  memset (dest=0x8005ef790, c=0, len=) at 
> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:5624
> 5624    ((char *)dest)[i] = c;
> (gdb) bt
> #0  memset (dest=0x8005ef790, c=0, len=) at 
> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:5624
> #1  0x00080025db07 in map_object (fd=3, path=0x80026e1a0 
> "/lib/libcrypto.so.111", sb=0x7fffd4c8) at 
> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/map_object.c:249
> #2  0x000800258806 in load_object (name=0x201b40 "libcrypto.so.111", 
> fd_u=-1, refobj=0x80027, flags=) at 
> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:2493
> #3  0x000800251972 in _rtld (sp=, 
> exit_proc=0x7fffea50, objp=0x7fffea58) at 
> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/rtld.c:2315
> #4  0x000800250019 in .rtld_start () at 
> /usr/local/share/deploy-tools/RELENG_12/src/libexec/rtld-elf/amd64/rtld_start.S:39
> #5  0x in ?? ()
> 
> So please correct me if I'm completely wrong, but the problem here seems 
> to be reproducibly rtld-elf related.
> Unfortunately I don't know anything about object files and linkers and 
> the related fundamental stuff.
If you do not know about linkers, why do you claim that the problem
is related to rtld?

> But maybe someone else has an idea what's going wrong here?

The fault happens during zeroing of bss.  Most likely it is due to some
strangeness of the object being loaded.  To help diagnose it, show
the output of "readelf -a libcrypto.so.111".
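
To make "zeroing of bss" concrete when reading the program headers in the
readelf output, here is a rough Python model of how a loader splits the bss
of one PT_LOAD segment. It is a simplified sketch, not rtld's code: the
function name, the dictionary keys, and the 4 KiB page size are assumptions
made for illustration. The bytes between the end of the file-backed data and
the end of that page are cleared in place (the memset in the backtrace), and
any remaining whole pages are taken as zero-filled anonymous memory.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on amd64


def round_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def bss_ranges(p_vaddr, p_filesz, p_memsz, page_size=PAGE_SIZE):
    """Model the bss handling for one PT_LOAD segment (hypothetical helper).

    Returns None when the segment has no bss (p_memsz <= p_filesz).
    """
    file_end = p_vaddr + p_filesz   # first byte not backed by the file
    mem_end = p_vaddr + p_memsz     # end of the segment in memory
    if mem_end <= file_end:
        return None
    page_end = round_up(file_end, page_size)
    return {
        # tail of the last file-backed page, cleared with memset
        "clear_start": file_end,
        "clear_len": page_end - file_end,
        # whole pages beyond it, mapped as zero-filled anonymous memory
        "anon_start": page_end,
        "anon_len": max(0, round_up(mem_end, page_size) - page_end),
    }


# Made-up segment values, as they might appear in "readelf -l" output.
print(bss_ranges(0x2000, 0x790, 0x3000))
```

If the in-place clear faults, the page holding clear_start is either not
mapped or not writable at that moment, which is why the segment sizes and
flags of the object being loaded are the first thing to look at.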

Re: Problem building kernel STABLE12 amd64 arch

2019-02-17 Thread Konstantin Belousov

On Sun, Feb 17, 2019 at 09:11:46AM +, Filippo Moretti via freebsd-stable 
wrote:
> I tried to update stable to yesterday build and I get the following error on 
> amd64 arch
> linking kernel
> ld: error: undefined symbol: iflib_get_softc
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_legacy_intr)
> 
> ld: error: undefined symbol: iflib_admin_intr_deferred
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_legacy_intr)
> 
> ld: error: undefined symbol: iflib_get_softc
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_tx_queues_alloc)
> 
> ld: error: undefined symbol: iflib_dma_alloc_align
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_tx_queues_alloc)
> ld: error: undefined symbol: iflib_get_softc
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_rx_queues_alloc)
> 
> ld: error: undefined symbol: iflib_dma_alloc_align
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_rx_queues_alloc)
> 
> ld: error: undefined symbol: iflib_get_softc
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_queues_free)
> 
> ld: error: undefined symbol: iflib_dma_free
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_queues_free)
> 
> ld: error: undefined symbol: iflib_get_dev
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_attach_pre)
> >>>   if_vmx.o:(vmxnet3_attach_pre)
> 
> ld: error: undefined symbol: iflib_get_sctx
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_attach_pre)
> 
> ld: error: undefined symbol: iflib_get_softc_ctx
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_attach_pre)
> 
> ld: error: undefined symbol: iflib_get_ifp
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_attach_pre)
> 
> ld: error: undefined symbol: iflib_get_media
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_attach_pre)
> 
> ld: error: undefined symbol: iflib_set_mac
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_attach_pre)
> ld: error: undefined symbol: iflib_get_softc_ctx
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_attach_post)
> 
> ld: error: undefined symbol: iflib_get_softc
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_attach_post)
> 
> ld: error: undefined symbol: iflib_dma_alloc_align
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_attach_post)
> 
> ld: error: undefined symbol: iflib_dma_alloc_align
> >>> referenced by if_vmx.c
> >>>   if_vmx.o:(vmxnet3_attach_post)
> 
> ld: error: too many errors emitted, stopping now (use -error-limit=0 to see 
> all errors)
> *** Error code 1
> 
> Stop.
> Stop.
> make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/STING_VT
> *** Error code 1
> 
> Stop.
> make[1]: stopped in /usr/src
> *** Error code 1
> 
> Stop.
> make: stopped in /usr/src
> 
> Any help appreciated
> sincerely
> Filippo

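Read the UPDATING entry 20190214: drivers that use iflib and are compiled
into the kernel statically (vmx in your case) now require 'device iflib'
in the kernel config.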

HEADS-UP: em users

2019-02-15 Thread Konstantin Belousov

Please note the commit below, in particular the UPDATING entry.
In short, if you use any of the
  em ix ixv ixl iavf vmx
drivers and compile them into the kernel statically, you need to add
  device iflib
to your configs.

Standard in-tree configs were updated, so if you do e.g. 'include GENERIC'
you do not need to do anything.
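
For anyone with many custom configs, a check along these lines may help
spot the ones that need the new line. It is only a sketch under stated
assumptions: the helper name is made up, the driver list is the one given
above, the parser only understands plain 'device' and 'include' lines, and
any 'include' is assumed to pull in an updated in-tree config.

```python
# Drivers named in this announcement; everything else is a hypothetical helper.
IFLIB_DRIVERS = {"em", "ix", "ixv", "ixl", "iavf", "vmx"}


def drivers_missing_iflib(config_text):
    """Return iflib-based drivers that are compiled in without 'device iflib'."""
    devices = set()
    has_include = False
    for raw in config_text.splitlines():
        words = raw.split("#", 1)[0].split()   # drop comments, split on blanks
        if len(words) >= 2 and words[0] == "device":
            devices.add(words[1])
        elif len(words) >= 2 and words[0] == "include":
            has_include = True
    # Assumption: an included config (e.g. GENERIC) already has 'device iflib'.
    if "iflib" in devices or has_include:
        return []
    return sorted(devices & IFLIB_DRIVERS)


sample = "ident CUSTOM\ndevice em\ndevice vmx  # VMware VMXNET3\n"
print(drivers_missing_iflib(sample))
```

An empty result means the config either already has 'device iflib', includes
another config, or does not compile any of the listed drivers statically.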

- Forwarded message from Konstantin Belousov  -

Date: Fri, 15 Feb 2019 09:49:09 +0000 (UTC)
From: Konstantin Belousov 
To: src-committ...@freebsd.org, svn-src-...@freebsd.org,
svn-src-sta...@freebsd.org, svn-src-stable...@freebsd.org
Subject: svn commit: r344149 - in stable/12: . share/man/man4 sys/amd64/conf
sys/arm64/conf sys/conf sys/dev/ixgbe sys/i386/conf sys/mips/conf
sys/modules sys/modules/iflib sys/powerpc/conf sys/powerpc/conf...

Author: kib
Date: Fri Feb 15 09:49:09 2019
New Revision: 344149
URL: https://svnweb.freebsd.org/changeset/base/344149

Log:
  MFC r343617, r343618:
  Make iflib a loadable module.

Modified: stable/12/UPDATING
==============================================================================
--- stable/12/UPDATING  Fri Feb 15 09:45:17 2019        (r344148)
+++ stable/12/UPDATING  Fri Feb 15 09:49:09 2019        (r344149)
@@ -16,6 +16,13 @@ from older versions of FreeBSD, try WITHOUT_CLANG and 
 the tip of head, and then rebuild without this option. The bootstrap process
 from older version of current across the gcc/clang cutover is a bit fragile.
 
+20190214:
+   Iflib is no longer unconditionally compiled into the kernel.  Drivers
+   using iflib and statically compiled into the kernel, now require
+   the 'device iflib' config option.  For the same drivers loaded as
+   modules on kernels not having 'device iflib', the iflib.ko module
+   is loaded automatically.
+

Re: amd64, run-time linker and 32bit

2019-02-12 Thread Konstantin Belousov

On Mon, Feb 11, 2019 at 12:31:19AM +0700, Eugene Grosbein wrote:
> Hi!
> 
> Why our 32-bit run-time linker looks for shared libraries in the 
> /usr/local/lib despite of its absence in /var/run/ld-elf32.so.hints
> while 32-bit binary is started under FreeBSD 11.2-STABLE/amd64 ?
Most likely because you configured your system this way, or because your
binary sets rpath this way.

Without the data, we can only use a psychic service.

> 
> If it finds 64-bit version of library in /usr/local/lib, it fails immediately
> and does not even re-try to look at other directories noted in 
> /var/run/ld-elf32.so.hints
> such as 
> /usr/lib32:/usr/local/lib/compat/lib32:/usr/local/lib/compat/lib32/compat/pkg:/usr/local/lib32/compat
> where right 32-bit version is located.
> 
> As workaround, I can use /etc/libmap32.conf and then the binary starts just 
> fine
> but there are so many libraries. It should not even try to look to 
> /usr/local/lib
> if it is not in the /var/run/ld-elf32.so.hints, should it?
The compat32 linker's default search path is /lib32:/usr/lib32 unless
overridden or reconfigured (this is not the same default as the native
linker uses on a 32-bit system).

Re: Error upgrading 11-STABLE to 12-STABLE in ifunc resolver

2018-12-27 Thread Konstantin Belousov

On Thu, Dec 27, 2018 at 09:41:06PM +0100, Thierry Thomas wrote:
> Hello,
> 
> Trying to upgrade a machine from
> 
> 11.2-STABLE #0 r337833: Wed Aug 15 12:50:47 CEST 2018
> 
> to 12-STABLE as:
> 
> Working Copy Root Path: /usr/src
> URL: https://svn.freebsd.org/base/stable/12
> Relative URL: ^/stable/12
> Repository Root: https://svn.freebsd.org/base
> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> Revision: 342558
> Node Kind: directory
> Schedule: normal
> Last Changed Author: kp
> Last Changed Rev: 342545
> Last Changed Date: 2018-12-26 13:56:36 +0100 (Wed, 26 Dec 2018)
> 
> aborts due to this error:
> 
> /usr/local/libexec/ccache/world/cc -target x86_64-unknown-freebsd12.0 
> --sysroot=/usr/obj/usr/src/amd64.amd64/tmp 
> -B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin  -O2 -pipe   -DNO__SCCSID 
> -DNO__RCSID -I/usr/src/lib/libc/include -I/usr/src/include 
> -I/usr/src/lib/libc/amd64 -DNLS  -D__DBINTERFACE_PRIVATE 
> -I/usr/src/contrib/gdtoa -I/usr/src/contrib/libc-vis -DINET6 
> -I/usr/obj/usr/src/amd64.amd64/lib/libc -I/usr/src/lib/libc/resolv 
> -D_ACL_PRIVATE -DPOSIX_MISTAKE -I/usr/src/lib/libmd 
> -I/usr/src/contrib/jemalloc/include -I/usr/src/contrib/tzcode/stdtime 
> -I/usr/src/lib/libc/stdtime -I/usr/src/lib/libc/locale -DBROKEN_DES -DPORTMAP 
> -DDES_BUILTIN -I/usr/src/lib/libc/rpc -DWANT_HYPERV -DYP -DNS_CACHING 
> -DSYMBOL_VERSIONING -g -MD  -MF.depend.amd64_set_fsbase.o 
> -MTamd64_set_fsbase.o -std=gnu99 -fstack-protector-strong -Wsystem-headers 
> -Werror -Wall -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign 
> -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable 
> -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality 
> -Wno-unused-function -Wno-enum-conversion -Wno-unused-local-typedef 
> -Wno-address-of-packed-member -Wno-switch -Wno-switch-enum 
> -Wno-knr-promoted-parameter  -Qunused-arguments  -I/usr/src/lib/libutil 
> -I/usr/src/lib/msun/amd64 -I/usr/src/lib/msun/x86 -I/usr/src/lib/msun/src 
> -c /usr/src/lib/libc/amd64/sys/amd64_set_fsbase.c -o amd64_set_fsbase.o
> --- amd64_get_fsbase.o ---
> /usr/src/lib/libc/amd64/sys/amd64_get_fsbase.c:60:1: error: ifunc resolver 
> function must have no parameters
> DEFINE_UIFUNC(, int, amd64_get_fsbase, (void **), static)
> ^
> /usr/obj/usr/src/amd64.amd64/tmp/usr/include/x86/ifunc.h:43:44: note: 
> expanded from macro 'DEFINE_UIFUNC'
> qual ret_type name args __attribute__((ifunc(#name "_resolver")));  \
> 
> cc is:
> FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 
> 6.0.1)
> 
> This problem has already been reported in
> 
> and should be fixed with clang 7, but I'm surprised that it seems
> impossible to upgrade from 11-STABLE to 12-STABLE; did I miss something?

Yes, you should update to the latest stable/11 first.  In more detail, the
host clang must include the r339284 commit.

Re: sporadic core dumps in 12.0-RELEASE

2018-12-18 Thread Konstantin Belousov

On Tue, Dec 18, 2018 at 07:34:33AM -0800, Chuck Tuffli wrote:
> Hi
> 
> When running 12.0-RELEASE in bhyve, nvmecontrol will core dump sporadically
> in rtld. This is repeatable, but doesn't happen every time. Peeking at
> rlock_acquire(), the function checks for a NULL lockstate and then
> dereferences the lock. The backtrace (below) suggests the lock is NULL but
> the lockstate pointer is not. Does anyone know if this is expected, weird,
> etc.?
This is very weird.  If you look at frame #1, you will see that
rlock_acquire() is called for the rtld_bind_lock, which should point
to rtld_locks[0].

> 
> root@freebsd:~ # uname -a
> FreeBSD freebsd 12.0-RELEASE FreeBSD 12.0-RELEASE r341666 GENERIC  amd64
> root@freebsd:~ # /usr/libexec/gdb -q /sbin/nvmecontrol nvmecontrol.core
> Core was generated by `nvmecontrol identify nvme0'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libc.so.7...Reading symbols from
> /usr/lib/debug//lib/libc.so.7.debug...done.
> done.
> Loaded symbols for /lib/libc.so.7
> Reading symbols from /libexec/ld-elf.so.1...Reading symbols from
> /usr/lib/debug//libexec/ld-elf.so.1.debug...done.
> done.
> Loaded symbols for /libexec/ld-elf.so.1
> #0  rlock_acquire (lock=0x0, lockstate=0x7fffd9b8)
> at /usr/src/libexec/rtld-elf/rtld_lock.c:203
> 203 /usr/src/libexec/rtld-elf/rtld_lock.c: No such file or directory.
> in /usr/src/libexec/rtld-elf/rtld_lock.c
> (gdb) bt
> #0  rlock_acquire (lock=0x0, lockstate=0x7fffd9b8)
> at /usr/src/libexec/rtld-elf/rtld_lock.c:203
> #1  0x00080021a2fd in _rtld_bind (obj=0x800236000, reloff=528)
> at /usr/src/libexec/rtld-elf/rtld.c:790
> #2  0x00080021704d in _rtld_bind_start ()
> at /usr/src/libexec/rtld-elf/amd64/rtld_start.S:121
> #3  0x002087de in identify_ctrlr (argc=2, argv=0x7fffebd0)
> at /usr/src/sbin/nvmecontrol/identify.c:183
> #4  0x002086e0 in identify (argc=2, argv=0x7fffebd0)
> at /usr/src/sbin/nvmecontrol/identify.c:292
> #5  0x00207935 in main (argc=<value optimized out>, argv=<value optimized out>)
> at /usr/src/sbin/nvmecontrol/nvmecontrol.c:89
> #6  0x0020711b in _start (ap=<value optimized out>, cleanup=<value optimized out>)
> at /usr/src/lib/csu/amd64/crt1.c:76
> #7  0x000800236000 in ?? ()
> #8  0x in ?? ()
> Current language:  auto; currently minimal
> (gdb) p *lockstate
> $1 = {lockstate = 0, env = 0x7fffd9c0}
> (gdb)
> 
> --chuck

Re: Address Collision using i386 4G/4G Memory Split

2018-12-18 Thread Konstantin Belousov

On Tue, Dec 18, 2018 at 11:22:53AM +0100, Alexander Lochmann wrote:
> 
> >> Some context: We are doing VM-based tracing in the FreeBSD kernel. For
> >> that, we observe parts of the kernel memory (allocations, accesses,...).
> >> Before 12.0 we simply knew that kernel addresses that we logged were
> >> unique. Moreover, when a memory access to a region of interest happened
> >> we knew that could only be kernel memory.
> >> We now have to ensure that we only record memory accesses that happen
> >> within the kernel.
> >> Our approach is to record the kernel's value for the CR3 register, and
> >> record memory accesses if the CR3 register holds the aforementioned value.
> > You must use CPL to see if the current operation mode is user or kernel.
> > If user, nothing should be done (this would avoid vm86). If kernel, you
> > need to compare current %cr3 with IdlePTD (IdlePDPT for PAE case).
> > 
> Thanks for the advice!  We'll include that in our toolchain.
> Do you use PLs other than 0(=kernel) and 3(=user)?
No, only 0 and 3.  But be careful with vm86 (I am not sure how your VM
reports it to your instrumentation).

Re: Address Collision using i386 4G/4G Memory Split

2018-12-18 Thread Konstantin Belousov

On Tue, Dec 18, 2018 at 10:16:35AM +0100, Alexander Lochmann wrote:
> Am 18.12.18 um 06:27 schrieb Konstantin Belousov:
> > On Mon, Dec 17, 2018 at 02:51:48PM +0100, Alexander Lochmann wrote:
> >> Hi folks!
> >>
> >> According to git commit e3089a (https://reviews.freebsd.org/D1463)
> >> FreeBSD 12.0 i386 uses separate address spaces for kernel and user
> >> space. So basically two memory areas, one in each space, can have the
> >> same address.
> >> Is this possible with FreeBSD 12.0? Is this likely to happen?
> > The feature was added to HEAD during this summer, before stable/12 was
> > branched.
> Mhmkay. But how likely is it that two memory areas will get the same
> address?
It is possible.

> Does the kernel, for example, start in the high memory region and the
> user space starts in the mid region?
No, the kernel no longer relocates itself; it runs with PA == VA
for the text and data segments.  Look at the kernel binary to see the
addresses.

> This would reduce the likelihood of two memory areas starting at the
> same virtual address.
I do not see why this would be even slightly needed.

> 
> Some context: We are doing VM-based tracing in the FreeBSD kernel. For
> that, we observe parts of the kernel memory (allocations, accesses,...).
> Before 12.0 we simply knew that kernel addresses that we logged were
> unique. Moreover, when a memory access to a region of interest happened
> we knew that could only be kernel memory.
> We now have to ensure that we only record memory accesses that happen
> within the kernel.
> Our approach is to record the kernel's value for the CR3 register, and
> record memory accesses if the CR3 register holds the aforementioned value.
You must use CPL to see if the current operation mode is user or kernel.
If user, nothing should be done (this would avoid vm86). If kernel, you
need to compare the current %cr3 with IdlePTD (IdlePDPT for the PAE case).

There are moments when the kernel is executing on the user page tables.
This happens on kernel entry/exit, and sometimes in copyout(9).
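
A minimal sketch of that check, in Python purely for illustration.  The
function name, the argument layout and the sample register values are
assumptions about how a tracing tool might expose CPU state; none of them
come from FreeBSD.  An access counts as kernel-side only when the CPU is
not in vm86 mode, the CPL taken from the low two bits of %cs is 0, and
%cr3 equals the value recorded for the kernel page tables (the physical
address of IdlePTD, or of the PDPT under PAE).

```python
CPL_MASK = 0x3         # low two bits of the %cs selector give the privilege level
EFLAGS_VM = 1 << 17    # EFLAGS.VM is set while the CPU executes vm86 code


def should_record(cs, eflags, cr3, kernel_cr3):
    """Return True only for accesses made in ring 0 on the kernel page tables.

    Hypothetical helper: kernel_cr3 is whatever %cr3 value the tool recorded
    for the kernel's own page tables.
    """
    if eflags & EFLAGS_VM:
        return False           # vm86 code: %cs holds a real-mode style segment
    if (cs & CPL_MASK) != 0:
        return False           # user mode (FreeBSD uses only rings 0 and 3)
    return cr3 == kernel_cr3   # ring 0, but possibly still on user page tables


KERNEL_CR3 = 0x00ABC000        # made-up value for the example
print(should_record(0x20, 0x0, KERNEL_CR3, KERNEL_CR3))   # ring 0, kernel tables
print(should_record(0x33, 0x0, KERNEL_CR3, KERNEL_CR3))   # ring 3
```

Note that accesses made while the kernel is still running on the user page
tables fail the %cr3 comparison, so a filter like this will not see them.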

> 
> > 
> >>
> >> In my opinion, this is also very expensive in terms of performance.
> >> Any copy{in,out} has to flush the TLB.
> >> (http://fxr.watson.org/fxr/source/i386/i386/copyout_fast.s#L91)
> >> Why are you still using this 4G/4G approach?
> > Because it is needed for i386 to self-host (in the modern world 1G of
> > KVA is too small), and because it provides Meltdown mitigation.
> > 
> 
> -- 
> Technische Universität Dortmund
> Alexander LochmannPGP key: 0xBC3EF6FD
> Otto-Hahn-Str. 16 phone:  +49.231.7556141
> D-44227 Dortmund  fax:+49.231.7556116
> http://ess.cs.tu-dortmund.de/Staff/al
> 




Re: Address Collision using i386 4G/4G Memory Split

2018-12-18 Thread Konstantin Belousov

On Tue, Dec 18, 2018 at 08:34:25AM +, Brooks Davis wrote:
> On Mon, Dec 17, 2018 at 03:58:05PM -0500, Kurt Lidl wrote:
> > Alexander Lochmann writes:
> > > According to git commit e3089a (https://reviews.freebsd.org/D1463)
> > > FreeBSD 12.0 i386 uses separate address spaces for kernel and user
> > > space. So basically two memory areas, one in each space, can have the
> > > same address.
> > > Is this possible with FreeBSD 12.0? Is this likely to happen?
> > 
> > If the userspace program and the kernel address happen to overlap, the 
> > system will deal with it.  There's not anything to worry about.  As to
> > whether or not it's likely to happen -- I'm not sure about that.  I
> > expect the default stack and heap space locations for a fresh process
> > have changed due to this change, but it should not matter.
> 
> 4/4 does potentially alter the failure modes of buggy code that tries to
> read directly from userspace addresses.  For example, correct calls to
> the sysctls fixed in r342125 may panic prior to the fix because the
> addresses in question aren't mapped in kernel space.  They might also
> fail or behave bizarrely if the page is mapped and the value from the
> kernel page is used.

I believe that SMAP on amd64 is now the way to find such cases.
It has indeed caught several real ones, e.g. in pci(4) and in acpi_call
and vbox from ports, besides the mentioned commit.

Re: Address Collision using i386 4G/4G Memory Split

2018-12-17 Thread Konstantin Belousov

On Mon, Dec 17, 2018 at 02:51:48PM +0100, Alexander Lochmann wrote:
> Hi folks!
> 
> According to git commit e3089a (https://reviews.freebsd.org/D1463)
> FreeBSD 12.0 i386 uses separate address spaces for kernel and user
> space. So basically two memory areas, one in each space, can have the
> same address.
> Is this possible with FreeBSD 12.0? Is this likely to happen?
The feature was added to HEAD during this summer, before stable/12 was
branched.

> 
> In my opinion, this is also very expensive in terms of performance.
> Any copy{in,out} has to flush the TLB.
> (http://fxr.watson.org/fxr/source/i386/i386/copyout_fast.s#L91)
> Why are you still using this 4G/4G approach?
Because it is needed for i386 to self-host (in the modern world 1G of KVA
is too small), and because it provides Meltdown mitigation.


Re: /dev/crypto not being used in 12-STABLE

2018-12-06 Thread Konstantin Belousov

On Thu, Dec 06, 2018 at 04:48:35PM -0700, John Nielsen wrote:
> Is aesni(4) even required if all you want is userland acceleration?
> 
No, it is not.  The same goes for rdrand_rng(4), if an application uses
the hardware random source directly.

Re: Trap 12 in vm_page_alloc_after()

2018-11-18 Thread Konstantin Belousov

On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
> Has anyone seen this before?  It's on a busy NFS server, but hasn't
> been observed on any of our other NFS servers.
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 35; apic id = 35
> fault virtual address   = 0x5a
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x809a903d
> stack pointer   = 0x28:0xfe17eb8d0710
> frame pointer   = 0x28:0xfe17eb8d0750
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 878 (nfsd: service)
> trap number = 12
> panic: page fault
> cpuid = 35
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe17eb8d03c0
> vpanic() at vpanic+0x177/frame 0xfe17eb8d0420
> panic() at panic+0x43/frame 0xfe17eb8d0480
> trap_fatal() at trap_fatal+0x35f/frame 0xfe17eb8d04d0
> trap_pfault() at trap_pfault+0x49/frame 0xfe17eb8d0530
> trap() at trap+0x2c7/frame 0xfe17eb8d0640
> calltrap() at calltrap+0x8/frame 0xfe17eb8d0640
> --- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp = 
> 0xfe17eb8d0750 ---
> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750
> kmem_back() at kmem_back+0xf2/frame 0xfe17eb8d07c0
> kmem_malloc() at kmem_malloc+0x60/frame 0xfe17eb8d07f0
> keg_alloc_slab() at keg_alloc_slab+0xe2/frame 0xfe17eb8d0860
> keg_fetch_slab() at keg_fetch_slab+0x14e/frame 0xfe17eb8d08b0
> zone_fetch_slab() at zone_fetch_slab+0x64/frame 0xfe17eb8d08e0
> zone_import() at zone_import+0x3f/frame 0xfe17eb8d0930
> uma_zalloc_arg() at uma_zalloc_arg+0x3d9/frame 0xfe17eb8d09a0
> zil_alloc_lwb() at zil_alloc_lwb+0x9c/frame 0xfe17eb8d09e0
> zil_lwb_write_issue() at zil_lwb_write_issue+0x2f8/frame 0xfe17eb8d0a40
> zil_commit_impl() at zil_commit_impl+0x95f/frame 0xfe17eb8d0b80
> zfs_freebsd_fsync() at zfs_freebsd_fsync+0xa7/frame 0xfe17eb8d0bb0
> VOP_FSYNC_APV() at VOP_FSYNC_APV+0x82/frame 0xfe17eb8d0be0
> nfsvno_fsync() at nfsvno_fsync+0xe0/frame 0xfe17eb8d0c50
> nfsrvd_commit() at nfsrvd_commit+0xe8/frame 0xfe17eb8d0e20
> nfsrvd_dorpc() at nfsrvd_dorpc+0x621/frame 0xfe17eb8d0ff0
> nfssvc_program() at nfssvc_program+0x557/frame 0xfe17eb8d11a0
> svc_run_internal() at svc_run_internal+0xe09/frame 0xfe17eb8d12e0
> svc_thread_start() at svc_thread_start+0xb/frame 0xfe17eb8d12f0
> fork_exit() at fork_exit+0x83/frame 0xfe17eb8d1330
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe17eb8d1330
> --- trap 0xc, rip = 0x80087101a, rsp = 0x7fffe688, rbp = 0x7fffe930 ---
> 
> 
> At this point the system was frozen: it did not attempt to reboot
> automatically and was not in the debugger.  I had to do a remote reset
> via the BMC.  The kernel is 11.2 r336644 (so no errata applied), but
> none of the SAs and ENs released so far look like they touch this
> region of code.

What is the line number for vm_page_alloc_after+0x15d ?
Do you have NUMA enabled on 11 ?

Re: Panic (gpf) in early boot after upgrading FreeBSD 10.4 -> 11.2 on Ganeti

2018-11-11 Thread Konstantin Belousov

On Sun, Nov 11, 2018 at 11:24:48PM -0500, Rob Austein wrote:
> Belated upgrade (don't ask) of a pair of FreeBSD 10.4 VMs to 11.2.
> Each VM got as far as:
> 
>   freebsd-update -r 11.2-RELEASE update
>   freebsd-update install
>   reboot
> 
> Each VM got an immediate kernel panic after the reboot (log below).
> 
> The two VMs are basically identical at the system level, but run in
> separate Ganeti clusters on opposite coasts, so no hardware in common.
> A dozen or so other VMs run in each cluster without issues (including
> at least one other FreeBSD 11.2 VM), and the VMs I'm trying to upgrade
> have also been just fine until now, so the problem seems unlikely to
> be hardware per se.
> 
> GENERIC, amd64, UFS2, no non-/boot/kernel modules, one CPU per VM.
> The only things even slightly unusual about these VMs are:
> 
> * They're running in Ganeti clusters:
>   * Ganeti version 2.15.2
>   * KVM hypervisor
> 
> * They use virtio, so:
>   * Disk is vtbd0
>   * Net is vtnet0
> 
> Log of attempted boot with new kernel:
> 
> /boot/kernel/kernel text=0x1547d48 data=0x144138+0x4e9818 syms=[0x8+0x16aef8+0x8+0x183f99]
> Booting...
> Copyright (c) 1992-2018 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>   The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 11.2-RELEASE-p4 #0: Thu Sep 27 08:16:24 UTC 2018
>   r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
> FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
> VT(vga): text 80x25
> CPU: QEMU Virtual CPU version 2.1.2 (2666.81-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x663  Family=0x6  Model=0x6  Stepping=3
>   Features=0x783fbfd
>   Features2=0x80a02001
>   AMD Features=0x20100800
>   AMD Features2=0x1
> Hypervisor: Origin = "KVMKVMKVM"
> real memory  = 4294967296 (4096 MB)
> avail memory = 4088406016 (3899 MB)
> Event timer "LAPIC" quality 100
> ACPI APIC Table: 
> ioapic0  irqs 0-23 on motherboard
> kernel trap 9 with interrupts disabled
> 
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer   = 0x20:0x810e9ae6
> stack pointer = 0x28:0x82272c20
> frame pointer = 0x28:0x82272c80
> code segment  = base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags  = resume, IOPL = 0
> current process   = 0 (swapper)
> trap number   = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0x80b3d587 at kdb_backtrace+0x67
> #1 0x80af6b27 at vpanic+0x177
> #2 0x80af69a3 at panic+0x43
> #3 0x80f77fdf at trap_fatal+0x35f
> #4 0x80f7759e at trap+0x5e
> #5 0x80f57fbc at calltrap+0x8
> #6 0x810ec5f3 at apic_setup_io+0x53
> #7 0x80a92898 at mi_startup+0x118
> #8 0x8031002c at btext+0x2c
> Uptime: 1s
> Automatic reboot in 15 seconds - press a key on the console to abort
> --> Press a key on the console to reboot,
> --> or switch off the system now.
> 
> Goggling turned up a few theories about bad memory and incompatible
> changes to video drivers, none of which seem likely to apply here.
> 
> Cluebat, please, somebody?
Try issuing the following commands at the loader prompt:
	set hw.x2apic_enable=0
	boot

Re: Possible memory leak in the kernel (contigmalloc)

2018-10-30 Thread Konstantin Belousov

On Tue, Oct 30, 2018 at 11:15:47AM +, Bennett, Ciunas wrote:
> Hi,
> The only other activity that is running is the csh script  that is inserting 
> and removing the kernel module.
> I am not using the VM for any other purpose but to run this test.
> In my tests the correlation between memory allocation and moving to inactive 
> list can be seen.
> I don't think any other process is creating the inactive memory.
But it is created.  Moreover, since anonymous private memory is freed (not
deactivated) on process exit, something is accumulating the memory.

The other possibility is that the memory consists of cached pages from vnodes,
but for that, buffers must be created and then reclaimed, which would suggest
even more activity on the system.

> Thanks.
> 
> -----Original Message-----
> From: Konstantin Belousov [mailto:kostik...@gmail.com] 
> Sent: Tuesday, October 30, 2018 10:12 AM
> To: Bennett, Ciunas 
> Cc: freebsd-stable@freebsd.org
> Subject: Re: Possible memory leak in the kernel (contigmalloc)
> 
> On Tue, Oct 30, 2018 at 09:46:59AM +, Bennett, Ciunas wrote:
> > Hi,
> > 
> > I was debugging the issue by viewing the free queues "sysctl
> > vm.phys_free" and also using "show page" in ddb. The inactive memory
> > is never being released back into the free queue. I thought that when
> > inactive memory reaches a certain threshold that the kernel will start
> > reclaiming and move it to the free list? In my program this is not
> > happening, the program uses free memory (contigmalloc), and then it is
> > put into the inactive queue (contigfree) when the program frees it.
> Contigfree() does not release memory into inactive queue.  By definition, 
> inactive pages have valid content, which is not possible for the pages freed 
> by contigfree().
> contigfree()->kmem_free()->kmem_unback()->vm_page_free().
> 
> 
> > This inactive memory is never released by the kernel, and the inactive
> > queue grows until all the memory is in this queue. I have attached an XML
> > sheet that shows the memory usage in the system.
> Inactive memory is not freed, it makes no sense as far as there is valid 
> content, which is either not recoverable (anon memory or dirty file
> pages) or high-cost to restore (clean file pages, need to re-read from disk). 
>  Inactive is processed by pagedaemon when system has memory deficit, and 
> either inactive pages are written to swap, or written to their file backing 
> storage.
> 
> I suspect that you have other activities on your system going on, which cause 
> creation of the inactive memory and unrecoverable fragmentation.
> Note that contigmalloc() tries to do defragmentation to satisfy requests, but 
> this is not always possible.
> 
> 
> > Ciunas.
> > 
> > -----Original Message-----
> > From: Konstantin Belousov [mailto:kostik...@gmail.com]
> > Sent: Friday, October 26, 2018 9:13 PM
> > To: Bennett, Ciunas 
> > Cc: freebsd-stable@freebsd.org
> > Subject: Re: Possible memory leak in the kernel (contigmalloc)
> > 
> > On Wed, Oct 24, 2018 at 04:27:52PM +, Bennett, Ciunas wrote:
> > > Hello,
> > > 
> > > I have encountered an issue with a kernel application that I have 
> > > written, the issue might be caused by a memory leak in the kernel.
> > > The application allocates and deallocates contiguous memory using
> > > contigmalloc() and contigfree(). The application will fail after a 
> > > period of time because there is not enough free contiguous memory 
> > > left. There could be an issue with the freeing of memory when using 
> > > the contigfree() function.
> > >
> > 
> > It is unlikely that there is an issue with a leak, but I would be not 
> > surprised if your allocation/free pattern would cause fragmentation on free 
> > lists that results in contigmalloc(9) failures after.
> > 
> > Look at the vmstat -z/vmstat -m output to see uma and malloc stats.
> > More interesting for your case can be the output from
> > sysctl vm.phys_free
> > which provides information about the free queues and order of free pages on 
> > them.

Re: Possible memory leak in the kernel (contigmalloc)

2018-10-30 Thread Konstantin Belousov

On Tue, Oct 30, 2018 at 09:46:59AM +, Bennett, Ciunas wrote:
> Hi,
> 
> I was debugging the issue by viewing the free queues "sysctl
> vm.phys_free" and also using "show page" in ddb. The inactive memory
> is never being released back into the free queue. I thought that when
> inactive memory reaches a certain threshold that the kernel will start
> reclaiming and move it to the free list? In my program this is not
> happening, the program uses free memory (contigmalloc), and then it
> is put into the inactive queue (contigfree) when the program frees it.
contigfree() does not release memory into the inactive queue.  By definition,
inactive pages have valid content, which is not possible for pages
freed by contigfree().  The call chain is
contigfree() -> kmem_free() -> kmem_unback() -> vm_page_free().


> This inactive memory is never released by the kernel, and the inactive
> queue grows until all the memory is in this queue. I have attached an XML
> sheet that shows the memory usage in the system.
Inactive memory is not freed: that makes no sense as long as it holds valid
content, which is either not recoverable (anon memory or dirty file
pages) or costly to restore (clean file pages, which must be re-read from
disk).  The inactive queue is processed by the pagedaemon when the system has
a memory deficit, and inactive pages are then written either to swap or to
their file backing storage.

I suspect that there is other activity going on on your system, which
causes the creation of inactive memory and unrecoverable fragmentation.
Note that contigmalloc() tries to defragment memory to satisfy requests,
but this is not always possible.


> Ciunas.
> 
> -----Original Message-----
> From: Konstantin Belousov [mailto:kostik...@gmail.com] 
> Sent: Friday, October 26, 2018 9:13 PM
> To: Bennett, Ciunas 
> Cc: freebsd-stable@freebsd.org
> Subject: Re: Possible memory leak in the kernel (contigmalloc)
> 
> On Wed, Oct 24, 2018 at 04:27:52PM +, Bennett, Ciunas wrote:
> > Hello,
> > 
> > I have encountered an issue with a kernel application that I have 
> > written, the issue might be caused by a memory leak in the kernel.
> > The application allocates and deallocates contiguous memory using
> > contigmalloc() and contigfree(). The application will fail after a 
> > period of time because there is not enough free contiguous memory 
> > left. There could be an issue with the freeing of memory when using 
> > the contigfree() function.
> >
> 
> It is unlikely that there is an issue with a leak, but I would be not 
> surprised if your allocation/free pattern would cause fragmentation on free 
> lists that results in contigmalloc(9) failures after.
> 
> Look at the vmstat -z/vmstat -m output to see uma and malloc stats.
> More interesting for your case can be the output from
>   sysctl vm.phys_free
> which provides information about the free queues and order of free pages on 
> them.



Re: Possible memory leak in the kernel (contigmalloc)

2018-10-26 Thread Konstantin Belousov

On Wed, Oct 24, 2018 at 04:27:52PM +, Bennett, Ciunas wrote:
> Hello,
> 
> I have encountered an issue with a kernel application that I have
> written, the issue might be caused by a memory leak in the kernel.
> The application allocates and deallocates contiguous memory using
> contigmalloc() and contigfree(). The application will fail after a
> period of time because there is not enough free contiguous memory
> left. There could be an issue with the freeing of memory when using
> the contigfree() function.
>

It is unlikely that there is a leak, but I would not be surprised if
your allocation/free pattern caused fragmentation of the free lists,
which later results in contigmalloc(9) failures.

Look at the vmstat -z and vmstat -m output to see the UMA and malloc stats.
More interesting for your case may be the output from
	sysctl vm.phys_free
which provides information about the free queues and the order of the free
pages on them.
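
As an illustration only (this is a toy userland model, not how vm_phys or
contigmalloc(9) are implemented), the sketch below shows how an
allocation/free pattern can leave half of the memory free while no two free
pages are adjacent, so any contiguous request larger than one page fails.
NPAGES and all function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 64	/* hypothetical size of the toy physical memory */

/* Count the pages that are currently free. */
static size_t
count_free(const bool *used, size_t n)
{
	size_t i, cnt;

	cnt = 0;
	for (i = 0; i < n; i++)
		if (!used[i])
			cnt++;
	return (cnt);
}

/* Length of the longest run of adjacent free pages. */
static size_t
longest_free_run(const bool *used, size_t n)
{
	size_t i, run, best;

	run = best = 0;
	for (i = 0; i < n; i++) {
		run = used[i] ? 0 : run + 1;
		if (run > best)
			best = run;
	}
	return (best);
}

/*
 * Allocate every page, then free only the even-numbered ones.
 * Half of the memory is free again, but no two free pages are adjacent.
 */
static void
fragment(bool *used, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		used[i] = true;
	for (i = 0; i < n; i += 2)
		used[i] = false;
}

/* A contiguous request can succeed only if a long enough free run exists. */
static bool
can_alloc_contig(const bool *used, size_t n, size_t npages)
{
	assert(npages > 0);
	return (longest_free_run(used, n) >= npages);
}
```

After fragment(), count_free() reports NPAGES / 2 free pages, yet
can_alloc_contig() refuses a request for just two pages.  The real allocator
is far more elaborate, but the idea is the same: the amount of free memory
alone does not tell whether a contiguous request can be satisfied.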

Re: loader lsdev crashes loader (Was: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated)

2018-10-23 Thread Konstantin Belousov

On Tue, Oct 23, 2018 at 08:54:24AM -0600, Warner Losh wrote:
> On Tue, Oct 23, 2018 at 5:54 AM Toomas Soome  wrote:
> 
> >
> > > On 23 Oct 2018, at 13:53, Lev Serebryakov  wrote:
> > >
> > > On 22.10.2018 12:27, Toomas Soome wrote:
> > >
> > >> It would help to get output from loader lsdev -v command.
> > > current loader crashes on "lsdev" for me:
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232483 (it is not
> > > threadripper-related, my hardware is Intel Atom).
> > >
> > > --
> > > // Lev Serebryakov
> > >
> >
> > That error means something is corrupting the memory, it is hard to guess
> > what exactly and debugging it is nasty - it means we would need to track
> > down what was allocated before this memory block (guard1 is marker inserted
> > in front of the allocated memory block).
> >
> > Fortunately the code to print the partition table is in
> > stand/common/disk.c and the partition code is just next to it and so it
> > should be relatively easy to find the guilty one… I will try to see if I
> > can replicate the issue.
> >
> 
> We've had reports of other systems mysteriously hanging on boot with the
> new boot loader, but not older ones. It isn't limited to new AMD boxes, but
> it's been seen on other Intel boxes of various flavors. When we crash, it
> seems like we don't have a good recovery like we do with BTX. Maybe they
> are related?

There is the 'grab_faults' command, which can be used to get
information about the fault (as in, a CPU fault caused by a programming
mistake). You need to issue it before doing whatever can cause the
fault. It is not enabled by default because of the methods it uses to catch
the exceptions.

I recently noticed that at least UEFI 2.7 provides debugging interfaces
that can be used to achieve the same fault interception effect without
hacks.

Re: Constraints in libmap(32).conf do not work as expected, possible bug in rtld-elf

2018-09-26 Thread Konstantin Belousov

On Wed, Sep 26, 2018 at 05:04:55PM +0200, Andreas Longwitz wrote:
> >> One annotation to the script /etc/rc.d/ldconfig: I had expected that
> >> this script during boot creates clean files ld-elf(32).so.hints in
> >> /var/run. For 64 bit this is true, but for 32 bit not because ldconfig
> >> with flag -32 also has flag -m. Is this intended behaviour ?
> > 
> > This seems to be from the beginning when ldconfig_local32 was
> > introduced in r154114.
See https://reviews.freebsd.org/D17331

Re: Constraints in libmap(32).conf do not work as expected, possible bug in rtld-elf

2018-09-25 Thread Konstantin Belousov

On Tue, Sep 25, 2018 at 12:03:32AM +0200, Andreas Longwitz wrote:
> > 
> > Can you try this instead ?
> > 
> Yes I did on a server running FreeBSD 12.0-CURRENT (GENERIC) #0 r337452
> and - after a trivial adaptation of your patch - on FreeBSD 10.4-STABLE
> #0 r337823 and everything works correct.
> 
> My simple libmap32.conf now is:
> 
> ## php52
> [/usr/local/php52/]
> /usr/local/lib  /usr/local/lib32
> /usr/local/lib/mysql  /usr/local/lib32/mysql
> 
> [libc-client4.so.9]
> libssl.so.8 libssl.so.6
> libcrypto.so.8  libcrypto.so.6
> 
> My test command "/usr/local/php52/bin/php -i" loads also all the shared
> objects in /usr/local/php52/lib/php/20060613: gettext.so iconv.so
> imap.so mbstring.so mcrypt.so mysql.so pcre.so session.so xml.so.
> Further ldd gives correct output for every mentioned file.
Thanks.

> 
> I like to mention one thing concerning the source libmap.c. With the
> patch (yours or mine) and the libmap32.conf given above I see the
> following lmp_list when lm_fini() is called:
> 
> lm_fini("1, $DEFAULT$" lml-Adresse 0x2826c208)
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib/mysql, t=/usr/local/lib32/mysql")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=libcrypto.so.8, t=libcrypto.so.6")
> lm_fini("f=libssl.so.8, t=libssl.so.6")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
>  lm_fini("1, libc-client4.so.9" lml-Adresse 0x2826c168)
> lm_fini("f=libcrypto.so.8, t=libcrypto.so.6")
> lm_fini("f=libssl.so.8, t=libssl.so.6")
>  lm_fini("2, /usr/local/php52/" lml-Adresse 0x2826c068)
> lm_fini("f=/usr/local/lib/mysql, t=/usr/local/lib32/mysql")
> lm_fini("f=/usr/local/lib, t=/usr/local/lib32")
> 
> So for $DEFAULTS we have a lot of identical entries. This comes from the
> TAILQ_INSERT_HEAD statement in lm_add(). I am not sure if this can be
> accepted or a check to avoid double entries in the list is better.
Yes, this is mostly cosmetic.  It is not clear whether it is better to avoid
duplicates and pay the cost at insertion, or to leave them and pay at
list traversal.  I think there is a slight preference for avoiding dups, but
the difference should not be measurable.

There is a second caller of lm_add(), but there the dup should be user-caused.
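
A minimal sketch of the "pay at insertion" variant, assuming a simplified
stand-in for the rtld structures: struct map_ent, dup_str(), map_find() and
map_add() are hypothetical names for this example and are not the libmap.c
code.  It uses the TAILQ macros from <sys/queue.h> and skips the insert when
the first match already maps to the same target.

```c
#include <sys/queue.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One from -> to mapping; a simplified stand-in for rtld's struct lm. */
struct map_ent {
	char *from;
	char *to;
	TAILQ_ENTRY(map_ent) link;
};
TAILQ_HEAD(map_list, map_ent);

/* Portable string duplication; returns NULL on allocation failure. */
static char *
dup_str(const char *s)
{
	size_t len;
	char *p;

	len = strlen(s) + 1;
	p = malloc(len);
	if (p != NULL)
		memcpy(p, s, len);
	return (p);
}

/* Return the target of the first entry matching 'from', or NULL. */
static const char *
map_find(struct map_list *ml, const char *from)
{
	struct map_ent *me;

	TAILQ_FOREACH(me, ml, link)
		if (strcmp(me->from, from) == 0)
			return (me->to);
	return (NULL);
}

/*
 * Insert at the head unless the first matching entry already maps
 * 'from' to the same target.  Returns 1 if an entry was added, 0 if
 * it was a duplicate, -1 on allocation failure.
 */
static int
map_add(struct map_list *ml, const char *from, const char *to)
{
	struct map_ent *me;
	const char *cur;

	assert(from != NULL && to != NULL);
	cur = map_find(ml, from);
	if (cur != NULL && strcmp(cur, to) == 0)
		return (0);
	me = malloc(sizeof(*me));
	if (me == NULL)
		return (-1);
	me->from = dup_str(from);
	me->to = dup_str(to);
	if (me->from == NULL || me->to == NULL) {
		free(me->from);
		free(me->to);
		free(me);
		return (-1);
	}
	TAILQ_INSERT_HEAD(ml, me, link);
	return (1);
}
```

The lookup before the insert is the cost paid at insertion; without it the
same cost is paid on each traversal of a list that carries the duplicates.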

> 
> One annotation to the script /etc/rc.d/ldconfig: I had expected that
> this script during boot creates clean files ld-elf(32).so.hints in
> /var/run. For 64 bit this is true, but for 32 bit not because ldconfig
> with flag -32 also has flag -m. Is this intended behaviour ?

This seems to have been so from the beginning, when ldconfig_local32 was
introduced in r154114.

Combined patch below.

diff --git a/libexec/rtld-elf/libmap.c b/libexec/rtld-elf/libmap.c
index 592b7664eea..33c824a65af 100644
--- a/libexec/rtld-elf/libmap.c
+++ b/libexec/rtld-elf/libmap.c
@@ -353,6 +353,7 @@ lm_add(const char *p, const char *f, const char *t)
 {
 	struct lm_list *lml;
 	struct lm *lm;
+	const char *t1;
 
 	if (p == NULL)
 		p = "$DEFAULT$";
@@ -362,11 +363,14 @@ lm_add(const char *p, const char *f, const char *t)
 	if ((lml = lmp_find(p)) == NULL)
 		lml = lmp_init(xstrdup(p));
 
-	lm = xmalloc(sizeof(struct lm));
-	lm->f = xstrdup(f);
-	lm->t = xstrdup(t);
-	TAILQ_INSERT_HEAD(lml, lm, lm_link);
-	lm_count++;
+	t1 = lml_find(lml, f);
+	if (t1 == NULL || strcmp(t1, t) != 0) {
+		lm = xmalloc(sizeof(struct lm));
+		lm->f = xstrdup(f);
+		lm->t = xstrdup(t);
+		TAILQ_INSERT_HEAD(lml, lm, lm_link);
+		lm_count++;
+	}
 }
 
 char *
diff --git a/libexec/rtld-elf/rtld.c b/libexec/rtld-elf/rtld.c
index dfd0388478f..83d5e28e287

Re: Constraints in libmap(32).conf do not work as expected, possible bug in rtld-elf

2018-09-21 Thread Konstantin Belousov

On Tue, Sep 11, 2018 at 10:47:58PM +0200, Andreas Longwitz wrote:
> Thanks very much for answer !
> 
> Now I use the following libmap32.conf:
> 
> ## php52
> [/usr/local/php52/lib/php/20060613/mysql.so]
> /usr/local/lib/mysql  /usr/local/lib32/mysql
> [/usr/local/php52/]
> /usr/local/lib  /usr/local/lib32
> 
> > I am having problem understanding what do you mean by step1/step2. The
> > refobj reference that you cache in the patch, comes into load_object()
> > as the pointer to the object which initiate the load_object() call. It
> > is NULL for preloaded objects, otherwise it is not.
> > So, could you, please, explain where does it get passed as NULL in your
> > case ?
> 
> Ok, I try to explain better:
> 
> step1 means: call of lm_find() in rtld.c (line 1500) and first argument
> is refobj->path, which is used in libmap.c to find the correct entry in
> the lmp_list.
> 
> step2 means: call of lm_findn() in rtld.c (line 2834) when called from
> search_library_path(). In this case the first argument is NULL and in
> libmap.c this means "$DEFAULT" entry in the lmp_list. Please notice that
> after reading libmap32.conf in lm_init() the entry $DEFAULT in the
> lmp_list does not exist, when all mappings are defined with constraints.

Can you try this instead ?

diff --git a/libexec/rtld-elf/rtld.c b/libexec/rtld-elf/rtld.c
index dfd0388478f..83d5e28e287 100644
--- a/libexec/rtld-elf/rtld.c
+++ b/libexec/rtld-elf/rtld.c
@@ -125,7 +125,7 @@ static void objlist_remove(Objlist *, Obj_Entry *);
 static int open_binary_fd(const char *argv0, bool search_in_path);
 static int parse_args(char* argv[], int argc, bool *use_pathp, int *fdp);
 static int parse_integer(const char *);
-static void *path_enumerate(const char *, path_enum_proc, void *);
+static void *path_enumerate(const char *, path_enum_proc, const char *, void *);
 static void print_usage(const char *argv0);
 static void release_object(Obj_Entry *);
 static int relocate_object_dag(Obj_Entry *root, bool bind_now,
@@ -140,7 +140,8 @@ static int rtld_dirname(const char *, char *);
 static int rtld_dirname_abs(const char *, char *);
 static void *rtld_dlopen(const char *name, int fd, int mode);
 static void rtld_exit(void);
-static char *search_library_path(const char *, const char *, int *);
+static char *search_library_path(const char *, const char *, const char *,
+    int *);
 static char *search_library_pathfds(const char *, const char *, int *);
 static const void **get_program_var_addr(const char *, RtldLockState *);
 static void set_program_var(const char *, const void *);
@@ -1576,8 +1577,7 @@ gnu_hash(const char *s)
 static char *
 find_library(const char *xname, const Obj_Entry *refobj, int *fdp)
 {
-	char *pathname;
-	char *name;
+	char *name, *pathname, *refobj_path;
 	bool nodeflib, objgiven;
 
 	objgiven = refobj != NULL;
@@ -1597,6 +1597,7 @@ find_library(const char *xname, const Obj_Entry *refobj, int *fdp)
 	}
 
 	dbg(" Searching for \"%s\"", name);
+	refobj_path = objgiven ? refobj->path : NULL;
 
 	/*
 	 * If refobj->rpath != NULL, then refobj->runpath is NULL.  Fall
@@ -1605,52 +1606,61 @@ find_library(const char *xname, const Obj_Entry *refobj, int *fdp)
 	 * nodeflib.
 	 */
 	if (objgiven && refobj->rpath != NULL && ld_library_path_rpath) {
-		pathname = search_library_path(name, ld_library_path, fdp);
+		pathname = search_library_path(name, ld_library_path,
+		    refobj_path, fdp);
 		if (pathname != NULL)
 			return (pathname);
 		if (refobj != NULL) {
-			pathname = search_library_path(name, refobj->rpath, fdp);
+			pathname = search_library_path(name, refobj->rpath,
+			    refobj_path, fdp);
 			if (pathname != NULL)
 				return (pathname);
 		}
 		pathname = search_library_pathfds(name, ld_library_dirs, fdp);
 		if (pathname != NULL)
 			return (pathname);
-		pathname = search_library_path(name, gethints(false), fdp);
+		pathname = search_library_path(name, gethints(false),
+		    refobj_path, fdp);
 		if (pathname != NULL)
 			return (pathname);
-		pathname = search_library_path(name, ld_standard_library_path, fdp);
+		pathname = search_library_path(name, ld_standard_library_path,
+		    refobj_path, fdp);
 		if (pathname != NULL)
 			return (pathname);
 	} else {
 		nodeflib = objgiven ? refobj->z_nodeflib : false;
 		if (objgiven) {
-			pathname = search_library_path(name, refobj->rpath, fdp);
+			pathname = search_library_path(name, refobj->rpath,
+

Re: Constraints in libmap(32).conf do not work as expected, possible bug in rtld-elf

2018-09-02 Thread Konstantin Belousov

On Sat, Sep 01, 2018 at 12:32:07AM +0200, Andreas Longwitz wrote:
> On a FreeBSD 10.4-STABLE r337823 (amd64) server I have to run some old
> php52 scripts from an FreeBSD 8.4-STABLE r284383 (i386) server. I have
> copied the old php software to /usr/local/php52, installed the ports
> misc/compat8x and misc/compat9x and have copied all missing 32-bit
> libraries from the old machine to /usr/local/lib32. With the following
> libmap32.conf everything works fine:
> 
> ## php52
> /usr/local/lib  /usr/local/lib32
> /usr/local/lib/mysql  /usr/local/lib32/mysql
> 
> Two examples:
> 
> -> ldd /usr/local/php52/bin/php
> /usr/local/php52/bin/php:
> libcrypt.so.5 => /usr/lib32/libcrypt.so.5 (0x28273000)
> librt.so.1 => /usr/lib32/librt.so.1 (0x28292000)
> libm.so.5 => /usr/lib32/libm.so.5 (0x28298000)
> libxml2.so.5 => /usr/local/lib32/libxml2.so.5 (0x282c2000)
> libz.so.5 => /usr/local/lib32/compat/libz.so.5 (0x283ec000)
> libiconv.so.3 => /usr/local/lib32/libiconv.so.3 (0x283fe000)
> libc.so.7 => /usr/lib32/libc.so.7 (0x284f2000)
> libthr.so.3 => /usr/lib32/libthr.so.3 (0x2866c000)
> 
> -> ldd /usr/local/php52/lib/php/20060613/mysql.so
> /usr/local/php52/lib/php/20060613/mysql.so:
> libmysqlclient.so.16 =>
> /usr/local/lib32/mysql/libmysqlclient.so.16 (0x28206000)
> libc.so.7 => /usr/lib32/libc.so.7 (0x2807)
> libcrypt.so.5 => /usr/lib32/libcrypt.so.5 (0x2835e000)
> libm.so.5 => /usr/lib32/libm.so.5 (0x2837d000)
> libz.so.5 => /usr/local/lib32/compat/libz.so.5 (0x283a7000)
> librt.so.1 => /usr/lib32/librt.so.1 (0x283b9000)
> libthr.so.3 => /usr/lib32/libthr.so.3 (0x283bf000)
> 
> Because I like to use constraints in libmap32.conf I changed the file to
> 
> ## php52
> [/usr/local/php52/]
> /usr/local/lib  /usr/local/lib32
> 
> [/usr/local/php52/lib/php/20060613/mysql.so]
> /usr/local/lib/mysql  /usr/local/lib32/mysql
> 
> The same examples as above shows that libmap does not work anymore:
> 
> -> ldd /usr/local/php52/bin/php
> /usr/local/php52/bin/php:
> libcrypt.so.5 => /usr/lib32/libcrypt.so.5 (0x28273000)
> librt.so.1 => /usr/lib32/librt.so.1 (0x28292000)
> libm.so.5 => /usr/lib32/libm.so.5 (0x28298000)
> libxml2.so.5 => not found (0)
> libz.so.5 => /usr/local/lib32/compat/libz.so.5 (0x282c2000)
> libiconv.so.3 => not found (0)
> libc.so.7 => /usr/lib32/libc.so.7 (0x282d4000)
> libthr.so.3 => /usr/lib32/libthr.so.3 (0x2844e000)
> 
> -> ldd /usr/local/php52/lib/php/20060613/mysql.so
> /usr/local/php52/lib/php/20060613/mysql.so:
>libmysqlclient.so.16 => not found (0)
>libc.so.7 => /usr/lib32/libc.so.7 (0x2807)
> 
> The constraints in libmap.conf are handled in rtld-elf with the help of
> contexts, and especially a "$DEFAULT$" context is used for all entries
> in libmap.conf before the first constraint statement. Now when rtld-elf
> loads a program the mapping rules from libmap.conf are applied. In a
> first step rtld-elf does a direct mapping using the correct context. But
> in a second step called "Searching for ..." rtld-elf uses always the
> "$DEFAULT$" context, which in the last example is empty. Therefore
> rtld-elf never finds a library in its searching step and cannot load
> programs where searching for libraries is necessary.
> 
> I cannot see any reason why rtld-elf should change the context between
> step1 and step2, The following patch provokes that rtld-elf uses the
> context from step1 in step2 too:
I am having a problem understanding what you mean by step1/step2. The
refobj reference that you cache in the patch comes into load_object()
as the pointer to the object which initiates the load_object() call. It
is NULL for preloaded objects, otherwise it is not.
So, could you please explain where it gets passed as NULL in your
case ?

Also, your patch makes the ref_object stuck for all future invocations
of load_object(), so it cannot be correct for this reason alone.

Another note is that the use you put libmap.conf to is quite a stretch
of its original purpose.  You should just add the paths
with your libraries to LD_32_LIBRARY_PATH or configure them into
/var/run/ld-elf32.so.hints using 'ldconfig -32'.

> 
> --- rtld.c.orig 2018-03-20 16:56:48.0 +0100
> +++ rtld.c  2018-08-31 23:17:18.051206000 +0200
> @@ -186,6 +186,7 @@
>  static Obj_Entry obj_rtld; /* The dynamic linker shared object */
>  static unsigned int obj_count; /* Number of objects in obj_list */
>  static unsigned int obj_loads; /* Number of loads of objects (gen count) */
> +static char *save_refobj_path;
> 
>  static Objlist list_global =   /* Objects dlopened with RTLD_GLOBAL */
>STAILQ_HEAD_INITIALIZER(list_global);
> @@ -1499,6 +1500,7 @@
>  if (libmap_disable || !objgiven ||
> (name = lm_find(refobj->path, xname)) == NULL)
> name = (char *)xname;

Heads up: OFED build by default

2018-08-07 Thread Konstantin Belousov

I am going to merge revisions r336568, r336569, and r336570 from HEAD to
stable/11. They enable the build of the OFED libraries by default, and
move the build of most of the utilities under the WITH_OFED_EXTRA knob.
Also, as a minor fix, since libpcap lives in /lib and depends on two OFED
libraries, these libraries are moved into /lib.

I do not expect that any user of stable/11 would be seriously affected by
the change.  A small detail is that the / filesystem becomes somewhat larger.
A slightly bigger detail is that if you really need OpenSM or some of the
utilities, you should use the WITH_OFED_EXTRA knob instead of WITH_OFED.

Re: Ryzen issues on FreeBSD ? (with sort of workaround)

2018-07-27 Thread Konstantin Belousov

On Fri, Jul 27, 2018 at 10:09:35AM -0400, Mike Tancsa wrote:
> On 7/27/2018 10:00 AM, Pete French wrote:
> > pkg install linux_base-c7
> 
> Same deal here
> 
> 0{ryzenbsd11}# /compat/linux/bin/bash
> Segmentation fault (core dumped)
> 139{ryzenbsd11}#
> 
> 
> This is stock FreeBSD image r335560.
By stock you mean that no patches were applied, right ?

> 
> 0{ryzenbsd11}# /compat/linux/bin/bash
> Segmentation fault (core dumped)
> 139{ryzenbsd11}#
> 
> 
> pid 58901 (gio-querymodules-64), uid 0: exited on signal 11 (core dumped)
> pid 58915 (bash), uid 0: exited on signal 11 (core dumped)
> pid 58997 (gio-querymodules-64), uid 0: exited on signal 11 (core dumped)
> pid 59027 (bash), uid 0: exited on signal 11 (core dumped)
> 
> 
>   ---Mike
> 
> 
> -- 
> ---
> Mike Tancsa, tel +1 519 651 3400 x203
> Sentex Communications, m...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada

Re: Ryzen issues on FreeBSD ? (with sort of workaround)

2018-07-27 Thread Konstantin Belousov

On Fri, Jul 27, 2018 at 12:01:09PM +0100, Pete French wrote:
> So, I have been running the patched kernel for quite a while now, and it
> works fine for me, but last night I did hit a surprising issue - the
> Linux emulator does not work on Ryzen / Epyc. I tried this on two
> machines (both with the patches) and it coredumps when simply running
> bash on both of them. I copied the OS over to an Intel machine, and that
> works fine.
> 
> I have not tried running with an unpatched kernel on the Ryzen machine
> (I don't have one to hand) but I did try applying the sysctls to the
> Intel box to see if that would cause the Linux binaries to crash. It didn't.

I highly doubt that this can be related.

BTW, I forgot about the patch.  If nothing happens, I will commit it
today.

Re: Ryzen issues on FreeBSD ? (with sort of workaround)

2018-07-05 Thread Konstantin Belousov

On Thu, Jul 05, 2018 at 02:58:29PM +0100, Pete French wrote:
> > Which other files ?
> 
> sys/x86/include/specialreg.h and sys/x86/x86/cpu_machdep.c
> 
> Those are in your original patch as well as the change
> to sys/amd64/amd64/initcpu.c, but your email earlier only
> patches sys/amd64/amd64/initcpu.c and not the others.
> 
> So I assumed I would keep the changes to the other two files ?


Right, I forgot about mwait. specialreg.h is cosmetics which I already
committed.

diff --git a/sys/amd64/amd64/initcpu.c b/sys/amd64/amd64/initcpu.c
index ccc5e64d0c4..bb342f42dec 100644
--- a/sys/amd64/amd64/initcpu.c
+++ b/sys/amd64/amd64/initcpu.c
@@ -130,6 +130,30 @@ init_amd(void)
}
}
 
+   /* Ryzen erratas. */
+   if (CPUID_TO_FAMILY(cpu_id) == 0x17 && CPUID_TO_MODEL(cpu_id) == 0x1 &&
+   (cpu_feature2 & CPUID2_HV) == 0) {
+   /* 1021 */
+   msr = rdmsr(0xc0011029);
+   msr |= 0x2000;
+   wrmsr(0xc0011029, msr);
+
+   /* 1033 */
+   msr = rdmsr(0xc0011020);
+   msr |= 0x10;
+   wrmsr(0xc0011020, msr);
+
+   /* 1049 */
+   msr = rdmsr(0xc0011028);
+   msr |= 0x10;
+   wrmsr(0xc0011028, msr);
+
+   /* 1095 */
+   msr = rdmsr(0xc0011020);
+   msr |= 0x200;
+   wrmsr(0xc0011020, msr);
+   }
+
/*
 * Work around a problem on Ryzen that is triggered by executing
 * code near the top of user memory, in our case the signal
diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
index d897d518cbc..3416f949686 100644
--- a/sys/x86/x86/cpu_machdep.c
+++ b/sys/x86/x86/cpu_machdep.c
@@ -709,6 +709,13 @@ cpu_idle_tun(void *unused __unused)
 
if (TUNABLE_STR_FETCH("machdep.idle", tunvar, sizeof(tunvar)))
cpu_idle_selector(tunvar);
+   else if (cpu_vendor_id == CPU_VENDOR_AMD &&
+   CPUID_TO_FAMILY(cpu_id) == 0x17 && CPUID_TO_MODEL(cpu_id) == 0x1) {
+   /* Ryzen erratas 1057, 1109. */
+   cpu_idle_selector("hlt");
+   idle_mwait = 0;
+   }
+
if (cpu_vendor_id == CPU_VENDOR_INTEL && cpu_id == 0x506c9) {
/*
 * Apollo Lake errata APL31 (public errata APL30).

Re: Ryzen issues on FreeBSD ? (with sort of workaround)

2018-07-05 Thread Konstantin Belousov

On Thu, Jul 05, 2018 at 02:23:15PM +0100, Pete French wrote:
> 
> 
> On 05/07/2018 11:47, Konstantin Belousov wrote:
> > Why do you state that they are saved/restored ?  What is the evidence ?
> 
> 
> https://software.intel.com/en-us/blogs/2009/06/25/virtualization-and-performance-understanding-vm-exits
> 
> specifically...
> 
> 3) "Save MSRs in the VM-exit MSR-store area."
> 
> and
> 
> 5) "Load MSRs from the VM-exit MSR-load area."
> 
> but maybe that's not actually true, I assumed it was given it's an Intel
> document, but admittedly it's not an actual specification.
This is true, but absolutely irrelevant.

Modern CPUs have hundreds, if not thousands, of MSRs.  Only some of
them define architectural state and are saved/restored on context switches.
Chicken bits are global knobs not relevant to the vmm entry.

> 
> 
> > On VM the patch should be NOP, testing it is a waste of time IMO.
> 
> 
> OK, will ignore that then. I am running the new patch on my workstation 
> now - I still need the old patch for the other files, yes ?
Which other files ?

Re: Kernel build fails

2018-07-05 Thread Konstantin Belousov

On Thu, Jul 05, 2018 at 12:54:50PM +0200, Dries Michiels wrote:
> Hello, 
> 
> 
> Today I wanted to upgrade to newest revision of the 11-stable branch but I
> have the following kernel build error:
> 
>  
> 
> --- kern_kthread.o ---
> 
> cc -target x86_64-unknown-freebsd11.2 --sysroot=/usr/obj/usr/src/tmp
> -B/usr/obj/usr/src/tmp/usr/bin -c -O2 -pipe -fno-strict-aliasing  -g
> -nostdinc  -I. -I/usr/src/sys -I/usr/src/sys/contrib/libfdt -D_KERNEL
> -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h  -fno-omit-frame-pointer
> -mno-omit-leaf-frame-pointer -MD  -MF.depend.kern_kthread.o
> -MTkern_kthread.o -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse
> -msoft-float  -fno-asynchronous-unwind-tables -ffreestanding -fwrapv
> -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs
> -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline
> -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__
> -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas
> -Wno-error-tautological-compare -Wno-error-empty-body
> -Wno-error-parentheses-equality -Wno-error-unused-function
> -Wno-error-pointer-sign -Wno-error-shift-negative-value
> -Wno-error-address-of-packed-member  -mno-aes -mno-avx  -std=iso9899:1999
> -Werror  /usr/src/sys/kern/kern_kthread.c
> 
> --- kern_exit.o ---
> 
> ctfconvert -L VERSION -g kern_exit.o
> 
> --- kern_jail.o ---
> 
> /usr/src/sys/kern/kern_jail.c:3943:15: error: unused variable 'p'
> [-Werror,-Wunused-variable]
> 
> struct proc *p;
> 
>  ^
> 
> /usr/src/sys/kern/kern_jail.c:3944:16: error: unused variable 'cred'
> [-Werror,-Wunused-variable]
> 
> struct ucred *cred;
> 
>   ^
> 
> 2 errors generated.

You have RACCT but not RCTL in the kernel config ?  Try this.

diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c
index b8bcbd49420..457a590cdf5 100644
--- a/sys/kern/kern_jail.c
+++ b/sys/kern/kern_jail.c
@@ -3988,8 +3988,10 @@ prison_racct_attach(struct prison *pr)
 static void
 prison_racct_modify(struct prison *pr)
 {
+#ifdef RCTL
struct proc *p;
struct ucred *cred;
+#endif
struct prison_racct *oldprr;
 
ASSERT_RACCT_ENABLED();

Re: Ryzen issues on FreeBSD ? (with sort of workaround)

2018-07-05 Thread Konstantin Belousov

On Thu, Jul 05, 2018 at 11:43:29AM +0100, Pete French wrote:
> > It does not make any sense to even try to access the chicken bits
> > MSRs when running under virtualization.  It is the duty of the
> > hypervisor to configure hardware. 
> 
> I would tend to agree with you :-) I was kind of surprised to read that they
> are actually saved and restored as part of a VM context switch in fact.
Why do you state that they are saved/restored ?  What is the evidence ?

> 
> > I updated the patch.
> 
> Thanks I shall try this now on my workstation and the Epyc virtual machine

On VM the patch should be NOP, testing it is a waste of time IMO.

Re: Ryzen issues on FreeBSD ? (with sort of workaround)

2018-07-05 Thread Konstantin Belousov

On Thu, Jul 05, 2018 at 11:13:10AM +0100, Pete French wrote:
> So, I got my first lockup in weeks, testing with the latest stable
> and the patch which sets the kernel bits. But I can't say if it's
> Ryzen related or not.
> 
> Meanwhile I also got access to an Epyc server in Azure. I am also
> running the latest STABLE on that to see how it goes. An interesting
> thing there is that there appears to be no access to the MSRs.
> They all appear as zero using cpucontrol. I am not entirely surprised
> by this as they are very low level, but I did think they were saved
> and restored during context switches between virtual machines so I
> was hoping to be able to set them. Is this normal ?

It does not make any sense to even try to access the chicken bits
MSRs when running under virtualization.  It is the duty of the
hypervisor to configure hardware.  I updated the patch.

diff --git a/sys/amd64/amd64/initcpu.c b/sys/amd64/amd64/initcpu.c
index ccc5e64d0c4..bb342f42dec 100644
--- a/sys/amd64/amd64/initcpu.c
+++ b/sys/amd64/amd64/initcpu.c
@@ -130,6 +130,30 @@ init_amd(void)
}
}
 
+   /* Ryzen erratas. */
+   if (CPUID_TO_FAMILY(cpu_id) == 0x17 && CPUID_TO_MODEL(cpu_id) == 0x1 &&
+   (cpu_feature2 & CPUID2_HV) == 0) {
+   /* 1021 */
+   msr = rdmsr(0xc0011029);
+   msr |= 0x2000;
+   wrmsr(0xc0011029, msr);
+
+   /* 1033 */
+   msr = rdmsr(0xc0011020);
+   msr |= 0x10;
+   wrmsr(0xc0011020, msr);
+
+   /* 1049 */
+   msr = rdmsr(0xc0011028);
+   msr |= 0x10;
+   wrmsr(0xc0011028, msr);
+
+   /* 1095 */
+   msr = rdmsr(0xc0011020);
+   msr |= 0x200;
+   wrmsr(0xc0011020, msr);
+   }
+
/*
 * Work around a problem on Ryzen that is triggered by executing
 * code near the top of user memory, in our case the signal

Re: Ryzen issues on FreeBSD ? (with sort of workaround)

2018-07-03 Thread Konstantin Belousov

On Tue, Jul 03, 2018 at 10:27:06AM +0100, Pete French wrote:
> 
> > It is very likely that the latest microcode sets the chicken bits for the
> > known erratas already.  AFAIK, this is the best that a ucode update
> > can typically do anyway.
> >
> 
> I just did some testing - it does do these bits:
By 'it' you mean the microcode update/BIOS on your board ?

> 
> 
>  cpucontrol -m '0xc0011029|=0x2000' $x
>  cpucontrol -m '0xc0011020|=0x10' $x
> 
> but it does not do these bits:
> 
>  cpucontrol -m '0xc0011028|=0x10' $x
>  cpucontrol -m '0xc0011020|=0x200' $x
> 
> (though someone else might want to double check that as I may have
> miscounted the bits!)
> 
> am going to try your patch today
> 
> -pete.

Re: Ryzen issues on FreeBSD ? (with sort of workaround)

2018-07-01 Thread Konstantin Belousov

On Sun, Jul 01, 2018 at 11:15:56AM +0100, Pete French wrote:
> > This should be the kernel patch equivalent to the script.
> 
> Ah, thank you. I shall give this a try on Tuesday when I am
> physically back in front of the machine. I have been trying without
> the patch as you asked by the way, and with the latest microcode
> update (0x8001137) it also seems stable, without these tweaks. But I
> haven't stressed it too much - if the errata says to set the bits then
> we should set the bits.

It is very likely that the latest microcode sets the chicken bits for the
known erratas already.  AFAIK, this is the best that a ucode update
can typically do anyway.

Re: Ryzen issues on FreeBSD ? (with sort of workaround)

2018-06-30 Thread Konstantin Belousov

On Tue, Jun 26, 2018 at 01:32:01PM +0100, Pete French wrote:
> the dmesg wraps around if I boot verbosely, but here are the contents of
> /var/log/messages from the time it starts to where it stops
> talking about CPU specific stuff... if you need something else then
> let me know - this is an easy machine to reboot and play about with.

This should be the kernel patch equivalent to the script.
According to the revision document, some of the errata are applicable
to Ryzen 2, but I do not want to do the bit tweaking without a
confirmation.

diff --git a/sys/amd64/amd64/initcpu.c b/sys/amd64/amd64/initcpu.c
index ccc5e64d0c4..aac3ccb7c73 100644
--- a/sys/amd64/amd64/initcpu.c
+++ b/sys/amd64/amd64/initcpu.c
@@ -130,6 +130,29 @@ init_amd(void)
}
}
 
+   /* Ryzen erratas. */
+   if (CPUID_TO_FAMILY(cpu_id) == 0x17 && CPUID_TO_MODEL(cpu_id) == 0x1) {
+   /* 1021 */
+   msr = rdmsr(0xc0011029);
+   msr |= 0x2000;
+   wrmsr(0xc0011029, msr);
+
+   /* 1033 */
+   msr = rdmsr(0xc0011020);
+   msr |= 0x10;
+   wrmsr(0xc0011020, msr);
+
+   /* 1049 */
+   msr = rdmsr(0xc0011028);
+   msr |= 0x10;
+   wrmsr(0xc0011028, msr);
+
+   /* 1095 */
+   msr = rdmsr(0xc0011020);
+   msr |= 0x200;
+   wrmsr(0xc0011020, msr);
+   }
+
/*
 * Work around a problem on Ryzen that is triggered by executing
 * code near the top of user memory, in our case the signal
diff --git a/sys/x86/include/specialreg.h b/sys/x86/include/specialreg.h
index 0ea6e61e652..c3900dadf05 100644
--- a/sys/x86/include/specialreg.h
+++ b/sys/x86/include/specialreg.h
@@ -998,18 +998,18 @@
 #define MSR_TOP_MEM 0xc001001a /* boundary for ram below 4G */
 #define MSR_TOP_MEM2 0xc001001d /* boundary for ram above 4G */
 #define MSR_NB_CFG1 0xc001001f /* NB configuration 1 */
+#define MSR_K8_UCODE_UPDATE 0xc0010020 /* update microcode */
+#define MSR_MC0_CTL_MASK 0xc0010044
 #define MSR_P_STATE_LIMIT 0xc0010061 /* P-state Current Limit Register */
 #define MSR_P_STATE_CONTROL 0xc0010062 /* P-state Control Register */
 #define MSR_P_STATE_STATUS 0xc0010063 /* P-state Status Register */
 #define MSR_P_STATE_CONFIG(n) (0xc0010064 + (n)) /* P-state Config */
 #define MSR_SMM_ADDR 0xc0010112 /* SMM TSEG base address */
 #define MSR_SMM_MASK 0xc0010113 /* SMM TSEG address mask */
+#define MSR_VM_CR 0xc0010114 /* SVM: feature control */
+#define MSR_VM_HSAVE_PA 0xc0010117 /* SVM: host save area address */
 #define MSR_EXTFEATURES 0xc0011005 /* Extended CPUID Features override */
 #define MSR_IC_CFG 0xc0011021 /* Instruction Cache Configuration */
-#define MSR_K8_UCODE_UPDATE 0xc0010020 /* update microcode */
-#define MSR_MC0_CTL_MASK 0xc0010044
-#define MSR_VM_CR 0xc0010114 /* SVM: feature control */
-#define MSR_VM_HSAVE_PA 0xc0010117 /* SVM: host save area address */
 
 /* MSR_VM_CR related */
 #define VM_CR_SVMDIS 0x10 /* SVM: disabled by BIOS */
diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
index d897d518cbc..3416f949686 100644
--- a/sys/x86/x86/cpu_machdep.c
+++ b/sys/x86/x86/cpu_machdep.c
@@ -709,6 +709,13 @@ cpu_idle_tun(void *unused __unused)
 
if (TUNABLE_STR_FETCH("machdep.idle", tunvar, sizeof(tunvar)))
cpu_idle_selector(tunvar);
+   else if (cpu_vendor_id == CPU_VENDOR_AMD &&
+   CPUID_TO_FAMILY(cpu_id) == 0x17 && CPUID_TO_MODEL(cpu_id) == 0x1) {
+   /* Ryzen erratas 1057, 1109. */
+   cpu_idle_selector("hlt");
+   idle_mwait = 0;
+   }
+
if (cpu_vendor_id == CPU_VENDOR_INTEL && cpu_id == 0x506c9) {
/*
 * Apollo Lake errata APL31 (public errata APL30).

Re: Ryzen issues on FreeBSD ? (with sort of workaround)

2018-06-26 Thread Konstantin Belousov

On Tue, Jun 26, 2018 at 11:31:26AM +0100, Pete French wrote:
> > On 06/18/2018 09:34, Pete French wrote:
> > > Presumably in the slightly longer term these workarounds go into the
> > > actual kernel if it detects Ryzen ?
> >
> > Yes, Kostik said he would code this into the kernel after he gets enough
> > feedback.
> 
> So, I've been running with the sysctl and cputl fixes from
> https://lists.freebsd.org/pipermail/freebsd-current/2018-June/069799.html
> for a couple of weeks now, with all the default settings back on (including
> SMT) and it now completely stable, so consider this one more point of feedback

If you run without the script, with the same settings, do you experience
problems ?

Also, please show the 100 first lines of the verbose boot dmesg on this
machine.

Re: regression: tmpfs in /etc/fstab results in boot stoppage

2018-06-17 Thread Konstantin Belousov

On Mon, Jun 18, 2018 at 08:46:35AM +1200, Jonathan Chen wrote:
> Hi,
> 
> I've updated to r335297 on STABLE-11. My root fs is ZFS, and /etc/fstab is:
> 
> # Device                Mountpoint         FStype  Options            Dump  Pass#
> /dev/gpt/irontree-swap  none               swap    sw                 0     0
> tmpfs                   /home/jonc/.cache  tmpfs   rw,size=512m,late  0     2
> 
> My /etc/rc.conf also contains:
> background_fsck="NO"
> 
> When I reboot the system, the kernel boots; but the startup scripts fail with:
> 
> fsck: exec fsck_tmpfs for tmpfs in /sbin/:/usr/sbin: No such file or directory
> THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
>   tmpfs: tmpfs (/home/jonc/.cache)
> 
> The system continues booting up as usual once I exit single-user mode.
> The system used to boot up without any intervention prior to this.

You specified an fsck pass for the tmpfs mount, which makes no sense and
cannot work since there is no fsck_tmpfs.  Fix your fstab.

Re: ldconfig(8) oddity on 11.2-BETA3?

2018-05-27 Thread Konstantin Belousov

On Sun, May 27, 2018 at 12:49:12PM +, Antoine Brodin wrote:
> On Sat, May 26, 2018 at 10:29 PM, Jonathan Chen  wrote:
> > Hi,
> >
> > I'm running 11.2-BETA3/amd64 at r334236, and I've noticed that
> > "ldconfig -m" doesn't behave as expected (or perhaps it's my
> > understanding).
> >
> > This is what I'm seeing when building security/nss in a chrooted 
> > environment:
> >
> > # ldconfig -r | grep nss
> > # ls /usr/local/lib/nss
> > libcrmf.a   libnss3.so  libnssutil3.so
> >  libssl3.so
> > libfreebl3.so   libnssckbi.so   libsmime3.so
> > libfreeblpriv3.so   libnssdbm3.so   libsoftokn3.so
> > # ldconfig -m /usr/local/lib/nss
> > # ldconfig -r | grep nss
> > search directories:
> > /lib:/usr/lib:/usr/local/lib:/usr/local/lib/perl5/5.26/mach/CORE:/usr/local/lib/nss
> > # ldconfig -R | grep nss
> > # ldconfig -r | grep nss
> > search directories:
> > /lib:/usr/lib:/usr/local/lib:/usr/local/lib/perl5/5.26/mach/CORE:/usr/local/lib/nss
> > # file /usr/local/lib/nss/*.so
> > /usr/local/lib/nss/libfreebl3.so: ELF 64-bit LSB shared object,
> > x86-64, version 1 (FreeBSD), dynamically linked, stripped
> > /usr/local/lib/nss/libfreeblpriv3.so: ELF 64-bit LSB shared object,
> > x86-64, version 1 (FreeBSD), dynamically linked, stripped
> > /usr/local/lib/nss/libnss3.so:ELF 64-bit LSB shared object,
> > x86-64, version 1 (FreeBSD), dynamically linked, stripped
> > /usr/local/lib/nss/libnssckbi.so: ELF 64-bit LSB shared object,
> > x86-64, version 1 (FreeBSD), dynamically linked, stripped
> > /usr/local/lib/nss/libnssdbm3.so: ELF 64-bit LSB shared object,
> > x86-64, version 1 (FreeBSD), dynamically linked, stripped
> > /usr/local/lib/nss/libnssutil3.so:ELF 64-bit LSB shared object,
> > x86-64, version 1 (FreeBSD), dynamically linked, stripped
> > /usr/local/lib/nss/libsmime3.so:  ELF 64-bit LSB shared object,
> > x86-64, version 1 (FreeBSD), dynamically linked, stripped
> > /usr/local/lib/nss/libsoftokn3.so:ELF 64-bit LSB shared object,
> > x86-64, version 1 (FreeBSD), dynamically linked, stripped
> > /usr/local/lib/nss/libssl3.so:ELF 64-bit LSB shared object,
> > x86-64, version 1 (FreeBSD), dynamically linked, stripped
> >
> > Is this correct ldconfig behaviour or has something broken?
> >
> 
> Hi,
> 
> This looks normal, from the ldconfig(8) man page:
> 
>   Filenames must conform to the lib*.so.[0-9] pattern in order to be
> added to the hints file.
For the ELF executable format, ld-elf.so.hints only contains the configured
library path.  In the normally configured case, the dynamic linker reads
the directories specified there when loading a library.

ldconfig -r does the same, but additionally filters the output by the
lib.*\.so\.[0-9]+ pattern.  The dynamic linker does not filter and uses the
name from DT_NEEDED as is.

Re: svn commit: r334152 - in stable/11/sys: amd64/amd64 amd64/include dev/cpuctl i386/include x86/acpica x86/include x86/x86

2018-05-25 Thread Konstantin Belousov

On Fri, May 25, 2018 at 01:21:00PM -0400, Mike Tancsa wrote:
> On 5/24/2018 9:17 AM, Konstantin Belousov wrote:
> > Author: kib
> > Date: Thu May 24 13:17:24 2018
> > New Revision: 334152
> > URL: https://svnweb.freebsd.org/changeset/base/334152
> > 
> > Log:
> >   MFC r334004:
> >   Add Intel Spec Store Bypass Disable control.
> >   
> >   This also includes the i386/include/pcpu.h part of the r334018.
> >   
> 
> Hi,
>   This commit broke my i386 nanobsd kernel.  GENERIC kernels build just
> fine, but the kernel I have been using for i386 alix and Soekris boxes
> no longer builds
> 
> Apart from removing some unneeded drivers from GENERIC, the CPU options
> I use are
> 
> 
> cpu I586_CPU
> options CPU_GEODE
> ident   ALIX_DSK
You do not have the SMP option in the config, right ?

> 
> /usr/src/sys/x86/x86/cpu_machdep.c:890:3: error: this function
> declaration is not a prototype
>   [-Werror,-Wstrict-prototypes]
> /usr/src/sys/x86/x86/cpu_machdep.c:890:17: error: expected ';' after
> expression
> CPU_FOREACH(i) {
>   ^
I forgot about merging r334064.

Re: extract the process arguments from the crashdump

2018-05-14 Thread Konstantin Belousov

On Mon, May 14, 2018 at 05:32:21PM +0500, Eugene M. Zheganin wrote:
> Hello,
> 
> On 14.05.2018 16:15, Konstantin Belousov wrote:
> > On Mon, May 14, 2018 at 01:02:28PM +0500, Eugene M. Zheganin wrote:
> >> Hello,
> >>
> >>
> >> Is there any way to extract the process arguments from the system
> >> crashdump ? If yes, could anyone please explain to me how do I do it.
> > ps -M vmcore.file -N /boot/mykernel/kernel -auxww
> 
> Well, unfortunately this gives me exactly same information as the 
> core.X.txt file contains - process names without arguments, and I really 
> want to know what arguments ctladm had when the system has crashed:

Most likely the in-kernel cache for the process arguments was dropped.

Re: extract the process arguments from the crashdump

2018-05-14 Thread Konstantin Belousov

On Mon, May 14, 2018 at 01:02:28PM +0500, Eugene M. Zheganin wrote:
> Hello,
> 
> 
> Is there any way to extract the process arguments from the system 
> crashdump ? If yes, could anyone please explain to me how do I do it.

ps -M vmcore.file -N /boot/mykernel/kernel -auxww

Re: 11.1-RELEASE-p10 cannot compile freebsd stable/11 kernel?

2018-05-14 Thread Konstantin Belousov

On Sun, May 13, 2018 at 08:06:47PM -0500, Mike Karels wrote:
> [details omitted]
> > > I know that clang has been updated a lot; has the kernel source gotten
> > > ahead of clang on stable/11?
> > On stable/11 they are in sync.  The official method of upgrade is
> > make buildworld buildkernel
> > from older version takes care of the compiler version transparently.
> > If you use config/make, ensure that the installed world is at the
> > compatible level for the kernel sources.
> 
> So the freebsd-update version is not in sync with the -stable branch?
> That was not at all obvious to me.  I upgrade from source on my -current
> test system, but normally use freebsd-update on my production systems
> (until it failed to update the kernel).

freebsd-update never follows stable.  re@ only provides updates for releases,
and for beta/RCs.  11.2-BETA1 was released three days ago, from which moment
you can update to it using freebsd-update.

Re: 11.1-RELEASE-p10 cannot compile freebsd stable/11 kernel?

2018-05-13 Thread Konstantin Belousov

On Sun, May 13, 2018 at 09:58:29AM -0500, Mike Karels wrote:
> I attempted a kernel compile from stable/11, as a freebsd-update didn't
> seem to update /usr/src/sys, and I'm running a custom kernel.  I get
> compile errors like this:
> 
> ../../../amd64/amd64/support.S:829:2: error: unknown directive
>  .altmacro
>  ^
> :1:13: error: invalid register name
> handle_ibrs_%(ll):
>   ^~
> :3:2: note: while in macro instantiation
>  ibrs_seq_label %(ll)
>  ^
> :2:2: note: while in macro instantiation
>  .rept 32
>  ^
> ../../../amd64/amd64/support.S:858:2: note: while in macro instantiation
>  ibrs_seq 32
>  ^
> :1:13: error: invalid register name
> handle_ibrs_%(ll):
>   ^~
> :8:2: note: while in macro instantiation
>  ibrs_seq_label %(ll)
>  ^
> :2:2: note: while in macro instantiation
>  .rept 32
>  ^
> ../../../amd64/amd64/support.S:858:2: note: while in macro instantiation
>  ibrs_seq 32
>  ^
> :1:13: error: invalid register name
> handle_ibrs_%(ll):
> (and this continues)
> 
> I had just run freebsd-update:
> 
> pughole# freebsd-version
> 11.1-RELEASE-p10
> 
> 
> pughole# cc --version
> FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on 
> LLVM 4.0.0)
> Target: x86_64-unknown-freebsd11.1
> Thread model: posix
> InstalledDir: /usr/bin
> 
> I know that clang has been updated a lot; has the kernel source gotten
> ahead of clang on stable/11?
On stable/11 they are in sync.  The official method of upgrade,
make buildworld buildkernel
from an older version, takes care of the compiler version transparently.
If you use config/make, ensure that the installed world is at a level
compatible with the kernel sources.

Another option might be to stop using the integrated clang assembler; there
is some option in our build system for that, but I do not know it well
enough to remember.

Re: pdeathsig_helper and .debug/pdeathsig_helper.debug placed in /mnt when DESTDIR=/mnt

2018-05-09 Thread Konstantin Belousov

On Wed, May 09, 2018 at 09:19:12PM +0200, Trond Endrestøl wrote:
> On Wed, 9 May 2018 20:46+0200, Trond Endrestøl wrote:
> 
> > But this one's persistent:
> > 
> > --- realinstall_subdir_tests ---
> > --- subr_unit_test.install ---
> > (cd /usr/src/tests/sys/kern &&  DEPENDFILE=.depend.subr_unit_test  
> > NO_SUBDIR=1 make -f /usr/src/tests/sys/kern/Makefile _RECURSING_PROGS=t  
> > PROG=subr_unit_test  install)
> > --- realinstall_subdir_usr.sbin ---
> > install  -o root -g wheel -m 444  ancontrol.debug 
> > /mnt/usr/lib/debug/usr/sbin/ancontrol.debug
> > --- maninstall ---
> > install  -o root -g wheel -m 444 ancontrol.8.gz  /mnt/usr/share/man/man8/
> > --- realinstall_subdir_usr.sbin/wlandebug ---
> > ===> usr.sbin/wlandebug (install)
> > --- realinstall_subdir_tests ---
> > --- _proginstall ---
> > install  -s -o root -g wheel -m 555   subr_unit_test 
> > /mnt/usr/tests/sys/kern/subr_unit_test
> > install  -o root -g wheel -m 444  subr_unit_test.debug 
> > /mnt/usr/lib/debug/usr/tests/sys/kern/subr_unit_test.debug
> > --- realinstall_subdir_usr.sbin ---
> > --- _proginstall ---
> > install  -s -o root -g wheel -m 555   wlandebug /mnt/usr/sbin/wlandebug
> > --- realinstall_subdir_tests ---
> > --- Kyuafile ---
> > sh: cannot create Kyuafile.tmp: Read-only file system
> > sh: cannot create Kyuafile.tmp: Read-only file system
> > *** [Kyuafile] Error code 2
> > 
> > make[6]: stopped in /usr/src/tests/sys/kern
> > 1 error
> 
> In case it helps, here's the result from a single-job run:
> 
> ===> tests/sys/geom/class/uzip (install)
> install  -o root  -g wheel -m 555  1_test  
> /mnt/usr/tests/sys/geom/class/uzip/1_test
> install  -o root -g wheel  -m 444 
> /usr/src/tests/sys/geom/class/uzip/etalon/etalon.txt 
> /mnt/usr/tests/sys/geom/class/uzip/etalon/
> install  -o root  -g wheel -m 444  Kyuafile  
> /mnt/usr/tests/sys/geom/class/uzip/Kyuafile
> install  -o root -g wheel  -m 444 /usr/src/tests/sys/geom/class/uzip/conf.sh 
> 1.img.uzip.uue /mnt/usr/tests/sys/geom/class/uzip/
> ===> tests/sys/kern (install)
> /tmp/install.84PWkNVS/sh: cannot create Kyuafile.tmp: Read-only file system
This error probably means that you did not correctly rebuild the world
before attempting the installation.

> *** Error code 2
> 
> Stop.
> make[6]: stopped in /usr/src/tests/sys/kern
> 
> -- 
> Trond.

Re: pdeathsig_helper and .debug/pdeathsig_helper.debug placed in /mnt when DESTDIR=/mnt

2018-05-09 Thread Konstantin Belousov

On Wed, May 09, 2018 at 06:44:09PM +0200, Trond Endrestøl wrote:
> On Wed, 9 May 2018 16:50+0300, Konstantin Belousov wrote:
> 
> > On Wed, May 09, 2018 at 02:49:34PM +0200, Trond Endrestøl wrote:
> > > I noticed two new entries in / after running make installworld today 
> > > using amd64 stable/11 r90:
> > > 
> > > # LANG=en_US.UTF-8 ls -lT /pdeathsig_helper /.debug/pdeathsig_helper.debug
> > > -r--r--r--  1 root  wheel  7528 May  9 12:06:58 2018 
> > > /.debug/pdeathsig_helper.debug
> > > -r-xr-xr-x  1 root  wheel  8576 May  9 12:06:58 2018 /pdeathsig_helper
> > > 
> > > I'm sure these belong in /usr/tests/sys/kern. This must have happened 
> > > in r333162 or shortly after.
> > > 
> > > See PR 228018, which I hijacked.
> > 
> > Can you try this ?
> > 
> > Index: tests/sys/kern/Makefile
> > ===
> > --- tests/sys/kern/Makefile (revision 333400)
> > +++ tests/sys/kern/Makefile (working copy)
> > @@ -4,6 +4,7 @@
> >  .PATH: ${SRCTOP}/sys/kern
> >  
> >  TESTSDIR=  ${TESTSBASE}/sys/kern
> > +BINDIR=${TESTDIR}
There is a typo; it should be TESTSDIR.

Index: tests/sys/kern/Makefile
===
--- tests/sys/kern/Makefile (revision 333400)
+++ tests/sys/kern/Makefile (working copy)
@@ -4,6 +4,7 @@
 .PATH: ${SRCTOP}/sys/kern
 
 TESTSDIR=  ${TESTSBASE}/sys/kern
+BINDIR=${TESTSDIR}
 
 ATF_TESTS_C+=  kern_copyin
 ATF_TESTS_C+=  kern_descrip_test

Re: pdeathsig_helper and .debug/pdeathsig_helper.debug placed in /mnt when DESTDIR=/mnt

2018-05-09 Thread Konstantin Belousov

On Wed, May 09, 2018 at 02:49:34PM +0200, Trond Endrestøl wrote:
> I noticed two new entries in / after running make installworld today 
> using amd64 stable/11 r90:
> 
> # LANG=en_US.UTF-8 ls -lT /pdeathsig_helper /.debug/pdeathsig_helper.debug
> -r--r--r--  1 root  wheel  7528 May  9 12:06:58 2018 
> /.debug/pdeathsig_helper.debug
> -r-xr-xr-x  1 root  wheel  8576 May  9 12:06:58 2018 /pdeathsig_helper
> 
> I'm sure these belong in /usr/tests/sys/kern. This must have happened 
> in r333162 or shortly after.
> 
> See PR 228018, which I hijacked.

Can you try this ?

Index: tests/sys/kern/Makefile
===
--- tests/sys/kern/Makefile (revision 333400)
+++ tests/sys/kern/Makefile (working copy)
@@ -4,6 +4,7 @@
 .PATH: ${SRCTOP}/sys/kern
 
 TESTSDIR=  ${TESTSBASE}/sys/kern
+BINDIR=${TESTDIR}
 
 ATF_TESTS_C+=  kern_copyin
 ATF_TESTS_C+=  kern_descrip_test

Re: KBI unexpexted change in stable/11 ?

2018-03-29 Thread Konstantin Belousov

On Thu, Mar 29, 2018 at 09:21:43PM +0300, Slawa Olhovchenkov wrote:
> On Wed, Mar 28, 2018 at 11:25:08PM -0700, Kevin Oberman wrote:
> 
> > > > > r325665 is the previous point and is good.
> > > > > r331615 crashed.
> > > > > Can I use some script for bisect?
> > > >
> > > > I'm not aware of a script for this.  The only tool I've used is "git
> > > > bisect", which is very handy if you're already familiar with git.
> > >
> > > You may want to try devel/p5-App-SVN-Bisect.  Never used it, so
> > > no idea if it's functional or helpful, just found it doing a quick
> > > search
> > 
> > It would be nice if this could be fixed, but it is the case.
> 
> r328475 bad (tzdata)
> r328469 in progress (kib, sys/vm)
> r328463 in progress (don't touch kernel)
> r328462 good
> 
> I think r328469 breaks the KBI.
This commit is basically required to avoid significant confusion between
HEAD and stable code.  Its absence might make other merges impossible or
quite hard.

In fact, my opinion is that the real bug is elsewhere.  We do provide the
vm_map_min/max KPI, but the KPI is not KBI-stable because it encodes the
struct vm_map layout into binaries which (correctly) use the KPI instead
of directly accessing struct vm_map.  This should be fixed; I put up the
review https://reviews.freebsd.org/D14902 for it.
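
To illustrate the failure mode described above, here is a minimal
user-space sketch.  All names in it (demo_map_v1, demo_map_v2,
module_inline_min, kernel_map_min) are hypothetical; this is not the real
struct vm_map, nor the change proposed in D14902.  It only assumes that an
inline accessor compiles the field offset into the caller, while an
out-of-line accessor is rebuilt together with the structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout that an old module was compiled against. */
struct demo_map_v1 {
	unsigned long min_offset;
	unsigned long max_offset;
};

/* Hypothetical layout after a merge added a field at the front. */
struct demo_map_v2 {
	unsigned long flags;
	unsigned long min_offset;
	unsigned long max_offset;
};

/* Read an unsigned long at a byte offset, as compiled code effectively does. */
static unsigned long
read_field(const void *obj, size_t off)
{
	unsigned long v;

	memcpy(&v, (const char *)obj + off, sizeof(v));
	return (v);
}

/*
 * What an inline accessor amounts to once compiled into an old module:
 * the v1 offset of min_offset is frozen into the module binary.
 */
static unsigned long
module_inline_min(const void *map)
{
	assert(map != NULL);
	return (read_field(map, offsetof(struct demo_map_v1, min_offset)));
}

/*
 * An out-of-line accessor is compiled together with the structure
 * definition, so it always uses the current layout.
 */
static unsigned long
kernel_map_min(const struct demo_map_v2 *map)
{
	assert(map != NULL);
	return (map->min_offset);
}
```

Given a demo_map_v2 with flags set to 7 and min_offset set to 0x1000,
module_inline_min() returns 7, because the frozen offset now lands on
flags, while kernel_map_min() returns 0x1000.  The source-level interface
is unchanged in both cases; only the binary contract differs.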

Re: Now that the meltdown-patches are in STABLE...

2018-02-26 Thread Konstantin Belousov

On Mon, Feb 26, 2018 at 09:47:26PM +0100, Rainer Duffner wrote:
> Am I right to assume they aren't being backported to 11.1 and we have to 
> wait for 11.2?
> 
> 
> Don't get me wrong - I'd rather have a stable system than random reboots 
> during the daily or weekly runs.
> 
> But for my own planning, I'd really like to know what the way forward is 
> going to look like.
> 
> 
> https://wiki.freebsd.org/SpeculativeExecutionVulnerabilities
> 
> Isn't too helpful in this matter.

I put the snapshot of the WIP of the merge to 11.1 at
https://kib.kiev.ua/kib/amd64_11.1_meltdown.1.patch

I only compiled this on stable/11; I have not even booted it.  I suspect that
it does not compile on 11.1, because the patch apparently depends on some
assembler features that were only added in clang 5.0.

It would be useful if somebody ran a runtime test of the patch.

Re: [HEADS UP] - OFED/RDMA stack update

2018-02-26 Thread Konstantin Belousov

On Mon, Feb 26, 2018 at 02:21:39PM -0800, Navdeep Parhar wrote:
> +freebsd-arch@
> 
> Hi Meny,
> 
> Can you please post the KPI/KBI analysis that you generated to some
> public location and provide a link here?  A straight MFC would be a
> major break of KPI/KBI in -STABLE and the options we're looking at are:
I put the report at
https://kib.kiev.ua/kib/ibcore_11_to_11_merged_compat_report.html

> 
> a) Ignore the breakage and let downstream consumers deal with the
> fallout.  This obviously isn't ideal in a -STABLE branch.
> 
> b) Provide compat shims to at least preserve the KPI.  One challenge is
> that the changes include functions with the same name but different
> signature/behavior.  See, for example, ib_create_cq in Meny's list once
> he publishes it.
The project has handled similar issues before.  One approach is to
rename the ib_create_cq with the new signature to ib_create_cq_n12, and
to check for a symbol such as _WANT_NEW_OFED to select one or the other:
#ifdef _WANT_NEW_OFED
#define ib_create_cq(new args there) ib_create_cq_n12(new args there)
#else
#define ib_create_cq (ib_create_cq)
#endif

Then a ULP that wants the new KPI defines _WANT_NEW_OFED.
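
A self-contained sketch of the renaming approach described above.  All
names here (demo_cq, demo_create_cq, demo_create_cq_n12,
DEMO_WANT_NEW_OFED) are hypothetical stand-ins, not the real OFED
declarations, and both signatures are invented for illustration.

```c
#include <assert.h>

/*
 * Hypothetical opt-in symbol.  In real use a consumer would define it
 * before including the header; it is defined here only so that the
 * sketch exercises the new path.
 */
#define DEMO_WANT_NEW_OFED 1

struct demo_cq {
	int entries;
	int comp_vector;
};

/* Old signature: keeps its name, so existing consumers are unaffected. */
static struct demo_cq
demo_create_cq(int entries)
{
	struct demo_cq cq = { entries, 0 };

	return (cq);
}

/* New signature: lives under a suffixed name to avoid the clash. */
static struct demo_cq
demo_create_cq_n12(int entries, int comp_vector)
{
	struct demo_cq cq = { entries, comp_vector };

	assert(entries > 0);
	return (cq);
}

#ifdef DEMO_WANT_NEW_OFED
/* Consumers that opted in get the new function under the old name. */
#define demo_create_cq(entries, vec) demo_create_cq_n12((entries), (vec))
#endif

/* A consumer written against the new interface. */
static struct demo_cq
new_consumer(void)
{
	return (demo_create_cq(16, 3));
}

/*
 * Parenthesizing the name suppresses function-like macro expansion,
 * so this still reaches the old function.
 */
static struct demo_cq
old_consumer(void)
{
	return ((demo_create_cq)(16));
}
```

Here new_consumer() ends up in demo_create_cq_n12(), while old_consumer()
still calls the unchanged old function.  The old function keeps its name
and signature, which is what lets consumers that never define the opt-in
symbol build as before.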

> 
> c) Have two versions of the OFED interfaces in 11-STABLE and not break
> existing downstream consumers at all.
It is possible to make them loadable simultaneously as modules, but it
would be quite confusing to users: Mellanox clearly wants mlx5_ib and
mlx4_ib to work only with the new OFED, while cxgbe would use the old OFED?

Also, we would need to mess with either the ibcore.ko module name or its
version.  I am not sure that our module handling is robust enough
to make the version trick possible.

> 
> I've reached out to users that I know of and know will be affected.
> If you use OFED and FreeBSD 11 this would be a good time to weigh
> in with your thoughts, ideas, concerns etc..
