Re: Current amd64 new error or warning from today's current with ruby r320323

2017-06-25 Thread Konstantin Belousov
On Sun, Jun 25, 2017 at 10:09:07AM -0700, Manfred Antar wrote:
> 
> > On Jun 25, 2017, at 9:41 AM, Konstantin Belousov <kostik...@gmail.com> 
> > wrote:
> > 
> > On Sun, Jun 25, 2017 at 08:21:33AM -0700, Manfred Antar wrote:
> >> 
> >>> On Jun 25, 2017, at 7:50 AM, Konstantin Belousov <kostik...@gmail.com> 
> >>> wrote:
> >>> 
> >>> On Sun, Jun 25, 2017 at 07:43:25AM -0700, Manfred Antar wrote:
> >>>> maybe message got reformatted in mail program (mac mail).
> >>>> could you send me a tar file of the patch?
> >>>> also not sure if "patch -p1 <patchfile" is the correct invocation of
> >>>> patch
> >>>> 
> >>>> you could cc r...@pozo.com, that way i have copy
> >>>> on freebsd box and on mac.
> >>> 
> >>> https://people.freebsd.org/~kib/misc/vm2.1.patch 
> >> 
> >> OK, patched and built a new kernel and
> >> rebooted:
> >> same ruby message. So it must be a ruby thing.
> >> new kdump.txt at http://www.pozo.com/kernel/kdump.txt 
> >> 
> >> also I'll put a copy of my kernel config in the same directory:
> >> 
> >> http://www.pozo.com/kernel/pozo
> >> 
> >> only one module is being loaded at boot:
> >> (kernel)4908}kldstat
> >> Id Refs Address    Size     Name
> >>  1    5 0x8020     10380a8  kernel
> >>  2    1 0x8123a000 e13f50   nvidia.ko
> >> 
> >> I can disable nvidia if it helps as I really only access this machine over 
> >> the net or serial console.
> >> 
> > No need: thanks to the ktrace log, I understood why MAP_STACK failed in
> > this case. This is indeed something ruby-specific, or rather, triggered
> > by ruby's special use of libthr. It is not related to the main stack
> > split.
> > 
> > It seems that ruby requested a very small stack for a new thread, only 5
> > pages in size. With this size the stack gap is correctly calculated as
> > zero, because the whole stack is allocated by the initial grow. But then
> > there is no space left for the guard page, so mapping the guard page
> > failed, and with it the whole stack mapping.
> > 
> > Try this.
> > https://people.freebsd.org/~kib/misc/vm2.2.patch
> 
> Seems to have worked:
> 
> (~)4933}ruby -v
> ruby 2.3.4p301 (2017-03-30 revision 58214) [amd64-freebsd12]
> (~)4934}
> 
> No more message. Do you want a new ktrace?

Thanks for testing.  You might post the trace.
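A minimal sketch of the userland side of the failure described above, assuming a POSIX threads library: a thread is created with a deliberately tiny stack of five pages, the size mentioned in the analysis. This is not ruby's actual code, and the helper names small_stack_worker() and run_on_small_stack() are hypothetical. Presumably the thread-stack mapping done inside pthread_create() is where the MAP_STACK failure would surface on an unpatched kernel of that era; on a fixed kernel, or on other systems, the call should simply succeed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Thread body that uses almost no stack: it only records that it ran. */
static void *
small_stack_worker(void *arg)
{
	int *ran = arg;

	*ran = 1;
	return (NULL);
}

/*
 * Hypothetical helper: start one thread whose stack is `pages` pages
 * long and wait for it.  Returns 0 on success, otherwise the error
 * code from pthread_attr_setstacksize(), pthread_create() or
 * pthread_join().
 */
static int
run_on_small_stack(size_t pages, int *ran)
{
	pthread_attr_t attr;
	pthread_t tid;
	size_t pagesz;
	int error;

	*ran = 0;
	pagesz = (size_t)sysconf(_SC_PAGESIZE);
	error = pthread_attr_init(&attr);
	if (error != 0)
		return (error);
	/* With 4 KiB pages, 5 pages is 20480 bytes. */
	error = pthread_attr_setstacksize(&attr, pages * pagesz);
	if (error == 0)
		error = pthread_create(&tid, &attr, small_stack_worker, ran);
	if (error == 0)
		error = pthread_join(tid, NULL);
	pthread_attr_destroy(&attr);
	return (error);
}
```

Calling run_on_small_stack(5, &ran) returns 0 and sets ran to 1 when the small stack could be set up; a non-zero return is the pthread error code.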
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Current amd64 new error or warning from today's current with ruby r320323

2017-06-25 Thread Konstantin Belousov
On Sun, Jun 25, 2017 at 08:51:07PM +0200, Trond Endrestøl wrote:
> > https://people.freebsd.org/~kib/misc/vm2.2.patch
> 
> This patch made ruby23 happy on my end. Can't say the same for 
> emacs-nox11 or temacs while building the former.

What exactly do you mean?  Explain the behaviour and show the ktrace log.


Re: r320358 panics immediately on boot / AMD64-GENERIC kernel

2017-06-27 Thread Konstantin Belousov
On Tue, Jun 27, 2017 at 12:09:56PM -0400, Michael Jung wrote:
> Screen image with backtrace
> 
> https://pasteboard.co/dZRVG5Uo.jpg
> 
> 
> After upgrading from 318959 to 320358 I immediately get the attached 
> panic.
> 
> This is AMD64 / GENERIC kernel.
> 
> /boot/loader.conf is empty.
> 
> The system boots off a UFS2 partition.
> 
> This is a virtual guest and I do not currently have a serial cable.
> 
> Short of figuring out how to virtualize a serial console from an ESXi
> guest, is there any more information I can provide?

Rebuild the kernel after removing the kernel build directory.


Re: ino64? r318606 -> r318739 OK; r318739 -> r318781 fails SIGSEGV

2017-05-24 Thread Konstantin Belousov
On Wed, May 24, 2017 at 04:59:31AM -0700, David Wolfskill wrote:
> Yesterday's in-place src update (r318606 -> r318739) was a bit more
> "interesting" than usual; as has been noted elsewhere, it really is
> necessary to boot the new kernel before the "make installworld"
> completes successfully.  That said, it ("make installworld") did
> complete successfully, and a followup reboot/smoke test worked without
> incident.
> 
> For today's update, sources are now at r318781; both laptop and build
> machine fail identically during ">>> stage 4.2: building libraries":
> 
> ...
> Building /common/S4/obj/usr/src/lib/libc/strsignal.pico
> Building /common/S4/obj/usr/src/lib/libc/libc.a
> --- libc.a ---
> building static c library
> Building /common/S4/obj/usr/src/lib/libc/libc.so.7
> Building /common/S4/obj/usr/src/lib/libc/libc_pic.a
> --- libc.so.7 ---
> building shared library libc.so.7
> --- libc_pic.a ---
> building special pic c library
> --- libc.so.7 ---
> cc: error: unable to execute command: Segmentation fault (core dumped)
> cc: error: linker command failed due to signal (use -v to see invocation)
> *** [libc.so.7] Error code 254
> 
> bmake[4]: stopped in /usr/src/lib/libc
> .ERROR_TARGET='libc.so.7'
> .ERROR_META_FILE='/common/S4/obj/usr/src/lib/libc/libc.so.7.meta'
> .MAKE.LEVEL='4'
> MAKEFILE=''
> .MAKE.MODE='meta missing-filemon=yes missing-meta=yes silent=yes verbose'
> .CURDIR='/usr/src/lib/libc'
> .MAKE='/usr/obj/usr/src/make.amd64/bmake'
> .OBJDIR='/usr/obj/usr/src/lib/libc'
> .TARGETS='all'
> DESTDIR='/usr/obj/usr/src/tmp'
> LD_LIBRARY_PATH=''
> MACHINE='amd64'
> MACHINE_ARCH='amd64'
> MAKEOBJDIRPREFIX='/usr/obj'
> MAKESYSPATH='/usr/src/share/mk'
> MAKE_VERSION='20160604'
> PATH='/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin'
> SRCTOP='/usr/src'
> OBJTOP='/usr/obj/usr/src'
> .MAKE.MAKEFILES='/usr/src/share/mk/sys.mk /usr/src/share/mk/local.sys.env.mk 
> /usr/src/share/mk/src.sys.env.mk /etc/src-env.conf 
> /usr/src/share/mk/bsd.mkopt.mk /usr/src/share/mk/bsd.suffixes.mk 
> /etc/make.conf /usr/src/share/mk/local.sys.mk /usr/src/share/mk/src.sys.mk 
> /etc/src.conf /usr/src/lib/libc/Makefile /usr/src/share/mk/src.opts.mk 
> /usr/src/share/mk/bsd.own.mk /usr/src/share/mk/bsd.opts.mk 
> /usr/src/share/mk/bsd.cpu.mk /usr/src/share/mk/bsd.compiler.mk 
> /usr/src/share/mk/bsd.compiler.mk /usr/src/lib/libc/amd64/Makefile.inc 
> /usr/src/lib/libc/db/Makefile.inc /usr/src/lib/libc/db/btree/Makefile.inc 
> /usr/src/lib/libc/db/db/Makefile.inc /usr/src/lib/libc/db/hash/Makefile.inc 
> /usr/src/lib/libc/db/man/Makefile.inc /usr/src/lib/libc/db/mpool/Makefile.inc 
> /usr/src/lib/libc/db/recno/Makefile.inc 
> /usr/src/lib/libc/compat-43/Makefile.inc /usr/src/lib/libc/gdtoa/Makefile.inc 
> /usr/src/lib/libc/gen/Makefile.inc /usr/src/lib/libc/amd64/gen/Makefile.inc 
> /usr/src/lib/libc/gmon/Makefile.inc /usr/src/lib/libc/iconv/Makefile.inc
> /usr/src/lib/libc_nonshared/Makefile.iconv
> /usr/src/lib/libc/inet/Makefile.inc /usr/src/lib/libc/isc/Makefile.inc
> /usr/src/lib/libc/locale/Makefile.inc /usr/src/lib/libc/md/Makefile.inc
> /usr/src/lib/libc/nameser/Makefile.inc /usr/src/lib/libc/net/Makefile.inc
> /usr/src/lib/libc/nls/Makefile.inc /usr/src/lib/libc/posix1e/Makefile.inc
> /usr/src/lib/libc/regex/Makefile.inc /usr/src/lib/libc/resolv/Makefile.inc
> /usr/src/lib/libc/stdio/Makefile.inc /usr/src/lib/libc/stdlib/Makefile.inc
> /usr/src/lib/libc/amd64/stdlib/Makefile.inc
> /usr/src/lib/libc/stdlib/jemalloc/Makefile.inc
> /usr/src/lib/libc/stdtime/Makefile.inc /usr/src/lib/libc/string/Makefile.inc
> /usr/src/lib/libc/amd64/string/Makefile.inc /usr/src/lib/libc/sys/Makefile.inc
> /usr/src/sys/sys/syscall.mk /usr/src/lib/libc/amd64/sys/Makefile.inc
> /usr/src/lib/libc/secure/Makefile.inc /usr/src/lib/libc/rpc/Makefile.inc
> /usr/src/lib/libc/uuid/Makefile.inc /usr/src/lib/libc/xdr/Makefile.inc
> /usr/src/lib/libc/x86/sys/Makefile.inc /usr/src/lib/libc/yp/Makefile.inc /
> .PATH='. /usr/src/lib/libc /usr/src/lib/libc/db/btree /usr/src/lib/libc/db/db 
> /usr/src/lib/libc/db/hash /usr/src/lib/libc/db/man /usr/src/lib/libc/db/mpool 
> /usr/src/lib/libc/db/recno /usr/src/lib/libc/compat-43 
> /usr/src/lib/libc/gdtoa /usr/src/lib/libc/amd64/gen /usr/src/lib/libc/gen 
> /usr/src/contrib/libc-pwcache /usr/src/contrib/libc-vis 
> /usr/src/lib/libc/gmon /usr/src/lib/libc/iconv /usr/src/lib/libc/inet 
> /usr/src/lib/libc/isc /usr/src/lib/libc/locale /usr/src/lib/libmd 
> /usr/src/lib/libc/nameser /usr/src/lib/libc/net /usr/src/lib/libc/nls 
> /usr/src/lib/libc/posix1e /usr/src/lib/libc/regex /usr/src/lib/libc/resolv 
> /usr/src/lib/libc/stdio /usr/src/lib/libc/amd64/stdlib 
> /usr/src/lib/libc/stdlib /usr/src/lib/libc/stdlib/jemalloc 
> /usr/src/lib/libc/stdtime /usr/src/contrib/tzcode/stdtime 
> /usr/src/lib/libc/amd64/string /usr/src/lib/libc/string /usr/src/sys/libkern 
> 

Re: ino64: desastrous update - recommendations not working!!!

2017-05-24 Thread Konstantin Belousov
On Wed, May 24, 2017 at 12:42:19PM +0200, Hartmann, O. wrote:
> On almost every CURRENT that has been updated according to UPDATING
> entry 2017-05-23 regarding ino64, the recommended update process ends
> up in a disaster or, if the old environment/kernel is intact, it
> doesn't work.
> 
> Procedure:
> 
> make -jX buildworld buildkernel [successful]
> make installkernel [successful]
> reboot
> Booting single-user mode as recommended with the newly installed kernel
> BUMMER!
> When it comes to the point to type in the full path of /bin/sh, /bin/sh
> immediately fails with SIGNAL 12
Signal 12 is SIGSYS, which strongly suggests that your 'new' kernel is not
new: it does not implement some of the syscalls used by the new binaries.

> 
> In this case, I can boot without problems the old kernel and the system
> works again.
> 
> But, depending on the entry revision from which I started the 22nd, or
> 23rd of May ino64-deal, there is a harsher failure!
I do not understand what you are trying to say there.

> 
> According to the above recommendation of updating, BUMMER! doesn't
> occur at that point and the shell /bin/sh starts as expected.
> Performing 
> 
> mergemaster -Fp
> 
> also performs well without any questions or installations so far,
> but then
> 
> make installworld
> 
> BUMMER! again and this time with fatal consequences! The installation
> fails in libexec/rtld-elf or something like that in the
> source/object tree after copying libexec/ld-elf.so.1. I
> see /libexec/ld-elf.so.1 successfully copied, with the backup copy
> (suffixed .old) carrying a plausible date and time.
> The installworld bails out and leaves the tree in a mixture of old and
> new binaries, and now, thanks, the whole system is wrecked.
> When trying to reboot such a half-finished installation in single-user
> mode, I can't even get a shell anymore.
> 
> How can I fix this emergency case with the tools aboard?
> 
> Since there is no compiler or build infrastructure on the USB boot
> image anymore, I cannot simply run installworld and installkernel (the
> boot image is useless; I had such a discussion on this list in March).
> In short: I have the intact and complete /usr/obj tree, and I think it
> would be a great deal to be able to simply boot via USB memstick and
> perform installworld with proper settings of DESTDIR= and siblings.
> 
> So, what is to be done now? :-(
> 
> Help appreciated and thanks in advance for those reading so far.

I put a statically built stat(1) binary there:
https://www.kib.kiev.ua/kib/stat-ino64-static

You might use it as a test for the right kernel: after you boot the
supposedly new kernel with the old world, try to run the binary.  If running
it results in SIGSYS (12), you have a configuration issue to solve.
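The same check can be automated. A minimal sketch, assuming a POSIX system: run_and_check_sigsys() is a hypothetical helper (not part of any FreeBSD tool) that forks, execs the given binary and reports whether the child was killed by SIGSYS. It compares against the SIGSYS constant rather than the literal 12, because signal numbering differs between systems (SIGSYS is 12 on FreeBSD but 31 on Linux/x86, for example).

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with arguments argv and report
 * how the child ended.
 *   1  the child was killed by SIGSYS (e.g. a syscall the running
 *      kernel does not implement),
 *   0  the child ended in any other way (an exec failure shows up
 *      here as exit status 127),
 *  -1  fork() or waitpid() failed.
 */
static int
run_and_check_sigsys(char *const argv[])
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		execv(argv[0], argv);
		_exit(127);
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	/* Compare with SIGSYS, not 12: numbering differs between systems. */
	return (WIFSIGNALED(status) && WTERMSIG(status) == SIGSYS);
}
```

For example, after booting the new kernel, passing {"./stat-ino64-static", "/", NULL} should give 0; a result of 1 points at the kernel/world mismatch described above.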


Re: svn commit: r318757 - head

2017-05-24 Thread Konstantin Belousov
On Wed, May 24, 2017 at 08:06:34AM -0500, Larry Rosenman wrote:
> The initial failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus=2017-05-23%2019%3A17%3A42
> 
> I then recompiled perl, and got:
> borg.lerctr.org /home/pgbuildfarm $ cd /home/pgbuildfarm/bin/latest && 
> ./run_branches.pl --run-all --config=/home/pgbuildfarm/conf/build-farm.conf
> Socket.c: loadable library and perl binaries are mismatched (got handshake 
> key 0xd200080, needed 0xdf00080)
> borg.lerctr.org /home/pgbuildfarm/bin/latest $
> 
> force rebuilding and installing perl and all p5-* ports fixed that. 
From what I understand from reading some perl bugs and the perl source, perl
performs some validation of the structures shared between the perl
interpreter and the XS libraries loaded into it. So I am almost sure that
you have perl itself and some module built against different src/ bases.

Is that true?  If yes, then this is user error.  You are trying to mix
two binaries built against incompatible ABIs.
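To illustrate the kind of load-time check described here, a hypothetical sketch follows; it is not perl's real handshake, whose key encoding is not described in this thread. Both sides compute a key from the size of a structure they share plus an API version, and the loader refuses a module whose key differs. The two structures are invented stand-ins that widen the inode number and link count the way ino64 did, which is enough to change the key.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Invented stand-ins for a structure shared by an interpreter and its
 * loadable modules.  They only loosely mirror ino64, which widened
 * ino_t (and nlink_t) to 64 bits.
 */
struct shared_pre_ino64 {
	uint32_t ino;
	uint16_t nlink;
	uint16_t mode;
};

struct shared_post_ino64 {
	uint64_t ino;
	uint64_t nlink;
	uint16_t mode;
};

/* Hypothetical key: structure size in the high bits, API version below. */
static uint32_t
make_handshake(uint32_t struct_size, uint32_t api_version)
{
	return ((struct_size << 16) | (api_version & 0xffff));
}

/* The loader accepts a module only when both sides computed the same key. */
static int
handshake_ok(uint32_t interp_key, uint32_t module_key)
{
	return (interp_key == module_key);
}
```

An interpreter built after the change and a module built before it compute different keys, so the module is rejected, which is the shape of the "handshake key mismatched" error quoted above.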

> 
> 
> 
> -- 
> Larry Rosenman http://www.lerctr.org/~ler
> Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
> US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
>  
>  
> 
> On 5/24/17, 4:05 AM, "Konstantin Belousov" <kostik...@gmail.com> wrote:
> 
> On Tue, May 23, 2017 at 04:46:14PM -0500, Larry Rosenman wrote:
> > My PostgreSQL buildfarm animal BROKE with this change until I force rebuilt
> > lang/perl5.24
> > and all my p5-* ports.
> So what were the symptoms and the error, exactly?
> 
> A lot of effort was spent to ensure that a _consistent_ set of old binaries
> and libraries would run without issues on the new system.  I mean that
> if you have binaries and libraries built on a pre-ino64 system, which do
> not reference any libraries built on a post-ino64 system, except system
> libraries (like libc/libthr etc), everything should work.  This feature
> was the main cause of the long delay in finishing ino64.
> 
> > 
> > emulators/qemu-user-static also won't compile (sbruno@ is on this one).
> This is a separate issue.
> 
> > 
> > Poudriere did *NOT* force a full rebuild even though freebsd-version
> > *WAS* bumped.
> > 
> > Is there a hazard for others here?
> > 
> > Or more info needed in /usr/{src,ports}/UPDATING?
> > 
> > 
> >   
> > 
> > 
> > 
> 
> 


Re: ino64? r318606 -> r318739 OK; r318739 -> r318781 fails SIGSEGV

2017-05-24 Thread Konstantin Belousov
On Wed, May 24, 2017 at 06:59:05AM -0700, David Wolfskill wrote:
> On Wed, May 24, 2017 at 04:35:58PM +0300, Konstantin Belousov wrote:
> > If the build of r318739 succeeds, could you please try to rebuild the
> > latest sources, but now with r318750 reverted?
> 
> It did complete successfully.
> 
> I have updated /usr/src back to 318781, then started a new build.
So are you building stock 318781, or did you revert r318750?

> 
> While it has not yet completed ">>> stage 4.2: building libraries", it
> is well beyond the previous point of failure (again, building parts of
> clang/libllvm).
> 
> I'm reporting now, as I'll need to head in to work fairly soon.  I
> should be able to report definitively a bit later.


Re: ino64? r318606 -> r318739 OK; r318739 -> r318781 fails SIGSEGV

2017-05-24 Thread Konstantin Belousov
On Wed, May 24, 2017 at 06:01:43AM -0700, David Wolfskill wrote:
> On Wed, May 24, 2017 at 02:10:08PM +0200, Dimitry Andric wrote:
> > ...
> > > building special pic c library
> > > --- libc.so.7 ---
> > > cc: error: unable to execute command: Segmentation fault (core dumped)
> > > cc: error: linker command failed due to signal (use -v to see invocation)
> > > *** [libc.so.7] Error code 254
> > 
> > Looks like your linker is crashing.  Can you figure out:
> > 1) The exact linker command being run
> > 2) The path to the linker executable that crashes
> > 3) Backtrace of the crash
> > 
> > -Dimitry
> > 
> 
> On Wed, May 24, 2017 at 03:10:33PM +0300, Konstantin Belousov wrote:
> > ...
> > 
> > If you build r318739 on r318739 (i.e. build the same sources as are
> > installed on your machine), does the SIGSEGV occur?
> > 
> > Anyway, get the core file loaded into gdb and get the backtrace, at least.
> 
> Sorry for the delay; I'm way out of practice with using a debugger...
> and I see that gdb isn't in head now.  lldb tells me:
> 
> (lldb) bt
> * thread #1, name = 'ld', stop reason = signal SIGSEGV
>   * frame #0: 0x
> (lldb) 
> 
> which isn't entirely unexpected, I suppose, given the nature of SIGSEGV.
A useful gdb is in ports: devel/gdb.

There is nothing in the nature of SIGSEGV that makes the lack of a
backtrace a reasonable outcome.

> 
> On the build machine, I "cloned" slice 4 to slice 3, then rebooted it
> from slice 3, "updated" /usr/src to r318739 and told it to go build
> itself (while I continued poking at my laptop).  The build machine has
> not yet completed the ">>> stage 4.2: building libraries" step -- recall
> that I had performed a "make clean" before cloning... -- but it has got
> quite a bit beyond the previous point of failure (still building clang).
> 
> I have copied the ld.core and libc.so.7.meta files from the build
> machine to <http://www.catwhisker.org/~david/FreeBSD/head/r318781/> (and
> made gzipped copies, as well).
> 
> As far as I can tell, the "ld" command was:
> freebeast(12.0-C)[10] ls -lT `which ld`
> -r-xr-xr-x  2 root  wheel  1706336 May 23 05:29:59 2017 /usr/bin/ld
> 
> This, from:
> FreeBSD freebeast.catwhisker.org 12.0-CURRENT FreeBSD 12.0-CURRENT #352  
> r318739M/318739:1200031: Tue May 23 05:16:24 PDT 2017 
> r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/sys/GENERIC  amd64
> 
> Late note: ">>> stage 4.2: building libraries" has completed on the
> build machine (building r318739 while running r318739).
> 
> I apologize for not getting all the information you (both) requested, but
> thought it best to provide what I can (sooner).

If the build of r318739 succeeds, could you please try to rebuild the
latest sources, but now with r318750 reverted?


Re: ino64 package fallout

2017-05-24 Thread Konstantin Belousov
On Wed, May 24, 2017 at 10:05:22AM -0700, Don Lewis wrote:
> I just upgraded my package build box and its poudriere jail to r318776
> and ran into some significant package build fallout.

There are several reviews that fix the ports with the most significant fallout:
   lang/llvm39          D10796
   lang/llvm40          D10797
   lang/ghc             D10798
   multimedia/webcamd   D10800
   devel/libgtop        D10795
   sysutils/py-psutil   D1081
   lang/rust            D10799

I intend to commit these tomorrow, after ino64 gets some probation time,
long enough to ensure that it is not immediately reverted.  Meanwhile, you
can follow the discussions and use the patches locally.


Re: ino64? r318606 -> r318739 OK; r318739 -> r318781 fails SIGSEGV

2017-05-24 Thread Konstantin Belousov
On Wed, May 24, 2017 at 08:15:09AM -0700, David Wolfskill wrote:
> On Wed, May 24, 2017 at 07:20:01AM -0700, David Wolfskill wrote:
> > ...
> > > > I have updated /usr/src back to 318781, then started a new build.
> > > So are you building stock 318781, or did you reverted r318750 ?
> > 
> > Stock 318781 [*].  I revert as a (nearly) last resort. :-)
> > 
> > > > While it has not yet completed ">>> stage 4.2: building libraries", it
> > > > is well beyond the previous point of failure (again, building parts of
> > > > clang/libllvm).
> > > > 
> > > > I'm reporting now, as I'll need to head in to work fairly soon.  I
> > > > should be able to report definitively a bit later.
> > 
> > It's completed the ">>> stage 4.2: building libraries" part, and well
> > into ">>> stage 4.3: building everything".
> > 
> > * Save for my (usual) hacking of conf/newvers.sh a bit.
> > 
> > And now I really do need to head in to work.
> > ...
> 
> It completed successfully and a reboot shows:
> 
> FreeBSD freebeast.catwhisker.org 12.0-CURRENT FreeBSD 12.0-CURRENT #354  
> r318781M/318781:1200031: Wed May 24 07:31:48 PDT 2017 
> r...@freebeast.catwhisker.org:/common/S3/obj/usr/src/sys/GENERIC  amd64
> 

I performed a local experiment: first I built the r318784 sources on the
system where I did the pre-commit testing for ino64 and updated the machine
to the result, then I built the r318789 sources and updated again.

No SIGSEGV etc., so I think that the effects you saw are due to the build
system.  rm -rf obj/* is the safest trick, I believe.


Re: 64-bit inodes (ino64) Status Update and Call for Testing

2017-05-21 Thread Konstantin Belousov
On Sun, May 21, 2017 at 02:14:56PM +0200, Jilles Tjoelker wrote:
> We have another type in this area which is too small in some situations:
> uint8_t for struct dirent.d_namlen. For filesystems that store filenames
> as up to 255 UTF-16 code units, the name to be stored in d_name may be
> up to 765 bytes long in UTF-8. This was reported in PR 204643. The code
> currently handles this by returning the short (8.3) name, but this name
> may not be present or usable, leaving the file inaccessible.
> 
> Actually allowing longer names seems too complicated to add to the ino64
> change, but changing d_namlen to uint16_t (using d_pad0 space) and
> skipping entries with d_namlen > 255 in libc may be helpful.
> 
> Note that applications using the deprecated readdir_r() will not be able
> to read such long names, since the API does not allow specifying that a
> larger buffer has been provided. (This could be avoided by making struct
> dirent.d_name 766 bytes long instead of 256.)
> 
> Unfortunately, the existence of readdir_r() also prevents changing
> struct dirent.d_name to the more correct flexible array.

Yes, changing the size of d_name at this stage of the project is out of
the question. My reading of your proposal is that we should extend the size
of d_namlen to uint16_t, am I right?  Should we perhaps go to 32 bits
directly, then?

I have not committed the change below, nor have I tested or even built it.

diff --git a/lib/libc/gen/readdir-compat11.c b/lib/libc/gen/readdir-compat11.c
index 1c52f563c75..18d85adaa63 100644
--- a/lib/libc/gen/readdir-compat11.c
+++ b/lib/libc/gen/readdir-compat11.c
@@ -41,6 +41,7 @@ __FBSDID("$FreeBSD$");
 #define _WANT_FREEBSD11_DIRENT
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -53,10 +54,12 @@ __FBSDID("$FreeBSD$");
 
 #include "gen-compat.h"
 
-static void
+static bool
 freebsd11_cvtdirent(struct freebsd11_dirent *dstdp, struct dirent *srcdp)
 {
 
+   if (srcdp->d_namlen >= sizeof(dstdp->d_name))
+   return (false);
dstdp->d_type = srcdp->d_type;
dstdp->d_namlen = srcdp->d_namlen;
dstdp->d_fileno = srcdp->d_fileno;  /* truncate */
@@ -65,6 +68,7 @@ freebsd11_cvtdirent(struct freebsd11_dirent *dstdp, struct dirent *srcdp)
bzero(dstdp->d_name + dstdp->d_namlen,
dstdp->d_reclen - offsetof(struct freebsd11_dirent, d_name) -
dstdp->d_namlen);
+   return (true);
 }
 
 struct freebsd11_dirent *
@@ -80,8 +84,10 @@ freebsd11_readdir(DIR *dirp)
if (dirp->dd_compat_de == NULL)
dirp->dd_compat_de = malloc(sizeof(struct
freebsd11_dirent));
-   freebsd11_cvtdirent(dirp->dd_compat_de, dp);
-   dstdp = dirp->dd_compat_de;
+   if (freebsd11_cvtdirent(dirp->dd_compat_de, dp))
+   dstdp = dirp->dd_compat_de;
+   else
+   dstdp = NULL;
} else
dstdp = NULL;
if (__isthreaded)
@@ -101,8 +107,10 @@ freebsd11_readdir_r(DIR *dirp, struct freebsd11_dirent *entry,
if (error != 0)
return (error);
if (xresult != NULL) {
-   freebsd11_cvtdirent(entry, );
-   *result = entry;
+   if (freebsd11_cvtdirent(entry, ))
+   *result = entry;
+   else
+   *result = NULL;
} else
*result = NULL;
return (0);
diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
index 784af836aee..27b2635030d 100644
--- a/sys/kern/vfs_syscalls.c
+++ b/sys/kern/vfs_syscalls.c
@@ -3733,7 +3733,8 @@ freebsd11_kern_getdirentries(struct thread *td, int fd, char *ubuf, u_int count,
if (dp->d_reclen == 0)
break;
MPASS(dp->d_reclen >= _GENERIC_DIRLEN(0));
-   /* dp->d_namlen <= sizeof(dstdp.d_name) - 1 always */
+   if (dp->d_namlen >= sizeof(dstdp.d_name))
+   continue;
dstdp.d_type = dp->d_type;
dstdp.d_namlen = dp->d_namlen;
dstdp.d_fileno = dp->d_fileno;  /* truncate */
diff --git a/sys/sys/dirent.h b/sys/sys/dirent.h
index 341855d0530..691c4e8f90f 100644
--- a/sys/sys/dirent.h
+++ b/sys/sys/dirent.h
@@ -67,8 +67,9 @@ struct dirent {
off_t  d_off;   /* directory offset of entry */
__uint16_t d_reclen;/* length of this record */
__uint8_t  d_type;  /* file type, see below */
-   __uint8_t  d_namlen;/* length of string in d_name */
-   __uint32_t d_pad0;
+   __uint8_t  d_pad0;
+   __uint16_t d_namlen;/* length of string in d_name */
+   __uint16_t d_pad1;
 #if __BSD_VISIBLE
 #define MAXNAMLEN   255
 char    d_name[MAXNAMLEN + 1];  /* name must be no longer than this */
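A quick way to check that the dirent.h hunk keeps the structure layout intact is a pair of compile-time assertions. The sketch below transcribes the old and proposed struct dirent with fixed-width types, assuming a 64-bit ino_t and off_t as on the ino64 branch; the struct names are hypothetical stand-ins, not the real header. Under those assumptions d_name stays at offset 24 and the total size stays 280 bytes, while d_namlen becomes wide enough for 765. Because d_reclen stays 16 bits, no record can exceed 65535 bytes anyway, which is the argument against going straight to a 32-bit d_namlen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old layout: 8-bit d_namlen followed by 32 bits of padding. */
struct dirent_u8namlen {
	uint64_t d_fileno;	/* assumed 64-bit ino_t */
	int64_t  d_off;		/* assumed 64-bit off_t */
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	uint32_t d_pad0;
	char     d_name[255 + 1];
};

/* Proposed layout: 16-bit d_namlen carved out of the old padding. */
struct dirent_u16namlen {
	uint64_t d_fileno;
	int64_t  d_off;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_pad0;
	uint16_t d_namlen;
	uint16_t d_pad1;
	char     d_name[255 + 1];
};

/* Widening d_namlen must not move d_name or change the struct size. */
_Static_assert(sizeof(struct dirent_u8namlen) ==
    sizeof(struct dirent_u16namlen), "struct size changed");
_Static_assert(offsetof(struct dirent_u8namlen, d_name) ==
    offsetof(struct dirent_u16namlen, d_name), "d_name moved");
```

If either assertion fired, binaries compiled against the old header would read the wrong bytes; here both hold, so only the interpretation of the former padding changes.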

Re: 64-bit inodes (ino64) Status Update and Call for Testing

2017-05-21 Thread Konstantin Belousov
On Sun, May 21, 2017 at 04:03:55PM +0200, Jilles Tjoelker wrote:
> On Sun, May 21, 2017 at 03:31:18PM +0300, Konstantin Belousov wrote:
> > On Sun, May 21, 2017 at 02:14:56PM +0200, Jilles Tjoelker wrote:
> > > We have another type in this area which is too small in some situations:
> > > uint8_t for struct dirent.d_namlen. For filesystems that store filenames
> > > as up to 255 UTF-16 code units, the name to be stored in d_name may be
> > > up to 765 bytes long in UTF-8. This was reported in PR 204643. The code
> > > currently handles this by returning the short (8.3) name, but this name
> > > may not be present or usable, leaving the file inaccessible.
> 
> > > Actually allowing longer names seems too complicated to add to the ino64
> > > change, but changing d_namlen to uint16_t (using d_pad0 space) and
> > > skipping entries with d_namlen > 255 in libc may be helpful.
> 
> > > Note that applications using the deprecated readdir_r() will not be able
> > > to read such long names, since the API does not allow specifying that a
> > > larger buffer has been provided. (This could be avoided by making struct
> > > dirent.d_name 766 bytes long instead of 256.)
> 
> > > Unfortunately, the existence of readdir_r() also prevents changing
> > > struct dirent.d_name to the more correct flexible array.
> 
> > Yes, changing the size of d_name at this stage of the project is out of
> > the question. My reading of your proposal is that we should extend the size
> > of d_namlen to uint16_t, am I right?  Should we perhaps go to 32 bits
> > directly, then?
> 
> Yes, my proposal is to change d_namlen to uint16_t.
> 
> Making it 32 bits is not useful with the 16-bit d_reclen, and increasing
> d_reclen does not seem useful to me with the current model of
> getdirentries() where the whole dirent must fit into the caller's
> buffer.
Bumping it now might cause less churn later, even if unused, but ok.

> 
> > I have not committed the change below, nor have I tested or even built it.
> 
> I'd like to skip overlong names in the native readdir_r() as well, so
> that long name support can be added to the kernel later without causing
> buffer overflows with applications using FreeBSD 12.0 libc.
> 
> The native readdir() does not seem to have such a problem.

Again, not even compiled.

diff --git a/lib/libc/gen/readdir-compat11.c b/lib/libc/gen/readdir-compat11.c
index 1c52f563c75..a865ab9157e 100644
--- a/lib/libc/gen/readdir-compat11.c
+++ b/lib/libc/gen/readdir-compat11.c
@@ -41,6 +41,7 @@ __FBSDID("$FreeBSD$");
 #define _WANT_FREEBSD11_DIRENT
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -53,10 +54,12 @@ __FBSDID("$FreeBSD$");
 
 #include "gen-compat.h"
 
-static void
+static bool
 freebsd11_cvtdirent(struct freebsd11_dirent *dstdp, struct dirent *srcdp)
 {
 
+   if (srcdp->d_namlen >= sizeof(dstdp->d_name))
+   return (false);
dstdp->d_type = srcdp->d_type;
dstdp->d_namlen = srcdp->d_namlen;
dstdp->d_fileno = srcdp->d_fileno;  /* truncate */
@@ -65,6 +68,7 @@ freebsd11_cvtdirent(struct freebsd11_dirent *dstdp, struct dirent *srcdp)
bzero(dstdp->d_name + dstdp->d_namlen,
dstdp->d_reclen - offsetof(struct freebsd11_dirent, d_name) -
dstdp->d_namlen);
+   return (true);
 }
 
 struct freebsd11_dirent *
@@ -75,13 +79,15 @@ freebsd11_readdir(DIR *dirp)
 
if (__isthreaded)
_pthread_mutex_lock(>dd_lock);
-   dp = _readdir_unlocked(dirp, 1);
+   dp = _readdir_unlocked(dirp, RDU_SKIP);
if (dp != NULL) {
if (dirp->dd_compat_de == NULL)
dirp->dd_compat_de = malloc(sizeof(struct
freebsd11_dirent));
-   freebsd11_cvtdirent(dirp->dd_compat_de, dp);
-   dstdp = dirp->dd_compat_de;
+   if (freebsd11_cvtdirent(dirp->dd_compat_de, dp))
+   dstdp = dirp->dd_compat_de;
+   else
+   dstdp = NULL;
} else
dstdp = NULL;
if (__isthreaded)
@@ -101,8 +107,10 @@ freebsd11_readdir_r(DIR *dirp, struct freebsd11_dirent *entry,
if (error != 0)
return (error);
if (xresult != NULL) {
-   freebsd11_cvtdirent(entry, );
-   *result = entry;
+   if (freebsd11_cvtdirent(entry, ))
+   *result = entry;
+   else /* should not happen due to RDU_SHORT */
+   *result = NULL;
} else
*result = NULL;
return (0);
diff --git a/lib/libc/gen/readdir.c b/lib/libc/gen/readdi
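The patches in this thread share one idea: when a name does not fit in the older, smaller d_name, refuse to convert the entry instead of overflowing the destination. A standalone sketch of that bounds check, using hypothetical wide_entry/narrow_entry types that stand in for struct dirent and struct freebsd11_dirent (only the fields needed here), plus a convert_all() loop that skips overlong entries, as Jilles asked the native readdir_r() to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wide entry, as a future kernel might return it. */
struct wide_entry {
	uint16_t d_namlen;
	char     d_name[765 + 1];
};

/* Hypothetical narrow entry, as an old-ABI consumer expects it. */
struct narrow_entry {
	uint8_t d_namlen;
	char    d_name[255 + 1];
};

/*
 * Convert src into dst.  Returns false, leaving dst untouched, when
 * the name plus its terminating NUL does not fit in dst->d_name.
 */
static bool
convert_entry(struct narrow_entry *dst, const struct wide_entry *src)
{
	if (src->d_namlen >= sizeof(dst->d_name))
		return (false);
	dst->d_namlen = (uint8_t)src->d_namlen;
	memcpy(dst->d_name, src->d_name, src->d_namlen);
	dst->d_name[src->d_namlen] = '\0';
	return (true);
}

/* Copy every entry that fits, skipping overlong names; returns count. */
static size_t
convert_all(struct narrow_entry *out, const struct wide_entry *in, size_t n)
{
	size_t i, k = 0;

	for (i = 0; i < n; i++)
		if (convert_entry(&out[k], &in[i]))
			k++;
	return (k);
}
```

Skipping, rather than returning NULL, keeps a reader from mistaking one overlong name for the end of the directory.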

Re: svn commit: r318757 - head

2017-05-24 Thread Konstantin Belousov
On Tue, May 23, 2017 at 04:46:14PM -0500, Larry Rosenman wrote:
> My PostgreSQL buildfarm animal BROKE with this change until I force rebuilt
> lang/perl5.24
> and all my p5-* ports. 
So what were the symptoms and the error, exactly?

A lot of effort was spent to ensure that a _consistent_ set of old binaries
and libraries would run without issues on the new system.  I mean that
if you have binaries and libraries built on a pre-ino64 system, which do
not reference any libraries built on a post-ino64 system, except system
libraries (like libc/libthr etc), everything should work.  This feature
was the main cause of the long delay in finishing ino64.

> 
> emulators/qemu-user-static also won't compile (sbruno@ is on this one).
This is a separate issue.

> 
> Poudriere did *NOT* force a full rebuild even though freebsd-version
> *WAS* bumped.
> 
> Is there a hazard for others here?
> 
> Or more info needed in /usr/{src,ports}/UPDATING?
> 
> 
> -- 
> Larry Rosenman http://www.lerctr.org/~ler
> Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
> US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
>   
> 
> 
> 


kqueue(2) changes

2017-06-02 Thread Konstantin Belousov
I implemented an option to specify absolute time for kqueue(2) timers,
and made the required type changes to support larger values in struct kevent.
Please see https://reviews.freebsd.org/D11025 for the patch, including
man page update, and for some more detailed explanation.

Please review.


Re: Time to increase MAXPHYS?

2017-06-04 Thread Konstantin Belousov
On Sat, Jun 03, 2017 at 11:28:23PM -0600, Warner Losh wrote:
> On Sat, Jun 3, 2017 at 9:55 PM, Allan Jude  wrote:
> 
> > On 2017-06-03 22:35, Julian Elischer wrote:
> > > On 4/6/17 4:59 am, Colin Percival wrote:
> > >> On January 24, 1998, in what was later renumbered to SVN r32724, dyson@
> > >> wrote:
> > >>> Add better support for larger I/O clusters, including larger physical
> > >>> I/O.  The support is not mature yet, and some of the underlying
> > >>> implementation
> > >>> needs help.  However, support does exist for IDE devices now.
> > >> and increased MAXPHYS from 64 kB to 128 kB.  Is it time to increase it
> > >> again,
> > >> or do we need to wait at least two decades between changes?
> > >>
> > >> This is hurting performance on some systems; in particular, EC2 "io1"
> > >> disks
> > >> are optimized for 256 kB I/Os, EC2 "st1" (throughput optimized
> > >> spinning rust)
> > >> disks are optimized for 1 MB I/Os, and Amazon's NFS service (EFS)
> > >> recommends
> > >> using a maximum I/O size of 1 MB (and despite NFS not being *physical*
> > >> I/O it
> > >> seems to still be limited by MAXPHYS).
> > >>
> > > We increase it in freebsd 8 and 10.3 on our systems,  Only good results.
> > >
> > > sys/sys/param.h:#define MAXPHYS (1024 * 1024)   /* max raw I/O
> > > transfer size */
> > >
> >
> > At some point Warner and I discussed how hard it might be to make this a
> > boot time tunable, so that big amd64 machines can have a larger value
> > without causing problems for smaller machines.
> >
> > ZFS supports a block size of 1mb, and doing I/Os in 128kb negates some
> > of the benefit.
> >
> > I am preparing some benchmarks and other data along with a patch to
> > increase the maximum size of pipe I/O's as well, because using 1MB
> > offers a relatively large performance gain there as well.
> >
> 
> It doesn't look to be hard to change this, though struct buf depends on
> MAXPHYS:
> struct  vm_page *b_pages[btoc(MAXPHYS)];
> and b_pages isn't the last item in the list, so changing MAXPHYS at boot
> time would cause an ABI change. IMHO, we should move it to the last element
> so that wouldn't happen. IIRC all buf allocations are from a fixed pool.
> We'd have to audit anybody that creates one on the stack knowing it will be
> persisted. Given how things work, I don't think this is possible, so we may
> be safe. Thankfully, struct bio doesn't seem to be affected.
> 
> As for making it boot-time configurable, it shouldn't be too horrible with
> the above change. We should have enough of the tunables mechanism up early
> enough to pull this in before we create the buf pool.
> 
> Netflix runs MAXPHYS of 8MB. There's issues with something this big, to be
> sure, especially on memory limited systems. Lots of hardware can't do this
> big an I/O, and some drivers can't cope, even if the underlying hardware
> can. Since we don't use such drivers at work, I don't have a list handy
> (though I think the SG list for NVMe limits it to 1MB). 128k is totally
> reasonable bump by default, but I think going larger by default should be
> approached with some caution given the overhead that adds to struct buf.
> Having it be a run-time tunable would be great.
The most important side-effect of bumping MAXPHYS as high as you did,
which is somewhat counter-intuitive and also probably does not matter
for typical Netflix cache box load (as I understand it), is increased
fragmentation of UFS volumes.

MAXPHYS limits the maximum cluster size, and the larger the cluster we are
trying to build, the higher the probability of failure.  We might end up with
single-block writes more often, defeating the reallocblk defragmenter.  This
might be somewhat theoretical, and if real it can probably be mitigated in
the clustering code, but it is something to look at.
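As a rough illustration of the claim above, the toy model below assumes, purely hypothetically, that each block a cluster needs is free and contiguous with the same independent probability. It also uses an assumed 32 KB filesystem block size. The real UFS allocator does not behave this way. The model only shows why a cluster of MAXPHYS/bsize blocks fails more often as MAXPHYS grows.

```c
#include <stddef.h>

/*
 * Toy model only: the number of filesystem blocks one I/O of size
 * maxphys may carry, given a block size bsize.
 */
size_t
cluster_blocks(size_t maxphys, size_t bsize)
{
	return (maxphys / bsize);
}

/*
 * Toy model only: probability that all nblocks blocks are free and
 * contiguous if each one independently is with probability p_free,
 * i.e. p_free raised to the power nblocks.
 */
double
cluster_success(double p_free, size_t nblocks)
{
	double p = 1.0;

	for (size_t i = 0; i < nblocks; i++)
		p *= p_free;
	return (p);
}
```

With a 32 KB block size, 128 KB allows 4-block clusters and 8 MB allows 256-block clusters. The success probability of the latter is far lower under any p_free below 1.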

WRT making MAXPHYS tunable, I do not like the proposal of converting
b_pages[] into a flexible array.  I think that making b_pages a pointer
to an off-structure page run is better.  One of the reasons is that buf
cache buffers are not the only buffers in the system.  There are several
cases where buffers are malloced, like markers for iterating queues.  In
those cases, b_pages[] can be eliminated entirely.  (I believe I changed
all local struct bufs to be allocated with malloc.)

Another supply of buffers outside the buf cache is the phys buffer
(pbuf) pool; see vm/vm_pager.c.
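Here is a minimal sketch of the pointer approach. It relies on several assumptions that are not the kernel's real names: a made-up `struct xbuf`, a hypothetical `maxphys_tunable` variable, and a 4 KB page size. The page run is allocated separately, so sizeof(struct xbuf) and the offsets of the fields after b_pages no longer depend on MAXPHYS. Marker buffers simply carry no page array.

```c
#include <stdlib.h>

struct page;			/* opaque stand-in for struct vm_page */

/* Hypothetical, heavily simplified buffer header (not the real struct buf). */
struct xbuf {
	long		 b_bcount;
	int		 b_npages;	/* pages currently attached */
	int		 b_maxpages;	/* capacity of the b_pages run */
	struct page	**b_pages;	/* off-structure run; NULL for markers */
	int		 b_flags;	/* later fields keep fixed offsets */
};

#define	XPAGE_SIZE	4096		/* assumed page size */
size_t maxphys_tunable = 128 * 1024;	/* hypothetical boot-time tunable */

/* Allocate a buffer; marker buffers get no page array at all. */
struct xbuf *
xbuf_alloc(int marker)
{
	struct xbuf *bp = calloc(1, sizeof(*bp));

	if (bp == NULL || marker)
		return (bp);
	bp->b_maxpages = (int)(maxphys_tunable / XPAGE_SIZE);
	bp->b_pages = calloc((size_t)bp->b_maxpages, sizeof(*bp->b_pages));
	if (bp->b_pages == NULL) {
		free(bp);
		return (NULL);
	}
	return (bp);
}

void
xbuf_free(struct xbuf *bp)
{
	if (bp != NULL)
		free(bp->b_pages);	/* free(NULL) is a no-op for markers */
	free(bp);
}
```

The flexible-array alternative would instead require b_pages to become the last member of the structure.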

> 
> There's a number of places in userland that depend on MAXPHYS, which is
> unfortunate since they assume a fixed value and don't pick it up from the
> kernel or kernel config. Thankfully, there are only a limited number of
> these.
> 
> Of course, there's times when I/Os can return much more than this. Reading
> drive log pages, for example, can 

Re: [libltdl] removal from gnu ports

2017-06-07 Thread Konstantin Belousov
On Wed, Jun 07, 2017 at 11:33:32AM +0100, David Chisnall wrote:
> On 7 Jun 2017, at 10:33, blubee blubeeme  wrote:
> > 
> > Hi
> > 
> > I'm sure I was reading yesterday on a different machine about the linker
> > flag -ld which has something to do with gnu dlopen and how it's ok to
> > remove those from your Makefile since FreeBSD handles dlopen and a few
> > other things from that header in the standard libc.
> > 
> > Is that correct?
> 
> Do you mean -ldl?  If so, then yes.  On Linux, the dl* symbols are only 
> exported from ld-linux.so if you link against libdl.  On FreeBSD, they are 
> exported from rtld regardless.

Symbols from the dynamic linker are always exported.

The issue is that the dynamic linker is never specified as a library on
the static linker (ld) command line. Linux puts stubs for the rtld
symbols into libdl.so, while FreeBSD provides weak symbols in the
libc.so dynamic symbol table. As a result, FreeBSD does not need -ldl for
access to dl*(3), and does not provide libdl.so.

The FreeBSD scheme is problematic because rtld has to prefer non-weak
symbols over weak ones, so that dynamic binaries use the real rtld symbols
and not the libc stubs. This is non-compliant with the ELF spec.
Unfortunately, the same preference is relied upon in other places too,
which leaves us stuck with the non-compliant behaviour.
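The weak-stub pattern described above can be sketched in C on an ELF target using the GCC/Clang `weak` attribute. `demo_dlerror()` is a hypothetical name, not libc's actual stub. At static link time, any strong definition of the same symbol overrides the weak one. The FreeBSD rtld applies an analogous preference at run time, and that is the non-compliant part mentioned above.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a libc stub.  Because the definition is weak,
 * the static linker silently prefers any strong definition of
 * demo_dlerror() from another object; with no such object, this stub is
 * what callers get.
 */
__attribute__((weak)) const char *
demo_dlerror(void)
{
	return ("stub: dynamic linker not available");
}
```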



Re: Panic @r319733: "mtx_lock() of spin mutex (null) @ /usr/src/sys/kern/sys_socket.c:305"

2017-06-09 Thread Konstantin Belousov
On Fri, Jun 09, 2017 at 05:57:15AM -0700, David Wolfskill wrote:
> Build machine updated from r319689 to r319733 OK; smoke test was
> uneventful.
> 
> Laptop updated similarly, but smoke test was a little more "interesting".
> 
> Turns out that laptop gets to multi-user mode OK... if I disable
> starting xdm, devd, and hald.  But then, issuing "service hald onestart"
> generates the panic in question -- at r319733.  At r319689, xdm &
> friends worked fine.
> 
> I have placed copies of the /var/crash/*.6 files in
>  -- along with
> gzipped copies, as well.  (It's residential DSL in the US, so there's
> not a huge amount of bandwidth.)
> 
> I get the impression that something (in hald) was trying to use
> the freebsd11 version of stat(), and Something Bad happened:
> 
> panic: mtx_lock() of spin mutex (null) @ /usr/src/sys/kern/sys_socket.c:305
> cpuid = 7
> time = 1497011454
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x803a461b = db_trace_self_wrapper+0x2b/frame 0xfe0c268ff600
> vpanic() at 0x80a1f94c = vpanic+0x19c/frame 0xfe0c268ff680
> kassert_panic() at 0x80a1f7a6 = kassert_panic+0x126/frame 0xfe0c268ff6f0
> __mtx_lock_flags() at 0x809fedfe = __mtx_lock_flags+0x14e/frame 0xfe0c268ff740
> soo_stat() at 0x80a8f8f0 = soo_stat+0x60/frame 0xfe0c268ff770
The main suspect is r319722.
Try reverting it, or downgrading to a revision before it (the latter might
be simpler due to the patch size).

> kern_fstat() at 0x809cb378 = kern_fstat+0xa8/frame 0xfe0c268ff7c0
> freebsd11_fstat() at 0x809cb28d = freebsd11_fstat+0x1d/frame 0xfe0c268ff930
> amd64_syscall() at 0x80e31fb4 = amd64_syscall+0x5a4/frame 0xfe0c268ffab0
> Xfast_syscall() at 0x80e12eab = Xfast_syscall+0xfb/frame 0xfe0c268ffab0
> --- syscall (189, FreeBSD ELF64, freebsd11_fstat), rip = 0x801b4973a, rsp = 0x7fffe988, rbp = 0x7fffea20 ---
> KDB: enter: panic
> 
> 
> Note: the hald in question was built under FreeBSD stable/11 (as
> are all my ports); I noted the existence of, and installed,
> ports/misc/compat11s before (re-)creating the crash.  (And yes, the
> ports that have kernel modules get the kernel modules rebuilt on
> head every time I rebuild the kernel on head.)
> 
> With the caveat that I actually use the laptop in my day-to-day
> activities, I'm happy to try various combinations of patching,
> testing, and reporting results.
> 
> Peace,
> david
> -- 
> David H. Wolfskillda...@catwhisker.org
> Looking forward to telling Mr. Trump: "You're fired!"
> 
> See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: INO64 in head: Does sys/boot/common/ufsread.c need its "typedef uint32_t ufs_ino_t;" replaced?

2017-06-16 Thread Konstantin Belousov
On Fri, Jun 16, 2017 at 05:01:43PM -0700, Mark Millard wrote:
> buildworld via clang for powerpc64 and powerpc fails for lack of
> `__udivdi3' referenced in sys/boot/common/ufsread.c fsread_size
> code. But this lead to me looking around and I found a conceptually
> separate possible issue. . .
> 
> sys/sys/_types.h :
> 
> typedef   __uint64_t  __ino_t;/* inode number */
> 
> # find /usr/src/sys/ -exec grep __ino_t {} \; -print | more
> typedef __ino_t ino_t;
> /usr/src/sys/sys/stat.h
> typedef __ino_t ino_t;  /* inode number */
> /usr/src/sys/sys/types.h
> typedef __uint64_t  __ino_t;/* inode number */
> /usr/src/sys/sys/_types.h
> typedef __ino_t ino_t;
> /usr/src/sys/sys/dirent.h
> 
> 
> sys/boot/common/ufsread.c :
> 
> . . .
> #include 
> #include 
> #include 
> . . .
> typedef uint32_t ufs_ino_t;
> . . .
> 
> Note the 32-bit type above. The headers included
> have use of the 64-bit ino_t type as well, for
> example:
> 
> sys/ufs/ufs/dinode.h :
> 
> . . .
> #define UFS_ROOTINO ((ino_t)2)
> . . .
> #define UFS_WINO ((ino_t)1)
> . . .
> 
> sys/ufs/ffs/fs.h :
> 
> . . .
> #define ino_to_cg(fs, x)  (((ino_t)(x)) / (fs)->fs_ipg)
> #define ino_to_fsba(fs, x)  \
> ((ufs2_daddr_t)(cgimin(fs, ino_to_cg(fs, (ino_t)(x))) + \
> (blkstofrags((fs), ((((ino_t)(x)) % (fs)->fs_ipg) / INOPB(fs))))))
> #define ino_to_fsbo(fs, x)  (((ino_t)(x)) % INOPB(fs))
> . . .
> 
> 
> I believe the powerpc64/powerpc issue
> gives evidence of ino_t being used in
> addition to ufs_ino_t in
> sys/boot/common/ufsread.c 's fsread_size .
> 
> 
> Other things that look 32-bit inode-ish:
> (I do not claim to know that any of this
> matters.)
> 
> sys/ufs/ufs/dir.h has:
> 
> struct  direct {
> u_int32_t d_ino;/* inode number of entry */
> . . .
> struct dirtemplate {
> u_int32_t   dot_ino;
> . . .
> u_int32_t   dotdot_ino;
> . . .
> 
> struct odirtemplate {
> u_int32_t   dot_ino;
> . . .
> u_int32_t   dotdot_ino;
> . . .
> 
> 
> sys/ufs/ffs/fs.h has:
> 
> struct jrefrec {
> . . .
> uint32_t jr_ino;
> 
> struct jmvrec {
> . . .
> uint32_t jm_ino;
> 
> struct jblkrec {
> . . .
> uint32_t jb_ino;
> 
> struct jtrncrec {
> . . .
> uint32_t jt_ino;

UFS uses 32-bit inode numbers; changing to 64-bit is currently pointless,
and would cause on-disk layout incompatibilities.

As a consequence, ino_t (64-bit) and uint32_t are almost always
interchangeable for inode numbers, unless they are used to specify the
on-disk layout.  UFS correctly uses (and was changed to use) uint32_t for
inode numbers in the disk-layout definitions.  Other places, which
calculate inode numbers from inode block numbers, or do other
calculations with inodes, are fine with either width.

That is, I believe that all instances I looked at during the
ino64 preparation are fine.
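To make the width argument concrete, the sketch below performs an ino_to_cg()-style division at both widths. The `wide_ino_t` name and the fs_ipg values used with it are illustrative assumptions. For any inode number that fits in 32 bits, the results agree. The 64-bit variant, however, is the kind of division that a compiler for a 32-bit target typically lowers to a runtime helper such as `__udivdi3`.

```c
#include <stdint.h>

typedef uint32_t ufs_ino_t;	/* on-disk inode number width, as in ufsread.c */
typedef uint64_t wide_ino_t;	/* hypothetical stand-in for the 64-bit ino_t */

/* Cylinder group of an inode: inode number divided by inodes per group. */
uint32_t
cg_of_narrow(ufs_ino_t ino, uint32_t fs_ipg)
{
	return (ino / fs_ipg);
}

uint64_t
cg_of_wide(wide_ino_t ino, uint32_t fs_ipg)
{
	/*
	 * Same arithmetic at 64 bits; on 32-bit targets this division is
	 * typically compiled into a call to a helper such as __udivdi3.
	 */
	return (ino / fs_ipg);
}
```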


Re: INO64 in head: Does sys/boot/common/ufsread.c need its "typedef uint32_t ufs_ino_t;" replaced?

2017-06-17 Thread Konstantin Belousov
On Fri, Jun 16, 2017 at 08:54:10PM -0700, Mark Millard wrote:
> On 2017-Jun-16, at 7:48 PM, Konstantin Belousov  
> wrote:
> 
> > On Fri, Jun 16, 2017 at 05:01:43PM -0700, Mark Millard wrote:
> >> . . .
> > 
> > UFS uses 32bit inodes, changing to 64bit is both pointless currently, and
> > causes on-disk layout incompatibilities.
> > 
> > As a consequence, use of ino_t (64bit) or uint32_t for inode numbers are
> > almost always interchangeable, unless used for specifying on-disk layout.
> > UFS correctly uses (and was changed to use) uint32_t for inode numbers
> > in the disk-layout definitions.  Other places, which calculate inode
> > numbers from inode block numbers, or do some other calculations with
> > inodes, are fine with either width.
> > 
> > That is, I believe that all instances which I looked at during the
> > ino64 preparation are fine.
> 
> Thanks for letting me know --and good to know.
> 
> I've added a note to the bugzilla report of the failed
> linking of boot1.elf for powerpc and powerpc64 that
> you have indicated that if the __udivdi3 is supplied to
> allow the linking to complete for builds based on clang
> then the result should operate okay for the mix of types.
> (The report is bugzilla 220024 .)
I never said that.


Re: adding extern maxbcachebuf to param.h

2017-06-18 Thread Konstantin Belousov
On Sun, Jun 18, 2017 at 12:36:59PM +, Rick Macklem wrote:
> My recent commit (r320062) broke the arm build when it added
> extern int maxbcachebuf;
> to sys/param.h. Although I don't understand the actual failure, I believe
> it is caused by arm/arm/elf_note.S including param.h and then using the
> ELFNOTE() macro.
> 
> As a temporary fix, I have committed r320070, which removes the definition
> from sys/param.h.
> This brings me to the question of how best to fix this?
> 1 - Just leave it the way it is now, where "extern int maxbcachebuf" isn't 
> defined
>  in a generic include file and needs to be defined as above before use.
> 2 - Add "!defined(LOCORE)" to the definition of it in sys/param.h, which I 
> believe
>  will also fix the problem.
> 3 - Put it in some other sys/*.h file which never gets included in assembler 
> files.
>  What .h would be appropriate?

I think that sys/buf.h is the best match.
Hiding the extern under !LOCORE in sys/param.h is the other possible
solution, but I prefer moving the declaration to buf.h because param.h is
widely used and most of its users do not need this symbol.
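For comparison, the `!LOCORE` alternative would look roughly like the sketch below. It relies on assembler sources in the kernel build being preprocessed with LOCORE defined, so they never see the C declaration. The value given to the definition here is illustrative only.

```c
/*
 * In the shared header: hide the C-only declaration from assembler
 * sources, which are preprocessed with LOCORE defined.
 */
#ifndef LOCORE
extern int maxbcachebuf;
#endif

/* In exactly one C file: the definition (value illustrative only). */
int maxbcachebuf = 65536;
```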


Re: Compiler optimisation bug

2017-05-02 Thread Konstantin Belousov
On Tue, May 02, 2017 at 10:26:17AM +0200, Nick Hibma wrote:
> There is a bug in sbin/dhclient.c for large expiry values on 32 bit platforms 
> where time_t is an int32_t (bug #218980, 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218980). It is caused by a 
> compiler optimisation that removes an if-statement. The code below shows the 
> following output, clearly showing that the optimised case provides a 
> different answer:
> 
> 
>   % cc -O2 main.c -o main.a && ./main.a
>   no opt: 0x7fffffff
>   with opt: 0xfffffffe
>   rephrased: 0x7fffffff
> 
> The code is as follows:
> 
>   % cat main.c
>   #include 
>   #include 
>   #define TIME_MAX 2147483647
> 
>   time_t a = TIME_MAX;
>   time_t b = TIME_MAX;
> 
>   time_t
>   add_noopt(time_t a, time_t b) __attribute__((optnone))
>   {
>   a += b;
>   if (a < b)
>   a = TIME_MAX;
This is a canonical example of undefined behaviour.  Compiler authors
consider that they have blanket permission here, because the C standard
left signed integer overflow as undefined behaviour to allow implementations
using the native signed representation.  Instead, they abuse that permission
to gain 0.5% in some unimportant benchmarks.


>   return a;
>   }
> 
>   time_t
>   add_withopt(time_t a, time_t b)
>   {
>   a += b;
>   if (a < b)
>   a = TIME_MAX;
>   return a;
>   }
> 
>   time_t
>   add_rephrased(time_t a, time_t b)
>   {
>   if (a < 0 || a > TIME_MAX - b)
>   a = TIME_MAX;
>   else
>   a += b;
>   return a;
>   }
> 
>   int
>   main(int argc, char **argv)
>   {
>   printf("no opt:0x%08x\n", add_noopt(a, b));
>   printf("with opt:  0x%08x\n", add_withopt(a, b));
>   printf("rephrased: 0x%08x\n", add_rephrased(a, b));
>   }
> 
> Should this be reported to the clang folks? Or is this to be expected when 
> abusing integer overflows this way?

You will get an answer that this is expected.  Add the -fwrapv compiler
flag to make signed arithmetic wrap predictably instead of being a
minefield, or remove the code.  For the kernel, we use -fwrapv.
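If the code is fixed rather than compiled with -fwrapv, one option is to test the range before adding, so that no signed overflow is ever evaluated. The sketch below models the 32-bit time_t as `int32_t` and saturates in both directions. `time32_add_sat()` is a hypothetical helper, not dhclient's code. GCC and Clang also provide `__builtin_add_overflow()` for the same purpose.

```c
#include <stdint.h>

/*
 * Saturating addition for a 32-bit signed time value.  The range checks
 * run before the addition, so a + b is only evaluated when it fits.
 */
int32_t
time32_add_sat(int32_t a, int32_t b)
{
	if (b > 0 && a > INT32_MAX - b)
		return (INT32_MAX);	/* would overflow upward */
	if (b < 0 && a < INT32_MIN - b)
		return (INT32_MIN);	/* would overflow downward */
	return (a + b);
}
```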

> 
> Also: The underlying problem is the fact that time_t is a 32 bit signed int. 
> Any pointers as to a general method of resolving these time_t issues?
> 
> Regards,
> 
> Nick Hibma
> n...@van-laarhoven.org
> 
> -- Open Source: We stand on the shoulders of giants.
> 
> 
> 




Re: filemon: weird full-time build although filemon enabled

2017-05-08 Thread Konstantin Belousov
On Mon, May 08, 2017 at 09:04:08AM -0700, Simon J. Gerraty wrote:
> O. Hartmann  wrote:
> > It is weird!
> > 
> > Today, after yesterday's built, I face the same 90 minutes build horror 
> > again, this time
> > I switched on "-dM" with the make command.
> > 
> > This happens:
> > 
> > [...]
> > /usr/obj/usr/src/lib/clang/libllvm/_usr_obj_usr_src_lib_clang_libllvm_CodeGen_SelectionDAG_LegalizeDAG.o.meta:
> > 15: file '/usr/local/etc/libmap.d/nvidia.conf' is newer than the target...
> > 
> 
> That does seem odd why anything involved in building clang should care
> about that file...
> 
> You can use the pid field in the syscall trace to show what process was
> looking at that file.
> 
> > This box has the following lines in /etc/src.conf to rebuild the nvidia 
> > kernel module
> > every time:
> > 
> > PORTS_MODULES+= x11/nvidia-driver
> > 
> > I do not know what is going on here ... :-(
> 
> well that might explain why nvidia.conf is updated, but not why clang
> build cares.

If I understand the premise of meta mode, a change to any file accessed
during the build is detected.  Every dynamically linked binary includes
rtld in its process image, and rtld reads all config files in the
libmap.d subdirectories.  The end result is that everything must be rebuilt
if any such config file changes.

Then, after the world build, according to the OP, the nvidia driver port is
reinstalled, which installs nvidia.conf anew, which triggers the
behaviour on the next build.
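The rule described above boils down to a comparison like the one below. `struct meta_access` and `meta_out_of_date()` are hypothetical names and not bmake's implementation. The point is only that a single newer file among a target's recorded accesses, such as a libmap.d config file read by rtld, is enough to mark the target stale.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical record of one file that building a target touched. */
struct meta_access {
	const char	*path;
	time_t		 mtime;
};

/* A target is out of date if ANY recorded input is newer than it. */
int
meta_out_of_date(time_t target_mtime, const struct meta_access *acc, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (acc[i].mtime > target_mtime)
			return (1);
	return (0);
}
```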


Re: ino64 package fallout

2017-05-25 Thread Konstantin Belousov
On Wed, May 24, 2017 at 05:49:04PM -0700, Don Lewis wrote:
> On 24 May, Konstantin Belousov wrote:
> > On Wed, May 24, 2017 at 10:05:22AM -0700, Don Lewis wrote:
> >> I just upgraded by package build box and its poudriere jail to r318776
> >> and ran into some significant package build fallout.
> > 
> > There are several reviews that fix ports with most significant fallouts,
> >lang/llvm39  D10796
> >lang/llvm40  D10797
> >lang/ghc D10798
> >multimedia/webcamd   D10800
> >devel/libgtopD10795
> >sysutils/py-psutil   D1081
> >lang/rustD10799
> > 
> > I intend to commit this tomorrow, after the ino64 get some probation time,
> > long enough to ensure that it does not get immediate revert.  You may
> > see the discussions and use the patches locally, meantime.
> 
> devel/libgtop is also broken:
> 
> procopenfiles.c:325:39: error: no member named 'kf_sa_local' in 'struct 
> kinfo_file'
> sun = (struct sockaddr_un *)>kf_sa_local;
>  ~~~  ^
> procopenfiles.c:330:37: error: no member named 'kf_sa_local' in 'struct 
> kinfo_file'
> addrstr = 
> addr_to_string(>kf_sa_local);
>   ~~~  ^
> procopenfiles.c:338:37: error: no member named 'kf_sa_peer' in 'struct 
> kinfo_file'
> addrstr = 
> addr_to_string(>kf_sa_peer);
>   ~~~  ^
> procopenfiles.c:352:36: error: no member named 'kf_sa_peer' in 'struct 
> kinfo_file'
> addrstr = addr_to_string(>kf_sa_peer);
>   ~~~  ^
> procopenfiles.c:357:52: error: no member named 'kf_sa_peer' in 'struct 
> kinfo_file'
> entry.info.sock.dest_port = 
> addr_to_port(>kf_sa_peer);
> procwd.c:155:16: warning: comparison of integers of different signs: 'int' 
> and 'unsigned long' [-Wsign-compare]
> for (i = 0; i < len / sizeof(*kif); i++, kif++) {
> ~ ^ ~~
>   ~~~ 
>  ^
> procopenfiles.c:388:9: warning: cast from 'gchar *' (aka 'char *') to 
> 'glibtop_open_files_entry *' (aka 'struct _glibtop_open_files_entry *') 
> increases required alignment from 1 to 4 [-Wcast-align]
> return (glibtop_open_files_entry*)g_array_free(entries, FALSE);
>^~~
> procopenfiles.c:305:16: warning: comparison of integers of different signs: 
> 'ssize_t' (aka 'long') and 'unsigned long' [-Wsign-compare]
> for (i = 0; i < len / sizeof(*kif); i++, kif++) {
> ~ ^ ~~
> 2 warnings and 5 errors generated.

These look like errors from the unpatched port.  For instance, in my
working directory, the content of the file
devel/libgtop/work/libgtop-2.32.0/sysdeps/freebsd/procopenfiles.c
around line 325 is:
struct sockaddr_un *sun;
   
entry.type = GLIBTOP_FILE_TYPE_LOCALSOCKET;
sun = (struct sockaddr_un *)>kf_un.kf_sock.
kf_sa_local;
which is not
sun = (struct sockaddr_un *)>kf_sa_local;
as reported by the compiler in your case.

The patch is applied as an extra patch; could it be that you have OSVERSION
set forcibly?


Re: ino64, java and intellij products problem

2017-05-31 Thread Konstantin Belousov
On Wed, May 31, 2017 at 11:53:39PM +0300, Boris Samorodov wrote:
> Hi All,
> 
> Seems that after ino64 transition some java programs stopped
> to fully function. I.e. java/intellij (IntelliJ IDEA and Co)
> starts but does not show any files.
> 
> I'm not sure if it's a java or IntelliJ problem. Any help to
> diagnose the culprit is welcome.

Is it after a full rebuild of all ports for post-ino64, or with all ports
built on pre-ino64?  (Mixes are not supported and are not supposed to work.)

Check the java programs for JNI calls which return struct stat or struct
dirent to java code, and for java code which knows the layout.  That is,
e.g., the problem with Firefox and its javascript, but there the
recompilation seems to work.
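The two made-up structures below show why code that hard-codes the layout breaks. They are NOT the real FreeBSD struct stat definitions. ino64 widened fields such as st_ino and st_nlink, so any field placed after them moves. An offset baked into Java or JNI glue code for the old layout then points at the wrong bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts; NOT the real FreeBSD struct stat. */
struct stat_pre_demo {
	uint32_t st_dev;
	uint32_t st_ino;	/* 32-bit before ino64 */
	uint16_t st_mode;
	uint16_t st_nlink;	/* narrow before ino64 */
	int64_t	 st_size;
};

struct stat_post_demo {
	uint64_t st_dev;
	uint64_t st_ino;	/* widened to 64 bits */
	uint64_t st_nlink;	/* widened to 64 bits */
	uint16_t st_mode;
	int64_t	 st_size;
};

/* What code with a hard-coded layout would have baked in. */
size_t size_offset_pre(void)  { return (offsetof(struct stat_pre_demo, st_size)); }
size_t size_offset_post(void) { return (offsetof(struct stat_post_demo, st_size)); }
```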


Re: @r324591: panic: UNR inconsistency: items 0 found 7 (line 361)

2017-10-13 Thread Konstantin Belousov
On Fri, Oct 13, 2017 at 04:36:34AM -0700, David Wolfskill wrote:
> This occurred after I had booted & smoke-tested my laptop, then
> issued "sudo shutdown -r now"; it's *possible* that there was also
> a (similar?) problem shutting down yesterday (@r324542), but the
> screen had gone dark and declined to shed any light, so I'm not
> sure on that point.
> 
> uname strings (yesterday & today):
> 
> FreeBSD g1-252.catwhisker.org 12.0-CURRENT FreeBSD 12.0-CURRENT #429  
> r324542M/324546:1200051: Thu Oct 12 05:09:28 PDT 2017 
> r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
> 
> FreeBSD g1-252.catwhisker.org 12.0-CURRENT FreeBSD 12.0-CURRENT #430  
> r324591M/324591:1200051: Fri Oct 13 03:58:11 PDT 2017 
> r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
> 
> Panic message, kernel buffer & stack trace from today:
> 
> panic: UNR inconsistency: items 0 found 7 (line 361)
> 
Revert r324542 and try again.

This revision was not tested with DIAGNOSTIC enabled.


Re: C++ in jemalloc

2017-10-06 Thread Konstantin Belousov
On Thu, Oct 05, 2017 at 11:59:03AM -0700, David Goldblatt wrote:
>  Hi all,
> 
> The jemalloc developers have wanted to start using C++ for a while, to
> enable some targeted refactorings of code we have trouble maintaining due
> to brittleness or complexity (e.g. moving thousand line macro definitions
> to templates, changing the build->extract symbols->rebuild mangling scheme
> for internal symbols to one using C++ namespaces). We'd been holding off
> because we thought that FreeBSD base all had to compile on GCC 4.2, in
> order to support some esoteric architectures[1].
> 
> The other day though, I noticed that there is some C++ shipping with
> FreeBSD; /usr/bin/dtc and /sbin/devd (the former claiming in the HACKING
> document that C++11 is a minimum for FreeBSD 11). This, combined with the
> fact that ports now points to a modern gcc, makes me think we were
> incorrect, and can turn on C++ without breaking FreeBSD builds.
Note that these are just usermode utilities, whose implementation language
is not too important.  If we considered ghc or rustc to be an acceptable
dependency for utilities, then they could be implemented in Haskell or
Rust as well.

> 
> Am I right? Will anything break if jemalloc needs a C++ compiler to build?
> We will of course not use exceptions, RTTI, global constructors, the C++
> stdlib, or anything else that might affect C source or link compatibility.
I wonder how you can guarantee that for current and future compilers without
the standard requiring it and without compiler facilities to enforce it.  See below.

> 
> Thanks,
> David (on behalf of the jemalloc developers)
> 
> [1] That being said, we don't compile or test on those architectures, and
> so probably don't work there in the first place if I'm being honest. But
> we'd also like to avoid making that a permanent state of affairs that can't
> be changed.

On Thu, Oct 05, 2017 at 04:50:32PM -0700, David Goldblatt wrote:
> We can avoid it in the short term without a ton of pain. In the long run it
> would be nice to have, but I wouldn't want to tie our release schedule to
> FreeBSD's too tightly (our CI is improving to the point where the tip of
> the dev branch gets tested about as well as releases would be, so we're
> trying to de-emphasize release vs. non-release versions). Do you have a
> sense of when the situation might change (if only so I know when to check
> back)?
> 
> Thanks for the replies on this, they've been super helpful.
I do not think so.

You are talking about importing C++ code into libc.  Libc _implements_
the C runtime, which is a dependency of any C++ runtime.  That is, we
cannot allow the C++ runtime to be dragged into libc.

A C++ freestanding implementation, by the standard, is required to provide
runtime type information, exceptions, initialization and termination support
(i.e. atexit/cxa_atexit and, most importantly, cxa_thread_atexit if you
use TLS) and so on.  This clearly creates a cycle in the dependencies.
There is no requirement on the compilers not to use these features
in unexpected ways, and looking at the current compiler evolution, I do
expect that these features would bite us simply because we allowed C++
code in libc.

We already have some issues due to jemalloc's reliance on pthreads,
which makes bootstrapping a problem.  We have to maintain the ugly
fake init trick to postpone malloc for the mutexes' backing store
inside libthr, to allow jemalloc to initialize without causing cyclic
dependencies.

Also, our C runtime (rtld/libc/libthr and perhaps libm) currently
requires only a C compiler (and an assembler and linker) to compile.  Adding
a C++ requirement for compilation, even assuming the runtime issues I noted
above are somehow avoided, is also not a move I consider useful.

In summary, in my opinion, requiring a C++ compiler and a working C++ runtime
for the malloc(3) implementation is not desirable.  If this goes in, the
low-level parts of libc and the whole of libthr must grow a private malloc
implementation so as not to depend on libc malloc.  Currently only rtld has
a private malloc (for similar reasons).
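For illustration, a private allocator of the kind rtld carries can be as small as a bump allocator over static storage. Such an allocator needs neither libc malloc nor any lock. This is a hypothetical sketch, not rtld's actual allocator: the names and arena size are made up, there is no free(), and it assumes single-threaded use during bootstrap.

```c
#include <stddef.h>
#include <stdint.h>

#define	BOOT_ARENA_SIZE	4096		/* illustrative size */

/* Static arena, aligned for any object type; needs no runtime setup. */
static _Alignas(max_align_t) unsigned char boot_arena[BOOT_ARENA_SIZE];
static size_t boot_used;

/* Bump allocation with no free(); returns NULL when the arena is full. */
void *
boot_malloc(size_t size)
{
	const size_t align = _Alignof(max_align_t);
	size_t start = (boot_used + align - 1) & ~(align - 1);

	if (start > BOOT_ARENA_SIZE || size > BOOT_ARENA_SIZE - start)
		return (NULL);
	boot_used = start + size;
	return (&boot_arena[start]);
}
```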


Re: Dying SD memory? handle_workitem_freefile: got error 5 while accessing filesystem

2017-09-24 Thread Konstantin Belousov
On Sun, Sep 24, 2017 at 08:32:54AM -0700, Rodney W. Grimes wrote:
> Would it be possible to add some robustness to these checksum
> failed errors to indicate their source more accurately?  Ie
> either a /dev entry or a fstab entry, and the fact they are
> from the ffs layer?

Perhaps something like this.  It is untested and somewhat racy.

diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c
index b981b5e5794..bd78b760442 100644
--- a/sys/ufs/ffs/ffs_alloc.c
+++ b/sys/ufs/ffs/ffs_alloc.c
@@ -2584,6 +2584,15 @@ ffs_mapsearch(fs, cgp, bpref, allocsiz)
return (-1);
 }
 
+static const struct statfs *
+ffs_getmntstat(struct vnode *devvp)
+{
+
+   if (devvp->v_type == VCHR)
+   return (>v_rdev->si_mountpt->mnt_stat);
+   return (ffs_getmntstat(VFSTOUFS(devvp->v_mount)->um_devvp));
+}
+
 /*
  * Fetch and verify a cylinder group.
  */
@@ -2597,6 +2606,7 @@ ffs_getcg(fs, devvp, cg, bpp, cgpp)
 {
struct buf *bp;
struct cg *cgp;
+   const struct statfs *sfs;
int flags, error;
 
*bpp = NULL;
@@ -2615,7 +2625,11 @@ ffs_getcg(fs, devvp, cg, bpp, cgpp)
(bp->b_flags & B_CKHASH) != 0 &&
cgp->cg_ckhash != bp->b_ckhash) ||
!cg_chkmagic(cgp) || cgp->cg_cgx != cg) {
-   printf("checksum failed: cg %u, cgp: 0x%x != bp: 0x%jx\n",
+   sfs = ffs_getmntstat(devvp);
+   printf("UFS %s%s (%s) cylinder checksum failed: cg %u, cgp: "
+   "0x%x != bp: 0x%jx\n",
+   devvp->v_type == VCHR ? "" : "snapshot of ",
+   sfs->f_mntfromname, sfs->f_mntonname,
cg, cgp->cg_ckhash, (uintmax_t)bp->b_ckhash);
bp->b_flags &= ~B_CKHASH;
bp->b_flags |= B_INVAL | B_NOCACHE;


Re: Dying SD memory? handle_workitem_freefile: got error 5 while accessing filesystem

2017-09-24 Thread Konstantin Belousov
On Sun, Sep 24, 2017 at 02:19:26PM +0200, O. Hartmann wrote:
> Am Sun, 24 Sep 2017 10:52:38 +0300
> Konstantin Belousov <kostik...@gmail.com> schrieb:
> 
> > On Sun, Sep 24, 2017 at 09:31:51AM +0200, O. Hartmann wrote:
> > > Running a NanoBSD application on a small PCEngine device, feeding the box
> > > with a CURRENT (FreeBSD 12.0-CURRENT #33 r323958: Sat Sep 23 22:19:57 CEST
> > > 2017 amd64), booting from an SD memory card.
> > > 
> > > Recently, when saving configuration changes on the SD memory card, I
> > > receive these strange errors:
> > > 
> > > [...]
> > > checksum failed: cg 0, cgp: 0xb478d1d5 != bp: 0x874608df
> > > checksum failed: cg 12, cgp: 0xe626d67c != bp: 0x7360c12a
> > > checksum failed: cg 0, cgp: 0xb478d1d5 != bp: 0x874608df
> > > handle_workitem_freefile: got error 5 while accessing filesystem
> > > checksum failed: cg 12, cgp: 0xe626d67c != bp: 0x7360c12a
> > > handle_workitem_freefile: got error 5 while accessing filesystem
> > > [...]
> > > 
> > > I do not use MMCCAM in the kernel, but as far as I know, the problem
> > > occurred in the same time frame.
> > > 
> > > The question is: is the SD memory about to die and in need of replacement,
> > > or is this a bug recently introduced with the new MMCCAM stuff?
> > 
> > The messages you see about 'checksum failed' are due to a recently added
> > UFS feature where cylinder group blocks are checksummed, and the checksum
> > is verified against the one stored on the volume.
> > 
> > Were there kernel messages about disk I/O errors before that (as opposed
> > to UFS errors about checksums and SU complaints)?
> 
> Hello,
> 
> no, there were no errors before. The small box is updated on a frequent
> basis, so I'm sure there were no errors like those before, or any other
> kind regarding I/O.
> 
> I recall having read the commit description about the UFS2 enhancement.
> 
> The NanoBSD image is created using the UFS2/GPT partitioning scheme, if
> this is important to know.

Try to read the SD card on a pre-checksum kernel.  If you can read it, verify
that the files are not damaged.  This is a way to check whether your
SD card or controller is returning wrong data to the host.


Re: Build error: 'isa/rtc.h' file not found

2017-10-02 Thread Konstantin Belousov
On Mon, Oct 02, 2017 at 09:44:36AM +0200, O. Hartmann wrote:
> Am Mon, 2 Oct 2017 11:35:22 +0700
> "Alex V. Petrov"  schrieb:
> 
> > revision 324186
> > In file included from /usr/obj/usr/src/tmp/usr/include/sys/efi.h:33:
> > /usr/obj/usr/src/tmp/usr/include/machine/efi.h:35:10: fatal error:
> > 'isa/rtc.h' file not found
> 
> Me, too here.
> 

I am only starting the build, but I believe that the following patch should
fix it.


diff --git a/sys/amd64/include/efi.h b/sys/amd64/include/efi.h
index 75d39592e28..eaea8d03276 100644
--- a/sys/amd64/include/efi.h
+++ b/sys/amd64/include/efi.h
@@ -32,8 +32,6 @@
 #ifndef __AMD64_INCLUDE_EFI_H_
 #define __AMD64_INCLUDE_EFI_H_
 
-#include <isa/rtc.h>
-
 /*
  * XXX: from gcc 6.2 manual:
  * Note, the ms_abi attribute for Microsoft Windows 64-bit targets
@@ -47,8 +45,12 @@
 #define EFIABI_ATTR __attribute__((ms_abi))
 #endif
 
+#ifdef _KERNEL
+#include <isa/rtc.h>
+
 #define EFI_TIME_LOCK()   mtx_lock(_time_lock);
 #define EFI_TIME_UNLOCK() mtx_unlock(_time_lock);
 #define EFI_TIME_OWNED()  mtx_assert(_time_lock, MA_OWNED);
+#endif
 
 #endif /* __AMD64_INCLUDE_EFI_H_ */
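To illustrate what the patch relies on: a preprocessor group that is skipped is not processed, so an #include inside #ifdef _KERNEL is never opened when the header is consumed from userland (as sys/efi.h is during buildworld, per the error above). The sketch is a hypothetical, cut-down header in that style, not the real machine/efi.h; EFI_TIME_KERNEL_API and efi_time_kernel_api_visible() are invented for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical cut-down header: userland-visible declarations would go
 * first, kernel-only pieces are fenced off by _KERNEL.
 */
#ifdef _KERNEL
#include <isa/rtc.h>		/* only opened in kernel builds */
#define EFI_TIME_KERNEL_API 1	/* hypothetical marker for this sketch */
#endif

/* Reports whether the kernel-only section was compiled in. */
bool efi_time_kernel_api_visible(void)
{
#ifdef EFI_TIME_KERNEL_API
	return true;
#else
	return false;
#endif
}
```

Built without -D_KERNEL, as userland code is, this compiles even on a system that has no isa/rtc.h, and efi_time_kernel_api_visible() returns false. Adding -D_KERNEL makes the compiler try to open isa/rtc.h, which is essentially the failure reported in this thread.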


Re: Dying SD memory? handle_workitem_freefile: got error 5 while accessing filesystem

2017-09-24 Thread Konstantin Belousov
On Sun, Sep 24, 2017 at 09:31:51AM +0200, O. Hartmann wrote:
> Running a NanoBSD application on a small PCEngine device, feeding the box
> with a CURRENT (FreeBSD 12.0-CURRENT #33 r323958: Sat Sep 23 22:19:57 CEST
> 2017 amd64), booting from an SD memory card.
> 
> Recently, when saving configuration changes on the SD memory card, I
> receive these strange errors:
> 
> [...]
> checksum failed: cg 0, cgp: 0xb478d1d5 != bp: 0x874608df
> checksum failed: cg 12, cgp: 0xe626d67c != bp: 0x7360c12a
> checksum failed: cg 0, cgp: 0xb478d1d5 != bp: 0x874608df
> handle_workitem_freefile: got error 5 while accessing filesystem
> checksum failed: cg 12, cgp: 0xe626d67c != bp: 0x7360c12a
> handle_workitem_freefile: got error 5 while accessing filesystem
> [...]
> 
> I do not use MMCCAM in the kernel, but as far as I know, the problem
> occurred in the same time frame.
> 
> The question is: is the SD memory about to die and in need of replacement,
> or is this a bug recently introduced with the new MMCCAM stuff?

The messages you see about 'checksum failed' are due to a recently added
UFS feature where cylinder group blocks are checksummed, and the checksum
is verified against the one stored on the volume.
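A rough sketch of such a check, under stated assumptions: a toy cylinder-group layout (struct toy_cg and TOY_CG_MAGIC are hypothetical), a CRC-32C hash computed over the whole block with the hash field zeroed, and the same three tests that the ffs_getcg() hunk at the top of this section performs (hash match, magic, group index). Seeding the CRC with ~0 and storing it without a final inversion is an assumption about the kernel convention, not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78, no final XOR. */
uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	assert(buf != NULL || len == 0);
	while (len-- > 0) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

#define TOY_CG_MAGIC 0x090255u	/* hypothetical magic for this sketch */

/* Hypothetical 64-byte cylinder group block (no padding). */
struct toy_cg {
	uint32_t cg_magic;
	uint32_t cg_cgx;	/* cylinder group index */
	uint32_t cg_ckhash;	/* stored check hash */
	uint8_t  cg_data[52];	/* stand-in for the rest of the block */
};

/* Hash of the block as it would be with cg_ckhash set to zero. */
static uint32_t toy_cg_hash(const struct toy_cg *cgp)
{
	struct toy_cg tmp = *cgp;

	tmp.cg_ckhash = 0;
	return crc32c_update(~0u, &tmp, sizeof(tmp));
}

/* Writer side: store the hash before the block goes to disk. */
void toy_cg_seal(struct toy_cg *cgp)
{
	cgp->cg_ckhash = toy_cg_hash(cgp);
}

/* Reader side: hash, magic and index must all match. */
bool toy_cg_verify(const struct toy_cg *cgp, uint32_t cg)
{
	return cgp->cg_ckhash == toy_cg_hash(cgp) &&
	    cgp->cg_magic == TOY_CG_MAGIC && cgp->cg_cgx == cg;
}
```

A single flipped bit anywhere in the block changes the CRC, which is why a card returning slightly corrupted data shows up as "checksum failed" rather than as an I/O error.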

Were there kernel messages about disk I/O errors before that (as opposed
to UFS errors about checksums and SU complaints)?


Re: Segfault in _Unwind_* code called from pthread_exit

2017-08-24 Thread Konstantin Belousov
On Wed, Aug 23, 2017 at 04:37:07PM +0200, Tijl Coosemans wrote:
> Hi,
> 
> The following program segfaults for me on amd64 when linked like this:
> 
> cc -o test test.c -lpthread -L/usr/local/lib/gcc5 -lgcc_s -rpath /usr/local/lib/gcc5
> 
> 
> #include <pthread.h>
> #include <stdio.h>
> 
> void *
> thr( void *arg ) {
>   return( NULL );
> }
> 
> int
> main( void ) {
>   pthread_t thread;
> 
>   for( int i = 1; i < 20; i++ ) {
>   fprintf( stderr, "%d\n", i );
>   pthread_create( &thread, NULL, thr, NULL );
>   pthread_join( thread, NULL );
>   }
>   return( 0 );
> }
> 
> 
> The backtrace looks like this:
> 
> Thread 7 received signal SIGSEGV, Segmentation fault.
> [Switching to LWP 100511 of process 1886]
> uw_frame_state_for (context=context@entry=0x7fffdfffddc0, fs=fs@entry=0x7fffdfffdb10)
>     at /usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind-dw2.c:1249
> 1249  /usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind-dw2.c: No such file or directory.
> (gdb) bt
> #0  uw_frame_state_for (context=context@entry=0x7fffdfffddc0, fs=fs@entry=0x7fffdfffdb10)
>     at /usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind-dw2.c:1249
> #1  0x000800a66ecb in _Unwind_ForcedUnwind_Phase2 (
>     exc=exc@entry=0x800658730, context=context@entry=0x7fffdfffddc0)
>     at /usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind.inc:155
> #2  0x000800a67200 in _Unwind_ForcedUnwind (exc=0x800658730, stop=0x8008428b0 , stop_argument=0x0)
>     at /usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind.inc:207
> #3  0x000800842224 in _Unwind_ForcedUnwind (ex=0x800658730, stop_func=0x8008428b0 , stop_arg=0x0)
>     at /usr/src/lib/libthr/thread/thr_exit.c:106
> #4  0x00080084269f in thread_unwind ()
>     at /usr/src/lib/libthr/thread/thr_exit.c:172
> #5  0x0008008424d6 in _pthread_exit_mask (status=0x0, mask=0x0)
>     at /usr/src/lib/libthr/thread/thr_exit.c:254
> #6  0x000800842359 in _pthread_exit (status=0x0)
>     at /usr/src/lib/libthr/thread/thr_exit.c:206
> #7  0x00080082ccb1 in thread_start (curthread=0x800658500)
>     at /usr/src/lib/libthr/thread/thr_create.c:289
> #8  0x7fffdfdfe000 in ?? ()
> Backtrace stopped: Cannot access memory at address 0x7fffdfffe000
> 
> 
> It happens with gcc6 as well, but not with base libgcc_s.
> Can anyone reproduce this?  Have there been any changes to stack
> unwinding recently (last few months)?

I can reproduce this; it seems there was a change in the gcc unwinder.
Below is a patch which I have not even compiled.  Still, it should give
an idea of how this might be approached.  The patch is against gcc head.

Index: libgcc/config/i386/freebsd-unwind.h
===
--- libgcc/config/i386/freebsd-unwind.h (revision 251293)
+++ libgcc/config/i386/freebsd-unwind.h (working copy)
@@ -28,6 +28,8 @@ see the files COPYING3 and COPYING.RUNTIME respect
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -42,7 +44,29 @@ x86_64_freebsd_fallback_frame_state
 {
   struct sigframe *sf;
   long new_cfa;
+#ifdef KERN_PROC_SIGTRAMP
+  static long sigtramp_addr = 0;
 
+  if (sigtramp_addr == 0) {
+struct kinfo_sigtramp kst;
+int error, mib[4];
+size_t len;
+
+mib[0] = CTL_KERN;
+mib[1] = KERN_PROC;
+mib[2] = KERN_PROC_SIGTRAMP;
+mib[3] = getpid();
+len = sizeof(kst);
+error = sysctl(mib, sizeof(mib) / sizeof(mib[0]), &kst, &len, NULL, 0);
+if (error == 0)
+  sigtramp_addr = kst.ksigtramp_start;
+  }
+
+  if (sigtramp_addr != 0 && (uintptr_t)(context->ra) == sigtramp_addr)
+;
+  else
+#endif
+
   /* Prior to FreeBSD 9, the signal trampoline was located immediately
  before the ps_strings.  To support non-executable stacks on AMD64,
  the sigtramp was moved to a shared page for FreeBSD 9.  Unfortunately


Re: SIGSEGV in /bin/sh after r322740 -> r322776 update

2017-08-22 Thread Konstantin Belousov
On Tue, Aug 22, 2017 at 04:46:27AM -0700, David Wolfskill wrote:
> Started with:
> FreeBSD freebeast.catwhisker.org 12.0-CURRENT FreeBSD 12.0-CURRENT #445  
> r322740M/322745:1200040: Mon Aug 21 04:35:19 PDT 2017 
> r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/sys/GENERIC  amd64
> 
> After source-based update (which was uneventful):
> FreeBSD freebeast.catwhisker.org 12.0-CURRENT FreeBSD 12.0-CURRENT #446  
> r322776M/322778:1200041: Tue Aug 22 04:07:02 PDT 2017 
> r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/sys/GENERIC  amd64
> 
> Rebooted; trying "make delete-old-libs" yields:
> Segmentation fault (core dumped)
> make: "/usr/src/share/mk/bsd.compiler.mk" line 159: warning: "echo "5.0.0 5.0.0svn)" | awk -F. '{print $1 * 1 + $2 * 100 + $3;}'" returned non-zero status
> Segmentation fault (core dumped)
> make: "/usr/src/share/mk/bsd.compiler.mk" line 164: warning: "{ echo "__FreeBSD_cc_version" | cc -E - 2>/dev/null || echo __FreeBSD_cc_version; } | sed -n '$p'" returned non-zero status
> Segmentation fault (core dumped)
> make: "/usr/src/share/mk/bsd.linker.mk" line 56: warning: Unknown linker from LD=ld: , defaulting to bfd
> Segmentation fault (core dumped)
> make: "/usr/src/share/mk/bsd.linker.mk" line 61: warning: "echo "2.17.50" | awk -F. '{print $1 * 1 + $2 * 100 + $3;}'" returned non-zero status
> Segmentation fault (core dumped)
> Segmentation fault (core dumped)
> make[1]: "/usr/src/share/mk/bsd.compiler.mk" line 155: Unable to determine compiler type for CC=cc.  Consider setting COMPILER_TYPE.
> *** Error code 1
> 
> Stop.
> make: stopped in /usr/src
> .ERROR_TARGET='delete-old-libs'
> .ERROR_META_FILE=''
> .MAKE.LEVEL='0'
> MAKEFILE=''
> .MAKE.MODE='normal'
> _ERROR_CMD='.PHONY'
> .CURDIR='/usr/src'
> .MAKE='make'
> .OBJDIR='/usr/obj/usr/src'
> .TARGETS='delete-old-libs'
> DESTDIR=''
> LD_LIBRARY_PATH=''
> MACHINE='amd64'
> MACHINE_ARCH='amd64'
> MAKEOBJDIRPREFIX='/usr/obj'
> MAKESYSPATH='/usr/src/share/mk'
> MAKE_VERSION='20170720'
> PATH='/sbin:/bin:/usr/sbin:/usr/bin'
> SRCTOP='/usr/src'
> OBJTOP='/usr/obj/usr/src'
> 
> 
> I actually *first* noticed the issue on my laptop -- above was my
> build machine.  On the laptop, I run xdm; I entered login & password;
> the screen blanked, then returned to a fresh login screen.  Logged in on
> a vty, and found sh.core.  "file sh.core" said:
> 
> g1-252(11.1-S)[4] file sh.core 
> sh.core: ELF 64-bit LSB core file x86-64, version 1 (FreeBSD), FreeBSD-style, from ' /usr/local/lib/X11/xdm/Xsession'
> 
> Files affected by the update were:
> Updating '/S4/usr/src':
> A    /S4/usr/src/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/llquantize/err.D_LLQUANT_MAGTOOBIG.offbyone.d
> U    /S4/usr/src/cddl/contrib/opensolaris/lib/libdtrace/common/dt_cc.c
> U    /S4/usr/src/cddl/usr.sbin/dtrace/tests/common/llquantize/Makefile
> U    /S4/usr/src/contrib/top/loadavg.h
> U    /S4/usr/src/kerberos5/libexec/kpasswdd/Makefile
> U    /S4/usr/src/lib/libc/amd64/sys/Makefile.inc
> A    /S4/usr/src/lib/libc/amd64/sys/amd64_detect_rdfsgsbase.c
> A    /S4/usr/src/lib/libc/amd64/sys/amd64_detect_rdfsgsbase.h
> U    /S4/usr/src/lib/libc/amd64/sys/amd64_get_fsbase.c
> U    /S4/usr/src/lib/libc/amd64/sys/amd64_get_gsbase.c
> U    /S4/usr/src/lib/libc/amd64/sys/amd64_set_fsbase.c
> U    /S4/usr/src/lib/libc/amd64/sys/amd64_set_gsbase.c
> U    /S4/usr/src/lib/libc/mips/Symbol.map
> U    /S4/usr/src/lib/libcompiler_rt/Makefile.inc
> U    /S4/usr/src/share/man/man7/tests.7
> U    /S4/usr/src/sys/amd64/amd64/cpu_switch.S
> U    /S4/usr/src/sys/amd64/amd64/exception.S
> U    /S4/usr/src/sys/amd64/amd64/machdep.c
> U    /S4/usr/src/sys/amd64/amd64/ptrace_machdep.c
> U    /S4/usr/src/sys/amd64/amd64/sys_machdep.c
> U    /S4/usr/src/sys/amd64/amd64/vm_machdep.c
> U    /S4/usr/src/sys/amd64/include/asmacros.h
> U    /S4/usr/src/sys/amd64/include/pcb.h
> U    /S4/usr/src/sys/arm64/arm64/swtch.S
> U    /S4/usr/src/sys/cddl/contrib/opensolaris/uts/common/sys/isa_defs.h
> U    /S4/usr/src/sys/compat/linuxkpi/common/src/linux_rcu.c
> U    /S4/usr/src/sys/dev/qlxgbe/README.txt
> U    /S4/usr/src/sys/dev/qlxgbe/ql_boot.c
> U    /S4/usr/src/sys/dev/qlxgbe/ql_def.h
> U    /S4/usr/src/sys/dev/qlxgbe/ql_fw.c
> U    /S4/usr/src/sys/dev/qlxgbe/ql_glbl.h
> U    /S4/usr/src/sys/dev/qlxgbe/ql_hw.c
> U    /S4/usr/src/sys/dev/qlxgbe/ql_hw.h
> U    /S4/usr/src/sys/dev/qlxgbe/ql_inline.h
> U    /S4/usr/src/sys/dev/qlxgbe/ql_ioctl.c
> U    /S4/usr/src/sys/dev/qlxgbe/ql_isr.c
> U    /S4/usr/src/sys/dev/qlxgbe/ql_minidump.c
> U    /S4/usr/src/sys/dev/qlxgbe/ql_os.c
> U    /S4/usr/src/sys/dev/qlxgbe/ql_os.h
> U    /S4/usr/src/sys/dev/qlxgbe/ql_reset.c
> U    /S4/usr/src/sys/dev/qlxgbe/ql_ver.h
> U    /S4/usr/src/sys/kern/subr_smp.c
> U    /S4/usr/src/sys/mips/mips/exception.S
> U    /S4/usr/src/sys/modules/qlxgbe/Makefile
> U    /S4/usr/src/sys/netipsec/ipsec.c
> U    /S4/usr/src/sys/netipsec/ipsec.h
> U    /S4/usr/src/sys/netipsec/ipsec6.h
> U

Re: SIGSEGV in /bin/sh after r322740 -> r322776 update

2017-08-22 Thread Konstantin Belousov
On Tue, Aug 22, 2017 at 05:28:36AM -0700, David Wolfskill wrote:
> On Tue, Aug 22, 2017 at 02:59:23PM +0300, Konstantin Belousov wrote:
> > ...
> > > lldb's notion of the backtrace was fairly non-useful:
> > > g1-252(11.1-S)[7] lldb -c sh.core
> > > (lldb) target create --core "sh.core"
> > > Core file '/home/david/sh.core' (x86_64) was loaded.
> > > (lldb) bt
> > > * thread #1, name = 'sh', stop reason = signal SIGSEGV
> > >   * frame #0: 0x000800b6ee08
> > > frame #1: 0x00080003
> > > (lldb) 
> > I am not sure how to get the interesting information with lldb,
> > try gdb.
> 
> freebeast(12.0-C)[11] gdb -c sh.core 
> GNU gdb (GDB) 8.0 [GDB v8.0 for FreeBSD]
> ...
> Type "apropos word" to search for commands related to "word".
> [New LWP 100182]
> Core was generated by `sh -c cc --version || echo 0.0.0'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x000800b6ee08 in ?? ()
> (gdb) bt
> #0  0x000800b6ee08 in ?? ()
> #1  0x in ?? ()
> (gdb) 
> 
> > Disassemble the code around the faulting %rip.
> 
> Sorry; I haven't done very much with any debugger other than the
> one in Perl in ... decades.  Checking the gdb docs online, the only
> reference to "disassembly" reads "23.3.3.22 Disassembly In Guile",
> which seems rather far off the mark.

$ gdb /bin/sh sh.core
(gdb) bt
(gdb) info registers
(gdb) disassemble

> 
> I'm afraid I'll need a bit more detail.
> 
> >Also provide the first
> > 100 lines of verbose dmesg of the boot on the affected machine.
> 
> Well, a copy of the complete (verbose) dmesg.boot from *yesterday*
> (r322740) is at
> <http://www.catwhisker.org/~david/FreeBSD/history/freebeast.12_dmesg.txt>
> 
> I grabbed a copy of the dmesg.boot for today, and have attached
> "head -100" from it to this message.
Thank you.

> 
> > Is it only /bin/sh which faults ?
> 
> Well, /bin/csh doesn't seem to be giving me any trouble as I use
> it interactively.  I don't recall seeing evidence that anything
> that isn't invoking /bin/sh is having a problem; on the other hand,
> there is a lot of the system I don't normally use.  But things like
> "svn info" work, as does "svnlite info" (big difference there is
> that former is a port, built under stable/11, while the latter would
> be part of base).
> 
> > Does system boot into multiuser ?
> 
> Yes; it does.  But checking /var/log/messages, I see:

OK, can you rebuild the kernel and libc from scratch?  I.e., remove your
object directories.


Re: Segfault in _Unwind_* code called from pthread_exit

2017-08-25 Thread Konstantin Belousov
On Fri, Aug 25, 2017 at 05:38:51PM +0200, Tijl Coosemans wrote:
> So both GCC and LLVM unwinding look up the return address in the CFI
> table and fail when the return address is garbage, but LLVM treats this
> as an end-of-stack condition while GCC further tries to see if the
> return address points to a signal trampoline by testing the instruction
> bytes at that address.  On amd64 the garbage address is unreadable so it
> segfaults.  On i386 it is readable, the test fails and GCC returns
> end-of-stack.
How does the llvm unwinder detect that the return address is garbage?

> 
> To fix the crash and get predictable behaviour in the other cases I
> propose always setting the return address to 0.  The attached patch does
> this for i386 and amd64.  I don't know if other architectures need a
> similar patch.

> Index: sys/amd64/amd64/vm_machdep.c
> ===
> --- sys/amd64/amd64/vm_machdep.c  (revision 322802)
> +++ sys/amd64/amd64/vm_machdep.c  (working copy)
> @@ -507,6 +507,9 @@ cpu_set_upcall(struct thread *td, void (*entry)(void *
>  (((uintptr_t)stack->ss_sp + stack->ss_size - 4) & ~0x0f) - 4;
>   td->td_frame->tf_rip = (uintptr_t)entry;
>  
> + /* Sentinel return address to stop stack unwinding. */
> + suword32((void *)td->td_frame->tf_rsp, 0);
> +
>   /* Pass the argument to the entry point. */
>   suword32((void *)(td->td_frame->tf_rsp + sizeof(int32_t)),
>   (uint32_t)(uintptr_t)arg);
> @@ -529,6 +532,9 @@ cpu_set_upcall(struct thread *td, void (*entry)(void *
>   td->td_frame->tf_fs = _ufssel;
>   td->td_frame->tf_gs = _ugssel;
>   td->td_frame->tf_flags = TF_HASSEGS;
> +
> + /* Sentinel return address to stop stack unwinding. */
> + suword((void *)td->td_frame->tf_rsp, 0);
>  
>   /* Pass the argument to the entry point. */
>   td->td_frame->tf_rdi = (register_t)arg;
> Index: sys/i386/i386/vm_machdep.c
> ===
> --- sys/i386/i386/vm_machdep.c(revision 322802)
> +++ sys/i386/i386/vm_machdep.c(working copy)
> @@ -524,6 +524,9 @@ cpu_set_upcall(struct thread *td, void (*entry)(void *
>   (((int)stack->ss_sp + stack->ss_size - 4) & ~0x0f) - 4;
>   td->td_frame->tf_eip = (int)entry;
>  
> + /* Sentinel return address to stop stack unwinding. */
> + suword((void *)td->td_frame->tf_esp, 0);
> +
>   /* Pass the argument to the entry point. */
>   suword((void *)(td->td_frame->tf_esp + sizeof(void *)),
>   (int)arg);
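A worked example of the 32-bit stack arithmetic in the hunks above (the i386 one, and apparently the 32-bit-process path of the first amd64 hunk): the sentinel goes to the new %esp and the thread argument to %esp + 4, and the expression leaves %esp + 4 16-byte aligned, which I believe is what the i386 psABI expects on function entry. toy_initial_esp() is a hypothetical helper that only evaluates that expression with 32-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: initial %esp for a new 32-bit thread, computed as
 * ((ss_sp + ss_size - 4) & ~0x0f) - 4.  The 16-byte-aligned value is the
 * argument slot; one word below it is the (sentinel) return address slot.
 */
uint32_t toy_initial_esp(uint32_t ss_sp, uint32_t ss_size)
{
	assert(ss_size >= 16);
	return ((ss_sp + ss_size - 4) & ~(uint32_t)0x0f) - 4;
}
```

For a stack at 0x1000 of size 0x2000 this gives 0x2fec: the sentinel occupies 0x2fec..0x2fef, the argument 0x2ff0..0x2ff3, and both stay below the stack top at 0x3000.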

I do not object to this, but I believe that a better solution exists
for the system side (putting my change for the gcc unwinder to detect the
signal frame aside).  The thread_start() sentinel in libthr should get
proper DWARF annotation saying that it has no return address.  Maybe the
normal noreturn function attribute is enough to force compilers
to generate the required unwind data.  It might need some more magic with
inline asm and .cfi_return_column set to undefined.


Re: Segfault in _Unwind_* code called from pthread_exit

2017-08-26 Thread Konstantin Belousov
On Sat, Aug 26, 2017 at 08:28:13PM +0200, Tijl Coosemans wrote:
> On Sat, 26 Aug 2017 02:44:42 +0300 Konstantin Belousov <kostik...@gmail.com> 
> wrote:
> > How does the llvm unwinder detect that the return address is garbage?
> 
> It just stops unwinding when it can't find frame information (stored in
> .eh_frame sections).  GCC unwinder doesn't give up yet and checks if the
> return address points to the signal trampoline (which means the current
> frame is that of a signal handler).  It has built-in knowledge of how to
> unwind to the signal trampoline frame.
So llvm just gives up on signal frames?
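A toy model of the difference Tijl describes may help. Everything here is hypothetical: struct toy_image stands in for the process address space, "CFI coverage" is a list of PC ranges, and the gcc-style sigtramp probe is reduced to an address comparison where the real code inspects instruction bytes at the return address. A zero return address ends the walk under both policies, which is the behaviour the sentinel patch relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum unwind_result {
	UNWIND_NEXT_FRAME,	/* CFI found: keep unwinding */
	UNWIND_SIGNAL_FRAME,	/* RA recognized as the signal trampoline */
	UNWIND_END,		/* clean end-of-stack */
	UNWIND_FAULT		/* would have read an unmapped RA (SIGSEGV) */
};

enum unwind_policy {
	STOP_ON_MISSING_CFI,	/* llvm-like, as described above */
	PROBE_FOR_SIGTRAMP	/* gcc-like fallback */
};

struct pc_range {
	uintptr_t lo, hi;	/* half-open [lo, hi) */
};

/* Hypothetical, heavily simplified view of a process image. */
struct toy_image {
	const struct pc_range *cfi;	/* ranges covered by .eh_frame FDEs */
	size_t ncfi;
	struct pc_range mapped;		/* the only readable memory */
	uintptr_t sigtramp;		/* address the probe recognizes */
};

static bool in_range(const struct pc_range *r, uintptr_t pc)
{
	return pc >= r->lo && pc < r->hi;
}

enum unwind_result unwind_step(const struct toy_image *img, uintptr_t ra,
    enum unwind_policy policy)
{
	assert(img != NULL);
	if (ra == 0)
		return UNWIND_END;		/* sentinel return address */
	for (size_t i = 0; i < img->ncfi; i++)
		if (in_range(&img->cfi[i], ra))
			return UNWIND_NEXT_FRAME;
	if (policy == STOP_ON_MISSING_CFI)
		return UNWIND_END;		/* no FDE: give up quietly */
	/* gcc-like: must read memory at ra to look for the trampoline. */
	if (!in_range(&img->mapped, ra))
		return UNWIND_FAULT;		/* the amd64 crash */
	return ra == img->sigtramp ? UNWIND_SIGNAL_FRAME : UNWIND_END;
}
```

With a garbage return address, the llvm-like policy ends the walk, while the gcc-like policy either faults (unmapped address, as on amd64) or ends after a failed probe (readable address, as on i386).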

> A noreturn attribute isn't enough.  You can still unwind such functions.
> They are allowed to throw exceptions for example.
Ok.

> I did consider using
> a CFI directive (see patch below) and it works, but it's architecture
> specific and it's inserted after the function prologue so there's still
> a window of a few instructions where a stack unwinder will try to use
> the return address.
> 
> Index: lib/libthr/thread/thr_create.c
> ===
> --- lib/libthr/thread/thr_create.c  (revision 322802)
> +++ lib/libthr/thread/thr_create.c  (working copy)
> @@ -251,6 +251,7 @@ create_stack(struct pthread_attr *pattr)
>  static void
>  thread_start(struct pthread *curthread)
>  {
> +   __asm(".cfi_undefined %rip");
> sigset_t set;
>  
> if (curthread->attr.suspend == THR_CREATE_SUSPENDED)

I like this approach much more than the previous patch.  What could be
done is to provide an asm trampoline which calls thread_start().  There you
can add the .cfi_undefined right at the entry.

It is somewhat more work than just setting the return address on the
kernel-constructed pseudo stack frame, but I believe this is ultimately the
correct way.  You can still do it only on some arches, if you do not
have the incentive to code asm for all of them.

Also crt1 probably should get the same treatment, even though we already
set %rbp to zero AFAIR.


Re: SIGSEGV in /bin/sh after r322740 -> r322776 update

2017-08-22 Thread Konstantin Belousov
On Tue, Aug 22, 2017 at 05:46:17AM -0700, David Wolfskill wrote:
> On Tue, Aug 22, 2017 at 03:34:49PM +0300, Konstantin Belousov wrote:
> > ...
> > $ gdb /bin/sh sh.core
> > (gdb) bt
> > (gdb) info registers
> > (gdb) disassemble
> 
> [New LWP 100182]
> Core was generated by `sh -c cc --version || echo 0.0.0'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x000800b6ee08 in ?? () from /lib/libc.so.7
> (gdb) bt
> #0  0x000800b6ee08 in ?? () from /lib/libc.so.7
> #1  0x000800b6eda1 in malloc () from /lib/libc.so.7
> #2  0x004116cd in ?? ()
> #3  0x0041193f in ?? ()
> #4  0x00407751 in ?? ()
> #5  0x004074ef in ?? ()
> #6  0x0040fb50 in ?? ()
> #7  0x00406892 in ?? ()
> #8  0x004054aa in ?? ()
> #9  0x00405704 in ?? ()
> #10 0x0040515b in ?? ()
> #11 0x004111ae in ?? ()
> #12 0x00402ecf in ?? ()
> #13 0x00080064b000 in ?? ()
> #14 0x in ?? ()
> (gdb) info registers
> rax            0x800e60fa0          34374815648
> rbx            0x6278c0             6453440
> rcx            0x400                1024
> rdx            0x5a5a5a5a5aff9c9c   651061437730972
> rsi            0x7fffdf30           140737488346928
> rdi            0x7fffdf60           140737488346976
> rbp            0x7fffdf20           0x7fffdf20
> rsp            0x7fffdea0           0x7fffdea0
> r8             0xfefefefefefefeff   -72340172838076673
> r9             0x8080808080808080   -9187201950435737472
> r10            0x8006cc000          34366865408
> r11            0x8006cc000          34366865408
> r12            0x6278c0             6453440
> r13            0x8006cc240          34366865984
> r14            0x1f8                504
> r15            0x7fffdf30           140737488346928
> rip            0x800b6ee08          0x800b6ee08
> eflags 0x10246  [ PF ZF IF RF ]
> cs 0x43 67
> ss 0x3b 59
> ds 
> es 
> fs 
> gs 
> fs_base
> gs_base
> (gdb) disassemble 0x800b6ee08,0x800b6ee08
> Dump of assembler code from 0x800b6ee08 to 0x800b6ee08:
> End of assembler dump.
> (gdb) disassemble 0x800b6ee08
> No function contains specified address.
> (gdb) 
> 
> > ...
> > > I grabbed a copy of the dmesg.boot for today, and have attached
> > > "head -100" from it to this message.
> > Thank you.
> 
> :-}
> 
> > ...
> > > > Does system boot into multiuser ?
> > > 
> > > Yes; it does.  But checking /var/log/messages, I see:
> > 
> > Ok, can you rebuild kernel and libc from scratch ?  I.e. remove your
> > object directories.
> 
> I think I'll need a working /bin/sh to do that.  As noted, I could
> try the stable/11 /bin/sh; on the other hand, if it's dying in a
> library, that's not likely to help a whole lot. :-}
I strongly suspect that this is not /bin/sh at all.  The backtrace
suggests that malloc() has issues, but again I suspect that the
reason is not a bug in malloc itself, but its use of TLS.

The amd64 changes were to the TLS base register handling.  So you might
try to boot the previous kernel.  If this works without replacing libc,
then it is definitely TLS, but I still do not know what is wrong.
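To make the TLS theory concrete: a C11 _Thread_local variable gives each thread its own copy, and on amd64 such variables are reached relative to the %fs base, which is the area the r322740 -> r322776 update touched (amd64_get_fsbase.c, amd64_set_fsbase.c, cpu_switch.S). FreeBSD's libc malloc (jemalloc) keeps per-thread state in TLS, so a wrong %fs base would make malloc() read garbage. The toy below is a hypothetical stand-in for such per-thread state, not jemalloc code; it only checks that each thread sees its own counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an allocator's per-thread state. */
static _Thread_local unsigned long toy_tcache_count;

/* Records one "allocation" in the calling thread's TLS; returns its count. */
unsigned long toy_alloc_note(void)
{
	return ++toy_tcache_count;
}

static void *toy_worker(void *arg)
{
	unsigned long n = (unsigned long)(uintptr_t)arg;
	unsigned long last = 0;

	for (unsigned long i = 0; i < n; i++)
		last = toy_alloc_note();
	return (void *)(uintptr_t)last;
}

/* Returns 1 if each thread observed only its own TLS counter. */
int toy_tls_is_per_thread(void)
{
	pthread_t a, b;
	void *ra = NULL, *rb = NULL;

	if (pthread_create(&a, NULL, toy_worker, (void *)(uintptr_t)3) != 0)
		return 0;
	if (pthread_create(&b, NULL, toy_worker, (void *)(uintptr_t)5) != 0) {
		pthread_join(a, NULL);
		return 0;
	}
	pthread_join(a, &ra);
	pthread_join(b, &rb);
	assert(ra != NULL && rb != NULL);
	return (uintptr_t)ra == 3 && (uintptr_t)rb == 5;
}
```

The calling thread's own counter is untouched by the workers, so its first toy_alloc_note() still returns 1.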

> 
> But yes: once we resolve the "working /bin/sh" issue, clearing
> /usr/obj & rebuilding is straightforward and shouldn't take too long.
> 
> Peace,
> david
> -- 
> David H. Wolfskillda...@catwhisker.org
> If we wish to eliminate sources of Fake News, start at the top: D. Trump.
> 
> See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

2017-11-28 Thread Konstantin Belousov
On Tue, Nov 28, 2017 at 11:41:36AM +0100, Emmanuel Vadot wrote:
> 
>  Hello,
> 
>  I would like to switch the vfs.nfsd.issue_delegations sysctl to a
> tunable.
>  The reason behind it is a recent problem at work on some of our filers
> related to NFS.
>  We use NFSv4 without delegations, as we have never been able to get good
> performance with a FreeBSD server and Linux clients (we need to test
> again, but that's for later). We recently saw all of our NFS-enabled
> filers perform poorly, resulting in really bad performance for the
> service.
>  After doing some analysis with pmcstat, we saw that we spend ~50%
> of our time in lock delay, called from nfsrv_checkgetattr (see [1]).
>  This function is only useful when using delegations, as it searches for
> write delegations issued to clients, but it's called anyway when
> there are no delegations, and it causes a lot of problems under heavy
> activity.
>  We've patched the kernel with the included diff and now everything is
> fine (tm) (see [2]).
>  The problem with upstreaming this patch is that, since issue_delegations
> is a sysctl, we cannot know whether the checkgetattr call is needed, hence
> the question of switching it to a TUNABLE. Also, maybe some other code
> paths could benefit from it; I haven't read all of the nfsserver source yet.
> 
>  Note that I won't MFC the changes as it would break POLA.
Perhaps make nodeleg a per-mount flag?
Then you can check it safely by dereferencing vp->v_mount->mnt_data without
acquiring any locks, as long as the vnode lock is owned and the vnode is not
reclaimed.
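A rough model of the per-mount approach suggested above. The structures are hypothetical stand-ins for struct mount and struct vnode, and nodeleg is an invented field; the point is only that a flag fixed when the mount is set up can be read on every getattr without extra locking, and that the expensive delegation scan is skipped for mounts that never hand out delegations. Presumably this also avoids the problem with a run-time sysctl, whose value can change while delegations are still outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-mount NFS server data, fixed at mount time. */
struct toy_mntdata {
	bool nodeleg;		/* true: this mount never issues delegations */
};

struct toy_mount {
	struct toy_mntdata *mnt_data;
};

struct toy_vnode {
	struct toy_mount *v_mount;
};

unsigned toy_checkgetattr_calls;	/* counts trips through the slow path */

/* Stand-in for nfsrv_checkgetattr(): the delegation scan being avoided. */
static int toy_checkgetattr(struct toy_vnode *vp)
{
	(void)vp;
	toy_checkgetattr_calls++;
	return 0;
}

/*
 * Getattr path: the caller holds the vnode lock and the vnode is not
 * reclaimed, so v_mount and its mnt_data stay valid while we read the flag.
 */
int toy_getattr(struct toy_vnode *vp)
{
	assert(vp != NULL && vp->v_mount != NULL);
	if (vp->v_mount->mnt_data != NULL && vp->v_mount->mnt_data->nodeleg)
		return 0;
	return toy_checkgetattr(vp);
}
```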

> 
>  Cheers,
> 
> [1] https://people.freebsd.org/~manu/m8.svg
> [2] https://people.freebsd.org/~manu/m8-new.svg
> 
> From 0cba277f406d3ccf3c9e943a3d4e17b529e31c89 Mon Sep 17 00:00:00 2001
> From: Emmanuel Vadot 
> Date: Fri, 24 Nov 2017 11:17:18 +0100
> Subject: [PATCH 2/4] Do not call nfsrv_checkgetttr if delegation isn't
> enable.
> 
> Signed-off-by: Emmanuel Vadot 
> ---
>  sys/fs/nfsserver/nfs_nfsdserv.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/sys/fs/nfsserver/nfs_nfsdserv.c b/sys/fs/nfsserver/nfs_nfsdserv.c
> index 8102c5810a9..8daf0142360 100644
> --- a/sys/fs/nfsserver/nfs_nfsdserv.c
> +++ b/sys/fs/nfsserver/nfs_nfsdserv.c
> @@ -54,6 +54,7 @@ extern struct timeval nfsboottime;
>  extern int nfs_rootfhset;
>  extern int nfsrv_enable_crossmntpt;
>  extern int nfsrv_statehashsize;
> +extern int nfsrv_issuedelegs;
>  #endif   /* !APPLEKEXT */
>  
>  static int   nfs_async = 0;
> @@ -240,7 +241,7 @@ nfsrvd_getattr(struct nfsrv_descript *nd, int isdgram,
>   if (nd->nd_flag & ND_NFSV4) {
>   if (NFSISSET_ATTRBIT(, NFSATTRBIT_FILEHANDLE))
>   nd->nd_repstat = nfsvno_getfh(vp, , p);
> - if (!nd->nd_repstat)
> + if (nd->nd_repstat == 0 && nfsrv_issuedelegs == 1)
>   nd->nd_repstat = nfsrv_checkgetattr(nd, vp,
>   , , nd->nd_cred, p);
>   if (nd->nd_repstat == 0) {
> -- 
> 2.14.2
> 
> 
> -- 
> Emmanuel Vadot  
> ___
> freebsd...@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscr...@freebsd.org"


Re: CURRENT: core dumped no process name

2017-11-28 Thread Konstantin Belousov
On Tue, Nov 28, 2017 at 06:09:16PM +0700, Alex V. Petrov wrote:
> You were right.
> After reverting to r325188 it worked:
> Nov 28 18:05:18 alex kernel: pid 23618 (mc), uid 0: exited on signal 6 (core dumped)
Thank you for confirming.

Adding the committer.

> 
> 28.11.2017 17:38, Konstantin Belousov wrote:
> > On Tue, Nov 28, 2017 at 05:13:10PM +0700, Alex V. Petrov wrote:
> >> from /var/log/messages
> >>
> >> # grep core /var/log/messages
> >> Nov 21 06:26:56 alex kernel: 1001: exited on signal 11 (core dumped)
> >> Nov 22 00:46:31 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> >> Nov 22 16:01:32 alex kernel: 1001: exited on signal 11 (core dumped)
> >> Nov 23 08:14:45 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> >> Nov 24 04:32:17 alex kernel: 1001: exited on signal 11 (core dumped)
> >> Nov 24 04:41:44 alex kernel: 1001: exited on signal 11 (core dumped)
> >> Nov 24 04:48:03 alex kernel: 1001: exited on signal 11 (core dumped)
> >> Nov 24 04:49:55 alex kernel: 1001: exited on signal 11 (core dumped)
> >> Nov 24 11:33:45 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> >> Nov 26 02:50:35 alex kernel: 1001: exited on signal 11 (core dumped)
> >> Nov 26 03:47:56 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> >> Nov 26 12:05:01 alex kernel: 1001: exited on signal 11 (core dumped)
> >> Nov 26 14:16:26 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> >> Nov 26 14:27:05 alex kernel: 0: exited on signal 6 (core dumped)
> >> Nov 26 23:49:56 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> >> Nov 26 23:50:26 alex kernel: 0: exited on signal 6 (core dumped)
> >> Nov 27 12:13:02 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> >> Nov 27 15:28:11 alex kernel: 0: exited on signal 6 (core dumped)
> >> Nov 27 15:50:54 alex kernel: 0: exited on signal 6 (core dumped)
> >> Nov 27 16:34:02 alex kernel: 0: exited on signal 6 (core dumped)
> >> Nov 28 03:07:42 alex kernel: 0: exited on signal 6 (core dumped)
> >> Nov 28 04:20:03 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> >> Nov 28 04:51:08 alex kernel: 0: exited on signal 6 (core dumped)
> >> Nov 28 10:28:38 alex kernel: 1001: exited on signal 11 (core dumped)
> >> Nov 28 15:55:42 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> >> Nov 28 17:05:35 alex kernel: 0: exited on signal 6 (core dumped)
> >>
> > Do you use syslogd after r325558 ?  If yes, try to revert it.
> > 
> > 
> 
> -- 
> -
> Alex.


Re: CURRENT: core dumped no process name

2017-11-28 Thread Konstantin Belousov
On Tue, Nov 28, 2017 at 05:13:10PM +0700, Alex V. Petrov wrote:
> from /var/log/messages
> 
> # grep core /var/log/messages
> Nov 21 06:26:56 alex kernel: 1001: exited on signal 11 (core dumped)
> Nov 22 00:46:31 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> Nov 22 16:01:32 alex kernel: 1001: exited on signal 11 (core dumped)
> Nov 23 08:14:45 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> Nov 24 04:32:17 alex kernel: 1001: exited on signal 11 (core dumped)
> Nov 24 04:41:44 alex kernel: 1001: exited on signal 11 (core dumped)
> Nov 24 04:48:03 alex kernel: 1001: exited on signal 11 (core dumped)
> Nov 24 04:49:55 alex kernel: 1001: exited on signal 11 (core dumped)
> Nov 24 11:33:45 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> Nov 26 02:50:35 alex kernel: 1001: exited on signal 11 (core dumped)
> Nov 26 03:47:56 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> Nov 26 12:05:01 alex kernel: 1001: exited on signal 11 (core dumped)
> Nov 26 14:16:26 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> Nov 26 14:27:05 alex kernel: 0: exited on signal 6 (core dumped)
> Nov 26 23:49:56 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> Nov 26 23:50:26 alex kernel: 0: exited on signal 6 (core dumped)
> Nov 27 12:13:02 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> Nov 27 15:28:11 alex kernel: 0: exited on signal 6 (core dumped)
> Nov 27 15:50:54 alex kernel: 0: exited on signal 6 (core dumped)
> Nov 27 16:34:02 alex kernel: 0: exited on signal 6 (core dumped)
> Nov 28 03:07:42 alex kernel: 0: exited on signal 6 (core dumped)
> Nov 28 04:20:03 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> Nov 28 04:51:08 alex kernel: 0: exited on signal 6 (core dumped)
> Nov 28 10:28:38 alex kernel: 1001: exited on signal 11 (core dumped)
> Nov 28 15:55:42 alex kernel: FreeBSD/SMP: 1 package(s) x 8 core(s)
> Nov 28 17:05:35 alex kernel: 0: exited on signal 6 (core dumped)
> 
Do you run a syslogd built after r325558?  If yes, try reverting that revision.



Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

2017-11-28 Thread Konstantin Belousov
On Tue, Nov 28, 2017 at 02:26:10PM +0100, Emmanuel Vadot wrote:
> On Tue, 28 Nov 2017 13:04:28 +0200
> Konstantin Belousov <kostik...@gmail.com> wrote:
> 
> > On Tue, Nov 28, 2017 at 11:41:36AM +0100, Emmanuel Vadot wrote:
> > > 
> > >  Hello,
> > > 
> > >  I would like to switch the vfs.nfsd.issue_delegations sysctl to a
> > > tunable.
> > >  The reason behind it is recent problem at work on some on our filer
> > > related to NFS.
> > >  We use NFSv4 without delegation as we never been able to have good
> > > performance with FreeBSD server and Linux client (we need to do test
> > > again but that's for later). We recently see all of our filers with NFS
> > > enabled perform pourly, resulting in really bad performance on the
> > > service.
> > >  After doing some analyze with pmcstat we've seen that we spend ~50%
> > > of our time in lock delay, called by nfsrv_checkgetattr (See [1]).
> > >  This function is only usefull when using delegation, as it search for
> > > some write delegations issued to client, but it's called anyway when
> > > there so no delegation and cause a lot of problem when having a lot of
> > > activities.
> > >  We've patched the kernel with the included diff and now everything is
> > > fine (tm) (See [2]).
> > >  The problem for upstreaming this patch is that since issue_delegations
> > > is a sysctl we cannot know if the checkgetattr called is needed, hence
> > > the question to switch it to a TUNABLE. Also maybe some other code path
> > > could benefit from it, I haven't read all the source of nfsserver yet.
> > > 
> > >  Note that I won't MFC the changes as it would break POLA.
> > Perhaps make nodeleg per-mount flag ?
> > The you can check it safely by dereferencing vp->v_mount->mnt_data without
> > acquiring any locks, while the vnode lock is owned and the vnode is not
> > reclaimed.
> 
>  That won't work with current code.
Why ?

>  Currently if you have delegation enabled and connect one client to a
> mountpoint, then disable delegation, the current client will still have
> delegation while new ones will not. I have not tested restarting nfsd in
> this situation but I suspect that client will behave badly. This is a
> another +1 for making it a tunable I think.
It is up to the filesystem to handle remount; in particular, the fs can refuse
to change mount options on a mount update if the operation is not supported.
In other words, after you do
mount -o nodeleg ... /mnt
then
mount -u -o nonodeleg ... /mnt
needs to return EINVAL.
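A minimal userland sketch of the per-mount idea, to make the remount rule concrete. Everything in it is hypothetical: `struct nfsd_mount_opts`, `nfsd_mount_update()` and `getattr_needs_deleg_check()` are illustrative names, not nfsd or VFS interfaces. The option is recorded on the initial mount, and an update that tries to flip it fails with EINVAL. Because the value then never changes for the life of the mount, the getattr path could read it without taking a lock.

```c
#include <errno.h>
#include <stdbool.h>

/* HYPOTHETICAL per-mount state; stands in for what mnt_data might hold. */
struct nfsd_mount_opts {
	bool nodeleg;	/* true: never issue delegations for this mount */
};

/*
 * HYPOTHETICAL mount handler.  On the initial mount the option is recorded;
 * on an update (mount -u) any attempt to change it is rejected, so readers
 * can rely on the value staying fixed for the life of the mount.
 */
static int
nfsd_mount_update(struct nfsd_mount_opts *mo, bool is_update, bool want_nodeleg)
{
	if (!is_update) {
		mo->nodeleg = want_nodeleg;
		return (0);
	}
	if (mo->nodeleg != want_nodeleg)
		return (EINVAL);	/* e.g. mount -u -o nonodeleg after nodeleg */
	return (0);
}

/* HYPOTHETICAL getattr-side check: skip the delegation scan when it cannot matter. */
static bool
getattr_needs_deleg_check(const struct nfsd_mount_opts *mo)
{
	return (!mo->nodeleg);
}
```

With this shape, the check that the posted patch makes against the global nfsrv_issuedelegs would instead consult the flag of the mount that the vnode belongs to.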

> 
> > > 
> > >  Cheers,
> > > 
> > > [1] https://people.freebsd.org/~manu/m8.svg
> > > [2] https://people.freebsd.org/~manu/m8-new.svg
> > > 
> > > >From 0cba277f406d3ccf3c9e943a3d4e17b529e31c89 Mon Sep 17 00:00:00 2001
> > > From: Emmanuel Vadot <m...@freebsd.org>
> > > Date: Fri, 24 Nov 2017 11:17:18 +0100
> > > Subject: [PATCH 2/4] Do not call nfsrv_checkgetttr if delegation isn't
> > > enable.
> > > 
> > > Signed-off-by: Emmanuel Vadot <m...@freebsd.org>
> > > ---
> > >  sys/fs/nfsserver/nfs_nfsdserv.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/sys/fs/nfsserver/nfs_nfsdserv.c b/sys/fs/nfsserver/nfs_nfsdserv.c
> > > index 8102c5810a9..8daf0142360 100644
> > > --- a/sys/fs/nfsserver/nfs_nfsdserv.c
> > > +++ b/sys/fs/nfsserver/nfs_nfsdserv.c
> > > @@ -54,6 +54,7 @@ extern struct timeval nfsboottime;
> > >  extern int nfs_rootfhset;
> > >  extern int nfsrv_enable_crossmntpt;
> > >  extern int nfsrv_statehashsize;
> > > +extern int nfsrv_issuedelegs;
> > >  #endif   /* !APPLEKEXT */
> > >  
> > >  static int   nfs_async = 0;
> > > @@ -240,7 +241,7 @@ nfsrvd_getattr(struct nfsrv_descript *nd, int isdgram,
> > >   if (nd->nd_flag & ND_NFSV4) {
> > >   if (NFSISSET_ATTRBIT(, NFSATTRBIT_FILEHANDLE))
> > >   nd->nd_repstat = nfsvno_getfh(vp, , p);
> > > - if (!nd->nd_repstat)
> > > + if (nd->nd_repstat == 0 && nfsrv_issuedelegs == 1)
> > >   nd->nd_repstat = nfsrv_checkgetattr(nd, vp,
> > >   , , nd->nd_cred, p);
> > >   if (nd->nd_repstat == 0) {
> > > -- 
> > > 2.14.2
> > > 
> > > 
> > > -- 
> > > Emmanuel Vadot <m...@bidouilliste.com> <m...@freebsd.org>
> 
> 
> -- 
> Emmanuel Vadot <m...@bidouilliste.com> <m...@freebsd.org>


Re: CURRENT: core dumped no process name

2017-11-28 Thread Konstantin Belousov
On Tue, Nov 28, 2017 at 08:37:44AM -0800, Gleb Smirnoff wrote:
>   Hi Alex,
> 
> On Tue, Nov 28, 2017 at 06:09:16PM +0700, Alex V. Petrov wrote:
> A> You were right.
> A> After revert to r325188 worked:
> A> Nov 28 18:05:18 alex kernel: pid 23618 (mc), uid 0: exited on signal 6
> A> (core dumped)
> 
> Can you please send over to me the syslogd binary and syslogd.core?

Syslogd does not crash; it removes the part of the kernel message that it
perceives as the host name.  Read the whole thread.


Re: CURRENT: core dumped no process name

2017-11-28 Thread Konstantin Belousov
On Tue, Nov 28, 2017 at 10:37:45AM +0700, Alex V. Petrov wrote:
> SUBJ:
> Nov 28 10:28:38 alex kernel: 1001: exited on signal 11 (core dumped)
> It was a thunerbird
> 
> 
> 12.0-CURRENT #0 r326286
> kernel GENERIC-NODEBUG

The format string is the following:
"pid %d (%s), uid %d: exited on signal %d%s\n"
In other words, your line is missing the whole prefix; I suppose that the
'1001:' part is the uid printout.

Where do you see this line?  Is it from /var/log/messages, a remote syslog
service, or the console?
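A small C sketch of how the kernel line is built from that format string, and of what remains when everything up to and including "uid " is cut off. The `pid 23618 (mc), uid 0` values come from the log quoted elsewhere in this thread. The helper names are hypothetical, and `strip_to_uid()` only mimics the visible symptom; it is not syslogd's actual parsing logic.

```c
#include <stdio.h>
#include <string.h>

/* Format string quoted in the message above. */
#define EXIT_FMT "pid %d (%s), uid %d: exited on signal %d%s\n"

/* Build the full kernel line into buf from caller-supplied values. */
static int
format_exit_msg(char *buf, size_t len, int pid, const char *comm, int uid,
    int sig, int core)
{
	return (snprintf(buf, len, EXIT_FMT, pid, comm, uid, sig,
	    core ? " (core dumped)" : ""));
}

/*
 * HYPOTHETICAL helper: return the part of the line starting at the uid
 * value, i.e. what is left once everything through "uid " is dropped.
 * (Naive: a command name containing "uid " would confuse it.)
 */
static const char *
strip_to_uid(const char *line)
{
	const char *p = strstr(line, "uid ");

	return (p != NULL ? p + strlen("uid ") : line);
}
```

For example, formatting pid 23618, "mc", uid 0, signal 6 with a core dump and then stripping gives "0: exited on signal 6 (core dumped)", the same shape as the truncated lines in /var/log/messages.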


Re: /usr/obj is 11GB huge on FreeBSD 12-current

2017-12-15 Thread Konstantin Belousov
On Fri, Dec 15, 2017 at 06:38:48PM +0100, Wolfram Schneider wrote:
> On 15 December 2017 at 17:51, Wolfram Schneider  wrote:
> > On 15 December 2017 at 13:02, David Wolfskill  wrote:
> >> On Fri, Dec 15, 2017 at 10:12:09AM +0100, Wolfram Schneider wrote:
> >>> Hi,
> >>>
> >>> I upgraded a machine from 11-stable to 12-current. The /usr/obj tree
> >>> is now 11GB huge:
> >>>
> >>> FreeBSD 12-current
> >>> $ du -hs /usr/obj
> >>>  11G /usr/obj
> >>>
> >>> on FreeBSD 11-stable it was less the size:
> >>> $ du -hs /usr/obj
> >>> 5.6G /usr/obj
> >>>
> >>> this is a problem when you have a small VM with 20GB disk space or less.
> >>>
> >>> Is there a way to use less /usr/obj disk space during build? I know
> >>> that we have to do some bootstrapping for newer compiler tools, but
> >>> does we need to keep all temp files during the build?
> >>
> >> There was a change near the beginning of November; please see UPDATING
> >> entry 20171101 -- you probably have several no-longer-used
> >> subdirectories under /usr/obj/usr/src/.
> >>
> >> Once those are cleared out, my experience (tracking stable/11 & head in
> >> different slices on the same machines) is that stbale/11 is using about
> >> 5.0G, while head uses about 6.1G.
> >
> > I think the suspect directories are "tmp" and "obj-lib32", together
> > they are 4.1GB huge.
> >
> > I will run a build of current again with a clean obj tree (-current on
> > a recent -current). Let's see.
> 
> I run a test on universe12b (FreeBSD 12.0-CURRENT #0 r325426: Sun Nov
> 5) with an empty obj directory.
> 
> `make buildworld' creates 9.7GB of obj data. After running `make
> buildkernel' it will grow to 12GB. This is on a ZFS filesystem (my
> original report was on UFS)

The most likely reason for the bump is the generation of debugging data,
which is turned on for 12.  Other rarely needed things you can disable are
the tests and the profiling libraries.  Put the following into /etc/src.conf:
WITHOUT_PROFILE=yes
WITHOUT_DEBUG_FILES=yes
WITHOUT_TESTS=yes



Re: cannot access pass device from within jail

2017-12-17 Thread Konstantin Belousov
On Sun, Dec 17, 2017 at 02:52:12PM -0500, Dan Langille wrote:
> Hello,
> 
> What suggestions do you have for where I should look next? I'm happy to start 
> installing various builds of FreeBSD in order to track down which commit 
> caused this.
> 
> I'm trying to access a tape library from within a jail running on a FreeBSD 
> 11.1 host.  sa(4) devices are working (e.g. I can rewind nsa0).
> 
> pass(4) devices (i.e. the tape changer ch0) are not working.  This morning I 
> posted to -scsi@: 
> https://lists.freebsd.org/pipermail/freebsd-scsi/2017-December/007608.html
> 
> The device appears in the jail and has appropriate permissions.  This access 
> was granted
> via /etc/devfs.rules using the same approach I used for FreeBSD 10.3
> 
> The permissions in the jail:
> 
> [root@bacula-sd-02 ~]# ls -l /dev/pass7
> crw---  1 root  operator  0x74 Dec 16 21:52 /dev/pass7
> 
> The command in the jail:
> 
> [root@bacula-sd-02 ~]# mtx -f /dev/pass7 status 
> cannot open SCSI device '/dev/pass7' - Operation not permitted
> 
> Here is the truss output of the command in question: 
> https://gist.github.com/dlangille/b80ee804b8080e1cbf5b5ab67f0bdabe

Does accessing the pass device from the host, using the host's /dev, work?
Same question for access from the host using the nodes of the jailed devfs mount.


Re: cannot access pass device from within jail

2017-12-17 Thread Konstantin Belousov
On Sun, Dec 17, 2017 at 03:49:15PM -0500, Dan Langille wrote:
> > On Dec 17, 2017, at 3:37 PM, Konstantin Belousov <kostik...@gmail.com> 
> > wrote:
> > Does it work to access the pass device from host using host' /dev ?
> 
> Yes, it does. see "This command on the host" at 
> https://lists.freebsd.org/pipermail/freebsd-scsi/2017-December/007610.html

Ok.

> 
> > Same question for the host access using the nodes of the jailed devfs mount.
> 
> I didn't try that, but I will soon. To be clear, does this command on the 
> host look like what you have in mind?
> 
> mtx -f /usr/jails/bacula-sd-02/dev/pass7 status 
I do not know.  Check with truss which node gets accessed.


Re: XBOX/i386 and xlint removal

2017-11-21 Thread Konstantin Belousov
On Thu, Nov 09, 2017 at 03:52:21PM +0200, Konstantin Belousov wrote:
> Hello,
> I created two reviews to axe two features which I personally find of
> little use in modern FreeBSD.

...

I put up the follow-up review to remove lint support from the headers.
Feel free to review and comment, after adding yourself as a reviewer.

https://reviews.freebsd.org/D13156

Only include/, include/sys and x86 are handled.


Re: [HEADS UP] posix_fallocate support removed from ZFS, lld affected

2017-11-06 Thread Konstantin Belousov
On Mon, Nov 06, 2017 at 11:43:42AM -0700, Ed Maste wrote:
> On 6 November 2017 at 10:56, Ian Lepore  wrote:
> >
> > Oh, right.  lld != ld.
> 
> Indeed, but this will be a problem for the arm64 package builds if
> they use ZFS and an 11.x userland on a new kernel. We probably need to
> bring the lld change in as an errata.

Or make ZFS return (i.e. lie about) success when p_osrel is below the bumped osrel.
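A sketch of that compatibility shim in plain C. It assumes a hypothetical cut-over value: `FALLOCATE_EINVAL_OSREL` is a placeholder, not the real __FreeBSD_version bump, and `zfs_fallocate_sketch()` is an illustrative stand-in for the filesystem's allocate handler, with the caller's p_osrel passed in explicitly. EINVAL is used for the honest answer because POSIX lists it for a filesystem that does not support the operation.

```c
#include <errno.h>

/*
 * HYPOTHETICAL threshold: the osrel at which userland is expected to cope
 * with the unsupported-operation error.  1200050 is only a placeholder for
 * whatever __FreeBSD_version the change would bump to.
 */
#define FALLOCATE_EINVAL_OSREL	1200050

/*
 * Sketch of an allocate handler on a filesystem that cannot really
 * preallocate.  Binaries built before the bump get a (lying) success;
 * newer ones get the honest error and are expected to handle it.
 */
static int
zfs_fallocate_sketch(int p_osrel)
{
	if (p_osrel < FALLOCATE_EINVAL_OSREL)
		return (0);	/* pretend preallocation succeeded */
	return (EINVAL);
}
```

Keying on p_osrel means already-built binaries from an older userland keep working on the new kernel, while binaries built after the bump see the real error.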


XBOX/i386 and xlint removal

2017-11-09 Thread Konstantin Belousov
Hello,
I created two reviews to axe two features which I personally find of
little use in modern FreeBSD.

https://reviews.freebsd.org/D13015 xlint
https://reviews.freebsd.org/D13016 XBOX/i386

The reviews contain the explanations.  For xlint, removal is simply overdue, IMO.
For XBOX, I do not have much objection to supporting obsolete platforms,
but the way the port was done pollutes the sources with too many #ifdefs.
Since people often do not want to care about i386 together with amd64,
an additional hurdle only makes things worse.

Feel free to add yourself as a reviewer.  I intend to commit the patches
in a week, unless strong objections are expressed by then.


Re: Segfault in _Unwind_* code called from pthread_exit

2017-10-21 Thread Konstantin Belousov
On Sat, Oct 21, 2017 at 10:02:38PM +0200, Andreas Tobler wrote:
> On 26.08.17 20:40, Konstantin Belousov wrote:
> > On Sat, Aug 26, 2017 at 08:28:13PM +0200, Tijl Coosemans wrote:
> >> On Sat, 26 Aug 2017 02:44:42 +0300 Konstantin Belousov 
> >> <kostik...@gmail.com> wrote:
> >>> How does llvm unwinder detects that the return address is a garbage ?
> >>
> >> It just stops unwinding when it can't find frame information (stored in
> >> .eh_frame sections).  GCC unwinder doesn't give up yet and checks if the
> >> return address points to the signal trampoline (which means the current
> >> frame is that of a signal handler).  It has built-in knowledge of how to
> >> unwind to the signal trampoline frame.
> > So llvm just gives up on signal frames ?
> > 
> >> A noreturn attribute isn't enough.  You can still unwind such functions.
> >> They are allowed to throw exceptions for example.
> > Ok.
> > 
> >> I did consider using
> >> a CFI directive (see patch below) and it works, but it's architecture
> >> specific and it's inserted after the function prologue so there's still
> >> a window of a few instructions where a stack unwinder will try to use
> >> the return address.
> >>
> >> Index: lib/libthr/thread/thr_create.c
> >> ===
> >> --- lib/libthr/thread/thr_create.c  (revision 322802)
> >> +++ lib/libthr/thread/thr_create.c  (working copy)
> >> @@ -251,6 +251,7 @@ create_stack(struct pthread_attr *pattr)
> >>   static void
> >>   thread_start(struct pthread *curthread)
> >>   {
> >> +   __asm(".cfi_undefined %rip");
> >>  sigset_t set;
> >>   
> >>  if (curthread->attr.suspend == THR_CREATE_SUSPENDED)
> > 
> > I like this approach much more than the previous patch.  What can be
> > done is to provide asm trampoline which calls thread_start().  There you
> > can add the .cfi_undefined right at the entry.
> > 
> > It is somewhat more work than just setting the return address on the
> > kernel-constructed pseudo stack frame, but I believe this is ultimately
> > correct way.  You still can do it only on some arches, if you do not
> > have incentive to code asm for all of them.
> > 
> > Also crt1 probably should get the same treatment, despite we already set
> > %rbp to zero AFAIR.
> 
> Did some commit result out of this discussion or is this subject still 
> under investigation?
Nothing was done AFAIK.

> 
> Curious because I got this gcc PR:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82635
> 
> Tia,
> Andreas


Re: Segfault in _Unwind_* code called from pthread_exit

2017-10-31 Thread Konstantin Belousov
On Tue, Oct 31, 2017 at 08:37:29PM +0100, Andreas Tobler wrote:
> On 31.10.17 10:28, Konstantin Belousov wrote:
> > On Mon, Oct 30, 2017 at 10:54:05PM +0100, Andreas Tobler wrote:
> >> On 30.10.17 15:32, Tijl Coosemans wrote:
> >>> On Sun, 29 Oct 2017 20:40:46 +0100 Andreas Tobler 
> >>> <andreast-l...@fgznet.ch> wrote:
> >>>> Attached what I have for libgcc. It can be applied to gcc5-8, should
> >>>> give no issues. The mentioned tc from this thread and mine,
> >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82635 do pass.
> >>>>
> >>>> What do you think?
> >>>
> >>> Like I said before the return address can be anything.  It could for
> >>> instance point to some instruction in a random function and then the
> >>> stack unwinder will think thread_start was called from that function.
> >>> There's no check you can add to libgcc to distinguish that from a
> >>> normal valid return address.
> >>>
> >> Maybe not, and most probably I do not understand what is happening. But
> >> with my modification I survive the test case.
> >>
> >> If no objections from your or Konstantin's side come up I will commit it
> >> to the gcc repo. It will not 'fix' the issue, but it will improve the
> >> gcc behavior.
> > 
> > I posted something similar when the discussion thread started. From the
> > cursory look, your patch is better than mine. The only difference that
> > makes me wonder is that I used #ifdef KERN_PROC_SIGTRAMP around the
> > block because I believe gcc has more relaxed policy about supporting
> > obsoleted OS versions.
> 
> I am aware about KERN_PROC_SIGTRAMP and older OS releases, that's why I 
> asked for feedback.
> Do we, FreeBSD'ers, want to have gcc unwind support on older than 
> FreeBSD 9.3 releases? I think the gcc folks do not care, but we are the 
> ones who might have an need for such a support?
Well, I put the #ifdef in because I suspected that the gcc folks cared, if
anybody did.  For instance, I know that the perl people do.

Are there specific configuration bits in gcc that are only relevant for
older releases?  If yes, then perhaps we should not break them until they
are removed.  If not, then it most likely does not matter.

> @Gerald, do you have an opinion?
> 
> I can 'ifdef' the new code and in the 'else' case we fall back to the 
> already existing path.


Re: Segfault in _Unwind_* code called from pthread_exit

2017-10-29 Thread Konstantin Belousov
On Sun, Oct 29, 2017 at 06:23:51PM +0100, Tijl Coosemans wrote:
> On Sat, 26 Aug 2017 21:40:34 +0300 Konstantin Belousov <kostik...@gmail.com> 
> wrote:
> > On Sat, Aug 26, 2017 at 08:28:13PM +0200, Tijl Coosemans wrote:
> >> I did consider using
> >> a CFI directive (see patch below) and it works, but it's architecture
> >> specific and it's inserted after the function prologue so there's still
> >> a window of a few instructions where a stack unwinder will try to use
> >> the return address.
> >> 
> >> Index: lib/libthr/thread/thr_create.c
> >> ===
> >> --- lib/libthr/thread/thr_create.c  (revision 322802)
> >> +++ lib/libthr/thread/thr_create.c  (working copy)
> >> @@ -251,6 +251,7 @@ create_stack(struct pthread_attr *pattr)
> >>  static void
> >>  thread_start(struct pthread *curthread)
> >>  {
> >> +   __asm(".cfi_undefined %rip");
> >> sigset_t set;
> >>  
> >> if (curthread->attr.suspend == THR_CREATE_SUSPENDED)  
> > 
> > I like this approach much more than the previous patch.  What can be
> > done is to provide asm trampoline which calls thread_start().  There you
> > can add the .cfi_undefined right at the entry.
> >
> > It is somewhat more work than just setting the return address on the
> > kernel-constructed pseudo stack frame, but I believe this is ultimately
> > correct way.  You still can do it only on some arches, if you do not
> > have incentive to code asm for all of them.
> 
> Ok, but then there are two ways to implement the trampoline:
> 
> 1)
>   movq $0,(%rsp)
>   jmp thread_start
> 2)
>   subq $8,%rsp
>   call thread_start
>   /* NOTREACHED */
> 
> With 1) you're setting the return address to zero anyway, so you might
> as well do that in the kernel like my first patch.  With 2) you're
> setting up a new call frame, basically redoing what the kernel already
> did and on i386 this also means copying the function argument.
I do not quite understand the second variant, because the stack is not
guaranteed to be zeroed, and it often is not when it is reused after a
previously exited thread.

The first variant is what I like, but perhaps we need to emulate the
frame as well, i.e. push two zero longs.

Currently the kernel does not access the usermode stack of the new thread
unless dictated by the ABI (i.e. it does not touch it for a 64bit process
on amd64, but has to for 32bit).  I like this property.  Also, the
previous paragraph is indicative: in the kernel we do not really know
what ABI userspace follows.  It might want a frame, or maybe it does
not need one.  It could use a register other than %rbp as the frame base,
etc.
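A userland model of the two points above, using ordinary structs instead of real stack memory. `struct frame` and `walk_frames()` are hypothetical, and the layout (saved frame pointer, then return address) mirrors the usual amd64 %rbp chain. A walker stops at an explicit all-zero terminator, which is what the "push two zero longs" variant provides. If the slot instead holds stale data from a previously exited thread, the walker keeps going into frames that have nothing to do with the new thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Model of an amd64-style frame record: saved frame pointer followed by the
 * return address.  A real unwinder would read these from the stack via %rbp.
 */
struct frame {
	struct frame *fp;	/* caller's frame record */
	uintptr_t ret;		/* return address into the caller */
};

/*
 * Count the frames a naive frame-pointer walker visits starting at f.
 * The walk ends at a NULL frame pointer or a zero return address; max
 * bounds the walk so a garbage chain cannot loop forever.
 */
static int
walk_frames(const struct frame *f, int max)
{
	int n = 0;

	while (f != NULL && f->ret != 0 && n < max) {
		n++;
		f = f->fp;
	}
	return (n);
}
```

With an explicit terminator the walk from thread_start's frame ends where it should. With leftover contents in the same slot, the walker reports extra, bogus callers.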

> 
> Do you have any preference (or better alternatives), because I think I
> still prefer my first patch.  It's the caller's job to set up the call
> frame, in this case the kernel.  And if the kernel handles it then it
> also works with (hypothetical) implementations other than libthr.
> 
> > Also crt1 probably should get the same treatment, despite we already set
> > %rbp to zero AFAIR.
> 
> I haven't checked but I imagine the return address of the process entry
> point is always zero because the stack is all zeros.
The stack is not zeroed.  The environment and argument strings and auxv are
copied at the top, and the ps_strings structure is located at the bottom,
so it is not.

If you commit your existing patch as is, I will not object.  But I do think
that stuff that can be done in usermode should be done in usermode, especially
when the amount of effort is the same.


Re: Segfault in _Unwind_* code called from pthread_exit

2017-10-31 Thread Konstantin Belousov
On Mon, Oct 30, 2017 at 10:54:05PM +0100, Andreas Tobler wrote:
> On 30.10.17 15:32, Tijl Coosemans wrote:
> > On Sun, 29 Oct 2017 20:40:46 +0100 Andreas Tobler  
> > wrote:
> >> Attached what I have for libgcc. It can be applied to gcc5-8, should
> >> give no issues. The mentioned tc from this thread and mine,
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82635 do pass.
> >>
> >> What do you think?
> > 
> > Like I said before the return address can be anything.  It could for
> > instance point to some instruction in a random function and then the
> > stack unwinder will think thread_start was called from that function.
> > There's no check you can add to libgcc to distinguish that from a
> > normal valid return address.
> > 
> Maybe not, and most probably I do not understand what is happening. But 
> with my modification I survive the test case.
> 
> If no objections from your or Konstantin's side come up I will commit it 
> to the gcc repo. It will not 'fix' the issue, but it will improve the 
> gcc behavior.

I posted something similar when the discussion thread started.  From a
cursory look, your patch is better than mine.  The only difference that
makes me wonder is that I used #ifdef KERN_PROC_SIGTRAMP around the
block, because I believe gcc has a more relaxed policy about supporting
obsolete OS versions.
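A sketch of such a guarded lookup, assuming FreeBSD's KERN_PROC_SIGTRAMP sysctl and `struct kinfo_sigtramp` from <sys/user.h>. `pc_in_sigtramp()` is a hypothetical helper, not libgcc's code. When the sysctl is not available at build time (older FreeBSD, or another OS), it returns -1 so the caller can fall back to the existing path, which is what the #ifdef is for.

```c
#include <stdint.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/param.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#endif

/*
 * HYPOTHETICAL helper: 1 if pc lies inside this process's signal
 * trampoline, 0 if not, -1 if that cannot be determined here.
 */
static int
pc_in_sigtramp(uintptr_t pc)
{
#if defined(__FreeBSD__) && defined(KERN_PROC_SIGTRAMP)
	struct kinfo_sigtramp kst;
	size_t len = sizeof(kst);
	int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_SIGTRAMP, getpid() };

	if (sysctl(mib, 4, &kst, &len, NULL, 0) != 0)
		return (-1);
	return (pc >= (uintptr_t)kst.ksigtramp_start &&
	    pc < (uintptr_t)kst.ksigtramp_end);
#else
	(void)pc;
	return (-1);	/* unknown: caller keeps its pre-existing behaviour */
#endif
}
```

Returning a tri-state instead of a boolean keeps the "sysctl missing" case distinct from a definite "not in the trampoline".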


Re: panic: vtopde on a uva/gpa 0x1030000 @r325228 (amd64)

2017-10-31 Thread Konstantin Belousov
On Tue, Oct 31, 2017 at 04:37:10AM -0700, David Wolfskill wrote:
> Performed the usual src update in head/amd64; today was from r325137 to
> r325228.  Laptop had no complaints; everything Just Worked (as usual).
> 

> GEOM: new disk da1
> panic: vtopde on a uva/gpa 0x103
> cpuid = 5
> time = 1509448749
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe07c77a8880
> vpanic() at vpanic+0x19c/frame 0xfe07c77a8900
> kassert_panic() at kassert_panic+0x126/frame 0xfe07c77a8970
> pmap_kextract() at pmap_kextract+0x121/frame 0xfe07c77a89a0
> free() at free+0x5e/frame 0xfe07c77a89e0
> g_slice_orphan() at g_slice_orphan+0x7e/frame 0xfe07c77a8a00
> g_spoil_event() at g_spoil_event+0x72/frame 0xfe07c77a8a30
> g_run_events() at g_run_events+0x231/frame 0xfe07c77a8a70
> fork_exit() at fork_exit+0x84/frame 0xfe07c77a8ab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe07c77a8ab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 13 tid 100044 ]
> Stopped at  kdb_enter+0x3b: movq    $0,kdb_why
> db> 

Try to revert r325227.


Re: Lag after resume culprit found

2018-05-17 Thread Konstantin Belousov
On Thu, May 17, 2018 at 11:06:42AM +0300, Andriy Gapon wrote:
> On 17/05/2018 10:56, Johannes Lundberg wrote:
> > 
> > 
> > On Thu, May 17, 2018 at 8:46 AM, Johannes Lundberg  > > wrote:
> > 
> > 
> > 
> > On Thu, May 17, 2018 at 7:43 AM, Andriy Gapon  > > wrote:
> > 
> > On 17/05/2018 02:07, Johannes Lundberg wrote:
> > > 
> > https://github.com/freebsd/freebsd/commit/66f063557f257baa9c8aeab9f933171eaa6e1cfa
> > 
> > 
> > > x86 cpususpend_handler: call wbinvd after setting suspend state 
> > bits
> > 
> > That's very interesting and surprising.
> > That commit changes something that happens before suspend, it 
> > should not
> > have
> > any effect on the system state after resume.
> > 
> > Does anyone have a theory of what could be wrong?
> > 
> > 
> > Nope but moving
> >         CPU_CLR_ATOMIC(cpu, _cpus);
> > back to the end of that scope fixes it.
> >  
> > 
> > 
> > I did some further testing.
> > Calling
> > CPU_CLR_ATOMIC(cpu, _cpus);
> > before
> > pmap_init_pat();
> >  is what "breaks" resume.
> > 
> > Is this Intel only or this it happen on AMD as well (which this patch was
> > intended for)?
> 
> Not sure about the PAT part, but fpuresume/npxresume would affect all 
> platforms.
> It's a bit puzzling that doing PAT manipulations on one AP while another AP is
> being brought up is problematic.  Probably there is something that I am 
> missing.

Manipulating PAT might affect cache consistency, since conflicting
caching attributes are applied to the line holding the suspended_cpus
variable, which is already cached.  It might not be the variable itself
that causes the final misbehavior, but some other data sharing the line.


Re: Floating point in kernel?

2018-05-26 Thread Konstantin Belousov
On Sat, May 26, 2018 at 10:59:50AM +0100, Johannes Lundberg wrote:
> Hi
> 
> The new AMD graphics drivers use floats in kernel space.
> Upstream Linux code has something like:
> kernel_fpu_begin();
> ...
> kernel_fpu_end();
> 
> Do we have similar functions to save/restore FPU registers?
Read fpu_kern(9).


Re: [RFC] Deprecation and removal of the drm2 driver

2018-05-18 Thread Konstantin Belousov
On Fri, May 18, 2018 at 09:33:40PM +0100, Johannes Lundberg wrote:
> On Fri, May 18, 2018 at 9:22 PM, Ben Widawsky <b...@bwidawsk.net> wrote:
> 
> > On 18-05-18 14:15:03, Warner Losh wrote:
> > > On Fri, May 18, 2018 at 2:12 PM, Johannes Lundberg <johal...@gmail.com>
> > > wrote:
> > >
> > > >
> > > >
> > > > On Fri, May 18, 2018 at 9:03 PM, Warner Losh <i...@bsdimp.com> wrote:
> > > >
> > > >> On Fri, May 18, 2018 at 1:30 PM, Steve Kargl <
> > > >> s...@troutmask.apl.washington.edu> wrote:
> > > >>
> > > >> > On Fri, May 18, 2018 at 09:14:24PM +0200, Andreas Nilsson wrote:
> > > >> > > On Fri, May 18, 2018, 20:00 Niclas Zeising <zeis...@freebsd.org>
> > > >> wrote:
> > > >> > >
> > > >> > > > I propose that we remove the old drm2 driver (sys/dev/drm2) from
> > > >> > > > FreeBSD.  I suggest the driver is marked as deprecated in 11.x
> > and
> > > >> > > > removed from 12.0, as was done for other drivers recently.  Some
> > > >> > > > background and rationale:
> > > >> > > >
> > > >> > > > The drm2 driver was the original port of a KMS driver to
> > FreeBSD.
> > > >> It
> > > >> > > > was done by Konstantin Belousov to support Intel graphics
> > cards, and
> > > >> > > > later extended by Jean-Sébastien Pédron as well as Konstantin 
> > > >> > > > to
> > > >> match
> > > >> > > > what's in Linux 3.8.  This included unstable support from
> > Haswell,
> > > >> but
> > > >> > > > nothing newer than that.
> > > >> > > >
> > > >> > > > For quite some time now we have had the
> > graphics/drm-stable-kmod and
> > > >> > > > graphics/drm-next-kmods which provides support for modern AMD
> > and
> > > >> Intel
> > > >> > > > graphics cards.  These ports, together with the linuxkpi, or
> > lkpi,
> > > >> has
> > > >> > > > made it significantly easier to port and update our graphics
> > > >> drivers.
> > > >> > > > Further, these new drivers cover the same drivers as the old
> > drm2
> > > >> > driver.
> > > >> > > >
> > > >> > > > What does the community think?  Is there anyone still using the
> > drm2
> > > >> > > > driver on 12-CURRENT?  If so, what is preventing you from
> > switching
> > > >> to
> > > >> > > > the port?
> > > >> > > >
> > > >> > > > Thank you
> > > >> > > > Regards
> > > >> > > > --
> > > >> > > > Niclas Zeising
> > > >> > > > FreeBSD x11/graphics team
> > > >> > > >
> > > >> > >
> > > >> > > Sounds good ( deprecate resp remove ). It causes more confusion
> > and
> > > >> > > problems and it solves nothing.
> > > >> > >
> > > >> >
> > > >> > Check the Makefiles
> > > >> >
> > > >> > % more /usr/ports/graphics/drm-next-kmod/Makefile
> > > >> >
> > > >> > ONLY_FOR_ARCHS= amd64
> > > >> > ONLY_FOR_ARCHS_REASON=  the new KMS components are only supported on
> > > >> amd64
> > > >> >
> > > >> > Not too ia32 friendly.
> > > >> >
> > > >>
> > > >> So do people use i386 for desktop? And need the latest KMS stuff?
> > > >>
> > > >
> > > > Yeah I was wondering the same.. If you're running i386, do you need drm
> > > > drivers? Will scfb work on i386? (probably has legacy bios and if I
> > > > remember correctly, scfb is UEFI only)
> > &

Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Konstantin Belousov
On Mon, Jun 18, 2018 at 12:42:46PM -0700, Bryan Drewery wrote:
> On 6/15/2018 10:55 PM, Mark Millard wrote:
> > In watching ci.freebsd.org builds I've seen a notable
> > number of one time failures, such as (example from
> > powerpc64):
> > 
> > --- all_subdir_lib/libufs ---
> > ranlib -D libufs.a
> > ranlib: fatal: Failed to open 'libufs.a'
> > *** [libufs.a] Error code 70
> > 
> > where the next build works despite the change being
> > irrelevant to whatever ranlib complained about.
> > 
> > Other builds failed similarly:
> > 
> > --- all_subdir_lib/libbsm ---
> > ranlib -D libbsm_p.a
> > ranlib: fatal: Failed to open 'libbsm_p.a'
> > *** [libbsm_p.a] Error code 70
> > 
> > and:
> > 
> > --- kerberos5/lib__L ---
> > ranlib -D libgssapi_spnego_p.a
> > --- libgssapi_spnego.a ---
> > ranlib -D libgssapi_spnego.a
> > --- libgssapi_spnego_p.a ---
> > ranlib: fatal: Failed to open 'libgssapi_spnego_p.a'
> > *** [libgssapi_spnego_p.a] Error code 70
> > 
> > and so on.
> > 
> > 
> > It is not limited to powerpc64. For example, for aarch64
> > there are:
> > 
> > --- libpam_exec.a ---
> > building static pam_exec library
> > ar -crD libpam_exec.a `NM='nm' NMFLAGS=''  lorder pam_exec.o  | tsort -q` 
> > ranlib -D libpam_exec.a
> > ranlib: fatal: Failed to open 'libpam_exec.a'
> > *** [libpam_exec.a] Error code 70
> > 
> > and:
> > 
> > --- all_subdir_lib/libusb ---
> > ranlib -D libusb.a
> > ranlib: fatal: Failed to open 'libusb.a'
> > *** [libusb.a] Error code 70
> > 
> > and:
> > 
> > --- all_subdir_lib/libbsnmp ---
> > ranlib: fatal: Failed to open 'libbsnmp.a'
> > --- all_subdir_lib/ncurses ---
> > --- all_subdir_lib/ncurses/panelw ---
> > --- panel.pico ---
> > --- all_subdir_lib/libbsnmp ---
> > *** [libbsnmp.a] Error code 70
> > 
> > 
> > Even amd64 gets such:
> > 
> > --- libpcap.a ---
> > ranlib -D libpcap.a
> > ranlib: fatal: Failed to open 'libpcap.a'
> > *** [libpcap.a] Error code 70
> > 
> > and:
> > 
> > 
> > --- libkafs5.a ---
> > ranlib: fatal: Failed to open 'libkafs5.a'
> > --- libkafs5_p.a ---
> > ranlib: fatal: Failed to open 'libkafs5_p.a'
> > --- cddl/lib__L ---
> > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26:
> >  note: include the header <ctype.h> or explicitly provide a declaration for 
> > 'toupper'
> > --- kerberos5/lib__L ---
> > *** [libkafs5_p.a] Error code 70
> > 
> > make[5]: stopped in /usr/src/kerberos5/lib/libkafs5
> > --- libkafs5.a ---
> > *** [libkafs5.a] Error code 70
> > 
> > and:
> > 
> > 
> > --- lib__L ---
> > ranlib -D libclang_rt.asan_cxx-i386.a
> > ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a'
> > *** [libclang_rt.asan_cxx-i386.a] Error code 70
> > 
> > 
> > (Notice the variability in what .a the ranlib's fail for.)
> > 
> > 
> > 
> > 
> > 
> 
> 
> I looked at this a few days ago and don't believe it's actually a build
> race. I think there is something wrong with the ar/ranlib on that system
> or something else. I've found no evidence of concurrent building of the
> .a files in question.

FWIW, I got a similar failure when I did the last checks for the OFED
commit.  For me, it was libgcc.a.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Ryzen public erratas

2018-06-13 Thread Konstantin Belousov
Today I noticed that AMD has published the public errata document for Ryzen,
https://developer.amd.com/wp-content/resources/55449_1.12.pdf

Some of the issues listed there look quite relevant to the potential
hangs that some people still experience with these machines.  I wrote
a script which should apply the recommended workarounds for the errata
that I find interesting.

To run it, load the cpuctl module (kldload cpuctl), apply the latest
firmware update to your CPU, and then run the following shell script.
Comments indicate the errata number addressed by each workaround.

Please report the results.  If the script helps, I will code the kernel
change to apply the workarounds.

#!/bin/sh

# Enable workarounds for errata listed in
# https://developer.amd.com/wp-content/resources/55449_1.12.pdf

# 1057, 1109: do not use MWAIT in the idle loop, idle with HLT instead
sysctl machdep.idle_mwait=0
sysctl machdep.idle=hlt

# Set the workaround bits in the MSRs of every CPU (read-modify-write OR)
for x in /dev/cpuctl*; do
	# 1021
	cpucontrol -m '0xc0011029|=0x2000' "$x"
	# 1033
	cpucontrol -m '0xc0011020|=0x10' "$x"
	# 1049
	cpucontrol -m '0xc0011028|=0x10' "$x"
	# 1095
	cpucontrol -m '0xc0011020|=0x200' "$x"
done



Re: Ryzen public erratas

2018-06-13 Thread Konstantin Belousov
On Wed, Jun 13, 2018 at 12:06:42PM +0100, Johannes Lundberg wrote:
> 
> Konstantin Belousov writes:
> 
> > Today I noted that AMD published the public errata document for Ryzens,
> > https://developer.amd.com/wp-content/resources/55449_1.12.pdf
> >
> > Some of the issues listed there looks quite relevant to the potential
> > hangs that some people still experience with the machines.  I wrote
> > a script which should apply the recommended workarounds to the erratas
> > that I find interesting.
> >
> > To run it, kldload cpuctl, then apply the latest firmware update to your
> > CPU, then run the following shell script.  Comments indicate the errata
> > number for the workarounds.
> >
> > Please report the results.  If the script helps, I will code the kernel
> > change to apply the workarounds.
> >
> > #!/bin/sh
> >
> > # Enable workarounds for erratas listed in
> > # https://developer.amd.com/wp-content/resources/55449_1.12.pdf
> >
> > # 1057, 1109
> > sysctl machdep.idle_mwait=0
> > sysctl machdep.idle=hlt
> >
> > for x in /dev/cpuctl*; do
> > # 1021
> > cpucontrol -m '0xc0011029|=0x2000' $x
> > # 1033
> > cpucontrol -m '0xc0011020|=0x10' $x
> > # 1049
> > cpucontrol -m '0xc0011028|=0x10' $x
> > # 1095
> > cpucontrol -m '0xc0011020|=0x200' $x
> > done
> >
> 
> Hi
> 
> Thanks for the fix! I'm trying it now on my Ryzen 3 2200G which does
> experience some random occasional resets.
> 
> About updating to latest firmware, is this something that's done from BIOS or
> from FreeBSD? If the latter, how?
From FreeBSD, install sysutils/devcpu-data, then run
service microcode_update start
and, of course, you must flash the latest BIOS.

The microcode update must be applied before running this script.


Re: Ryzen public erratas

2018-06-14 Thread Konstantin Belousov
On Thu, Jun 14, 2018 at 10:24:17AM -0400, Mike Tancsa wrote:
> On 6/14/2018 9:36 AM, Eric van Gyzen wrote:
> > On 06/13/2018 05:35, Konstantin Belousov wrote:
> >> Today I noted that AMD published the public errata document for Ryzens,
> >> https://developer.amd.com/wp-content/resources/55449_1.12.pdf
> >>
> >> Some of the issues listed there looks quite relevant to the potential
> >> hangs that some people still experience with the machines.  I wrote
> >> a script which should apply the recommended workarounds to the erratas
> >> that I find interesting.
> >>
> >> To run it, kldload cpuctl, then apply the latest firmware update to your
> >> CPU, then run the following shell script.  Comments indicate the errata
> >> number for the workarounds.
> >>
> >> Please report the results.  If the script helps, I will code the kernel
> >> change to apply the workarounds.
> > Kostik:  This thread on the -stable list has a lot of positive feedback:
> > 
> > https://lists.freebsd.org/pipermail/freebsd-stable/2018-June/089110.html
> 
> I have a couple of Epyc boxes that showed the same lockup behaviour. I
> will re-install FreeBSD on them and see if their microcode updates fix
> this issue as well...
I am not sure that the microcode update alone is enough.  Depending on
the BIOS vendor and the current BIOS, you may need all three: a BIOS
update, a microcode update using cpucontrol/devcpu-data, and running the
script I posted.  In the best case, some of this is just redundant.

> 
> Should I run the same cpuctl commands on those CPUs ?  BTW, I am happy
> to loan one out to you in the FreeBSD netperf cluster for a few weeks
> 
>   ---Mike
> 
> 
> 
> -- 
> ---
> Mike Tancsa, tel +1 519 651 3400 x203
> Sentex Communications, m...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada


Re: [RFC] Deprecation and removal of the drm2 driver

2018-05-31 Thread Konstantin Belousov
On Thu, May 31, 2018 at 08:34:44AM +0100, Johannes Lundberg wrote:
> On Wed, May 30, 2018 at 10:57 PM O. Hartmann  wrote:
> 
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA512
> >
> > Am Thu, 24 May 2018 09:10:10 -0700 (PDT)
> > "Rodney W. Grimes"  schrieb:
> >
> > > -- Start of PGP signed section.
> > > > On Thu, May 24, 2018 at 08:22:12AM -0700, Rodney W. Grimes wrote:
> > > > > > On Wed, May 23, 2018 at 11:48:38AM +0200, Philip Homburg wrote:
> > > > > > > >Also as the Moore's law curve flattens expect the life of these
> > > > > > > >older, but not so old, machines to live quite some time.  I
> > > > > > > >believe we are talking sandy bridge and earlier?  If that is
> > > > > > > >correct Sandy bridge is still a very viable system.
> > > > > > >
> > > > > > > I noticed this lack of love for older systems recently.
> > > > > > >
> > > > > > > I wanted to use an older Dell server to test the 11.2 BETAs and
> > RCs.
> > > > > > >
> > > > > > > Turns out, you can't install FreeBSD using a USB stick image
> > because the
> > > > > > > BIOS only supports MBR. No idea why MBR support was dropped for
> > the USB images.
> > > > > > >
> > > > > > > In the end I had to find a CD burner, and after a couple of
> > tries managed to
> > > > > > > install from CD.
> > > > > > >
> > > > > > > After that, my ansible playbooks started failing because
> > /boot/loader.conf
> > > > > > > is absent if you boot from zfs in combination with MBR.
> > > > > > >
> > > > > > > Pity. This older server hardware is great for trying out new
> > releases, play
> > > > > > > with zfs, etc.
> > > > > >
> > > > > > The disc1.iso (as well as bootonly.iso and dvd1.iso) images are now
> > > > > > built as hybrid images, supporting both MBR and GPT, as well as
> > being
> > > > > > written to a flash drive (like memstick.img) as well as a CD.
> > > > >
> > > > > To clarify a minor point here, are the amd64 disc1.iso images or
> > > > > both the amd64 and i386 disc1.iso images being built as "hybrid"?
> > > > >
> > > >
> > > > Only amd64.  i386 does not have UEFI-/GPT-related boot issues.
> > >
> > > Here is a data point:
> > >
> > > Test system is Dell R710,
> > > First attempt is with BIOS in Boot mode: Bios
> > > Second attempt is with BIOS in Boot mode: UEFI
> > >
> > > Attempted to boot amd64 version:
> > > Screen goes white background, text appears (yes, way indented)
> > > BTX version is 1.02
> > > Consoles: internal video/keyboard
> > > _
> > >
> > > That last _ is a blinking cursor, system is hung, does respond to C-A-del
> > >
> > >
> > > Second attempt:
> > > Works properly
> > >
> > >
> > > I am going after Ed Maste's posted build image in the other thread now...
> > >
> > >
> > > >
> > > > > As this is what I see on my system:
> > > > > root@x230a:/home/ISO/x # file FreeBSD-11.2-BETA2-*
> > > > > FreeBSD-11.2-BETA2-amd64-disc1.iso: DOS/MBR boot sector; partition 1
> > : ID=0xee,
> > > > > start-CHS (0x0,0,2), end-CHS (0x3ff,255,63), startsector 1, 1472695
> > sectors
> > > > > FreeBSD-11.2-BETA2-i386-disc1.iso:  ISO 9660 CD-ROM filesystem data
> > > > > '11_2_BETA2_I386_CD' (bootable)
> > > > > > MBR support was initially removed from the memstick installer, as
> > it is
> > > > > > not compatible with some UEFI implementations.  (Or, at least that
> > is my
> > > > > > understanding, based on my limited intimate knowledge of UEFI.)
> > > > > >
> > > >
> > > > Glen
> > > >
> > > -- End of PGP section, PGP failed!
> > >
> >
> > Today, I tried to eliminate FreeBSD's native KMS drm2 by installing
> > graphics/drm-stable-kmod (ports tree at revision 471172) on CURRENT
> > (FreeBSD 12.0-CURRENT
> > #46 r334401: Wed May 30 23:32:45 CEST 2018 amd64, CUSTOM kernel).
> >
> > The hardware is a presumably UEFI capable ASROCK Z77-Pro4M (latest
> > firmware available
> > from late 2013) equipped with a XEON IvyBridge:
> >
> > CPU: Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz (3400.09-MHz K8-class CPU)
> >   Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
> >
> > Features=0xbfebfbff
> >
> > Features2=0x7fbae3ff
> >   AMD Features=0x28100800
> >   AMD Features2=0x1
> >   Structured Extended Features=0x281
> >   XSAVE Features=0x1
> >   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
> >   TSC: P-state invariant, performance statistics
> > real memory  = 17179869184 (16384 MB)
> > avail memory = 16295137280 (15540 MB)
> > Event timer "LAPIC" quality 600
> > ACPI APIC Table: 
> > FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
> >
> >
> > The box isn't capable of booting FreeBSD in UEFI (while Linux and Windows
> > seems to work).
> > So I'm stuck with BIOS.
> >
> > Loading i915kms.ko/drm2.ko on the system put the console in a
> > high-resolution mode
> > instead of this crap 80x25 ancient console immediately.
> >
> > Loading graphics/drm-stable-kmod as requested (via rc.conf.local) ends up
> > in a trap
> > 12/crash. Very nice for such an old hardware. graphics/drm-next-kmod does
> > load without
> > 

Re: ``make buildkernel'' fails when /usr/obj is empty

2018-05-31 Thread Konstantin Belousov
On Thu, May 31, 2018 at 08:19:20PM +0200, Dimitry Andric wrote:
> On 31 May 2018, at 20:11, Warner Losh  wrote:
> > 
> > On Thu, May 31, 2018 at 11:47 AM, Dimitry Andric  wrote:
> > On 31 May 2018, at 18:04, Benjamin Kaduk  wrote:
> > > On Thu, May 31, 2018 at 09:58:50AM +0200, Gary Jennejohn wrote:
> > >> On Thu, 31 May 2018 09:52:22 +0200
> > >> Gary Jennejohn  wrote:
> ...
> > > Whatever happened to the "run buildworld or kernel-toolchain before
> > > buildkernel" requirement?
> > 
> > That is still a requirement, yes.  Otherwise, you might have outdated
> > toolchain components in your /usr/obj.
> > 
> > Usually you can get away without doing that, and now that clang is the 
> > toolchain that's rebuilt (and that's not fast) people try to get away with 
> > it more and more...
> 
> Actually clang doesn't get updated *that* often, but there is a minor
> snag that one of llvm's config files (lib/clang/include/llvm/Config/config.h)
> includes <osreldate.h>, so each time __FreeBSD_version is bumped, quite
> a lot of dependencies get triggered...
> 
> The version is only used for two checks:
> 
> #if __FreeBSD_version >= 152
> /* Define to 1 if you have the `backtrace' function. */
> #define HAVE_BACKTRACE TRUE
> 
> and:
> 
> /* Define to 1 if you have the `futimens' function. */
> #if __FreeBSD_version >= 1100056
> #define HAVE_FUTIMENS 1
> #endif
> 
> Maybe the first check could be dropped, assuming that backtrace() is
> always available, but I'm not sure about futimens().  Is there any
> supported version of FreeBSD left that does *not* have it?
Or you can define the symbols manually as needed on each branch.  This
eliminates the need for osreldate.h, and the same approach can be reused
if some other configuration variable needs to be set conditionally.


Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)

2018-06-03 Thread Konstantin Belousov
On Sun, Jun 03, 2018 at 04:55:00PM +0200, Michael Gmelin wrote:
> 
> 
> On Sun, 3 Jun 2018 16:21:10 +0300
> Konstantin Belousov  wrote:
> 
> > On Sun, Jun 03, 2018 at 02:48:40PM +0200, Michael Gmelin wrote:
> > > Hi,
> > > 
> > > After upgrading CURRENT to r333992 (from something at least a year
> > > old, quite some changes in mp_machdep.c since), this machine crashes
> > > on boot:
> > > 
> > > Copyright (c) 1992-2018 The FreeBSD Project.
> > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993,
> > > 1994 The Regents of the University of California. All rights
> > > reserved. FreeBSD is a registered trademark of The FreeBSD
> > > Foundation. FreeBSD 12.0-CURRENT #1 r333992: Tue May 22 00:31:04
> > > CEST 2018 root@flimsy:/usr/obj/usr/src/amd64.amd64/sys/flimsy amd64
> > > FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based
> > > on LLVM 6.0.0) WARNING: WITNESS option enabled, expect reduced
> > > performance. VT(vga): resolution 640x480
> > > CPU: Intel(R) Celeron(R) 2955U @ 1.40GHz (1396.80-MHz K8-class CPU)
> > >   Origin="GenuineIntel"  Id=0x40651  Family=0x6  Model=0x45
> > > Stepping=1
> > > Features=0xbfebfbff > > CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> > > Features2=0x4ddaebbf > > xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,RDRAND>
> > > AMD Features=0x2c100800 AMD
> > > Features2=0x21 Structured Extended
> > > Features=0x2603 XSAVE
> > > Features=0x1 VT-x: (disabled in BIOS)
> > > PAT,HLT,MTF,PAUSE,EPT,UG,VPID TSC: P-state invariant, performance
> > > statistics real memory  = 4301258752 (4102 MB)
> > > avail memory = 1907572736 (1819 MB)
> > > Event timer "LAPIC" quality 600
> > > ACPI APIC Table:   
> > What does this mean ?  Did you flashed coreboot ?
> 
> This machine comes with it by default (my model was delivered with 
> SeaBIOS 20131018_145217-build121-m2). So I didn't flash anything
> (didn't feel like bricking it).
> 
> > 
> > > kernel trap 12 with interrupts disabled
> > > 
> > > Fatal trap 12: page fault while in kernel mode 
> > > cpuid = 0; apic id = 00
> > > fault virtual address= 0xf8000100
> > > fault code   = supervisor write data, protection
> > > violation instruction pointer  = 0x20:0x8102955f
> > > stack pointer= 0x28:0x82a79be0
> > > frame pointer= 0x28:0x82a79c10
> > > code segment = base 0x0, limit 0xf, type 0x1b
> > >  = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags = resume, IOPL = 0
> > > current process  = 0 ()
> > > [ thread pid 0 tid 0 ]
> > > Stopped at  native_start_all_aps+0x08f:  movq
> > > %rax,(%rsi)  
> > Look up the source line number for this address.
> > 
> 
> I guess that's sys/amd64/amd64/support.S line 854 (in rdmsr), called by
> native_start_all_aps. Any additional hints how I can track it down?
Why did you decide that this is rdmsr_safe()?  First,
native_start_all_aps() does not call rdmsr; second, the ddb
report clearly indicates that the fault occurred while accessing the
DMAP in native_start_all_aps().

Just look up the source line for the address native_start_all_aps+0x08f.


Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)

2018-06-03 Thread Konstantin Belousov
On Sun, Jun 03, 2018 at 02:48:40PM +0200, Michael Gmelin wrote:
> Hi,
> 
> After upgrading CURRENT to r333992 (from something at least a year
> old, quite some changes in mp_machdep.c since), this machine crashes
> on boot:
> 
> Copyright (c) 1992-2018 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>   The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 12.0-CURRENT #1 r333992: Tue May 22 00:31:04 CEST 2018
> root@flimsy:/usr/obj/usr/src/amd64.amd64/sys/flimsy amd64
> FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 
> 6.0.0)
> WARNING: WITNESS option enabled, expect reduced performance.
> VT(vga): resolution 640x480
> CPU: Intel(R) Celeron(R) 2955U @ 1.40GHz (1396.80-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x40651  Family=0x6  Model=0x45  Stepping=1
>   
> Features=0xbfebfbff CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>   
> Features2=0x4ddaebbf xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,RDRAND>
>   AMD Features=0x2c100800
>   AMD Features2=0x21
>   Structured Extended Features=0x2603
>   XSAVE Features=0x1
>   VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,UG,VPID
>   TSC: P-state invariant, performance statistics
> real memory  = 4301258752 (4102 MB)
> avail memory = 1907572736 (1819 MB)
> Event timer "LAPIC" quality 600
> ACPI APIC Table: 
What does this mean?  Did you flash coreboot?

> kernel trap 12 with interrupts disabled
> 
> Fatal trap 12: page fault while in kernel mode 
> cpuid = 0; apic id = 00
> fault virtual address= 0xf8000100
> fault code   = supervisor write data, protection violation
> instruction pointer  = 0x20:0x8102955f
> stack pointer= 0x28:0x82a79be0
> frame pointer= 0x28:0x82a79c10
> code segment = base 0x0, limit 0xf, type 0x1b
>  = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = resume, IOPL = 0
> current process  = 0 ()
> [ thread pid 0 tid 0 ]
> Stopped at  native_start_all_aps+0x08f:  movq %rax,(%rsi)
Look up the source line number for this address.

> db>
> 
> Any key press in the debugger will reboot the machine.
> 
> Booting with kern.smp.disabled=1 works.
> 
> Any ideas?
> 
> -m
> 
> -- 
> Michael Gmelin


Re: Error build nvidia-driver with r334555

2018-06-03 Thread Konstantin Belousov
On Sun, Jun 03, 2018 at 05:08:34AM -0700, David Wolfskill wrote:
> On Sun, Jun 03, 2018 at 06:48:01PM +0700, Alex V. Petrov wrote:
> > 
> > --- nvidia_subr.o ---
> > 
> > 
> > nvidia_subr.c:367:26: error: 'memset' call operates on objects of type
> > 'struct nv_ioctl_card_info' while the size is based on a different type
> > 'struct nv_ioctl_card_info *' [-Werror,-Wsizeof-pointer-memaccess]
> > memset(ci, 0, sizeof(ci));
> >~~^~
> > nvidia_subr.c:367:26: note: did you mean to dereference the argument to
> > 'sizeof' (and multiply it by the number of elements)?
> > 
> > memset(ci, 0, sizeof(ci));
> >  ^~
> > 1 error generated.
> > *** [nvidia_subr.o] Error code 1
> > 
> 
> Aye; please ref.
>  for
> additional details.
> 
> The issue has been narrowed down to the range r334529 - r334535; I'm
> suspecting r334533 and/or r334534.  (Not that the commits are in any way
> "faulty" -- merely that the nvidia-driver port may need some "evasive
> action" to allow for the change).

Even without looking at the actual code, I am quite sure that line 367
of nvidia_subr.c should be changed to
memset(ci, 0, sizeof(*ci));
This is a bug in the driver sources: sizeof(ci) is the size of the
pointer, not of the structure it points to.


Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)

2018-06-03 Thread Konstantin Belousov
On Sun, Jun 03, 2018 at 09:50:20PM +0200, Michael Gmelin wrote:
> 
> 
> On Sun, 3 Jun 2018 18:04:23 +0300
> Konstantin Belousov  wrote:
> 
> > On Sun, Jun 03, 2018 at 04:55:00PM +0200, Michael Gmelin wrote:
> > > 
> > > 
> > > On Sun, 3 Jun 2018 16:21:10 +0300
> > > Konstantin Belousov  wrote:
> > >   
> > > > On Sun, Jun 03, 2018 at 02:48:40PM +0200, Michael Gmelin wrote:  
> > > > > Hi,
> > > > > 
> > > > > After upgrading CURRENT to r333992 (from something at least a
> > > > > year old, quite some changes in mp_machdep.c since), this
> > > > > machine crashes on boot:
> > > > > 
> > > > > Copyright (c) 1992-2018 The FreeBSD Project.
> > > > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992,
> > > > > 1993, 1994 The Regents of the University of California. All
> > > > > rights reserved. FreeBSD is a registered trademark of The
> > > > > FreeBSD Foundation. FreeBSD 12.0-CURRENT #1 r333992: Tue May 22
> > > > > 00:31:04 CEST 2018
> > > > > root@flimsy:/usr/obj/usr/src/amd64.amd64/sys/flimsy amd64
> > > > > FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565)
> > > > > (based on LLVM 6.0.0) WARNING: WITNESS option enabled, expect
> > > > > reduced performance. VT(vga): resolution 640x480 CPU: Intel(R)
> > > > > Celeron(R) 2955U @ 1.40GHz (1396.80-MHz K8-class CPU)
> > > > > Origin="GenuineIntel"  Id=0x40651  Family=0x6  Model=0x45
> > > > > Stepping=1
> > > > > Features=0xbfebfbff > > > > CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> > > > > Features2=0x4ddaebbf > > > > xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,RDRAND>
> > > > > AMD Features=0x2c100800 AMD
> > > > > Features2=0x21 Structured Extended
> > > > > Features=0x2603 XSAVE
> > > > > Features=0x1 VT-x: (disabled in BIOS)
> > > > > PAT,HLT,MTF,PAUSE,EPT,UG,VPID TSC: P-state invariant,
> > > > > performance statistics real memory  = 4301258752 (4102 MB)
> > > > > avail memory = 1907572736 (1819 MB) Event timer "LAPIC" quality
> > > > > 600 ACPI APIC Table: 
> > > > What does this mean ?  Did you flashed coreboot ?  
> > > 
> > > This machine comes with it by default (my model was delivered with 
> > > SeaBIOS 20131018_145217-build121-m2). So I didn't flash anything
> > > (didn't feel like bricking it).
> > >   
> > > >   
> > > > > kernel trap 12 with interrupts disabled
> > > > > 
> > > > > Fatal trap 12: page fault while in kernel mode 
> > > > > cpuid = 0; apic id = 00
> > > > > fault virtual address= 0xf8000100
> > > > > fault code   = supervisor write data, protection
> > > > > violation instruction pointer  = 0x20:0x8102955f
> > > > > stack pointer= 0x28:0x82a79be0
> > > > > frame pointer= 0x28:0x82a79c10
> > > > > code segment = base 0x0, limit 0xf, type 0x1b
> > > > >  = DPL 0, pres 1, long 1, def32 0, gran
> > > > > 1 processor eflags = resume, IOPL = 0
> > > > > current process  = 0 ()
> > > > > [ thread pid 0 tid 0 ]
> > > > > Stopped at  native_start_all_aps+0x08f:  movq
> > > > > %rax,(%rsi)
> > > > Look up the source line number for this address.
> > > >   
> > > 
> > > I guess that's sys/amd64/amd64/support.S line 854 (in rdmsr),
> > > called by native_start_all_aps. Any additional hints how I can
> > > track it down?  
> > Why did you decided that this is rdmsr_safe() ? First,
> > native_start_all_aps() does not call rdmsr, second the ddb
> > report clearly indicates that the fault occured acessing DMAP in
> > native_start_all_aps().
> > 
> > Just look up the source line by the address
> > native_start_all_aps+0x08f.
> 
> Okay, according to kgdb this should be here:
> 
> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=68=markup#l369
> 
> 364
> 365/* Create the initial 1GB replicated page tables */
> 366for (i = 0; i < 512; i++) {
> 367/* Each slot of the level 4 pages points to the same
>

Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)

2018-06-04 Thread Konstantin Belousov
On Mon, Jun 04, 2018 at 12:46:32AM +0200, Michael Gmelin wrote:
> 
> 
> On Sun, 3 Jun 2018 23:53:40 +0300
> Konstantin Belousov  wrote:
> 
> > On Sun, Jun 03, 2018 at 09:50:20PM +0200, Michael Gmelin wrote:
> > > 
> > > 
> > > On Sun, 3 Jun 2018 18:04:23 +0300
> > > Konstantin Belousov  wrote:
> > >   
> > > > On Sun, Jun 03, 2018 at 04:55:00PM +0200, Michael Gmelin wrote:  
> > > > > 
> > > > > 
> > > > > On Sun, 3 Jun 2018 16:21:10 +0300
> > > > > Konstantin Belousov  wrote:
> > > > > 
> > > > > > On Sun, Jun 03, 2018 at 02:48:40PM +0200, Michael Gmelin
> > > > > > wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > After upgrading CURRENT to r333992 (from something at least
> > > > > > > a year old, quite some changes in mp_machdep.c since), this
> > > > > > > machine crashes on boot:
> > > > > > > 
> > > > > > > Copyright (c) 1992-2018 The FreeBSD Project.
> > > > > > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991,
> > > > > > > 1992, 1993, 1994 The Regents of the University of
> > > > > > > California. All rights reserved. FreeBSD is a registered
> > > > > > > trademark of The FreeBSD Foundation. FreeBSD 12.0-CURRENT
> > > > > > > #1 r333992: Tue May 22 00:31:04 CEST 2018
> > > > > > > root@flimsy:/usr/obj/usr/src/amd64.amd64/sys/flimsy amd64
> > > > > > > FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565)
> > > > > > > (based on LLVM 6.0.0) WARNING: WITNESS option enabled,
> > > > > > > expect reduced performance. VT(vga): resolution 640x480
> > > > > > > CPU: Intel(R) Celeron(R) 2955U @ 1.40GHz (1396.80-MHz
> > > > > > > K8-class CPU) Origin="GenuineIntel"  Id=0x40651
> > > > > > > Family=0x6  Model=0x45 Stepping=1
> > > > > > > Features=0xbfebfbff > > > > > >   
> > > > > > > CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>  
> > > > > > > Features2=0x4ddaebbf > > > > > >   
> > > > > > > xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,RDRAND>
> > > > > > >   
> > > > > > > AMD Features=0x2c100800 AMD
> > > > > > > Features2=0x21 Structured Extended
> > > > > > > Features=0x2603 XSAVE
> > > > > > > Features=0x1 VT-x: (disabled in BIOS)
> > > > > > > PAT,HLT,MTF,PAUSE,EPT,UG,VPID TSC: P-state invariant,
> > > > > > > performance statistics real memory  = 4301258752 (4102 MB)
> > > > > > > avail memory = 1907572736 (1819 MB) Event timer "LAPIC"
> > > > > > > quality 600 ACPI APIC Table:   
> > > > > > What does this mean ?  Did you flashed coreboot ?
> > > > > 
> > > > > This machine comes with it by default (my model was delivered
> > > > > with SeaBIOS 20131018_145217-build121-m2). So I didn't flash
> > > > > anything (didn't feel like bricking it).
> > > > > 
> > > > > > 
> > > > > > > kernel trap 12 with interrupts disabled
> > > > > > > 
> > > > > > > Fatal trap 12: page fault while in kernel mode 
> > > > > > > cpuid = 0; apic id = 00
> > > > > > > fault virtual address= 0xf8000100
> > > > > > > fault code   = supervisor write data, protection
> > > > > > > violation instruction pointer  = 0x20:0x8102955f
> > > > > > > stack pointer= 0x28:0x82a79be0
> > > > > > > frame pointer= 0x28:0x82a79c10
> > > > > > > code segment = base 0x0, limit 0xf, type
> > > > > > > 0x1b = DPL 0, pres 1, long 1, def32 0, gran
> > > > > > > 1 processor eflags = resume, IOPL = 0
> > > > > > > current process  = 0 ()
> > > > > > > [ thread pid 0 tid 0 ]
> > > > > > > Stopped at  native_start_all_aps+0x08f:  movq
> > > > > > > %rax,(%rsi)  
> > > > > > Look up the s

Re: SVN r334499 breaks i386 kernel build

2018-06-01 Thread Konstantin Belousov
On Fri, Jun 01, 2018 at 07:42:07PM -0400, Michael Butler wrote:
> Building /usr/obj/usr/src/i386.i386/sys/SARAH/vm_mmap.o
> --- vm_mmap.o ---
> /usr/src/sys/vm/vm_mmap.c:245:6: error: use of undeclared identifier
> 'MAP_32BIT'
> MAP_32BIT | MAP_ALIGNMENT_MASK)) != 0))
> ^
> 1 error generated.
> *** [vm_mmap.o] Error code 1
Should be fixed by r334507.


Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)

2018-06-05 Thread Konstantin Belousov
On Mon, Jun 04, 2018 at 11:17:56PM +0200, Michael Gmelin wrote:
> 
> 
> On Mon, 4 Jun 2018 14:06:55 +0300
> Konstantin Belousov  wrote:
> 
> > On Mon, Jun 04, 2018 at 12:46:32AM +0200, Michael Gmelin wrote:
> > > 
> > > 
> > > On Sun, 3 Jun 2018 23:53:40 +0300
> > > Konstantin Belousov  wrote:
> > >   
> > > > On Sun, Jun 03, 2018 at 09:50:20PM +0200, Michael Gmelin wrote:  
> > > > > 
> > > > > 
> > > > > On Sun, 3 Jun 2018 18:04:23 +0300
> > > > > Konstantin Belousov  wrote:
> > > > > 
> > > > > > On Sun, Jun 03, 2018 at 04:55:00PM +0200, Michael Gmelin
> > > > > > wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Sun, 3 Jun 2018 16:21:10 +0300
> > > > > > > Konstantin Belousov  wrote:
> > > > > > >   
> > > > > > > > On Sun, Jun 03, 2018 at 02:48:40PM +0200, Michael Gmelin
> > > > > > > > wrote:  
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > After upgrading CURRENT to r333992 (from something at
> > > > > > > > > least a year old, quite some changes in mp_machdep.c
> > > > > > > > > since), this machine crashes on boot:
> > > > > > > > > 
> > > > > > > > > Copyright (c) 1992-2018 The FreeBSD Project.
> > > > > > > > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> > > > > > > > >     The Regents of the University of California. All rights reserved.
> > > > > > > > > FreeBSD is a registered trademark of The FreeBSD Foundation.
> > > > > > > > > FreeBSD 12.0-CURRENT #1 r333992: Tue May 22 00:31:04 CEST 2018
> > > > > > > > >     root@flimsy:/usr/obj/usr/src/amd64.amd64/sys/flimsy amd64
> > > > > > > > > FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
> > > > > > > > > WARNING: WITNESS option enabled, expect reduced performance.
> > > > > > > > > VT(vga): resolution 640x480
> > > > > > > > > CPU: Intel(R) Celeron(R) 2955U @ 1.40GHz (1396.80-MHz K8-class CPU)
> > > > > > > > > Origin="GenuineIntel"  Id=0x40651  Family=0x6  Model=0x45  Stepping=1
> > > > > > > > > Features=0xbfebfbff > > > > > > > > CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> > > > > > > > > Features2=0x4ddaebbf > > > > > > > > xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,RDRAND>
> > > > > > > > > AMD Features=0x2c100800
> > > > > > > > > AMD Features2=0x21
> > > > > > > > > Structured Extended Features=0x2603
> > > > > > > > > XSAVE Features=0x1
> > > > > > > > > VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,UG,VPID
> > > > > > > > > TSC: P-state invariant, performance statistics
> > > > > > > > > real memory  = 4301258752 (4102 MB)
> > > > > > > > > avail memory = 1907572736 (1819 MB)
> > > > > > > > > Event timer "LAPIC" quality 600
> > > > > > > > > ACPI APIC Table:  > > > > > > > > COREBOOT>
> > > > > > > > What does this mean? Did you flash coreboot?
> > > > > > > 
> > > > > > > This machine comes with it by default (my model was
> > > > > > > delivered with SeaBIOS 20131018_145217-build121-m2). So I
> > > > > > > didn't flash anything (didn't feel like bricking it).
> > > > > > >   
> > > > > > > >   
> > > > > > > > > kernel trap 12 with interrupts disabled
> > > > > > > > > 
> > > > > > > > > Fatal trap 12: page fault while in kernel mode 
> > > > > > > > > cpuid = 0; apic id = 00
> > > > > > > > > fault virtual address= 0xf8000100
> > > > > > > > > fault code   = supervisor write data,
> > > 

Re: nfsd kernel threads won't die via SIGKILL

2018-06-25 Thread Konstantin Belousov
On Mon, Jun 25, 2018 at 02:04:32AM +, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Sat, Jun 23, 2018 at 09:03:02PM +, Rick Macklem wrote:
> >> During testing of the pNFS server I have been frequently killing/restarting
> >> the nfsd.
> >> Once in a while, the "slave" nfsd process doesn't terminate and a "ps axHl"
> >> shows:
> >>   0 48889 1   0   20  0  5884  812 svcexit  D -   0:00.01 nfsd: server
> >>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
> >> ... more of the same
> >>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
> >>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   1:51.78 nfsd: server
> >>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   2:27.75 nfsd: server
> >>
> >> You can see that the top thread (the one that was created with the process)
> >> is stuck in "D" on "svcexit".
> >> The rest of the threads are still servicing NFS RPCs. If you still have an
> >> NFS mount on the server, the mount continues to work and the CPU time for
> >> the last two threads slowly climbs, due to NFS RPC activity. A SIGKILL was
> >> posted for the process and these threads (created by kthread_add) are here,
> >> but the cv_wait_sig/cv_timedwait_sig never seems to return EINTR for these
> >> other threads.
> >>
> >> if (ismaster || (!ismaster &&
> >>     grp->sg_threadcount > grp->sg_minthreads))
> >>         error = cv_timedwait_sig(&st->st_cond,
> >>             &grp->sg_lock, 5 * hz);
> >> else
> >>         error = cv_wait_sig(&st->st_cond,
> >>             &grp->sg_lock);
> >>
> >> The top thread (referred to in svc.c as "ismaster") did return from here
> >> with EINTR and has now done an msleep() here, waiting for the other threads
> >> to terminate.
> >>
> >> /* Waiting for threads to stop. */
> >> for (g = 0; g < pool->sp_groupcount; g++) {
> >>         grp = &pool->sp_groups[g];
> >>         mtx_lock(&grp->sg_lock);
> >>         while (grp->sg_threadcount > 0)
> >>                 msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
> >>         mtx_unlock(&grp->sg_lock);
> >> }
> >>
> >> Although I can't be sure if this patch has fixed the problem because it
> >> happens intermittently, I have not seen the problem since applying this patch:
> >> --- rpc/svc.c.sav 2018-06-21 22:52:11.623955000 -0400
> >> +++ rpc/svc.c 2018-06-22 09:01:40.271803000 -0400
> >> @@ -1388,7 +1388,7 @@ svc_run(SVCPOOL *pool)
> >>   grp = &pool->sp_groups[g];
> >>   mtx_lock(&grp->sg_lock);
> >>   while (grp->sg_threadcount > 0)
> >> - msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
> >> + msleep(grp, &grp->sg_lock, 0, "svcexit", 1);
> >>   mtx_unlock(&grp->sg_lock);
> >>   }
> >>  }
> >>
> >> As you can see, all it does is add a timeout to the msleep().
> >> I am not familiar with the signal delivery code in sleepqueue, so it
> >> probably isn't correct, but my theory is along the lines of...
> >>
> >> Since the msleep() doesn't have PCATCH, it does not set TDF_SINTR
> >> and if that happens before the other threads return EINTR from
> >> cv_wait_sig(), they no longer do so?
> >> And I thought that waking up from the msleep() via timeouts would maybe
> >> allow the other threads to return EINTR from cv_wait_sig()?
> >>
> >> Does this make sense? rick
> >> ps: I'll post if I see the problem again with the patch applied.
> >> pss: This is a single core i386 system, just in case that might affect
> >> this.
> >
> >No, the patch does not make sense. I think it was just coincidental that
> >with the patch you did not get the hang.
> >
> >S

Re: nfsd kernel threads won't die via SIGKILL

2018-06-24 Thread Konstantin Belousov
On Sat, Jun 23, 2018 at 09:03:02PM +, Rick Macklem wrote:
> During testing of the pNFS server I have been frequently killing/restarting
> the nfsd.
> Once in a while, the "slave" nfsd process doesn't terminate and a "ps axHl"
> shows:
>   0 48889 1   0   20  0  5884  812 svcexit  D -   0:00.01 nfsd: server
>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
> ... more of the same
>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   1:51.78 nfsd: server
>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   2:27.75 nfsd: server
> 
> You can see that the top thread (the one that was created with the process) is
> stuck in "D" on "svcexit".
> The rest of the threads are still servicing NFS RPCs. If you still have an
> NFS mount on the server, the mount continues to work and the CPU time for the
> last two threads slowly climbs, due to NFS RPC activity. A SIGKILL was posted
> for the process and these threads (created by kthread_add) are here, but the
> cv_wait_sig/cv_timedwait_sig never seems to return EINTR for these other
> threads.
> 
> if (ismaster || (!ismaster &&
>     grp->sg_threadcount > grp->sg_minthreads))
>         error = cv_timedwait_sig(&st->st_cond,
>             &grp->sg_lock, 5 * hz);
> else
>         error = cv_wait_sig(&st->st_cond,
>             &grp->sg_lock);
> 
> The top thread (referred to in svc.c as "ismaster") did return from here with
> EINTR and has now done an msleep() here, waiting for the other threads to
> terminate.
> 
> /* Waiting for threads to stop. */
> for (g = 0; g < pool->sp_groupcount; g++) {
>         grp = &pool->sp_groups[g];
>         mtx_lock(&grp->sg_lock);
>         while (grp->sg_threadcount > 0)
>                 msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
>         mtx_unlock(&grp->sg_lock);
> }
> 
> Although I can't be sure if this patch has fixed the problem because it
> happens intermittently, I have not seen the problem since applying this patch:
> --- rpc/svc.c.sav 2018-06-21 22:52:11.623955000 -0400
> +++ rpc/svc.c 2018-06-22 09:01:40.271803000 -0400
> @@ -1388,7 +1388,7 @@ svc_run(SVCPOOL *pool)
>   grp = &pool->sp_groups[g];
>   mtx_lock(&grp->sg_lock);
>   while (grp->sg_threadcount > 0)
> - msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
> + msleep(grp, &grp->sg_lock, 0, "svcexit", 1);
>   mtx_unlock(&grp->sg_lock);
>   }
>  }
> 
> As you can see, all it does is add a timeout to the msleep().
> I am not familiar with the signal delivery code in sleepqueue, so it probably
> isn't correct, but my theory is along the lines of...
> 
> Since the msleep() doesn't have PCATCH, it does not set TDF_SINTR
> and if that happens before the other threads return EINTR from cv_wait_sig(),
> they no longer do so?
> And I thought that waking up from the msleep() via timeouts would maybe allow
> the other threads to return EINTR from cv_wait_sig()?
> 
> Does this make sense? rick
> ps: I'll post if I see the problem again with the patch applied.
> pss: This is a single core i386 system, just in case that might affect this.

No, the patch does not make sense. I think it was just coincidental that
with the patch you did not get the hang.

Signals are delivered to a thread, which should take the appropriate
action. For a kernel process like the rpc pool, signals are never
delivered; they are queued on a randomly selected thread's signal queue
and sit there. Interruptible sleeps are aborted in the context
of that thread, but nothing else happens. So if you need to make svc
pools properly killable, all threads must at least check for EINTR and
instruct the other threads to exit as well.

Your description at the start of the message of the behaviour after
SIGKILL, where the other threads continued to serve RPCs, exactly matches
the explanation above. You need to add a global 'stop' flag, if one is not
yet present, and recheck it after each RPC handled. Any thread which
notices EINTR, or directly checks for a pending signal, should set
the flag and wake up every other thread in the pool.
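A minimal userspace analogy of the suggested 'stop' flag, using POSIX threads instead of the kernel's cv(9) and msleep(9) primitives. `struct pool`, `pool_request_stop()` and `run_pool()` are hypothetical names, not code from svc.c: the thread that notices the signal (EINTR in the kernel) sets the flag under the lock and broadcasts, each worker rechecks the flag every time it wakes up, and the master sleeps until the thread count reaches zero, like the "svcexit" loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NTHREADS 4

/* Hypothetical pool state, loosely modelled on an svc pool group. */
struct pool {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool stop;        /* global 'stop' flag, protected by lock */
	int threadcount;  /* workers still running, protected by lock */
};

/* Called by whichever thread notices the signal (EINTR in the kernel). */
static void pool_request_stop(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->stop = true;
	pthread_cond_broadcast(&p->cond);  /* wake every other thread */
	pthread_mutex_unlock(&p->lock);
}

static void *worker(void *arg)
{
	struct pool *p = arg;

	pthread_mutex_lock(&p->lock);
	/* In a real pool each wakeup would be a handled RPC; recheck every time. */
	while (!p->stop)
		pthread_cond_wait(&p->cond, &p->lock);
	p->threadcount--;
	pthread_cond_broadcast(&p->cond);  /* let the master see the count drop */
	pthread_mutex_unlock(&p->lock);
	return NULL;
}

/* Starts the workers, stops them, and returns how many are left (expected 0). */
static int run_pool(void)
{
	struct pool p = { .stop = false, .threadcount = NTHREADS };
	pthread_t tids[NTHREADS];
	int left;

	pthread_mutex_init(&p.lock, NULL);
	pthread_cond_init(&p.cond, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL, worker, &p);

	pool_request_stop(&p);

	/* Equivalent of the "svcexit" wait: sleep until every worker is gone. */
	pthread_mutex_lock(&p.lock);
	while (p.threadcount > 0)
		pthread_cond_wait(&p.cond, &p.lock);
	left = p.threadcount;
	assert(p.stop);
	pthread_mutex_unlock(&p.lock);

	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tids[i], NULL);
	pthread_cond_destroy(&p.cond);
	pthread_mutex_destroy(&p.lock);
	return left;
}
```

The point of the design is that every thread observes the flag, not the signal itself, so it does not matter which thread the signal happened to be queued on.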


Re: atomic_testandclear_, atomic_testandset_

2018-06-24 Thread Konstantin Belousov
On Sat, Jun 23, 2018 at 01:38:07PM -0700, Matthew Macy wrote:
> It turns out ck already has equivalent primitives. Pardon the noise.
Why not add trivial cmpset-based implementations to the arches that lack
them?  If maintainers prefer proper ll/sc assembly, they would have the
time to do it properly without an ultimatum.

It is useful to use the atomic(9) KPI consistently across the kernel.
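A sketch of the kind of cmpset-based fallback suggested above, written with C11 atomics so that it runs in userspace. `cmpset_uint()` is a hypothetical stand-in for atomic_cmpset_int(9), and the bit index is taken modulo 32, assuming a 32-bit `unsigned int`; check atomic(9) for the exact contract before relying on this.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for atomic_cmpset_int(9): nonzero on success. */
static int cmpset_uint(_Atomic unsigned int *p, unsigned int expect,
    unsigned int newval)
{
	return (atomic_compare_exchange_strong(p, &expect, newval));
}

/* Atomically set bit (v % 32) in *p; return nonzero if it was already set. */
static int testandset_uint(_Atomic unsigned int *p, unsigned int v)
{
	unsigned int mask = 1u << (v & 0x1f);
	unsigned int old;

	do {
		old = atomic_load(p);
	} while (!cmpset_uint(p, old, old | mask));
	return ((old & mask) != 0);
}

/* Atomically clear bit (v % 32) in *p; return nonzero if it was set. */
static int testandclear_uint(_Atomic unsigned int *p, unsigned int v)
{
	unsigned int mask = 1u << (v & 0x1f);
	unsigned int old;

	do {
		old = atomic_load(p);
	} while (!cmpset_uint(p, old, old & ~mask));
	return ((old & mask) != 0);
}
```

The retry loop is the usual way to build a read-modify-write operation out of compare-and-set alone; an ll/sc version would replace the loop body, not the interface.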

> 
> -M
> 
> On Sat, Jun 23, 2018 at 12:18 Matthew Macy  wrote:
> 
> > The functions in the subject are both documented in atomic(9) and are
> > implemented by every arch except sparc64 and MIPS. I have some code in
> > review that uses them that I intend to commit once the various design
> > issues are addressed. Please implement them so that those targets can
> > remain part of universe.
> >
> > Thanks in advance.
> > -M
> >


Re: change of nfsd->kernel interface in head

2018-06-30 Thread Konstantin Belousov
On Sat, Jun 30, 2018 at 06:39:57PM +, Rick Macklem wrote:
> r335012 (the big patch that added the pNFS server support) revised the
> nfsd->kernel nfssvc(2) syscall interface.
> It has compatibility code, so that old nfsd binaries still work.
> 
> I now need to revise this interface again to add a new pNFS server feature.
> Since the revised interface is only in head/current starting at r335012, I
> believe I can revise it again without an additional compatibility shim for
> r335012 or later nfsd binaries. Is this correct?
> 
> I would post a HEADS UP to this email list and the only code affected would be
> sites running current/head and using the "-p" (pNFS server) option, so they
> would be few, if any.
> 

You are right.

Moreover, it is not clear whether the nfsd interface should be considered
part of the stable contract even on stable branches.  It is clearly a
management interface: nfsd is not required to get the system operational
enough to install the right nfsd.  Where possible, stable should not add
more upgrade trouble, while for HEAD it does not matter.


Re: mlx5(4) jumbo receive

2018-04-26 Thread Konstantin Belousov
On Wed, Apr 25, 2018 at 04:04:13PM -0400, Ryan Stone wrote:
> On Tue, Apr 24, 2018 at 4:55 AM, Konstantin Belousov
> <kostik...@gmail.com> wrote:
> > +#ifndef MLX5E_MAX_RX_BYTES
> > +#define MLX5E_MAX_RX_BYTES MCLBYTES
> > +#endif
> 
> Why do you use a 2KB buffer rather than a PAGE_SIZE'd buffer?
> MJUMPAGESIZE should offer significantly better performance for jumbo
> frames without increasing the risk of memory fragmentation.
Part of the answer is that the patch was not written in one go (not even
by one person), but evolved, and this is the shape it took.

Another part is that indeed, as Rick stated, I am not sure about mixing
different sizes in the mbuf allocator.  This might be more FUD than
fact-based consideration, but still.

I believe that the patch as is provides the important improvements.
If I develop an mlx4(4) change of the same nature, I will probably include
this in the experimentation from the beginning.  For mlx5(4), I think
that the patch should be applied as is; then I might experiment
with PAGE_SIZE as a later step.
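A back-of-the-envelope sketch of the trade-off Ryan raises: how many receive buffers one jumbo frame spans with 2 KB clusters versus page-sized clusters. The sizes are assumptions (MCLBYTES is 2048 on FreeBSD, MJUMPAGESIZE equals PAGE_SIZE, 4096 on amd64), the 9000-byte frame is an assumed typical jumbo MTU rather than a figure from the thread, and headers and alignment are ignored.

```c
/* Assumed sizes; see the note above. */
#define SKETCH_MCLBYTES     2048u
#define SKETCH_MJUMPAGESIZE 4096u

/* Number of receive buffers needed to hold one frame of frame_len bytes. */
static unsigned int buffers_per_frame(unsigned int frame_len,
    unsigned int buf_size)
{
	return ((frame_len + buf_size - 1) / buf_size);  /* ceiling division */
}
```

For a 9000-byte frame this gives 5 buffers with 2 KB clusters and 3 with page-sized ones: fewer mbufs per frame, at the cost of the allocator having to serve a second cluster size, which is the mixing concern mentioned above.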


Re: panic: invalid bcd 194

2017-12-30 Thread Konstantin Belousov
On Sat, Dec 30, 2017 at 10:07:11PM +0100, Matthias Apitz wrote:
> On Saturday, December 30, 2017 at 10:44:57 p.m. +0200, Konstantin Belousov wrote:
> 
> > On Sat, Dec 30, 2017 at 09:03:17PM +0100, Matthias Apitz wrote:
> > > 
> > > Hello,
> > > 
> > > I've got an older Acer C720 with r314251, which had not been booted for
> > > some time, and now panics on boot, also in single user mode, saying:
> > > ...
> > > Dec 30 19:54:26 c720-r314251 kernel: ada0: Command Queueing enabled
> > > Dec 30 19:54:26 c720-r314251 kernel: ada0: 244198MB (500118192 512 byte sectors)
> > > Dec 30 19:54:26 c720-r314251 kernel: WARNING: WITNESS option enabled, expect reduced performance.
> > > Dec 30 19:54:26 c720-r314251 kernel: Trying to mount root from ufs:/dev/ada0p2 [rw,noatime]...
> > > panic: invalid bcd 194
> > > ...
> > > 
> > > The message comes from 
> > > 
> > > $ find * -type f -exec fgrep "invalid bcd" {} /dev/null \;
> > > sys/sys/libkern.h:("invalid bcd %d", bcd));
> > > 
> > > $ vim sys/sys/libkern.h
> > > ...
> > > #define LIBKERN_LEN_BCD2BIN 154
> > > #define LIBKERN_LEN_BIN2BCD 100
> > > #define LIBKERN_LEN_HEX2ASCII   36
> > > 
> > > static inline u_char
> > > bcd2bin(int bcd)
> > > {
> > > 
> > > KASSERT(bcd >= 0 && bcd < LIBKERN_LEN_BCD2BIN,
> > > ("invalid bcd %d", bcd));
> > > return (bcd2bin_data[bcd]);
> > > }
> > > 
> > > Any idea what could have damaged the system and what to do or check
> > > before setting it up again?
> > 
> > Show the backtrace.
> 
> Thanks, here we have it as photo: 
> http://www.unixarea.de/download_238222137_147226.jpg

For immediate relief, enter the BIOS setup and set the date.  Try to
change it even if the BIOS date looks fine.

atrtc(4) should do more validation of the date read from CMOS, but this is
a known issue.
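The panic value 194 is 0xC2: its high nibble (12) is not a decimal digit, and it lies past the end of the 154-entry (0x00 to 0x99) bcd2bin table quoted above. A minimal sketch of the kind of validation an RTC driver could do before converting; `bcd_is_valid()` and `bcd_to_bin()` are hypothetical names, not the libkern API.

```c
#include <stdbool.h>

/* Hypothetical check: a packed-BCD byte is valid only if both nibbles are 0-9. */
static bool bcd_is_valid(int bcd)
{
	return (bcd >= 0 && bcd <= 0x99 &&
	    (bcd & 0x0f) <= 9 && ((bcd >> 4) & 0x0f) <= 9);
}

/* Arithmetic conversion, no table; the caller must check bcd_is_valid() first. */
static int bcd_to_bin(int bcd)
{
	return (((bcd >> 4) & 0x0f) * 10 + (bcd & 0x0f));
}
```

With such a check, a garbage CMOS date could be rejected (for example, by falling back to another time source) instead of tripping the KASSERT.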


Re: Programmatically cache line

2017-12-30 Thread Konstantin Belousov
On Sat, Dec 30, 2017 at 07:50:19AM +, blubee blubeeme wrote:
> Is there some way to programmatically get the CPU cache line sizes on
> FreeBSD?

There are, but all of them are MD.

On x86, the CPUID instruction leaf 0x1 returns the information in the
%ebx register (bits 15:8 give the CLFLUSH line size in 8-byte units).
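A minimal sketch for GCC or Clang on x86, using the compiler-provided `<cpuid.h>` header and its `__get_cpuid()` helper. It decodes %ebx bits 15:8 of leaf 0x1, which is the CLFLUSH granularity; that is not necessarily the size that matters for false sharing. The function names are illustrative, and on other architectures the query simply returns 0.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.01H:%ebx bits 15:8 hold the CLFLUSH line size in 8-byte units. */
static unsigned int clflush_size_from_ebx(uint32_t ebx)
{
	return (((ebx >> 8) & 0xff) * 8);
}

/* CLFLUSH line size in bytes, or 0 when it cannot be determined here. */
static unsigned int x86_clflush_line_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
		return (0);
	return (clflush_size_from_ebx(ebx));
#else
	return (0);  /* not x86: a different, MD mechanism is needed */
#endif
}
```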


Re: Programmatically cache line

2018-01-04 Thread Konstantin Belousov
On Thu, Jan 04, 2018 at 10:03:32AM +, David Chisnall wrote:
> On 3 Jan 2018, at 22:12, Nathan Whitehorn <nwhiteh...@freebsd.org> wrote:
> > 
> > On 01/03/18 13:37, Ed Schouten wrote:
> >> 2018-01-01 11:36 GMT+01:00 Konstantin Belousov <kostik...@gmail.com>:
> >>>>>> On x86, the CPUID instruction leaf 0x1 returns the information in
> >>>>>> %ebx register.
> >>>>> Hm, weird. Why don't we extend sysctl to include this info?
> >>> For the same reason we do not provide a sysctl to add two integers.
> >> I strongly agree with Kostik on this one. Why add stuff to the kernel,
> >> if userspace is already capable of extracting this? Adding that stuff
> >> to sysctl has the downside that it will effectively introduce yet
> >> another FreeBSDism, whereas something generic already exists.
> >> 
> > 
> > Well, kind of. The userspace version is platform-dependent and not always 
> > available: for example, on PPC, you can't do this from userland and we 
> > provide a sysctl machdep.cacheline_size to userland. It would be nice to 
> > have an MI API.
> 
> On ARMv8, similarly, sometimes the kernel needs to advertise the wrong size.  
> A few big.LITTLE cores have 64-byte cache lines on one cluster and 32-byte on 
> the other.  If you query the size from userspace while running on a 64-byte 
> cluster, then issue the zero-cache-line instruction while migrated to the 
> 32-byte cluster, you only clear half the size.  Linux works around this by 
> trapping and emulating the instruction to query the cache size and always 
> reporting the size for the smallest cache lines.  ARM tells people not to 
> build systems like this, but it doesn't always stop them.  Trapping and 
> emulating is much slower than just providing the information in a shared 
> page, elf aux args vector, or even (often) a system call.

Of course the MD way is the best way to get such information, simply because
'cache line size' has meaning only in the context of a given CPU
(micro)architecture.  For instance, on PowerPC and ARM you are often concerned
with the granularity of instruction cache flushes, but you might also be
concerned with DMA, and these are different concepts of cache.

Even on x86, you may care about alignment to avoid false sharing or
about CLFLUSH granularity, and these can legitimately differ.
Which one should be reported as the 'cache line'?

And you cannot bail out with the max among all constants, because sometimes
you really need the min size (for CLFLUSH), and sometimes the max size (for
false sharing).
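A small sketch of why neither the min nor the max works for every purpose. Stepping through a buffer with a line size larger than the real one issues too few per-line operations (the big.LITTLE case quoted above: 4 steps of 64 bytes where 8 lines of 32 bytes exist), while padding against false sharing wants the largest line. `lines_covering()` and the 128-byte alignment are illustrative assumptions, not a FreeBSD interface.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Number of per-line operations (flush, zero, ...) needed to cover
 * [start, start + len) when stepping by 'line' bytes.
 * Assumes len > 0 and that 'line' is a power of two.
 */
static size_t lines_covering(uintptr_t start, size_t len, size_t line)
{
	uintptr_t mask = ~(uintptr_t)(line - 1);
	uintptr_t first = start & mask;
	uintptr_t last = (start + len - 1) & mask;

	return ((size_t)((last - first) / line) + 1);
}

/* Padding against false sharing wants the largest plausible line (128 assumed). */
struct padded_counter {
	_Alignas(128) unsigned long value;
};
```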

> 
> To give another example, Linux provides a very cheap way for a userspace 
> process to enquire which core it's running on.  Some more recent 
> high-performance mallocs use this to have a second-layer per-core cache after 
> the per-thread cache for free blocks.  Unlike the per-thread cache, the 
> per-core cache does need a lock, but it's very unlikely to be contended (it 
> will only be contended if either a thread is migrated in between checking and 
> locking, so acquires the wrong CPU's lock, or if a thread is preempted in 
> the middle of the very brief fill operation).  The author of the 
> SuperMalloc paper tried doing this with CPUID and found that it was slower by 
> a sufficient margin to almost entirely offset the benefits of the extra layer 
> of caching.  

Here, RDTSCP is the intended way to get the CPU id in userspace, but using
this instruction requires some minimal OS support: the kernel must store the
CPU id in the IA32_TSC_AUX MSR, which RDTSCP returns in %ecx.  It should be
faster than CPUID, since it is not fully serializing.  We do not support it
only because nobody has asked so far.
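A minimal sketch, assuming GCC or Clang on x86, of reading the value RDTSCP returns in %ecx via the `__rdtscp()` intrinsic. That value is whatever the OS stored in IA32_TSC_AUX, which is the minimal OS support in question; if the kernel never programs the MSR, the result carries no CPU id. The function name is illustrative, and the CPUID check (leaf 0x80000001, %edx bit 27) guards against CPUs without the instruction.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>
#endif

/*
 * Returns the IA32_TSC_AUX value that RDTSCP loads into %ecx, or -1 when
 * RDTSCP is not available.  The meaning of the value is up to the OS.
 */
static int64_t read_tsc_aux(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx, aux;

	/* CPUID.80000001H:%edx bit 27 advertises RDTSCP. */
	if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx) == 0 ||
	    (edx & (1u << 27)) == 0)
		return (-1);
	(void)__rdtscp(&aux);  /* the TSC value itself is ignored here */
	return ((int64_t)aux);
#else
	return (-1);
#endif
}
```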

> 
> Just because userspace can get at the information directly from the hardware 
> doesn't mean that this is the most efficient or best way for userspace to 
> get at it.
> 
It depends, but a single-instruction (!) vs. syscall comparison makes this
discussion silly.

> Oh, and some of these things are useful in portable code, so having to write 
> some assembly for every target to get information that the kernel already 
> knows is wasteful.
> 
The required work is to provide definitions of these interfaces; then they
can be implemented in the best way for each architecture.  But nobody has
done that.


Re: Insta-reset-bootloop with r328166 and vm.pmap.pcid_enabled=1

2018-01-25 Thread Konstantin Belousov
On Thu, Jan 25, 2018 at 04:58:04PM -0500, Ian FREISLICH wrote:
> Hi
> 
> I cannot for the life of me recall why I had vm.pmap.pcid_enabled=1 set
> in loader.conf, but I did.  Most likely very historical reasons.
> 
> r328166 and later reset without dropping into the debugger during boot,
> no panic message.

Yes, this is how it currently behaves.
PCID can be used to optimize PTI, see https://reviews.freebsd.org/D13985.
It is used for a very different algorithm when PTI=1 compared with PTI=0.


Re: [CFT] AMD cpu microcode update port sysutils/devcpu-data D13832

2018-01-12 Thread Konstantin Belousov
On Fri, Jan 12, 2018 at 12:12:31PM -0500, Mike Tancsa wrote:
> On 1/12/2018 11:58 AM, Sean Bruno wrote:
> >>
> > 
> > Probably, update your port of x86info.  I pushed an update for this and
> > it might work better for you.
> 
> Thanks, I tried it but still generates the error / warning. What does it
> mean ? (CPU0: local APIC error 0x80)
It means that x86info accessed an unimplemented LAPIC register.
This warning is mostly harmless.
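For reference, a sketch of decoding such a value, assuming the local APIC Error Status Register bit layout from the Intel SDM (several of the low bits are model-specific or reserved, including on AMD parts). 0x80 has only bit 7 set, Illegal Register Address, which is consistent with the explanation above. The table and function are illustrative, not FreeBSD code.

```c
#include <stddef.h>
#include <stdint.h>

/* Local APIC Error Status Register bit names, bit 0 first. */
static const char *const lapic_esr_bits[8] = {
	"Send Checksum Error",
	"Receive Checksum Error",
	"Send Accept Error",
	"Receive Accept Error",
	"Redirectable IPI",
	"Send Illegal Vector",
	"Receive Illegal Vector",
	"Illegal Register Address",
};

/*
 * Store the names of the bits set in 'esr' into out[] and return how many
 * there are; out must have room for 8 entries.
 */
static size_t lapic_esr_decode(uint32_t esr, const char *out[8])
{
	size_t n = 0;

	for (unsigned int bit = 0; bit < 8; bit++)
		if (esr & (1u << bit))
			out[n++] = lapic_esr_bits[bit];
	return (n);
}
```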


Re: td_swvoltick

2018-01-12 Thread Konstantin Belousov
On Fri, Jan 12, 2018 at 01:31:41PM -0600, Eric van Gyzen wrote:
> should_yield() compares thread::td_swvoltick to 'ticks' to determine
> whether a thread is hogging and should yield.  Since td_swvoltick
> records 'ticks' /before/ the actual context switch, the calculation in
> should_yield() includes any time that the thread was switched out.  It
> seems that should_yield() wants to know how long the thread has actually
> been running.  Therefore, td_swvoltick should record 'ticks' /after/
> sched_switch() returns.
> 
> Does this make sense, or am I missing something?
Yes, it does make sense to me.

> 
> If this makes sense, I would probably keep the current assignment in
> mi_switch() and simply add a second assignment after the call to
> sched_switch().  That way, db_show_thread will still show useful data
> for sleeping threads.  I would do the same for td_swinvolticks.
> 
> I'll be happy to make the change myself.  I just want a sanity check
> before I bother.
> 
> Thanks in advance,
> 
> Eric
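A minimal sketch of the effect Eric describes, with made-up tick values: a thread switches out at tick 1000, resumes at tick 1100 and is checked at tick 1105 against an assumed hogticks of 20. Recording the tick before the switch charges it 105 ticks, including the 100 spent switched out; recording it after sched_switch() returns charges only the 5 it actually ran. The struct and function are hypothetical simplifications of the idea behind should_yield(), not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the thread field; not the kernel's struct thread. */
struct sketch_thread {
	unsigned int swvoltick;  /* 'ticks' recorded at the last voluntary switch */
};

/* Same idea as should_yield(): has the thread run for at least hogticks? */
static bool sketch_should_yield(const struct sketch_thread *td,
    unsigned int now, unsigned int hogticks)
{
	/* Unsigned subtraction keeps working when the tick counter wraps. */
	return (now - td->swvoltick >= hogticks);
}
```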


Re: Panic in prison_alloc() on boot

2018-02-12 Thread Konstantin Belousov
On Sun, Feb 11, 2018 at 11:50:43PM -0500, Ryan Stone wrote:
> I'm getting a persistent panic on boot in prison_allow().  The first
> case that I hit is this:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 10; apic id = 22
> fault virtual address   = 0x30
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x80ab9674
> stack pointer   = 0x28:0xfe00f8bb16f0
> frame pointer   = 0x28:0xfe00f8bb16f0
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 90 (mount_nullfs)
> [ thread pid 90 tid 100913 ]
> Stopped at  prison_allow+0x4:   movqll+0xf(%rdi),%rax
> db> bt
> Tracing pid 90 tid 100913 td 0xf801a5aab000
> prison_allow() at prison_allow+0x4/frame 0xfe00f8bb16f0
Can you get the core dump and look at the arguments?
It is most likely cred->cr_prison that is NULL.
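The small fault virtual address (0x30) is what points at a NULL pointer: a load through a NULL base faults at the offset of the field being read. A minimal sketch with a hypothetical structure laid out so that the field sits at 0x30; nothing about the real struct prison or struct ucred layout is implied.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure whose field 'allow' happens to sit at offset 0x30. */
struct sketch_prison {
	uint64_t pad[6];  /* 6 * 8 = 0x30 bytes */
	uint32_t allow;
};

/*
 * With p == NULL, reading p->allow touches address 0 + offsetof(...),
 * which is why a tiny fault address usually means a NULL base pointer.
 */
static uintptr_t fault_address_for_null(void)
{
	return ((uintptr_t)offsetof(struct sketch_prison, allow));
}
```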

> nullfs_mount() at nullfs_mount+0x31/frame 0xfe00f8bb1830
> vfs_donmount() at vfs_donmount+0x1415/frame 0xfe00f8bb1a80
> sys_nmount() at sys_nmount+0x72/frame 0xfe00f8bb1ac0
> amd64_syscall() at amd64_syscall+0xa48/frame 0xfe00f8bb1bf0
> fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffecb0
> 
> However if I comment out my nullfs mounts in /etc/fstab, I just get a
> panic in prison_allow() called from elsewhere.
> 
> I've seen this panic with r328936 (Feb 6), r329091 (Feb 9) and r329142 (Feb 11)


Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

2018-02-20 Thread Konstantin Belousov
On Tue, Feb 20, 2018 at 05:50:57PM +0100, Juan Ramón Molina Menor wrote:
> > I committed the fix in
> > https://svnweb.freebsd.org/base?view=revision=329542
> >
> > i.e. should be stable from this point on.
> 
> Hi!
> 
> It may be unrelated, but recent commits have broken my system with a
> similar error. I did not have panics with a system built around
> December, but since updating first to r329555 and then today to r329641 I'm
> getting a reproducible panic when logging out from a Lumina desktop session:
> 
> Unread portion of the kernel message buffer:
> spin lock 0xf8000d440020 (process slock) held by 0xf8000daed560 (tid 100111) too long
> panic: spin lock held too long
> cpuid = 1
> time = 1519143505
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe5c15e0
> vpanic() at vpanic+0x18d/frame 0xfe5c1640
> panic() at panic+0x43/frame 0xfe5c16a0
> _mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x71/frame 0xfe5c16b0
> mtx_spin_wait_unlocked() at mtx_spin_wait_unlocked+0x59/frame 0xfe5c16e0
> proc_reap() at proc_reap+0x24/frame 0xfe5c1720
> procdesc_close() at procdesc_close+0x125/frame 0xfe5c1760
> closef() at closef+0x251/frame 0xfe5c17f0
> fdescfree_fds() at fdescfree_fds+0x90/frame 0xfe5c1840
> fdescfree() at fdescfree+0x4df/frame 0xfe5c1900
> exit1() at exit1+0x508/frame 0xfe5c1970
> sys_sys_exit() at sys_sys_exit+0xd/frame 0xfe5c1980
> amd64_syscall() at amd64_syscall+0xa48/frame 0xfe5c1ab0
> fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffea90
> Uptime: 17m45s
> Dumping 327 out of 3990 MB:..5%..15%..25%..35%..44%..54%..64%..74%..84%..93%
> 
> Reading symbols from /boot/kernel/linux.ko...done.
> Loaded symbols for /boot/kernel/linux.ko
> Reading symbols from /boot/kernel/linux_common.ko...done.
> Loaded symbols for /boot/kernel/linux_common.ko
> Reading symbols from /boot/kernel/acpi_ibm.ko...done.
> Loaded symbols for /boot/kernel/acpi_ibm.ko
> Reading symbols from /boot/kernel/iwm7260fw.ko...done.
> Loaded symbols for /boot/kernel/iwm7260fw.ko
> Reading symbols from /boot/kernel/coretemp.ko...done.
> Loaded symbols for /boot/kernel/coretemp.ko
> Reading symbols from /boot/kernel/if_iwm.ko...done.
> Loaded symbols for /boot/kernel/if_iwm.ko
> Reading symbols from /boot/kernel/acpi_video.ko...done.
> Loaded symbols for /boot/kernel/acpi_video.ko
> Reading symbols from /boot/kernel/nullfs.ko...done.
> Loaded symbols for /boot/kernel/nullfs.ko
> Reading symbols from /boot/kernel/fdescfs.ko...done.
> Loaded symbols for /boot/kernel/fdescfs.ko
> Reading symbols from /boot/kernel/i915kms.ko...done.
> Loaded symbols for /boot/kernel/i915kms.ko
> Reading symbols from /boot/kernel/drm2.ko...done.
> Loaded symbols for /boot/kernel/drm2.ko
> Reading symbols from /boot/kernel/iicbus.ko...done.
> Loaded symbols for /boot/kernel/iicbus.ko
> Reading symbols from /boot/kernel/iic.ko...done.
> Loaded symbols for /boot/kernel/iic.ko
> Reading symbols from /boot/kernel/iicbb.ko...done.
> Loaded symbols for /boot/kernel/iicbb.ko
> #0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1324
> 1324    CPU_SET_ATOMIC(cpu, _cpus);
> (kgdb) bt
> #0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1324
> #1  0x80e29fb4 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1280
> #2  0x80d09a79 in trap (frame=0x8158bef0)
>      at /usr/src/sys/amd64/amd64/trap.c:188
> #3  0x80cec054 in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:633
> #4  0x80e1aaef in acpi_cpu_idle_mwait (mwait_hint=0) at cpufunc.h:611
> Previous frame inner to this frame (corrupt stack?)
> Current language:  auto; currently minimal
> 
> kgdb is over my head, but I can provide more details under some guidance.

Use this.

diff --git a/sys/kern/sys_procdesc.c b/sys/kern/sys_procdesc.c
index 5e8928cb153..174fffc5c66 100644
--- a/sys/kern/sys_procdesc.c
+++ b/sys/kern/sys_procdesc.c
@@ -398,7 +398,6 @@ procdesc_close(struct file *fp, struct thread *td)
 * process's reference to the process descriptor when it
 * calls back into procdesc_reap().
 */
-   PROC_SLOCK(p);
proc_reap(curthread, p, NULL, 0);
} else {
/*
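The backtrace shows proc_reap() spinning in mtx_spin_wait_unlocked() on the process spin lock, and the diff removes the PROC_SLOCK() taken just before that call. A minimal userspace sketch of such a self-wait, with a bounded spin instead of a panic; `struct sketch_spin` and `sketch_wait_unlocked()` are hypothetical and only model the idea, not the kernel's mutex implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin lock: 'owner' is a thread id, or 0 when unlocked. */
struct sketch_spin {
	_Atomic int owner;
};

static void sketch_lock(struct sketch_spin *m, int tid)
{
	int expected = 0;

	while (!atomic_compare_exchange_weak(&m->owner, &expected, tid))
		expected = 0;
}

static void sketch_unlock(struct sketch_spin *m)
{
	atomic_store(&m->owner, 0);
}

/*
 * Spin until nobody holds the lock.  The kernel eventually panics with
 * "spin lock held too long"; here we just return false after max_spins.
 */
static bool sketch_wait_unlocked(struct sketch_spin *m, long max_spins)
{
	for (long i = 0; i < max_spins; i++)
		if (atomic_load(&m->owner) == 0)
			return true;
	return false;
}
```

If the waiting thread is the one holding the lock, nobody else can release it, so the wait can only time out; dropping the caller's lock acquisition removes the self-wait.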


Re: pkg does not recognize correct kernel version

2018-02-19 Thread Konstantin Belousov
On Mon, Feb 19, 2018 at 09:39:37PM +0100, Rainer Hurling wrote:
> Am 19.02.2018 um 21:24 schrieb Ronald Klop:
> > On Mon, 19 Feb 2018 21:10:48 +0100, Rainer Hurling  wrote:
> > 
> >> Hi Ronald,
> >>
> >> Am 19.02.2018 um 20:27 schrieb Ronald Klop:
> >>> I just did this.
> >>>
> >>> [root@sjakie ~]# pkg upgrade
> >>> Updating FreeBSD repository catalogue...
> >>> Fetching meta.txz: 100%    944 B   0.9kB/s    00:01
> >>> Fetching packagesite.txz: 100%    6 MiB   6.0MB/s    00:01
> >>> Processing entries:   0%
> >>> pkg: Newer FreeBSD version for package gnome-menus:
> >>> - package: 1200058
> >>> - running kernel: 1200056
> >>> pkg: repository FreeBSD contains packages for wrong OS version:
> >>> FreeBSD:12:amd64
> >>> Processing entries: 100%
> >>> Unable to update repository FreeBSD
> >>> Error updating repositories!
> >>>
> >>> [root@sjakie ~]# uname -UK
> >>> 1200058 1200058
> >>>
> >>> [root@sjakie ~]# uname -a
> >>> FreeBSD sjakie 12.0-CURRENT FreeBSD 12.0-CURRENT #4 r329516M: Sun Feb 18
> >>> 12:37:36 CET 2018  
> >>> ronald@sjakie:/data/ronald/obj-freebsd-current/data/ronald/freebsd-current/amd64.amd64/sys/GENERIC-NODEBUGamd64
> >>>
> >>>
> >>>
> >>> So uname gives a different version than pkg detects.
> >>>
> >>> What is happening? pkg update -f gives the same result. -o
> >>> OSVERSION=1200058 helps, but does not feel like the right solution.
> >>>
> >>> Regards,
> >>> Ronald.
> >>
> >> Please try
> >>
> >> #sysctl kern.osreldate
> >> kern.osreldate: 1200058
> >>
> >> HTH,
> >> Rainer Hurling
> > 
> > 
> > [root@sjakie ~]# sysctl kern.osreldate
> > kern.osreldate: 1200058
> > 
> > Regards,
> > Ronald.
> 
> On my kernel patchlevel 1200058 (r329446) I get:
> 
> #pkg update -f
> Updating FreeBSD repository catalogue...
> Fetching meta.txz: 100%944 B   0.9kB/s00:01
> Fetching packagesite.txz: 100%6 MiB   1.2MB/s00:05
> Processing entries: 100%
> FreeBSD repository update completed. 28645 packages processed.
> All repositories are up to date.
> 
> 
> Perhaps more a local problem :(

Look at the man page.  pkg reads the version from the FreeBSD ELF version
note of /bin/sh:
orion% file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically 
linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 11.1 (1101506), 
FreeBSD-style, stripped

Update world to at least the __FreeBSD_version reported for the repository.
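To see what pkg is comparing, here is a minimal C sketch that extracts the FreeBSD ABI tag from the raw bytes of an ELF note segment. It assumes the standard note layout (namesz, descsz, type, then a 4-byte-aligned name and descriptor), host byte order, and that the ABI tag is the note with owner "FreeBSD" and type 1 (NT_FREEBSD_ABI_TAG). Locating the PT_NOTE segment inside /bin/sh is left out. The names freebsd_abi_tag() and pkg_version_ok() are hypothetical, and the version check only mirrors the rule visible in the pkg output above, where a package built for 1200058 was refused on a 1200056 ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a note name/descriptor size up to the 4-byte note alignment. */
#define NOTE_ALIGN4(x)	(((size_t)(x) + 3) & ~(size_t)3)
#define ABI_TAG_TYPE	1u	/* NT_FREEBSD_ABI_TAG */

/*
 * Scan the contents of a PT_NOTE segment and return the __FreeBSD_version
 * stored in the FreeBSD ABI tag, or -1 if no such note is present.
 */
long
freebsd_abi_tag(const unsigned char *notes, size_t len)
{
	size_t off = 0;

	while (len - off >= 12) {
		uint32_t namesz, descsz, type;

		memcpy(&namesz, notes + off, 4);
		memcpy(&descsz, notes + off + 4, 4);
		memcpy(&type, notes + off + 8, 4);
		off += 12;

		size_t name_len = NOTE_ALIGN4(namesz);
		size_t desc_len = NOTE_ALIGN4(descsz);
		if (name_len > len - off || desc_len > len - off - name_len)
			return (-1);	/* truncated note */

		/* "FreeBSD" including its NUL is 8 bytes. */
		if (type == ABI_TAG_TYPE && namesz == 8 && descsz == 4 &&
		    memcmp(notes + off, "FreeBSD", 8) == 0) {
			int32_t osrel;

			memcpy(&osrel, notes + off + name_len, 4);
			return (osrel);
		}
		off += name_len + desc_len;
	}
	return (-1);
}

/* A package is acceptable only if it is not newer than the host ABI. */
int
pkg_version_ok(long pkg_osversion, long host_osversion)
{
	return (pkg_osversion <= host_osversion);
}
```

With a world older than the repository, the tag read from /bin/sh stays below the packages' version, so the check fails exactly as in the report, regardless of what uname -K says.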


Re: Problem with C11 _Atomic

2018-01-02 Thread Konstantin Belousov
On Tue, Jan 02, 2018 at 03:17:14PM +0100, Pierre DAVID wrote:
> On Mon, Jan 01, 2018 at 11:09:07PM +0200, Konstantin Belousov wrote:
> >clang issues calls to libatomic, which we do not provide.
> >As a workaround, use the following command to compile.  The resulting
> >binary works on all practically usable machines.
> > $ cc -march=core2 source.c
> >You might want to turn off sse3/4.1 if you are concerned about older 
> >pentium4.
> >
> 
> Thanks for your help. I wish that the C11 status of FreeBSD will soon
> be complete out of the box, without the help of such a hack.

This is not a FreeBSD problem but a clang one.  Also, I looked at the
generated reference, and the referenced symbol is absent from gcc 7.2.0's
libatomic.

The same problem is common on i386 with cmpxchg8b, because the default
arch there is i486, which does not have that instruction.

This is simply how clang operates.


Re: Programmatically cache line

2018-01-01 Thread Konstantin Belousov
On Mon, Jan 01, 2018 at 06:52:37AM +, David Chisnall wrote:
> On 1 Jan 2018, at 05:09, Adrian Chadd <adrian.ch...@gmail.com> wrote:
> > 
> > On 30 December 2017 at 00:28, Konstantin Belousov <kostik...@gmail.com> 
> > wrote:
> >> On Sat, Dec 30, 2017 at 07:50:19AM +, blubee blubeeme wrote:
> >>> Is there some way to programmatically get the CPU cache line sizes on
> >>> FreeBSD?
> >> 
> >> There are, all of them are MD.
> >> 
> >> On x86, the CPUID instruction leaf 0x1 returns the information in
> >> %ebx register.
> > 
> > Hm, weird. Why don't we extend sysctl to include this info?
For the same reason we do not provide a sysctl to add two integers.
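Computing this in user space only takes a few lines. Here is a minimal C sketch, assuming the gcc/clang <cpuid.h> helper __get_cpuid() and the CPUID leaf 0x1 layout in which %ebx bits 15:8 give the CLFLUSH line size in 8-byte units. Treating that value as the cache line size is an assumption that holds on common x86 parts, not a guarantee. The function names are hypothetical.

```c
#include <stdint.h>

/* Decode the CLFLUSH line size (bytes) from CPUID leaf 0x1 %ebx. */
unsigned
clflush_line_size(uint32_t ebx)
{
	return (((ebx >> 8) & 0xffu) * 8u);
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Returns the line size in bytes, or 0 if CPUID leaf 0x1 is unavailable. */
unsigned
cpu_line_size(void)
{
	unsigned eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return (0);
	return (clflush_line_size(ebx));
}
#endif
```

Only the pure decoding helper is deterministic; cpu_line_size() reports whatever the running CPU returns.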

> 
> It would be nice to expose this kind of information via VDSO or similar.  
> There are a lot of similar bits of info that people want to use for ifunc 
> and, SVE is going to have a bunch of similar requirements.
> 
Is VDSO a new trendy word?

Usermode ifunc resolvers on FreeBSD/x86 get four arguments, which
are essentially cpu_features / cpu_features2 / cpu_stdext_features /
cpu_stdext_features2.  I suspect that only the FreeBSD/x86 arches have
ifunc support, in rtld now and, shortly, in the kernel.

Recently, HW_CAP/HW_CAP2 were added to the ELF auxv, and the
elf_aux_info(3) interface was exported from libc.
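A minimal sketch of querying the auxv from C. On FreeBSD it calls elf_aux_info(3) for AT_HWCAP (the auxv entries are named AT_HWCAP/AT_HWCAP2), which returns 0 on success or an errno value such as ENOENT when the entry is not supplied. The #else branch uses glibc's getauxval(3) only so the sketch builds elsewhere; it is not part of the FreeBSD interface. read_hwcap() is a hypothetical name, and nothing is assumed about which bits are set on a given arch.

```c
#include <errno.h>
#include <sys/auxv.h>

/* Store the AT_HWCAP word in *out; return 0 or an errno value. */
int
read_hwcap(unsigned long *out)
{
#if defined(__FreeBSD__)
	return (elf_aux_info(AT_HWCAP, out, (int)sizeof(*out)));
#else
	/* getauxval() returns 0 and sets ENOENT if the entry is missing. */
	errno = 0;
	*out = getauxval(AT_HWCAP);
	return (errno);
#endif
}
```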

ARM* has not yet implemented the ifunc stubs in rtld.  I believe this is
considered low priority because there is no ready-to-use toolchain
that allows using ifuncs on FreeBSD, unless you use a recent bfd
ld externally.



Re: Problem with C11 _Atomic

2018-01-01 Thread Konstantin Belousov
On Mon, Jan 01, 2018 at 09:47:40PM +0100, Pierre DAVID wrote:
> Hi,
> 
> I'm on a recent current:
> FreeBSD biceps.ma.maison 12.0-CURRENT FreeBSD 12.0-CURRENT #2 r327239: 
> Wed Dec 27 18:25:46 CET 2017 
> p...@biceps.ma.maison:/usr/obj/usr/src/amd64.amd64/sys/BICEPS  amd64
> 
> with clang 5.0.1:
> FreeBSD clang version 5.0.1 (tags/RELEASE_501/final 320880) (based on 
> LLVM 5.0.1)
> Target: x86_64-unknown-freebsd12.0
> Thread model: posix
> InstalledDir: /usr/bin
> 
> I'm having a problem with the following source file:
> 
> --
> #include 
> 
> struct foo
> {
> int f1 ;
> char f2 ;
> int f3 ;
> } ;
> 
> _Atomic struct foo a ;
> struct foo b ;
> 
> int main (int argc, char *argv [])
> {
> b = (struct foo) {.f1 = 5, .f2 = 7, .f3 = 9 } ;
> // atomic_store (&a, b) ;
> a = b ;
> }
> --
> 
> This code does not compile/link with:
> % cc foo.c -lstdthreads
> /tmp/foo-a0ef26.o: In function `main':
> foo.c:(.text+0x63): undefined reference to `__sync_lock_test_and_set_16'
> cc: error: linker command failed with exit code 1 (use -v to see 
> invocation)
> 
> The gcc internal seems to be linked as "cc -v" told me:
> % cc -v foo.c -lstdthreads
> ...
>  "/usr/bin/ld" --eh-frame-hdr -dynamic-linker /libexec/ld-elf.so.1 
> --hash-style=both --enable-new-dtags -o a.out /usr/lib/crt1.o /usr/lib/crti.o 
> /usr/lib/crtbegin.o -L/usr/lib /tmp/foo-d7a21b.o -lstdthreads -lgcc 
> --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s 
> --no-as-needed /usr/lib/crtend.o /usr/lib/crtn.o
> 
> The problem occurs with "atomic_store(&a, b)" as well as with "a = b".
> 
> If I remove the f3 member, the struct foo is only 8 bytes and the code
> compiles/links.
> 
> Did I missed something?

clang issues calls to libatomic, which we do not provide.
As a workaround, compile with the following command.  -march=core2
enables the cmpxchg16b instruction, so clang can inline the 16-byte
atomic operation instead of calling libatomic.  The resulting binary
works on all practically usable machines.
$ cc -march=core2 source.c
You might want to turn off SSE3/SSE4.1 if you are concerned about older Pentium 4 CPUs.
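For comparison, here is a sketch of the alternative the poster already found: keep the atomic object at 8 bytes, which amd64 compilers handle with plain inline instructions, so no libatomic call should be emitted even without -march. struct foo8 and the helper names are hypothetical, and whether dropping or relocating f3 is acceptable depends on the program.

```c
#include <stdatomic.h>

/* Hypothetical 8-byte variant of struct foo (f3 removed). */
struct foo8 {
	int	f1;
	char	f2;
};

_Static_assert(sizeof(struct foo8) == 8, "expected an 8-byte struct");

static _Atomic struct foo8 a8;

void
foo8_store(struct foo8 v)
{
	atomic_store(&a8, v);	/* equivalent to "a8 = v;" */
}

struct foo8
foo8_load(void)
{
	return (atomic_load(&a8));
}
```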


Re: Linux process causes kernel panic

2018-08-03 Thread Konstantin Belousov
On Fri, Aug 03, 2018 at 09:26:08PM +0100, Johannes Lundberg wrote:
> Hi
> 
> After install new kernel+world built from today's checkout I keep getting
> the same crash over and over. Never had this problem before. The previous
> kernel was from 3 weeks ago.
> 
> Looks familiar to anyone?
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address= 0xfffe665c
> fault code= supervisor write data, protection violation
> instruction pointer= 0x20:0x82282db3
> stack pointer= 0x0:0xfe004c74c8c8
> frame pointer= 0x0:0xfe004c74c980
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process= 1579 (wcgrid_zika_vina_7.)
> trap number= 12
> panic: page fault
> cpuid = 0
> time = 1533327428
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe004c74c580
> vpanic() at vpanic+0x1a3/frame 0xfe004c74c5e0
> panic() at panic+0x43/frame 0xfe004c74c640
> trap_fatal() at trap_fatal+0x35f/frame 0xfe004c74c690
> trap_pfault() at trap_pfault+0x62/frame 0xfe004c74c6e0
> trap() at trap+0x2ba/frame 0xfe004c74c7f0
> calltrap() at calltrap+0x8/frame 0xfe004c74c7f0
> --- trap 0xc, rip = 0x82282db3, rsp = 0xfe004c74c8c8, rbp =
> 0xfe004c74c980 ---
> futex_xchgl() at futex_xchgl+0x23/frame 0xfe004c74c980
> ia32_syscall() at ia32_syscall+0x29f/frame 0xfe004c74cab0
> int0x80_syscall_common() at int0x80_syscall_common+0x9c/frame 0x401
> Uptime: 7m29s
> Dumping 411 out of 8056 MB:..4%..12%..24%..32%..43%..51%..63%..74%..82%..94%
> Dump complete

Post the first 40 lines of the verbose boot dmesg from your machine.

I have a guess about what is going on; I need the dmesg to confirm it.
If my guess is correct, you can work around the problem for now by setting
the "hw.cpu_stdext_disable=0x0010" tunable at the loader prompt.


Re: panic: mutex pmap not owned at ... efirt_machdep.c:255

2018-08-04 Thread Konstantin Belousov
On Sat, Aug 04, 2018 at 08:28:25PM +0100, Warner Losh wrote:
> When I looked at it, I'd assumed there would be VA range we'd assign to the
> PAs in the EFI table that at the loader and kernel would agree on. The DMAP
> does this on x64 and aarch64, but that's not an option for armv7 nor i386.

It is not DMAP.

amd64 works on the assumption that the ROM BIOS and its memory are located
at physical addresses below 4G.  Since the kernel is always mapped in the
upper half of the virtual address space, we hope that an identity mapping
for the RT memory can be established in the user half of the VA space.

Apparently arm64 makes the same assumption.
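The assumption above can be stated as a one-line predicate. This is a minimal sketch, not the kernel's code: rt_region_identity_mappable() is a hypothetical name, and it only checks that a physical range ends at or below 4G, which is what makes an identity mapping in the user half of the amd64 address space possible.

```c
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GB	(UINT64_C(1) << 32)

/* True if [pa, pa + len) lies entirely below 4G. */
bool
rt_region_identity_mappable(uint64_t pa, uint64_t len)
{
	if (pa >= FOUR_GB)
		return (false);
	return (len <= FOUR_GB - pa);	/* end <= 4G, written to avoid overflow */
}
```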


Re: panic: mutex pmap not owned at ... efirt_machdep.c:255

2018-08-04 Thread Konstantin Belousov
On Sat, Aug 04, 2018 at 08:05:24AM -0500, Kyle Evans wrote:
> On Sat, Aug 4, 2018 at 3:37 AM, Konstantin Belousov  
> wrote:
> > On Fri, Aug 03, 2018 at 11:27:02PM -0500, Kyle Evans wrote:
> >>
> >> This seems odd- pmap lock is acquired at [1], then asserted shortly
> >> later at [2]... I avoid some of this stuff as well as I can, but is it
> >> actually possible for PCPU_GET(...) acquired curpmap to not match
> >> curthread->td_proc->p_vmspace->vm_pmap in this context?
> >>
> >> [1] 
> >> https://svnweb.freebsd.org/base/head/sys/dev/efidev/efirt.c?view=markup#l260
> >> [2] 
> >> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/efirt_machdep.c?view=markup#l254
> > There could be that curpcpu not yet synced with proc0 pmap.  It could be
> > fixed.
> >
> > But it is not clear to me why efi_arch_enter() is called there.  I see
> > the check for GetTime belonging to the range described by a map descriptor.
> > I do not see why do you need an enter into the EFI context for comparing
> > integers.
> 
> This probably could have been documented better, but efi_runtime
> pointer may (always?) point into runtime service memory that isn't
> valid/available at that point, so we get a fault and panic when
> dereferencing it to grab rt_gettime address. We ran into this wall
> when adding the check originally.
Wouldn't it be enough to access it by translating the physical address
into the DMAP?


Re: Linux process causes kernel panic

2018-08-04 Thread Konstantin Belousov
On Sat, Aug 04, 2018 at 01:12:17PM +0100, Johannes Lundberg wrote:
> No panic over night with that tunable so it seems you're on the right
> track.

Please try this, on top of r337316.

diff --git a/sys/amd64/linux/linux_machdep.c b/sys/amd64/linux/linux_machdep.c
index 6c5b014853f..434ea0eac07 100644
--- a/sys/amd64/linux/linux_machdep.c
+++ b/sys/amd64/linux/linux_machdep.c
@@ -78,6 +78,9 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 
+#include 
+#include 
+
 #include 
 #include 
 #include 
@@ -88,8 +91,6 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 
-#include 
-
 int
 linux_execve(struct thread *td, struct linux_execve_args *args)
 {
@@ -276,3 +277,48 @@ linux_set_cloned_tls(struct thread *td, void *desc)
 
return (0);
 }
+
+int futex_xchgl_nosmap(int oparg, uint32_t *uaddr, int *oldval);
+int futex_xchgl_smap(int oparg, uint32_t *uaddr, int *oldval);
+DEFINE_IFUNC(, int, futex_xchgl, (int, uint32_t *, int *), static)
+{
+
+   return ((cpu_stdext_feature & CPUID_STDEXT_SMAP) != 0 ?
+   futex_xchgl_smap : futex_xchgl_nosmap);
+}
+
+int futex_addl_nosmap(int oparg, uint32_t *uaddr, int *oldval);
+int futex_addl_smap(int oparg, uint32_t *uaddr, int *oldval);
+DEFINE_IFUNC(, int, futex_addl, (int, uint32_t *, int *), static)
+{
+
+   return ((cpu_stdext_feature & CPUID_STDEXT_SMAP) != 0 ?
+   futex_addl_smap : futex_addl_nosmap);
+}
+
+int futex_orl_nosmap(int oparg, uint32_t *uaddr, int *oldval);
+int futex_orl_smap(int oparg, uint32_t *uaddr, int *oldval);
+DEFINE_IFUNC(, int, futex_orl, (int, uint32_t *, int *), static)
+{
+
+   return ((cpu_stdext_feature & CPUID_STDEXT_SMAP) != 0 ?
+   futex_orl_smap : futex_orl_nosmap);
+}
+
+int futex_andl_nosmap(int oparg, uint32_t *uaddr, int *oldval);
+int futex_andl_smap(int oparg, uint32_t *uaddr, int *oldval);
+DEFINE_IFUNC(, int, futex_andl, (int, uint32_t *, int *), static)
+{
+
+   return ((cpu_stdext_feature & CPUID_STDEXT_SMAP) != 0 ?
+   futex_andl_smap : futex_andl_nosmap);
+}
+
+int futex_xorl_nosmap(int oparg, uint32_t *uaddr, int *oldval);
+int futex_xorl_smap(int oparg, uint32_t *uaddr, int *oldval);
+DEFINE_IFUNC(, int, futex_xorl, (int, uint32_t *, int *), static)
+{
+
+   return ((cpu_stdext_feature & CPUID_STDEXT_SMAP) != 0 ?
+   futex_xorl_smap : futex_xorl_nosmap);
+}
diff --git a/sys/amd64/linux/linux_support.s b/sys/amd64/linux/linux_support.s
index a9f02160be2..391f76414f2 100644
--- a/sys/amd64/linux/linux_support.s
+++ b/sys/amd64/linux/linux_support.s
@@ -38,7 +38,7 @@ futex_fault:
movl    $-EFAULT,%eax
ret
 
-ENTRY(futex_xchgl)
+ENTRY(futex_xchgl_nosmap)
movq    PCPU(CURPCB),%r8
movq    $futex_fault,PCB_ONFAULT(%r8)
movq    $VM_MAXUSER_ADDRESS-4,%rax
@@ -49,25 +49,58 @@ ENTRY(futex_xchgl)
xorl    %eax,%eax
movq    %rax,PCB_ONFAULT(%r8)
ret
-END(futex_xchgl)
+END(futex_xchgl_nosmap)
 
-ENTRY(futex_addl)
+ENTRY(futex_xchgl_smap)
movq    PCPU(CURPCB),%r8
movq    $futex_fault,PCB_ONFAULT(%r8)
movq    $VM_MAXUSER_ADDRESS-4,%rax
cmpq    %rax,%rsi
ja  futex_fault
+   stac
+   xchgl   %edi,(%rsi)
+   clac
+   movl    %edi,(%rdx)
+   xorl    %eax,%eax
+   movq    %rax,PCB_ONFAULT(%r8)
+   ret
+END(futex_xchgl_smap)
+
+ENTRY(futex_addl_nosmap)
+   movq    PCPU(CURPCB),%r8
+   movq    $futex_fault,PCB_ONFAULT(%r8)
+   movq    $VM_MAXUSER_ADDRESS-4,%rax
+   cmpq    %rax,%rsi
+   ja  futex_fault
+#ifdef SMP
+   lock
+#endif
+   xaddl   %edi,(%rsi)
+   movl    %edi,(%rdx)
+   xorl    %eax,%eax
+   movq    %rax,PCB_ONFAULT(%r8)
+   ret
+END(futex_addl_nosmap)
+
+ENTRY(futex_addl_smap)
+   movq    PCPU(CURPCB),%r8
+   movq    $futex_fault,PCB_ONFAULT(%r8)
+   movq    $VM_MAXUSER_ADDRESS-4,%rax
+   cmpq    %rax,%rsi
+   ja  futex_fault
+   stac
 #ifdef SMP
lock
 #endif
xaddl   %edi,(%rsi)
+   clac
movl    %edi,(%rdx)
xorl    %eax,%eax
movq    %rax,PCB_ONFAULT(%r8)
ret
-END(futex_addl)
+END(futex_addl_smap)
 
-ENTRY(futex_orl)
+ENTRY(futex_orl_nosmap)
movq    PCPU(CURPCB),%r8
movq    $futex_fault,PCB_ONFAULT(%r8)
movq    $VM_MAXUSER_ADDRESS-4,%rax
@@ -85,9 +118,31 @@ ENTRY(futex_orl)
xorl    %eax,%eax
movq    %rax,PCB_ONFAULT(%r8)
ret
-END(futex_orl)
+END(futex_orl_nosmap)
 
-ENTRY(futex_andl)
+ENTRY(futex_orl_smap)
+   movq    PCPU(CURPCB),%r8
+   movq    $futex_fault,PCB_ONFAULT(%r8)
+   movq    $VM_MAXUSER_ADDRESS-4,%rax
+   cmpq    %rax,%rsi
+   ja  futex_fault
+   movl    (%rsi),%eax
+1: movl    %eax,%ecx
+   orl %edi,%ecx
+   stac
+#ifdef SMP
+   lock
+#endif
+   cmpxchgl %ecx,(%rsi)
+   clac
+   jnz 1b
+   movl    %eax,(%rdx)
+   xorl    %eax,%eax
+   movq    %rax,PCB_ONFAULT(%r8)
+   ret
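Here is a user-space sketch of the selection that the DEFINE_IFUNC blocks in the patch perform: the _smap or _nosmap implementation is chosen once from the CPUID_STDEXT_SMAP bit (bit 20 of CPUID leaf 7 %ebx). It swaps the kernel's ifunc relocation for an ordinary function pointer. The two stand-in bodies use C11 atomic_exchange() because user space cannot execute stac/clac, and they record which variant ran only to make the choice observable. All names here are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Bit 20 of CPUID.(EAX=7,ECX=0):EBX, CPUID_STDEXT_SMAP in FreeBSD. */
#define STDEXT_SMAP	(UINT32_C(1) << 20)

typedef int (*futex_xchgl_fn)(int oparg, _Atomic uint32_t *uaddr, int *oldval);

static int last_variant;	/* 0 = nosmap, 1 = smap; for observation only */

/* Stand-in for futex_xchgl_nosmap: plain atomic exchange. */
static int
xchgl_nosmap(int oparg, _Atomic uint32_t *uaddr, int *oldval)
{
	last_variant = 0;
	*oldval = (int)atomic_exchange(uaddr, (uint32_t)oparg);
	return (0);
}

/* Stand-in for futex_xchgl_smap: the real one brackets xchgl with stac/clac. */
static int
xchgl_smap(int oparg, _Atomic uint32_t *uaddr, int *oldval)
{
	last_variant = 1;
	*oldval = (int)atomic_exchange(uaddr, (uint32_t)oparg);
	return (0);
}

/* Resolver: like the DEFINE_IFUNC body, choose once from feature bits. */
futex_xchgl_fn
resolve_futex_xchgl(uint32_t stdext_feature)
{
	return ((stdext_feature & STDEXT_SMAP) != 0 ? xchgl_smap : xchgl_nosmap);
}
```

Resolving once keeps the per-call cost to an indirect call; the kernel's ifunc goes further and patches the call target so no pointer load is needed at all.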

Re: panic: mutex pmap not owned at ... efirt_machdep.c:255

2018-08-04 Thread Konstantin Belousov
On Sat, Aug 04, 2018 at 08:56:58AM -0500, Kyle Evans wrote:
> On Sat, Aug 4, 2018 at 8:13 AM, Konstantin Belousov  
> wrote:
> > On Sat, Aug 04, 2018 at 08:05:24AM -0500, Kyle Evans wrote:
> >> On Sat, Aug 4, 2018 at 3:37 AM, Konstantin Belousov  
> >> wrote:
> >> > On Fri, Aug 03, 2018 at 11:27:02PM -0500, Kyle Evans wrote:
> >> >>
> >> >> This seems odd- pmap lock is acquired at [1], then asserted shortly
> >> >> later at [2]... I avoid some of this stuff as well as I can, but is it
> >> >> actually possible for PCPU_GET(...) acquired curpmap to not match
> >> >> curthread->td_proc->p_vmspace->vm_pmap in this context?
> >> >>
> >> >> [1] 
> >> >> https://svnweb.freebsd.org/base/head/sys/dev/efidev/efirt.c?view=markup#l260
> >> >> [2] 
> >> >> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/efirt_machdep.c?view=markup#l254
> >> > There could be that curpcpu not yet synced with proc0 pmap.  It could be
> >> > fixed.
> >> >
> >> > But it is not clear to me why efi_arch_enter() is called there.  I see
> >> > the check for GetTime belonging to the range described by a map 
> >> > descriptor.
> >> > I do not see why do you need an enter into the EFI context for comparing
> >> > integers.
> >>
> >> This probably could have been documented better, but efi_runtime
> >> pointer may (always?) point into runtime service memory that isn't
> >> valid/available at that point, so we get a fault and panic when
> >> dereferencing it to grab rt_gettime address. We ran into this wall
> >> when adding the check originally.
> > Wouldn't it be enough to access it by translating physical address into
> > DMAP ?
> 
> Ah, sure, sure. [1] is proper form, yeah?
> 
> [1] https://people.freebsd.org/~kevans/efi-dmap.diff

I would wrap it in #ifdef PHYS_TO_DMAP, with #error otherwise.
It might also make sense to check against dmaplimit (on arm64
the check is called PHYS_IN_DMAP(), sigh).

So it might make sense to define an MD function in arch/efirt_machdep.c
to translate the table's address into a KVA.
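A minimal sketch of such an MD helper, written as plain C so it can be exercised outside the kernel. efi_phys_to_kva() is a hypothetical name. DMAP_BASE is assumed to match the pre-LA57 amd64 DMAP_MIN_ADDRESS, the translation is assumed to be an OR as in amd64's PHYS_TO_DMAP(), and dmaplimit is passed in rather than read from the kernel global.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed amd64 DMAP_MIN_ADDRESS (4-level paging). */
#define DMAP_BASE	UINT64_C(0xfffff80000000000)

/*
 * Translate a physical address (e.g. of the EFI runtime services table)
 * into a direct-map KVA; refuse addresses the direct map does not cover.
 */
bool
efi_phys_to_kva(uint64_t pa, uint64_t dmaplimit, uint64_t *kva)
{
	if (pa >= dmaplimit)
		return (false);		/* outside the DMAP */
	*kva = DMAP_BASE | pa;		/* same shape as PHYS_TO_DMAP() */
	return (true);
}
```

On arches without a direct map, a helper like this would simply not be built, which matches the #ifdef PHYS_TO_DMAP / #error suggestion.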


Re: panic: mutex pmap not owned at ... efirt_machdep.c:255

2018-08-04 Thread Konstantin Belousov
On Sat, Aug 04, 2018 at 09:58:43AM -0500, Kyle Evans wrote:
> On Sat, Aug 4, 2018 at 9:51 AM, Ian Lepore  wrote:
> > On Sat, 2018-08-04 at 08:56 -0500, Kyle Evans wrote:
> >> On Sat, Aug 4, 2018 at 8:13 AM, Konstantin Belousov  >> com> wrote:
> >> >
> >> > On Sat, Aug 04, 2018 at 08:05:24AM -0500, Kyle Evans wrote:
> >> > >
> >> > > On Sat, Aug 4, 2018 at 3:37 AM, Konstantin Belousov  >> > > ail.com> wrote:
> >> > > >
> >> > > > On Fri, Aug 03, 2018 at 11:27:02PM -0500, Kyle Evans wrote:
> >> > > > >
> >> > > > >
> >> > > > > This seems odd- pmap lock is acquired at [1], then asserted
> >> > > > > shortly
> >> > > > > later at [2]... I avoid some of this stuff as well as I can,
> >> > > > > but is it
> >> > > > > actually possible for PCPU_GET(...) acquired curpmap to not
> >> > > > > match
> >> > > > > curthread->td_proc->p_vmspace->vm_pmap in this context?
> >> > > > >
> >> > > > > [1] https://svnweb.freebsd.org/base/head/sys/dev/efidev/efirt
> >> > > > > .c?view=markup#l260
> >> > > > > [2] https://svnweb.freebsd.org/base/head/sys/amd64/amd64/efir
> >> > > > > t_machdep.c?view=markup#l254
> >> > > > There could be that curpcpu not yet synced with proc0 pmap.  It
> >> > > > could be
> >> > > > fixed.
> >> > > >
> >> > > > But it is not clear to me why efi_arch_enter() is called
> >> > > > there.  I see
> >> > > > the check for GetTime belonging to the range described by a map
> >> > > > descriptor.
> >> > > > I do not see why do you need an enter into the EFI context for
> >> > > > comparing
> >> > > > integers.
> >> > > This probably could have been documented better, but efi_runtime
> >> > > pointer may (always?) point into runtime service memory that
> >> > > isn't
> >> > > valid/available at that point, so we get a fault and panic when
> >> > > dereferencing it to grab rt_gettime address. We ran into this
> >> > > wall
> >> > > when adding the check originally.
> >> > Wouldn't it be enough to access it by translating physical address
> >> > into
> >> > DMAP ?
> >> Ah, sure, sure. [1] is proper form, yeah?
> >>
> >> [1] https://people.freebsd.org/~kevans/efi-dmap.diff
> >
> > What do we do on 32-bit arm that has no dmap but may have efi runtime
> > support?
> >
> 
> This should probably just be compiled out for !arm64 && !x86 - its
> sole purpose was to compensate for outdated loader.efi that hasn't
> done the SetVirtualAddressMap. EFI on 32-bit ARM is "new" enough that
> it shouldn't have this problem.
Does EFI on 32-bit arm have RT support?


Re: panic: mutex pmap not owned at ... efirt_machdep.c:255

2018-08-04 Thread Konstantin Belousov
On Sat, Aug 04, 2018 at 09:25:47AM -0600, Ian Lepore wrote:
> On Sat, 2018-08-04 at 18:22 +0300, Konstantin Belousov wrote:
> > On Sat, Aug 04, 2018 at 09:58:43AM -0500, Kyle Evans wrote:
> > > 
> > > On Sat, Aug 4, 2018 at 9:51 AM, Ian Lepore  wrote:
> > > > 
> > > > On Sat, 2018-08-04 at 08:56 -0500, Kyle Evans wrote:
> > > > > 
> > > > > On Sat, Aug 4, 2018 at 8:13 AM, Konstantin Belousov  > > > > gmail.
> > > > > com> wrote:
> > > > > > 
> > > > > > 
> > > > > > On Sat, Aug 04, 2018 at 08:05:24AM -0500, Kyle Evans wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Sat, Aug 4, 2018 at 3:37 AM, Konstantin Belousov  > > > > > > bel@gm
> > > > > > > ail.com> wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Fri, Aug 03, 2018 at 11:27:02PM -0500, Kyle Evans
> > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > This seems odd- pmap lock is acquired at [1], then
> > > > > > > > > asserted
> > > > > > > > > shortly
> > > > > > > > > later at [2]... I avoid some of this stuff as well as I
> > > > > > > > > can,
> > > > > > > > > but is it
> > > > > > > > > actually possible for PCPU_GET(...) acquired curpmap to
> > > > > > > > > not
> > > > > > > > > match
> > > > > > > > > curthread->td_proc->p_vmspace->vm_pmap in this context?
> > > > > > > > > 
> > > > > > > > > [1] https://svnweb.freebsd.org/base/head/sys/dev/efidev
> > > > > > > > > /efirt
> > > > > > > > > .c?view=markup#l260
> > > > > > > > > [2] https://svnweb.freebsd.org/base/head/sys/amd64/amd6
> > > > > > > > > 4/efir
> > > > > > > > > t_machdep.c?view=markup#l254
> > > > > > > > There could be that curpcpu not yet synced with proc0
> > > > > > > > pmap.  It
> > > > > > > > could be
> > > > > > > > fixed.
> > > > > > > > 
> > > > > > > > But it is not clear to me why efi_arch_enter() is called
> > > > > > > > there.  I see
> > > > > > > > the check for GetTime belonging to the range described by
> > > > > > > > a map
> > > > > > > > descriptor.
> > > > > > > > I do not see why do you need an enter into the EFI
> > > > > > > > context for
> > > > > > > > comparing
> > > > > > > > integers.
> > > > > > > This probably could have been documented better, but
> > > > > > > efi_runtime
> > > > > > > pointer may (always?) point into runtime service memory
> > > > > > > that
> > > > > > > isn't
> > > > > > > valid/available at that point, so we get a fault and panic
> > > > > > > when
> > > > > > > dereferencing it to grab rt_gettime address. We ran into
> > > > > > > this
> > > > > > > wall
> > > > > > > when adding the check originally.
> > > > > > Wouldn't it be enough to access it by translating physical
> > > > > > address
> > > > > > into
> > > > > > DMAP ?
> > > > > Ah, sure, sure. [1] is proper form, yeah?
> > > > > 
> > > > > [1] https://people.freebsd.org/~kevans/efi-dmap.diff
> > > > What do we do on 32-bit arm that has no dmap but may have efi
> > > > runtime
> > > > support?
> > > > 
> > > This should probably just be compiled out for !arm64 && !x86 - its
> > > sole purpose was to compensate for outdated loader.efi that hasn't
> > > done the SetVirtualAddressMap. EFI on 32-bit ARM is "new" enough
> > > that
> > > it shouldn't have this problem.
> > Does EFI on 32bit arm have RT support ?
> 
> I suspect the uboot implementation doesn't, but I can't think of any
> reason why other implementations are not possible/available. In
> particular, even 32bit arm supports virtualization and such an
> environment could provide rt support.

No, I mean: does our kernel have RT support on armv7?  I only implemented
the necessary VM tricks for amd64, then it was ported to arm64, and in both
cases it relies on the 64-bit address space and the specific location of the KVA.


Re: IOMMU for GPUs

2018-08-05 Thread Konstantin Belousov
On Sun, Aug 05, 2018 at 10:34:56AM +0100, Johannes Lundberg wrote:
> Hi
> 
> First I have to say I don't know much when it comes to virtual GPUs and
> IOMMU. I'm trying to figure out what we have and what is missing in regards
> to sharing the GPU to virtual guests on Intel and AMD and things like the
> amdkfd driver (for Radeon open compute).
> 
> Looking at the state of IOMMU in FreeBSD there seem to be (what I guess is)
> a general driver for Intel at /usr/src/sys/x86/iommu/.
This is indeed a generic driver, which also provides a busdma
implementation using DMAR.  Its main use right now is debugging other
drivers.  It is also helpful if some device has very restrictive DMA
alignment, contiguity, or location requirements which cannot easily be
satisfied by the VM subsystem, especially under load.  In this case its
behaviour might be better than the usual bounce busdma.

The Intel driver also knows about a significant set of DMAR errata in
existing chipsets, although I have not updated the list recently.

There is no similar driver for AMD IOMMU.

> 
> Then there's the support for AMD's IOMMU in bhyve at
> /usr/src/sys/amd64/vmm/io/iommu.c.
bhyve has simple drivers for both Intel DMAR and AMD IOMMU, which only
provide PCI pass-through.  The intention is to eventually switch to the
x86/iommu DMAR driver, at least on Intel.  I still have not found time to
work on this.

> 
> Without looking too much into the details my guess is we have an iommu
> driver for Intel that Intel's i915/gvt can use but for AMD there's only the
> specific implementation for bhyve and no general driver that other clients
> can use... Or?
In-tree code in dev/drm2 does not work with the GPU IOMMU turned on.  The GTT
code for Intel/GEM assumes that physical address == GTT address.  I strongly
suspect that dev/drm2/ati does the same for TTM buffers.

> 
> If anyone wants to work on this, it's up for grabs :)
> We'll probably have to add some glue in linuxkpi as well.
As far as I know (I would be glad to be proven wrong), linuxkpi does not
wrap the Linux DMA KPI onto our busdma, if such wrapping is even possible.
More likely, the DMA interfaces should be implemented from scratch,
perhaps using the existing Intel DMAR driver as one of the substrates.



Re: panic: mutex pmap not owned at ... efirt_machdep.c:255

2018-08-05 Thread Konstantin Belousov
On Sat, Aug 04, 2018 at 09:46:39PM -0500, Kyle Evans wrote:
> On Sat, Aug 4, 2018 at 3:37 AM, Konstantin Belousov  
> wrote:
> > On Fri, Aug 03, 2018 at 11:27:02PM -0500, Kyle Evans wrote:
> >> On Fri, Aug 3, 2018 at 10:10 PM, Eitan Adler  wrote:
> >> > Hi all,
> >> >
> >> > After installing the latest current kernel I get the following panic:
> >> >
> >> > panic: mutex pmap not owned at ... efirt_machdep.c:255
> >> > cpuid =3
> >> > time = 1
> >> > ...
> >> > mtx_assert()
> >> > efi_arch_enter()
> >> > efirt_modevents()
> >> > module_register_init()
> >> > mi_startup()
> >> > btext()
> >> >
> >>
> >> This seems odd- pmap lock is acquired at [1], then asserted shortly
> >> later at [2]... I avoid some of this stuff as well as I can, but is it
> >> actually possible for PCPU_GET(...) acquired curpmap to not match
> >> curthread->td_proc->p_vmspace->vm_pmap in this context?
> >>
> >> [1] 
> >> https://svnweb.freebsd.org/base/head/sys/dev/efidev/efirt.c?view=markup#l260
> >> [2] 
> >> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/efirt_machdep.c?view=markup#l254
> > There could be that curpcpu not yet synced with proc0 pmap.  It could be
> > fixed.
> >
> 
> He now gets a little further, but ends up with the same panic due to
> efirtc_probe trying to get time to verify the rtc's actually
> implemented. What kind of approach must we take to ensure curcpu is
> synced?

It does not panic for me when I load efirt.ko from the loader prompt.
Anyway, try this:

diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 572b2197453..f84f56b98e2 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -2655,7 +2655,7 @@ pmap_pinit0(pmap_t pmap)
__pcpu[i].pc_ucr3 = PMAP_NO_CR3;
}
}
-   PCPU_SET(curpmap, kernel_pmap);
+   PCPU_SET(curpmap, pmap);
pmap_activate(curthread);
CPU_FILL(_pmap->pm_active);
 }


Re: ffs_truncate3 panics

2018-08-08 Thread Konstantin Belousov
On Wed, Aug 08, 2018 at 12:30:54PM +, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Tue, Aug 07, 2018 at 12:28:33PM +, Rick Macklem wrote:
> >> Hi,
> >>
> >> During testing of the pNFS server I get an ffs_truncate3 panic every once 
> >> in a while.
> >> A few things that might be relevant:
> >> - Seems to happen more often when soft update journaling is enabled, but 
> >> will
> >>   happen when it is disabled.
> >> - Normally happens when a fairly large subtree of the file system is being 
> >> removed.
> - Oh, and this is an old i386 with 256Mbytes (not one of them new fangled 
> computers,
>where memory is in Gbytes;-)
> 
> >>
> >> These file systems are a bit odd, since all the regular files in them are 
> >> empty but
> >> have extended attributes that are accessed during the subtree removal. (The
> >> extended attributes tell the server where the data files are.)
> >>
> >> I replaced the panic() with a printf() and every time the printf() 
> >> happens...
> >> bo->bo_dirty.bv_cnt == 0 and bo->bo_clean.bv_cnt == 1.
> >> After one of these printf()s, the system continues to run ok. When the file
> >> system is fsck'd after this has occurred, it passes fine and I haven't 
> >> seen and
> >> indication of file system corruption after running with this file system 
> >> for
> >> quite a while after the printf()s first occurred.
> >The lack of corruption is, most likely, because the files are removed.
> >Would the files truncated to zero length and then extended, I am almost
> >sure that a corruption occur.
> >
> >Can you print the only buffer on the clean queue when the panic occur ?
> ffst3 vtyp=1 bodirty=0 boclean=1
> buf at 0x428a110
> b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
> b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
> b_bufobj = (0xfd8ba94), b_data = 0x517, b_blkno = -1, b_lblkno = -1, 
> b_dep = 0
> b_kvabase = 0x517, b_kvasize = 32768
So the buffer was indeed for extended attrs, and was never written to the disk.
I am quite interested in what the inode contents were prior to the truncation,
especially di_extsize.

Could you try to formulate a way to reproduce the panic so that Peter
can recreate it, please?

> 
> >Also, it is interesting to know the initial length of the file.
> Since they are regular files, they are 0 length. (Just inodes with extended 
> attributes.)
> 
> >>
> >> Since the panic() only occurs when "options INVARIANTS" is enabled and I
> >> don't see evidence of file system corruption, I'm wondering if this panic()
> >> is valid and needed?
> 
> rick
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ffs_truncate3 panics

2018-08-10 Thread Konstantin Belousov
On Thu, Aug 09, 2018 at 08:38:50PM +, Rick Macklem wrote:
> >BTW, does the NFS server use extended attributes? What for? Can you, please,
> >point out the code which does this?
> For the pNFS service, there are two system namespace extended attributes for
> each file stored on the service.
> pnfsd.dsfile - Stores where the data for the file is. Can be displayed by the
>  pnfsdsfile(8) command.
> 
> pnfsd.dsattr - Cached attributes that change when a file is written (size,
>  mtime, change) so that the MDS doesn't have to do a Getattr on the data
>  server for every client Getattr.
> 

My reading of the nfsd code and the ffs extattr handling reminds me that you
already reported this issue some time ago. I suspected ufs_balloc() at
that time.

Now I think that the situation with the stray buffer hanging on the
queue is legitimate: ffs_extread() might create such a buffer and release
it to the clean queue, and a later removal of the file would then see an
inode with no allocated ext blocks but with the buffer still attached.

I think the easiest way to handle it is to always flush the buffers and pages
in the ext attr range, regardless of the number of allocated ext blocks.
The patch below has not been tested.
diff --git a/sys/ufs/ffs/ffs_inode.c b/sys/ufs/ffs/ffs_inode.c
index 3cf58558c18..2ffd861f3b4 100644
--- a/sys/ufs/ffs/ffs_inode.c
+++ b/sys/ufs/ffs/ffs_inode.c
@@ -244,22 +244,19 @@ ffs_truncate(vp, length, flags, cred)
 		extblocks = btodb(fragroundup(fs, ip->i_din2->di_extsize));
 		datablocks -= extblocks;
 	}
-	if ((flags & IO_EXT) && extblocks > 0) {
+	if ((flags & IO_EXT) != 0) {
 		if (length != 0)
 			panic("ffs_truncate: partial trunc of extdata");
 		if (softdeptrunc || journaltrunc) {
 			if ((flags & IO_NORMAL) == 0)
 				goto extclean;
 			needextclean = 1;
-		} else {
-			if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
-				return (error);
+		} else if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
+			return (error);
+		if (extblocks > 0) {
 #ifdef QUOTA
 			(void) chkdq(ip, -extblocks, NOCRED, 0);
 #endif
-			vinvalbuf(vp, V_ALT, 0, 0);
-			vn_pages_remove(vp,
-			    OFF_TO_IDX(lblktosize(fs, -extblocks)), 0);
 			osize = ip->i_din2->di_extsize;
 			ip->i_din2->di_blocks -= extblocks;
 			ip->i_din2->di_extsize = 0;
@@ -278,6 +275,8 @@ ffs_truncate(vp, length, flags, cred)
 				    vp->v_type, NULL, SINGLETON);
 			}
 		}
+		vinvalbuf(vp, V_ALT, 0, 0);
+		vn_pages_remove(vp, OFF_TO_IDX(lblktosize(fs, -UFS_NXADDR)), 0);
 	}
 	if ((flags & IO_NORMAL) == 0)
 		return (0);
@@ -631,7 +630,10 @@ ffs_truncate(vp, length, flags, cred)
 		softdep_journal_freeblocks(ip, cred, length, IO_EXT);
 	else
 		softdep_setup_freeblocks(ip, length, IO_EXT);
-	return (ffs_update(vp, waitforupdate));
+	error = ffs_update(vp, waitforupdate);
+	vinvalbuf(vp, V_ALT, 0, 0);
+	vn_pages_remove(vp, OFF_TO_IDX(lblktosize(fs, -UFS_NXADDR)), 0);
+	return (error);
 }
 
 /*
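To make the reasoning concrete, here is a toy userland model of the two code shapes. It is hypothetical and only tracks counters, not ffs code: toy_vnode, toy_truncate_ext_old(), toy_truncate_ext_new() and toy_has_stray_bufs() are made-up names, the soft updates path is ignored, and the starting state {0 ext blocks, 1 clean ext buffer} stands for a buffer that ffs_extread() left on the clean queue.

```c
#include <stdbool.h>

/* Hypothetical model of one vnode's ext attr state during an IO_EXT truncate. */
struct toy_vnode {
	int extblocks;		/* ext blocks accounted for by di_extsize */
	int clean_ext_bufs;	/* ext attr buffers sitting on the clean queue */
};

/* Stand-in for vinvalbuf(vp, V_ALT, 0, 0): drops every ext attr buffer. */
static void
toy_vinvalbuf_alt(struct toy_vnode *vp)
{
	vp->clean_ext_bufs = 0;
}

/* Pre-patch shape: the whole IO_EXT branch is gated on extblocks > 0. */
static void
toy_truncate_ext_old(struct toy_vnode *vp)
{
	if (vp->extblocks > 0) {
		toy_vinvalbuf_alt(vp);
		vp->extblocks = 0;
	}
}

/*
 * Post-patch shape: freeing blocks is still gated on extblocks > 0,
 * but the buffer flush now happens unconditionally.
 */
static void
toy_truncate_ext_new(struct toy_vnode *vp)
{
	if (vp->extblocks > 0)
		vp->extblocks = 0;
	toy_vinvalbuf_alt(vp);
}

/* True for the "no ext blocks, but a buffer is still queued" state. */
static bool
toy_has_stray_bufs(const struct toy_vnode *vp)
{
	return (vp->extblocks == 0 && vp->clean_ext_bufs > 0);
}
```

Under these assumptions the old shape leaves the buffer behind (the bo_clean.bv_cnt == 1 state that Rick's printf() reported), while the patched shape drops it; when extblocks > 0 both shapes end up clean.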


Re: Early kernel boot log?

2018-08-09 Thread Konstantin Belousov
On Thu, Aug 09, 2018 at 08:54:31AM +0100, Johannes Lundberg wrote:
> Hi
> 
> So I believe the reason I'm not seeing any printf output in dmesg is that
> it is too early in some functions.
> For example
> machdep.c
>  getmemsize()
>  add_efi_map_entries()
>  etc
> 
> However, these functions do contain debug printf statements, so if they're
> logging somewhere, where/how can I see this?
> 
> I also tried booting in bhyve to see if I could get any output via serial
> console but nothing there either.
Disable the EFI console, leaving only comconsole, then set
debug.late_console=0
in the loader.

> 
> printfs in cpu_startup() do give me output in dmesg.
> 
> Cheers


Re: Early kernel boot log?

2018-08-09 Thread Konstantin Belousov
On Thu, Aug 09, 2018 at 10:26:06AM +0100, Johannes Lundberg wrote:
> On Thu, Aug 9, 2018 at 9:29 AM Konstantin Belousov 
> wrote:
> 
> > On Thu, Aug 09, 2018 at 08:54:31AM +0100, Johannes Lundberg wrote:
> > > Hi
> > >
> > > So I believe the reason I'm not seeing any printf output in dmesg is that
> > > it is too early in some functions.
> > > For example
> > > machdep.c
> > >  getmemsize()
> > >  add_efi_map_entries()
> > >  etc
> > >
> > > However, these functions do contain debug printf statements, so if they're
> > > logging somewhere, where/how can I see this?
> > >
> > > I also tried booting in bhyve to see if I could get any output via serial
> > > console but nothing there either.
> > Disable the EFI console, leaving only comconsole, then set
> > debug.late_console=0
> > in the loader.
> >
> 
> Thanks for the tip. I found the comment in machdep.c that explains this
> now.
> However, running in bhyve with
> console="comconsole" (not needed in bhyve I guess?)
> debug.late_console=0
> 
> Boot hangs after
> Booting...
> output.
> Caused by late_console=0.

Hangs that early are typically due to an exception occurring before
the IDT is set up and the trap machinery is operational. Double-check that
there is no early framebuffer access; as a drastic measure, remove
all framebuffer drivers from your kernel config.

I do not remember: were gdb stubs added to bhyve? Is there a way
to inspect the VM guest state in bhyve by other means?



Re: ffs_truncate3 panics

2018-08-09 Thread Konstantin Belousov
On Thu, Aug 09, 2018 at 01:39:20AM +, Rick Macklem wrote:
> Konstantin Belousov wrote:
> [stuff snipped]
> >> >Can you print the only buffer on the clean queue when the panic occurs?
> >> ffst3 vtyp=1 bodirty=0 boclean=1
> >> buf at 0x428a110
> >> b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
> >> b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
> >> b_bufobj = (0xfd8ba94), b_data = 0x517, b_blkno = -1, b_lblkno = -1, 
> >> b_dep = 0
> >> b_kvabase = 0x517, b_kvasize = 32768
> >So the buffer was indeed for extended attrs, and was never written to the disk.
> >I am quite interested in what the inode contents were prior to the truncation,
> >especially di_extsize.
> Just in case it wasn't clear, this buffer is on the clean list and not the
> dirty one.
> (Does this mean it somehow got onto the "clean list" without being written
> to disk?)
> 
> >Could you try to formulate a way to reproduce the panic so that Peter
> >can recreate it, please ?
> I doubt it. It would require him doing a pNFS setup with multiple systems.
> (At least that is the only way I reproduce it, and I sometimes go a week of
>  testing before I see them.)
> It would be great to have more testers for the pNFS server stuff, but I doubt
> it would fit into Peter's setup?
> 
> I can add printf()s anywhere you suggest, but I'm not sure how you would catch
> this case sooner? (For example, I could print out di_extsize at the beginning
> of ffs_truncate(), if that would help?)
Maybe add a loop at the beginning of ffs_truncate() over all buffers
on both the clean and dirty queues, counting the buffers with
b_lblkno < 0 and >= -UFS_NXADDR. Print a diagnostic if such a buffer is
detected but di_extsize is zero.
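The loop suggested above might look roughly like the following userland sketch. It is a hedged model, not kernel code: struct toy_buf, struct toy_bufv, count_ext_bufs() and check_stray_ext_bufs() are hypothetical names, and UFS_NXADDR is assumed to be 2 as in FreeBSD's sys/ufs/ufs/dinode.h. In the kernel the same scan would presumably walk bo->bo_clean.bv_hd and bo->bo_dirty.bv_hd with TAILQ_FOREACH while holding the bufobj lock.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: matches UFS_NXADDR in FreeBSD's sys/ufs/ufs/dinode.h. */
#define UFS_NXADDR 2

/* Hypothetical stand-in for the one struct buf field the check needs. */
struct toy_buf {
	int64_t b_lblkno;	/* ext attr block i uses lbn -1 - i */
};

/* Hypothetical stand-in for one bufobj queue (bo_clean or bo_dirty). */
struct toy_bufv {
	const struct toy_buf *bufs;
	int bv_cnt;
};

/* Count the buffers on one queue whose lbn lies in [-UFS_NXADDR, -1]. */
static int
count_ext_bufs(const struct toy_bufv *bv)
{
	int i, n;

	n = 0;
	for (i = 0; i < bv->bv_cnt; i++) {
		int64_t lbn = bv->bufs[i].b_lblkno;

		if (lbn < 0 && lbn >= -UFS_NXADDR)
			n++;
	}
	return (n);
}

/*
 * Scan both queues and print a diagnostic if ext attr buffers exist
 * while di_extsize says there is no ext data.  Returns the number of
 * stray buffers found (0 if there are none or if di_extsize != 0).
 */
static int
check_stray_ext_bufs(const struct toy_bufv *clean,
    const struct toy_bufv *dirty, uint32_t di_extsize)
{
	int n;

	if (di_extsize != 0)
		return (0);
	n = count_ext_bufs(clean) + count_ext_bufs(dirty);
	if (n > 0)
		printf("ffs_truncate: %d ext attr buf(s), di_extsize == 0\n", n);
	return (n);
}
```

With a buffer like the one in the dump above (b_lblkno = -1) on the clean queue and di_extsize == 0, this check would report one stray buffer; data blocks (lbn >= 0) are never counted.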

BTW, does the NFS server use extended attributes? What for? Can you, please,
point out the code which does this?

