Re: numa involved in instability and swap usage despite RAM free?

2018-06-26 Thread Shane Ambler
On 27/06/2018 07:52, Steve Kargl wrote:
> On Tue, Jun 26, 2018 at 02:39:27PM -0700, Adrian Chadd wrote:
>> On Mon, 25 Jun 2018 at 11:23, Steve Kargl
>>  wrote:
>>>
>>> On Sun, Jun 24, 2018 at 12:03:29PM +0200, Alexander Leidinger wrote:

>>>> I don't have hard evidence, but there is enough "smell" to open up a
>>>> discussion...
>>>>
>>>> Short:
>>>> Can it be that enabling numa in the kernel is the reason why some
>>>> people see instability with zfs and usage of swap while a lot of free
>>>> RAM is available?
>>>
>>> Interesting observation.  I do have NUMA in my kernel, and swap
>>> seems to be used instead of recycling freeing inactive memory.
>>> Top shows
>>>
>>> Mem: 506M Active, 27G Inact, 98M Laundry, 2735M Wired, 1474M Buf, 1536M Free
>>> Swap: 16G Total, 120M Used, 16G Free

As someone who has had memory issues since 10.1 (bug 194654), I have
recently realised something that seems to make some sense to me.

The arc_max setting is a limit on the zfs arc, and this ram gets wired to
prevent it from being swapped out, which makes sense.

I had thought that vm.max_wired was ignored, but now I see that arc_max
and max_wired are two separate limits on wired memory which are not
connected. max_wired appears to default to 30% of kmem_size.

Both of these are added together and reported in
vm.stats.vm.v_wire_count, which is the wired value shown by top. This
appears to be the reason that I can see 9G wired when max_wired is at 5G.

The implication of this is that the two together (arc_max + max_wired)
can be set to more than the physically installed ram. I can verify that
with 8G installed, once the two values add up to more than 7G you get no
choice but a hard reset. Since upgrading to 16G I have been more vigilant
and have not allowed more than 10G to be wired, so I haven't had that
problem in a year and a half.

With the default arc_max usually set to ram minus 1G and max_wired at 5G
it is easy to see that the current defaults are dangerous.

I have not seen max_wired mentioned in relation to zfs but it seems that
it should be considered when setting arc_max to prevent over wiring ram.
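
A minimal sketch of that arithmetic, assuming the two limits really are
independent budgets that add up as described above. The helper name
wired_headroom and the GIB macro are hypothetical and not taken from FreeBSD:

```c
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/*
 * Hypothetical helper: worst-case headroom if the ARC limit and
 * vm.max_wired are independent budgets that simply add up.
 * Returns the bytes left over, or a negative value when the two
 * limits together exceed physical RAM.
 */
static int64_t
wired_headroom(uint64_t physmem, uint64_t arc_max, uint64_t max_wired)
{
	return ((int64_t)physmem - (int64_t)(arc_max + max_wired));
}
```

With 8G of ram, arc_max at 7G (ram minus 1G) and max_wired at 5G, the
helper returns -4G: the two limits together allow 4G more wiring than the
machine has.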

Close to three weeks ago I applied review D7538 to my everyday desktop
running stable/11. Until a few days ago I had no swap usage which is now
at 9M. In the last few years of monitoring wired usage to try and find a
solution I have not seen less than 1G of swap usage after an hour of
uptime. If nothing else D7538 makes arc more willing to be released.


-- 
FreeBSD - the place to B...Storing Data

Shane Ambler

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: nfsd kernel threads won't die via SIGKILL

2018-06-26 Thread Rick Macklem
Konstantin Belousov wrote:
On Mon, Jun 25, 2018 at 02:04:32AM +, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Sat, Jun 23, 2018 at 09:03:02PM +, Rick Macklem wrote:
> >> During testing of the pNFS server I have been frequently
> >> killing/restarting the nfsd.
> >> Once in a while, the "slave" nfsd process doesn't terminate and a
> >> "ps axHl" shows:
> >>   0 48889 1   0   20  0  5884  812 svcexit  D -   0:00.01 nfsd: server
> >>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
> >> ... more of the same
> >>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
> >>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   1:51.78 nfsd: server
> >>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   2:27.75 nfsd: server
> >>
> >> You can see that the top thread (the one that was created with the
> >> process) is stuck in "D"  on "svcexit".
> >> The rest of the threads are still servicing NFS RPCs.
[lots of stuff snipped]
>Signals are put onto a signal queue between a time where the signal is
>generated until the thread actually consumes it.  I.e. the signal queue
>is a container for the signals which are not yet acted upon.  There is
>one signal queue per process, and one signal queue for each thread
>belonging to the process.  When you signal the process, the signal is
>put into some thread's signal queue, where the only criteria for the
>selection of the thread is that the signal is not blocked.  Since
>SIGKILL is never blocked, it is put anywhere.
>
>Until signal is delivered by cursig()/postsig() loop, typically at the
>AST handler, the only consequence of its presence are the EINTR/ERESTART
>errors returned from the PCATCH-enabled sleeps.
Ok, now I think I understand how this works. Thanks a lot for the explanation.

> >Your description at the start of the message of the behaviour after
> >SIGKILL, where other threads continued to serve RPCs, exactly matches
> >above explanation. You need to add some global 'stop' flag, if it is not
I looked at the code and there is already basically a "global stop flag".
It's done by setting the sg_state variable to CLOSING for all thread groups
in a function called svc_exit().  (I missed this when I looked before, so I
didn't understand how all the threads normally terminate.)

So, when I looked at svc_run_internal(), there is a loop while (state != closing)
that calls cv_wait_sig()/cv_timedwait_sig() and when these return EINTR/ERESTART
the call to svc_exit() is done to make the threads all return from the function.
--> The only way it can get into the broken situation I see sometimes is if the
  top thread (called "ismaster" by the code) somehow returns from
  svc_run_internal() without calling svc_exit(), so that the state isn't set to
  "closing".

  Turns out there is only one place this can happen. It's this line:
      if (grp->sg_threadcount > grp->sg_maxthreads)
              break;
  I wouldn't have thought that sg_threadcount would have become greater than
  sg_maxthreads, but when I looked at the output of "ps" that I pasted into
  the first message, there are 33 threads. (When I started the nfsd, I specified
  32 threads, so I think it did the "break;" at this place to get out of the loop
  and return from svc_run_internal() without calling svc_exit().)

  I think changing the above line to:
      if (!ismaster && grp->sg_threadcount > grp->sg_maxthreads)
  will fix it.

  I'll test this and see if I can get it to fail.
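
The failure mode and the proposed change can be modelled outside the kernel.
The sketch below is a simplified stand-in, not the real svc_run_internal():
the struct, the function names, the "patched" switch and the order of the
checks are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Simplified stand-ins for the thread-group state described above. */
enum grp_state { GRP_ACTIVE, GRP_CLOSING };

struct grp {
	enum grp_state state;
	int threadcount;
	int maxthreads;
};

/* Models svc_exit(): mark the group closing so every thread leaves its loop. */
static void
model_svc_exit(struct grp *g)
{
	g->state = GRP_CLOSING;
}

/*
 * One pass of a simplified service loop for a single thread.
 * 'interrupted' stands for cv_wait_sig() returning EINTR/ERESTART.
 * 'patched' selects the proposed "!ismaster &&" form of the test.
 * Returns true if the thread should leave the loop.
 */
static bool
model_loop_step(struct grp *g, bool ismaster, bool interrupted, bool patched)
{
	if (g->state == GRP_CLOSING)
		return (true);
	/* Surplus-thread check: the unpatched test ignores ismaster. */
	if ((!patched || !ismaster) && g->threadcount > g->maxthreads)
		return (true);	/* "break" without calling model_svc_exit() */
	if (interrupted) {
		model_svc_exit(g);
		return (true);
	}
	return (false);
}
```

With threadcount at 33 and maxthreads at 32, the unpatched step lets the
master thread return while the state stays active, so the other threads keep
running. With the patched test the master reaches model_svc_exit() and the
remaining threads see the closing state on their next pass.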

Thanks again for your help, rick



Re: make installworld fails on install: process-control: No such file or directory

2018-06-26 Thread Alan Somers
Still works for me with a clean build.  Is it still failing for you?

On Tue, Jun 26, 2018 at 3:08 PM, Alan Somers  wrote:

> It works for me (though I didn't do a clean build).  Can you please svn up
> and try again?  I'll do a clean build in the meantime.
>
>
> On Tue, Jun 26, 2018 at 2:54 PM, Lars Schotte  wrote:
>
>> make installworld fails on install: process-control: No such file or
>> directory somewhere around Revision: 335679 of ^/head.
>>
>> --
>>  Lars Schotte
>>  Mudroňova 13
>> 92101 Piešťany


Re: numa involved in instability and swap usage despite RAM free?

2018-06-26 Thread Steve Kargl
On Tue, Jun 26, 2018 at 02:39:27PM -0700, Adrian Chadd wrote:
> On Mon, 25 Jun 2018 at 11:23, Steve Kargl
>  wrote:
> >
> > On Sun, Jun 24, 2018 at 12:03:29PM +0200, Alexander Leidinger wrote:
> > >
> > > I don't have hard evidence, but there is enough "smell" to open up a
> > > discussion...
> > >
> > > Short:
> > > Can it be that enabling numa in the kernel is the reason why some
> > > people see instability with zfs and usage of swap while a lot of free
> > > RAM is available?
> >
> > Interesting observation.  I do have NUMA in my kernel, and swap
> > seems to be used instead of recycling freeing inactive memory.
> > Top shows
> >
> > Mem: 506M Active, 27G Inact, 98M Laundry, 2735M Wired, 1474M Buf, 1536M Free
> > Swap: 16G Total, 120M Used, 16G Free
> >
> > Perhaps, I don't understand what is meant by inactive memory.  I
> > thought that this means memory is still available in the buffer
> > cache, but nothing is current using what is there.
> >
> 
> Aren't there now per-domain VM counters you can query via sysctl?
> Maybe they'd help in diagnosing what's going on.
> 

I upgraded to r335642 yesterday.  I haven't seen the swapping
problem yet, although I've tried to force it.  There are 158
sysctl knobs that contain the string "vm".  Do you have a pointer
to any particular one to monitor?

-- 
Steve


Re: numa involved in instability and swap usage despite RAM free?

2018-06-26 Thread Adrian Chadd
Hi,

Aren't there now per-domain VM counters you can query via sysctl?
Maybe they'd help in diagnosing what's going on.



-adrian

On Mon, 25 Jun 2018 at 11:23, Steve Kargl
 wrote:
>
> On Sun, Jun 24, 2018 at 12:03:29PM +0200, Alexander Leidinger wrote:
> >
> > I don't have hard evidence, but there is enough "smell" to open up a
> > discussion...
> >
> > Short:
> > Can it be that enabling numa in the kernel is the reason why some
> > people see instability with zfs and usage of swap while a lot of free
> > RAM is available?
>
> Interesting observation.  I do have NUMA in my kernel, and swap
> seems to be used instead of recycling freeing inactive memory.
> Top shows
>
> Mem: 506M Active, 27G Inact, 98M Laundry, 2735M Wired, 1474M Buf, 1536M Free
> Swap: 16G Total, 120M Used, 16G Free
>
> Perhaps, I don't understand what is meant by inactive memory.  I
> thought that this means memory is still available in the buffer
> cache, but nothing is current using what is there.
>
> --
> Steve


Re: make installworld fails on install: process-control: No such file or directory

2018-06-26 Thread Alan Somers
It works for me (though I didn't do a clean build).  Can you please svn up
and try again?  I'll do a clean build in the meantime.

On Tue, Jun 26, 2018 at 2:54 PM, Lars Schotte  wrote:

> make installworld fails on install: process-control: No such file or
> directory somewhere around Revision: 335679 of ^/head.
>
> --
>  Lars Schotte
>  Mudroňova 13
> 92101 Piešťany


make installworld fails on install: process-control: No such file or directory

2018-06-26 Thread Lars Schotte
make installworld fails on install: process-control: No such file or
directory somewhere around Revision: 335679 of ^/head.

-- 
 Lars Schotte
 Mudroňova 13
92101 Piešťany


Re: svn commit: r335399: . . . head/sys/security/mac_veriexec/ . . . breaks ci.freebsd.org builds of FreeBSD-head-{armv6,armv7,i386,mips,powerpc,powerpcspe}-build

2018-06-26 Thread Lars Schotte
Yes, add amd64 to it. I get 
install: mac_veriexec.ko: No such file or directory
when I do installkernel.

On Tue, 19 Jun 2018 19:33:17 -0700
Mark Millard  wrote:

> Stephen J. Kiernan stevek at FreeBSD.org 
> Wed Jun 20 00:41:33 UTC 2018
> 
> 
> Author: stevek
> Date: Wed Jun 20 00:41:30 2018
> New Revision: 335399
> URL: 
> https://svnweb.freebsd.org/changeset/base/335399
> 
> 
> Log:
>   MAC/veriexec implements a verified execution environment using the
> MAC framework.
> 
> . . .
> 
> 
> But the logs on ci.freebsd.org show for -r335399 and later for
> FreeBSD-head-{armv6,armv7,i386,mips,powerpc,powerpcspe}-build
> messages like:
> 
> 
> --- all_subdir_mac_veriexec ---
> cc1: warnings being treated as errors
> /usr/src/sys/security/mac_veriexec/veriexec_fingerprint.c: In function 'identify_error':
> /usr/src/sys/security/mac_veriexec/veriexec_fingerprint.c:115: warning: format '%lu' expects type 'long unsigned int', but argument 5 has type 'dev_t' [-Wformat]
> /usr/src/sys/security/mac_veriexec/veriexec_fingerprint.c:115: warning: format '%lu' expects type 'long unsigned int', but argument 6 has type 'ino_t' [-Wformat]
> . . .
> --- all_subdir_mac_veriexec ---
> *** [veriexec_fingerprint.o] Error code 1
> 
> And, as a result, those builds fail on ci.freebsd.org .
> 
> Basically the 32-bit architectures fail and the 64-bit
> ones do not (for the same code).
> 
> 
> I've not checked the later *veriex* related check-ins:
> 
> -r335400
> -r335401
> -r335402
> 
> for possible similar problems.
> 
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
> 
> ___
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to
> "freebsd-current-unsubscr...@freebsd.org"
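
For reference, the usual way to print dev_t and ino_t without assuming their
width is to cast to uintmax_t and use %ju. The following is a userland sketch
of that idiom only, not the change that went into veriexec_fingerprint.c; the
function name format_dev_ino is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format a device/inode pair without assuming the width of dev_t or
 * ino_t: cast both to uintmax_t and print them with %ju.
 * Returns the snprintf() result.
 */
static int
format_dev_ino(char *buf, size_t len, dev_t dev, ino_t ino)
{
	return (snprintf(buf, len, "dev %ju ino %ju",
	    (uintmax_t)dev, (uintmax_t)ino));
}
```

Because the cast fixes the argument type, the same format string is correct
on both 32-bit and 64-bit targets, whereas %lu only matches when the type
happens to be as wide as unsigned long.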



-- 
 Lars Schotte
 Mudroňova 13
92101 Piešťany


Re: error building clang in HEAD

2018-06-26 Thread Bryan Drewery
On 6/26/2018 12:40 PM, Kevin Oberman wrote:
> On Tue, Jun 26, 2018 at 11:00 AM, Gary Jennejohn wrote:
> 
> On Tue, 26 Jun 2018 18:24:12 +0200
> Gary Jennejohn wrote:
> 
> > On Mon, 25 Jun 2018 11:28:18 -0700
> > Bryan Drewery  wrote:
> >
> > > On 6/24/2018 12:57 AM, Gary Jennejohn wrote: 
> > > > On Sat, 23 Jun 2018 17:05:16 +0200
> > > > Dimitry Andric  wrote:
> > > >     
> > > >> On 23 Jun 2018, at 15:40, Gary Jennejohn wrote:
> > > >>>
> > > >>> There is a strange error building clang with this use case:
> > > >>>
> > > >>> cd /usr/src
> > > >>> make -j10 makeworld     
> > > >>
> > > >> What's the "makeworld" target?  I've not heard of this.
> > > >>   
> > > >
> > > > A typo.  I meant buildowrld.
> > > >     
> > > >>> which produces this error output:
> > > >>>       
> > > >>> ===> lib/clang/libclang (all)     
> > > >>> error: unable to rename temporary 'Sema/SemaTemplate-12ad7e30.o.tmp' to output file 'Sema/SemaTemplate.o': 'No such file or directory'
> > > >>> 1 error generated.
> > > >>> --- Sema/SemaTemplate.o ---
> > > >>> *** [Sema/SemaTemplate.o] Error code 1     
> > > >>
> > > >> This typically happens if "make obj" was not run before the rest of the
> > > >> make targets.  Normally, the order is: make obj, then make depend, then
> > > >> make (a.k.a. make all).
> > > >>
> > > >> Is there a directory /usr/obj/usr/src/lib/libclang/Sema ?
> > > >>   
> > > >
> > > > Well, I would hope/expect that make buildworld does make obj.
> > > >
> > > > Yes, the directory was there.
> > > >     
> > >
> > > Actually neither 'make obj' nor 'make depend' is done or needed anymore
> > > in buildworld.
> > >
> > > The directory above is incorrect, please check for
> > >
> > >     /usr/obj/usr/src/amd64.amd64/lib/clang/libclang/Sema
> > >   
> >
> > Well, now everything is there because I ran a buildworld without -j.
> >
> > > Do you have another Makefile or script that is executing
> > > buildworld for you?
> > >   
> >
> > No, I use a bash alias named mw:
> > mw is aliased to `pushd /usr/src;time make -s -j$NCPU buildworld;popd'
> >
> > NCPU is defined as 10.
> >
> > > What's in your src.conf and make.conf?
> > >   
> >
> > The only changes I made recently were to /etc/src.conf when I added:
> >
> > WITHOUT_LLVM_TARGET_AARCH64=yes
> > WITHOUT_LLVM_TARGET_ARM=ys
> > WITHOUT_LLVM_TARGET_MIPS=yes
> > WITHOUT_LLVM_TARGET_POWERPC=yes
> > WITHOUT_LLVM_TARGET_SPARC=yes
> > WITH_LLVM_TARGET_X86=yes
> >
> > Otherwise, I haven't touched src.conf or make.conf in  a long time.
> >
> 
> I removed some old cruft from src.conf and now make -j10 buildworld is
> succeeding, even after rm -rf /usr/obj/usr.
> 
> Thanks for pointing me in the right direction.
> 
> -- 
> Gary Jennejohn
> 
> 
> I'd like to hear what triggered this as removing unneeded LLVM targets
> seems like a good idea if you know that you won't need them. Building

I don't think the options are related to the build error.

> LLVM takes a long time on my 7+ year old, memory constrained (8G)
> system. Anything that reduces that time would be nice.

By the way, before these options get out of hand...

I am adding a new WITHOUT_LLVM_TARGET_ALL option to more easily disable
unneeded targets which will be simpler for user maintenance.

And I am going to make buildworld automatically disable unneeded targets
for the bootstrap compiler. For the installed compiler it will still
default to all targets. If targets are disabled then SYSTEM_COMPILER
logic will fail when cross-building and you will need to build another
bootstrap compiler. Something to keep in mind.


> --
> Kevin Oberman, Part time kid herder and retired Network Engineer
> E-mail: rkober...@gmail.com 
> PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
> 


-- 
Regards,
Bryan Drewery





Re: error building clang in HEAD

2018-06-26 Thread Kevin Oberman
On Tue, Jun 26, 2018 at 11:00 AM, Gary Jennejohn 
wrote:

> On Tue, 26 Jun 2018 18:24:12 +0200
> Gary Jennejohn  wrote:
>
> > On Mon, 25 Jun 2018 11:28:18 -0700
> > Bryan Drewery  wrote:
> >
> > > On 6/24/2018 12:57 AM, Gary Jennejohn wrote:
> > > > On Sat, 23 Jun 2018 17:05:16 +0200
> > > > Dimitry Andric  wrote:
> > > >
> > > >> On 23 Jun 2018, at 15:40, Gary Jennejohn wrote:
> > > >>>
> > > >>> There is a strange error building clang with this use case:
> > > >>>
> > > >>> cd /usr/src
> > > >>> make -j10 makeworld
> > > >>
> > > >> What's the "makeworld" target?  I've not heard of this.
> > > >>
> > > >
> > > > A typo.  I meant buildowrld.
> > > >
> > > >>> which produces this error output:
> > > >>>
> > > >>> ===> lib/clang/libclang (all)
> > > >>> error: unable to rename temporary 'Sema/SemaTemplate-12ad7e30.o.tmp' to output file 'Sema/SemaTemplate.o': 'No such file or directory'
> > > >>> 1 error generated.
> > > >>> --- Sema/SemaTemplate.o ---
> > > >>> *** [Sema/SemaTemplate.o] Error code 1
> > > >>
> > > >> This typically happens if "make obj" was not run before the rest of the
> > > >> make targets.  Normally, the order is: make obj, then make depend, then
> > > >> make (a.k.a. make all).
> > > >>
> > > >> Is there a directory /usr/obj/usr/src/lib/libclang/Sema ?
> > > >>
> > > >
> > > > Well, I would hope/expect that make buildworld does make obj.
> > > >
> > > > Yes, the directory was there.
> > > >
> > >
> > > Actually neither 'make obj' nor 'make depend' is done or needed anymore
> > > in buildworld.
> > >
> > > The directory above is incorrect, please check for
> > >
> > > /usr/obj/usr/src/amd64.amd64/lib/clang/libclang/Sema
> > >
> >
> > Well, now everything is there because I ran a buildworld without -j.
> >
> > > Do you have another Makefile or script that is executing
> > > buildworld for you?
> > >
> >
> > No, I use a bash alias named mw:
> > mw is aliased to `pushd /usr/src;time make -s -j$NCPU buildworld;popd'
> >
> > NCPU is defined as 10.
> >
> > > What's in your src.conf and make.conf?
> > >
> >
> > The only changes I made recently were to /etc/src.conf when I added:
> >
> > WITHOUT_LLVM_TARGET_AARCH64=yes
> > WITHOUT_LLVM_TARGET_ARM=ys
> > WITHOUT_LLVM_TARGET_MIPS=yes
> > WITHOUT_LLVM_TARGET_POWERPC=yes
> > WITHOUT_LLVM_TARGET_SPARC=yes
> > WITH_LLVM_TARGET_X86=yes
> >
> > Otherwise, I haven't touched src.conf or make.conf in  a long time.
> >
>
> I removed some old cruft from src.conf and now make -j10 buildworld is
> succeeding, even after rm -rf /usr/obj/usr.
>
> Thanks for pointing me in the right direction.
>
> --
> Gary Jennejohn
>

I'd like to hear what triggered this as removing unneeded LLVM targets
seems like a good idea if you know that you won't need them. Building LLVM
takes a long time on my 7+ year old, memory constrained (8G) system.
Anything that reduces that time would be nice.
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683


Re: svn commit: r335672 - head/sys/modules [ broke ci.freebsd.org FreeBSD-head-{powerpcspe,mips,mips64,powerpc,armv6,armv7}-build ]

2018-06-26 Thread Mark Millard
> Author: emaste
> Date: Tue Jun 26 16:50:41 2018
> New Revision: 335672
> URL: 
> https://svnweb.freebsd.org/changeset/base/335672
> 
> 
> Log:
>   Build linprocfs and linsysfs also on arm64
>   
>   Sponsored by:   Turing Robotic Industries
> 
> . . .


https://ci.freebsd.org/job/FreeBSD-head-powerpcspe-build/6487/consoleText
(a gcc 4.2.1 32-bit target example):

--- all_subdir_linprocfs ---
/usr/src/sys/compat/linprocfs/linprocfs.c: In function 'linprocfs_doprocstat':
/usr/src/sys/compat/linprocfs/linprocfs.c:747: warning: format '%ld' expects type 'long int', but argument 3 has type 'time_t' [-Wformat]
/usr/src/sys/compat/linprocfs/linprocfs.c:748: warning: format '%ld' expects type 'long int', but argument 3 has type 'time_t' [-Wformat]
/usr/src/sys/compat/linprocfs/linprocfs.c:749: warning: format '%ld' expects type 'long int', but argument 3 has type 'time_t' [-Wformat]
/usr/src/sys/compat/linprocfs/linprocfs.c:750: warning: format '%ld' expects type 'long int', but argument 3 has type 'time_t' [-Wformat]
/usr/src/sys/compat/linprocfs/linprocfs.c:755: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'time_t' [-Wformat]
--- all_subdir_libiconv ---
ctfmerge -L VERSION -g -o libiconv.kld iconv.o iconv_ucs.o iconv_xlat.o iconv_xlat16.o iconv_converter_if.o
--- all_subdir_linprocfs ---
*** [linprocfs.o] Error code 1


https://ci.freebsd.org/job/FreeBSD-head-armv7-build/444/consoleText
(32-bit clang example):

--- all_subdir_linprocfs ---
/usr/src/sys/compat/linprocfs/linprocfs.c:747:26: error: format specifies type 'long' but the argument has type 'long long' [-Werror,-Wformat]
PS_ADD("utime", "%ld",  TV2J(_rusage.ru_utime));
^
 %lld
/usr/src/sys/compat/linprocfs/linprocfs.c:122:17: note: expanded from macro 'TV2J'
#define TV2J(x) ((x)->tv_sec * 100UL + (x)->tv_usec / 1)
^
/usr/src/sys/compat/linprocfs/linprocfs.c:723:57: note: expanded from macro 'PS_ADD'
#define PS_ADD(name, fmt, arg) sbuf_printf(sb, " " fmt, arg)
^~~
/usr/src/sys/compat/linprocfs/linprocfs.c:748:26: error: format specifies type 'long' but the argument has type 'long long' [-Werror,-Wformat]
PS_ADD("stime", "%ld",  TV2J(_rusage.ru_stime));
^
 %lld
/usr/src/sys/compat/linprocfs/linprocfs.c:122:17: note: expanded from macro 'TV2J'
#define TV2J(x) ((x)->tv_sec * 100UL + (x)->tv_usec / 1)
^
/usr/src/sys/compat/linprocfs/linprocfs.c:723:57: note: expanded from macro 'PS_ADD'
#define PS_ADD(name, fmt, arg) sbuf_printf(sb, " " fmt, arg)
^~~
/usr/src/sys/compat/linprocfs/linprocfs.c:749:26: error: format specifies type 'long' but the argument has type 'long long' [-Werror,-Wformat]
PS_ADD("cutime","%ld",  TV2J(_rusage_ch.ru_utime));
^~~~
 %lld
/usr/src/sys/compat/linprocfs/linprocfs.c:122:17: note: expanded from macro 'TV2J'
#define TV2J(x) ((x)->tv_sec * 100UL + (x)->tv_usec / 1)
^
/usr/src/sys/compat/linprocfs/linprocfs.c:723:57: note: expanded from macro 'PS_ADD'
#define PS_ADD(name, fmt, arg) sbuf_printf(sb, " " fmt, arg)
^~~
/usr/src/sys/compat/linprocfs/linprocfs.c:750:26: error: format specifies type 'long' but the argument has type 'long long' [-Werror,-Wformat]
PS_ADD("cstime","%ld",  TV2J(_rusage_ch.ru_stime));
^~~~
 %lld
/usr/src/sys/compat/linprocfs/linprocfs.c:122:17: note: expanded from macro 'TV2J'
#define TV2J(x) ((x)->tv_sec * 100UL + (x)->tv_usec / 1)
^
/usr/src/sys/compat/linprocfs/linprocfs.c:723:57: note: expanded from macro 'PS_ADD'
#define PS_ADD(name, fmt, arg) sbuf_printf(sb, " " fmt, arg)
^~~
/usr/src/sys/compat/linprocfs/linprocfs.c:755:29: error: format specifies type 'unsigned long' but the argument has type 'long long' [-Werror,-Wformat]
PS_ADD("starttime", "%lu",  TV2J(_start) - TV2J());
^
 %lld
/usr/src/sys/compat/linprocfs/linprocfs.c:122:17: note: expanded from macro 'TV2J'
#define TV2J(x) ((x)->tv_sec * 100UL + (x)->tv_usec / 1)
^
/usr/src/sys/compat/linprocfs/linprocfs.c:723:57: note: expanded from macro 'PS_ADD'
#define PS_ADD(name, fmt, arg) sbuf_printf(sb, " " fmt, arg)
^~~
. . .
--- all_subdir_linprocfs 
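
Both compilers are objecting to the same thing: on these 32-bit targets the
time_t-based value is wider than long, so %ld and %lu do not match the
argument. One width-independent approach, sketched here in userland C, is to
do the arithmetic in intmax_t and print with %jd. This is an illustration
rather than the committed fix: tv_to_ticks and format_ticks are hypothetical
names, and the 10000 divisor assumes ticks of 1/100 second.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/*
 * Convert a timeval to 1/100-second ticks using intmax_t arithmetic,
 * so the result type does not depend on the width of time_t.
 * (Hypothetical stand-in for the TV2J macro shown in the log.)
 */
static intmax_t
tv_to_ticks(const struct timeval *tv)
{
	return ((intmax_t)tv->tv_sec * 100 + tv->tv_usec / 10000);
}

/* Print the tick count with %jd, which always matches intmax_t. */
static int
format_ticks(char *buf, size_t len, const struct timeval *tv)
{
	return (snprintf(buf, len, " %jd", tv_to_ticks(tv)));
}
```

For a timeval of 3 seconds and 250000 microseconds this yields 325 ticks,
and the format string needs no per-architecture variants.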

Re: error building clang in HEAD

2018-06-26 Thread Gary Jennejohn
On Tue, 26 Jun 2018 18:24:12 +0200
Gary Jennejohn  wrote:

> On Mon, 25 Jun 2018 11:28:18 -0700
> Bryan Drewery  wrote:
> 
> > On 6/24/2018 12:57 AM, Gary Jennejohn wrote:  
> > > On Sat, 23 Jun 2018 17:05:16 +0200
> > > Dimitry Andric  wrote:
> > > 
> > >> On 23 Jun 2018, at 15:40, Gary Jennejohn  wrote:   
> > >>  
> > >>>
> > >>> There is a strange error building clang with this use case:
> > >>>
> > >>> cd /usr/src
> > >>> make -j10 makeworld  
> > >>
> > >> What's the "makeworld" target?  I've not heard of this.
> > >>
> > > 
> > > A typo.  I meant buildowrld.
> > > 
> > >>> which produces this error output:
> > >>>   
> > >>> ===> lib/clang/libclang (all)  
> > >>> error: unable to rename temporary 'Sema/SemaTemplate-12ad7e30.o.tmp' to 
> > >>> output file 'Sema/SemaTemplate.o': 'No such file or directory'
> > >>> 1 error generated.
> > >>> --- Sema/SemaTemplate.o ---
> > >>> *** [Sema/SemaTemplate.o] Error code 1  
> > >>
> > >> This typically happens if "make obj" was not run before the rest of the
> > >> make targets.  Normally, the order is: make obj, then make depend, then
> > >> make (a.k.a. make all).
> > >>
> > >> Is there a directory /usr/obj/usr/src/lib/libclang/Sema ?
> > >>
> > > 
> > > Well, I would hope/expect that make buildworld does make obj.
> > > 
> > > Yes, the directory was there.
> > > 
> > 
> > Actually neither 'make obj' nor 'make depend' is done or needed anymore
> > in buildworld.
> > 
> > The directory above is incorrect, please check for
> > 
> > /usr/obj/usr/src/amd64.amd64/lib/clang/libclang/Sema
> >   
> 
> Well, now everything is there because I ran a buildworld without -j.
> 
> > Do you have another Makefile or script that is executing
> > buildworld for you?
> >   
> 
> No, I use a bash alias named mw:
> mw is aliased to `pushd /usr/src;time make -s -j$NCPU buildworld;popd'
> 
> NCPU is defined as 10.
> 
> > What's in your src.conf and make.conf?
> >   
> 
> The only changes I made recently were to /etc/src.conf when I added:
> 
> WITHOUT_LLVM_TARGET_AARCH64=yes
> WITHOUT_LLVM_TARGET_ARM=ys
> WITHOUT_LLVM_TARGET_MIPS=yes
> WITHOUT_LLVM_TARGET_POWERPC=yes
> WITHOUT_LLVM_TARGET_SPARC=yes
> WITH_LLVM_TARGET_X86=yes
> 
> Otherwise, I haven't touched src.conf or make.conf in  a long time.
> 

I removed some old cruft from src.conf and now make -j10 buildworld is
succeeding, even after rm -rf /usr/obj/usr.

Thanks for pointing me in the right direction.

-- 
Gary Jennejohn


Re: Ryzen public erratas

2018-06-26 Thread Gary Jennejohn
On Tue, 26 Jun 2018 07:05:22 -0700
Eitan Adler  wrote:

> On 19 June 2018 at 02:50, Gary Jennejohn  wrote:
> > On Mon, 18 Jun 2018 22:44:13 -0700
> > Eitan Adler  wrote:
> >  
> >> On 13 June 2018 at 04:16, Eitan Adler  wrote:  
> >> > On 13 June 2018 at 03:35, Konstantin Belousov  
> >> > wrote:  
> >> >> Today I noted that AMD published the public errata document for Ryzens,
> >> >> https://developer.amd.com/wp-content/resources/55449_1.12.pdf
> >> >>
> >> >> Some of the issues listed there looks quite relevant to the potential
> >> >> hangs that some people still experience with the machines.  I wrote
> >> >> a script which should apply the recommended workarounds to the erratas
> >> >> that I find interesting.
> >> >>
> >> >> To run it, kldload cpuctl, then apply the latest firmware update to your
> >> >> CPU, then run the following shell script.  Comments indicate the errata
> >> >> number for the workarounds.
> >> >>
> >> >> Please report the results.  If the script helps, I will code the kernel
> >> >> change to apply the workarounds.
> >> >>
> >> >> #!/bin/sh
> >> >>
> >> >> # Enable workarounds for erratas listed in
> >> >> # https://developer.amd.com/wp-content/resources/55449_1.12.pdf
> >> >>
> >> >> # 1057, 1109
> >> >> sysctl machdep.idle_mwait=0
> >> >> sysctl machdep.idle=hlt  
> >> >
> >> >
> >> > Is this needed if it was previously machdep.idle: acpi ?  
> >>
> >> This might explain why I've never seen the lockup issues mentioned by
> >> other people. What would cause my machine to differ from others?
> >>  
> >
> > I had sysctl machdep.idle_mwait=1 and machdep.idle=acpi before
> > applying the shell script.  I had multiple lockups every week,
> > sometimes multiple lockups per day.  
> 
> This makes me curious about why I didn't experience lockups.  Perhaps my
> BIOS defaulted to something else?
> 
> With these settings:
> 
> machdep.idle: acpi
> machdep.idle_mwait: 1
> 

I can only say that after updating the processor's microcode and
applying the errata script my system runs much more stably.  No
lockups for days.

I suspect that updating the microcode helped quite a bit.

I have a first-generation Ryzen 5 1600 with all the errata.

-- 
Gary Jennejohn


Re: error building clang in HEAD

2018-06-26 Thread Gary Jennejohn
On Mon, 25 Jun 2018 11:28:18 -0700
Bryan Drewery  wrote:

> On 6/24/2018 12:57 AM, Gary Jennejohn wrote:
> > On Sat, 23 Jun 2018 17:05:16 +0200
> > Dimitry Andric  wrote:
> >   
> >> On 23 Jun 2018, at 15:40, Gary Jennejohn  wrote:  
> >>>
> >>> There is a strange error building clang with this use case:
> >>>
> >>> cd /usr/src
> >>> make -j10 makeworld
> >>
> >> What's the "makeworld" target?  I've not heard of this.
> >>  
> > 
> > A typo.  I meant buildowrld.
> >   
> >>> which produces this error output:
> >>> 
> >>> ===> lib/clang/libclang (all)
> >>> error: unable to rename temporary 'Sema/SemaTemplate-12ad7e30.o.tmp' to 
> >>> output file 'Sema/SemaTemplate.o': 'No such file or directory'
> >>> 1 error generated.
> >>> --- Sema/SemaTemplate.o ---
> >>> *** [Sema/SemaTemplate.o] Error code 1
> >>
> >> This typically happens if "make obj" was not run before the rest of the
> >> make targets.  Normally, the order is: make obj, then make depend, then
> >> make (a.k.a. make all).
> >>
> >> Is there a directory /usr/obj/usr/src/lib/libclang/Sema ?
> >>  
> > 
> > Well, I would hope/expect that make buildworld does make obj.
> > 
> > Yes, the directory was there.
> >   
> 
> Actually neither 'make obj' nor 'make depend' is done or needed anymore
> in buildworld.
> 
> The directory above is incorrect, please check for
> 
> /usr/obj/usr/src/amd64.amd64/lib/clang/libclang/Sema
> 

Well, now everything is there because I ran a buildworld without -j.

> Do you have another Makefile or script that is executing
> buildworld for you?
> 

No, I use a bash alias named mw:
mw is aliased to `pushd /usr/src;time make -s -j$NCPU buildworld;popd'

NCPU is defined as 10.

> What's in your src.conf and make.conf?
> 

The only changes I made recently were to /etc/src.conf when I added:

WITHOUT_LLVM_TARGET_AARCH64=yes
WITHOUT_LLVM_TARGET_ARM=yes
WITHOUT_LLVM_TARGET_MIPS=yes
WITHOUT_LLVM_TARGET_POWERPC=yes
WITHOUT_LLVM_TARGET_SPARC=yes
WITH_LLVM_TARGET_X86=yes

Otherwise, I haven't touched src.conf or make.conf in a long time.

I'll try commenting these out and see what happens.
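
A minimal sketch of commenting those knobs out without deleting them,
assuming a sed that accepts -E (FreeBSD and GNU sed both do).  The
function name comment_out_llvm_targets and the scratch file are
hypothetical; the sketch deliberately works on a temporary copy rather
than on the real /etc/src.conf:

```sh
#!/bin/sh
# Sketch: comment out every WITH_/WITHOUT_LLVM_TARGET_* line in a file.
# Hypothetical helper; prints the result to stdout, does not edit in place.

comment_out_llvm_targets() {
    # Prefix matching lines that are not already commented with "#".
    sed -E 's/^(WITH(OUT)?_LLVM_TARGET_[A-Z0-9_]+=.*)$/#\1/' "$1"
}

# Scratch copy standing in for /etc/src.conf (assumption for illustration).
conf=$(mktemp)
cat > "$conf" <<'EOF'
WITHOUT_LLVM_TARGET_AARCH64=yes
WITHOUT_LLVM_TARGET_MIPS=yes
WITH_LLVM_TARGET_X86=yes
WITHOUT_TESTS=yes
EOF

comment_out_llvm_targets "$conf"
rm -f "$conf"
```

Only the LLVM target lines gain a leading "#"; unrelated knobs such as
WITHOUT_TESTS pass through unchanged, so the original settings are easy
to restore once the build has been retested.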

-- 
Gary Jennejohn


Re: Ryzen public erratas

2018-06-26 Thread Eitan Adler
On 19 June 2018 at 02:50, Gary Jennejohn  wrote:
> On Mon, 18 Jun 2018 22:44:13 -0700
> Eitan Adler  wrote:
>
>> On 13 June 2018 at 04:16, Eitan Adler  wrote:
>> > On 13 June 2018 at 03:35, Konstantin Belousov  wrote:
>> >> Today I noted that AMD published the public errata document for Ryzens,
>> >> https://developer.amd.com/wp-content/resources/55449_1.12.pdf
>> >>
>> >> Some of the issues listed there look quite relevant to the potential
>> >> hangs that some people still experience with the machines.  I wrote
>> >> a script which should apply the recommended workarounds to the erratas
>> >> that I find interesting.
>> >>
>> >> To run it, kldload cpuctl, then apply the latest firmware update to your
>> >> CPU, then run the following shell script.  Comments indicate the errata
>> >> number for the workarounds.
>> >>
>> >> Please report the results.  If the script helps, I will code the kernel
>> >> change to apply the workarounds.
>> >>
>> >> #!/bin/sh
>> >>
>> >> # Enable workarounds for erratas listed in
>> >> # https://developer.amd.com/wp-content/resources/55449_1.12.pdf
>> >>
>> >> # 1057, 1109
>> >> sysctl machdep.idle_mwait=0
>> >> sysctl machdep.idle=hlt
>> >
>> >
>> > Is this needed if it was previously machdep.idle: acpi ?
>>
>> This might explain why I've never seen the lockup issues mentioned by
>> other people. What would cause my machine to differ from others?
>>
>
> I had sysctl machdep.idle_mwait=1 and machdep.idle=acpi before
> applying the shell script.  I had multiple lockups every week,
> sometimes multiple lockups per day.

This makes me curious about why I didn't experience lockups.  Perhaps my
BIOS defaulted to something else?

With these settings:

machdep.idle: acpi
machdep.idle_mwait: 1
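
As a hedged sketch, the comparison against the workaround values from
the script quoted above (machdep.idle=hlt, machdep.idle_mwait=0) could
be automated like this.  The function name idle_workaround_applied is
hypothetical, and the input is assumed to be in the "name: value" form
that sysctl prints:

```sh
#!/bin/sh
# Sketch: decide from "name: value" sysctl output whether the idle
# settings already match the erratum 1057/1109 workaround values.

idle_workaround_applied() {
    # Reads lines such as "machdep.idle: acpi" on stdin.
    # Exit status 0 means both workaround values are set.
    awk -F': ' '
        $1 == "machdep.idle"       { idle = $2 }
        $1 == "machdep.idle_mwait" { mwait = $2 }
        END { exit !(idle == "hlt" && mwait == "0") }
    '
}

# The values reported in this message:
if printf 'machdep.idle: acpi\nmachdep.idle_mwait: 1\n' | idle_workaround_applied
then
    echo "workaround values already set"
else
    echo "workaround values not set"
fi
```

On a live system the input would presumably come from
`sysctl machdep.idle machdep.idle_mwait` instead of printf.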

-- 
Eitan Adler


Re: unloading pf causes desktop system to freeze since ~ r335651 [coredump]

2018-06-26 Thread Dave Cottlehuber
On Mon, 25 Jun 2018, at 23:08, Dave Cottlehuber wrote:
> [cross-posting for advice on general debugging + network-specific thoughts]

The HPET NMI watchdog patch was very timely - works a treat: 
https://reviews.freebsd.org/D15630

> However each time there's no crashdump, & the usual ctrl-alt-esc doesn't 
> work either.

I bumped my /usr/src to latest HEAD, applied HPET NMI watchdog hack &
after freezing via `service pf stop`, I was rewarded with a coredump
on next reboot; full log: https://git.io/f4Q4P

[1202] panic: Assertion !in_epoch() && 
!mtx_owned(&(&(*(__typeof(vnet_entry_tcbinfo)*) 
(__curthread())->td_vnet))->vnet_data_base) + 
(uintptr_t)_entry_tcbinfo)))->ipi_lock) failed at 
/usr/src/sys/netinet/tcp_input.c:802
[1202] cpuid = 4
[1202] time = 1529997533
[1202] KDB: stack backtrace:
[1202] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0077ddc4a0
[1202] vpanic() at vpanic+0x1a3/frame 0xfe0077ddc500
[1202] doadump() at doadump/frame 0xfe0077ddc580
[1202] tcp_input() at tcp_input+0x940/frame 0xfe0077ddc6c0
[1202] ip_input() at ip_input+0x3f7/frame 0xfe0077ddc720
[1202] netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfe0077ddc780
[1202] ether_demux() at ether_demux+0x16e/frame 0xfe0077ddc7b0
[1202] ether_nh_input() at ether_nh_input+0x402/frame 0xfe0077ddc820
[1202] netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfe0077ddc880
[1202] ether_input() at ether_input+0x8f/frame 0xfe0077ddc8c0
[1202] iflib_rxeof() at iflib_rxeof+0xcce/frame 0xfe0077ddc9b0
[1202] _task_fn_rx() at _task_fn_rx+0x7f/frame 0xfe0077ddc9f0
[1202] gtaskqueue_run_locked() at gtaskqueue_run_locked+0x139/frame 0xfe0077ddca40
[1202] gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfe0077ddca70
[1202] fork_exit() at fork_exit+0x84/frame 0xfe0077ddcab0
[1202] fork_trampoline() at fork_trampoline+0xe/frame 0xfe0077ddcab0
[1202] --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
[1202] KDB: enter: panic


db:0:kdb.enter.panic>  run lockinfo
db:1:lockinfo> show locks
db:1:lockinfo>  show alllocks
Process 12789 (h2o) thread 0xf8020dd4f580 (101673)
Process 17635 (pflogd) thread 0xf8017c91c580 (101027)
db:1:lockinfo>  show lockedvnods
Locked vnodes
db:0:kdb.enter.panic>  show pcpu
cpuid= 4
dynamic pcpu = 0xfe00848dd8c0
curthread= 0xf801067f2000: pid 0 tid 100029 "if_io_tqg_4"
curpcb   = 0xfe0077ddcb80
fpcurthread  = none
idlethread   = 0xf80106796000: tid 17 "idle: cpu4"
curpmap  = 0x81ffbe08
tssp = 0x82066ac0
commontssp   = 0x82066ac0
rsp0 = 0xfe0077ddcb80
gs32p= 0x8206d6f8
ldt  = 0x8206d738
tss  = 0x8206d728
curvnet  = 0xf801000ca0c0
spin locks held:
db:0:kdb.enter.panic>  bt
Tracing pid 0 tid 100029 td 0xf801067f2000
kdb_enter() at kdb_enter+0x3b/frame 0xfe0077ddc4a0
vpanic() at vpanic+0x1c0/frame 0xfe0077ddc500
doadump() at doadump/frame 0xfe0077ddc580
tcp_input() at tcp_input+0x940/frame 0xfe0077ddc6c0
ip_input() at ip_input+0x3f7/frame 0xfe0077ddc720
netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfe0077ddc780
ether_demux() at ether_demux+0x16e/frame 0xfe0077ddc7b0
ether_nh_input() at ether_nh_input+0x402/frame 0xfe0077ddc820
netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfe0077ddc880
ether_input() at ether_input+0x8f/frame 0xfe0077ddc8c0
iflib_rxeof() at iflib_rxeof+0xcce/frame 0xfe0077ddc9b0
_task_fn_rx() at _task_fn_rx+0x7f/frame 0xfe0077ddc9f0
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x139/frame 0xfe0077ddca40
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfe0077ddca70
fork_exit() at fork_exit+0x84/frame 0xfe0077ddcab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0077ddcab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---


Please keep replies just to freebsd-net@ from here on.

A+
Dave


Re: ZFS: I/O error - blocks larger than 16777216 are not supported

2018-06-26 Thread Toomas Soome



> On 26 Jun 2018, at 05:08, KIRIYAMA Kazuhiko  wrote:
> 
> At Thu, 21 Jun 2018 10:48:28 +0300,
> Toomas Soome wrote:
>> 
>> 
>> 
>>> On 21 Jun 2018, at 09:00, KIRIYAMA Kazuhiko  wrote:
>>> 
>>> At Wed, 20 Jun 2018 23:34:48 -0400,
>>> Allan Jude wrote:
 
>>>> On 2018-06-20 21:36, KIRIYAMA Kazuhiko wrote:
> Hi all,
> 
> I previously reported a ZFS boot failure [1], and found
> that this issue stems from the RAID configuration [2]. So I
> rebuilt with RAID5 and re-installed 12.0-CURRENT
> (r333982). But it failed to boot with:
> 
> ZFS: i/o error - all block copies unavailable
> ZFS: can't read MOS of pool zroot
> gptzfsboot: failed to mount default pool zroot
> 
> FreeBSD/x86 boot
> ZFS: I/O error - blocks larger than 16777216 are not supported
> ZFS: can't find dataset u
> Default: zroot/<0x0>:
> 
> In this case, the reason is "blocks larger than 16777216 are
> not supported" and I guess this means datasets that have
> recordsize greater than 8GB are NOT supported by the
> FreeBSD boot loader (zpool-features(7)). Is that true?
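
For reference, a small sketch converting the byte limit quoted in the
loader message into MiB.  It assumes only that the figure 16777216 is a
byte count; the variable names are illustrative:

```sh
#!/bin/sh
# Sketch: convert the limit from the loader message to MiB.
limit_bytes=16777216
limit_mib=$((limit_bytes / 1024 / 1024))
echo "${limit_bytes} bytes = ${limit_mib} MiB"
```

The value is 2^24 bytes, so the limit named in the message is 16 MiB.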
> 
> My zpool features are as follows:
> 
> # kldload zfs
> # zpool import 
>    pool: zroot
>      id: 13407092850382881815
>   state: ONLINE
>  status: The pool was last accessed by another system.
>  action: The pool can be imported using its name or numeric identifier and
>          the '-f' flag.
>     see: http://illumos.org/msg/ZFS-8000-EY
>  config:
> 
>          zroot      ONLINE
>            mfid0p3  ONLINE
> # zpool import -fR /mnt zroot
> # zpool list
> NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
> zroot  19.9T   129G  19.7T         -     0%     0%  1.00x  ONLINE  /mnt
> # zpool get all zroot
> NAME   PROPERTY                       VALUE                 SOURCE
> zroot  size                           19.9T                 -
> zroot  capacity                       0%                    -
> zroot  altroot                        /mnt                  local
> zroot  health                         ONLINE                -
> zroot  guid                           13407092850382881815  default
> zroot  version                        -                     default
> zroot  bootfs                         zroot/ROOT/default    local
> zroot  delegation                     on                    default
> zroot  autoreplace                    off                   default
> zroot  cachefile                      none                  local
> zroot  failmode                       wait                  default
> zroot  listsnapshots                  off                   default
> zroot  autoexpand                     off                   default
> zroot  dedupditto                     0                     default
> zroot  dedupratio                     1.00x                 -
> zroot  free                           19.7T                 -
> zroot  allocated                      129G                  -
> zroot  readonly                       off                   -
> zroot  comment                        -                     default
> zroot  expandsize                     -                     -
> zroot  freeing                        0                     default
> zroot  fragmentation                  0%                    -
> zroot  leaked                         0                     default
> zroot  feature@async_destroy          enabled               local
> zroot  feature@empty_bpobj            active                local
> zroot  feature@lz4_compress           active                local
> zroot  feature@multi_vdev_crash_dump  enabled               local
> zroot  feature@spacemap_histogramactive   
>