Re: Nvidia issue with CURRENT

2018-04-23 Thread Mateusz Piotrowski
On Mon, 23 Apr 2018 08:19:25 -0700
Kevin Oberman  wrote:

>On Mon, Apr 23, 2018 at 1:53 AM, Mateusz Piotrowski <0...@freebsd.org>
>wrote:
>
>> On Mon, 23 Apr 2018 09:00:33 +0200
>> "O. Hartmann"  wrote:
>>  
>> >In /etc/src.conf , therefore you should add something similar to
>> >(like I added to mine):
>> >
>> >PORTS_MODULES=
>> >PORTS_MODULES+= x11/nvidia-driver
>> >PORTS_MODULES+= emulators/virtualbox-ose-kmod
>>
>> Shouldn't it go into make.conf(5)? That's what the manual suggests.
>>  
>
>And the man page should be fixed. By putting it into make.conf, it
>pollutes the environment of every make(1) run on the system. (Yes,
>unlikely to ever cause a problem.)
>
>In src.conf it only impacts the builds of the system and ports. This
>is why src.conf was created a few years ago. In fact, it was
>originally supposed to only impact the build of the system, but the
>ports Mk files were modified to pull it in, too, much to my annoyance.
>I liked being able to modify compile options just for the system
>without them breaking ports builds.
>
>Simple rule... any definition used by make(1) only for system builds
>belongs in /etc/src.conf.

Done, I've added this information to make.conf(5). Now it's waiting for
a review. 

https://reviews.freebsd.org/D15177

Thanks!

Mateusz
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Nvidia issue with CURRENT

2018-04-23 Thread Kevin Oberman
On Mon, Apr 23, 2018 at 1:53 AM, Mateusz Piotrowski <0...@freebsd.org> wrote:

> On Mon, 23 Apr 2018 09:00:33 +0200
> "O. Hartmann"  wrote:
>
> >In /etc/src.conf , therefore you should add something similar to (like
> >I added to mine):
> >
> >PORTS_MODULES=
> >PORTS_MODULES+= x11/nvidia-driver
> >PORTS_MODULES+= emulators/virtualbox-ose-kmod
>
> Shouldn't it go into make.conf(5)? That's what the manual suggests.
>

And the man page should be fixed. By putting it into make.conf, it pollutes
the environment of every make(1) run on the system. (Yes, unlikely to ever
cause a problem.)

In src.conf it only impacts the builds of the system and ports. This is why
src.conf was created a few years ago. In fact, it was originally supposed
to only impact the build of the system, but the ports Mk files were
modified to pull it in, too, much to my annoyance. I liked being able to
modify compile options just for the system without them breaking ports
builds.

Simple rule... any definition used by make(1) only for system builds
belongs in /etc/src.conf.
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683


crash on process exit.. current at about r332467

2018-04-23 Thread Julian Elischer

back trace at:  http://www.freebsd.org/~julian/bob-crash.png

If anyone wants to take a look..

In the exit syscall, while deallocating a vm object.

I haven't seen references to a similar crash in the last 10 days or
so, but if it rings any bells...






Re: Nvidia issue with CURRENT

2018-04-23 Thread Mariusz Zaborski
On Mon, Apr 23, 2018 at 06:17:42PM +1000, Greg 'groggy' Lehey wrote:
> On Monday, 23 April 2018 at  9:55:40 +0200, Mariusz Zaborski wrote:
> > On Mon, Apr 23, 2018 at 05:51:01PM +1000, Greg 'groggy' Lehey wrote:
> >> On Monday, 23 April 2018 at  9:00:33 +0200, O. Hartmann wrote:
> >>> On Sun, 22 Apr 2018 14:38:55 +0200  Mariusz Zaborski wrote:
> >>> In /etc/src.conf , therefore you should add something similar to (like I
> >>> added to mine):
> >>>
> >>> PORTS_MODULES=
> >>> PORTS_MODULES+= x11/nvidia-driver
> >>> PORTS_MODULES+= emulators/virtualbox-ose-kmod
> >>>
> >>> This is one of the great advantages of having an operating system which
> >>> you can compile yourself.
> >>
> >> Yes, but this has nothing to do with the bug.  Clearly Mariusz and I
> >> have the configuration correct, but something has changed in the last
> >> few months.
> >
> > Yea this is a known issue so I rebuild nvidia-driver.
> > I'm just not sure if this is a problem with kernel or with the
> > driver itself.
> 
> Almost by definition, it's a driver issue.  Something in the kernel
> has changed which makes it no longer work.
> 
> >> Mariusz, as I commented, your log wasn't appended to the message I
> >> received.  What is your hardware?
> >
> > https://people.freebsd.org/~oshogbo/Xorg.0.log
> 
> A brief scan doesn't show anything very similar to my issues.  I'll
> look again tomorrow when I have time.
> 
> Did you try the most recent driver?
If you mean 390.48, then yes.
I didn't see anything newer than that.

Thanks,
-- 
Mariusz Zaborski
oshogbo//vx | http://oshogbo.vexillium.org
FreeBSD commiter| https://freebsd.org
Software developer  | http://wheelsystems.com
If it's not broken, let's fix it till it is!!1


Re: Nvidia issue with CURRENT

2018-04-23 Thread Mateusz Piotrowski
On Mon, 23 Apr 2018 09:00:33 +0200
"O. Hartmann"  wrote:

>In /etc/src.conf , therefore you should add something similar to (like
>I added to mine):
>
>PORTS_MODULES=
>PORTS_MODULES+= x11/nvidia-driver
>PORTS_MODULES+= emulators/virtualbox-ose-kmod

Shouldn't it go into make.conf(5)? That's what the manual suggests.



Re: Nvidia issue with CURRENT

2018-04-23 Thread Greg 'groggy' Lehey
On Monday, 23 April 2018 at  9:55:40 +0200, Mariusz Zaborski wrote:
> On Mon, Apr 23, 2018 at 05:51:01PM +1000, Greg 'groggy' Lehey wrote:
>> On Monday, 23 April 2018 at  9:00:33 +0200, O. Hartmann wrote:
>>> On Sun, 22 Apr 2018 14:38:55 +0200  Mariusz Zaborski wrote:
>>> In /etc/src.conf , therefore you should add something similar to (like I
>>> added to mine):
>>>
>>> PORTS_MODULES=
>>> PORTS_MODULES+= x11/nvidia-driver
>>> PORTS_MODULES+= emulators/virtualbox-ose-kmod
>>>
>>> This is one of the great advantages of having an operating system which
>>> you can compile yourself.
>>
>> Yes, but this has nothing to do with the bug.  Clearly Mariusz and I
>> have the configuration correct, but something has changed in the last
>> few months.
>
> Yea this is a known issue so I rebuild nvidia-driver.
> I'm just not sure if this is a problem with kernel or with the
> driver itself.

Almost by definition, it's a driver issue.  Something in the kernel
has changed which makes it no longer work.

>> Mariusz, as I commented, your log wasn't appended to the message I
>> received.  What is your hardware?
>
> https://people.freebsd.org/~oshogbo/Xorg.0.log

A brief scan doesn't show anything very similar to my issues.  I'll
look again tomorrow when I have time.

Did you try the most recent driver?

Greg
--
Sent from my desktop computer.
Finger g...@freebsd.org for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA




Re: Nvidia issue with CURRENT

2018-04-23 Thread Mariusz Zaborski
On Mon, Apr 23, 2018 at 05:51:01PM +1000, Greg 'groggy' Lehey wrote:
> On Monday, 23 April 2018 at  9:00:33 +0200, O. Hartmann wrote:
> > On Sun, 22 Apr 2018 14:38:55 +0200
> > Mariusz Zaborski  wrote:
> >
> >> Hi,
> >>
> >> Normally I build my CURRENT by myself from Xorg - r332861.
> >> But I also tried latest SNAPSHOT.
> >
> > All my boxes running with nVidia hardware running most recent CURRENT
> > (compiled this morning on an almost daily basis) and I'm using the latest
> > official driver available from nVidia, 390.48.
> >
> > It happens to be as a natural byproduct of CURRENT that very often
> > the kernel module of the nVidia driver is out of sync so I made it a
> > habit to recompile the module from sources whenever I
> > recompile/install a kernel.
> 
> As I commented, I've had this on -STABLE as well.
> 
> My guess is that this is GPU dependent.  I'm using an old card:
> 
> [32.251] Current Operating System: FreeBSD teevee.lemis.com 11.1-STABLE FreeBSD 11.1-STABLE #2 r327971: Mon Jan 15 10:55:53 AEDT 2018 g...@teevee.lemis.com:/home/obj/eureka/home/src/FreeBSD/svn/stable/11/sys/GENERIC amd64
> ...
> [32.763] (II) NVIDIA dlloader X Driver  390.25  Wed Jan 24 19:00:20 PST 2018
> ...
> [33.785] (II) NVIDIA(0): NVIDIA GPU GeForce GT 710 (GK208) at PCI:1:0:0 (GPU-0)
> [33.785] (--) NVIDIA(0): Memory: 2097152 kBytes
> [33.785] (--) NVIDIA(0): VideoBIOS: 80.28.b8.00.45
> [33.785] (II) NVIDIA(0): Detected PCI Express Link width: 8X
> 
> > In /etc/src.conf , therefore you should add something similar to (like I
> > added to mine):
> >
> > PORTS_MODULES=
> > PORTS_MODULES+= x11/nvidia-driver
> > PORTS_MODULES+= emulators/virtualbox-ose-kmod
> >
> > This is one of the great advantages of having an operating system which
> > you can compile yourself.
> 
> Yes, but this has nothing to do with the bug.  Clearly Mariusz and I
> have the configuration correct, but something has changed in the last
> few months.
Yea this is a known issue so I rebuild nvidia-driver.
I'm just not sure if this is a problem with kernel or with the driver itself.

> Mariusz, as I commented, your log wasn't appended to the message I
> received.  What is your hardware?
https://people.freebsd.org/~oshogbo/Xorg.0.log

NVIDIA GPU GeForce GTX 1050 Ti

Thanks,
-- 
Mariusz Zaborski
oshogbo//vx | http://oshogbo.vexillium.org
FreeBSD commiter| https://freebsd.org
Software developer  | http://wheelsystems.com
If it's not broken, let's fix it till it is!!1




Re: Nvidia issue with CURRENT

2018-04-23 Thread Greg 'groggy' Lehey
On Monday, 23 April 2018 at  9:00:33 +0200, O. Hartmann wrote:
> On Sun, 22 Apr 2018 14:38:55 +0200
> Mariusz Zaborski  wrote:
>
>> Hi,
>>
>> Normally I build my CURRENT by myself from Xorg - r332861.
>> But I also tried latest SNAPSHOT.
>
> All my boxes running with nVidia hardware running most recent CURRENT
> (compiled this morning on an almost daily basis) and I'm using the latest
> official driver available from nVidia, 390.48.
>
> It happens to be as a natural byproduct of CURRENT that very often
> the kernel module of the nVidia driver is out of sync so I made it a
> habit to recompile the module from sources whenever I
> recompile/install a kernel.

As I commented, I've had this on -STABLE as well.

My guess is that this is GPU dependent.  I'm using an old card:

[32.251] Current Operating System: FreeBSD teevee.lemis.com 11.1-STABLE FreeBSD 11.1-STABLE #2 r327971: Mon Jan 15 10:55:53 AEDT 2018 g...@teevee.lemis.com:/home/obj/eureka/home/src/FreeBSD/svn/stable/11/sys/GENERIC amd64
...
[32.763] (II) NVIDIA dlloader X Driver  390.25  Wed Jan 24 19:00:20 PST 2018
...
[33.785] (II) NVIDIA(0): NVIDIA GPU GeForce GT 710 (GK208) at PCI:1:0:0 (GPU-0)
[33.785] (--) NVIDIA(0): Memory: 2097152 kBytes
[33.785] (--) NVIDIA(0): VideoBIOS: 80.28.b8.00.45
[33.785] (II) NVIDIA(0): Detected PCI Express Link width: 8X

> In /etc/src.conf , therefore you should add something similar to (like I added
> to mine):
>
> PORTS_MODULES=
> PORTS_MODULES+= x11/nvidia-driver
> PORTS_MODULES+= emulators/virtualbox-ose-kmod
>
> This is one of the great advantages of having an operating system which
> you can compile yourself.

Yes, but this has nothing to do with the bug.  Clearly Mariusz and I
have the configuration correct, but something has changed in the last
few months.

Mariusz, as I commented, your log wasn't appended to the message I
received.  What is your hardware?

Greg
--
Sent from my desktop computer.
Finger g...@freebsd.org for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA




Re: SCHED_ULE makes 256Mbyte i386 unusable

2018-04-23 Thread Julian Elischer

On 22/4/18 9:43 pm, Rick Macklem wrote:

Konstantin Belousov wrote:

On Sat, Apr 21, 2018 at 11:30:55PM +, Rick Macklem wrote:

Konstantin Belousov wrote:

On Sat, Apr 21, 2018 at 07:21:58PM +, Rick Macklem wrote:

I decided to start a new thread on current related to SCHED_ULE, since I see
more than just performance degradation and on a recent current kernel.
(I cc'd a couple of the people discussing performance problems in freebsd-stable
  recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic
  scheduler".)

When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
current/head kernel, I would see about a 30% performance degradation (elapsed
run time for a kernel build over NFSv4.1) when the server kernel was built with
options SCHED_ULE
instead of
options SCHED_4BSD

So, now that I have decreased the number of nfsd kernel threads to 32, it works
with both schedulers and with essentially the same performance. (ie. The 30%
performance degradation has disappeared.)


Now, with a kernel from a couple of days ago, the
options SCHED_ULE
kernel becomes unusable shortly after starting testing.
I have seen two variants of this:
- Became essentially hung. All I could do was ping the machine from the network.
- Reported "vm_thread_new: kstack allocation failed"
   and then any attempt to do anything gets "No more processes".

This is strange.  It usually means that you get KVA either exhausted or
severely fragmented.

Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
kernel is working ok now. I haven't done enough to compare performance yet.
Maybe I'll post again when I have some numbers.


Enter ddb, it should be operational since pings are answered.  Try to see
where the threads are stuck.

I didn't do this, since reducing the number of kernel threads seems to have fixed
the problem. For the pNFS server, the nfsd threads will spawn additional kernel
threads to do proxies to the mirrored DS servers.


with the only difference being a kernel built with
options SCHED_4BSD
everything works and performs the same as the Dec 2017 kernel.

I can try rolling back through the revisions, but it would be nice if someone
could suggest where to start, because it takes a couple of hours to build a
kernel on this system.

So, something has made things worse for a head/current kernel this winter, rick

There are at least two potentially relevant changes.

First is r326758 Dec 11 which bumped KSTACK_PAGES on i386 to 4.

I've been running this machine with KSTACK_PAGES=4 for some time, so no change.

W.r.t. Rodney Grimes' comments about this (which didn't end up in the messages
in this thread):
I didn't see any instability when using KSTACK_PAGES=4 for this until this cropped
up and seemed to be scheduler related (but not really, it seems).
I bumped it to KSTACK_PAGES=4 because I needed that for the pNFS Metadata
Server code.

Yes, NFS does use quite a bit of kernel stack. Unfortunately, it isn't one big
item getting allocated on the stack, but many moderate sized ones.
(A part of it is multiple instances of "struct vattr", some buried in
  "struct nfsvattr", that NFS needs to use. I don't think these are large
  enough to justify malloc/free, but it has to use several of them.)

One case I did try fixing was about 6 cases where "struct nfsstate" ended up on
the stack. I changed the code to malloc/free them and then when testing, to
my surprise I had a 20% performance hit and shelved the patch.


You might try using uma, especially setting up a non-freeing zone,
where the system allocates what it needs and then just recycles them.

(man uma)

Now that I know that the server was running near its limit, I might try this one
again, to see if the performance hit doesn't occur when the machine has adequate
memory. If the performance hit goes away, I could commit this, but it wouldn't 
have that much effect on the kstack usage. (It's interesting how this patch
ended up related to the issue this thread discussed.)


Second is r332489 Apr 13, which introduced 4/4G KVA/UVA split.

Could this change have resulted in the system being able to allocate fewer
kernel threads/stacks for some reason?

Well, it could, as anything can be buggy. But the intent of the change
was to give 4G KVA, and it did.

Righto. No concern here. I suspect the Dec. 2017 kernel was close to the limit
(see performance issue that went away, noted above) and any change could
have pushed it across the line, I think.


Consequences of the first one are obvious, it is much harder to find
the place to map the stack.  Second change, on the other hand, provides
almost full 4G for KVA and should have mostly compensated for the negative
effects of the first.

And, I cannot see how changing the scheduler would fix or even affect that
behaviour.

My hunch is that the system was running near its limit for kernel threads/stacks.
Then, somehow, the timing SCHED_ULE caused resulted in the nfsd trying to get

Re: SCHED_ULE makes 256Mbyte i386 unusable

2018-04-23 Thread Julian Elischer

On 22/4/18 10:36 pm, Rodney W. Grimes wrote:

Konstantin Belousov wrote:

On Sat, Apr 21, 2018 at 11:30:55PM +, Rick Macklem wrote:

Konstantin Belousov wrote:

On Sat, Apr 21, 2018 at 07:21:58PM +, Rick Macklem wrote:

I decided to start a new thread on current related to SCHED_ULE, since I see
more than just performance degradation and on a recent current kernel.
(I cc'd a couple of the people discussing performance problems in freebsd-stable
  recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic
  scheduler".)

When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
current/head kernel, I would see about a 30% performance degradation (elapsed
run time for a kernel build over NFSv4.1) when the server kernel was built with
options SCHED_ULE
instead of
options SCHED_4BSD

So, now that I have decreased the number of nfsd kernel threads to 32, it works
with both schedulers and with essentially the same performance. (ie. The 30%
performance degradation has disappeared.)


Now, with a kernel from a couple of days ago, the
options SCHED_ULE
kernel becomes unusable shortly after starting testing.
I have seen two variants of this:
- Became essentially hung. All I could do was ping the machine from the network.
- Reported "vm_thread_new: kstack allocation failed"
   and then any attempt to do anything gets "No more processes".

This is strange.  It usually means that you get KVA either exhausted or
severely fragmented.

Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
kernel is working ok now. I haven't done enough to compare performance yet.
Maybe I'll post again when I have some numbers.


Enter ddb, it should be operational since pings are answered.  Try to see
where the threads are stuck.

I didn't do this, since reducing the number of kernel threads seems to have fixed
the problem. For the pNFS server, the nfsd threads will spawn additional kernel
threads to do proxies to the mirrored DS servers.


with the only difference being a kernel built with
options SCHED_4BSD
everything works and performs the same as the Dec 2017 kernel.

I can try rolling back through the revisions, but it would be nice if someone
could suggest where to start, because it takes a couple of hours to build a
kernel on this system.

So, something has made things worse for a head/current kernel this winter, rick

There are at least two potentially relevant changes.

First is r326758 Dec 11 which bumped KSTACK_PAGES on i386 to 4.

I've been running this machine with KSTACK_PAGES=4 for some time, so no change.

W.r.t. Rodney Grimes' comments about this (which didn't end up in the messages
in this thread):
I didn't see any instability when using KSTACK_PAGES=4 for this until this cropped
up and seemed to be scheduler related (but not really, it seems).
I bumped it to KSTACK_PAGES=4 because I needed that for the pNFS Metadata
Server code.

Yes, NFS does use quite a bit of kernel stack. Unfortunately, it isn't one big
item getting allocated on the stack, but many moderate sized ones.
(A part of it is multiple instances of "struct vattr", some buried in
  "struct nfsvattr", that NFS needs to use. I don't think these are large
  enough to justify malloc/free, but it has to use several of them.)

One case I did try fixing was about 6 cases where "struct nfsstate" ended up on
the stack. I changed the code to malloc/free them and then when testing, to
my surprise I had a 20% performance hit and shelved the patch.
Now that I know that the server was running near its limit, I might try this one
again, to see if the performance hit doesn't occur when the machine has adequate
memory. If the performance hit goes away, I could commit this, but it wouldn't
have that much effect on the kstack usage. (It's interesting how this patch
ended up related to the issue this thread discussed.)

Anything we can do to help relieve KSTACK usage, especially on i386
is helpful.  There is a thread back quite some time where someone
came up with a compile time static "this function uses X bytes of
local stack" and a bit of clean up was done.  We should pursue
this issue further.


that was me.

use
-Wframe-larger-than=
 
and set it to something like 512 bytes




My experience with the i386/KSTACK issues was attempting to do installs
from snapshot .iso's, I usually had to change to a custom kernel without
INVARIANTS and WITNESS, or reduce KSTACK to 2 and suffer the small stack
problem (ie, don't use NFS during install).  Neither was very pleasant.

I have found it impractical to run the 4 page KSTACK in production
VM's using i386 due to memory requirements.  I run many very lean
i386 VM's with 64MB of memory.  I suspect our user base also has
many people doing this, and it would be to our advantage to try
and reduce our kernel stack needs.



Second is r332489 Apr 13, which introduced 4/4G 

Re: SCHED_ULE makes 256Mbyte i386 unusable

2018-04-23 Thread Julian Elischer

On 22/4/18 10:36 pm, Rodney W. Grimes wrote:

Konstantin Belousov wrote:

On Sat, Apr 21, 2018 at 11:30:55PM +, Rick Macklem wrote:

Konstantin Belousov wrote:

On Sat, Apr 21, 2018 at 07:21:58PM +, Rick Macklem wrote:

I decided to start a new thread on current related to SCHED_ULE, since I see
more than just performance degradation and on a recent current kernel.
(I cc'd a couple of the people discussing performance problems in freebsd-stable
  recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic
  scheduler".)

When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
current/head kernel, I would see about a 30% performance degradation (elapsed
run time for a kernel build over NFSv4.1) when the server kernel was built with
options SCHED_ULE
instead of
options SCHED_4BSD

So, now that I have decreased the number of nfsd kernel threads to 32, it works
with both schedulers and with essentially the same performance. (ie. The 30%
performance degradation has disappeared.)


Now, with a kernel from a couple of days ago, the
options SCHED_ULE
kernel becomes unusable shortly after starting testing.
I have seen two variants of this:
- Became essentially hung. All I could do was ping the machine from the network.
- Reported "vm_thread_new: kstack allocation failed"
   and then any attempt to do anything gets "No more processes".

This is strange.  It usually means that you get KVA either exhausted or
severely fragmented.

Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
kernel is working ok now. I haven't done enough to compare performance yet.
Maybe I'll post again when I have some numbers.


Enter ddb, it should be operational since pings are answered.  Try to see
where the threads are stuck.

I didn't do this, since reducing the number of kernel threads seems to have fixed
the problem. For the pNFS server, the nfsd threads will spawn additional kernel
threads to do proxies to the mirrored DS servers.


with the only difference being a kernel built with
options SCHED_4BSD
everything works and performs the same as the Dec 2017 kernel.

I can try rolling back through the revisions, but it would be nice if someone
could suggest where to start, because it takes a couple of hours to build a
kernel on this system.

So, something has made things worse for a head/current kernel this winter, rick

There are at least two potentially relevant changes.

First is r326758 Dec 11 which bumped KSTACK_PAGES on i386 to 4.

I've been running this machine with KSTACK_PAGES=4 for some time, so no change.

W.r.t. Rodney Grimes' comments about this (which didn't end up in the messages
in this thread):
I didn't see any instability when using KSTACK_PAGES=4 for this until this cropped
up and seemed to be scheduler related (but not really, it seems).
I bumped it to KSTACK_PAGES=4 because I needed that for the pNFS Metadata
Server code.

Yes, NFS does use quite a bit of kernel stack. Unfortunately, it isn't one big
item getting allocated on the stack, but many moderate sized ones.
(A part of it is multiple instances of "struct vattr", some buried in
  "struct nfsvattr", that NFS needs to use. I don't think these are large
  enough to justify malloc/free, but it has to use several of them.)

One case I did try fixing was about 6 cases where "struct nfsstate" ended up on
the stack. I changed the code to malloc/free them and then when testing, to
my surprise I had a 20% performance hit and shelved the patch.
Now that I know that the server was running near its limit, I might try this one
again, to see if the performance hit doesn't occur when the machine has adequate
memory. If the performance hit goes away, I could commit this, but it wouldn't
have that much effect on the kstack usage. (It's interesting how this patch
ended up related to the issue this thread discussed.)

Anything we can do to help relieve KSTACK usage, especially on i386
is helpful.  There is a thread back quite some time where someone
came up with a compile time static "this function uses X bytes of
local stack" and a bit of clean up was done.  We should pursue
this issue further.


that was me.

use
-Wframe-larger-than=
 
and set it to something like 512 bytes (obviously you have to make 
warnings non fatal as well).






My experience with the i386/KSTACK issues was attempting to do installs
from snapshot .iso's, I usually had to change to a custom kernel without
INVARIANTS and WITNESS, or reduce KSTACK to 2 and suffer the small stack
problem (ie, don't use NFS during install).  Neither was very pleasant.

I have found it impractical to run the 4 page KSTACK in production
VM's using i386 due to memory requirements.  I run many very lean
i386 VM's with 64MB of memory.  I suspect our user base also has
many people doing this, and it would be to our advantage to try
and reduce our kernel stack 

Re: Nvidia issue with CURRENT

2018-04-23 Thread O. Hartmann
On Sun, 22 Apr 2018 14:38:55 +0200
Mariusz Zaborski  wrote:

> Hi,
> 
> Normally I build my CURRENT by myself from Xorg - r332861.
> But I also tried latest SNAPSHOT.
> 
> Thanks,
> Mariusz

All my boxes with nVidia hardware are running the most recent CURRENT (compiled
this morning, as on an almost daily basis), and I'm using the latest official
driver available from nVidia, 390.48.

It is a natural byproduct of running CURRENT that the kernel module of the
nVidia driver is very often out of sync with the kernel, so I made it a habit
to recompile the module from source whenever I recompile/install a kernel.

In /etc/src.conf , therefore you should add something similar to (like I added
to mine):

PORTS_MODULES=
PORTS_MODULES+= x11/nvidia-driver
PORTS_MODULES+= emulators/virtualbox-ose-kmod

This is one of the great advantages of having an operating system which you can
compile yourself. 

Regards,

oh

 
> 
> On 22 April 2018 at 14:24, Tommi Pernila  wrote:
> > Hi,
> >
> > which version of CURRENT are you running?
> > E.g. Some snapshot or did you compile from source?
> >
> > -Tommi
> >
> > On Sun, 22 Apr 2018 at 13.47, Mariusz Zaborski 
> > wrote:  
> >>
> >> Hello,
> >>
> >> I upgraded my FreeBSD to CURRENT and nvidia-driver-390.48, but it has
> >> stopped working.
> >> I also tried nvidia-driver-390.25, without luck as well.
> >>
> >> I have loaded nvidia-modeset.ko .
> >>
> >> While I'm rebooting my machine it's also core dumping:
> >> https://people.freebsd.org/~oshogbo/nvidia-mail.png .
> >> I'm attaching also Xorg log.
> >>
> >> Is this a known issue?
> >>
> >> Thanks,
> >> Mariusz