Re: X server being killed a lot

2018-10-29 Thread Christos Zoulas
In article ,
Michael van Elst  wrote:
>chris...@astron.com (Christos Zoulas) writes:
>
>>But we kill the process that faulted in this case not the process that
>>likely caused the shortage. We should be keeping stats so that we can
>>select a better victim, then kill that instead and retry. But this is
>>easier said than done :-)
>
>Linux tried for years. The best they have is to mark specific processes
>as not eligible for killing.

I would be happy with that. Having syslogd killed, for example, is not nice.

>But the first step should be to find out which allocation failed. There are
>reasons to believe this is caused by the DRM memory management.
>And then the X server is the process that caused the shortage and
>still shouldn't be killed.

I agree.

christos



Re: X server being killed a lot

2018-10-29 Thread Robert Swindells


Izumi Tsutsui  wrote:
>> Do we know what combination of things is causing X to be killed ?
>
>I can reproduce it by Xorg server + Firefox 62 + makefs(8) creating
>4GB FFS image on NetBSD/i386 8.0 (i.e. on building live images).

How is your X server configured ? Is it operating on a framebuffer in
main memory or VRAM on a separate graphics card ?

The sizes shown in top(1) for X on my system are smaller than several
other processes. This is with an AMD Radeon GPU with 1GB VRAM.




Re: X server being killed a lot

2018-10-29 Thread Robert Swindells


Izumi Tsutsui  wrote:
>> Izumi Tsutsui  wrote:
>> >> Do we know what combination of things is causing X to be killed ?
>> >
>> >I can reproduce it by Xorg server + Firefox 62 + makefs(8) creating
>> >4GB FFS image on NetBSD/i386 8.0 (i.e. on building live images).
>> 
>> How is your X server configured ? Is it operating on a framebuffer in
>> main memory or VRAM on a separate graphics card ?
>
>My machine has a RADEON HD 5450, so it has its own VRAM, I think.

I think I have the same model.

>> The sizes shown in top(1) for X on my system are smaller than several
>> other processes. This is with an AMD Radeon GPU with 1GB VRAM.

I guess I'm not getting to the point where anything has been paged out:

Memory: 9017M Act, 1400M Inact, 9400K Wired, 345M Exec, 7679M File, 2156M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
16605 rjs       43    0  2964M 1328M parked/0 230:45  6.88%  6.88% firefox
  634 rjs       85    0   182M   61M select/0 296:32  0.59%  0.59% X
18233 rjs       85    0  5840M  699M futex/3   84:31  0.00%  0.00% java
  711 rjs       85    0   301M  179M select/5  12:10  0.00%  0.00% emacs
  615 rjs       85    0    65M 6332K select/5   2:21  0.00%  0.00% mwm
21786 rjs       85    0  1182M  202M select/1   0:51  0.00%  0.00% sbcl
22344 rjs       85    0    97M   33M select/3   0:10  0.00%  0.00% xpdf
 1028 rjs       85    0   135M 7872K wait/0     0:00  0.00%  0.00% eclipse

My memory summary line looks similar to what wiz@ reported in the
original message though.



Re: X server being killed a lot

2018-10-29 Thread Izumi Tsutsui
> Izumi Tsutsui  wrote:
> >> Do we know what combination of things is causing X to be killed ?
> >
> >I can reproduce it by Xorg server + Firefox 62 + makefs(8) creating
> >4GB FFS image on NetBSD/i386 8.0 (i.e. on building live images).
> 
> How is your X server configured ? Is it operating on a framebuffer in
> main memory or VRAM on a separate graphics card ?

My machine has a RADEON HD 5450, so it has its own VRAM, I think.

> The sizes shown in top(1) for X on my system are smaller than several
> other processes. This is with an AMD Radeon GPU with 1GB VRAM.

top(1) in the same environment says:
---

Memory: 1976M Act, 974M Inact, 57M Wired, 139M Exec, 867M File, 16M Free
Swap: 8192M Total, 1578M Used, 6614M Free

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
 2521 tsutsui   43    0  1588M 1118M parked/0  36:33  1.22%  1.22% firefox
  708 tsutsui   85    0  1431M  764M select/0 484.2H  0.88%  0.88% ruby24
   73 tsutsui   85    0   383M  145M select/0 117:56  0.00%  0.00% Xorg
 :
---

It was always Xorg that was killed, not ruby24 or firefox:

---
% zgrep 'out of swap' /var/log/messages*
/var/log/messages:Oct 14 00:53:12 mirage /netbsd: UVM: pid 1962.1 (Xorg), uid 0 killed: out of swap
/var/log/messages:Oct 18 23:27:19 mirage /netbsd: UVM: pid 4634.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.0.gz:Aug 22 00:45:01 mirage /netbsd: UVM: pid 2257.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.1.gz:Aug  6 22:35:49 mirage /netbsd: UVM: pid 394.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.1.gz:Aug 13 14:34:55 mirage /netbsd: UVM: pid 491.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.1.gz:Aug 13 17:18:07 mirage /netbsd: UVM: pid 2576.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.1.gz:Aug 15 13:07:11 mirage /netbsd: UVM: pid 970.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul  7 11:49:25 mirage /netbsd: UVM: pid 481.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 09:29:25 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 22:43:04 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 22:48:24 mirage /netbsd: UVM: pid 4999.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 22:50:08 mirage /netbsd: UVM: pid 14199.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 22:58:48 mirage /netbsd: UVM: pid 3339.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 23:09:28 mirage /netbsd: UVM: pid 11407.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 23:12:20 mirage /netbsd: UVM: pid 29705.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 23:21:39 mirage /netbsd: UVM: pid 1100.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun  2 20:48:55 mirage /netbsd: UVM: pid 846.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun  3 03:09:12 mirage /netbsd: UVM: pid 14182.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun  4 02:29:10 mirage /netbsd: UVM: pid 1260.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun  9 06:50:52 mirage /netbsd: UVM: pid 10595.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun  9 20:33:06 mirage /netbsd: UVM: pid 14867.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 18 21:49:19 mirage /netbsd: UVM: pid 12485.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 18 22:23:54 mirage /netbsd: UVM: pid 26101.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 20 00:42:46 mirage /netbsd: UVM: pid 491.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 23 06:34:39 mirage /netbsd: UVM: pid 22067.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 30 00:06:18 mirage /netbsd: UVM: pid 12055.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May  5 17:40:40 mirage /netbsd: UVM: pid 74.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 12 03:31:54 mirage /netbsd: UVM: pid 26634.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 12 05:49:47 mirage /netbsd: UVM: pid 7793.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 12 15:31:58 mirage /netbsd: UVM: pid 7632.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 13 12:07:35 mirage /netbsd: UVM: pid 28029.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 13 15:27:24 mirage /netbsd: UVM: pid 1197.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 13 20:54:15 mirage /netbsd: UVM: pid 833.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 17 00:28:20 mirage /netbsd: UVM: pid 3351.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.7.gz:May  4 11:01:59 mirage /netbsd: UVM: pid 1588.1 (Xorg), uid 0 killed: out of swap
% 
---
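
To tally such messages per process name, a small script along these
lines may help. This is only a sketch: the function name count_kills
and the regular expression are illustrative, inferred from the log
lines quoted in this thread rather than from any documented format.
It assumes the log text has already been collected (for example from
zgrep output) and tolerates entries that a mailer has wrapped across
lines.

```python
import re
from collections import Counter

# Pattern inferred from the messages quoted in this thread (an assumption,
# not an official log format).
UVM_KILL = re.compile(
    r"UVM: pid \d+\.\d+ \((?P<name>[^)]+)\), "
    r"uid \d+ killed: out of swap"
)


def count_kills(text):
    """Count 'killed: out of swap' messages per process name."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(text.split())
    return Counter(m.group("name") for m in UVM_KILL.finditer(flat))


if __name__ == "__main__":
    import sys

    # Example: zgrep 'out of swap' /var/log/messages* | python3 tally.py
    for name, kills in count_kills(sys.stdin.read()).most_common():
        print(name, kills)
```

The whitespace collapsing is a deliberate simplification: it makes the
match independent of where a line was wrapped, at the cost of ignoring
line boundaries entirely.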

Note that I updated the machine from 7.1.2 to 8.0_RC1 on April 30
(and there are no 'out of swap' messages in older logs).

---
Izumi Tsutsui


Re: X server being killed a lot

2018-10-29 Thread Izumi Tsutsui
> Do we know what combination of things is causing X to be killed ?

I can reproduce it by Xorg server + Firefox 62 + makefs(8) creating
4GB FFS image on NetBSD/i386 8.0 (i.e. on building live images).

IIRC there was no such problem in the 7.x days.
(though Firefox was also smaller in those days)

---
Izumi Tsutsui


Re: X server being killed a lot

2018-10-29 Thread Robert Swindells


Do we know what combination of things is causing X to be killed ?

I have never seen it happen and am running X, Firefox and several other
big packages as well as doing builds on the same machine.

Robert Swindells


Re: X server being killed a lot

2018-10-29 Thread Michael van Elst
chris...@astron.com (Christos Zoulas) writes:

>But we kill the process that faulted in this case not the process that
>likely caused the shortage. We should be keeping stats so that we can
>select a better victim, then kill that instead and retry. But this is
>easier said than done :-)

Linux tried for years. The best they have is to mark specific processes
as not eligible for killing.

But the first step should be to find out which allocation failed. There are
reasons to believe this is caused by the DRM memory management.
And then the X server is the process that caused the shortage and
still shouldn't be killed.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: X server being killed a lot

2018-10-29 Thread Christos Zoulas
In article ,
Michael van Elst  wrote:
>mlel...@serpens.de (Michael van Elst) writes:
>
>>filemax is not the limit for the cache but the level it tries to keep
>>when pressed for memory.
>
>None of these settings are directly responsible for killing a process;
>they just help keep the system from running into the wall.
>
>A process is killed by UVM when it needs to fault in a page but there
>is no free page and it thinks none could be reclaimed. As long as there
>is swap, the assumption is that there is at least one anon page that can
>be reclaimed eventually and nothing is killed. As long as the file cache
>exceeds 1/16 of managed memory or 5MByte, the assumption is that at
>least one file page can be reclaimed eventually and nothing is killed.
>
>There is one more possibility. Even when there is swap and pages
>could be reclaimed, if the pager itself runs out of (kernel) memory,
>that error can kill the process. That also includes a failure to
>allocate kernel address space.
>
>The UVM history should give you the exact reason why the fault
>couldn't be handled.

But we kill the process that faulted in this case not the process that
likely caused the shortage. We should be keeping stats so that we can
select a better victim, then kill that instead and retry. But this is
easier said than done :-)

christos



Re: X server being killed a lot

2018-10-29 Thread Michael van Elst
mlel...@serpens.de (Michael van Elst) writes:

>filemax is not the limit for the cache but the level it tries to keep
>when pressed for memory.

None of these settings are directly responsible for killing a process;
they just help keep the system from running into the wall.

A process is killed by UVM when it needs to fault in a page but there
is no free page and it thinks none could be reclaimed. As long as there
is swap, the assumption is that there is at least one anon page that can
be reclaimed eventually and nothing is killed. As long as the file cache
exceeds 1/16 of managed memory or 5MByte, the assumption is that at
least one file page can be reclaimed eventually and nothing is killed.

There is one more possibility. Even when there is swap and pages
could be reclaimed, if the pager itself runs out of (kernel) memory,
that error can kill the process. That also includes a failure to
allocate kernel address space.

The UVM history should give you the exact reason why the fault
couldn't be handled.
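
The three cases above can be summarised as a small decision function.
The following Python sketch only restates the description in this
message; it is not the kernel code. The function and parameter names
are made up for illustration, the 4096-byte page size is an assumption,
and "1/16 of managed memory or 5MByte" is read here as the smaller of
the two thresholds, which is also an assumption.

```python
def reclaimable(swap_full, file_pages, managed_pages, page_size=4096):
    """Return True if some page is assumed to be reclaimable eventually."""
    if not swap_full:
        # As long as there is swap, some anon page can be paged out.
        return True
    # ASSUMPTION: "1/16 of managed memory or 5MByte" is taken to mean the
    # smaller of the two limits, expressed in pages.
    threshold = min(managed_pages // 16, (5 * 1024 * 1024) // page_size)
    return file_pages > threshold


def fault_may_kill(free_pages, swap_full, file_pages, managed_pages,
                   pager_out_of_kernel_memory=False):
    """Return True if a page fault could end with the process being killed."""
    if pager_out_of_kernel_memory:
        # Third case: the pager itself failed to get kernel memory or
        # kernel address space, regardless of swap and file cache.
        return True
    if free_pages > 0:
        # A free page exists, so the fault can be satisfied.
        return False
    return not reclaimable(swap_full, file_pages, managed_pages)
```

Under this reading, a system with mostly free swap and gigabytes of
file cache satisfies both reclaim assumptions, so only the third case
(a failure inside the pager) would explain a kill.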

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: X server being killed a lot

2018-10-29 Thread Michael van Elst
t...@giga.or.at (Thomas Klausner) writes:

>On Mon, Oct 22, 2018 at 12:18:01PM -0400, Michael wrote:
>> It helped somewhat to add this to sysctl.conf:
>> vm.filemin=2
>> vm.filemax=10
>> now it still uses well over 10% of memory as file cache but seems more
>> willing to shrink it.

filemax is not the limit for the cache but the level it tries to keep
when pressed for memory.


>Is there some delay until these values are really used? Or are they only
>relevant if we're below the magic boundary and afterwards they are not
>enforced so much because the limit has already been broken? How do those
>limits work?

The three types anon, file and exec can grow as long as memory permits.

Things change when free memory drops below some limit: the page
daemon then tries to free inactive pages. Inactive pages are those that
haven't been used recently.

When scanning for pages to free, it follows a simple heuristic. Pages
that belong to a type (anon, file, exec) that is below the minimum will
be skipped; pages that belong to a type that is above minimum but
below maximum will be skipped if any other type is above maximum.

If all types would be skipped (then all are below minimum), then nothing
is skipped.

If only file and exec would be skipped but swap is full (so anon cannot
be paged out), then nothing is skipped.

As a side effect, pages skipped in the scan are activated and thus
removed from the inactive queue for some time.

So the heuristic first tries to reduce everything to the maximum,
then tries to reduce everything to the minimum, and then as far as
possible. It will never try to free active pages.
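
The skip rules described above can be sketched in a few lines of
Python. This only models the description in this message, not the
actual page daemon code. The function name types_to_skip is
hypothetical, usage and limits are expressed as percentages of memory,
and how exact equality with a limit is treated is an arbitrary choice
here.

```python
TYPES = ("anon", "file", "exec")


def types_to_skip(usage, minimum, maximum, swap_full=False):
    """Return the set of page types the scan would skip.

    usage, minimum and maximum are dicts keyed by page type, holding
    percentages of memory.
    """
    any_over_max = any(usage[t] > maximum[t] for t in TYPES)
    skip = set()
    for t in TYPES:
        if usage[t] < minimum[t]:
            # Below its minimum: protected.
            skip.add(t)
        elif usage[t] <= maximum[t] and any_over_max:
            # Between minimum and maximum while another type is over
            # its maximum: that other type is reduced first.
            skip.add(t)
    if skip == set(TYPES):
        # Everything is below minimum: nothing is skipped.
        return set()
    if skip == {"file", "exec"} and swap_full:
        # Only anon is left but it cannot be paged out: nothing is skipped.
        return set()
    return skip


if __name__ == "__main__":
    # filemin=2 and filemax=10 come from the thread; the anon and exec
    # limits and all usage figures are illustrative values only.
    minimum = {"anon": 10, "file": 2, "exec": 5}
    maximum = {"anon": 80, "file": 10, "exec": 30}
    usage = {"anon": 50, "file": 40, "exec": 5}
    print(sorted(types_to_skip(usage, minimum, maximum)))
```

With the file cache above its maximum and the other two types within
their limits, only file pages remain candidates, which matches the
"first reduce everything to the maximum" step.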

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: X server being killed a lot

2018-10-29 Thread Lars Reichardt
On Mon, 29 Oct 2018 09:46:34 +0100
Thomas Klausner  wrote:

> On Mon, Oct 22, 2018 at 12:18:01PM -0400, Michael wrote:
> > I've had firefox starting to get swapped out ( and everything
> > slowing to a crawl because of it ) while in active use, with more
> > than half of RAM being used as file cache, and nothing hammering
> > the filesystem either.
> > One would think the OS would shrink the cache first, especially if
> > it's several gigabytes.
> > 
> > It helped somewhat to add this to sysctl.conf:
> > vm.filemin=2
> > vm.filemax=10
> > now it still uses well over 10% of memory as file cache but seems
> > more willing to shrink it.  
> 
> I just gave that a try after X was killed again, setting the values
> with sysctl -w.
> 
> Then I restarted X, gnucash and firefox and X got killed again before
> all of them had finished starting up.
> 
> Is there some delay until these values are really used? Or are they
> only relevant if we're below the magic boundary and afterwards they
> are not enforced so much because the limit has already been broken?
> How do those limits work?
>  Thomas

Those values are used under memory pressure when the pagedaemon scans
for pages to be replaced. They change which pages are taken as
candidates for replacement.

Lars

-
Mystical explanations:
Mystical explanations are considered deep;
the truth is that they are not even superficial.

   -- Friedrich Nietzsche
   [ Die Fröhliche Wissenschaft, Book 3, 126 ]


Re: X server being killed a lot

2018-10-29 Thread Thomas Klausner
On Mon, Oct 22, 2018 at 12:18:01PM -0400, Michael wrote:
> I've had firefox starting to get swapped out ( and everything slowing
> to a crawl because of it ) while in active use, with more than half of
> RAM being used as file cache, and nothing hammering the filesystem
> either.
> One would think the OS would shrink the cache first, especially if it's
> several gigabytes.
> 
> It helped somewhat to add this to sysctl.conf:
> vm.filemin=2
> vm.filemax=10
> now it still uses well over 10% of memory as file cache but seems more
> willing to shrink it.

I just gave that a try after X was killed again, setting the values
with sysctl -w.

Then I restarted X, gnucash and firefox and X got killed again before
all of them had finished starting up.

Is there some delay until these values are really used? Or are they only
relevant if we're below the magic boundary and afterwards they are not
enforced so much because the limit has already been broken? How do those
limits work?
 Thomas


Re: X server being killed a lot

2018-10-22 Thread Michael
Hello,

On Mon, 22 Oct 2018 07:34:37 +0200
Thomas Klausner  wrote:

> On Fri, Aug 17, 2018 at 08:20:35AM +0200, Thomas Klausner wrote:
> > On Sat, Jul 28, 2018 at 06:44:50PM +0900, Izumi Tsutsui wrote:  
> > > > When I'm running a bulk build, the X server is a likely victim.
> > > > 
> > > > UVM: pid 28091.1 (X), uid 0 killed: out of swap
> > > > 
> > > > I'm not really sure why because I have lots of swap.
> > > > 
> > > > Swap: 148G Total, 27G Used, 121G Free  
> > > 
> > > I also see a similar problem, on NetBSD/i386 8.0 with 8GB swap.
> > > 
> > > Jul 22 09:29:25 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 22:43:04 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 22:48:24 mirage /netbsd: UVM: pid 4999.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 22:50:08 mirage /netbsd: UVM: pid 14199.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 22:58:48 mirage /netbsd: UVM: pid 3339.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 23:09:28 mirage /netbsd: UVM: pid 11407.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 23:12:20 mirage /netbsd: UVM: pid 29705.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 23:21:39 mirage /netbsd: UVM: pid 1100.1 (Xorg), uid 0 killed: out of swap
> > > 
> > > It seems easily reproducible by running firefox and makefs(8)
> > > to create large iso/ffs images.
> > 
> > Does anyone have any insight into this?
> > 
> > This is highly annoying behaviour for me - it happens even when I'm
> > actively using the X session, so it's definitely not because it's the
> > least-used process in the system.  
> 
> It just happened again for me.
> 
> top says:
> 
> CPU states:  0.2% user,  0.0% nice, 15.4% system, 17.4% interrupt, 66.8% idle
> Memory: 14G Act, 6984M Inact, 10M Wired, 1758M Exec, 18G File, 59M Free
> Swap: 148G Total, 148G Free
> 
> so there is some pressure on the I/O system for file data, but no swap
> use.
> 
> It looks to me like the priority of the File section is too high, if X
> is killed for that...

Possibly related:
I've had firefox starting to get swapped out ( and everything slowing
to a crawl because of it ) while in active use, with more than half of
RAM being used as file cache, and nothing hammering the filesystem
either.
One would think the OS would shrink the cache first, especially if it's
several gigabytes.

It helped somewhat to add this to sysctl.conf:
vm.filemin=2
vm.filemax=10
now it still uses well over 10% of memory as file cache but seems more
willing to shrink it.

have fun
Michael


Re: X server being killed a lot

2018-08-17 Thread Thomas Klausner
On Sat, Jul 28, 2018 at 06:44:50PM +0900, Izumi Tsutsui wrote:
> > When I'm running a bulk build, the X server is a likely victim.
> > 
> > UVM: pid 28091.1 (X), uid 0 killed: out of swap
> > 
> > I'm not really sure why because I have lots of swap.
> > 
> > Swap: 148G Total, 27G Used, 121G Free
> 
> I also see a similar problem, on NetBSD/i386 8.0 with 8GB swap.
> 
> Jul 22 09:29:25 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
> Jul 22 22:43:04 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
> Jul 22 22:48:24 mirage /netbsd: UVM: pid 4999.1 (Xorg), uid 0 killed: out of swap
> Jul 22 22:50:08 mirage /netbsd: UVM: pid 14199.1 (Xorg), uid 0 killed: out of swap
> Jul 22 22:58:48 mirage /netbsd: UVM: pid 3339.1 (Xorg), uid 0 killed: out of swap
> Jul 22 23:09:28 mirage /netbsd: UVM: pid 11407.1 (Xorg), uid 0 killed: out of swap
> Jul 22 23:12:20 mirage /netbsd: UVM: pid 29705.1 (Xorg), uid 0 killed: out of swap
> Jul 22 23:21:39 mirage /netbsd: UVM: pid 1100.1 (Xorg), uid 0 killed: out of swap
> 
> It seems easily reproducible by running firefox and makefs(8)
> to create large iso/ffs images.

Does anyone have any insight into this?

This is highly annoying behaviour for me - it happens even when I'm
actively using the X session, so it's definitely not because it's the
least-used process in the system.
 Thomas


Re: X server being killed a lot

2018-07-28 Thread Izumi Tsutsui
> When I'm running a bulk build, the X server is a likely victim.
> 
> UVM: pid 28091.1 (X), uid 0 killed: out of swap
> 
> I'm not really sure why because I have lots of swap.
> 
> Swap: 148G Total, 27G Used, 121G Free

I also see a similar problem, on NetBSD/i386 8.0 with 8GB swap.

Jul 22 09:29:25 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
Jul 22 22:43:04 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
Jul 22 22:48:24 mirage /netbsd: UVM: pid 4999.1 (Xorg), uid 0 killed: out of swap
Jul 22 22:50:08 mirage /netbsd: UVM: pid 14199.1 (Xorg), uid 0 killed: out of swap
Jul 22 22:58:48 mirage /netbsd: UVM: pid 3339.1 (Xorg), uid 0 killed: out of swap
Jul 22 23:09:28 mirage /netbsd: UVM: pid 11407.1 (Xorg), uid 0 killed: out of swap
Jul 22 23:12:20 mirage /netbsd: UVM: pid 29705.1 (Xorg), uid 0 killed: out of swap
Jul 22 23:21:39 mirage /netbsd: UVM: pid 1100.1 (Xorg), uid 0 killed: out of swap

It seems easily reproducible by running firefox and makefs(8)
to create large iso/ffs images.

---
Izumi Tsutsui


X server being killed a lot

2018-07-28 Thread Thomas Klausner
Hi!

When I'm running a bulk build, the X server is a likely victim.

UVM: pid 28091.1 (X), uid 0 killed: out of swap

I'm not really sure why because I have lots of swap.

Swap: 148G Total, 27G Used, 121G Free

And usually there are still lots of pages, e.g. in File, which could be
recovered easily (AFAIU). Why does X have to die?

Memory: 16G Act, 8693M Inact, 10M Wired, 1278M Exec, 8367M File, 1004M Free

 Thomas