Re: X server being killed a lot
Michael van Elst wrote:
>chris...@astron.com (Christos Zoulas) writes:
>
>>But we kill the process that faulted in this case, not the process that
>>likely caused the shortage. We should be keeping stats so that we can
>>select a better victim, then kill that instead and retry. But this is
>>easier said than done :-)
>
>Linux tried for years. The best they have is to mark specific processes
>as not eligible for killing.

I would be happy with that. Having syslogd killed, for example, is not nice.

>But the first step should be to find out which allocation failed. There are
>reasons to believe this is caused by the DRM memory management.
>And then the X server is the process that caused the shortage and
>still shouldn't be killed.

I agree.

christos
Re: X server being killed a lot
Izumi Tsutsui wrote:
>> Do we know what combination of things is causing X to be killed ?
>
>I can reproduce it by Xorg server + Firefox 62 + makefs(8) creating
>4GB FFS image on NetBSD/i386 8.0 (i.e. on building live images).

How is your X server configured ? Is it operating on a framebuffer in
main memory or VRAM on a separate graphics card ?

The sizes shown in top(1) for X on my system are smaller than several
other processes. This is with an AMD Radeon GPU with 1GB VRAM.
Re: X server being killed a lot
Izumi Tsutsui wrote:
>> Izumi Tsutsui wrote:
>> >> Do we know what combination of things is causing X to be killed ?
>> >
>> >I can reproduce it by Xorg server + Firefox 62 + makefs(8) creating
>> >4GB FFS image on NetBSD/i386 8.0 (i.e. on building live images).
>>
>> How is your X server configured ? Is it operating on a framebuffer in
>> main memory or VRAM on a separate graphics card ?
>
>My machine has RADEON HD 5450 so it has own VRAM, I think

I think I have the same model.

>> The sizes shown in top(1) for X on my system are smaller than several
>> other processes. This is with an AMD Radeon GPU with 1GB VRAM.

I guess I'm not getting to the point where anything has been paged out:

Memory: 9017M Act, 1400M Inact, 9400K Wired, 345M Exec, 7679M File, 2156M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
16605 rjs       43    0  2964M 1328M parked/0 230:45  6.88%  6.88% firefox
  634 rjs       85    0   182M   61M select/0 296:32  0.59%  0.59% X
18233 rjs       85    0  5840M  699M futex/3   84:31  0.00%  0.00% java
  711 rjs       85    0   301M  179M select/5  12:10  0.00%  0.00% emacs
  615 rjs       85    0    65M 6332K select/5   2:21  0.00%  0.00% mwm
21786 rjs       85    0  1182M  202M select/1   0:51  0.00%  0.00% sbcl
22344 rjs       85    0    97M   33M select/3   0:10  0.00%  0.00% xpdf
 1028 rjs       85    0   135M 7872K wait/0     0:00  0.00%  0.00% eclipse

My memory summary line looks similar to what wiz@ reported in the
original message though.
Re: X server being killed a lot
> Izumi Tsutsui wrote:
> >> Do we know what combination of things is causing X to be killed ?
> >
> >I can reproduce it by Xorg server + Firefox 62 + makefs(8) creating
> >4GB FFS image on NetBSD/i386 8.0 (i.e. on building live images).
>
> How is your X server configured ? Is it operating on a framebuffer in
> main memory or VRAM on a separate graphics card ?

My machine has a RADEON HD 5450, so it has its own VRAM, I think.

> The sizes shown in top(1) for X on my system are smaller than several
> other processes. This is with an AMD Radeon GPU with 1GB VRAM.

top(1) on the same environment says:

---
Memory: 1976M Act, 974M Inact, 57M Wired, 139M Exec, 867M File, 16M Free
Swap: 8192M Total, 1578M Used, 6614M Free

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
 2521 tsutsui   43    0  1588M 1118M parked/0  36:33  1.22%  1.22% firefox
  708 tsutsui   85    0  1431M  764M select/0 484.2H  0.88%  0.88% ruby24
   73 tsutsui   85    0   383M  145M select/0 117:56  0.00%  0.00% Xorg
    :
---

It was always Xorg that was killed, not ruby24 or firefox:

---
% zgrep 'out of swap' /var/log/messages*
/var/log/messages:Oct 14 00:53:12 mirage /netbsd: UVM: pid 1962.1 (Xorg), uid 0 killed: out of swap
/var/log/messages:Oct 18 23:27:19 mirage /netbsd: UVM: pid 4634.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.0.gz:Aug 22 00:45:01 mirage /netbsd: UVM: pid 2257.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.1.gz:Aug 6 22:35:49 mirage /netbsd: UVM: pid 394.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.1.gz:Aug 13 14:34:55 mirage /netbsd: UVM: pid 491.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.1.gz:Aug 13 17:18:07 mirage /netbsd: UVM: pid 2576.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.1.gz:Aug 15 13:07:11 mirage /netbsd: UVM: pid 970.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 7 11:49:25 mirage /netbsd: UVM: pid 481.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 09:29:25 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 22:43:04 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 22:48:24 mirage /netbsd: UVM: pid 4999.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 22:50:08 mirage /netbsd: UVM: pid 14199.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 22:58:48 mirage /netbsd: UVM: pid 3339.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 23:09:28 mirage /netbsd: UVM: pid 11407.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 23:12:20 mirage /netbsd: UVM: pid 29705.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.4.gz:Jul 22 23:21:39 mirage /netbsd: UVM: pid 1100.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 2 20:48:55 mirage /netbsd: UVM: pid 846.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 3 03:09:12 mirage /netbsd: UVM: pid 14182.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 4 02:29:10 mirage /netbsd: UVM: pid 1260.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 9 06:50:52 mirage /netbsd: UVM: pid 10595.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 9 20:33:06 mirage /netbsd: UVM: pid 14867.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 18 21:49:19 mirage /netbsd: UVM: pid 12485.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 18 22:23:54 mirage /netbsd: UVM: pid 26101.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 20 00:42:46 mirage /netbsd: UVM: pid 491.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 23 06:34:39 mirage /netbsd: UVM: pid 22067.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.5.gz:Jun 30 00:06:18 mirage /netbsd: UVM: pid 12055.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 5 17:40:40 mirage /netbsd: UVM: pid 74.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 12 03:31:54 mirage /netbsd: UVM: pid 26634.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 12 05:49:47 mirage /netbsd: UVM: pid 7793.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 12 15:31:58 mirage /netbsd: UVM: pid 7632.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 13 12:07:35 mirage /netbsd: UVM: pid 28029.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 13 15:27:24 mirage /netbsd: UVM: pid 1197.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 13 20:54:15 mirage /netbsd: UVM: pid 833.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.6.gz:May 17 00:28:20 mirage /netbsd: UVM: pid 3351.1 (Xorg), uid 0 killed: out of swap
/var/log/messages.7.gz:May 4 11:01:59 mirage /netbsd: UVM: pid 1588.1 (Xorg), uid 0 killed: out of swap
%
---

Note that I updated the machine from 7.1.2 to 8.0_RC1 on April 30
(and there are no 'out of swap' messages in older logs).

---
Izumi Tsutsui
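To see how the kills cluster over time, lines like the zgrep output above can be tallied per month and command. The following is only a small sketch: the regular expression is derived from the message format shown in this thread, and the helper name `count_kills` is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the log lines quoted in this thread, e.g.
#   /var/log/messages:Oct 14 00:53:12 mirage /netbsd: UVM: pid 1962.1 (Xorg), uid 0 killed: out of swap
# The optional "file:" prefix added by zgrep is simply ignored by search().
KILL_RE = re.compile(
    r"(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d+)\s+\d\d:\d\d:\d\d\s+\S+\s+/netbsd: "
    r"UVM: pid (?P<pid>\d+)\.\d+ \((?P<comm>[^)]+)\), uid (?P<uid>\d+) "
    r"killed: out of swap"
)


def count_kills(lines):
    """Count 'out of swap' kills per (month, command); hypothetical helper."""
    counts = Counter()
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            counts[(m["month"], m["comm"])] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: zgrep 'out of swap' /var/log/messages* | python3 this_script.py
    for (month, comm), n in sorted(count_kills(sys.stdin).items()):
        print(month, comm, n)
```

Grouping by month is an arbitrary choice here; the same match object also exposes the day, pid and uid if a finer breakdown is wanted.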
Re: X server being killed a lot
> Do we know what combination of things is causing X to be killed ?

I can reproduce it with Xorg server + Firefox 62 + makefs(8) creating
a 4GB FFS image on NetBSD/i386 8.0 (i.e. when building live images).

IIRC there was no such problem in the 7.x days
(though Firefox was also smaller in those days).

---
Izumi Tsutsui
Re: X server being killed a lot
Do we know what combination of things is causing X to be killed ?

I have never seen it happen and am running X, Firefox and several other
big packages as well as doing builds on the same machine.

Robert Swindells
Re: X server being killed a lot
chris...@astron.com (Christos Zoulas) writes:

>But we kill the process that faulted in this case, not the process that
>likely caused the shortage. We should be keeping stats so that we can
>select a better victim, then kill that instead and retry. But this is
>easier said than done :-)

Linux tried for years. The best they have is to mark specific processes
as not eligible for killing.

But the first step should be to find out which allocation failed. There are
reasons to believe this is caused by the DRM memory management.
And then the X server is the process that caused the shortage and
still shouldn't be killed.

--
-- Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
Re: X server being killed a lot
Michael van Elst wrote:
>mlel...@serpens.de (Michael van Elst) writes:
>
>>filemax is not the limit for the cache but the level it tries to keep
>>when pressed for memory.
>
>None of these settings are directly responsible for killing a process,
>they just help keep the system from running into a wall.
>
>A process is killed by UVM when it needs to fault-in a page but there
>is no free page and it thinks none could be reclaimed. As long as there
>is swap, the assumption is that there is at least one anon page that can
>be reclaimed at some point and nothing is killed. As long as the file cache
>exceeds 1/16 of managed memory or 5MByte, the assumption is that at
>least one file page can be reclaimed at some point and nothing is killed.
>
>There is one more possibility. Even when there is swap and pages
>could be reclaimed but the pager itself runs out of (kernel) memory,
>that error can kill the process. That includes also a failure to
>allocate kernel address space.
>
>The UVM history should give you the exact reason why the fault
>couldn't be handled.

But we kill the process that faulted in this case, not the process that
likely caused the shortage. We should be keeping stats so that we can
select a better victim, then kill that instead and retry. But this is
easier said than done :-)

christos
Re: X server being killed a lot
mlel...@serpens.de (Michael van Elst) writes:

>filemax is not the limit for the cache but the level it tries to keep
>when pressed for memory.

None of these settings are directly responsible for killing a process,
they just help keep the system from running into a wall.

A process is killed by UVM when it needs to fault-in a page but there
is no free page and it thinks none could be reclaimed. As long as there
is swap, the assumption is that there is at least one anon page that can
be reclaimed at some point and nothing is killed. As long as the file cache
exceeds 1/16 of managed memory or 5MByte, the assumption is that at
least one file page can be reclaimed at some point and nothing is killed.

There is one more possibility. Even when there is swap and pages
could be reclaimed but the pager itself runs out of (kernel) memory,
that error can kill the process. That also includes a failure to
allocate kernel address space.

The UVM history should give you the exact reason why the fault
couldn't be handled.

--
-- Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
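The kill conditions described in this message can be written down as a small decision function. This is a rough model of the prose only, not the kernel code: the function name, its parameters and the 4 KiB page size are assumptions, and "1/16 of managed memory or 5MByte" is read here as "exceeding either threshold is enough", i.e. the smaller of the two.

```python
from typing import Optional

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def kill_reason(free_pages: int, swap_free_pages: int, file_pages: int,
                managed_pages: int, pager_failed: bool = False) -> Optional[str]:
    """Return why a faulting process would be killed, or None if it survives.

    Hypothetical model of the rules described in the message above.
    """
    # Third case from the message: the pager itself ran out of kernel
    # memory (or kernel address space), regardless of swap or file cache.
    if pager_failed:
        return "pager out of kernel memory"
    # A free page is available, so the fault can be served.
    if free_pages > 0:
        return None
    # Free swap: assume at least one anon page can be reclaimed eventually.
    if swap_free_pages > 0:
        return None
    # Enough file cache: assume at least one file page can be reclaimed.
    # Assumption: "1/16 of managed memory or 5MByte" means the smaller value.
    threshold = min(managed_pages // 16, (5 * 1024 * 1024) // PAGE_SIZE)
    if file_pages > threshold:
        return None
    return "no reclaimable page"
```

With figures like those reported in this thread (plenty of free swap and gigabytes of file cache), the first two rules never trigger in this model, which is why the pager failure case is the remaining suspect.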
Re: X server being killed a lot
t...@giga.or.at (Thomas Klausner) writes:

>On Mon, Oct 22, 2018 at 12:18:01PM -0400, Michael wrote:
>> It helped somewhat to add this to sysctl.conf:
>> vm.filemin=2
>> vm.filemax=10
>> now it still uses well over 10% of memory as file cache but seems more
>> willing to shrink it.

filemax is not the limit for the cache but the level it tries to keep
when pressed for memory.

>Is there some delay until these values are really used? Or are they only
>relevant if we're below the magic boundary and afterwards they are not
>enforced so much because the limit has already been broken? How do those
>limits work?

The three types anon, file and exec can grow as long as memory permits.

Things change when free memory drops below some limit, then the page
daemon tries to free inactive pages. Inactive pages are those that
haven't been used recently.

When scanning for pages to free, it follows a simple heuristic. Pages
that belong to a type (anon, file, exec) that is below the minimum will
be skipped, pages that belong to a type that is above minimum but
below maximum will be skipped if any other type is above maximum.

If all types would be skipped (then all are below minimum), then
nothing is skipped.

If only file and exec would be skipped but swap is full (so anon
cannot be paged out), then nothing is skipped.

As a side effect, pages skipped in the scan are activated and
thus removed from the inactive queue for some time.

So the heuristic first tries to reduce everything to the maximum,
then tries to reduce everything to the minimum, and then as far
as possible. It will never try to free active pages.

--
-- Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
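The skip rules above can be sketched as a function that, given the current share of each page type and its min/max setting, returns the types the scan would pass over. This is a toy model of the description in the message, not the page daemon's actual code; the names `Usage` and `types_to_skip` are invented, and the percentages in the usage note are illustrative (only filemin=2 and filemax=10 come from this thread).

```python
from dataclasses import dataclass

TYPES = ("anon", "file", "exec")


@dataclass
class Usage:
    pct: float      # current share of memory held by this type, in percent
    minimum: float  # e.g. vm.filemin
    maximum: float  # e.g. vm.filemax


def types_to_skip(usage: dict, swap_full: bool) -> set:
    """Return the page types the scan would skip (hypothetical model)."""
    over_max = {t for t in TYPES if usage[t].pct > usage[t].maximum}
    skip = set()
    for t in TYPES:
        u = usage[t]
        if u.pct < u.minimum:
            # Below its minimum: always a skip candidate.
            skip.add(t)
        elif u.pct <= u.maximum and (over_max - {t}):
            # Between min and max, and some other type is above its max.
            skip.add(t)
    # If everything would be skipped, nothing is skipped.
    if skip == set(TYPES):
        return set()
    # If only file and exec would be skipped but anon cannot be paged out
    # because swap is full, nothing is skipped either.
    if swap_full and skip == {"file", "exec"}:
        return set()
    return skip
```

For example, with anon at 40% (min 10, max 80), file at 55% (min 2, max 10) and exec at 5% (min 5, max 30), only file is above its maximum, so the model skips anon and exec and the scan concentrates on file pages.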
Re: X server being killed a lot
On Mon, 29 Oct 2018 09:46:34 +0100
Thomas Klausner wrote:

> On Mon, Oct 22, 2018 at 12:18:01PM -0400, Michael wrote:
> > I've had firefox starting to get swapped out ( and everything
> > slowing to a crawl because of it ) while in active use, with more
> > than half of RAM being used as file cache, and nothing hammering
> > the filesystem either.
> > One would think the OS would shrink the cache first, especially if
> > it's several gigabytes.
> >
> > It helped somewhat to add this to sysctl.conf:
> > vm.filemin=2
> > vm.filemax=10
> > now it still uses well over 10% of memory as file cache but seems
> > more willing to shrink it.
>
> I just gave that a try after X was killed again, setting the values
> with sysctl -w.
>
> Then I restarted X, gnucash and firefox and X got killed again before
> all of them had finished starting up.
>
> Is there some delay until these values are really used? Or are they
> only relevant if we're below the magic boundary and afterwards they
> are not enforced so much because the limit has already been broken?
> How do those limits work?
> Thomas

Those values are used under memory pressure when the pagedaemon scans
for pages to be replaced. They change which pages are taken as
candidates for replacement.

Lars

--
Mystical explanations: Mystical explanations are considered deep;
the truth is that they are not even superficial.
   -- Friedrich Nietzsche
   [ Die Fröhliche Wissenschaft, Book 3, 126 ]
Re: X server being killed a lot
On Mon, Oct 22, 2018 at 12:18:01PM -0400, Michael wrote:
> I've had firefox starting to get swapped out ( and everything slowing
> to a crawl because of it ) while in active use, with more than half of
> RAM being used as file cache, and nothing hammering the filesystem
> either.
> One would think the OS would shrink the cache first, especially if it's
> several gigabytes.
>
> It helped somewhat to add this to sysctl.conf:
> vm.filemin=2
> vm.filemax=10
> now it still uses well over 10% of memory as file cache but seems more
> willing to shrink it.

I just gave that a try after X was killed again, setting the values
with sysctl -w.

Then I restarted X, gnucash and firefox and X got killed again before
all of them had finished starting up.

Is there some delay until these values are really used? Or are they only
relevant if we're below the magic boundary and afterwards they are not
enforced so much because the limit has already been broken? How do those
limits work?
 Thomas
Re: X server being killed a lot
Hello,

On Mon, 22 Oct 2018 07:34:37 +0200
Thomas Klausner wrote:

> On Fri, Aug 17, 2018 at 08:20:35AM +0200, Thomas Klausner wrote:
> > On Sat, Jul 28, 2018 at 06:44:50PM +0900, Izumi Tsutsui wrote:
> > > > When I'm running a bulk build, the X server is a likely victim.
> > > >
> > > > UVM: pid 28091.1 (X), uid 0 killed: out of swap
> > > >
> > > > I'm not really sure why because I have lots of swap.
> > > >
> > > > Swap: 148G Total, 27G Used, 121G Free
> > >
> > > I also see the similar problem, on NetBSD/i386 8.0 with 8GB swap.
> > >
> > > Jul 22 09:29:25 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 22:43:04 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 22:48:24 mirage /netbsd: UVM: pid 4999.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 22:50:08 mirage /netbsd: UVM: pid 14199.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 22:58:48 mirage /netbsd: UVM: pid 3339.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 23:09:28 mirage /netbsd: UVM: pid 11407.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 23:12:20 mirage /netbsd: UVM: pid 29705.1 (Xorg), uid 0 killed: out of swap
> > > Jul 22 23:21:39 mirage /netbsd: UVM: pid 1100.1 (Xorg), uid 0 killed: out of swap
> > >
> > > It seems easily reproducible by running firefox and makefs(8)
> > > to create large iso/ffs images.
> >
> > Does anyone have any insight in this?
> >
> > This is highly annoying behaviour for me - it happens even when I'm
> > actively using the X session, so it's definitely not because it's the
> > least-used process in the system.
>
> It just happened again for me.
>
> top says:
>
> CPU states: 0.2% user, 0.0% nice, 15.4% system, 17.4% interrupt, 66.8% idle
> Memory: 14G Act, 6984M Inact, 10M Wired, 1758M Exec, 18G File, 59M Free
> Swap: 148G Total, 148G Free
>
> so there is some pressure on the I/O system for file data, but no swap
> use.
>
> It looks to me like the priority of the File section is too high, if X
> is killed for that...

Possibly related:
I've had firefox starting to get swapped out ( and everything slowing
to a crawl because of it ) while in active use, with more than half of
RAM being used as file cache, and nothing hammering the filesystem
either.
One would think the OS would shrink the cache first, especially if it's
several gigabytes.

It helped somewhat to add this to sysctl.conf:
vm.filemin=2
vm.filemax=10
Now it still uses well over 10% of memory as file cache but seems more
willing to shrink it.

have fun
Michael
Re: X server being killed a lot
On Sat, Jul 28, 2018 at 06:44:50PM +0900, Izumi Tsutsui wrote:
> > When I'm running a bulk build, the X server is a likely victim.
> >
> > UVM: pid 28091.1 (X), uid 0 killed: out of swap
> >
> > I'm not really sure why because I have lots of swap.
> >
> > Swap: 148G Total, 27G Used, 121G Free
>
> I also see the similar problem, on NetBSD/i386 8.0 with 8GB swap.
>
> Jul 22 09:29:25 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
> Jul 22 22:43:04 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
> Jul 22 22:48:24 mirage /netbsd: UVM: pid 4999.1 (Xorg), uid 0 killed: out of swap
> Jul 22 22:50:08 mirage /netbsd: UVM: pid 14199.1 (Xorg), uid 0 killed: out of swap
> Jul 22 22:58:48 mirage /netbsd: UVM: pid 3339.1 (Xorg), uid 0 killed: out of swap
> Jul 22 23:09:28 mirage /netbsd: UVM: pid 11407.1 (Xorg), uid 0 killed: out of swap
> Jul 22 23:12:20 mirage /netbsd: UVM: pid 29705.1 (Xorg), uid 0 killed: out of swap
> Jul 22 23:21:39 mirage /netbsd: UVM: pid 1100.1 (Xorg), uid 0 killed: out of swap
>
> It seems easily reproducible by running firefox and makefs(8)
> to create large iso/ffs images.

Does anyone have any insight in this?

This is highly annoying behaviour for me - it happens even when I'm
actively using the X session, so it's definitely not because it's the
least-used process in the system.
 Thomas
Re: X server being killed a lot
> When I'm running a bulk build, the X server is a likely victim.
>
> UVM: pid 28091.1 (X), uid 0 killed: out of swap
>
> I'm not really sure why because I have lots of swap.
>
> Swap: 148G Total, 27G Used, 121G Free

I also see a similar problem, on NetBSD/i386 8.0 with 8GB swap.

Jul 22 09:29:25 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
Jul 22 22:43:04 mirage /netbsd: UVM: pid 75.1 (Xorg), uid 0 killed: out of swap
Jul 22 22:48:24 mirage /netbsd: UVM: pid 4999.1 (Xorg), uid 0 killed: out of swap
Jul 22 22:50:08 mirage /netbsd: UVM: pid 14199.1 (Xorg), uid 0 killed: out of swap
Jul 22 22:58:48 mirage /netbsd: UVM: pid 3339.1 (Xorg), uid 0 killed: out of swap
Jul 22 23:09:28 mirage /netbsd: UVM: pid 11407.1 (Xorg), uid 0 killed: out of swap
Jul 22 23:12:20 mirage /netbsd: UVM: pid 29705.1 (Xorg), uid 0 killed: out of swap
Jul 22 23:21:39 mirage /netbsd: UVM: pid 1100.1 (Xorg), uid 0 killed: out of swap

It seems easily reproducible by running firefox and makefs(8)
to create large iso/ffs images.

---
Izumi Tsutsui
X server being killed a lot
Hi!

When I'm running a bulk build, the X server is a likely victim.

UVM: pid 28091.1 (X), uid 0 killed: out of swap

I'm not really sure why because I have lots of swap.

Swap: 148G Total, 27G Used, 121G Free

And usually there are still lots of pages, e.g. in File, which could
be recovered easily (AFAIU). Why does X have to die?

Memory: 16G Act, 8693M Inact, 10M Wired, 1278M Exec, 8367M File, 1004M Free

 Thomas