Re: NetBSD 7.1.1 instability - hangs
Hi Robert, I did some tests.

On 2018-03-10 00:01:27 + Robert Elz wrote:

| X does not get "killed" either, I do not return to console.

If it gets killed, that (not returning to console) is expected.

| On this system usually the X driver is decently well-behaved,

That I believe takes assistance from the X server.

| so while things work I can switch to terminals,

Yes, the working server helps with the transition.

| if X was being killed by out-of-swap, I'd expect to get to the console again.

I wouldn't. But you can easily test that - just start X (no need for anything huge like rust - the "by out of swap" is not important for this test) and kill -9 the X server process. Observe what happens. You might want to make sure you can ssh in (or have already done that) so you can gracefully reboot.

I tried this by telnetting into the box:

1) killed with the normal signal: recovery is perfect
2) killed with -9 as you suggested: recovery is almost perfect: I get back to the console (framebuffer, actually), although only with a cursor. I am on the wrong terminal; switching consoles gets me back to the original prompt.

I'd say that is quite fine.

Of course, this does not mean that this is what is happening in the situation in question, it could also be dropping to DDB and hanging (because the console is still in X mode) as you indicated you think is happening.

Maybe that is the cause then: with ddb perhaps there is no real "kill", and that would explain the situation.

That one can be tested by disabling ddb.onpanic ( sysctl -w ddb.onpanic=0 ) and doing the rust build again - if the kernel crashes, it will simply crash and reboot now.

Or you could do the rust build from a wscons console, and not run X at all, and see what happens then, if it is dropping into DDB, you should see it happen, and be able to interact.

I did build rust without running anything else, I just left the ThinkPad there compiling. rustc takes far too long (I hate it, but that is another story).

I checked memory usage from time to time and the process ranged from 300MB up to 1.2GB... but there was no swap usage. The machine has 1.5GB of RAM.

Riccardo
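For anyone who wants to repeat the "check memory usage from time to time" step without watching the screen, here is a minimal sketch in Python. It assumes that `ps -axo pid,rss,comm` is available and reports RSS in 1024-byte units, as BSD ps does; the function names and the `rustc` match string are made up for illustration and are not something used in this thread.

```python
import subprocess
from typing import List, Optional, Tuple

Row = Tuple[int, int, str]  # (pid, rss in KiB, command name)


def parse_ps_rss(ps_output: str) -> List[Row]:
    """Parse the text printed by `ps -axo pid,rss,comm` into rows.

    Lines whose first two fields are not numbers (such as the header)
    are skipped.
    """
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return rows


def largest_by_rss(ps_output: str, name: str) -> Optional[Row]:
    """Return the matching process with the biggest RSS, or None."""
    matches = [row for row in parse_ps_rss(ps_output) if name in row[2]]
    return max(matches, key=lambda row: row[1], default=None)


def sample_largest(name: str = "rustc") -> Optional[Row]:
    """Run ps once and report the largest process matching `name`.

    Assumption: the local ps accepts `-axo pid,rss,comm`.
    """
    result = subprocess.run(
        ["ps", "-axo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    )
    return largest_by_rss(result.stdout, name)
```

Calling `sample_largest()` from a loop with a sleep in between, and appending the result to a file, would leave a record on disk of how large the compiler grew before a hang.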
Re: NetBSD 7.1.1 instability - hangs
Date: Fri, 9 Mar 2018 22:53:57 +0100
From: Riccardo Mottola
Message-ID: <2135aaf6-afc5-f10b-1511-cf59e0df2...@libero.it>

| X does not get "killed" either, I do not return to console.

If it gets killed, that (not returning to console) is expected.

| On this system usually the X driver is decently well-behaved,

That I believe takes assistance from the X server.

| so while things work I can switch to terminals,

Yes, the working server helps with the transition.

| if X was being killed by out-of-swap, I'd expect to get to the console again.

I wouldn't. But you can easily test that - just start X (no need for anything huge like rust - the "by out of swap" is not important for this test) and kill -9 the X server process. Observe what happens. You might want to make sure you can ssh in (or have already done that) so you can gracefully reboot.

Of course, this does not mean that this is what is happening in the situation in question, it could also be dropping to DDB and hanging (because the console is still in X mode) as you indicated you think is happening.

That one can be tested by disabling ddb.onpanic ( sysctl -w ddb.onpanic=0 ) and doing the rust build again - if the kernel crashes, it will simply crash and reboot now.

Or you could do the rust build from a wscons console, and not run X at all, and see what happens then, if it is dropping into DDB, you should see it happen, and be able to interact.

kre
Re: NetBSD 7.1.1 instability - hangs
Hi.

On 09/03/2018 17:14, m...@netbsd.org wrote:

I strongly suspect you are running out of RAM due to lang/rust not respecting MAKE_JOBS and linking in parallel, killing Xorg. I've seen the same.

You can tell if normal shutdown works fine and you get a message in UVM about Xorg being killed.

So it's three bugs:
- lang/rust doesn't respect MAKE_JOBS
- Xorg being killed like so is exceptionally fatal, we don't recover from it ... this should be possible to fix now.
- We could try to avoid killing Xorg in these circumstances (this behaviour is caused by something called 'memory overcommit')

as a short term solution:
- Add more swap, so it doesn't blow up

I don't think that is the case - the machine hangs: hitting the power button does not power it down gracefully, I need to hold it for 5 seconds, and then at reboot I get a filesystem check.

X does not get "killed" either, I do not return to console. On this system usually the X driver is decently well-behaved, so while things work I can switch to terminals, if X was being killed by out-of-swap, I'd expect to get to the console again.

I fear that the kernel panics/goes into the debugger and I cannot see it because X is running.

I can of course resume the rust install and check in an xterm how the memory usage looks, but if that is the issue, it is being handled much more ungracefully than it should be.

Riccardo
Re: NetBSD 7.1.1 instability - hangs
On 09/03/2018 16:14, m...@netbsd.org wrote:

I strongly suspect you are running out of RAM due to lang/rust not respecting MAKE_JOBS and linking in parallel, killing Xorg.

The rust compiler is multi-threaded, so one process can use all the cores of the system on its own. However, if you have the resources to handle it, it's still quite a lot faster to build rust with MAKE_JOBS set to a value >1.

Thankfully my build machine has the RAM to cope, but given the average size of a rustc process it is easy to see how it could eat all the system RAM and swap.

Mike
Re: NetBSD 7.1.1 instability - hangs
You can confirm this by reading /var/log/messages

It will look like:

UVM pid 123 (X) killed: out of swap
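As a rough illustration of that check, the sketch below scans log text for such lines. It is written in Python and rests on assumptions: the pattern is modelled on the sample line quoted in this message, it is deliberately loose (an optional colon after "UVM" and an optional ", uid N" field) because the exact kernel wording may differ, and the names `UvmKill` and `find_uvm_kills` are invented here.

```python
import re
from typing import List, NamedTuple


class UvmKill(NamedTuple):
    pid: int
    command: str
    reason: str


# Loose pattern based on the sample "UVM pid 123 (X) killed: out of swap".
# The optional ":" and ", uid N" parts are guesses to tolerate variations.
_UVM_KILL = re.compile(
    r"UVM:?\s+pid\s+(\d+)\s+\(([^)]*)\)(?:,\s*uid\s+\d+)?\s+killed:\s*(.*)"
)


def find_uvm_kills(log_text: str) -> List[UvmKill]:
    """Return one UvmKill per log line that reports a killed process."""
    kills = []
    for line in log_text.splitlines():
        match = _UVM_KILL.search(line)
        if match:
            kills.append(
                UvmKill(int(match.group(1)), match.group(2), match.group(3).strip())
            )
    return kills
```

Feeding it the contents of /var/log/messages after a reboot, for example `find_uvm_kills(open("/var/log/messages", errors="replace").read())`, would list every process the kernel reported killing; an empty list suggests the hang had a different cause.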
Re: NetBSD 7.1.1 instability - hangs
I strongly suspect you are running out of RAM due to lang/rust not respecting MAKE_JOBS and linking in parallel, killing Xorg. I've seen the same.

You can tell if normal shutdown works fine and you get a message in UVM about Xorg being killed.

So it's three bugs:
- lang/rust doesn't respect MAKE_JOBS
- Xorg being killed like so is exceptionally fatal, we don't recover from it ... this should be possible to fix now.
- We could try to avoid killing Xorg in these circumstances (this behaviour is caused by something called 'memory overcommit')

as a short term solution:
- Add more swap, so it doesn't blow up
NetBSD 7.1.1 instability - hangs
Hi All,

I upgraded to 7.1.1 on my ThinkPad R51 (this means x86 32-bit, single core). The machine was rock stable with 7.1; it is not with 7.1.1.

In a couple of hours it crashed several times: by that I mean it "hangs" and I need to power-cycle.

What I do is start a package upgrade (specifically rust, which takes days) and start working inside X11, and at some point the computer hangs. Since X11 is running, I cannot switch consoles either.

If I check "dmesg" there is no information on the previous run/crash.

Once I crashed into the debugger while I was still setting up the machine, but I did not think at that moment to get a trace; it could have been meaningful.

Has anybody else had such a bad experience going from 7.1 to 7.1.1?

Riccardo