Re: NetBSD 7.1.1 instability - hangs

2018-03-14 Thread Riccardo Mottola

Hi Robert,

I did some tests.

On 2018-03-10 00:01:27 + Robert Elz  wrote:


  | X does not get "killed" either, I do not return to console.

If it gets killed, that (not returning to console) is expected.

  | On this system usually the X driver is decently well-behaved,

That I believe takes assistance from the X server.

  | so while things work I can switch to terminals,

Yes, the working server helps with the transition.

  | if X was being killed by
  | out-of-swap, I'd expect to get to the console again.


I wouldn't.   But you can easily test that - just start X
(no need for anything huge like rust - the "by out of
swap" is not important for this test)  and kill -9 the
X server process.   Observe what happens.   You might
want to make sure you can ssh in (or have already done
that) so you can gracefully reboot.


I tried by telnetting into the box and
1) killed with normal signal: recovery is perfect
2) killed with -9 as you suggested: recovery is almost perfect: I get
back to console (framebuffer, actually) although only with a cursor.
I am on the wrong terminal; switching consoles brings me back to the
original prompt.


I'd say that is quite fine.


Of course, this does not mean that this is what is happening
in the situation in question, it could also be dropping to DDB
and hanging (because the console is still in X mode) as you
indicated you think is happening.


Maybe that is the cause then: with ddb there is perhaps no real "kill",
and that would explain the situation.




That one can be tested by disabling ddb.onpanic
( sysctl -w ddb.onpanic=0 )
and doing the rust build again - if the kernel crashes, it
will simply crash and reboot now.

Or you could do the rust build from a wscons console,
and not run X at all, and see what happens then, if it
is dropping into DDB, you should see it happen, and be
able to interact.


I did build rust without running anything else; I just left the ThinkPad
there compiling.

rustc takes far too long (I hate it, but that is another story).
I checked memory usage from time to time: the process stayed between
300MB and 1.2GB, with no swap usage. The machine has 1.5GB of RAM.



Riccardo



Re: NetBSD 7.1.1 instability - hangs

2018-03-09 Thread Robert Elz

Date: Fri, 9 Mar 2018 22:53:57 +0100
From: Riccardo Mottola
Message-ID: <2135aaf6-afc5-f10b-1511-cf59e0df2...@libero.it>

  | X does not get "killed" either, I do not return to console.

If it gets killed, that (not returning to console) is expected.

  | On this system usually the X driver is decently well-behaved, 

That I believe takes assistance from the X server.

  | so while things work I can switch to terminals,

Yes, the working server helps with the transition.

  | if X was being killed by 
  | out-of-swap, I'd expect to get to the console again.

I wouldn't.   But you can easily test that - just start X
(no need for anything huge like rust - the "by out of
swap" is not important for this test)  and kill -9 the
X server process.   Observe what happens.   You might
want to make sure you can ssh in (or have already done
that) so you can gracefully reboot.

Of course, this does not mean that this is what is happening
in the situation in question, it could also be dropping to DDB
and hanging (because the console is still in X mode) as you
indicated you think is happening.

That one can be tested by disabling ddb.onpanic
( sysctl -w ddb.onpanic=0 )
and doing the rust build again - if the kernel crashes, it
will simply crash and reboot now.

Or you could do the rust build from a wscons console,
and not run X at all, and see what happens then, if it
is dropping into DDB, you should see it happen, and be
able to interact.

kre



Re: NetBSD 7.1.1 instability - hangs

2018-03-09 Thread Riccardo Mottola

Hi.



On 09/03/2018 17:14, m...@netbsd.org wrote:

I strongly suspect you are running out of RAM due to lang/rust not
respecting MAKE_JOBS and linking in parallel, killing Xorg.

I've seen the same.

You can tell if normal shutdown works fine and you get a message from UVM
about Xorg being killed.

So it's three bugs:

- lang/rust doesn't respect MAKE_JOBS
- Xorg being killed like so is exceptionally fatal, we don't recover from it
   ... this should be possible to fix now.
- We could try to avoid killing Xorg in these circumstances

(this behaviour is caused by something called 'memory overcommit')

as a short term solution:
- Add more swap, so it doesn't blow up


I don't think that is the case - the machine hangs: hitting the power
button does not power down gracefully; I need to hold it for 5 seconds,
and then at reboot there is a filesystem check.

X does not get "killed" either, I do not return to console.
On this system usually the X driver is decently well-behaved, so while 
things work I can switch to terminals, if X was being killed by 
out-of-swap, I'd expect to get to the console again.


I fear that the kernel panics/goes into debugger and I cannot see it 
because X is running.


I can of course resume the rust install and check the memory usage in an
xterm, but if that is the issue, it is being handled much more
ungracefully than it should be.


Riccardo


Re: NetBSD 7.1.1 instability - hangs

2018-03-09 Thread Mike Pumford



On 09/03/2018 16:14, m...@netbsd.org wrote:

I strongly suspect you are running out of RAM due to lang/rust not
respecting MAKE_JOBS and linking in parallel, killing Xorg.

The rust compiler is multi-threaded, so one process can use all the cores
of the system on its own. However, if you have the resources to handle
it, it's still quite a lot faster to build rust with MAKE_JOBS set to a
value >1.


Thankfully my build machine has the RAM to cope but given the average 
size of a rustc process it would be easy to see how it could eat all the 
system RAM and swap.

Mike
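
As a rough illustration of the point above, the check below multiplies a
per-process memory figure by the number of parallel jobs and compares the
result with RAM plus swap. The 1.2GB peak per rustc process and the 1.5GB
of RAM are figures reported elsewhere in this thread for the ThinkPad; the
swap size and the name fits_in_memory are hypothetical. The sketch ignores
memory used by the kernel, X and everything else, so it is optimistic.

```python
def fits_in_memory(jobs, per_job_mb, ram_mb, swap_mb):
    """Return True if `jobs` processes of `per_job_mb` each fit in RAM plus swap."""
    return jobs * per_job_mb <= ram_mb + swap_mb


# Figures reported in this thread: rustc peaked at about 1.2GB, RAM is 1.5GB.
PEAK_MB = 1200
RAM_MB = 1536
# Hypothetical swap size, chosen only for illustration.
SWAP_MB = 512

for jobs in (1, 2, 4):
    verdict = "fits" if fits_in_memory(jobs, PEAK_MB, RAM_MB, SWAP_MB) else "does not fit"
    print(f"{jobs} parallel rustc job(s): {verdict}")
```

With these numbers a single job fits, while two jobs at peak already
exceed RAM and swap combined.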


Re: NetBSD 7.1.1 instability - hangs

2018-03-09 Thread maya

You can confirm this by reading /var/log/messages

It will look like:
UVM pid 123 (X) killed: out of swap
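
The check described above can be sketched as a small log scan. This is a
minimal example, assuming the kernel message looks like the sample line
quoted in this message; the exact wording may differ between kernel
versions, so the pattern is deliberately loose. The name find_swap_kills
is made up for this illustration.

```python
import re

# Pattern modelled on the sample line from this thread:
#   UVM pid 123 (X) killed: out of swap
# Assumption: real log lines may carry a syslog prefix or extra fields,
# so anything between the command name and "killed" is tolerated.
SWAP_KILL = re.compile(r"UVM:?\s+pid\s+(\d+)\s+\(([^)]*)\).*killed: out of swap")


def find_swap_kills(lines):
    """Return a list of (pid, command) for every out-of-swap kill found."""
    hits = []
    for line in lines:
        match = SWAP_KILL.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


# Demonstration on the sample line quoted above.
print(find_swap_kills(["UVM pid 123 (X) killed: out of swap"]))
```

To scan the real log, pass an open file instead of a list, for example
find_swap_kills(open("/var/log/messages", errors="replace")). An empty
result means no such kill was logged in that file.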


Re: NetBSD 7.1.1 instability - hangs

2018-03-09 Thread maya

I strongly suspect you are running out of RAM due to lang/rust not
respecting MAKE_JOBS and linking in parallel, killing Xorg.

I've seen the same.

You can tell if normal shutdown works fine and you get a message from UVM
about Xorg being killed.

So it's three bugs:

- lang/rust doesn't respect MAKE_JOBS
- Xorg being killed like so is exceptionally fatal, we don't recover from it
  ... this should be possible to fix now.
- We could try to avoid killing Xorg in these circumstances

(this behaviour is caused by something called 'memory overcommit')

as a short term solution:
- Add more swap, so it doesn't blow up


NetBSD 7.1.1 instability - hangs

2018-03-09 Thread Riccardo Mottola

Hi All,

I upgraded to 7.1.1 on my ThinkPad R51 (that is, x86 32-bit single core).
The machine was rock stable with 7.1; it is not with 7.1.1.
In a couple of hours it crashed several times, meaning it "hangs" and I
need to power-cycle.


What I do is start a package upgrade (specifically rust, which takes
days) and work inside X11; at some point the computer hangs.

Since X11 is running, I cannot switch consoles either.

If I check "dmesg" there is no information on the previous run/crash.

Once I crashed into the debugger while I was still setting up the
machine, but I did not think to get a trace at that moment; it could
have been meaningful.


Has anybody else had such a bad experience going from 7.1 to 7.1.1?


Riccardo