On Tue, May 14, 2024 at 10:58 PM Mauro Tridici <mauro.trid...@cmcc.it> wrote:
> I will try to solve this issue by myself, but, if you have any interesting
> idea, please, share it with me :)

It is great that you can reproduce the issue reliably - it gives hope that we can find the problem.

I still think something is off on your production machine. So if I were you, I would work towards being able to reproduce the issue on another machine - preferably a VM.

Maybe install a fresh VM with the same OS. Take a snapshot (called A). Then copy all files from production to the VM (most importantly /bin /lib /usr /etc). If you can then reproduce the error take another snapshot (called B).

Then copy files from A to B. Can you make the error disappear? Can you make the error appear if you copy files from B to A?

Is there some sort of monitoring system on production that is not on your VM? Maybe such a system would find it weird to kill off a lot of processes in one go.

Can you trigger the error by:

  seq 10000 | parallel -j 0 sleep &
  sleep 1
  killall -9 sleep

Happy bug hunting

/Ole