Hi Colin,

On Sat, Jul 4, 2015 at 12:43 AM, Colin Ian King <[email protected]> wrote:
> I was able to hit the following translation fault running
> sudo ./stress-ng --seq 0 -t 60 --syslog --metrics --times -v

I suggest not running stress-ng as root. Failures seen that way may be
less serious, because:

- the root user can do bad things easily, and it is quite easy for root
  to kill any process
- in reality, most loads are run as non-root

If some system processes (irqbalance, systemd-*) are killed only because
stress-ng is running as root, this can be a low-priority issue. Otherwise
we need to pay close attention to it.

I always run 'stress-ng' as the ubuntu user without sudo, which may be
the reason why it is difficult for me to reproduce the problem.

Even with the two new approaches, it is still not easy for me to
reproduce. I saw only one translation fault in 6 hours with your first
approach (./stress-ng --seq 0 ...), and I can't trigger it at all with
your second approach (the bash script).

The log I captured follows [1], and I think it is very likely a userspace
issue. From the irqbalance-dbgsym package, we can easily find that
'PC is at 0x406078' is an address in the text section, and it should be
inside the function 'place_irq_in_node', because the executable isn't
built as relocatable.

One thing I still can't understand is why the fault address is
'0x00000040' in this context.

[1]
[ 3616.333392] Bits 55-60 of /proc/PID/pagemap entries are about to stop being page-shift some time soon. See the linux/Documentation/vm/pagemap.txt for details.
[ 3616.333393] Bits 55-60 of /proc/PID/pagemap entries are about to stop being page-shift some time soon. See the linux/Documentation/vm/pagemap.txt for details.
[ 5316.367265] irqbalance[1457]: unhandled level 2 translation fault (11) at 0x00000040, esr 0x92000006
[ 5316.476937] pgd = ffffffcfb5478000
[ 5316.520692] [00000040] *pgd=0000004fb4a3c003, *pud=0000004fb4a3c003, *pmd=0000000000000000
[ 5316.620270]
[ 5316.638140] CPU: 7 PID: 1457 Comm: irqbalance Not tain-21-generic #21-Ubuntu
[ 5316.733212] Hardware name: HP ProLiant m400 Server Cartridge (DT)
[ 5316.806382] task: ffffffcfb55e6e40 ti: ffffffcfa72b0000 task.ti: ffffffcfa72b0000
[ 5316.896258] PC is at 0x406078
[ 5316.931865] LR is at 0x404100
[ 5316.967457] pc : [<0000000000406078>] lr : [<0000000000404100>] pstate: 20000000
[ 5317.056268] sp : 0000007fc07ff2d0
[ 5317.096038] x29: 0000007fc07ff2d0 x28: 00000000004095a0
[ 5317.160023] x27: 0000000000409548 x26: 000000000041a000
[ 5317.223897] x25: 0000000000405000 x24: 000000000041acf8
[ 5317.287868] x23: 000000000041a000 x22: 000000000041a000
[ 5317.351841] x21: 000000002e0d6050 x20: 000000000041a000
[ 5317.415744] x19: 000000002e0e9020 x18: 0000000000000000
[ 5317.479620] x17: 0000007fb5ac287c x16: 000000000041a188
[ 5317.543490] x15: 003bdd2370f74a1c x14: 2030203020302030
[ 5317.607373] x13: 2030203020302030 x12: 2030203020302030
[ 5317.671263] x11: 2030203020302030 x10: 2030203020302030
[ 5317.735137] x9 : 00000000000000a0 x8 : 0000000000000001
[ 5317.799113] x7 : 0000000000000033 x6 : 000000002e0d6e08
[ 5317.862983] x5 : 0000000000000040 x4 : 0000000000000000
[ 5317.926867] x3 : 000000002e0d7008 x2 : 0000000000000000
[ 5317.990840] x1 : 000000000000002c x0 : 0000000000000003
[ 5318.054713]

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1469214

Title:
  HP ProLiant m400 Server crashes with unhandled level 3 translation fault

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1469214/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
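
As a cross-check on the fault line in the log [1], the 'esr 0x92000006'
value can be decoded by hand. The sketch below is only an illustration,
not part of the original analysis: the function name decode_esr and the
small lookup tables are made up for this example, and only the few
ARMv8 ESR_ELx encodings relevant to this log are included (exception
class in bits [31:26], data fault status code in ISS bits [5:0], WnR in
ISS bit 6).

```python
# Minimal sketch: decode an AArch64 ESR_ELx value such as the one in the log.
# The tables cover only a few encodings; decode_esr is a hypothetical helper.

EXCEPTION_CLASSES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

# Data Fault Status Codes 0b0001LL are translation faults at level LL.
TRANSLATION_FAULT_BASE = 0x04


def decode_esr(esr):
    """Split an ESR value into the fields needed to read a data abort."""
    ec = (esr >> 26) & 0x3F      # exception class, bits [31:26]
    il = (esr >> 25) & 0x1       # instruction length bit, bit [25]
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits [24:0]
    dfsc = iss & 0x3F            # data fault status code, ISS bits [5:0]
    wnr = (iss >> 6) & 0x1       # 1 = fault on a write, 0 = fault on a read

    level = None
    if TRANSLATION_FAULT_BASE <= dfsc <= TRANSLATION_FAULT_BASE + 3:
        level = dfsc - TRANSLATION_FAULT_BASE

    return {
        "ec": ec,
        "ec_name": EXCEPTION_CLASSES.get(ec, "not in this sketch's table"),
        "il": il,
        "dfsc": dfsc,
        "translation_fault_level": level,
        "access": "write" if wnr else "read",
    }


if __name__ == "__main__":
    for key, value in decode_esr(0x92000006).items():
        print(f"{key}: {value}")
```

For 0x92000006 this gives exception class 0x24 (a data abort from
userspace) with fault status 0x06, i.e. a level 2 translation fault on a
read access, which agrees with the "unhandled level 2 translation fault"
text the kernel printed.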
