Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

2015-07-13 Thread dann frazier
On Mon, Jul 13, 2015 at 9:27 AM, Ming Lei 1469...@bugs.launchpad.net wrote: Dann, Please follow the steps in #12, in which you should trigger the crash in 4 minutes. I've been running that in a loop and I'm currently on iteration #76 w/o a crash :( Maybe it's Linux ms10-33-mcdivittB0

Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

2015-07-09 Thread dann frazier
On Tue, Jul 7, 2015 at 2:25 AM, Ming Lei 1469...@bugs.launchpad.net wrote: On Tue, Jul 7, 2015 at 11:16 AM, Ming Lei ming@canonical.com wrote: It looks like there are two kinds of translation fault from irqbalance: 1) one happened in place_irq_in_node(), which can be reproduced with the vivid package 2) the 2nd

Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

2015-07-07 Thread Ming Lei
On Tue, Jul 7, 2015 at 11:16 AM, Ming Lei ming@canonical.com wrote: It looks like there are two kinds of translation fault from irqbalance: 1) one happened in place_irq_in_node(), which can be reproduced with the vivid package 2) the 2nd one happened in glib2, which I built myself, because irqbalance can

Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

2015-07-06 Thread Ming Lei
On Tue, Jul 7, 2015 at 2:37 AM, Colin Ian King 1469...@bugs.launchpad.net wrote: captured irqbalance segfaulting: Program received signal SIGSEGV, Segmentation fault. 0x00408f8c in place_irq_in_node (info=0x2c3d0050, data=0x0) at placement.c:145 145 if

Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

2015-07-06 Thread Ming Lei
It looks like there are two kinds of translation fault from irqbalance: 1) one happened in place_irq_in_node(), which can be reproduced with the vivid package 2) the 2nd one happened in glib2, which I built myself, because irqbalance can choose to use its own local glib if there isn't glib2 available, and the

Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

2015-07-06 Thread Ming Lei
On Mon, Jul 6, 2015 at 9:28 PM, Colin Ian King 1469...@bugs.launchpad.net wrote: I re-ran this today with the following script as a non-root user: #!/bin/bash tests=affinity aio bigheap brk bsearch cache chdir chmod clock context cpu crypt dentry dir dup epoll eventfd fstat fallocate fault

Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

2015-07-03 Thread Ming Lei
Hi Colin, On Sat, Jul 4, 2015 at 12:43 AM, Colin Ian King 1469...@bugs.launchpad.net wrote: I was able to hit the following translation fault running sudo ./stress-ng --seq 0 -t 60 --syslog --metrics --times -v I suggest not running stress-ng as root, otherwise it can be less serious because:

Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

2015-07-03 Thread Ming Lei
Hi Colin, That looks like progress, but it still takes time to reproduce, and I will use your new approach to reproduce it. When you are doing that, could you dump the file /proc/$(pidof irqbalance)/maps so that we can see where the faulting addresses are in the process's vm space? thanks,
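
Once a maps dump like the one requested above is available, one way to use it is to check which mapping contains a given faulting address. The following is a minimal sketch and was not posted in the thread: `find_mapping` is a hypothetical helper name, and the sketch assumes user-space addresses small enough for the shell's signed integer arithmetic.

```shell
# find_mapping ADDR: read /proc/<pid>/maps-style lines on stdin and print
# the first mapping whose [start, end) range contains ADDR (hex, 0x prefix).
# Hypothetical helper for illustration; not part of irqbalance or the thread.
find_mapping() {
    _addr=$(( $1 ))
    while read -r _range _rest; do
        # Skip anything that does not begin with a "start-end" range.
        case "$_range" in
            *-*) ;;
            *) continue ;;
        esac
        _start=$(( 0x${_range%-*} ))   # text before the dash
        _end=$(( 0x${_range#*-} ))     # text after the dash
        if [ "$_addr" -ge "$_start" ] && [ "$_addr" -lt "$_end" ]; then
            printf '%s %s\n' "$_range" "$_rest"
            return 0
        fi
    done
    return 1   # no mapping contains the address
}
```

For the SIGSEGV address reported earlier in the thread, this could be run as `find_mapping 0x00408f8c < "/proc/$(pidof irqbalance)/maps"`. A non-zero exit status means the address falls outside every mapping in the dump.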