[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-10 Thread Sami Pietila
This is a new Dell PowerEdge server that was initially installed with Ubuntu 14.04 server. The reported problems seem to occur with all kernels that I have tried (the Ubuntu default server kernel and the v3.15 upstream kernel). -- You received this bug notification because you are a member of

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-10 Thread Christopher M. Penalver
** Description changed: Dell PowerEdge 720 on ubuntu 14.04 shows MCE errors on dmesg. Dell support instructed to run DSET and BIOS hardware diagnostics. Neither of the tools showed any errors. Dell support said that if there was a hardware error it would have been shown on Dell logs and

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-09 Thread Joseph Salisbury
Did this issue start happening after an update/upgrade? Was there a kernel version where you were not having this particular problem? This will help determine if the problem you are seeing is the result of the introduction of a regression, and when this regression was introduced. If this is a

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-08 Thread Sami Pietila
Now the logs also show the MCE error, so there seems to be no difference in behavior between v3.15 and the Ubuntu standard kernel. [78132.360975] EDAC sbridge MC1: HANDLING MCE MEMORY ERROR [78132.403996] EDAC sbridge MC1: CPU 1: Machine Check Event: 0 Bank 10: 8c46000800c1 [78132.448800] EDAC
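When comparing kernels it can help to pull the MCE events out of dmesg in a structured form. The following is a minimal sketch, assuming the line layout quoted in the message above; the function name `parse_mce_events` and the field names are illustrative, and the meaning of the value after "Machine Check Event:" is not interpreted here.

```python
import re

# Pattern for lines such as:
# [78132.403996] EDAC sbridge MC1: CPU 1: Machine Check Event: 0 Bank 10: 8c46000800c1
# The layout is assumed from the dmesg excerpt quoted in this thread.
MCE_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+EDAC\s+(?P<driver>\w+)\s+(?P<mc>MC\d+):\s+"
    r"CPU\s+(?P<cpu>\d+):\s+Machine Check Event:\s+(?P<event>[0-9a-fA-F]+)\s+"
    r"Bank\s+(?P<bank>\d+):\s+(?P<status>[0-9a-fA-F]+)"
)

def parse_mce_events(text):
    """Return one dict per EDAC machine check line found in the dmesg text."""
    events = []
    for m in MCE_LINE.finditer(text):
        events.append({
            "timestamp": float(m.group("ts")),  # seconds since boot
            "driver": m.group("driver"),        # e.g. "sbridge"
            "controller": m.group("mc"),        # e.g. "MC1"
            "cpu": int(m.group("cpu")),
            "event": m.group("event"),          # kept as a raw hex string
            "bank": int(m.group("bank")),
            "status": m.group("status"),        # kept as a raw hex string
        })
    return events
```

Feeding it the output of `dmesg` from each kernel under test gives a list that can be grouped by controller, CPU, or bank to see whether the same component is reported every time.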

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-08 Thread Sami Pietila
Here are error messages from dmesg: There are lines like: [30187.335401] kernel BUG at /home/apw/COD/linux/mm/memory.c:3924! [30187.337183] invalid opcode: [#1] SMP ... [30223.621247] WARNING: CPU: 12 PID: 29190 at /home/apw/COD/linux/kernel/watchdog.c:249

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-08 Thread Sami Pietila
Top is also showing load averages of about 60 - 70, but from the process list the system looks pretty much idle. ** Attachment added: top.png https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1315736/+attachment/4107851/+files/top.png

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-07 Thread Sami Pietila
Hi, I ran the memory test and no errors were detected. I also changed to the mainline kernel. With the mainline kernel (3.15.0-031500rc4-generic #201405042135 SMP) I have not yet seen an MCE error or had an unresponsive system; however, I can still see some errors in dmesg: [ 840.160260]

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-07 Thread Joseph Salisbury
Can you also give the latest 3.13 upstream kernel a test? It can be downloaded from: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.11-trusty/ -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-07 Thread Sami Pietila
It may be, after all, that mainline v3.15 behaves like the default Ubuntu kernel, as the server just became unresponsive. However, I noticed that I am able to log in, but only as root and only from the console. The server seems to be in a weird state, as the ps aux command shows about half of the

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-07 Thread Sami Pietila
I am not sure if I have understood these tools correctly, but does vmstat show that the CPUs are idle while the uptime command shows that the system load is about 40? ** Attachment added: load.png https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1315736/+attachment/4107526/+files/load.png
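On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable tasks (state R), so a high load with idle CPUs is possible when many tasks are blocked. The following is a minimal sketch for counting tasks by state, assuming the usual /proc/[pid]/stat layout; the function names are illustrative and do not come from this bug report.

```python
import glob
from collections import Counter

def task_state(stat_text):
    """Extract the one-letter state from the contents of /proc/[pid]/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so the state is the first field after the LAST ')'.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]

def count_states(stat_texts):
    """Count tasks per state letter (R, S, D, Z, ...)."""
    return Counter(task_state(t) for t in stat_texts)

def load_contributors(counts):
    """Tasks that count toward the load average: runnable plus uninterruptible."""
    return counts.get("R", 0) + counts.get("D", 0)

def read_proc_stats():
    """Read the stat file of every process; processes may exit while we scan."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                texts.append(f.read())
        except OSError:
            continue  # process vanished between glob and open
    return texts
```

Calling `count_states(read_proc_stats())` on the affected server would show whether the load comes mostly from tasks in state D; this scans processes only, not their individual threads.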

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-05 Thread Sami Pietila
I am also seeing messages like: BUG: soft lockup - CPU#25 stuck for 23s! -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1315736 Title: Machine Check Exception Status in “linux” package

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-05 Thread Sami Pietila
It seems that putting the server under load results in an unresponsive server with the console constantly flooded with error messages. A screenshot is attached. ** Attachment added: dmesg.png https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1315736/+attachment/4105408/+files/dmesg.png

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-05 Thread Joseph Salisbury
Also, would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.15 kernel[0]. If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'. If the mainline kernel does

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-05 Thread Joseph Salisbury
Can you also perform a memory test, which can be accessed from the GRUB menu? If you haven't gone to the GRUB menu before, it can be accessed by holding the SHIFT key after system power-on and seeing the BIOS messages. ** Changed in: linux (Ubuntu) Importance: Undecided => Medium ** Changed

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-04 Thread Sami Pietila
dmesg also has the following message: [ 9441.626809] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756! [ 9441.628777] invalid opcode: [#1] SMP [ 9441.630053] Modules linked in: ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_limit xt_tcpudp xt_addrtype

[Kernel-packages] [Bug 1315736] Re: Machine Check Exception

2014-05-03 Thread Sami Pietila
apport information ** Tags added: apport-collected ** Description changed: Dell PowerEdge 720 on ubuntu 14.04 shows MCE errors on dmesg. Dell support instructed to run DSET and BIOS hardware diagnostics. Neither of the tools showed any errors. Dell support said that if there was a