Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-21 Thread Kevin Stange
For some additional context, all my hardware is Supermicro and working
great on 4.9.13 - 26.  I have dom0_max_vcpus=2 because of issues I was
having with deadlocked CPU cores before setting that option on 3.18
kernels.  In my experience setting that value doesn't cause any
detriment to the dom0, which isn't doing most of the work anyway.
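
For anyone wanting to try the same limit, here is a minimal sketch of where the option goes, assuming a GRUB 2 host (as on CentOS 7) whose Xen boot entries are generated from /etc/default/grub. The value 2 is just the one quoted above, not a recommendation for every machine, and any options already present on that line should be kept alongside it.

```shell
# /etc/default/grub (GRUB 2; this file uses shell variable syntax)
# Assumption: the Xen boot entry is generated from this variable.
# Keep whatever options are already set here and append dom0_max_vcpus.
GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=2"
```

After regenerating the config (for example with grub2-mkconfig -o /boot/grub2/grub.cfg on a BIOS-booted host) and rebooting, "xl info" should show the option under xen_commandline, and "xl vcpu-list Domain-0" should list two vCPUs. On CentOS 6 with legacy GRUB, the same option is appended to the xen.gz kernel line in grub.conf instead.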

These are all the motherboards I'm running the kernel stably on:

Supermicro X8DT3
Supermicro X8DT6
Supermicro X9DRD-iF/LF
Supermicro X9DRT
Supermicro X9SCL/X9SCM

I'm on CentOS 6 across the board.

On 04/21/2017 05:01 AM, Mark L Sung wrote:
> Hm, it seems there are still stability issues with
> "4.9.2-26.el7.x86_64"; I have recently heard of many issues related to
> Supermicro boards! :-(
> 
> Peace!!!
> 
> On Fri, Apr 21, 2017 at 9:40 AM, Anderson, Dave wrote:
> 
> Good news/bad news testing the new kernel on CentOS7 with my now
> notoriously finicky machines:
> 
> Good news: 4.9.23-26.el7 (grabbed today via yum update) isn't any
> worse than 4.9.13-22 was on my xen hosts (as far as I can tell so
> far at least)
> 
> Bad news: It isn't any better than 4.9.13 was for me either, if I
> don't set vcpu limit in the grub/xen config, it still panics like so:
> 
> [6.716016] CPU: Physical Processor ID: 0
> [6.720199] CPU: Processor Core ID: 0
> [6.724046] mce: CPU supports 2 MCE banks
> [6.728239] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
> [6.733884] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
> [6.740770] Freeing SMP alternatives memory: 32K (821a8000 - 821b)
> [6.750638] ftrace: allocating 34344 entries in 135 pages
> [6.771888] smpboot: Max logical packages: 1
> [6.776363] VPMU disabled by hypervisor.
> [6.780479] Performance Events: SandyBridge events, PMU not available due to virtualization, using software events only.
> [6.792237] NMI watchdog: disabled (cpu0): hardware events not enabled
> [6.798943] NMI watchdog: Shutting down hard lockup detector on all cpus
> [6.805949] installing Xen timer for CPU 1
> [6.810659] installing Xen timer for CPU 2
> [6.815317] installing Xen timer for CPU 3
> [6.819947] installing Xen timer for CPU 4
> [6.824618] installing Xen timer for CPU 5
> [6.829282] installing Xen timer for CPU 6
> [6.833935] installing Xen timer for CPU 7
> [6.838565] installing Xen timer for CPU 8
> [6.843110] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
> [6.849475] [ cut here ]
> [6.854091] kernel BUG at arch/x86/kernel/cpu/common.c:997!
> [6.855864] random: fast init done
> [6.863070] invalid opcode:  [#1] SMP
> [6.867088] Modules linked in:
> [6.870168] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.23-26.el7.x86_64 #1
> [6.877298] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
> [6.883920] task: 880058a6a5c0 task.stack: c900400c
> [6.889840] RIP: e030:[]  [] identify_secondary_cpu+0x57/0x80
> [6.898756] RSP: e02b:c900400c3f08  EFLAGS: 00010086
> [6.904069] RAX: ffe4 RBX: 88005d80a020 RCX: 81e5ffc8
> [6.911201] RDX: 0001 RSI: 0005 RDI: 0005
> [6.918335] RBP: c900400c3f18 R08: 00ce R09:
> [6.925466] R10: 0005 R11: 0006 R12: 0008
> [6.932599] R13:  R14:  R15:
> [6.939735] FS:  () GS:88005d80() knlGS:
> [6.947819] CS:  e033 DS: 002b ES: 002b CR0: 80050033
> [6.953565] CR2:  CR3: 01e07000 CR4: 00042660
> [6.960696] Stack:
> [6.962731]  0008  c900400c3f28 8104ebce
> [6.970205]  c900400c3f40 81029855  c900400c3f50
> [6.977691]  810298d0
> [6.985164] Call Trace:
> [6.987626]  [] smp_store_cpu_info+0x3e/0x40
> [6.993480]  [] cpu_bringup+0x35/0x90
> [6.998700]  [] cpu_bringup_and_idle+0x20/0x40
> [7.004706] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 90 d3 ca 81
> [7.024976] RIP  [] identify_secondary_cpu+0x57/0x80
> [7.031528]  RSP 
> [7.035032] ---[ end trace f2a8d75941398d9f ]---
> [7.039658] Kernel panic - not syncing: 

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-21 Thread Mark L Sung
Hm, it seems there are still stability issues with
"4.9.2-26.el7.x86_64"; I have recently heard of many issues related to
Supermicro boards! :-(

Peace!!!

On Fri, Apr 21, 2017 at 9:40 AM, Anderson, Dave wrote:

> Good news/bad news testing the new kernel on CentOS7 with my now
> notoriously finicky machines:
>
> Good news: 4.9.23-26.el7 (grabbed today via yum update) isn't any worse
> than 4.9.13-22 was on my xen hosts (as far as I can tell so far at least)
>
> Bad news: It isn't any better than 4.9.13 was for me either, if I don't
> set vcpu limit in the grub/xen config, it still panics like so:
>
> [6.716016] CPU: Physical Processor ID: 0
> [6.720199] CPU: Processor Core ID: 0
> [6.724046] mce: CPU supports 2 MCE banks
> [6.728239] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
> [6.733884] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
> [6.740770] Freeing SMP alternatives memory: 32K (821a8000 - 821b)
> [6.750638] ftrace: allocating 34344 entries in 135 pages
> [6.771888] smpboot: Max logical packages: 1
> [6.776363] VPMU disabled by hypervisor.
> [6.780479] Performance Events: SandyBridge events, PMU not available due to virtualization, using software events only.
> [6.792237] NMI watchdog: disabled (cpu0): hardware events not enabled
> [6.798943] NMI watchdog: Shutting down hard lockup detector on all cpus
> [6.805949] installing Xen timer for CPU 1
> [6.810659] installing Xen timer for CPU 2
> [6.815317] installing Xen timer for CPU 3
> [6.819947] installing Xen timer for CPU 4
> [6.824618] installing Xen timer for CPU 5
> [6.829282] installing Xen timer for CPU 6
> [6.833935] installing Xen timer for CPU 7
> [6.838565] installing Xen timer for CPU 8
> [6.843110] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
> [6.849475] [ cut here ]
> [6.854091] kernel BUG at arch/x86/kernel/cpu/common.c:997!
> [6.855864] random: fast init done
> [6.863070] invalid opcode:  [#1] SMP
> [6.867088] Modules linked in:
> [6.870168] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.23-26.el7.x86_64 #1
> [6.877298] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
> [6.883920] task: 880058a6a5c0 task.stack: c900400c
> [6.889840] RIP: e030:[]  [] identify_secondary_cpu+0x57/0x80
> [6.898756] RSP: e02b:c900400c3f08  EFLAGS: 00010086
> [6.904069] RAX: ffe4 RBX: 88005d80a020 RCX: 81e5ffc8
> [6.911201] RDX: 0001 RSI: 0005 RDI: 0005
> [6.918335] RBP: c900400c3f18 R08: 00ce R09:
> [6.925466] R10: 0005 R11: 0006 R12: 0008
> [6.932599] R13:  R14:  R15:
> [6.939735] FS:  () GS:88005d80() knlGS:
> [6.947819] CS:  e033 DS: 002b ES: 002b CR0: 80050033
> [6.953565] CR2:  CR3: 01e07000 CR4: 00042660
> [6.960696] Stack:
> [6.962731]  0008  c900400c3f28 8104ebce
> [6.970205]  c900400c3f40 81029855  c900400c3f50
> [6.977691]  810298d0
> [6.985164] Call Trace:
> [6.987626]  [] smp_store_cpu_info+0x3e/0x40
> [6.993480]  [] cpu_bringup+0x35/0x90
> [6.998700]  [] cpu_bringup_and_idle+0x20/0x40
> [7.004706] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 90 d3 ca 81
> [7.024976] RIP  [] identify_secondary_cpu+0x57/0x80
> [7.031528]  RSP 
> [7.035032] ---[ end trace f2a8d75941398d9f ]---
> [7.039658] Kernel panic - not syncing: Attempted to kill the idle task!
>
> So...other than my work around...that still works...not sure what else I
> can provide in the way of feedback/testing. But if you want anything else
> gathered, let me know.
>
> Thanks,
> -Dave
>
> --
> Dave Anderson
>
>
> > On Apr 19, 2017, at 10:33 AM, Johnny Hughes wrote:
> >
> > On 04/19/2017 12:18 PM, PJ Welsh wrote:
> >>
> >> On Wed, Apr 19, 2017 at 5:40 AM, Johnny Hughes wrote:
> >>
> >>On 04/18/2017 12:39 PM, PJ Welsh wrote:
> >>> Here is something interesting... I went through the BIOS options and
> >>> found that one R710 that *is* functioning only differed in that "Logical
> >>> Processor"/Hyperthreading was *enabled* while the one that is *not*
> >>> functioning had HT *disabled*. Enabled Logical Processor and the system
> >>> starts without issue! I've rebooted 3 times now without issue.
> >>> Dell R710 BIOS version 6.4.0
> >>> 2x