Re: kernel BUG at arch/x86/kernel/traps.c:643! when run Redhat7(v3.10) in kvm guest

2016-11-06 Thread Kefeng Wang
Hi all, any ideas? Thanks.

+ gonglei, haibin

On 2016/10/17 15:51, Kefeng Wang wrote:
> 
> 
> On 2016/10/15 2:36, Andy Lutomirski wrote:
>> On Thu, Oct 13, 2016 at 11:14 PM, Kefeng Wang wrote:
>>> Hi all,
>>>
>>> We hit a BUG_ON in do_device_not_available() (the FPU exception handler)
>>> when running Red Hat 7 in a KVM guest. No special test was running on this
>>> guest, only some network packet receive and transmit traffic.
>>>
>>> I checked a newer kernel version and found this commit:
>>> 4ecd16ec7059390b430af34bd8bc3ca2b5dcef9a
>>> Author: Andy Lutomirski 
>>> Date:   Sun Jan 24 14:38:06 2016 -0800
>>>
>>> x86/fpu: Fix math emulation in eager fpu mode
>>>
>>> Systems without an FPU are generally old and therefore use lazy FPU
>>> switching. Unsurprisingly, math emulation in eager FPU mode is a
>>> bit buggy. Fix it.
>>>
>>> There were two bugs involving kernel code trying to use the FPU
>>> registers in eager mode even if they didn't exist and one BUG_ON()
>>> that was incorrect.
>>>
>>>
>>> The commit says the BUG_ON() is incorrect, but I am not familiar with
>>> eager FPU. Why is the BUG_ON incorrect?
>>> Should we backport the patch to v3.10, or is there a bug in qemu-kvm?
>>> Any reply will be appreciated.
>>
>> The BUG_ON was incorrect because you could hit it if FPU emulation was
>> enabled.  But, unless you explicitly set the "eagerfpu=" option or you
>> have some really weird set of cpu flags, old kernels shouldn't have
>> hit it.  Is the cpuinfo you pasted below from the guest?  Also, could
>> you attach whatever dmesg has to say about FPU in a crashing guest?
>>
>> --Andy
>>
> 
> Hi Andy and Rik, thanks for your quick response.
> 
> Attaching more information. We have no special FPU configuration, and we
> have hit this issue only once (we cannot reproduce it).
> 
> 
> 1) cmdline
> 
> BOOT_IMAGE=/vmlinuz-3.10.0-229.20.1.x86_64 root=/dev/vda2 oops=panic 
> softlockup_panic=1 net.ifnames=0 biosdevname=0 nmi_watchdog=1 selinux=0 
> console=tty0 panic=3
> 
> 
> 2) virsh dumpxml 3 (only cpu parts)
> ---
>  1600
>   1600
>   8
>   
> /machine
>   
>   
> hvm
> 
>   
>   
> 
> 
> 
>   
>   
> 
>   
> ---
> 
> 3) The host os cpuinfo
> ---
> processor   : 0
> vendor_id   : GenuineIntel
> cpu family  : 6
> model   : 45
> model name  : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
> stepping: 7
> microcode   : 1808
> cpu MHz : 2899.894
> cache size  : 20480 KB
> physical id : 0
> siblings: 16
> core id : 0
> cpu cores   : 8
> apicid  : 0
> initial apicid  : 0
> fpu : yes
> fpu_exception   : yes
> cpuid level : 13
> wp  : yes
> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
> xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl
> vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt
> tsc_deadline_timer aes xsave avx lahf_lm arat epb xsaveopt pln pts
> dtherm tpr_shadow vnmi flexpriority ept vpid
> bogomips: 5799.78
> clflush size: 64
> cache_alignment : 64
> address sizes   : 46 bits physical, 48 bits virtual
> power management:
> 
> 
> 
> 
> 4) The guest os cpuinfo
> 
>>> [2] The /proc/cpuinfo shows below(show only the first cpu0),
>>> 
>>> localhost:~ # cat /proc/cpuinfo
>>> processor   : 0
>>> vendor_id   : GenuineIntel
>>> cpu family  : 6
>>> model   : 45
>>> model name  : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
>>> stepping: 7
>>> microcode   : 0x1
>>> cpu MHz : 2899.992
>>> cache size  : 4096 KB
>>> physical id : 0
>>> siblings: 8
>>> core id : 0
>>> cpu cores   : 8
>>> apicid  : 0
>>> initial apicid  : 0
>>> fpu : yes
>>> fpu_exception   : yes
>>> cpuid level : 13
>>> wp  : yes
>>> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
>>> cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm 
>>> constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 
>>> pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx 
>>> hypervisor lahf_lm xsaveopt
>>> bogomips: 5799.98
>>> clflush size: 64
>>> cache_alignment : 64
>>> address sizes   : 42 bits physical, 48 bits virtual
>>> power management:
>>>
>>
> 
> 5) parts of bootmsg
> 
> 
>  [0.00] Booting paravirtualized kernel on KVM
>  [0.00] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:8 nr_cpu_ids:8 
> nr_node_ids:1
>  [0.00] PERCPU: Embedded 28 pages/cpu @88040f40 s82816 r8192 
> d23680 u262144
>  [0.00] KVM setup async PF for cpu 0
>  [0.00] Built 1 zonelists in 

Re: kernel BUG at arch/x86/kernel/traps.c:643! when run Redhat7(v3.10) in kvm guest

2016-10-17 Thread Kefeng Wang


On 2016/10/15 2:36, Andy Lutomirski wrote:
> On Thu, Oct 13, 2016 at 11:14 PM, Kefeng Wang wrote:
>> Hi all,
>>
>> We hit a BUG_ON in do_device_not_available() (the FPU exception handler)
>> when running Red Hat 7 in a KVM guest. No special test was running on this
>> guest, only some network packet receive and transmit traffic.
>>
>> I checked a newer kernel version and found this commit:
>> 4ecd16ec7059390b430af34bd8bc3ca2b5dcef9a
>> Author: Andy Lutomirski 
>> Date:   Sun Jan 24 14:38:06 2016 -0800
>>
>> x86/fpu: Fix math emulation in eager fpu mode
>>
>> Systems without an FPU are generally old and therefore use lazy FPU
>> switching. Unsurprisingly, math emulation in eager FPU mode is a
>> bit buggy. Fix it.
>>
>> There were two bugs involving kernel code trying to use the FPU
>> registers in eager mode even if they didn't exist and one BUG_ON()
>> that was incorrect.
>>
>>
>> The commit says the BUG_ON() is incorrect, but I am not familiar with
>> eager FPU. Why is the BUG_ON incorrect?
>> Should we backport the patch to v3.10, or is there a bug in qemu-kvm?
>> Any reply will be appreciated.
> 
> The BUG_ON was incorrect because you could hit it if FPU emulation was
> enabled.  But, unless you explicitly set the "eagerfpu=" option or you
> have some really weird set of cpu flags, old kernels shouldn't have
> hit it.  Is the cpuinfo you pasted below from the guest?  Also, could
> you attach whatever dmesg has to say about FPU in a crashing guest?
> 
> --Andy
> 

Hi Andy and Rik, thanks for your quick response.

Attaching more information. We have no special FPU configuration, and we
have hit this issue only once (we cannot reproduce it).


1) cmdline

BOOT_IMAGE=/vmlinuz-3.10.0-229.20.1.x86_64 root=/dev/vda2 oops=panic 
softlockup_panic=1 net.ifnames=0 biosdevname=0 nmi_watchdog=1 selinux=0 
console=tty0 panic=3


2) virsh dumpxml 3 (only cpu parts)
---
 1600
  1600
  8
  
/machine
  
  
hvm

  
  



  
  

  
---

3) The host os cpuinfo
---
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 45
model name  : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
stepping: 7
microcode   : 1808
cpu MHz : 2899.894
cache size  : 20480 KB
physical id : 0
siblings: 16
core id : 0
cpu cores   : 8
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt
tsc_deadline_timer aes xsave avx lahf_lm arat epb xsaveopt pln pts
dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips: 5799.78
clflush size: 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:




4) The guest os cpuinfo

>> [2] The /proc/cpuinfo shows below(show only the first cpu0),
>> 
>> localhost:~ # cat /proc/cpuinfo
>> processor   : 0
>> vendor_id   : GenuineIntel
>> cpu family  : 6
>> model   : 45
>> model name  : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
>> stepping: 7
>> microcode   : 0x1
>> cpu MHz : 2899.992
>> cache size  : 4096 KB
>> physical id : 0
>> siblings: 8
>> core id : 0
>> cpu cores   : 8
>> apicid  : 0
>> initial apicid  : 0
>> fpu : yes
>> fpu_exception   : yes
>> cpuid level : 13
>> wp  : yes
>> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
>> cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm 
>> constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 
>> pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor 
>> lahf_lm xsaveopt
>> bogomips: 5799.98
>> clflush size: 64
>> cache_alignment : 64
>> address sizes   : 42 bits physical, 48 bits virtual
>> power management:
>>
> 
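
For readers comparing the two cpuinfo dumps above, a minimal Python sketch of a flag-set comparison may help. The function name `flag_diff` is hypothetical, the flag strings are copied from the host and guest listings in this message, and the host flag `pln` is assumed to be the token that the mail wrapping split into "pl" and "n".

```python
# Flag lists copied from the host and guest /proc/cpuinfo dumps in this thread.
# Assumption: "pln" was split across two lines by mail wrapping.
HOST_FLAGS = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16
xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave
avx lahf_lm arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority
ept vpid
"""

GUEST_FLAGS = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc
arch_perfmon rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1
sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm
xsaveopt
"""


def flag_diff(host: str, guest: str):
    """Return (flags only in guest, flags only in host), each sorted."""
    host_set, guest_set = set(host.split()), set(guest.split())
    return sorted(guest_set - host_set), sorted(host_set - guest_set)


guest_only, host_only = flag_diff(HOST_FLAGS, GUEST_FLAGS)
print(guest_only)  # ['eagerfpu', 'hypervisor']
print(host_only)   # host features that are not exposed to the guest
```

Both lists contain `fpu` and `xsaveopt`; the only flags present in the guest but not in the host listing are `eagerfpu` and `hypervisor`.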

5) parts of bootmsg


 [0.00] Booting paravirtualized kernel on KVM
 [0.00] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:8 nr_cpu_ids:8 
nr_node_ids:1
 [0.00] PERCPU: Embedded 28 pages/cpu @88040f40 s82816 r8192 
d23680 u262144
 [0.00] KVM setup async PF for cpu 0
 [0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total 
pages: 3933317
 [0.00] Policy zone: Normal
 [0.00] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.20.1.x86_64 
root=/dev/vda2 oops=panic softlockup_panic=1 net.ifnames=0 biosdevname=0 
nmi_watchdog=1 selinux=0 console=tty0 panic=3
 [0.00] PID hash table entries: 4096 

Re: kernel BUG at arch/x86/kernel/traps.c:643! when run Redhat7(v3.10) in kvm guest

2016-10-14 Thread Andy Lutomirski
On Thu, Oct 13, 2016 at 11:14 PM, Kefeng Wang wrote:
> Hi all,
>
> We hit a BUG_ON in do_device_not_available() (the FPU exception handler)
> when running Red Hat 7 in a KVM guest. No special test was running on this
> guest, only some network packet receive and transmit traffic.
>
> I checked a newer kernel version and found this commit:
> 4ecd16ec7059390b430af34bd8bc3ca2b5dcef9a
> Author: Andy Lutomirski 
> Date:   Sun Jan 24 14:38:06 2016 -0800
>
> x86/fpu: Fix math emulation in eager fpu mode
>
> Systems without an FPU are generally old and therefore use lazy FPU
> switching. Unsurprisingly, math emulation in eager FPU mode is a
> bit buggy. Fix it.
>
> There were two bugs involving kernel code trying to use the FPU
> registers in eager mode even if they didn't exist and one BUG_ON()
> that was incorrect.
>
>
> The commit says the BUG_ON() is incorrect, but I am not familiar with
> eager FPU. Why is the BUG_ON incorrect?
> Should we backport the patch to v3.10, or is there a bug in qemu-kvm?
> Any reply will be appreciated.

The BUG_ON was incorrect because you could hit it if FPU emulation was
enabled.  But, unless you explicitly set the "eagerfpu=" option or you
have some really weird set of cpu flags, old kernels shouldn't have
hit it.  Is the cpuinfo you pasted below from the guest?  Also, could
you attach whatever dmesg has to say about FPU in a crashing guest?

--Andy
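
The two checks requested above (was an "eagerfpu=" option set, and what does dmesg say about the FPU) could be scripted roughly as follows. This is only a sketch: `eagerfpu_option` and `fpu_lines` are hypothetical helper names, and the sample command line is the one posted elsewhere in this thread.

```python
import re

# Kernel command line as posted in this thread.
SAMPLE_CMDLINE = (
    "BOOT_IMAGE=/vmlinuz-3.10.0-229.20.1.x86_64 root=/dev/vda2 oops=panic "
    "softlockup_panic=1 net.ifnames=0 biosdevname=0 nmi_watchdog=1 selinux=0 "
    "console=tty0 panic=3"
)


def eagerfpu_option(cmdline: str):
    """Return the value of an 'eagerfpu=' boot parameter, or None if absent."""
    for token in cmdline.split():
        if token.startswith("eagerfpu="):
            return token.split("=", 1)[1]
    return None


def fpu_lines(dmesg_text: str):
    """Return the lines of a dmesg dump that mention 'fpu' (any case)."""
    pattern = re.compile(r"fpu", re.IGNORECASE)
    return [line for line in dmesg_text.splitlines() if pattern.search(line)]


print(eagerfpu_option(SAMPLE_CMDLINE))  # None: no eagerfpu= on this command line
```

On a live guest the inputs would come from /proc/cmdline and from the output of `dmesg`.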

>
> Thanks,
> Kefeng
>
> [1] BUG_ON
> 
> [347134.486436] [ cut here ]
> [347134.487310] kernel BUG at arch/x86/kernel/traps.c:643!
> [347134.487398] invalid opcode:  [#1] SMP
> [347134.500532] Modules linked in:loop binfmt_misc nf_log_ipv4 nf_log_common 
> xt_LOG softdog ipmi_devintf ipmi_msghandler xfs libcrc32c tipc squashfs 
> ipt_REJECT iptable_filter ip_tables dm_mod crct10dif_pclmul crct10dif_common 
> crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw ppdev gf128mul 
> i2c_piix4 glue_helper ablk_helper i2c_core cryptd serio_raw parport_pc 
> parport pcspkr ext3 mbcache jbd ata_generic pata_acpi virtio_console(OVE) 
> virtio_blk(OVE) virtio_balloon(OVE) virtio_net(OVE) ata_piix libata floppy 
> virtio_pci(OVE) virtio_ring(OVE) virtio(OVE)
> [347134.525182] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G   O E 
> V---   3.10.0-229.20.1.x86_64 #1 SMP Mon Apr 18 11:26:55 UTC 2016
> [347134.525182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.1-0-g4adadbd-20160318_175052-HGH108214 04/01/2014
> [347134.525182] task: 8803fa2c5c00 ti: 8803fa2ec000 task.ti: 
> 8803fa2ec000
> [347134.525182] RIP: 0010:[]  [] 
> do_device_not_available+0x13/0x60
> [347134.525182] RSP: 0018:8803fa2abc80  EFLAGS: 00010046
> [347134.525182] RAX: 8160ecec RBX:  RCX: 
> 8160ecec
> [347134.525182] RDX:  RSI:  RDI: 
> 8803fa2abc98
> [347134.525182] RBP: 8803fa2abc88 R08:  R09: 
> 
> [347134.525182] R10: 0001 R11: 0005 R12: 
> 8803fa2c5c00
> [347134.525182] R13: 88040f550a40 R14: 8803fa2c6298 R15: 
> 0005
> [347134.544262] FS:  () GS:88040f54() 
> knlGS:
> [347134.544262] CS:  0010 DS:  ES:  CR0: 8005003b
> [347134.544262] CR2: 7f15a4ac0e30 CR3: 0003f5081000 CR4: 
> 000407e0
> [347134.544262] DR0:  DR1:  DR2: 
> 
> [347134.544262] DR3:  DR6: 0ff0 DR7: 
> 0400
> [347134.544262] Stack:
> [347134.544262]  0001 8803fa2abd88 81618d8e 
> 0005
> [347134.544262]  8803fa2c6298 88040f550a40 8803fa2c5c00 
> 8803fa2abd88
> [347134.544262]  8803fa284500 0005 0001 
> 
> [347134.544262] Call Trace:
> [347134.544262] Code: 81 a4 24 90 00 00 00 ff fe ff ff e9 df fe ff ff e8 c3 
> f5 a5 ff 0f 1f 00 66 66 66 66 90 55 48 89 e5 53 66 66 66 66 90 31 db 66 90 
> <0f> 0b 0f 1f 00 65 8b 1c 25 74 f1 00 00 e8 fb 86 b4 ff eb ea 66
> [347134.544262] RIP  [] do_device_not_available+0x13/0x60
> [347134.544262]  RSP 
> [347134.580457] ---[ end trace 7c0ed2be7ded5c73 ]---
> [347134.580457] Kernel panic - not syncing: Fatal exception
> [347134.580457] Shutting down cpus with NMI
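
As an aid to reading the register dump above, here is a small Python sketch that decodes the CR0 value. It assumes the printed "CR0: 8005003b" is the full register with leading zeros dropped by the archive; the bit positions are the architectural ones, and `decode_cr0` is a hypothetical helper name.

```python
# Architectural CR0 flag bit positions on x86.
CR0_BITS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}


def decode_cr0(value: int):
    """Return the names of the CR0 flags set in `value`, lowest bit first."""
    ordered = sorted(CR0_BITS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if value & (1 << bit)]


cr0 = 0x8005003B  # assumption: the value printed in the oops, leading zeros dropped
print(decode_cr0(cr0))  # ['PE', 'MP', 'TS', 'ET', 'NE', 'WP', 'AM', 'PG']
```

Under that assumption TS (task switched) is set and EM (emulation) is clear. A device-not-available exception is raised when an FPU instruction executes with either bit set, so this dump is consistent with a TS-triggered trap rather than math emulation. It is a reading of the dump, not a diagnosis.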
>
>
>
> [2] The /proc/cpuinfo shows below(show only the first cpu0),
> 
> localhost:~ # cat /proc/cpuinfo
> processor   : 0
> vendor_id   : GenuineIntel
> cpu family  : 6
> model   : 45
> model name  : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
> stepping: 7
> microcode   : 0x1
> cpu MHz : 2899.992
> cache size  : 4096 KB
> physical id : 0
> siblings: 8
> core id : 0
> cpu cores   : 8
> apicid  : 0
> initial apicid  : 0
> fpu : yes
> 

Re: kernel BUG at arch/x86/kernel/traps.c:643! when run Redhat7(v3.10) in kvm guest

2016-10-14 Thread Rik van Riel
On Fri, 2016-10-14 at 14:14 +0800, Kefeng Wang wrote:
> Hi all,
> 
> We hit a BUG_ON in do_device_not_available() (the FPU exception handler)
> when running Red Hat 7 in a KVM guest. No special test was running on this
> guest, only some network packet receive and transmit traffic.
> 
> I checked a newer kernel version and found this commit:
> 4ecd16ec7059390b430af34bd8bc3ca2b5dcef9a
> Author: Andy Lutomirski 
> Date:   Sun Jan 24 14:38:06 2016 -0800
> 
> x86/fpu: Fix math emulation in eager fpu mode
> 
> Systems without an FPU are generally old and therefore use lazy
> FPU
> switching. Unsurprisingly, math emulation in eager FPU mode is a
> bit buggy. Fix it.

This patch is for "systems without an FPU"

Before we go on, I would like to know what kind of CPU
you tell your KVM virtual machine to present to the guest,
and what kind of CPU your host system has.

Are you by any chance configuring the CPU inside your
virtual machine without an FPU?  It is possible to mask
out bits presented in the CPUID result inside a KVM
guest, so I suspect this is possible...
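
A quick way to see what the guest was actually given is to look at the flags line of /proc/cpuinfo inside the guest. The following is only a minimal sketch in Python, assuming the usual "flags : ..." layout; the helper name cpu_flags and the sample text are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line in the given text


# Hypothetical, shortened sample; a real guest lists many more flags.
SAMPLE = "processor : 0\nfpu : yes\nflags : fpu vme de eagerfpu xsave avx hypervisor\n"

flags = cpu_flags(SAMPLE)
print("fpu present: ", "fpu" in flags)
print("eagerfpu set:", "eagerfpu" in flags)
```

Inside the guest, the text to pass in would be the contents of /proc/cpuinfo. A missing "fpu" flag would point at a masked CPUID bit.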

> There were two bugs involving kernel code trying to use the FPU
> registers in eager mode even if they didn't exist and one
> BUG_ON()
> that was incorrect.
> 
> 
> The BUG_ON() is incorrect, but I have no idea about eager fpu, why
> the BUG_ON is incorrect?
> Should we backport the patch to v3.10, or is there some bugs in the
> qemu-kvm?
> Any reply will be appreciated.
> 
> Thanks,
> Kefeng
> 
> [1] BUG_ON
> 
> [347134.486436] [ cut here ]
> [347134.487310] kernel BUG at arch/x86/kernel/traps.c:643!
> [347134.487398] invalid opcode:  [#1] SMP
> [347134.500532] Modules linked in:loop binfmt_misc nf_log_ipv4
> nf_log_common xt_LOG softdog ipmi_devintf ipmi_msghandler xfs
> libcrc32c tipc squashfs ipt_REJECT iptable_filter ip_tables dm_mod
> crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel
> ghash_clmulni_intel aesni_intel lrw ppdev gf128mul i2c_piix4
> glue_helper ablk_helper i2c_core cryptd serio_raw parport_pc parport
> pcspkr ext3 mbcache jbd ata_generic pata_acpi virtio_console(OVE)
> virtio_blk(OVE) virtio_balloon(OVE) virtio_net(OVE) ata_piix libata
> floppy virtio_pci(OVE) virtio_ring(OVE) virtio(OVE)
> [347134.525182] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G   O
> E V---   3.10.0-229.20.1.x86_64 #1 SMP Mon Apr 18 11:26:55
> UTC 2016
> [347134.525182] Hardware name: QEMU Standard PC (i440FX + PIIX,
> 1996), BIOS rel-1.8.1-0-g4adadbd-20160318_175052-HGH108214
> 04/01/2014
> [347134.525182] task: 8803fa2c5c00 ti: 8803fa2ec000 task.ti:
> 8803fa2ec000
> [347134.525182] RIP: 0010:[]  []
> do_device_not_available+0x13/0x60
> [347134.525182] RSP: 0018:8803fa2abc80  EFLAGS: 00010046
> [347134.525182] RAX: 8160ecec RBX:  RCX:
> 8160ecec
> [347134.525182] RDX:  RSI:  RDI:
> 8803fa2abc98
> [347134.525182] RBP: 8803fa2abc88 R08:  R09:
> 
> [347134.525182] R10: 0001 R11: 0005 R12:
> 8803fa2c5c00
> [347134.525182] R13: 88040f550a40 R14: 8803fa2c6298 R15:
> 0005
> [347134.544262] FS:  () GS:88040f54()
> knlGS:
> [347134.544262] CS:  0010 DS:  ES:  CR0: 8005003b
> [347134.544262] CR2: 7f15a4ac0e30 CR3: 0003f5081000 CR4:
> 000407e0
> [347134.544262] DR0:  DR1:  DR2:
> 
> [347134.544262] DR3:  DR6: 0ff0 DR7:
> 0400
> [347134.544262] Stack:
> [347134.544262]  0001 8803fa2abd88 81618d8e
> 0005
> [347134.544262]  8803fa2c6298 88040f550a40 8803fa2c5c00
> 8803fa2abd88
> [347134.544262]  8803fa284500 0005 0001
> 
> [347134.544262] Call Trace:
> [347134.544262] Code: 81 a4 24 90 00 00 00 ff fe ff ff e9 df fe ff ff
> e8 c3 f5 a5 ff 0f 1f 00 66 66 66 66 90 55 48 89 e5 53 66 66 66 66 90
> 31 db 66 90 <0f> 0b 0f 1f 00 65 8b 1c 25 74 f1 00 00 e8 fb 86 b4 ff
> eb ea 66
> [347134.544262] RIP  []
> do_device_not_available+0x13/0x60
> [347134.544262]  RSP 
> [347134.580457] ---[ end trace 7c0ed2be7ded5c73 ]---
> [347134.580457] Kernel panic - not syncing: Fatal exception
> [347134.580457] Shutting down cpus with NMI
> 
> 
> 
> [2] The /proc/cpuinfo shows below(show only the first cpu0),
> 
> localhost:~ # cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family: 6
> model : 45
> model name: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
> stepping  : 7
> microcode : 0x1
> cpu MHz   : 2899.992
> cache size: 4096 KB
> physical id   : 0
> siblings  : 8
> core id   : 0
> cpu cores : 8
> apicid: 0
> initial apicid 

kernel BUG at arch/x86/kernel/traps.c:643! when run Redhat7(v3.10) in kvm guest

2016-10-14 Thread Kefeng Wang
Hi all,

We hit a BUG_ON() in do_device_not_available() (the FPU exception handler) while
running Red Hat 7 in a KVM guest. No special test was running on this guest, only
some network packet reception and transmission.

I checked a newer kernel version and found this commit:
4ecd16ec7059390b430af34bd8bc3ca2b5dcef9a
Author: Andy Lutomirski 
Date:   Sun Jan 24 14:38:06 2016 -0800

x86/fpu: Fix math emulation in eager fpu mode

Systems without an FPU are generally old and therefore use lazy FPU
switching. Unsurprisingly, math emulation in eager FPU mode is a
bit buggy. Fix it.

There were two bugs involving kernel code trying to use the FPU
registers in eager mode even if they didn't exist and one BUG_ON()
that was incorrect.


The commit message says the BUG_ON() was incorrect, but I am not familiar with
eager FPU mode. Why is the BUG_ON() incorrect?
Should we backport the patch to v3.10, or is there a bug in qemu-kvm?
Any reply will be appreciated.

Thanks,
Kefeng

[1] BUG_ON

[347134.486436] [ cut here ]
[347134.487310] kernel BUG at arch/x86/kernel/traps.c:643!
[347134.487398] invalid opcode:  [#1] SMP
[347134.500532] Modules linked in:loop binfmt_misc nf_log_ipv4 nf_log_common 
xt_LOG softdog ipmi_devintf ipmi_msghandler xfs libcrc32c tipc squashfs 
ipt_REJECT iptable_filter ip_tables dm_mod crct10dif_pclmul crct10dif_common 
crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw ppdev gf128mul 
i2c_piix4 glue_helper ablk_helper i2c_core cryptd serio_raw parport_pc parport 
pcspkr ext3 mbcache jbd ata_generic pata_acpi virtio_console(OVE) 
virtio_blk(OVE) virtio_balloon(OVE) virtio_net(OVE) ata_piix libata floppy 
virtio_pci(OVE) virtio_ring(OVE) virtio(OVE)
[347134.525182] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G   O E 
V---   3.10.0-229.20.1.x86_64 #1 SMP Mon Apr 18 11:26:55 UTC 2016
[347134.525182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20160318_175052-HGH108214 04/01/2014
[347134.525182] task: 8803fa2c5c00 ti: 8803fa2ec000 task.ti: 
8803fa2ec000
[347134.525182] RIP: 0010:[]  [] 
do_device_not_available+0x13/0x60
[347134.525182] RSP: 0018:8803fa2abc80  EFLAGS: 00010046
[347134.525182] RAX: 8160ecec RBX:  RCX: 
8160ecec
[347134.525182] RDX:  RSI:  RDI: 
8803fa2abc98
[347134.525182] RBP: 8803fa2abc88 R08:  R09: 

[347134.525182] R10: 0001 R11: 0005 R12: 
8803fa2c5c00
[347134.525182] R13: 88040f550a40 R14: 8803fa2c6298 R15: 
0005
[347134.544262] FS:  () GS:88040f54() 
knlGS:
[347134.544262] CS:  0010 DS:  ES:  CR0: 8005003b
[347134.544262] CR2: 7f15a4ac0e30 CR3: 0003f5081000 CR4: 
000407e0
[347134.544262] DR0:  DR1:  DR2: 

[347134.544262] DR3:  DR6: 0ff0 DR7: 
0400
[347134.544262] Stack:
[347134.544262]  0001 8803fa2abd88 81618d8e 
0005
[347134.544262]  8803fa2c6298 88040f550a40 8803fa2c5c00 
8803fa2abd88
[347134.544262]  8803fa284500 0005 0001 

[347134.544262] Call Trace:
[347134.544262] Code: 81 a4 24 90 00 00 00 ff fe ff ff e9 df fe ff ff e8 c3 f5 
a5 ff 0f 1f 00 66 66 66 66 90 55 48 89 e5 53 66 66 66 66 90 31 db 66 90 <0f> 0b 
0f 1f 00 65 8b 1c 25 74 f1 00 00 e8 fb 86 b4 ff eb ea 66
[347134.544262] RIP  [] do_device_not_available+0x13/0x60
[347134.544262]  RSP 
[347134.580457] ---[ end trace 7c0ed2be7ded5c73 ]---
[347134.580457] Kernel panic - not syncing: Fatal exception
[347134.580457] Shutting down cpus with NMI
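
The CR0 value in the register dump above (8005003b) can be decoded to see which FPU-related control bits were set at the time of the trap. This is only a rough sketch in Python; the bit positions are the architectural CR0 layout, while the table and function names are made up for illustration.

```python
# Architectural CR0 bit positions (x86 control register layout).
CR0_BITS = {
    0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
    16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG",
}


def decode_cr0(value):
    """Return the names of the CR0 flags set in value, lowest bit first."""
    return [name for bit, name in sorted(CR0_BITS.items()) if value & (1 << bit)]


# Value taken from the "CR0:" field of the oops above.
print(decode_cr0(0x8005003B))
# → ['PE', 'MP', 'TS', 'ET', 'NE', 'WP', 'AM', 'PG']
```

TS is set and EM is clear in this value, which would be consistent with the device-not-available trap being raised by CR0.TS rather than by math emulation, although the dump alone does not show how TS came to be set.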



[2] /proc/cpuinfo output (only cpu0 is shown):

localhost:~ # cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 45
model name  : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
stepping: 7
microcode   : 0x1
cpu MHz : 2899.992
cache size  : 4096 KB
physical id : 0
siblings: 8
core id : 0
cpu cores   : 8
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm 
constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid 
sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm 
xsaveopt
bogomips: 5799.98
clflush size: 64
cache_alignment : 64
address sizes   : 42 bits physical, 48 bits virtual
power management:


