Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 11/04/2014 19:40, H. Peter Anvin wrote:
> On 04/11/2014 10:35 AM, Jet Chen wrote:
>> As Peter said, QEMU probably should *not* set the hypervisor bit. But based
>> on my testing, I think KVM works properly in this case.
>
> Either way, unless there is a CPUID interface exposed in CPUID levels
> 0x4000+, then relying on the hypervisor bit to do VMCALL is wrong in
> the extreme.

Sorry for the delay guys, I was on vacation.

Lack of a CPUID interface at 0x4000 is indeed *the* good reason why QEMU should not set the hypervisor bit. Of course, there is no guarantee that QEMU will never expose a 0x4000 interface, and at that point the hypervisor bit may reappear in QEMU's JIT mode.

As to sending #UD to the guest at CPL>0, that is a choice of the hypervisor. Hyper-V (and KVM in Hyper-V emulation mode) does that, and does the same in real mode too. KVM instead sets EAX to -KVM_EPERM, and accepts hypercalls in real mode (where CPL=0). Terminating the guest is surely the wrong thing to do at CPL>0.

Thanks,
Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
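The two policies described in this message can be sketched as a small decision function. This is an illustrative model only, not code from KVM, Hyper-V, or QEMU: the names `hv_kind`, `hc_action`, `hc_result`, and `hypercall_policy` are hypothetical, and `MODEL_KVM_EPERM` assumes that `KVM_EPERM` equals `EPERM` (1), as in Linux's uapi `kvm_para.h`.

```c
#include <stdint.h>

/* Illustrative model only; none of this is code from KVM, Hyper-V, or QEMU. */

/* Assumption: KVM_EPERM equals EPERM (1), as in Linux's uapi kvm_para.h. */
#define MODEL_KVM_EPERM 1

enum hv_kind {
    HV_KVM_NATIVE,   /* KVM's own hypercall interface */
    HV_HYPERV_STYLE  /* Hyper-V, or KVM in Hyper-V emulation mode */
};

enum hc_action {
    HC_DISPATCH,     /* hypercall is accepted and handled */
    HC_RETURN_ERROR, /* guest keeps running with an error code in EAX */
    HC_INJECT_UD     /* guest receives #UD (invalid opcode) */
};

struct hc_result {
    enum hc_action action;
    int32_t eax;     /* meaningful only for HC_RETURN_ERROR */
};

/* Decide how a hypercall attempt is answered, given the caller's privilege
 * level (cpl) and whether the guest is in real mode. */
struct hc_result hypercall_policy(enum hv_kind hv, unsigned cpl, int real_mode)
{
    struct hc_result r = { HC_DISPATCH, 0 };

    if (hv == HV_HYPERV_STYLE) {
        /* #UD at CPL>0, and in real mode too. */
        if (cpl > 0 || real_mode)
            r.action = HC_INJECT_UD;
    } else if (cpl > 0) {
        /* KVM-style: refuse with an error code; real mode (CPL=0) is accepted. */
        r.action = HC_RETURN_ERROR;
        r.eax = -MODEL_KVM_EPERM;
    }
    return r;
}
```

Neither branch of the model terminates the guest, which matches the closing remark of the message.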
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
Should we perhaps CC qemu-devel here for an opinion?

Guys, this mail should explain the issue but in case there are questions, the whole thread starts here:

http://lkml.kernel.org/r/20140407111725.GC25152@localhost

Thanks.

On Sat, Apr 12, 2014 at 01:35:49AM +0800, Jet Chen wrote:
> On 04/12/2014 12:33 AM, H. Peter Anvin wrote:
> > On 04/11/2014 06:51 AM, Romer, Benjamin M wrote:
> >>
> >>> I'm still confused where KVM comes into the picture. Are you actually
> >>> using KVM (and thus talking about nested virtualization) or are you
> >>> using Qemu in JIT mode and running another hypervisor underneath?
> >>
> >> The test that Fengguang used to find the problem was running the linux
> >> kernel directly using KVM. When the kernel was run with "-cpu Haswell,
> >> +smep,+smap" set, the vmcall failed with invalid op, but when the kernel
> >> is run with "-cpu qemu64", the vmcall causes a vmexit, as it should.
> >
> > As far as I know, Fengguang's test doesn't use KVM at all, it runs Qemu
> > as a JIT. Completely different thing. In that case Qemu probably
> > should *not* set the hypervisor bit. However, the only thing that the
> > hypervisor bit means is that you can look for specific hypervisor APIs
> > in CPUID level 0x4000+.
> >
> >> My point is, the vmcall was made because the hypervisor bit was set. If
> >> this bit had been turned off, as it would be on a real processor, the
> >> vmcall wouldn't have happened.
> >
> > And my point is that that is a bug. In the driver. A very serious one.
> > You cannot call VMCALL until you know *which* hypervisor API(s) you
> > have available, period.
> >
> >>> The hypervisor bit is a complete red herring. If the guest CPU is
> >>> running in VT-x mode, then VMCALL should VMEXIT inside the guest
> >>> (invoking the guest root VT-x),
> >>
> >> The CPU is running in VT-X. That was my point, the kernel is running in
> >> the KVM guest, and KVM is setting the CPU feature bits such that bit 31
> >> is enabled.
> >
> > Which it is because it wants to export the KVM hypercall interface.
> > However, keying VMCALL *only* on the HYPERVISOR bit is wrong in the extreme.
> >
> >> I don't think it's a red herring because the kernel uses this bit
> >> elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU
> >> features, and can be checked with the cpu_has_hypervisor macro (which
> >> was not used by the original author of the code in the driver, but
> >> should have been). VMWare and KVM support in the kernel also check for
> >> this bit before checking their hypervisor leaves for an ID. If it's not
> >> properly set it affects more than just the s-Par drivers.
> >>
> >>> but the fact still remains that you
> >>> should never, ever, invoke VMCALL unless you know what hypervisor you
> >>> have underneath.
> >>
> >> From the standpoint of the s-Par drivers, yes, I agree (as I already
> >> said). However, VMCALL is not a privileged instruction, so anyone could
> >> use it from user space and go right past the OS straight to the
> >> hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since
> >> any user could hard-stop the guest with a couple of lines of C.
> >
> > Typically the hypervisor wants to generate a #UD inside of the guest for
> > that case. The guest OS will intercept it and SIGILL the user space
> > process.
> >
> > -hpa
> >
>
> Hi Ben,
>
> I re-tested this case with/without option -enable-kvm.
>
> qemu-system-x86_64 -cpu Haswell,+smep,+smap                invalid op
> qemu-system-x86_64 -cpu kvm64                              invalid op
> qemu-system-x86_64 -cpu Haswell,+smep,+smap -enable-kvm    everything OK
> qemu-system-x86_64 -cpu kvm64 -enable-kvm                  everything OK
>
> I think this is probably a bug in QEMU.
> Sorry for misleading you. I am not experienced in QEMU usage. I didn't realize
> I needed to try this case with different options until I read Peter's reply.
>
> As Peter said, QEMU probably should *not* set the hypervisor bit. But based
> on my testing, I think KVM works properly in this case.
>
> Thanks,
> Jet
>

--
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Sat, 2014-04-12 at 01:35 +0800, Jet Chen wrote:
> Hi Ben,
>
> I re-tested this case with/without option -enable-kvm.
>
> qemu-system-x86_64 -cpu Haswell,+smep,+smap                invalid op
> qemu-system-x86_64 -cpu kvm64                              invalid op
> qemu-system-x86_64 -cpu Haswell,+smep,+smap -enable-kvm    everything OK
> qemu-system-x86_64 -cpu kvm64 -enable-kvm                  everything OK
>
> I think this is probably a bug in QEMU.
> Sorry for misleading you. I am not experienced in QEMU usage. I didn't realize
> I needed to try this case with different options until I read Peter's reply.
>
> As Peter said, QEMU probably should *not* set the hypervisor bit. But based
> on my testing, I think KVM works properly in this case.
>
> Thanks,
> Jet

Great, thanks! Sorry for the trouble. :)

-- Ben
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Fri, 2014-04-11 at 10:40 -0700, H. Peter Anvin wrote:
> On 04/11/2014 10:35 AM, Jet Chen wrote:
> >
> > As Peter said, QEMU probably should *not* set the hypervisor bit. But based
> > on my testing, I think KVM works properly in this case.
> >
>
> Either way, unless there is a CPUID interface exposed in CPUID levels
> 0x4000+, then relying on the hypervisor bit to do VMCALL is wrong in
> the extreme.
>
> -hpa
>

I'll pass your feedback on to the people who wrote the bad code. Sorry for the trouble. :)

-- Ben
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/11/2014 10:35 AM, Jet Chen wrote:
>
> As Peter said, QEMU probably should *not* set the hypervisor bit. But based
> on my testing, I think KVM works properly in this case.
>

Either way, unless there is a CPUID interface exposed in CPUID levels 0x4000+, then relying on the hypervisor bit to do VMCALL is wrong in the extreme.

-hpa
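The rule stated here (a hypercall is only safe once a known interface has been identified in the hypervisor CPUID leaves) can be sketched in C. This is a minimal sketch under stated assumptions, not the s-Par driver's code: the helper names `hv_signature`, `may_issue_hypercall`, and `read_hv_signature` are hypothetical, the leaves the thread abbreviates as "0x4000+" are taken to start at 0x40000000, and KVM's documented signature `KVMKVMKVM` is used only as an example of a known interface. The live CPUID read relies on the `__cpuid` macro from the GCC/Clang `<cpuid.h>` header and is compiled only on x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypervisor CPUID leaves start here; the thread abbreviates this as "0x4000+". */
#define HV_CPUID_BASE 0x40000000u

/* Build the 12-byte vendor signature from EBX, ECX, EDX, low byte first. */
void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };

    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xffu);
    out[12] = '\0';
}

/* The hypervisor bit alone is not enough: the signature must also match
 * the interface the caller knows how to talk to (zero-padded to 12 bytes). */
int may_issue_hypercall(int hypervisor_bit, const char sig[13], const char *expected)
{
    char want[12] = { 0 };
    size_t n = strlen(expected);

    memcpy(want, expected, n < sizeof want ? n : sizeof want);
    return hypervisor_bit && memcmp(sig, want, sizeof want) == 0;
}

#if defined(__x86_64__) || defined(__i386__)
/* Read the signature from the first hypervisor leaf. The result is only
 * meaningful when the hypervisor bit (CPUID.1:ECX bit 31) is set. */
void read_hv_signature(char out[13])
{
    unsigned eax, ebx, ecx, edx;

    __cpuid(HV_CPUID_BASE, eax, ebx, ecx, edx);
    (void)eax;
    hv_signature(ebx, ecx, edx, out);
}
#endif
```

With this shape, a guest that sees the hypervisor bit but an unrecognized or empty signature (the QEMU JIT case reported in the thread) never reaches the VMCALL.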
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/12/2014 12:33 AM, H. Peter Anvin wrote:
> On 04/11/2014 06:51 AM, Romer, Benjamin M wrote:
>>
>>> I'm still confused where KVM comes into the picture. Are you actually
>>> using KVM (and thus talking about nested virtualization) or are you
>>> using Qemu in JIT mode and running another hypervisor underneath?
>>
>> The test that Fengguang used to find the problem was running the linux
>> kernel directly using KVM. When the kernel was run with "-cpu Haswell,
>> +smep,+smap" set, the vmcall failed with invalid op, but when the kernel
>> is run with "-cpu qemu64", the vmcall causes a vmexit, as it should.
>
> As far as I know, Fengguang's test doesn't use KVM at all, it runs Qemu
> as a JIT. Completely different thing. In that case Qemu probably
> should *not* set the hypervisor bit. However, the only thing that the
> hypervisor bit means is that you can look for specific hypervisor APIs
> in CPUID level 0x4000+.
>
>> My point is, the vmcall was made because the hypervisor bit was set. If
>> this bit had been turned off, as it would be on a real processor, the
>> vmcall wouldn't have happened.
>
> And my point is that that is a bug. In the driver. A very serious one.
> You cannot call VMCALL until you know *which* hypervisor API(s) you
> have available, period.
>
>>> The hypervisor bit is a complete red herring. If the guest CPU is
>>> running in VT-x mode, then VMCALL should VMEXIT inside the guest
>>> (invoking the guest root VT-x),
>>
>> The CPU is running in VT-X. That was my point, the kernel is running in
>> the KVM guest, and KVM is setting the CPU feature bits such that bit 31
>> is enabled.
>
> Which it is because it wants to export the KVM hypercall interface.
> However, keying VMCALL *only* on the HYPERVISOR bit is wrong in the extreme.
>
>> I don't think it's a red herring because the kernel uses this bit
>> elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU
>> features, and can be checked with the cpu_has_hypervisor macro (which
>> was not used by the original author of the code in the driver, but
>> should have been). VMWare and KVM support in the kernel also check for
>> this bit before checking their hypervisor leaves for an ID. If it's not
>> properly set it affects more than just the s-Par drivers.
>>
>>> but the fact still remains that you
>>> should never, ever, invoke VMCALL unless you know what hypervisor you
>>> have underneath.
>>
>> From the standpoint of the s-Par drivers, yes, I agree (as I already
>> said). However, VMCALL is not a privileged instruction, so anyone could
>> use it from user space and go right past the OS straight to the
>> hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since
>> any user could hard-stop the guest with a couple of lines of C.
>
> Typically the hypervisor wants to generate a #UD inside of the guest for
> that case. The guest OS will intercept it and SIGILL the user space
> process.
>
> -hpa
>

Hi Ben,

I re-tested this case with/without option -enable-kvm.

qemu-system-x86_64 -cpu Haswell,+smep,+smap                invalid op
qemu-system-x86_64 -cpu kvm64                              invalid op
qemu-system-x86_64 -cpu Haswell,+smep,+smap -enable-kvm    everything OK
qemu-system-x86_64 -cpu kvm64 -enable-kvm                  everything OK

I think this is probably a bug in QEMU.
Sorry for misleading you. I am not experienced in QEMU usage. I didn't realize I needed to try this case with different options until I read Peter's reply.

As Peter said, QEMU probably should *not* set the hypervisor bit. But based on my testing, I think KVM works properly in this case.

Thanks,
Jet
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/11/2014 06:51 AM, Romer, Benjamin M wrote:
>
>> I'm still confused where KVM comes into the picture. Are you actually
>> using KVM (and thus talking about nested virtualization) or are you
>> using Qemu in JIT mode and running another hypervisor underneath?
>
> The test that Fengguang used to find the problem was running the linux
> kernel directly using KVM. When the kernel was run with "-cpu Haswell,
> +smep,+smap" set, the vmcall failed with invalid op, but when the kernel
> is run with "-cpu qemu64", the vmcall causes a vmexit, as it should.

As far as I know, Fengguang's test doesn't use KVM at all, it runs Qemu as a JIT. Completely different thing. In that case Qemu probably should *not* set the hypervisor bit. However, the only thing that the hypervisor bit means is that you can look for specific hypervisor APIs in CPUID level 0x4000+.

> My point is, the vmcall was made because the hypervisor bit was set. If
> this bit had been turned off, as it would be on a real processor, the
> vmcall wouldn't have happened.

And my point is that that is a bug. In the driver. A very serious one. You cannot call VMCALL until you know *which* hypervisor API(s) you have available, period.

>> The hypervisor bit is a complete red herring. If the guest CPU is
>> running in VT-x mode, then VMCALL should VMEXIT inside the guest
>> (invoking the guest root VT-x),
>
> The CPU is running in VT-X. That was my point, the kernel is running in
> the KVM guest, and KVM is setting the CPU feature bits such that bit 31
> is enabled.

Which it is because it wants to export the KVM hypercall interface. However, keying VMCALL *only* on the HYPERVISOR bit is wrong in the extreme.

> I don't think it's a red herring because the kernel uses this bit
> elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU
> features, and can be checked with the cpu_has_hypervisor macro (which
> was not used by the original author of the code in the driver, but
> should have been). VMWare and KVM support in the kernel also check for
> this bit before checking their hypervisor leaves for an ID. If it's not
> properly set it affects more than just the s-Par drivers.
>
>> but the fact still remains that you
>> should never, ever, invoke VMCALL unless you know what hypervisor you
>> have underneath.
>
> From the standpoint of the s-Par drivers, yes, I agree (as I already
> said). However, VMCALL is not a privileged instruction, so anyone could
> use it from user space and go right past the OS straight to the
> hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since
> any user could hard-stop the guest with a couple of lines of C.

Typically the hypervisor wants to generate a #UD inside of the guest for that case. The guest OS will intercept it and SIGILL the user space process.

-hpa
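What the #UD-to-SIGILL path looks like from user space can be sketched with a small POSIX program fragment. It is a sketch under stated assumptions: it does not execute VMCALL at all, but uses the x86 `ud2` instruction as a stand-in for any instruction answered with #UD (and `raise(SIGILL)` on other architectures), and the function name `survive_invalid_opcode` is hypothetical. The point it illustrates is that the fault is confined to the offending process, which can even catch it and carry on.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf ill_env;

/* SIGILL handler: jump back to the recovery point instead of dying. */
static void on_sigill(int signo)
{
    (void)signo;
    siglongjmp(ill_env, 1);
}

/* Returns 1 if the invalid instruction was delivered to this process as
 * SIGILL and the process kept running, -1 if the handler could not be set. */
int survive_invalid_opcode(void)
{
    struct sigaction sa, old;
    volatile int caught = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(ill_env, 1) == 0) {
#if defined(__x86_64__) || defined(__i386__)
        /* Stand-in for an instruction that the (virtual) CPU answers with #UD. */
        __asm__ volatile("ud2");
#else
        raise(SIGILL);
#endif
    } else {
        caught = 1; /* reached via siglongjmp from the handler */
    }

    sigaction(SIGILL, &old, NULL); /* restore the previous disposition */
    return caught;
}
```

Without a handler installed, the default action for SIGILL terminates only the calling process, not the guest.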
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Thu, 2014-04-10 at 19:28 -0700, H. Peter Anvin wrote:
> On 04/10/2014 06:19 AM, Romer, Benjamin M wrote:
> >
> > I'm confused by the intended behavior of KVM. Is the intention of the
> > -cpu switch to fully emulate a particular CPU? If that's the case, the
> > Intel documentation says bit 31 should always be 0, so the value
> > returned by the cpuid instruction isn't correct. If the intention is to
> > present a VM with a specific CPU architecture, the CPU ought to behave
> > as described in Intel's virtualization documentation and just vmexit
> > instead of faulting with invalid op, IMHO.
> >
> > I've already said the check in the code was insufficient, and I'm trying
> > to fix that part now. :)
> >
>
> I'm still confused where KVM comes into the picture. Are you actually
> using KVM (and thus talking about nested virtualization) or are you
> using Qemu in JIT mode and running another hypervisor underneath?

The test that Fengguang used to find the problem was running the linux kernel directly using KVM. When the kernel was run with "-cpu Haswell,+smep,+smap" set, the vmcall failed with invalid op, but when the kernel is run with "-cpu qemu64", the vmcall causes a vmexit, as it should.

My point is, the vmcall was made because the hypervisor bit was set. If this bit had been turned off, as it would be on a real processor, the vmcall wouldn't have happened.

> The hypervisor bit is a complete red herring. If the guest CPU is
> running in VT-x mode, then VMCALL should VMEXIT inside the guest
> (invoking the guest root VT-x),

The CPU is running in VT-X. That was my point, the kernel is running in the KVM guest, and KVM is setting the CPU feature bits such that bit 31 is enabled.

I don't think it's a red herring because the kernel uses this bit elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU features, and can be checked with the cpu_has_hypervisor macro (which was not used by the original author of the code in the driver, but should have been). VMWare and KVM support in the kernel also check for this bit before checking their hypervisor leaves for an ID. If it's not properly set it affects more than just the s-Par drivers.

> but the fact still remains that you
> should never, ever, invoke VMCALL unless you know what hypervisor you
> have underneath.

From the standpoint of the s-Par drivers, yes, I agree (as I already said). However, VMCALL is not a privileged instruction, so anyone could use it from user space and go right past the OS straight to the hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since any user could hard-stop the guest with a couple of lines of C.

-- Ben
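The "bit 31" discussed in this message is bit 31 of ECX returned by CPUID leaf 1. A minimal user-space sketch of that test follows; it is not the kernel's `cpu_has_hypervisor` implementation, the names `ecx_has_hypervisor_bit` and `running_under_hypervisor` are hypothetical, and the live read assumes the `__get_cpuid` helper from the GCC/Clang `<cpuid.h>` header, so it is compiled only on x86.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.1:ECX bit 31: reads as 0 on bare metal per the Intel documentation
 * cited in the thread; hypervisors set it for their guests. */
#define CPUID1_ECX_HYPERVISOR (UINT32_C(1) << 31)

/* Pure test on an ECX value already obtained from CPUID leaf 1. */
int ecx_has_hypervisor_bit(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_HYPERVISOR) != 0;
}

#if defined(__x86_64__) || defined(__i386__)
/* Live check: returns 1 if the bit is set, 0 if it is clear or if
 * CPUID leaf 1 is unavailable. */
int running_under_hypervisor(void)
{
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return ecx_has_hypervisor_bit(ecx);
}
#endif
```

A set bit says only that some hypervisor claims to be present; identifying which one still requires reading the hypervisor CPUID leaves.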
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Thu, 2014-04-10 at 19:28 -0700, H. Peter Anvin wrote: On 04/10/2014 06:19 AM, Romer, Benjamin M wrote: I'm confused by the intended behavior of KVM.. Is the intention of the -cpu switch to fully emulate a particular CPU? If that's the case, the Intel documentation says bit 31 should always be 0, so the value returned by the cpuid instruction isn't correct. If the intention is to present a VM with a specific CPU architecture, the CPU ought to behave as described in Intel's virtualization documentation and just vmexit instead of faulting with invalid op, IMHO. I've already said the check in the code was insufficient, and I'm trying to fix that part now. :) I'm still confused where KVM comes into the picture. Are you actually using KVM (and thus talking about nested virtualization) or are you using Qemu in JIT mode and running another hypervisor underneath? The test that Fengguang used to find the problem was running the linux kernel directly using KVM. When the kernel was run with -cpu Haswell, +smep,+smap set, the vmcall failed with invalid op, but when the kernel is run with -cpu qemu64, the vmcall causes a vmexit, as it should. My point is, the vmcall was made because the hypervisor bit was set. If this bit had been turned off, as it would be on a real processor, the vmcall wouldn't have happened. The hypervisor bit is a complete red herring. If the guest CPU is running in VT-x mode, then VMCALL should VMEXIT inside the guest (invoking the guest root VT-x), The CPU is running in VT-X. That was my point, the kernel is running in the KVM guest, and KVM is setting the CPU feature bits such that bit 31 is enabled. I don't think it's a red herring because the kernel uses this bit elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU features, and can be checked with the cpu_has_hypervisor macro (which was not used by the original author of the code in the driver, but should have been). 
VMWare and KVM support in the kernel also check for this bit before checking their hypervisor leaves for an ID. If it's not properly set it affects more than just the s-Par drivers. but the fact still remains that you should never, ever, invoke VMCALL unless you know what hypervisor you have underneath. From the standpoint of the s-Par drivers, yes, I agree (as I already said). However, VMCALL is not a privileged instruction, so anyone could use it from user space and go right past the OS straight to the hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since any user could hard-stop the guest with a couple of lines of C. -- Ben
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/11/2014 06:51 AM, Romer, Benjamin M wrote: I'm still confused where KVM comes into the picture. Are you actually using KVM (and thus talking about nested virtualization) or are you using Qemu in JIT mode and running another hypervisor underneath? The test that Fengguang used to find the problem was running the linux kernel directly using KVM. When the kernel was run with -cpu Haswell, +smep,+smap set, the vmcall failed with invalid op, but when the kernel is run with -cpu qemu64, the vmcall causes a vmexit, as it should. As far as I know, Fengguang's test doesn't use KVM at all, it runs Qemu as a JIT. Completely different thing. In that case Qemu probably should *not* set the hypervisor bit. However, the only thing that the hypervisor bit means is that you can look for specific hypervisor APIs in CPUID level 0x4000+. My point is, the vmcall was made because the hypervisor bit was set. If this bit had been turned off, as it would be on a real processor, the vmcall wouldn't have happened. And my point is that that is a bug. In the driver. A very serious one. You cannot call VMCALL until you know *which* hypervisor API(s) you have available, period. The hypervisor bit is a complete red herring. If the guest CPU is running in VT-x mode, then VMCALL should VMEXIT inside the guest (invoking the guest root VT-x), The CPU is running in VT-X. That was my point, the kernel is running in the KVM guest, and KVM is setting the CPU feature bits such that bit 31 is enabled. Which it is because it wants to export the KVM hypercall interface. However, keying VMCALL *only* on the HYPERVISOR bit is wrong in the extreme. I don't think it's a red herring because the kernel uses this bit elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU features, and can be checked with the cpu_has_hypervisor macro (which was not used by the original author of the code in the driver, but should have been). 
VMWare and KVM support in the kernel also check for this bit before checking their hypervisor leaves for an ID. If it's not properly set it affects more than just the s-Par drivers. but the fact still remains that you should never, ever, invoke VMCALL unless you know what hypervisor you have underneath. From the standpoint of the s-Par drivers, yes, I agree (as I already said). However, VMCALL is not a privileged instruction, so anyone could use it from user space and go right past the OS straight to the hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since any user could hard-stop the guest with a couple of lines of C. Typically the hypervisor wants to generate a #UD inside of the guest for that case. The guest OS will intercept it and SIGILL the user space process. -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/12/2014 12:33 AM, H. Peter Anvin wrote: On 04/11/2014 06:51 AM, Romer, Benjamin M wrote: I'm still confused where KVM comes into the picture. Are you actually using KVM (and thus talking about nested virtualization) or are you using Qemu in JIT mode and running another hypervisor underneath? The test that Fengguang used to find the problem was running the linux kernel directly using KVM. When the kernel was run with -cpu Haswell, +smep,+smap set, the vmcall failed with invalid op, but when the kernel is run with -cpu qemu64, the vmcall causes a vmexit, as it should. As far as I know, Fengguang's test doesn't use KVM at all, it runs Qemu as a JIT. Completely different thing. In that case Qemu probably should *not* set the hypervisor bit. However, the only thing that the hypervisor bit means is that you can look for specific hypervisor APIs in CPUID level 0x4000+. My point is, the vmcall was made because the hypervisor bit was set. If this bit had been turned off, as it would be on a real processor, the vmcall wouldn't have happened. And my point is that that is a bug. In the driver. A very serious one. You cannot call VMCALL until you know *which* hypervisor API(s) you have available, period. The hypervisor bit is a complete red herring. If the guest CPU is running in VT-x mode, then VMCALL should VMEXIT inside the guest (invoking the guest root VT-x), The CPU is running in VT-X. That was my point, the kernel is running in the KVM guest, and KVM is setting the CPU feature bits such that bit 31 is enabled. Which it is because it wants to export the KVM hypercall interface. However, keying VMCALL *only* on the HYPERVISOR bit is wrong in the extreme. I don't think it's a red herring because the kernel uses this bit elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU features, and can be checked with the cpu_has_hypervisor macro (which was not used by the original author of the code in the driver, but should have been). 
VMWare and KVM support in the kernel also check for this bit before checking their hypervisor leaves for an ID. If it's not properly set it affects more than just the s-Par drivers. but the fact still remains that you should never, ever, invoke VMCALL unless you know what hypervisor you have underneath. From the standpoint of the s-Par drivers, yes, I agree (as I already said). However, VMCALL is not a privileged instruction, so anyone could use it from user space and go right past the OS straight to the hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since any user could hard-stop the guest with a couple of lines of C. Typically the hypervisor wants to generate a #UD inside of the guest for that case. The guest OS will intercept it and SIGILL the user space process. -hpa Hi Ben, I re-tested this case with/without option -enable-kvm. qemu-system-x86_64 -cpu Haswell,+smep,+smap invalid op qemu-system-x86_64 -cpu kvm64 invalid op qemu-system-x86_64 -cpu Haswell,+smep,+smap -enable-kvm everything OK qemu-system-x86_64 -cpu kvm64 -enable-kvm everything OK I think this is probably a bug in QEMU. Sorry for misleading you. I am not experienced in QEMU usage. I don't realize I need try this case with different options Until read Peter's reply. As Peter said, QEMU probably should *not* set the hypervisor bit. But based on my testing, I think KVM works properly in this case. Thanks, Jet -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/11/2014 10:35 AM, Jet Chen wrote:
> As Peter said, QEMU probably should *not* set the hypervisor bit.
> But based on my testing, I think KVM works properly in this case.

Either way, unless there is a CPUID interface exposed in CPUID levels
0x40000000+, then relying on the hypervisor bit to do VMCALL is wrong in
the extreme.

-hpa
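[Archive note] The rule stated in this message can be sketched in C roughly as follows. This is only an illustration, not the driver's actual code: the function names are hypothetical, the caller is assumed to have already read CPUID leaf 1 and leaf 0x40000000, and the host is assumed to be little-endian (as x86 is). The "KVMKVMKVM" signature in the usage note is the one KVM documents for its CPUID interface; a driver for another hypervisor would compare against that hypervisor's own signature.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CPUID1_ECX_HYPERVISOR (1u << 31)  /* CPUID.1:ECX bit 31 */

/*
 * Assemble the 12-byte vendor signature returned in EBX, ECX, EDX by
 * CPUID leaf 0x40000000. Assumes a little-endian host.
 */
static void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
                         char out[13])
{
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &ecx, 4);
    memcpy(out + 8, &edx, 4);
    out[12] = '\0';
}

/*
 * Hypothetical gate: a hypercall is allowed only if the hypervisor bit
 * is set AND leaf 0x40000000 reports the signature the caller expects.
 * The hypervisor bit alone is never sufficient.
 */
static bool may_issue_hypercall(uint32_t leaf1_ecx, uint32_t ebx,
                                uint32_t ecx, uint32_t edx,
                                const char *expected_sig)
{
    char sig[13];

    if (!(leaf1_ecx & CPUID1_ECX_HYPERVISOR))
        return false;  /* no hypervisor leaves to look at */

    hv_signature(ebx, ecx, edx, sig);
    return strcmp(sig, expected_sig) == 0;
}
```

With the register values KVM reports (EBX=0x4b4d564b, ECX=0x564b4d56, EDX=0x4d), the gate opens only when the caller asks for "KVMKVMKVM"; with the hypervisor bit set but an unknown or empty signature, it stays closed.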
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Fri, 2014-04-11 at 10:40 -0700, H. Peter Anvin wrote:
> On 04/11/2014 10:35 AM, Jet Chen wrote:
>> As Peter said, QEMU probably should *not* set the hypervisor bit.
>> But based on my testing, I think KVM works properly in this case.
>
> Either way, unless there is a CPUID interface exposed in CPUID levels
> 0x40000000+, then relying on the hypervisor bit to do VMCALL is wrong in
> the extreme.
>
> -hpa

I'll pass your feedback on to the people who wrote the bad code. Sorry
for the trouble. :)

-- Ben
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Sat, 2014-04-12 at 01:35 +0800, Jet Chen wrote:
> Hi Ben,
>
> I re-tested this case with/without option -enable-kvm.
>
> qemu-system-x86_64 -cpu Haswell,+smep,+smap                invalid op
> qemu-system-x86_64 -cpu kvm64                              invalid op
> qemu-system-x86_64 -cpu Haswell,+smep,+smap -enable-kvm    everything OK
> qemu-system-x86_64 -cpu kvm64 -enable-kvm                  everything OK
>
> I think this is probably a bug in QEMU.
> Sorry for misleading you. I am not experienced in QEMU usage. I didn't
> realize I needed to try this case with different options until I read
> Peter's reply.
>
> As Peter said, QEMU probably should *not* set the hypervisor bit. But
> based on my testing, I think KVM works properly in this case.
>
> Thanks,
> Jet

Great, thanks! Sorry for the trouble. :)

-- Ben
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/10/2014 06:19 AM, Romer, Benjamin M wrote:
>
> I'm confused by the intended behavior of KVM.. Is the intention of the
> -cpu switch to fully emulate a particular CPU? If that's the case, the
> Intel documentation says bit 31 should always be 0, so the value
> returned by the cpuid instruction isn't correct. If the intention is to
> present a VM with a specific CPU architecture, the CPU ought to behave
> as described in Intel's virtualization documentation and just vmexit
> instead of faulting with invalid op, IMHO.
>
> I've already said the check in the code was insufficient, and I'm trying
> to fix that part now. :)
>

I'm still confused where KVM comes into the picture. Are you actually
using KVM (and thus talking about nested virtualization) or are you
using Qemu in JIT mode and running another hypervisor underneath?

The hypervisor bit is a complete red herring. If the guest CPU is
running in VT-x mode, then VMCALL should VMEXIT inside the guest
(invoking the guest root VT-x), but the fact still remains that you
should never, ever, invoke VMCALL unless you know what hypervisor you
have underneath.

-hpa
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Wed, 2014-04-09 at 16:10 -0700, H. Peter Anvin wrote:
> On 04/09/2014 04:01 PM, Fengguang Wu wrote:
> > CC the KVM people: it looks like a KVM problem that can be triggered by
> >
> >         qemu-system-x86_64 -cpu Haswell,+smep,+smap
>
> I'm really confused. First of all, is this a KVM problem or is it a
> Qemu JIT problem?
>
> Either seems really wonky. It is questionable at best whether or not
> Qemu in JIT mode should set the hypervisor bit IMO. However, even so,
> you *better* not call VMCALL *just* because the hypervisor bit is set.
>
> The reason for it is that you have absolutely no idea what VMCALL is
> going to do on any one hypervisor... different hypervisors even use
> completely different conventions for VMCALL, and some might not accept
> VMCALL at all and might just terminate your guest with extreme prejudice.
>
> So what is actually going on here?
>
> -hpa
>

I'm confused by the intended behavior of KVM.. Is the intention of the
-cpu switch to fully emulate a particular CPU? If that's the case, the
Intel documentation says bit 31 should always be 0, so the value
returned by the cpuid instruction isn't correct. If the intention is to
present a VM with a specific CPU architecture, the CPU ought to behave
as described in Intel's virtualization documentation and just vmexit
instead of faulting with invalid op, IMHO.

I've already said the check in the code was insufficient, and I'm trying
to fix that part now. :)
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/09/2014 04:01 PM, Fengguang Wu wrote:
> CC the KVM people: it looks like a KVM problem that can be triggered by
>
>         qemu-system-x86_64 -cpu Haswell,+smep,+smap

I'm really confused. First of all, is this a KVM problem or is it a
Qemu JIT problem?

Either seems really wonky. It is questionable at best whether or not
Qemu in JIT mode should set the hypervisor bit IMO. However, even so,
you *better* not call VMCALL *just* because the hypervisor bit is set.

The reason for it is that you have absolutely no idea what VMCALL is
going to do on any one hypervisor... different hypervisors even use
completely different conventions for VMCALL, and some might not accept
VMCALL at all and might just terminate your guest with extreme prejudice.

So what is actually going on here?

-hpa
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/09/2014 04:01 PM, Fengguang Wu wrote:
> CC the KVM people: it looks like a KVM problem that can be triggered by
>
>         qemu-system-x86_64 -cpu Haswell,+smep,+smap

Is it a KVM problem or a Qemu bug? It sounds more like a Qemu JIT bug.

-hpa

> On Thu, Apr 10, 2014 at 01:58:18AM +0800, Jet Chen wrote:
>> On 04/09/2014 10:44 PM, Romer, Benjamin M wrote:
>>> On Wed, 2014-04-09 at 02:38 +0800, Jet Chen wrote:
>>>
>>>> Hi Ben,
>>>>
>>>> I checked my <Intel 64 and IA-32 Architectures Software Developer's
>>>> Manual>, which was published in Feb 2014.
>>>> Volume 2: Instruction Set Reference, A-Z: CPUID--CPU Identification
>>>>
>>>
>>> I agree completely, which is why I'm confused about KVM's behavior. If
>>> bit 31 was off, the code in our drivers that uses the vmcall instruction
>>> would not have been run, the kernel would not have tried to perform a
>>> vmcall, and not crashed with invalid op.
>>>
>>> If you look in the definition for the VMCALL instruction (Intel 64 and
>>> IA32 Architectures Software Developer's Manual, volume 3C pg.30-9)
>>> You'll see that a processor in VMX non-root operation should perform a
>>> vmexit.
>>>
>>>> Why does this document not match what you said? I am not experienced
>>>> with VMs, please correct me if I went for the wrong document.
>>>>
>>>
>>> According to VMWare's documentation (there is a page at
>>> http://kb.vmware.com./selfservice/microsites/search.do?cmd=displayKC&externalId=1009458
>>> ) , as well as Microsoft's hypervisor spec (at
>>> http://www.microsoft.com/en-us/download/details.aspx?id=39289 ), this bit
>>> is used to indicate the CPU is running under virtualization. KVM is also
>>> setting this bit to indicate virtualization. I believe Xen uses it as well.
>>>
>>>
>>> My contention is, if KVM is going to set the ISVM bit, it needs to do a
>>> vmexit, and if it's not going to set the bit, then doing an invalid op
>>> is okay, but the current behavior is inconsistent.
>>>
>>> -- Ben
>>>
>>
>> Ben,
>>
>> Thanks a lot for your explanation.
>> Let me sum it up; please correct me where I am wrong. If it is really a
>> KVM bug, we will report it to the KVM guys.
>> On a real CPU, bit 31 of ECX is always 0, as the Intel documentation
>> states. However, KVM, as a hypervisor, should set this bit of the
>> virtual ECX register to 1 so the guest OS can tell it is running in a
>> virtualization environment.
>> The problem is that KVM does set this bit to 1, but does an invalid op
>> instead of emitting a VMCALL. As a result, we get these dmesg error
>> messages.
>>
>> Thanks,
>> -Jet
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
CC the KVM people: it looks like a KVM problem that can be triggered by

        qemu-system-x86_64 -cpu Haswell,+smep,+smap

On Thu, Apr 10, 2014 at 01:58:18AM +0800, Jet Chen wrote:
> On 04/09/2014 10:44 PM, Romer, Benjamin M wrote:
> > On Wed, 2014-04-09 at 02:38 +0800, Jet Chen wrote:
> >
> >> Hi Ben,
> >>
> >> I checked my <Intel 64 and IA-32 Architectures Software Developer's
> >> Manual>, which was published in Feb 2014.
> >> Volume 2: Instruction Set Reference, A-Z: CPUID--CPU Identification
> >>
> >
> > I agree completely, which is why I'm confused about KVM's behavior. If
> > bit 31 was off, the code in our drivers that uses the vmcall instruction
> > would not have been run, the kernel would not have tried to perform a
> > vmcall, and not crashed with invalid op.
> >
> > If you look in the definition for the VMCALL instruction (Intel 64 and
> > IA32 Architectures Software Developer's Manual, volume 3C pg.30-9)
> > You'll see that a processor in VMX non-root operation should perform a
> > vmexit.
> >
> >> Why does this document not match what you said? I am not experienced
> >> with VMs, please correct me if I went for the wrong document.
> >>
> >
> > According to VMWare's documentation (there is a page at
> > http://kb.vmware.com./selfservice/microsites/search.do?cmd=displayKC&externalId=1009458
> > ) , as well as Microsoft's hypervisor spec (at
> > http://www.microsoft.com/en-us/download/details.aspx?id=39289 ), this bit
> > is used to indicate the CPU is running under virtualization. KVM is also
> > setting this bit to indicate virtualization. I believe Xen uses it as well.
> >
> >
> > My contention is, if KVM is going to set the ISVM bit, it needs to do a
> > vmexit, and if it's not going to set the bit, then doing an invalid op
> > is okay, but the current behavior is inconsistent.
> >
> > -- Ben
> >
>
> Ben,
>
> Thanks a lot for your explanation.
> Let me sum it up; please correct me where I am wrong. If it is really a
> KVM bug, we will report it to the KVM guys.
> On a real CPU, bit 31 of ECX is always 0, as the Intel documentation
> states. However, KVM, as a hypervisor, should set this bit of the
> virtual ECX register to 1 so the guest OS can tell it is running in a
> virtualization environment.
> The problem is that KVM does set this bit to 1, but does an invalid op
> instead of emitting a VMCALL. As a result, we get these dmesg error
> messages.
>
> Thanks,
> -Jet
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Tue, 2014-04-08 at 10:53 +0800, Fengguang Wu wrote:
> Hi Benjamin,
>
> > Fengguang,
> >
> > I ran your script against freshly-checked-out source from staging-next, and
> > was not able to reproduce the error with it. My boot log is attached. I
> > noticed that your log did not have "Hypervisor detected: KVM" in the trace.
> > The KVM options in your script also differ substantially from the ones
> > shown at the end of your trace...
>
> > When I reran your script with the "-cpu Haswell,+smep,+smap" option I was
> > able to get the same result as you. IMHO KVM should not be setting this bit
> > if it's emulating bare metal.
>
> Sorry.. We tried to provide a simplified reproduce script and in your
> case, it has a significant mismatch with the real KVM options. We'll
> fix it, thanks for pointing it out!
>
> Thanks,
> Fengguang

That will be helpful, and as I mentioned, I can reproduce your results,
but I'm still not sure why a virtualized processor is giving an invalid
opcode fault on a vmcall. The Intel documentation is pretty specific
about this -

    IF not in VMX operation
        THEN #UD;
    ELSIF in VMX non-root operation
        THEN VM exit;

Either KVM should be saying "I'm a real processor and not a virtual CPU,
really!" - in which case, the hypervisor bit should be off and vmcalls
should cause an invalid opcode fault, or, KVM should be saying "I'm a
virtualized processor!" and setting the hypervisor bit, and doing a
vmexit on vmcall instead.

This seems like a KVM bug to me.

-- Ben
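[Archive note] The two branches of the Intel pseudo-code quoted in this message can be written out as a small C model. This is a simplified sketch only: the enum and function names are hypothetical, and it covers just the two conditions quoted above, lumping everything the processor does in VMX root operation into a single "other" outcome that is not modelled here.

```c
#include <stdbool.h>

enum vmcall_outcome {
    VMCALL_UD,       /* invalid-opcode fault (#UD) */
    VMCALL_VM_EXIT,  /* control passes to the hypervisor */
    VMCALL_OTHER     /* VMX root operation: further checks not modelled */
};

/* Model of the first two branches of the quoted VMCALL pseudo-code. */
static enum vmcall_outcome vmcall_outcome(bool in_vmx_operation,
                                          bool in_vmx_non_root)
{
    if (!in_vmx_operation)
        return VMCALL_UD;
    if (in_vmx_non_root)
        return VMCALL_VM_EXIT;
    return VMCALL_OTHER;
}
```

Read this way, an invalid opcode fault is the expected result whenever the instruction runs on a CPU (real or emulated) that is not in VMX operation at all, whatever CPUID bit 31 says. That is consistent with the later replies in this thread, which trace the fault to QEMU running as a JIT rather than to KVM.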
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
Hi Benjamin,

> Fengguang,
>
> I ran your script against freshly-checked-out source from staging-next, and
> was not able to reproduce the error with it. My boot log is attached. I
> noticed that your log did not have "Hypervisor detected: KVM" in the trace.
> The KVM options in your script also differ substantially from the ones shown
> at the end of your trace...

> When I reran your script with the "-cpu Haswell,+smep,+smap" option I was
> able to get the same result as you. IMHO KVM should not be setting this bit
> if it's emulating bare metal.

Sorry.. We tried to provide a simplified reproduce script and in your
case, it has a significant mismatch with the real KVM options. We'll
fix it, thanks for pointing it out!

Thanks,
Fengguang
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Mon, Apr 07, 2014 at 12:23:47PM -0700, Greg Kroah-Hartman wrote:
> On Mon, Apr 07, 2014 at 09:24:37AM -0500, Ken Cox wrote:
> >
> > On 04/07/2014 09:09 AM, Greg Kroah-Hartman wrote:
> > >On Mon, Apr 07, 2014 at 07:17:25PM +0800, Fengguang Wu wrote:
> > >>Hi Ken,
> > >>
> > >>I got the below dmesg and the first bad commit is
> > >>
> > >>git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > >>
> > >>commit 12e364b9f08aa335dc7716ce74113e834c993765
> > >>Author:     Ken Cox
> > >>AuthorDate: Tue Mar 4 07:58:07 2014 -0600
> > >>Commit:     Greg Kroah-Hartman
> > >>CommitDate: Tue Mar 4 16:58:21 2014 -0800
> > >>
> > >>     staging: visorchipset driver to provide registration and other
> > >>     services
> > >I think Sasha has already sent a fix to resolve this issue that I'll be
> > >sending to Linus in a day or so.
> > >
> > >Ken, is Sasha's patch going to resolve this issue as well? It looks
> > >like people haven't tested what happens when the module is loaded
> > >without the hardware present in the system :(
> > You are exactly right. The driver needs to check for hardware early on
> > before trying to use it. Unfortunately, Sasha's patch will not resolve this
> > one. I'll work with Ben Romer to get a patch out ASAP.
>
> Wait, in looking at this closer, I don't see any of the "normal"
> hardware checks to determine that this really is a valid piece of
> hardware present, before it starts to just go and initialize a whole
> bunch of things (sysfs busses, proc files and directories, and other
> things.)
>
> That's not ok, and it's obvious it's starting to affect people's work
> systems.
>
> How about I just mark the whole thing BROKEN for now, disabling the
> build, until "correct" hardware probing can be added to the driver, so
> no one else gets hurt by this?

In looking at it further, that seems like the best thing to do for now,
we can slowly enable the driver back after things like proper device
probing is fixed up so as to not break people's boxes.

thanks,

greg k-h
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Mon, Apr 07, 2014 at 09:24:37AM -0500, Ken Cox wrote:
>
> On 04/07/2014 09:09 AM, Greg Kroah-Hartman wrote:
> >On Mon, Apr 07, 2014 at 07:17:25PM +0800, Fengguang Wu wrote:
> >>Hi Ken,
> >>
> >>I got the below dmesg and the first bad commit is
> >>
> >>git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> >>
> >>commit 12e364b9f08aa335dc7716ce74113e834c993765
> >>Author:     Ken Cox
> >>AuthorDate: Tue Mar 4 07:58:07 2014 -0600
> >>Commit:     Greg Kroah-Hartman
> >>CommitDate: Tue Mar 4 16:58:21 2014 -0800
> >>
> >>     staging: visorchipset driver to provide registration and other services
> >I think Sasha has already sent a fix to resolve this issue that I'll be
> >sending to Linus in a day or so.
> >
> >Ken, is Sasha's patch going to resolve this issue as well? It looks
> >like people haven't tested what happens when the module is loaded
> >without the hardware present in the system :(
> You are exactly right. The driver needs to check for hardware early on
> before trying to use it. Unfortunately, Sasha's patch will not resolve this
> one. I'll work with Ben Romer to get a patch out ASAP.

Wait, in looking at this closer, I don't see any of the "normal"
hardware checks to determine that this really is a valid piece of
hardware present, before it starts to just go and initialize a whole
bunch of things (sysfs busses, proc files and directories, and other
things.)

That's not ok, and it's obvious it's starting to affect people's work
systems.

How about I just mark the whole thing BROKEN for now, disabling the
build, until "correct" hardware probing can be added to the driver, so
no one else gets hurt by this?

thanks,

greg k-h
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/07/2014 09:09 AM, Greg Kroah-Hartman wrote:
> On Mon, Apr 07, 2014 at 07:17:25PM +0800, Fengguang Wu wrote:
>> Hi Ken,
>>
>> I got the below dmesg and the first bad commit is
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>
>> commit 12e364b9f08aa335dc7716ce74113e834c993765
>> Author:     Ken Cox
>> AuthorDate: Tue Mar 4 07:58:07 2014 -0600
>> Commit:     Greg Kroah-Hartman
>> CommitDate: Tue Mar 4 16:58:21 2014 -0800
>>
>>     staging: visorchipset driver to provide registration and other services
> I think Sasha has already sent a fix to resolve this issue that I'll be
> sending to Linus in a day or so.
>
> Ken, is Sasha's patch going to resolve this issue as well? It looks
> like people haven't tested what happens when the module is loaded
> without the hardware present in the system :(

You are exactly right. The driver needs to check for hardware early on
before trying to use it. Unfortunately, Sasha's patch will not resolve
this one. I'll work with Ben Romer to get a patch out ASAP.

Thanks,
Ken Cox
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Mon, Apr 07, 2014 at 07:17:25PM +0800, Fengguang Wu wrote:
> Hi Ken,
>
> I got the below dmesg and the first bad commit is
>
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> commit 12e364b9f08aa335dc7716ce74113e834c993765
> Author:     Ken Cox
> AuthorDate: Tue Mar 4 07:58:07 2014 -0600
> Commit:     Greg Kroah-Hartman
> CommitDate: Tue Mar 4 16:58:21 2014 -0800
>
>     staging: visorchipset driver to provide registration and other services

I think Sasha has already sent a fix to resolve this issue that I'll be
sending to Linus in a day or so.

Ken, is Sasha's patch going to resolve this issue as well? It looks
like people haven't tested what happens when the module is loaded
without the hardware present in the system :(

thanks,

greg k-h
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/07/2014 06:17 AM, Fengguang Wu wrote:
> Hi Ken,
>
> I got the below dmesg and the first bad commit is
>
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> commit 12e364b9f08aa335dc7716ce74113e834c993765
> Author: Ken Cox j...@redhat.com
> AuthorDate: Tue Mar 4 07:58:07 2014 -0600
> Commit: Greg Kroah-Hartman gre...@linuxfoundation.org
> CommitDate: Tue Mar 4 16:58:21 2014 -0800

--snip--

> [ 24.135101] FPGA image file name: xlinx_fpga_firmware.bit
> [ 24.137595] GPIO INIT FAIL!!
> [ 24.141283] driver version 1.0.0.0 loaded
> [ 24.142539] chipset driver version 1.0.0.0 loadedinvalid opcode: [#1] PREEMPT SMP
> [ 24.144793] Modules linked in:
> [ 24.145303] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc5-00621-g12e364b #1
> [ 24.145303] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 24.145303] task: 88001157a010 ti: 88001157c000 task.ti: 88001157c000
> [ 24.145303] RIP: 0010:[81e37115] [81e37115] visorchipset_init+0x7b/0x8c5

The problem is that the driver is trying to call firmware code that only exists on Unisys s-Par hardware. I will add a check to make sure the driver is running on the correct platform before trying to call into the firmware.
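For illustration only, the kind of platform check described above could be sketched in user-space C as follows. This is not the patch that was applied to the driver. It assumes the platform identifies itself through the CPUID hypervisor bit (CPUID.1:ECX bit 31) plus a 12-byte signature at leaf 0x40000000, as discussed elsewhere in this thread. The function names and the signature string "ExamplePlat1" are hypothetical, and the sketch relies on the `__cpuid` macro from the GCC/Clang `<cpuid.h>` header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header providing __cpuid() */
#endif

/* CPUID.1:ECX bit 31 is the "hypervisor present" bit. */
#define CPUID_ECX_HYPERVISOR 0x80000000u

/* First leaf of the range hypervisors use to describe themselves. */
#define CPUID_HV_BASE_LEAF 0x40000000u

/* Returns 1 if the hypervisor bit is set in a CPUID.1:ECX value. */
static int hypervisor_bit_set(uint32_t leaf1_ecx)
{
    return (leaf1_ecx & CPUID_ECX_HYPERVISOR) != 0;
}

/* Stores a 32-bit register value as four bytes, lowest byte first. */
static void put_le32(char *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((v >> (8 * i)) & 0xffu);
}

/* Packs EBX, ECX, EDX of the hypervisor leaf into a NUL-terminated string. */
static void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    assert(out != NULL);
    put_le32(out + 0, ebx);
    put_le32(out + 4, ecx);
    put_le32(out + 8, edx);
    out[12] = '\0';
}

/*
 * Pure decision function: only report a match when the hypervisor bit is
 * set AND the signature is the expected one. The bit alone is not enough.
 */
static int platform_matches(uint32_t leaf1_ecx, uint32_t ebx, uint32_t ecx,
                            uint32_t edx, const char *expected)
{
    char sig[13];

    if (!hypervisor_bit_set(leaf1_ecx))
        return 0;
    hv_signature(ebx, ecx, edx, sig);
    return strcmp(sig, expected) == 0;
}

/* Queries the real CPU; always reports "no match" on non-x86 builds. */
static int running_on_platform(const char *expected)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx, leaf1_ecx;

    __cpuid(1, eax, ebx, leaf1_ecx, edx);
    if (!hypervisor_bit_set(leaf1_ecx))
        return 0;   /* do not read the hypervisor leaf on bare metal */
    __cpuid(CPUID_HV_BASE_LEAF, eax, ebx, ecx, edx);
    return platform_matches(leaf1_ecx, ebx, ecx, edx, expected);
#else
    (void)expected;
    return 0;
#endif
}
```

A caller would then bail out early, for example `if (!running_on_platform("ExamplePlat1")) return -ENODEV;`, before issuing any firmware call. The decision logic is kept in `platform_matches()` so it can be exercised without the hardware.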
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On 04/07/2014 09:09 AM, Greg Kroah-Hartman wrote:
> On Mon, Apr 07, 2014 at 07:17:25PM +0800, Fengguang Wu wrote:
> > Hi Ken,
> >
> > I got the below dmesg and the first bad commit is
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> >
> > commit 12e364b9f08aa335dc7716ce74113e834c993765
> > Author: Ken Cox j...@redhat.com
> > AuthorDate: Tue Mar 4 07:58:07 2014 -0600
> > Commit: Greg Kroah-Hartman gre...@linuxfoundation.org
> > CommitDate: Tue Mar 4 16:58:21 2014 -0800
> >
> > staging: visorchipset driver to provide registration and other services
>
> I think Sasha has already sent a fix to resolve this issue that I'll be sending to Linus in a day or so.
>
> Ken, is Sasha's patch going to resolve this issue as well? It looks like people haven't tested what happens when the module is loaded without the hardware present in the system :(

You are exactly right. The driver needs to check for hardware early on before trying to use it.

Unfortunately, Sasha's patch will not resolve this one. I'll work with Ben Romer to get a patch out ASAP.

Thanks,
Ken Cox
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Mon, Apr 07, 2014 at 09:24:37AM -0500, Ken Cox wrote:
> On 04/07/2014 09:09 AM, Greg Kroah-Hartman wrote:
> > On Mon, Apr 07, 2014 at 07:17:25PM +0800, Fengguang Wu wrote:
> > > Hi Ken,
> > >
> > > I got the below dmesg and the first bad commit is
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > >
> > > commit 12e364b9f08aa335dc7716ce74113e834c993765
> > > Author: Ken Cox j...@redhat.com
> > > AuthorDate: Tue Mar 4 07:58:07 2014 -0600
> > > Commit: Greg Kroah-Hartman gre...@linuxfoundation.org
> > > CommitDate: Tue Mar 4 16:58:21 2014 -0800
> > >
> > > staging: visorchipset driver to provide registration and other services
> >
> > I think Sasha has already sent a fix to resolve this issue that I'll be sending to Linus in a day or so.
> >
> > Ken, is Sasha's patch going to resolve this issue as well? It looks like people haven't tested what happens when the module is loaded without the hardware present in the system :(
>
> You are exactly right. The driver needs to check for hardware early on before trying to use it.
>
> Unfortunately, Sasha's patch will not resolve this one. I'll work with Ben Romer to get a patch out ASAP.

Wait, in looking at this closer, I don't see any of the normal hardware checks to determine that this really is a valid piece of hardware present, before it starts to just go and initialize a whole bunch of things (sysfs busses, proc files and directories, and other things.)

That's not ok, and it's obvious it's starting to affect people's work systems.

How about I just mark the whole thing BROKEN for now, disabling the build, until correct hardware probing can be added to the driver, so no one else gets hurt by this?

thanks,

greg k-h
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
On Mon, Apr 07, 2014 at 12:23:47PM -0700, Greg Kroah-Hartman wrote:
> On Mon, Apr 07, 2014 at 09:24:37AM -0500, Ken Cox wrote:
> > On 04/07/2014 09:09 AM, Greg Kroah-Hartman wrote:
> > > On Mon, Apr 07, 2014 at 07:17:25PM +0800, Fengguang Wu wrote:
> > > > Hi Ken,
> > > >
> > > > I got the below dmesg and the first bad commit is
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > >
> > > > commit 12e364b9f08aa335dc7716ce74113e834c993765
> > > > Author: Ken Cox j...@redhat.com
> > > > AuthorDate: Tue Mar 4 07:58:07 2014 -0600
> > > > Commit: Greg Kroah-Hartman gre...@linuxfoundation.org
> > > > CommitDate: Tue Mar 4 16:58:21 2014 -0800
> > > >
> > > > staging: visorchipset driver to provide registration and other services
> > >
> > > I think Sasha has already sent a fix to resolve this issue that I'll be sending to Linus in a day or so.
> > >
> > > Ken, is Sasha's patch going to resolve this issue as well? It looks like people haven't tested what happens when the module is loaded without the hardware present in the system :(
> >
> > You are exactly right. The driver needs to check for hardware early on before trying to use it.
> >
> > Unfortunately, Sasha's patch will not resolve this one. I'll work with Ben Romer to get a patch out ASAP.
>
> Wait, in looking at this closer, I don't see any of the normal hardware checks to determine that this really is a valid piece of hardware present, before it starts to just go and initialize a whole bunch of things (sysfs busses, proc files and directories, and other things.)
>
> That's not ok, and it's obvious it's starting to affect people's work systems.
>
> How about I just mark the whole thing BROKEN for now, disabling the build, until correct hardware probing can be added to the driver, so no one else gets hurt by this?

In looking at it further, that seems like the best thing to do for now, we can slowly enable the driver back after things like proper device probing is fixed up so as to not break people's boxes.

thanks,

greg k-h
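As a rough sketch of the ordering being asked for here (probe for the hardware first, create interfaces only afterwards, and undo them if a later step fails), something like the following plain C could serve as a model. It is not code from the visorchipset driver. The `driver_ops` structure, `driver_init()` and the stub functions are hypothetical names made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hooks; none of these names come from the real driver. */
struct driver_ops {
    int  (*hardware_present)(void);      /* nonzero if the platform exists */
    int  (*register_interfaces)(void);   /* sysfs/proc style setup, 0 = ok */
    void (*unregister_interfaces)(void); /* undoes register_interfaces()   */
    int  (*start_device)(void);          /* first call touching hardware   */
};

/*
 * Probe before doing anything else, so a machine without the hardware
 * sees -ENODEV and nothing has been registered or called into.
 */
static int driver_init(const struct driver_ops *ops)
{
    int err;

    assert(ops != NULL);

    if (!ops->hardware_present())
        return -ENODEV;

    err = ops->register_interfaces();
    if (err)
        return err;

    err = ops->start_device();
    if (err) {
        ops->unregister_interfaces();   /* leave no half-initialized state */
        return err;
    }
    return 0;
}

/* Stub hooks that stand in for real hardware in this sketch. */
static int registered;                  /* number of live registrations */

static int  hw_absent(void)   { return 0; }
static int  hw_present(void)  { return 1; }
static int  reg_ok(void)      { registered++; return 0; }
static void unreg(void)       { registered--; }
static int  start_ok(void)    { return 0; }
static int  start_fail(void)  { return -EIO; }
```

With `hw_absent` plugged in, `driver_init()` returns `-ENODEV` and `registered` stays at 0, which is the behaviour a user without the hardware would want from a built-in driver.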
Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP
Hi Benjamin,

> Fengguang, I ran your script against freshly-checked-out source from staging-next, and was not able to reproduce the error with it. My boot log is attached.
>
> I noticed that your log did not have "Hypervisor detected: KVM" in the trace. The KVM options in your script also differ substantially from the ones shown at the end of your trace...
>
> When I reran your script with the "-cpu Haswell,+smep,+smap" option I was able to get the same result as you. IMHO KVM should not be setting this bit if it's emulating bare metal.

Sorry.. We tried to provide a simplified reproduce script and in your case, it has a significant mismatch with the real KVM options. We'll fix it, thanks for pointing it out!

Thanks,
Fengguang