subject:"Re\: \[PATCH v4 2\/3\] i386\: Explicitly ignore unsupported BUS_MCEERR

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-10-13 Thread William Roche


Just a note to inform you that I've submitted a new patch on a
separate thread -- dealing with VM live migration after receiving
memory errors:
https://lore.kernel.org/qemu-devel/20231013150839.867164-3-william.ro...@oracle.com/

This patch belongs to a 2 patches set that should fix the migration in
case of memory errors received and handled by the VM before the
migration request.

For the moment this other patch only fixes the ARM case ignoring
SIGBUS/BUS_MCEERR_AO errors, but the same mechanism should be used with
AMD ignoring SIGBUS/BUS_MCEERR_AO too. Using the same new parameter
to the kvm_hwpoison_page_add function in kvm_arch_on_sigbus_vcpu with:

kvm_hwpoison_page_add(ram_addr, (code == BUS_MCEERR_AR));

Of course we'll have to wait for this above patch to be integrated first.

HTH,
William.


On 9/19/23 00:00, William Roche wrote:
> Hi John,
>
> I'd like to put the emphasis on the fact that ignoring the SRAO error
> for a VM is a real problem at least for a specific (rare) case I'm
> currently working on: The VM migration.
>
> Context:
>
> - In the case of a poisoned page in the VM address space, the migration
> can't read it and will skip this page, considering it as a zero-filled
> page. The VM kernel (that handled the vMCE) would have marked it's
> associated page as poisoned, and if the VM touches the page, the VM
> kernel generates the associated MCE because it already knows about the
> poisoned page.
>
> - When we ignore the vMCE in the case of a SIGBUS/BUS_MCEERR_AO error
> (what this patch does), we entirely rely on the Hypervisor to send an
> SRAR error to qemu when the page is touched: The AMD VM kernel will
> receive the SIGBUS/BUS_MCEERR_AR and deal with it, thanks to your
> changes here.
>
> So it looks like the mechanism works fine... unless the VM has migrated
> between the SRAO error and the first time it really touches the poisoned
> page to get an SRAR error !  In this case, its new address space
> (created on the migration destination) will have a zero-page where we
> had a poisoned page, and the AMD VM Kernel (that never dealt with the
> SRAO) doesn't know about the poisoned page and will access the page
> finding only zeros...  We have a memory corruption !
>
> It is a very rare window, but in order to fix it the most reasonable
> course of action would be to make the AMD emulation deal with SRAO
> errors, instead of ignoring them.
>
> Do you agree with my analysis ?
> Would an AMD platform generate SRAO signal to a process
> (SIGBUS/BUS_MCEERR_AO) in case of a real hardware error ?
>
> Thanks,
> William.

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-22 Thread William Roche


On 9/22/23 16:30, Yazen Ghannam wrote:

On 9/22/23 4:36 AM, William Roche wrote:

On 9/21/23 19:41, Yazen Ghannam wrote:

[...]
Also, during page migration, does the data flow through the CPU core?
Sorry for the basic question. I haven't done a lot with virtualization.


Yes, in most cases (with the exception of RDMA) the data flow through
the CPU cores because the migration verifies if the area to transfer has
some empty pages.



If the CPU moves the memory, then the data will pass through the core/L1
caches, correct? If so, then this will result in a MCE/poison
consumption/AR event in that core.


That's the entire point of this other patch I was referring to:
 "Qemu crashes on VM migration after an handled memory error"
an example of a direct link:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg990803.html

The idea is to skip the pages we know are poisoned -- so we have a
chance to complete the migration without getting AR events :)



So it seems to me that migration will always cause an AR event, and the
gap you describe will not occur. Does this make sense? Sorry if I
misunderstood.

In general, the hardware is designed to detect and mark poison, and to
not let poison escape a system undetected. In the strictest case, the
hardware will perform a system reset if poison is leaving the system. In
a more graceful case, the hardware will continue to pass the poison
marker with the data, so the destination hardware will receive it. In
both cases, the goal is to avoid silent data corruption, and to do so in
the hardware, i.e. without relying on firmware or software management.
The hardware designers are very keen on this point.


For the moment virtualization needs *several* enhancements just to deal
with memory errors -- what we are currently trying to fix is a good
example of that !



BTW, the RDMA case will need further discussion. I *think* this would
fall under the "strictest" case. And likely, CPU-based migration will
also. But I think we can test this and find out. :)


The test has been done, and showed that the RDMA migration is failing
when poison exists.
But we are discussing aspects that are probably too far from our main
topic here.





Please note that current AMD systems use an internal poison marker on
memory. This cannot be cleared through normal memory operations. The
only exception, I think, is to use the CLZERO instruction. This will
completely wipe a cacheline including metadata like poison, etc.

So the hardware should not (by design) loose track of poisoned data.


This would be better, but virtualization migration currently looses
track of this.
Which is not a problem for VMs where the kernel took note of the poison
and keeps track of it. Because this kernel will handle the poison
locations it knows about, signaling when these poisoned locations are
touched.



Can you please elaborate on this? I would expect the host kernel to do
all the physical, including poison, memory management.


Yes, the host kernel does that, and the VM kernel too for its own
address space.



Or do you mean in the nested poison case like this?
1) The host detects an "AO/deferred" error.


The host Kernel is notified by the hardware of an SRAO/deferred error


2) The host can try to recover the memory, if clean, etc.


From my understanding, this is an uncorrectable error, standard case
Kernel can't "clean" the error, but keeps track of it and tries to
signal the user of the impacted memory page every-time it's needed.


3) Otherwise, the host passes the error info, with "AO/deferred" severity
to the guest.


Yes, in the case of a guest VM impacted, qemu asked to be informed of AO
events, so that the host kernel should signal it to qemu. Qemu than
relays the information (creating a virtual MCE event) that the VM Kernel
receives and deals with.


4) The guest, in nested fashion, can try to recover the memory, if
clean, etc. Or signal its own processes with the AO SIGBUS.


Here again there is no recovery: The VM kernel does the same thing as
the host kernel: memory management, possible signals, etc...



An enhancement will be to take the MCA error information collected
during the interrupt and extract useful data. For example, we'll need to
translate the reported address to a system physical address that can be
mapped to a page.


This would be great, as it would mean that a kernel running in a VM can
get notified too.



Yes, I agree.



Once we have the page, then we can decide how we want to signal the
process(es). We could get a deferred/AO error in the host, and signal the
guest with an AR. So the guest handling could be the same in both cases. >
Would this be okay? Or is it important that the guest can distinguish
between the A0/AR cases?



SIGBUS/BUS_MCEERR_AO and BUS_MCEERR_AR are not interchangeable, it is
important to distinguish them.
AO is an asynchronous signal that is only generated when the process
asked for it -- indicating that an error has been detected in its
address space

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-22 Thread Yazen Ghannam

On 9/22/23 4:36 AM, William Roche wrote:
> On 9/21/23 19:41, Yazen Ghannam wrote:
>> On 9/20/23 7:13 AM, Joao Martins wrote:
>>> On 18/09/2023 23:00, William Roche wrote:
 [...]
 So it looks like the mechanism works fine... unless the VM has migrated
 between the SRAO error and the first time it really touches the poisoned
 page to get an SRAR error !  In this case, its new address space
 (created on the migration destination) will have a zero-page where we
 had a poisoned page, and the AMD VM Kernel (that never dealt with the
 SRAO) doesn't know about the poisoned page and will access the page
 finding only zeros...  We have a memory corruption !
>>
>> I don't understand this. Why would the page be zero? Even so, why would
>> that affect poison?
> 
> The migration of a VM moves the memory content from a source platform to
> a destination. This is mainly the qemu processes reading the data and
> replicating it on the destination. The source qemu where a memory page
> is poisoned is(will be[*]) able to skip the poisoned pages it knows
> about to indicate to the destination machine to populate the associated
> page(s) with zeros as there is no "poison destination page" mechanism in
> place for this migration transfer.
> 
>>
>> Also, during page migration, does the data flow through the CPU core?
>> Sorry for the basic question. I haven't done a lot with virtualization.
> 
> Yes, in most cases (with the exception of RDMA) the data flow through
> the CPU cores because the migration verifies if the area to transfer has
> some empty pages.
>

If the CPU moves the memory, then the data will pass through the core/L1
caches, correct? If so, then this will result in a MCE/poison
consumption/AR event in that core.

So it seems to me that migration will always cause an AR event, and the
gap you describe will not occur. Does this make sense? Sorry if I
misunderstood.

In general, the hardware is designed to detect and mark poison, and to
not let poison escape a system undetected. In the strictest case, the
hardware will perform a system reset if poison is leaving the system. In
a more graceful case, the hardware will continue to pass the poison
marker with the data, so the destination hardware will receive it. In
both cases, the goal is to avoid silent data corruption, and to do so in
the hardware, i.e. without relying on firmware or software management.
The hardware designers are very keen on this point.

BTW, the RDMA case will need further discussion. I *think* this would
fall under the "strictest" case. And likely, CPU-based migration will
also. But I think we can test this and find out. :)

>>
>> Please note that current AMD systems use an internal poison marker on
>> memory. This cannot be cleared through normal memory operations. The
>> only exception, I think, is to use the CLZERO instruction. This will
>> completely wipe a cacheline including metadata like poison, etc.
>>
>> So the hardware should not (by design) loose track of poisoned data.
> 
> This would be better, but virtualization migration currently looses
> track of this.
> Which is not a problem for VMs where the kernel took note of the poison
> and keeps track of it. Because this kernel will handle the poison
> locations it knows about, signaling when these poisoned locations are
> touched.
>

Can you please elaborate on this? I would expect the host kernel to do
all the physical, including poison, memory management.

Or do you mean in the nested poison case like this?
1) The host detects an "AO/deferred" error.
2) The host can try to recover the memory, if clean, etc.
3) Otherwise, the host passes the error info, with "AO/deferred" severity
to the guest.
4) The guest, in nested fashion, can try to recover the memory, if
clean, etc. Or signal its own processes with the AO SIGBUS.

>>

 It is a very rare window, but in order to fix it the most reasonable
 course of action would be to make the AMD emulation deal with SRAO
 errors, instead of ignoring them.

 Do you agree with my analysis ?
>>>
>>> Under the case that SRAO aren't handled well in the kernel today[*] for 
>>> AMD, we
>>> could always add a migration blocker when we hit AO sigbus, in case ignoring
>>> is our only option. But this would be less than ideal to propagating the
>>> SRAO into the guest.
>>>
>>> [*] Meaning knowing that handling the SRAO would generate a crash in the 
>>> guest
>>>
>>> Perhaps as an improvement, perhaps allow qemu to choose to propagate should 
>>> this
>>> limitation be lifted via a new -action value and allow it to 
>>> ignore/propagate or
>>> not e.g.
>>>
>>>   -action mce=none # default on Intel to propagate all MCE events to the 
>>> guest
>>>   -action mce=ignore-optional # Ignore SRAO
> 
> Yes we may need to create something like that, but missing SRAO has
> technical consequences too.
> 
>>>
>>> I suppose the second is also useful for ARM64 considering they currently 
>>> ignore
>>> SRAO events

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-22 Thread William Roche


On 9/21/23 19:41, Yazen Ghannam wrote:

On 9/20/23 7:13 AM, Joao Martins wrote:

On 18/09/2023 23:00, William Roche wrote:

[...]
So it looks like the mechanism works fine... unless the VM has migrated
between the SRAO error and the first time it really touches the poisoned
page to get an SRAR error !  In this case, its new address space
(created on the migration destination) will have a zero-page where we
had a poisoned page, and the AMD VM Kernel (that never dealt with the
SRAO) doesn't know about the poisoned page and will access the page
finding only zeros...  We have a memory corruption !


I don't understand this. Why would the page be zero? Even so, why would
that affect poison?


The migration of a VM moves the memory content from a source platform to
a destination. This is mainly the qemu processes reading the data and
replicating it on the destination. The source qemu where a memory page
is poisoned is(will be[*]) able to skip the poisoned pages it knows
about to indicate to the destination machine to populate the associated
page(s) with zeros as there is no "poison destination page" mechanism in
place for this migration transfer.



Also, during page migration, does the data flow through the CPU core?
Sorry for the basic question. I haven't done a lot with virtualization.


Yes, in most cases (with the exception of RDMA) the data flow through
the CPU cores because the migration verifies if the area to transfer has
some empty pages.



Please note that current AMD systems use an internal poison marker on
memory. This cannot be cleared through normal memory operations. The
only exception, I think, is to use the CLZERO instruction. This will
completely wipe a cacheline including metadata like poison, etc.

So the hardware should not (by design) loose track of poisoned data.


This would be better, but virtualization migration currently looses
track of this.
Which is not a problem for VMs where the kernel took note of the poison
and keeps track of it. Because this kernel will handle the poison
locations it knows about, signaling when these poisoned locations are
touched.





It is a very rare window, but in order to fix it the most reasonable
course of action would be to make the AMD emulation deal with SRAO
errors, instead of ignoring them.

Do you agree with my analysis ?


Under the case that SRAO aren't handled well in the kernel today[*] for AMD, we
could always add a migration blocker when we hit AO sigbus, in case ignoring
is our only option. But this would be less than ideal to propagating the
SRAO into the guest.

[*] Meaning knowing that handling the SRAO would generate a crash in the guest

Perhaps as an improvement, perhaps allow qemu to choose to propagate should this
limitation be lifted via a new -action value and allow it to ignore/propagate or
not e.g.

  -action mce=none # default on Intel to propagate all MCE events to the guest
  -action mce=ignore-optional # Ignore SRAO


Yes we may need to create something like that, but missing SRAO has
technical consequences too.



I suppose the second is also useful for ARM64 considering they currently ignore
SRAO events too.


Would an AMD platform generate SRAO signal to a process
(SIGBUS/BUS_MCEERR_AO) in case of a real hardware error ?


This would be useful to confirm.



There is no SRAO signal on AMD. The closest equivalent may be a
"Deferred" error interrupt. This is an x86 APIC LVT interrupt, and it's
sent when a deferred (uncorrectable non-urgent) error is detected by a
memory controller.

In this case, the CPU will get the interrupt and log the error (in the
host).

An enhancement will be to take the MCA error information collected
during the interrupt and extract useful data. For example, we'll need to
translate the reported address to a system physical address that can be
mapped to a page.


This would be great, as it would mean that a kernel running in a VM can
get notified too.



Once we have the page, then we can decide how we want to signal the
process(es). We could get a deferred/AO error in the host, and signal the
guest with an AR. So the guest handling could be the same in both cases. >
Would this be okay? Or is it important that the guest can distinguish
between the A0/AR cases?



SIGBUS/BUS_MCEERR_AO and BUS_MCEERR_AR are not interchangeable, it is
important to distinguish them.
AO is an asynchronous signal that is only generated when the process
asked for it -- indicating that an error has been detected in its
address space but hasn't been touched yet.
Most of the processes don't care about that (and don't get notified),
they just continue to run, if the poisoned area is not touched, great.
Otherwise a BUS_MCEERR_AR signal is generated when the area is touched,
indicating that the execution thread can't access the location.



IOW, will guests have their own policies on
when to take action? Or is it more about allowing the guest to handle
the error less urgently?


Yes to both questions. Any process can indicate

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-21 Thread Yazen Ghannam

On 9/20/23 7:13 AM, Joao Martins wrote:
> On 18/09/2023 23:00, William Roche wrote:
>> Hi John,
>>
>> I'd like to put the emphasis on the fact that ignoring the SRAO error
>> for a VM is a real problem at least for a specific (rare) case I'm
>> currently working on: The VM migration.
>>
>> Context:
>>
>> - In the case of a poisoned page in the VM address space, the migration
>> can't read it and will skip this page, considering it as a zero-filled
>> page. The VM kernel (that handled the vMCE) would have marked it's
>> associated page as poisoned, and if the VM touches the page, the VM
>> kernel generates the associated MCE because it already knows about the
>> poisoned page.
>>
>> - When we ignore the vMCE in the case of a SIGBUS/BUS_MCEERR_AO error
>> (what this patch does), we entirely rely on the Hypervisor to send an
>> SRAR error to qemu when the page is touched: The AMD VM kernel will
>> receive the SIGBUS/BUS_MCEERR_AR and deal with it, thanks to your
>> changes here.
>>
>> So it looks like the mechanism works fine... unless the VM has migrated
>> between the SRAO error and the first time it really touches the poisoned
>> page to get an SRAR error !  In this case, its new address space
>> (created on the migration destination) will have a zero-page where we
>> had a poisoned page, and the AMD VM Kernel (that never dealt with the
>> SRAO) doesn't know about the poisoned page and will access the page
>> finding only zeros...  We have a memory corruption !

I don't understand this. Why would the page be zero? Even so, why would
that affect poison?

Also, during page migration, does the data flow through the CPU core?
Sorry for the basic question. I haven't done a lot with virtualization.

Please note that current AMD systems use an internal poison marker on
memory. This cannot be cleared through normal memory operations. The
only exception, I think, is to use the CLZERO instruction. This will
completely wipe a cacheline including metadata like poison, etc.

So the hardware should not (by design) loose track of poisoned data.

>>
>> It is a very rare window, but in order to fix it the most reasonable
>> course of action would be to make the AMD emulation deal with SRAO
>> errors, instead of ignoring them.
>>
>> Do you agree with my analysis ?
> 
> Under the case that SRAO aren't handled well in the kernel today[*] for AMD, 
> we
> could always add a migration blocker when we hit AO sigbus, in case ignoring
> is our only option. But this would be less than ideal to propagating the
> SRAO into the guest.
> 
> [*] Meaning knowing that handling the SRAO would generate a crash in the guest
> 
> Perhaps as an improvement, perhaps allow qemu to choose to propagate should 
> this
> limitation be lifted via a new -action value and allow it to ignore/propagate 
> or
> not e.g.
> 
>  -action mce=none # default on Intel to propagate all MCE events to the guest
>  -action mce=ignore-optional # Ignore SRAO
> 
> I suppose the second is also useful for ARM64 considering they currently 
> ignore
> SRAO events too.
> 
>> Would an AMD platform generate SRAO signal to a process
>> (SIGBUS/BUS_MCEERR_AO) in case of a real hardware error ?
>>
> This would be useful to confirm.
>

There is no SRAO signal on AMD. The closest equivalent may be a
"Deferred" error interrupt. This is an x86 APIC LVT interrupt, and it's
sent when a deferred (uncorrectable non-urgent) error is detected by a
memory controller.

In this case, the CPU will get the interrupt and log the error (in the
host).

An enhancement will be to take the MCA error information collected
during the interrupt and extract useful data. For example, we'll need to
translate the reported address to a system physical address that can be
mapped to a page.

Once we have the page, then we can decide how we want to signal the
process(es). We could get a deferred/AO error in the host, and signal the
guest with an AR. So the guest handling could be the same in both cases.

Would this be okay? Or is it important that the guest can distinguish
between the A0/AR cases? IOW, will guests have their own policies on
when to take action? Or is it more about allowing the guest to handle
the error less urgently?

Thanks,
Yazen

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-20 Thread Joao Martins

On 18/09/2023 23:00, William Roche wrote:
> Hi John,
> 
> I'd like to put the emphasis on the fact that ignoring the SRAO error
> for a VM is a real problem at least for a specific (rare) case I'm
> currently working on: The VM migration.
> 
> Context:
> 
> - In the case of a poisoned page in the VM address space, the migration
> can't read it and will skip this page, considering it as a zero-filled
> page. The VM kernel (that handled the vMCE) would have marked it's
> associated page as poisoned, and if the VM touches the page, the VM
> kernel generates the associated MCE because it already knows about the
> poisoned page.
> 
> - When we ignore the vMCE in the case of a SIGBUS/BUS_MCEERR_AO error
> (what this patch does), we entirely rely on the Hypervisor to send an
> SRAR error to qemu when the page is touched: The AMD VM kernel will
> receive the SIGBUS/BUS_MCEERR_AR and deal with it, thanks to your
> changes here.
> 
> So it looks like the mechanism works fine... unless the VM has migrated
> between the SRAO error and the first time it really touches the poisoned
> page to get an SRAR error !  In this case, its new address space
> (created on the migration destination) will have a zero-page where we
> had a poisoned page, and the AMD VM Kernel (that never dealt with the
> SRAO) doesn't know about the poisoned page and will access the page
> finding only zeros...  We have a memory corruption !
> 
> It is a very rare window, but in order to fix it the most reasonable
> course of action would be to make the AMD emulation deal with SRAO
> errors, instead of ignoring them.
> 
> Do you agree with my analysis ?

Under the case that SRAO aren't handled well in the kernel today[*] for AMD, we
could always add a migration blocker when we hit AO sigbus, in case ignoring
is our only option. But this would be less than ideal to propagating the
SRAO into the guest.

[*] Meaning knowing that handling the SRAO would generate a crash in the guest

Perhaps as an improvement, perhaps allow qemu to choose to propagate should this
limitation be lifted via a new -action value and allow it to ignore/propagate or
not e.g.

 -action mce=none # default on Intel to propagate all MCE events to the guest
 -action mce=ignore-optional # Ignore SRAO

I suppose the second is also useful for ARM64 considering they currently ignore
SRAO events too.

> Would an AMD platform generate SRAO signal to a process
> (SIGBUS/BUS_MCEERR_AO) in case of a real hardware error ?
> 
This would be useful to confirm.

> Thanks,
> William.

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-18 Thread William Roche


Hi John,

I'd like to put the emphasis on the fact that ignoring the SRAO error
for a VM is a real problem at least for a specific (rare) case I'm
currently working on: The VM migration.

Context:

- In the case of a poisoned page in the VM address space, the migration
can't read it and will skip this page, considering it as a zero-filled
page. The VM kernel (that handled the vMCE) would have marked it's
associated page as poisoned, and if the VM touches the page, the VM
kernel generates the associated MCE because it already knows about the
poisoned page.

- When we ignore the vMCE in the case of a SIGBUS/BUS_MCEERR_AO error
(what this patch does), we entirely rely on the Hypervisor to send an
SRAR error to qemu when the page is touched: The AMD VM kernel will
receive the SIGBUS/BUS_MCEERR_AR and deal with it, thanks to your
changes here.

So it looks like the mechanism works fine... unless the VM has migrated
between the SRAO error and the first time it really touches the poisoned
page to get an SRAR error !  In this case, its new address space
(created on the migration destination) will have a zero-page where we
had a poisoned page, and the AMD VM Kernel (that never dealt with the
SRAO) doesn't know about the poisoned page and will access the page
finding only zeros...  We have a memory corruption !

It is a very rare window, but in order to fix it the most reasonable
course of action would be to make the AMD emulation deal with SRAO
errors, instead of ignoring them.

Do you agree with my analysis ?
Would an AMD platform generate SRAO signal to a process
(SIGBUS/BUS_MCEERR_AO) in case of a real hardware error ?

Thanks,
William.

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-12 Thread Gupta, Pankaj


From: William Roche 

AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
as it panics the VM kernel. We filter this event and provide a
warning message.

Signed-off-by: William Roche 
---
v3:
   - New patch
v4:
   - Remove redundant check for AO errors
---
  target/i386/kvm/kvm.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5fce74aac5..7e9fc0cac5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -604,6 +604,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
  mcg_status |= MCG_STATUS_RIPV;
  }
  } else {
+if (code == BUS_MCEERR_AO) {
+/* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
+return;
+}
  mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
  }
  
@@ -668,8 +672,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)

  addr, paddr, "BUS_MCEERR_AR");
  } else {
   warn_report("Guest MCE Memory Error at QEMU addr %p and "
- "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
- addr, paddr, "BUS_MCEERR_AO");
+ "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
+ addr, paddr, "BUS_MCEERR_AO",
+ IS_AMD_CPU(env) ? "ignored on AMD guest" : "injected");
  }
  
  return;


Reviewed-by: Pankaj Gupta

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

8 matches

Site Navigation

Mail list logo

Footer information