Re: [Qemu-devel] [PATCH v8 0/3] Generate APEI GHES table and dynamically record CPER

2017-08-04 Thread Fam Zheng
On Thu, 08/03 23:01, no-re...@patchew.org wrote:
> /var/tmp/patchew-tester-tmp-2j515e8v/src/tcg/tcg-op.c:3056:1: fatal error: error writing to /tmp/cc9gtyQ1.s: No space left on device

Sorry, this is a false positive. I will free up disk space and rerun the
test. Sorry for the noise.

Fam



[Qemu-devel] [PATCH v8 0/3] Generate APEI GHES table and dynamically record CPER

2017-08-03 Thread Dongjiu Geng
On the ARMv8 platform, the main hardware error sources are ARMv8
SEA/SEI/GSIV. For ARMv8 SEA/SEI, KVM or the host kernel will signal SIGBUS
or use another interface to notify user space, such as Qemu. After Qemu gets
the notification, it records the CPER and injects the SEA/SEI through KVM. This
series of patches generates the APEI tables when the guest OS boots up and
dynamically records CPER for the guest OS for generic hardware errors.
Currently, userspace only handles memory section hardware errors. Before Qemu
records the CPER, it needs to check the ACK value written by the guest OS to
avoid a read-write race condition.
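The ACK check can be pictured as a small handshake between Qemu and the guest. The code below is a simplified in-memory model, not the code in this series. The names `ghes_v2_model`, `host_record_cper` and `guest_ack`, the initial ack value of 1, the preserve/write masks and the 64-byte block are all assumptions chosen for illustration. The guest-side update follows the GHESv2 rule in the ACPI specification: the new Read Ack register value is `(old & read_ack_preserve) | read_ack_write`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one GHESv2 error source (not Qemu's real types). */
struct ghes_v2_model {
    uint64_t read_ack;          /* value of the Read Ack register (ackN) */
    uint64_t read_ack_preserve; /* assumed mask: keep every bit except bit 0 */
    uint64_t read_ack_write;    /* assumed mask: set bit 0 on acknowledge */
    uint8_t  status_block[64];  /* stand-in for Error Status Data Block N */
};

/* Assumption: ack bit 0 == 1 means "the guest consumed the last record". */
static void ghes_model_init(struct ghes_v2_model *s)
{
    s->read_ack = 1;
    s->read_ack_preserve = ~(uint64_t)1;
    s->read_ack_write = 1;
    memset(s->status_block, 0, sizeof s->status_block);
}

/* Host side: refuse to overwrite a record the guest has not acknowledged. */
static bool host_record_cper(struct ghes_v2_model *s,
                             const uint8_t *cper, size_t len)
{
    if ((s->read_ack & 1) == 0 || len > sizeof s->status_block)
        return false;               /* previous error still pending */
    s->read_ack &= ~(uint64_t)1;    /* mark the block busy before writing */
    memcpy(s->status_block, cper, len);
    return true;
}

/* Guest side: GHESv2 acknowledge, new = (old & preserve) | write. */
static void guest_ack(struct ghes_v2_model *s)
{
    s->read_ack = (s->read_ack & s->read_ack_preserve) | s->read_ack_write;
}
```

In this model, a second `host_record_cper()` call fails until `guest_ack()` has run. A real implementation would operate on guest memory rather than on a struct.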

Below is the APEI/GHESv2/CPER table layout. The maximum number of error sources
is 11, classified by notification type; currently only the SEA/SEI notification
type error sources are enabled.
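The enum below sketches how 11 error sources can be indexed. It lists the notification type values from the ACPI 6.1 HEST notification structure, which run from 0 to 10 and so give 11 sources. Using the notification type directly as the source index is an assumption. It is consistent with the SEA example log in this message reporting "Error Source: 8". The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Notification types from the ACPI 6.1 HEST notification structure. */
enum ghes_notify_type {
    GHES_NOTIFY_POLLED   = 0,
    GHES_NOTIFY_EXTERNAL = 1,
    GHES_NOTIFY_LOCAL    = 2,
    GHES_NOTIFY_SCI      = 3,
    GHES_NOTIFY_NMI      = 4,
    GHES_NOTIFY_CMCI     = 5,
    GHES_NOTIFY_MCE      = 6,
    GHES_NOTIFY_GPIO     = 7,
    GHES_NOTIFY_SEA      = 8,  /* ARMv8 Synchronous External Abort */
    GHES_NOTIFY_SEI      = 9,  /* ARMv8 SError Interrupt */
    GHES_NOTIFY_GSIV     = 10, /* External Interrupt - GSIV */
    GHES_NOTIFY_MAX      = 11  /* one error source per type: 11 in total */
};

/* Assumed mapping: error source index == notification type, else -1. */
static int ghes_source_index(int type)
{
    return (type >= 0 && type < GHES_NOTIFY_MAX) ? type : -1;
}

/* Only the SEA and SEI sources are enabled in this series. */
static bool ghes_source_enabled(int type)
{
    return type == GHES_NOTIFY_SEA || type == GHES_NOTIFY_SEI;
}
```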

 etc/acpi/tables                   etc/hardware_errors
 ===============                   =============================================

 HEST                              address registers
 |
 +-- GHES0
 |     ...
 |     error_status_address -----> status_address0 --> Error Status Data Block 0
 |                                                       CPER, CPER, ..., CPER
 |     read_ack_register --------> ack_address0 -----> ack0
 |     read_ack_preserve
 |     read_ack_write
 |
 +-- GHES1
 |     ...
 |     error_status_address -----> status_address1 --> Error Status Data Block 1
 |                                                       CPER, CPER, ..., CPER
 |     read_ack_register --------> ack_address1 -----> ack1
 |     read_ack_preserve
 |     read_ack_write
 |
 +-- ...
 |
 +-- GHES10
       ...
       error_status_address -----> status_address10 -> Error Status Data Block 10
                                                         CPER, CPER, ..., CPER
       read_ack_register --------> ack_address10 ----> ack10
       read_ack_preserve
       read_ack_write
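The following sketch computes offsets into the etc/hardware_errors blob. It assumes the blob stores status_address0..10 first, then ack_address0..10, then ack0..10, and then the Error Status Data Blocks, in that order. The 8-byte register size, the 4 KiB block size and all names are assumptions for illustration, not values taken from the patches.

```c
#include <stdint.h>

#define GHES_NUM_SOURCES 11u      /* one error source per notification type */
#define GHES_REG_SIZE    8u       /* assumption: every register is 64 bits */
#define GHES_BLOCK_SIZE  0x1000u  /* hypothetical Error Status Data Block size */

/* status_address<i>: holds the address of Error Status Data Block i. */
static uint64_t status_address_off(unsigned i)
{
    return (uint64_t)i * GHES_REG_SIZE;
}

/* ack_address<i>: holds the address of ack<i>. */
static uint64_t ack_address_off(unsigned i)
{
    return (uint64_t)(GHES_NUM_SOURCES + i) * GHES_REG_SIZE;
}

/* ack<i>: the value the guest writes to acknowledge block i. */
static uint64_t ack_off(unsigned i)
{
    return (uint64_t)(2 * GHES_NUM_SOURCES + i) * GHES_REG_SIZE;
}

/* Error Status Data Block i, placed after all 33 registers. */
static uint64_t data_block_off(unsigned i)
{
    return (uint64_t)3 * GHES_NUM_SOURCES * GHES_REG_SIZE +
           (uint64_t)i * GHES_BLOCK_SIZE;
}
```

Keeping the fixed-size registers ahead of the variable-size blocks means every register offset depends only on the source index.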

After injecting an SEA/SEI GHES error, the guest OS kernel log shows:

[  142.95] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 8
[  142.913141] {1}[Hardware Error]: event severity: recoverable
[  142.914498] {1}[Hardware Error]:  Error 0, type: recoverable
[  142.915851] {1}[Hardware Error]:   section_type: memory error
[  142.917163] {1}[Hardware Error]:   physical_address: 0x
[  142.918792] {1}[Hardware Error]:   error_type: 3, multi-bit ECC
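The "event severity" and "error_type" fields in this log are decoded from the Generic Error Status Block header and the CPER memory error section. The sketch below decodes them, assuming a buffer that holds the 20-byte ACPI Generic Error Status Block header. Severity names follow the ACPI encoding (0 recoverable, 1 fatal, 2 corrected, 3 none). Memory error type names follow a subset of the UEFI CPER memory error section. The function names are hypothetical.

```c
#include <stdint.h>

/* ACPI Generic Error Status Block header: five little-endian u32 fields. */
struct gesb_header {
    uint32_t block_status;    /* bit 0: uncorrectable valid; bits 4..13: entry count */
    uint32_t raw_data_offset;
    uint32_t raw_data_length;
    uint32_t data_length;
    uint32_t error_severity;  /* 0 recoverable, 1 fatal, 2 corrected, 3 none */
};

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the header at the start of an Error Status Data Block. */
static void gesb_parse(const uint8_t *buf, struct gesb_header *h)
{
    h->block_status    = rd_le32(buf);
    h->raw_data_offset = rd_le32(buf + 4);
    h->raw_data_length = rd_le32(buf + 8);
    h->data_length     = rd_le32(buf + 12);
    h->error_severity  = rd_le32(buf + 16);
}

/* Number of Generic Error Data Entries (CPER sections) in the block. */
static unsigned gesb_entry_count(uint32_t block_status)
{
    return (block_status >> 4) & 0x3ff;
}

static const char *severity_str(uint32_t sev)
{
    static const char *const names[] = { "recoverable", "fatal", "corrected", "none" };
    return sev < 4 ? names[sev] : "unknown";
}

/* Subset of the UEFI CPER memory error types; e.g. 3 is multi-bit ECC. */
static const char *mem_err_type_str(uint8_t type)
{
    static const char *const names[] = {
        "unknown", "no error", "single-bit ECC", "multi-bit ECC",
        "single-symbol ChipKill ECC", "multi-symbol ChipKill ECC",
    };
    return type < sizeof names / sizeof names[0] ? names[type] : "other";
}
```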

How to test:
1. In the guest OS, dump the APEI table with this command:
   "iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST"
2. Find the address of the generic error status block according to the
   notification type.
3. Find the CPER record through the generic error status block.
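As an alternative to reading the iasl disassembly by hand for step 2, the GHESv2 entries can be pulled out of the raw HEST image. The sketch below assumes the table contains only GHESv2 (type 10) entries, as the layout in this series suggests. Other HEST entry types have different sizes and are rejected. Field offsets follow the ACPI GHES/GHESv2 structure definitions. The names `ghes_info` and `hest_find_ghesv2` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEST_HEADER_LEN   36u  /* standard ACPI table header */
#define HEST_TYPE_GHES_V2 10u  /* Generic Hardware Error Source version 2 */
#define GHES_V2_LEN       92u  /* size of one GHESv2 structure */

struct ghes_info {
    uint16_t source_id;
    uint8_t  notify_type;          /* e.g. 8 = ARMv8 SEA, 9 = ARMv8 SEI */
    uint64_t error_status_address; /* address field of Error Status Address GAS */
    uint64_t read_ack_address;     /* address field of Read Ack Register GAS */
};

/* Read an n-byte little-endian integer (no alignment requirement). */
static uint64_t rd_le(const uint8_t *p, unsigned n)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns the number of GHESv2 entries, or -1 on malformed/unsupported input. */
static int hest_find_ghesv2(const uint8_t *buf, size_t len,
                            struct ghes_info *out, int max_out)
{
    if (len < HEST_HEADER_LEN + 4 || memcmp(buf, "HEST", 4) != 0)
        return -1;
    size_t table_len = (size_t)rd_le(buf + 4, 4);
    if (table_len > len)
        return -1;
    uint32_t count = (uint32_t)rd_le(buf + HEST_HEADER_LEN, 4);
    size_t off = HEST_HEADER_LEN + 4;   /* entries follow the source count */
    int found = 0;
    for (uint32_t i = 0; i < count; i++) {
        if (off + GHES_V2_LEN > table_len ||
            rd_le(buf + off, 2) != HEST_TYPE_GHES_V2)
            return -1;
        if (found < max_out) {
            out[found].source_id            = (uint16_t)rd_le(buf + off + 2, 2);
            out[found].error_status_address = rd_le(buf + off + 24, 8);
            out[found].notify_type          = buf[off + 32];
            out[found].read_ack_address     = rd_le(buf + off + 68, 8);
        }
        found++;
        off += GHES_V2_LEN;
    }
    return found;
}
```

In the guest, /sys/firmware/acpi/tables/HEST can be read into a buffer with `fopen`/`fread` and passed to this function. The entry whose `notify_type` is 8 (SEA) gives the error_status_address used in step 2. Following that address to the CPER records in step 3 requires reading guest physical memory, which this sketch does not attempt.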

For example (notification type is SEA):

(1) root@genericarmv8:~# iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST
(2) root@genericarmv8:~# cat HEST.dsl
/*
 * Intel ACPI Component Architecture
 * AML/ASL+ Disassembler version 20170728 (64-bit version)
 * Copyright (c) 2000 - 2017 Intel Corporation
 *
 * Disassembly of /sys/firmware/acpi/tables/HEST, Mon Sep  5 07:59:17 2016
 *
 * ACPI Data Table [HEST]
 *
 *