Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-04-21 Thread Alexander Graf



On 04.04.17 12:35, Suzuki K Poulose wrote:

Hi Christoffer,

On 04/04/17 11:13, Christoffer Dall wrote:

Hi Suzuki,

On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:

In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
unmap_stage2_range() on the entire memory range for the guest. This
could
cause problems with other callers (e.g, munmap on a memslot) trying to
unmap a range. And since we have to unmap the entire Guest memory range
holding a spinlock, make sure we yield the lock if necessary, after we
unmap each PUD range.

Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Cc: sta...@vger.kernel.org # v3.10+
Cc: Paolo Bonzini 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: Mark Rutland 
Signed-off-by: Suzuki K Poulose 
[ Avoid vCPU starvation and lockup detector warnings ]
Signed-off-by: Marc Zyngier 
Signed-off-by: Suzuki K Poulose 



This unfortunately fails to build on 32-bit ARM, and I also think we
intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.


Sorry about that, I didn't test the patch with arm32. I am fine the
patch below. And I agree that the name change does make things more
readable. See below for a hunk that I posted to the kbuild report.



How about adding this to your patch (which includes a rename of
S2_PGD_SIZE which is horribly confusing as it indicates the size of the
first level stage-2 table itself, where S2_PGDIR_SIZE indicates the size
of address space mapped by a single entry in the same table):

diff --git a/arch/arm/include/asm/stage2_pgtable.h
b/arch/arm/include/asm/stage2_pgtable.h
index 460d616..c997f2d 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -35,10 +35,13 @@

 #define stage2_pud_huge(pud)pud_huge(pud)

+#define S2_PGDIR_SIZEPGDIR_SIZE
+#define S2_PGDIR_MASKPGDIR_MASK
+
 /* Open coded p*d_addr_end that can deal with 64bit addresses */
 static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr,
phys_addr_t end)
 {
-phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
+phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;

 return (boundary - 1 < end - 1) ? boundary : end;
 }
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index db94f3a..6e79a4c 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
 static unsigned long hyp_idmap_end;
 static phys_addr_t hyp_idmap_vector;

-#define S2_PGD_SIZE(PTRS_PER_S2_PGD * sizeof(pgd_t))
+#define S2_PGD_TABLE_SIZE(PTRS_PER_S2_PGD * sizeof(pgd_t))
 #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))

 #define KVM_S2PTE_FLAG_IS_IOMAP(1UL << 0)
@@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm,
phys_addr_t start, u64 size)
  * If the range is too large, release the kvm->mmu_lock
  * to prevent starvation and lockup detector warnings.
  */
-if (size > S2_PUD_SIZE)
+if (size > S2_PGDIR_SIZE)
 cond_resched_lock(>mmu_lock);
 next = stage2_pgd_addr_end(addr, end);
 if (!stage2_pgd_none(*pgd))
@@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
 }

 /* Allocate the HW PGD, making sure that each page gets its own
refcount */
-pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
+pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
 if (!pgd)
 return -ENOMEM;

@@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 spin_unlock(>mmu_lock);

 /* Free the HW pgd, one page at a time */
-free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
+free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
 kvm->arch.pgd = NULL;
 }



Btw, I have a different hunk to solve the problem, posted to the kbuild
report. I will post it here for the sake of capturing the discussion in
one place. The following hunk on top of the patch, changes the lock
release after we process one PGDIR entry. As for the first time
we enter the loop we haven't done much with the lock held, hence it may
make
sense to do it after the first round and we have more work to do.

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index db94f3a..582a972 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm,
phys_addr_t start, u64 size)
assert_spin_locked(>mmu_lock);
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
do {
+   next = stage2_pgd_addr_end(addr, end);
+   if (!stage2_pgd_none(*pgd))


Just as heads up, I had this version applied to my tree by accident 
(commit 8b3405e345b5a098101b0c31b264c812bba045d9 from Christoffer's 
queue) and ran into a NULL pointer dereference:



Re: [RFC] minimum gcc version for kernel: raise to gcc-4.3 or 4.6?

2017-04-21 Thread Kees Cook
On Fri, Apr 21, 2017 at 1:55 PM, Arnd Bergmann  wrote:
> On Thu, Apr 20, 2017 at 9:52 PM, Kees Cook  wrote:
>> On Thu, Apr 20, 2017 at 3:15 AM, Arnd Bergmann  wrote:
>>> On Sun, Apr 16, 2017 at 9:52 PM, Kees Cook  wrote:
>> The original gcc-4.3 release was in early 2008. If we decide to still
>> support that, we probably want the first 10 quirks in this series,
>> while gcc-4.6 (released in 2011) requires none of them.

 I'd be in support of raising the minimum to gcc 4.6. (I'd actually
 prefer 4.7, just to avoid some 4.6 packaging issues, and for better
 gcc plugin support.)

 I'm curious what gcc 4.6 binaries are common in the wild besides
 old-stable Debian (unsupported in maybe a year from now?) and 12.04
 Ubuntu (going fully unsupported in 2 weeks). It looks like 4.6 was
 used only in Fedora 15 and 16 (both EOL).
>>>
>>> I think we are better off defining two versions: One that we know
>>> a lot of people care about, and we actively try to make that work
>>> well in all configurations (e.g. 4.6, 4.7 or 4.8), fixing all warnings
>>> we run into, and an older version that we try not to break
>>> intentionally (e.g. 3.4, 4.1 or 4.3) but that we only fix when
>>> someone actually runs into a problem they can't work around
>>> by upgrading to a more modern compiler.
>>
>> For "working well everywhere" I feel like 4.8 is the better of those
>> three (I'd prefer 4.9). I think we should avoid 4.6 -- it seems not
>> widely used.
>
> I suspect that 4.9 might be the one that actually works best
> across architectures, and it contained some very significant
> changes. In my testing gcc-5 tends to behave very similarly
> to 4.9, and gcc-6 introduced a larger number of new warnings,
> so that would clearly be too new for a recommended version.
>
> The suggestion of 4.9 or higher is appealing as a recommendation
> because it matches what I would personally tell people:
>
> - If you have gcc-4.9 or newer and you don't rely on any newer
>   features, there is no need to upgrade
> - Wth gcc-4.8, the -Wmaybe-uninitialized warnings are now turned
>   off because they were too noisy, so upgrading is probably a good
>   idea even though the compiler is otherwise ok and in widespread
>   use
> - gcc-4.6 and 4.7 are basically usable for building kernels, but the
>   warning output is often counterproductive, and the generated
>   object code may be noticeably worse.
> - anything before gcc-4.6 is missing too many features to be
>   useful on ARM, but may still be fine on other architectures.
>
> On the other hand, there is a noticeable difference in compile
> speed, as a 5% slowdown compared to the previous release
> apparently is not considered a regression. These are the times
> I see for building ARM 'vexpress_defconfig':
>
> gcc-4.4: real 0m47.269s user 11m48.576s
> gcc-4.5: real 0m44.878s user 10m58.900s
> gcc-4.6: real 0m44.621s user 11m34.716s
> gcc-4.7: real 0m47.476s user 12m42.924s
> gcc-4.8: real 0m48.494s user 13m19.736s
> gcc-4.9: real 0m50.140s user 13m44.876s
> gcc-5.x: real 0m51.302s user 14m05.564s
> gcc-6.x: real 0m54.615s user 15m06.304s
> gcc-7.x: real 0m56.008s user 15m44.720s
>
> That is a factor of 1.5x in CPU cycles between slowest and
> fastest, so there is clearly a benefit to keeping the old versions
> around, but there is also no clear cut-off other thannoticing
> that gcc-4.4 is slower than 4.5 in this particular
> configuration.
>
>> For an old compiler... yikes. 3.4 sounds insane to me. :)
>
> That was my initial thought as well. On ARM, it clearly is
> insane, as even gcc-4.0 is unable to build any of the modern
> defconfigs (lacking -mabi=aapcs, ICE when building vsprintf.c)
> and even the patch I did to get gcc-4.1 to build is probably
> too ugly to get merged, so to build any unpatched kernel after
> linux-3.6 you need at least gcc-4.2, or even gcc-4.4 for the
> ''defconfig' (gcc-4.3 if you disable vdso).
>
> Then again, on x86, old cmpilers were claimed to be much better
> supported. I just tried it out and found that no x86 defconfig kernel
> since linux-3.2 could be built with gcc-3.4, probably not on any
> other architecture either (it cannot have forward declarations
> for inline functions and we have one in kernel/sched_fair.c).
>
> I think that would be a really good argument for requiring
> something newer ;-)
>
> The linux-4.2 x86 defconfig could still be built with gcc-4.0, but
> later kernels have several minor problems with that, and
> require at least gcc-4.3.
>
> If we are ok with this status quo, we could simply declare gcc-4.3
> the absolute minimum version for the kernel, make gcc-4.9
> the recommeded minimum version, and remove all workarounds
> for gcc-4.2 or older.

I think starting with this would be a good first step. I'm not sure
the best way to add "recommended minimum" to
Documentation/process/changes.rst hmmm

> If anyone has a good reason 

Re: [RFC] minimum gcc version for kernel: raise to gcc-4.3 or 4.6?

2017-04-21 Thread Arnd Bergmann
On Thu, Apr 20, 2017 at 9:52 PM, Kees Cook  wrote:
> On Thu, Apr 20, 2017 at 3:15 AM, Arnd Bergmann  wrote:
>> On Sun, Apr 16, 2017 at 9:52 PM, Kees Cook  wrote:
> The original gcc-4.3 release was in early 2008. If we decide to still
> support that, we probably want the first 10 quirks in this series,
> while gcc-4.6 (released in 2011) requires none of them.
>>>
>>> I'd be in support of raising the minimum to gcc 4.6. (I'd actually
>>> prefer 4.7, just to avoid some 4.6 packaging issues, and for better
>>> gcc plugin support.)
>>>
>>> I'm curious what gcc 4.6 binaries are common in the wild besides
>>> old-stable Debian (unsupported in maybe a year from now?) and 12.04
>>> Ubuntu (going fully unsupported in 2 weeks). It looks like 4.6 was
>>> used only in Fedora 15 and 16 (both EOL).
>>
>> I think we are better off defining two versions: One that we know
>> a lot of people care about, and we actively try to make that work
>> well in all configurations (e.g. 4.6, 4.7 or 4.8), fixing all warnings
>> we run into, and an older version that we try not to break
>> intentionally (e.g. 3.4, 4.1 or 4.3) but that we only fix when
>> someone actually runs into a problem they can't work around
>> by upgrading to a more modern compiler.
>
> For "working well everywhere" I feel like 4.8 is the better of those
> three (I'd prefer 4.9). I think we should avoid 4.6 -- it seems not
> widely used.

I suspect that 4.9 might be the one that actually works best
across architectures, and it contained some very significant
changes. In my testing gcc-5 tends to behave very similarly
to 4.9, and gcc-6 introduced a larger number of new warnings,
so that would clearly be too new for a recommended version.

The suggestion of 4.9 or higher is appealing as a recommendation
because it matches what I would personally tell people:

- If you have gcc-4.9 or newer and you don't rely on any newer
  features, there is no need to upgrade
- Wth gcc-4.8, the -Wmaybe-uninitialized warnings are now turned
  off because they were too noisy, so upgrading is probably a good
  idea even though the compiler is otherwise ok and in widespread
  use
- gcc-4.6 and 4.7 are basically usable for building kernels, but the
  warning output is often counterproductive, and the generated
  object code may be noticeably worse.
- anything before gcc-4.6 is missing too many features to be
  useful on ARM, but may still be fine on other architectures.

On the other hand, there is a noticeable difference in compile
speed, as a 5% slowdown compared to the previous release
apparently is not considered a regression. These are the times
I see for building ARM 'vexpress_defconfig':

gcc-4.4: real 0m47.269s user 11m48.576s
gcc-4.5: real 0m44.878s user 10m58.900s
gcc-4.6: real 0m44.621s user 11m34.716s
gcc-4.7: real 0m47.476s user 12m42.924s
gcc-4.8: real 0m48.494s user 13m19.736s
gcc-4.9: real 0m50.140s user 13m44.876s
gcc-5.x: real 0m51.302s user 14m05.564s
gcc-6.x: real 0m54.615s user 15m06.304s
gcc-7.x: real 0m56.008s user 15m44.720s

That is a factor of 1.5x in CPU cycles between slowest and
fastest, so there is clearly a benefit to keeping the old versions
around, but there is also no clear cut-off other thannoticing
that gcc-4.4 is slower than 4.5 in this particular
configuration.

> For an old compiler... yikes. 3.4 sounds insane to me. :)

That was my initial thought as well. On ARM, it clearly is
insane, as even gcc-4.0 is unable to build any of the modern
defconfigs (lacking -mabi=aapcs, ICE when building vsprintf.c)
and even the patch I did to get gcc-4.1 to build is probably
too ugly to get merged, so to build any unpatched kernel after
linux-3.6 you need at least gcc-4.2, or even gcc-4.4 for the
''defconfig' (gcc-4.3 if you disable vdso).

Then again, on x86, old cmpilers were claimed to be much better
supported. I just tried it out and found that no x86 defconfig kernel
since linux-3.2 could be built with gcc-3.4, probably not on any
other architecture either (it cannot have forward declarations
for inline functions and we have one in kernel/sched_fair.c).

I think that would be a really good argument for requiring
something newer ;-)

The linux-4.2 x86 defconfig could still be built with gcc-4.0, but
later kernels have several minor problems with that, and
require at least gcc-4.3.

If we are ok with this status quo, we could simply declare gcc-4.3
the absolute minimum version for the kernel, make gcc-4.9
the recommeded minimum version, and remove all workarounds
for gcc-4.2 or older.

If anyone has a good reason for gcc-4.0 through gcc-4.2, then
we would need a small number of patches to get them back
working with x86 defconfig.

   Arnd
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 04/11] efi: parse ARM processor error

2017-04-21 Thread Baicar, Tyler

On 4/21/2017 11:55 AM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 05:05:16PM -0600, Tyler Baicar wrote:

Add support for ARM Common Platform Error Record (CPER).
UEFI 2.6 specification adds support for ARM specific
processor error information to be reported as part of the
CPER records. This provides more detail on for processor error logs.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Reviewed-by: James Morse 
Reviewed-by: Ard Biesheuvel 
---
  drivers/firmware/efi/cper.c | 135 
  include/linux/cper.h|  54 ++
  2 files changed, 189 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 46585f9..f959185 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
  static const char * const proc_type_strs[] = {
"IA32/X64",
"IA64",
+   "ARM",
  };
  
  static const char * const proc_isa_strs[] = {

"IA32",
"IA64",
"X64",
+   "ARM A32/T32",
+   "ARM A64",
  };
  
  static const char * const proc_error_type_strs[] = {

@@ -184,6 +187,128 @@ static void cper_print_proc_generic(const char *pfx,
printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
  }
  
+#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)

+static const char * const arm_reg_ctx_strs[] = {
+   "AArch32 general purpose registers",
+   "AArch32 EL1 context registers",
+   "AArch32 EL2 context registers",
+   "AArch32 secure context registers",
+   "AArch64 general purpose registers",
+   "AArch64 EL1 context registers",
+   "AArch64 EL2 context registers",
+   "AArch64 EL3 context registers",
+   "Misc. system register structure",
+};
+
+static void cper_print_proc_arm(const char *pfx,
+   const struct cper_sec_proc_arm *proc)
+{
+   int i, len, max_ctx_type;
+   struct cper_arm_err_info *err_info;
+   struct cper_arm_ctx_info *ctx_info;
+   char newpfx[64];
+
+   printk("%ssection length: %d\n", pfx, proc->section_length);

We need to dump section length because?
I guess it's not really needed. It just may be useful considering there 
can be numerous error info structures, numerous context info structures, 
and a variable length vendor information section. I can move this print 
to only in the length check failure cases.



+   printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
+
+   len = proc->section_length - (sizeof(*proc) +
+   proc->err_info_num * (sizeof(*err_info)));
+   if (len < 0) {
+   printk("%ssection length is too small\n", pfx);

Now here we *can* dump it.


+   printk("%sfirmware-generated error record is incorrect\n", pfx);
+   printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
+   return;
+   }
+
+   if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
+   printk("%sMPIDR: 0x%016llx\n", pfx, proc->mpidr);


< newline here.

Also, what is MPIDR and can it be written in a more user-friendly manner
and not be an abbreviation?


+   if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
+   printk("%serror affinity level: %d\n", pfx,
+   proc->affinity_level);
+   if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
+   printk("%srunning state: 0x%x\n", pfx, proc->running_state);
+   printk("%sPSCI state: %d\n", pfx, proc->psci_state);

One more abbreviation. Please consider whether having the abbreviations
or actually writing them out is more user-friendly.


+   }
+
+   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);

That INDENT_SP thing is just silly, someone should kill it.


+
+   err_info = (struct cper_arm_err_info *)(proc + 1);
+   for (i = 0; i < proc->err_info_num; i++) {
+   printk("%sError info structure %d:\n", pfx, i);
+   printk("%sversion:%d\n", newpfx, err_info->version);
+   printk("%slength:%d\n", newpfx, err_info->length);

< newline here.

Why do we even dump version and info for *every* err_info structure?
Because these are part of the error information structure. I wouldn't 
think FW would populate error information structures that are different 
versions in the same processor error, but it could be possible from the 
spec (at least once there are different versions of the table).



+   if (err_info->validation_bits &
+   CPER_ARM_INFO_VALID_MULTI_ERR) {
+   if (err_info->multiple_error == 0)
+   printk("%ssingle error\n", newpfx);
+   else if (err_info->multiple_error == 1)
+   printk("%smultiple 

Re: [PATCH V15 03/11] cper: add timestamp print to CPER status printing

2017-04-21 Thread Borislav Petkov
On Fri, Apr 21, 2017 at 12:08:43PM -0600, Baicar, Tyler wrote:
> The timestamp may still be useful when it is imprecise. In the polling case,
> you may only poll every minute or so, so the time may be useful.

Well, what is in the timestamp when !precise? Some random time or some
timestamp from a couple of seconds ago? How do you differentiate what
timestamp is bollocks and what is from a while ago?

Is the imprecise tstamp really close to the time the error happened or
pointing at 1970 - the beginning of unix time? :-)

I'm sure you've picked up by now that we don't trust the firmware one
bit.

> Also, I imagine there could be interrupt based errors happening much faster 
> than the
> FW/OS handshake can happen. Maybe we can just use what I had before but also
> specify imprecise so that it is clear:
> 
> printk("%s%ststamp: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
> (timestamp[3] & 0x1 ? "precise " : "imprecise "),
>  century, year, mon, day, hour, min, sec);

I guess.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 03/11] cper: add timestamp print to CPER status printing

2017-04-21 Thread Baicar, Tyler

On 4/21/2017 11:26 AM, Borislav Petkov wrote:

On Fri, Apr 21, 2017 at 10:04:35AM -0600, Baicar, Tyler wrote:

This is basically what I already had in v14...you asked to move it into a
different if-statement? https://lkml.org/lkml/2017/4/12/397

Well, clearly I've been smoking some nasty potent sh*t. :-\

/me goes and looks at the spec:

"Bit 0 – Timestamp is precise if this bit is set and correlates to the
time of the error event."

So why are we even printing the timestamp when !precise?

IOW, I think we should do:

if (!(timestamp[3] & 0x1))
printk("%stimestamp imprecise\n", pfx);
else {
sec = ..
min = ...

...
}

and print the actual values only when the timestamp is precise.
Otherwise it has *some* values which could just as well be completely
random. And it's not like we're reporting the error tomorrow - it is
mostly a couple of seconds from logging to the fw pushing it out...
The timestamp may still be useful when it is imprecise. In the polling 
case, you may only poll every minute or so, so the time may be useful. 
Also, I imagine there could be interrupt based errors happening much 
faster than the FW/OS handshake can happen. Maybe we can just use what I 
had before but also specify imprecise so that it is clear:


printk("%s%ststamp: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
(timestamp[3] & 0x1 ? "precise " : "imprecise "),
 century, year, mon, day, hour, min, sec);

Thanks,
Tyler

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 04/11] efi: parse ARM processor error

2017-04-21 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 05:05:16PM -0600, Tyler Baicar wrote:
> Add support for ARM Common Platform Error Record (CPER).
> UEFI 2.6 specification adds support for ARM specific
> processor error information to be reported as part of the
> CPER records. This provides more detail on for processor error logs.
> 
> Signed-off-by: Tyler Baicar 
> CC: Jonathan (Zhixiong) Zhang 
> Reviewed-by: James Morse 
> Reviewed-by: Ard Biesheuvel 
> ---
>  drivers/firmware/efi/cper.c | 135 
> 
>  include/linux/cper.h|  54 ++
>  2 files changed, 189 insertions(+)
> 
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 46585f9..f959185 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
>  static const char * const proc_type_strs[] = {
>   "IA32/X64",
>   "IA64",
> + "ARM",
>  };
>  
>  static const char * const proc_isa_strs[] = {
>   "IA32",
>   "IA64",
>   "X64",
> + "ARM A32/T32",
> + "ARM A64",
>  };
>  
>  static const char * const proc_error_type_strs[] = {
> @@ -184,6 +187,128 @@ static void cper_print_proc_generic(const char *pfx,
>   printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
>  }
>  
> +#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
> +static const char * const arm_reg_ctx_strs[] = {
> + "AArch32 general purpose registers",
> + "AArch32 EL1 context registers",
> + "AArch32 EL2 context registers",
> + "AArch32 secure context registers",
> + "AArch64 general purpose registers",
> + "AArch64 EL1 context registers",
> + "AArch64 EL2 context registers",
> + "AArch64 EL3 context registers",
> + "Misc. system register structure",
> +};
> +
> +static void cper_print_proc_arm(const char *pfx,
> + const struct cper_sec_proc_arm *proc)
> +{
> + int i, len, max_ctx_type;
> + struct cper_arm_err_info *err_info;
> + struct cper_arm_ctx_info *ctx_info;
> + char newpfx[64];
> +
> + printk("%ssection length: %d\n", pfx, proc->section_length);

We need to dump section length because?

> + printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
> +
> + len = proc->section_length - (sizeof(*proc) +
> + proc->err_info_num * (sizeof(*err_info)));
> + if (len < 0) {
> + printk("%ssection length is too small\n", pfx);

Now here we *can* dump it.

> + printk("%sfirmware-generated error record is incorrect\n", pfx);
> + printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
> + return;
> + }
> +
> + if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
> + printk("%sMPIDR: 0x%016llx\n", pfx, proc->mpidr);


< newline here.

Also, what is MPIDR and can it be written in a more user-friendly manner
and not be an abbreviation?

> + if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
> + printk("%serror affinity level: %d\n", pfx,
> + proc->affinity_level);
> + if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
> + printk("%srunning state: 0x%x\n", pfx, proc->running_state);
> + printk("%sPSCI state: %d\n", pfx, proc->psci_state);

One more abbreviation. Please consider whether having the abbreviations
or actually writing them out is more user-friendly.

> + }
> +
> + snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);

That INDENT_SP thing is just silly, someone should kill it.

> +
> + err_info = (struct cper_arm_err_info *)(proc + 1);
> + for (i = 0; i < proc->err_info_num; i++) {
> + printk("%sError info structure %d:\n", pfx, i);
> + printk("%sversion:%d\n", newpfx, err_info->version);
> + printk("%slength:%d\n", newpfx, err_info->length);

< newline here.

Why do we even dump version and info for *every* err_info structure?

> + if (err_info->validation_bits &
> + CPER_ARM_INFO_VALID_MULTI_ERR) {
> + if (err_info->multiple_error == 0)
> + printk("%ssingle error\n", newpfx);
> + else if (err_info->multiple_error == 1)
> + printk("%smultiple errors\n", newpfx);
> + else
> + printk("%smultiple errors count:%u\n",
> + newpfx, err_info->multiple_error);

So this can be simply: "num errors: %d", err_info->multiple_error+1...

Without checking CPER_ARM_INFO_VALID_MULTI_ERR.

> + }

< newline here.

> + if (err_info->validation_bits & CPER_ARM_INFO_VALID_FLAGS) {
> + if (err_info->flags & CPER_ARM_INFO_FLAGS_FIRST)
> + 

Re: [PATCH V15 03/11] cper: add timestamp print to CPER status printing

2017-04-21 Thread Borislav Petkov
On Fri, Apr 21, 2017 at 10:04:35AM -0600, Baicar, Tyler wrote:
> This is basically what I already had in v14...you asked to move it into a
> different if-statement? https://lkml.org/lkml/2017/4/12/397

Well, clearly I've been smoking some nasty potent sh*t. :-\

/me goes and looks at the spec:

"Bit 0 – Timestamp is precise if this bit is set and correlates to the
time of the error event."

So why are we even printing the timestamp when !precise?

IOW, I think we should do:

if (!(timestamp[3] & 0x1))
printk("%stimestamp imprecise\n", pfx);
else {
sec = ..
min = ...

...
}

and print the actual values only when the timestamp is precise.
Otherwise it has *some* values which could just as well be completely
random. And it's not like we're reporting the error tomorrow - it is
mostly a couple of seconds from logging to the fw pushing it out...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 03/11] cper: add timestamp print to CPER status printing

2017-04-21 Thread Baicar, Tyler

On 4/21/2017 6:21 AM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 05:05:15PM -0600, Tyler Baicar wrote:

The ACPI 6.1 spec added a timestamp to the HEST generic data

HEST?

I see the timestamp in

Table 18-343 Generic Error Data Entry

where those things are "One or more Generic Error Data Entry structures
may be recorded in the Generic Error Data Entries field of the Generic
Error Status Block structure."

And those are part of the "18.3.2.7 Generic Hardware Error Source",
i.e., GHES. So why do you say "HEST" above?
Ah yes, this should be GHES no HEST. There are too many acronyms 
involved here :)



structure. Print the timestamp out when printing out the error
status information.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Reviewed-by: James Morse 
Reviewed-by: Ard Biesheuvel 

Remove those Reviewed-by:s


---
  drivers/firmware/efi/cper.c | 28 
  1 file changed, 28 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 8328a6f..46585f9 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -32,6 +32,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  
  #define INDENT_SP	" "

@@ -387,6 +389,29 @@ static void cper_print_pcie(const char *pfx, const struct 
cper_sec_pcie *pcie,
pfx, pcie->bridge.secondary_status, pcie->bridge.control);
  }
  
+static void cper_estatus_timestamp(const char *pfx,

cper_print_tstamp()


+  struct acpi_hest_generic_data_v300 *gdata)
+{
+   __u8 hour, min, sec, day, mon, year, century, *timestamp;
+
+   if (gdata->validation_bits & ACPI_HEST_GEN_VALID_TIMESTAMP) {
+   timestamp = (__u8 *)&(gdata->time_stamp);
+   sec   = bcd2bin(timestamp[0]);
+   min   = bcd2bin(timestamp[1]);
+   hour  = bcd2bin(timestamp[2]);
+   day   = bcd2bin(timestamp[4]);
+   mon   = bcd2bin(timestamp[5]);
+   year  = bcd2bin(timestamp[6]);
+   century   = bcd2bin(timestamp[7]);
+
+   if (*(timestamp + 3) & 0x1)
+   printk("%stimestamp is precise\n", pfx);
+
+   printk("%stime: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
+   century, year, mon, day, hour, min, sec);

Yeah, about the precise tstamp, you can do something like this:

---
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 46585f92b741..a649884e2264 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -404,10 +404,8 @@ static void cper_estatus_timestamp(const char *pfx,
year  = bcd2bin(timestamp[6]);
century   = bcd2bin(timestamp[7]);
  
-		if (*(timestamp + 3) & 0x1)

-   printk("%stimestamp is precise\n", pfx);
-
-   printk("%stime: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
+   printk("%s%ststamp: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
+   (timestamp[3] & 0x1 ? "precise " : ""),
century, year, mon, day, hour, min, sec);
}
  }
This is basically what I already had in v14...you asked to move it into 
a different if-statement? https://lkml.org/lkml/2017/4/12/397


Thanks,
Tyler

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-04-21 Thread gengdongjiu
Hi all/Laszlo,

  sorry, I have a question to consult with you.


On 2017/4/7 2:55, Laszlo Ersek wrote:
> On 04/06/17 14:35, gengdongjiu wrote:
>> Dear, Laszlo
>>Thanks for your detailed explanation.
>>
>> On 2017/3/29 19:58, Laszlo Ersek wrote:
>>> (This ought to be one of the longest address lists I've ever seen :)
>>> Thanks for the CC. I'm glad Shannon is already on the CC list. For good
>>> measure, I'm adding MST and Igor.)
>>>
>>> On 03/29/17 12:36, Achin Gupta wrote:
 Hi gengdongjiu,

 On Wed, Mar 29, 2017 at 05:36:37PM +0800, gengdongjiu wrote:
>
> Hi Laszlo/Biesheuvel/Qemu developer,
>
>Now I encounter a issue and want to consult with you in ARM64 
> platform, as described below:
>
> when guest OS happen synchronous or asynchronous abort, kvm needs
> to send the error address to Qemu or UEFI through sigbus to
> dynamically generate APEI table. from my investigation, there are
> two ways:
>
> (1) Qemu get the error address, and generate the APEI table, then
> notify UEFI to know this generation, then inject abort error to
> guest OS, guest OS read the APEI table.
> (2) Qemu get the error address, and let UEFI to generate the APEI
> table, then inject abort error to guest OS, guest OS read the APEI
> table.

 Just being pedantic! I don't think we are talking about creating the APEI 
 table
 dynamically here. The issue is: Once KVM has received an error that is 
 destined
 for a guest it will raise a SIGBUS to Qemu. Now before Qemu can inject the 
 error
 into the guest OS, a CPER (Common Platform Error Record) has to be 
 generated
 corresponding to the error source (GHES corresponding to memory subsystem,
 processor etc) to allow the guest OS to do anything meaningful with the
 error. So who should create the CPER is the question.

 At the EL3/EL2 interface (Secure Firmware and OS/Hypervisor), an error 
 arrives
 at EL3 and secure firmware (at EL3 or a lower secure exception level) is
 responsible for creating the CPER. ARM is experimenting with using a 
 Standalone
 MM EDK2 image in the secure world to do the CPER creation. This will avoid
 adding the same code in ARM TF in EL3 (better for security). The error 
 will then
 be injected into the OS/Hypervisor (through SEA/SEI/SDEI) through ARM 
 Trusted
 Firmware.

 Qemu is essentially fulfilling the role of secure firmware at the EL2/EL1
 interface (as discussed with Christoffer below). So it should generate the 
 CPER
 before injecting the error.

 This is corresponds to (1) above apart from notifying UEFI (I am assuming 
 you
 mean guest UEFI). At this time, the guest OS already knows where to pick 
 up the
 CPER from through the HEST. Qemu has to create the CPER and populate its 
 address
 at the address exported in the HEST. Guest UEFI should not be involved in 
 this
 flow. Its job was to create the HEST at boot and that has been done by this
 stage.

 Qemu folk will be able to add but it looks like support for CPER 
 generation will
 need to be added to Qemu. We need to resolve this.

 Do shout if I am missing anything above.
>>>
>>> After reading this email, the use case looks *very* similar to what
>>> we've just done with VMGENID for QEMU 2.9.
>>>
>>> We have a facility between QEMU and the guest firmware, called "ACPI
>>> linker/loader", with which QEMU instructs the firmware to
>>>
>>> - allocate and download blobs into guest RAM (AcpiNVS type memory) --
>>> ALLOCATE command,
>>>
>>> - relocate pointers in those blobs, to fields in other (or the same)
>>> blobs -- ADD_POINTER command,
>>>
>>> - set ACPI table checksums -- ADD_CHECKSUM command,
>>>
>>> - and send GPAs of fields within such blobs back to QEMU --
>>> WRITE_POINTER command.
>>>
>>> This is how I imagine we can map the facility to the current use case
>>> (note that this is the first time I read about HEST / GHES / CPER):

Laszlo lists a Qemu GHES table generation solution, Mainly use the four 
commands: "ALLOCATE/ADD_POINTER/ADD_CHECKSUM/WRITE_POINTER" to communicate with 
BIOS
so whether the four commands needs to be supported by the guest firware/UEFI.  
I found the  "WRITE_POINTER" always failed. so I suspect guest UEFI/firmware 
not support the "WRITE_POINTER" command. please help me confirm it, thanks so 
much.


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 03/11] cper: add timestamp print to CPER status printing

2017-04-21 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 05:05:15PM -0600, Tyler Baicar wrote:
> The ACPI 6.1 spec added a timestamp to the HEST generic data

HEST?

I see the timestamp in

Table 18-343 Generic Error Data Entry

where those things are "One or more Generic Error Data Entry structures
may be recorded in the Generic Error Data Entries field of the Generic
Error Status Block structure."

And those are part of the "18.3.2.7 Generic Hardware Error Source",
i.e., GHES. So why do you say "HEST" above?

> structure. Print the timestamp out when printing out the error
> status information.
> 
> Signed-off-by: Tyler Baicar 
> CC: Jonathan (Zhixiong) Zhang 
> Reviewed-by: James Morse 
> Reviewed-by: Ard Biesheuvel 

Remove those Reviewed-by:s

> ---
>  drivers/firmware/efi/cper.c | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 8328a6f..46585f9 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -32,6 +32,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  
>  #define INDENT_SP" "
> @@ -387,6 +389,29 @@ static void cper_print_pcie(const char *pfx, const 
> struct cper_sec_pcie *pcie,
>   pfx, pcie->bridge.secondary_status, pcie->bridge.control);
>  }
>  
> +static void cper_estatus_timestamp(const char *pfx,

cper_print_tstamp()

> +struct acpi_hest_generic_data_v300 *gdata)
> +{
> + __u8 hour, min, sec, day, mon, year, century, *timestamp;
> +
> + if (gdata->validation_bits & ACPI_HEST_GEN_VALID_TIMESTAMP) {
> + timestamp = (__u8 *)&(gdata->time_stamp);
> + sec   = bcd2bin(timestamp[0]);
> + min   = bcd2bin(timestamp[1]);
> + hour  = bcd2bin(timestamp[2]);
> + day   = bcd2bin(timestamp[4]);
> + mon   = bcd2bin(timestamp[5]);
> + year  = bcd2bin(timestamp[6]);
> + century   = bcd2bin(timestamp[7]);
> +
> + if (*(timestamp + 3) & 0x1)
> + printk("%stimestamp is precise\n", pfx);
> +
> + printk("%stime: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
> + century, year, mon, day, hour, min, sec);

Yeah, about the precise tstamp, you can do something like this:

---
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 46585f92b741..a649884e2264 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -404,10 +404,8 @@ static void cper_estatus_timestamp(const char *pfx,
year  = bcd2bin(timestamp[6]);
century   = bcd2bin(timestamp[7]);
 
-   if (*(timestamp + 3) & 0x1)
-   printk("%stimestamp is precise\n", pfx);
-
-   printk("%stime: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
+   printk("%s%ststamp: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
+   (timestamp[3] & 0x1 ? "precise " : ""), 
century, year, mon, day, hour, min, sec);
}
 }

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

2017-04-21 Thread Xiongfeng Wang
Hi James,

Thanks for your reply.

On 2017/4/20 16:52, James Morse wrote:
> Hi Wang Xiongfeng,
> 
> On 19/04/17 03:37, Xiongfeng Wang wrote:
>> On 2017/4/18 18:51, James Morse wrote:
>>> The host expects to receive physical SError Interrupts. The ARM-ARM doesn't
>>> describe a way to inject these as they are generated by the CPU.
>>>
>>> Am I right in thinking you want this to use SError Interrupts as an APEI
>>> notification? (This isn't a CPU thing so the RAS spec doesn't cover this 
>>> use)
>>
>> Yes, using sei as an APEI notification is one part of my consideration. 
>> Another use is for ESB.
>> RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External 
>> Abort with ESB'
>> describes the SEI recovery process when ESB is implemented.
>>
>> In this situation, SEI is routed to EL3 (SCR_EL3.EA = 1). When an SEI occurs 
>> in EL0 and not been taken immediately,
>> and then an ESB instruction at SVC entry is executed, SEI is taken to EL3. 
>> The ESB at SVC entry is
>> used for preventing the error propagating from user space to kernel space. 
>> The EL3 SEI handler collects
> 
>> the errors and fills in the APEI table, and then jump to EL2 SEI handler. 
>> EL2 SEI handler inject
>> an vSEI into EL1 by setting HCR_EL2.VSE = 1, so that when returned to OS, an 
>> SEI is pending.
> 
> This step has confused me. How would this work with VHE where the host runs at
> EL2 and there is nothing at Host:EL1?

RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External 
Abort with ESB'
I don't know whether the step described in the section is designed for guest os 
or host os or both.
Yes, it doesn't work with VHE where the host runs at EL2.

>>From your description I assume you have some firmware resident at EL2.

Our actual SEI step is planned as follows:
Host OS:  EL0/EL1 -> EL3 -> EL0/EL1
Guest OS:  EL0/EL1 -> EL3 -> EL2 -> EL0/EL1

Our problem is that, when returning to EL0/EL1, whether we should jump to EL1 
SEI vector or just return to where the SEI is taken from?
In guest os situation, we can inject an vSEI and return to where the SEI is 
taken from.
But in host os situation, we can't inject an vSEI (if we don't set 
HCR_EL2.AMO), so we have to jump to EL1 SEI vector.
Then the process following ESB won't be executed becuase SEI is taken to EL3 
from the ESB instruction in EL1, and when control
is returned to OS, we are in EL1 SEI vector rather than the ESB instruction.

It is ok to just ignore the process following the ESB instruction in el0_sync, 
because the process will be sent SIGBUS signal.
But for el0_irq, the irq process following the ESB may should be executed 
because the irq is not related to the user process.

If we set HCR_EL2.AMO when SCR_EL3.EA is set. We can still inject an vSEI into 
host OS in EL3.
Physical SError won't be taken to EL2 because SCR_EL3.EA is set. But this may 
be too racy is
not consistent with ARM rules.
> 
> 
>> Then ESB is executed again, and DISR_EL1.A is set by hardware (2.4.4 ESB and 
>> virtual errors), so that
>> the following process can be executed.
> 
> 
>> So we want to inject a vSEI into OS, when control is returned from EL3/2 to 
>> OS, no matter whether
>> it is on host OS or guest OS.
> 
> I disagree. With Linux/KVM you can't use Virtual SError to notify the host OS.
> The host OS expects to receive Physical SError, these shouldn't be taken to 
> EL2
> unless a guest is running. Notifications from firmware that use SEA or SEI
> should follow the routing rules in the ARM-ARM, which means they should never
> reach a guest OS.

Yes, I agree. When running the host os, exception should not be taken to EL2, 
because
EL2 SEI vector always suppose that exception is taken from guest os and save
guest context in the first place.
> 
> For VHE the host runs at EL2 and sets HCR_EL2.AMO. Any firmware notification
> should come from EL3 to the host at EL2. The host may choose to notify the
> guest, but this should always go via Qemu.
> 
> For non-VHE systems the host runs at EL1 and KVM lives at EL2 to do the world
> switch. When KVM is running a guest it sets HCR_EL2.AMO, when it has switched
> back to the host it clears it.
> 
> Notifications from EL3 that pretend to be SError should route SError to EL2 or
> EL1 depending on HCR_EL2.AMO.
> When KVM gets a Synchronous External Abort or an SError while a guest is 
> running
> it will switch back to the host and tell the handle_exit() code.  Again the 
> host
> may choose to notify the guest, but this should always go via Qemu.
> 
> The ARM-ARM pseudo code for the routing rules is in
> AArch64.TakePhysicalSErrorException(). Firmware injecting fake SError should
> behave exactly like this.
> 
> 
> Newly appeared in the ARM-ARM is HCR_EL2.TEA, which takes Synchronous External
> Aborts to EL2 (if SCR_EL3 hasn't claimed them). We should set/clear this bit
> like we do HCR_EL2.AMO if the CPU has the RAS extensions.
> 
> 
>>> You cant use SError to cover all the possible RAS exceptions. We already 

Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

2017-04-21 Thread Xiongfeng Wang
Hi XiuQi,

On 2017/3/30 18:31, Xie XiuQi wrote:
> Error Synchronization Barrier (ESB; part of the ARMv8.2 Extensions)
> is used to synchronize Unrecoverable errors. That is, containable errors
> architecturally consumed by the PE and not silently propagated.
> 
> With ESB it is generally possible to isolate an unrecoverable error
> between two ESB instructions. So, it's possible to recovery from
> unrecoverable errors reported by asynchronous SError interrupt.
> 
> If ARMv8.2 RAS Extension is not support, ESB is treated as a NOP.
> 
> Signed-off-by: Xie XiuQi 
> Signed-off-by: Wang Xiongfeng 
> ---
>  arch/arm64/Kconfig   | 16 ++
>  arch/arm64/include/asm/esr.h | 14 +
>  arch/arm64/kernel/entry.S| 70 
> ++--
>  arch/arm64/kernel/traps.c| 54 --
>  4 files changed, 150 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 859a90e..7402175 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -911,6 +911,22 @@ endmenu
>  
>  menu "ARMv8.2 architectural features"
>  
> +config ARM64_ESB
> + bool "Enable support for Error Synchronization Barrier (ESB)"
> + default n
> + help
> +   Error Synchronization Barrier (ESB; part of the ARMv8.2 Extensions)
> +   is used to synchronize Unrecoverable errors. That is, containable 
> errors
> +   architecturally consumed by the PE and not silently propagated.
> +
> +   Without ESB it is not generally possible to isolate an Unrecoverable
> +   error because it is not known which instruction generated the error.
> +
> +   Selecting this option allows inject esb instruction before the 
> exception
> +   change. If ARMv8.2 RAS Extension is not support, ESB is treated as a 
> NOP.
> +
> +   Note that ESB instruction can introduce slight overhead, so say N if 
> unsure.
> +
>  config ARM64_UAO
>   bool "Enable support for User Access Override (UAO)"
>   default y
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index f20c64a..22f9c90 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -106,6 +106,20 @@
>  #define ESR_ELx_AR   (UL(1) << 14)
>  #define ESR_ELx_CM   (UL(1) << 8)
>  
> +#define ESR_Elx_DFSC_SEI (0x11)
> +
> +#define ESR_ELx_AET_SHIFT(10)
> +#define ESR_ELx_AET_MAX  (7)
> +#define ESR_ELx_AET_MASK (UL(7) << ESR_ELx_AET_SHIFT)
> +#define ESR_ELx_AET(esr) (((esr) & ESR_ELx_AET_MASK) >> 
> ESR_ELx_AET_SHIFT)
> +
> +#define ESR_ELx_AET_UC   (0)
> +#define ESR_ELx_AET_UEU  (1)
> +#define ESR_ELx_AET_UEO  (2)
> +#define ESR_ELx_AET_UER  (3)
> +#define ESR_ELx_AET_CE   (6)
> +
> +
>  /* ISS field definitions for exceptions taken in to Hyp */
>  #define ESR_ELx_CV   (UL(1) << 24)
>  #define ESR_ELx_COND_SHIFT   (20)
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 43512d4..d8a7306 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -69,7 +69,14 @@
>  #define BAD_FIQ  2
>  #define BAD_ERROR3
>  
> + .arch_extension ras
> +
>   .macro  kernel_entry, el, regsize = 64
because we also want to take SEI exception in kernel space, we may need to 
unmask SEI in kernel_entry.
> +#ifdef CONFIG_ARM64_ESB
> + .if \el == 0
> + esb
> + .endif
> +#endif
>   sub sp, sp, #S_FRAME_SIZE
>   .if \regsize == 32
>   mov w0, w0  // zero upper 32 bits of x0
> @@ -208,6 +215,7 @@ alternative_else_nop_endif
>  #endif
>  
>   .if \el == 0
> + msr daifset, #0xF   // Set flags
>   ldr x23, [sp, #S_SP]// load return stack pointer
>   msr sp_el0, x23
>  #ifdef CONFIG_ARM64_ERRATUM_845719
> @@ -226,6 +234,15 @@ alternative_else_nop_endif
>  
>   msr elr_el1, x21// set up the return data
>   msr spsr_el1, x22
> +
> +#ifdef CONFIG_ARM64_ESB
> + .if \el == 0
> + esb // Error Synchronization Barrier
> + mrs x21, disr_el1   // Check for deferred error
> + tbnzx21, #31, el1_sei
> + .endif
> +#endif
> +
>   ldp x0, x1, [sp, #16 * 0]
>   ldp x2, x3, [sp, #16 * 1]
>   ldp x4, x5, [sp, #16 * 2]
> @@ -318,7 +335,7 @@ ENTRY(vectors)
>   ventry  el1_sync_invalid// Synchronous EL1t
>   ventry  el1_irq_invalid // IRQ EL1t
>   ventry  el1_fiq_invalid // FIQ EL1t
> - ventry  el1_error_invalid   // Error EL1t
> + ventry  el1_error   // Error EL1t
>  
>   ventry  el1_sync// Synchronous EL1h
>   ventry  el1_irq  

Re: [PATCH v4 01/22] KVM: arm/arm64: Add vITS save/restore API documentation

2017-04-21 Thread Christoffer Dall
On Mon, Apr 10, 2017 at 04:26:14PM +0200, Auger Eric wrote:
> Hi Christoffer,
> 
> On 08/04/2017 20:17, Christoffer Dall wrote:
> > Hi Eric,
> > 
> > Most of my comments below are just me being picky about text when
> > defining a user space ABI, so I think this mainly looks good, but just
> > needs a bit of clarify, except the versioning aspect and exporting the
> > pending table vie the redistributor instead, as Marc and Andre have
> > pointed out.
> > 
> > On Mon, Mar 27, 2017 at 11:30:51AM +0200, Eric Auger wrote:
> >> Add description for how to access vITS registers and how to flush/restore
> >> vITS tables into/from memory
> >>
> >> Signed-off-by: Eric Auger 
> >>
> >> ---
> >> v3 -> v4:
> >> - take into account Peter's comments:
> >>   - typos
> >>   - KVM_DEV_ARM_VGIC_GRP_ITS_TABLES kvm_device_attr = 0
> >>   - add a validity bit in DTE
> >>   - document all fields in CTE and ITE
> >>   - document ABI revision
> >> - take into account Andre's comments:
> >>   - document restrictions about GITS_CREADR writing and GITS_IIDR
> >>   - document -EBUSY error if one or more VCPUS are runnning
> >>   - document 64b registers only can be accessed with 64b access
> >> - itt_addr field matches bits [51:8] of the itt_addr
> >>
> >> v1 -> v2:
> >> - DTE and ITE now are 8 bytes
> >> - DTE and ITE now indexed by deviceid/eventid
> >> - use ITE name instead of ITTE
> >> - mentions ITT_addr matches bits [51:8] of the actual address
> >> - mentions LE layout
> >> ---
> >>  Documentation/virtual/kvm/devices/arm-vgic-its.txt | 118 
> >> +
> >>  1 file changed, 118 insertions(+)
> >>
> >> diff --git a/Documentation/virtual/kvm/devices/arm-vgic-its.txt 
> >> b/Documentation/virtual/kvm/devices/arm-vgic-its.txt
> >> index 6081a5b..0902d20 100644
> >> --- a/Documentation/virtual/kvm/devices/arm-vgic-its.txt
> >> +++ b/Documentation/virtual/kvm/devices/arm-vgic-its.txt
> >> @@ -36,3 +36,121 @@ Groups:
> >>  -ENXIO:  ITS not properly configured as required prior to setting
> >>   this attribute
> >>  -ENOMEM: Memory shortage when allocating ITS internal data
> >> +
> >> +  KVM_DEV_ARM_VGIC_GRP_ITS_REGS
> >> +  Attributes:
> >> +  The attr field of kvm_device_attr encodes the offset of the
> >> +  ITS register, relative to the ITS control frame base address
> >> +  (ITS_base).
> >> +
> >> +  kvm_device_attr.addr points to a __u64 value whatever the width
> >> +  of the addressed register (32/64 bits). 64 bit registers can only
> >> +  be accessed with full length.
> >> +
> >> +  Writes to read-only registers are ignored by the kernel except for:
> >> +  - GITS_READR. It needs to be restored otherwise commands in the 
> >> queue
> >> +will be re-executed after CWRITER setting. Writing this register 
> >> is
> >  ^^^
> > 
> > "after restoring CWRITER." ?
> OK
> > 
> >> +allowed if the ITS is not enabled (GITS_CTLR.enable = 0). Also it
> > 
> > Does that mean that you can only save/restore a disabled ITS or does it
> > mean that initially after creating the ITS it is disabled and userspace
> > should restore the CWRITER before restoring GITS_CTLR (which may enable
> > the ITS) ?
> No I meant the second. As normally the ITS is responsible for updating
> the GITS_READR, if the userspace plays with it while the ITS is enabled
> this can mess everything. So the userspace is supposed to restore the
> GITS_READR *before* restoring the GITS_CTLR which is likely to enable
> the ITS.
> > 
> >> +needs to be restored after GITS_CBASER since a write to 
> >> GITS_CBASER
> >> +resets GITS_CREADR.
> >> +  - GITS_IIDR. Its Revision field encodes the table layout ABI 
> >> revision.
> > 
> > How is this really going to work?  Your ABI here must be backwards
> > compatible for future revisions, so what is userspace supposed to do,
> > when it reads a newer revision than it was programmed for?
> destination ABI revision must be >= source ABI revision
> By restoring the IIDR value on the destination side, the userspace
> informs destination KVM about the layout of the table. If the
> destination KVM does not support this ABI revision, the restoration of
> the ITS table fails. Is that wrong?
> 

No, that's fine.  I think my confusion was that I didn't appreciate that
the revision thing is for the table layouts only, and not the other
calls here.

> > 
> > I think we need a more clear description of how the revision is going to
> > be used, such that each operation on the ITS here is described as
> > requiring a minimum revision X, and making sure that userspace can
> > safely ignore things that are of a higher revision number, while at the
> > same time userspace can decide not to use newer features with older
> > kernels.
> 
> At the moment I did not plan to implement any way for the userspace to
> force the ITS state save in a specified ABI revision. Today it uses the