Re: [PATCH v2 8/9] KVM: x86: add force_intercept_exceptions_mask

2021-04-27 Thread Borislav Petkov
On Thu, Apr 01, 2021 at 04:54:50PM +0300, Maxim Levitsky wrote:
> This parameter will be used by VMX and SVM code to force
> interception of a set of exceptions, given by a bitmask
> for guest debug and/or kvm debug.
> 
> This option is not intended for production.
> 
> This is based on an idea first shown here:
> https://patchwork.kernel.org/project/kvm/patch/20160301192822.gd22...@pd.tnic/
> 
> CC: Borislav Petkov 
> Signed-off-by: Maxim Levitsky 
> ---
>  arch/x86/kvm/x86.c | 3 +++
>  arch/x86/kvm/x86.h | 2 ++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3627ce8fe5bb..1a51031d64d8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -176,6 +176,9 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
>  int __read_mostly pi_inject_timer = -1;
>  module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>  
> +uint force_intercept_exceptions_mask;
> +module_param(force_intercept_exceptions_mask, uint, S_IRUGO | S_IWUSR);
> +EXPORT_SYMBOL_GPL(force_intercept_exceptions_mask);

That's nice.

I could use some text explaning the usage though, i.e. that thing takes
a bitfield of exception vectors, so that I don't have to look at the
code each time. :-)

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v8 18/26] ACPI / APEI: Make GHES estatus header validation more user friendly

2019-02-01 Thread Borislav Petkov
On Tue, Jan 29, 2019 at 06:48:54PM +, James Morse wrote:
> ghes_read_estatus() checks various lengths in the top-level header to
> ensure the CPER records to be read aren't obviously corrupt.
> 
> Take the opportunity to make this more user-friendly, printing a
> (ratelimited) message about the nature of the header format error.
> 
> Suggested-by: Borislav Petkov 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 46 
>  1 file changed, 32 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index f95db2398dd5..9391fff71344 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -293,6 +293,30 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 
> paddr, u32 len,
>   }
>  }
>  
> +/* Check the top-level record header has an appropriate size. */
> +int __ghes_check_estatus(struct ghes *ghes,

static.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v8 09/26] ACPI / APEI: Generalise the estatus queue's notify code

2019-02-01 Thread Borislav Petkov
On Tue, Jan 29, 2019 at 06:48:45PM +, James Morse wrote:
> +static int ghes_in_nmi_spool_from_list(struct list_head *rcu_list)
> +{
> + int err, ret = -ENOENT;
> + struct ghes *ghes;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(ghes, rcu_list, list) {
> + err = ghes_in_nmi_queue_one_entry(ghes);
> + if (!err)
> + ret = 0;

Do I understand this correctly that we want to do "ret = 0" for at least
one record which ghes_in_nmi_queue_one_entry() has succeeded queueing?

For those for which it has returned -ENOENT, estatus has been cleared,
nothing has been queued so we don't have to do anything for that
particular entry...

Btw, you don't really need the err variable:

if (!ghes_in_nmi_queue_one_entry(ghes))
ret = 0;

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v8 04/26] ACPI / APEI: Make hest.c manage the estatus memory pool

2019-02-01 Thread Borislav Petkov
On Tue, Jan 29, 2019 at 06:48:40PM +, James Morse wrote:
> ghes.c has a memory pool it uses for the estatus cache and the estatus
> queue. The cache is initialised when registering the platform driver.
> For the queue, an NMI-like notification has to grow/shrink the pool
> as it is registered and unregistered.
> 
> This is all pretty noisy when adding new NMI-like notifications, it
> would be better to replace this with a static pool size based on the
> number of users.
> 
> As a precursor, move the call that creates the pool from ghes_init(),
> into hest.c. Later this will take the number of ghes entries and
> consolidate the queue allocations.
> Remove ghes_estatus_pool_exit() as hest.c doesn't have anywhere to put
> this.
> 
> The pool is now initialised as part of ACPI's subsys_initcall():
> (acpi_init(), acpi_scan_init(), acpi_pci_root_init(), acpi_hest_init())
> Before this patch it happened later as a GHES specific device_initcall().
> 
> Signed-off-by: James Morse 
> ---
> Changes since v7:
> * Moved the pool init later, the driver isn't probed until device_init.
> ---
>  drivers/acpi/apei/ghes.c | 33 ++---
>  drivers/acpi/apei/hest.c | 10 +-
>  include/acpi/ghes.h  |  2 ++
>  3 files changed, 17 insertions(+), 28 deletions(-)

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 22/25] ACPI / APEI: Kick the memory_failure() queue for synchronous errors

2019-01-31 Thread Borislav Petkov
On Wed, Jan 23, 2019 at 06:40:08PM +, James Morse wrote:
> My SMM comment was because the CPU must jump from user-space->SMM, which 
> injects
> an NMI into the kernel. The kernel's EIP must point into user-space, so
> returning from the NMI without doing the memory_failure() work puts us back 
> the
> same position we started in.

Yeah, known issue. We dealt with that on x86 at the time:

d4812e169de4 ("x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks")

> > Now, memory_failure_queue() does that and can run from IRQ context so
> > you need only an irq_work which can queue from NMI context. We do it
> > this way in the MCA code:
> > 
> 
> (was there something missing here?)

Whoops. Yeah, I was about to paste this:

void mce_log(struct mce *m)
{
if (!mce_gen_pool_add(m))
irq_work_queue(_irq_work);
}

we're basically queueing only into the lockless buffer and kicking the
IRQ work.

> > We queue in an irq_work in NMI context and work through the items in
> > process context.
> 
> How are you getting from NMI to process context in one go?

Well, #MC is basically an NMI context on x86 and when it is done, we
work through the items queued in process context. But see the commit
above too - for really urgent errors we run memory_failure *before* we
return to user.

> This patch causes the IRQ->process transition.
> The arch specific bit of this gives the irq work queue a kick if returning 
> from
> the NMI would unmask IRQs. This makes it look like we moved from NMI to IRQ
> context without returning to user-space.
> 
> Once ghes_handle_memory_failure() runs in IRQ context, it task_work_add()s the
> call to ghes_kick_memory_failure().
> 
> Finally on the way out of the kernel to user-space that task_work runs and the
> memory_failure() work happens in process context.
> 
> During all this the user-space program counter can point at a poisoned 
> location,
> but we don't return there until the memory_failure() work has been done.

Sounds very similar.

Actually, yours is even a bit more elegant. I wonder why we didn't use
task_work_add() then...

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 20/25] ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications

2019-01-31 Thread Borislav Petkov
On Wed, Jan 23, 2019 at 06:33:02PM +, James Morse wrote:
> Was the best I had, but this trips the BUILD_BUG() too early.
> With it, x86 BUILD_BUG()s. With just the -1 the path gets pruned out, and 
> there
> are no 'sdei' symbols in the object file.
> 
> ...at this point, I stopped caring!

Yah, you said it: __end_of_fixed_addresses will practically give you the
BUG behavior:

if (idx >= __end_of_fixed_addresses) {
BUG();
return;
}

and ARM64 does the same.

> We already skip registering notifiers if the kconfig option wasn't selected.
> 
> We can't catch this at compile time, as the dead-code elimination seems to
> happen in multiple passes.
> 
> I'll switch the SDEI ones to __end_of_fixed_addresses, as both architectures
> BUG() when they see this.

Right.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records

2019-01-31 Thread Borislav Petkov
On Tue, Jan 29, 2019 at 06:48:33PM +, James Morse wrote:
> If firmware has never generated CPER records, so it has never written to void
> *error_status_address, yes.

I guess this is the bit of information I was missing.

> There seem to be two ways of doing this. This zero check implies an example
> system could be:
> | g->error_status_address == 0xf00d
> | *(u64 *)0xf00d == 0
> Firmware populates CPER records, then updates 0xf00d.
> (0xf00d would have been pre-mapped by apei_map_generic_address() in 
> ghes_new())
> Reads of 0xf00d before CPER records are generated get 0.

Ok, this sounds like the polled case. FW better have a record ready
before raising the NMI.

> Once an error occurs, this system now looks like this:
> | g->error_status_address == 0xf00d
> | *(u64 *)0xf00d == 0xbeef
> | *(u64 *)0xbeef == 0
> 
> For new errors, firmware populates CPER records, then updates 0xf00d.
> Alternatively firmware could re-use the memory at 0xbeef, generating the CPER
> records backwards, so that once 0xbeef is updated, the rest of the record is
> visible. (firmware knows not to race with another CPU right?)

Thanks for the comic relief. :-P

> Firmware could equally point 0xf00d at 0xbeef at startup, so it has one fewer
> values to write when an error occurs. I have an arm64 system with a HEST that
> does this. (I'm pretty sure its ACPI support is a copy-and-paste from x86, it
> even describes NOTIFY_NMI, who knows what that means on arm!)

Oh the fun.

> When linux processes an error, ghes_clear_estatus() NULLs the
> estatus->block_status, (which in this example is at 0xbeef). This is the
> documented sequence for GHESv2.
> Elsewhere the spec talks of checking the block status which is part of the
> records, (not the error_status_address, which is the pointer to the records).
>
> Linux can't NULL 0xf00d, because it doesn't know if firmware will write it 
> again
> next time it updates the records.
> I can't find where in the spec it says the error status address is written to.
> Linux works with both 'at boot' and 'on each error'.
> If it were know to have a static value, ghes_copy_tofrom_phys() would not have
> been necessary, but its been there since d334a49113a4.
>
> In the worst case, if there is a value at the error_status_address, we have to
> map/unmap it every time we poll in case firmware wrote new records at that 
> same
> location.
> 
> I don't think we can change Linux's behaviour here, without interpreting zero 
> as
> CPER records or missing new errors.

Nah, I was simply trying to figure out why we do that buf_paddr check.
Thanks for the extensive clarification.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records

2019-01-29 Thread Borislav Petkov
On Wed, Jan 23, 2019 at 06:36:38PM +, James Morse wrote:
> Do you consider ENOENT an error? We don't ack in that case as the
> memory wasn't in use.

So let's see:

if (!*buf_paddr)
return -ENOENT;

can happen when apei_read() has returned 0 but it has managed to do

*val = 0;

Now, that function returns error values which we should be checking
but we're checking the buf_paddr pointed to value for being 0. Are
we fearing that even if acpi_os_read_memory() or acpi_os_read_port()
succeed, *buf_paddr could still be 0 ?

Because if not, we should be checking whether rc == -EINVAL and then
convert it to -ENOENT.

But ghes_read_estatus() handles the error case first *and* *then* checks
buf_paddr too, to make really really sure we won't be reading from
address 0.

> For the other cases its because the records are bogus, but we still
> unconditionally tell firmware we're done with them.

... to free the memory, yes, ok.

> >> I think it is. 18.3.2.8 of ACPI v6.2 (search for Generic Hardware Error 
> >> Source
> >> version 2", then below the table):
> >> * OSPM detects error (via interrupt/exception or polling the block status)
> >> * OSPM copies the error status block
> >> * OSPM clears the block status field of the error status block
> >> * OSPM acknowledges the error via Read Ack register
> >>
> >> The ENOENT case is excluded by 'polling the block status'.
> > 
> > Ok, so we signal the absence of an error record with ENOENT.
> > 
> > if (!buf_paddr)
> > return -ENOENT;
> > 
> > Can that even happen?
> 
> Yes, for NOTIFY_POLLED its the norm. For the IRQ flavours that walk a list of
> GHES, all but one of them will return ENOENT.

Lemme get this straight: when we do

apei_read(_paddr, >error_status_address);

in the polled case, buf_paddr can be 0?

> We could try it and see. It depends if firmware shares ack locations between
> multiple GHES. We could ack an empty GHES, and it removes the records of one 
> we
> haven't looked at yet.

Yeah, OTOH, we shouldn't be pushing our luck here, I guess.

So let's sum up: we'll ack the GHES error in all but the -ENOENT cases
in order to free the memory occupied by the error record.

The slightly "pathological" -ENOENT case is I guess how the fw behaves
when it is being polled and also for broken firmware which could report
a 0 buf_paddr.

Btw, that last thing I'm assuming because

  d334a49113a4 ("ACPI, APEI, Generic Hardware Error Source memory error 
support")

doesn't say what that check was needed for.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 22/25] ACPI / APEI: Kick the memory_failure() queue for synchronous errors

2019-01-22 Thread Borislav Petkov
On Mon, Dec 10, 2018 at 07:15:13PM +, James Morse wrote:
> What happens if we miss MF_ACTION_REQUIRED?

AFAICU, the logic is to force-send a signal to the user process, i.e.,
force_sig_info() which cannot be ignored. IOW, an "enlightened" process
would know how to do recovery action from a memory error.

VS the action optional thing which you can handle at your leisure.

So the question boils down to what kind of severity do the errors
reported through SEA have? I mean, if the hw would go the trouble to do
the synchronous reporting, then something important must've happened and
it wants us to know about it and handle it.

> Surely the page still gets unmapped as its PG_Poisoned, an AO signal
> may be pending, but if user-space touches the page it will get an AR
> signal. Is this just about removing an extra AO signal to user-space?
>
> If we do need this, I'd like to pick it up from the CPER records, as x86's
> NOTIFY_NMI looks like it covers both AO/AR cases. (as does NOTIFY_SDEI). The
> Master/Target abort or Invalid-address types in the memory-error-section CPER
> records look like the best bet.

Right, and we do all kinds of severity mapping there aka ghes_severity()
so that'll be a good start, methinks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 22/25] ACPI / APEI: Kick the memory_failure() queue for synchronous errors

2019-01-21 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:06:10PM +, James Morse wrote:
> memory_failure() offlines or repairs pages of memory that have been
> discovered to be corrupt. These may be detected by an external
> component, (e.g. the memory controller), and notified via an IRQ.
> In this case the work is queued as not all of memory_failure()s work
> can happen in IRQ context.
> 
> If the error was detected as a result of user-space accessing a
> corrupt memory location the CPU may take an abort instead. On arm64
> this is a 'synchronous external abort', and on a firmware first
> system it is replayed using NOTIFY_SEA.
> 
> This notification has NMI like properties, (it can interrupt
> IRQ-masked code), so the memory_failure() work is queued. If we
> return to user-space before the queued memory_failure() work is
> processed, we will take the fault again. This loop may cause platform
> firmware to exceed some threshold and reboot when Linux could have
> recovered from this error.
> 
> If a ghes notification type indicates that it may be triggered again
> when we return to user-space, use the task-work and notify-resume
> hooks to kick the relevant memory_failure() queue before returning
> to user-space.
> 
> Signed-off-by: James Morse 
> 
> ---
> current->mm == _mm ? I couldn't find a helper for this.
> The intent is not to set TIF flags on kernel threads. What happens
> if a kernel-thread takes on of these? Its just one of the many
> not-handled-very-well cases we have already, as memory_failure()
> puts it: "try to be lucky".
> 
> I assume that if NOTIFY_NMI is coming from SMM it must suffer from
> this problem too.

Good question.

I'm guessing all those things should be queued on a normal struct
work_struct queue, no?

Now, memory_failure_queue() does that and can run from IRQ context so
you need only an irq_work which can queue from NMI context. We do it
this way in the MCA code:

We queue in an irq_work in NMI context and work through the items in
process context.

> ---
>  drivers/acpi/apei/ghes.c | 65 
>  1 file changed, 60 insertions(+), 5 deletions(-)

...

> @@ -407,7 +447,22 @@ static void ghes_handle_memory_failure(struct 
> acpi_hest_generic_data *gdata, int
>  
>   if (flags != -1)
>   memory_failure_queue(pfn, flags);
> -#endif
> +
> + /*
> +  * If the notification indicates that it was the interrupted
> +  * instruction that caused the error, try to kick the
> +  * memory_failure() queue before returning to user-space.
> +  */
> + if (ghes_is_synchronous(ghes) && current->mm != _mm) {
> + callback = kzalloc(sizeof(*callback), GFP_ATOMIC);

Can we avoid that GFP_ATOMIC allocation and kfree() in
ghes_kick_memory_failure()?

I mean, that struct ghes_memory_failure_work is small enough and we
already do lockless allocation:

estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);

so I guess we could add that ghes_memory_failure_work struct to that
estatus_node, hand it into ghes_do_proc() and then free it.

No?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 20/25] ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications

2019-01-21 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:06:08PM +, James Morse wrote:
> Now that ghes notification helpers provide the fixmap slots and
> take the lock themselves, multiple NMI-like notifications can
> be used on arm64.
> 
> These should be named after their notification method as they can't
> all be called 'NMI'. x86's NOTIFY_NMI already is, change the SEA
> fixmap entry to be called FIX_APEI_GHES_SEA.
> 
> Future patches can add support for FIX_APEI_GHES_SEI and
> FIX_APEI_GHES_SDEI_{NORMAL,CRITICAL}.
> 
> Because all of ghes.c builds on both architectures, provide a
> constant for each fixmap entry that the architecture will never
> use.
> 
> Signed-off-by: James Morse 
> 
> ---
> Changes since v6:
>  * Added #ifdef definitions of each missing fixmap entry.
> 
> Changes since v3:
>  * idx/lock are now in a separate struct.
>  * Add to the comment above ghes_fixmap_lock_irq so that it makes more
>sense in isolation.
> 
> fixup for split fixmap
> ---
>  arch/arm64/include/asm/fixmap.h |  2 +-
>  drivers/acpi/apei/ghes.c| 10 +-
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
> index ec1e6d6fa14c..966dd4bb23f2 100644
> --- a/arch/arm64/include/asm/fixmap.h
> +++ b/arch/arm64/include/asm/fixmap.h
> @@ -55,7 +55,7 @@ enum fixed_addresses {
>  #ifdef CONFIG_ACPI_APEI_GHES
>   /* Used for GHES mapping from assorted contexts */
>   FIX_APEI_GHES_IRQ,
> - FIX_APEI_GHES_NMI,
> + FIX_APEI_GHES_SEA,
>  #endif /* CONFIG_ACPI_APEI_GHES */
>  
>  #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 849da0d43a21..6cbf9471b2a2 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -85,6 +85,14 @@
>   ((struct acpi_hest_generic_status *)\
>((struct ghes_estatus_node *)(estatus_node) + 1))
>  
> +/* NMI-like notifications vary by architecture. Fill in the fixmap gaps */
> +#ifndef CONFIG_HAVE_ACPI_APEI_NMI
> +#define FIX_APEI_GHES_NMI-1
> +#endif
> +#ifndef CONFIG_ACPI_APEI_SEA
> +#define FIX_APEI_GHES_SEA-1

I'm guessing those -1 are going to cause __set_fixmap() to fail, right?

I'm wondering if we could catch that situation in ghes_map() already to
protect ourselves against future changes in the fixmap code...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 19/25] ACPI / APEI: Only use queued estatus entry during _in_nmi_notify_one()

2019-01-21 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:06:07PM +, James Morse wrote:
> Each struct ghes has an worst-case sized buffer for storing the
> estatus. If an error is being processed by ghes_proc() in process
> context this buffer will be in use. If the error source then triggers
> an NMI-like notification, the same buffer will be used by
> _in_nmi_notify_one() to stage the estatus data, before
> __process_error() copys it into a queued estatus entry.
> 
> Merge __process_error()s work into _in_nmi_notify_one() so that
> the queued estatus entry is used from the beginning. Use the new
> ghes_peek_estatus() to know how much memory to allocate from
> the ghes_estatus_pool before reading the records.
> 
> Reported-by: Borislav Petkov 
> Signed-off-by: James Morse 
> 
> Change since v6:
>  * Added a comment explaining the 'ack-error, then goto no_work'.
>  * Added missing esatus-clearing, which is necessary after reading the GAS,
> ---
>  drivers/acpi/apei/ghes.c | 59 
>  1 file changed, 35 insertions(+), 24 deletions(-)

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 18/25] ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length

2019-01-21 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:06:06PM +, James Morse wrote:
> ghes_read_estatus() reads the record address, then the record's
> header, then performs some sanity checks before reading the
> records into the provided estatus buffer.
> 
> To provide this estatus buffer the caller must know the size of the
> records in advance, or always provide a worst-case sized buffer as
> happens today for the non-NMI notifications.
> 
> Add a function to peek at the record's header to find the size. This
> will let the NMI path allocate the right amount of memory before reading
> the records, instead of using the worst-case size, and having to copy
> the records.
> 
> Split ghes_read_estatus() to create __ghes_peek_estatus() which
> returns the address and size of the CPER records.
> 
> Signed-off-by: James Morse 
> 
> Changes since v6:
>  * Additional buf_addr = 0 error handling
>  * Moved checking out of peek-estatus
>  * Reworded an error message so we can tell them apart
> ---
>  drivers/acpi/apei/ghes.c | 59 
>  1 file changed, 47 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index b70f5fd962cc..07a12aac4c1a 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -277,12 +277,12 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 
> paddr, u32 len,
>   }
>  }
>  
> -static int ghes_read_estatus(struct ghes *ghes,
> -  struct acpi_hest_generic_status *estatus,
> -  u64 *buf_paddr, int fixmap_idx)
> +/* Read the CPER block and returning its address, and header in estatus. */

s/and /,/

> +static int __ghes_peek_estatus(struct ghes *ghes, int fixmap_idx,
> +struct acpi_hest_generic_status *estatus,

Also, we probably should stick to some order of arguments of those
functions for easier code staring, i.e.

function_name(ghes, estatus, buf_paddr, fixmap_idx)

or so.

> +u64 *buf_paddr)
>  {
>   struct acpi_hest_generic *g = ghes->generic;
> - u32 len;
>   int rc;
>  
>   rc = apei_read(buf_paddr, >error_status_address);
> @@ -303,29 +303,64 @@ static int ghes_read_estatus(struct ghes *ghes,
>   return -ENOENT;
>   }
>  
> - rc = -EIO;
> - len = cper_estatus_len(estatus);
> + return 0;
> +}
> +
> +/* Check the top-level record header has an appropriate size. */
> +int __ghes_check_estatus(struct ghes *ghes,
> +  struct acpi_hest_generic_status *estatus)
> +{
> + u32 len = cper_estatus_len(estatus);
> + int rc = -EIO;
> +
>   if (len < sizeof(*estatus))
>   goto err_read_block;
>   if (len > ghes->generic->error_block_length)
>   goto err_read_block;
>   if (cper_estatus_check_header(estatus))
>   goto err_read_block;

Please make this chunk more user-friendly, maybe in a separate patch ontop:

/* Check the top-level record header has an appropriate size. */
int __ghes_check_estatus(struct ghes *ghes,
 struct acpi_hest_generic_status *estatus)
{
u32 len = cper_estatus_len(estatus);

if (len < sizeof(*estatus)) {
pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status 
block!\n");
return -EIO;
}

if (len > ghes->generic->error_block_length) {
pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status 
block length!\n");
return -EIO;
}

if (cper_estatus_check_header(estatus)) {
pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid CPER header!\n");
return -EIO;
}

return 0;
}

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 17/25] ACPI / APEI: Pass ghes and estatus separately to avoid a later copy

2019-01-21 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:06:05PM +, James Morse wrote:
> The NMI-like notifications scribble over ghes->estatus, before
> copying it somewhere else. If this interrupts the ghes_probe() code
> calling ghes_proc() on each struct ghes, the data is corrupted.
> 
> All the NMI-like notifications should use a queued estatus entry
> from the beginning, instead of the ghes version, then copying it.
> To do this, break up any use of "ghes->estatus" so that all
> functions take the estatus as an argument.
> 
> This patch just moves these ghes->estatus dereferences into separate

s/This patch just moves/Move/

> arguments, no change in behaviour. struct ghes becomes unused in
> ghes_clear_estatus() as it only wanted ghes->estatus, which we now
> pass directly. This is removed.
> 
> Signed-off-by: James Morse 

...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 11/25] ACPI / APEI: Move NOTIFY_SEA between the estatus-queue and NOTIFY_NMI

2019-01-21 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:05:59PM +, James Morse wrote:
> The estatus-queue code is currently hidden by the NOTIFY_NMI #ifdefs.
> Once NOTIFY_SEA starts using the estatus-queue we can stop hiding
> it as each architecture has a user that can't be turned off.
> 
> Split the existing CONFIG_HAVE_ACPI_APEI_NMI block in two, and move
> the SEA code into the gap.
> 
> This patch moves code around ... and changes the stale comment

s/This patch moves/Move the/

> describing why the status queue is necessary: printk() is no
> longer the issue, its the helpers like memory_failure_queue() that
> aren't nmi safe.
> 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 113 ---
>  1 file changed, 59 insertions(+), 54 deletions(-)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records

2019-01-11 Thread Borislav Petkov
On Fri, Jan 11, 2019 at 06:09:28PM +, James Morse wrote:
> We can return on ENOENT out earlier, as nothing needs doing in that case. Its
> what the GHES_TO_CLEAR spaghetti is for, we can probably move the ack thing 
> into
> ghes_clear_estatus(), that way that thing means 'I'm done with this memory'.

That actually sounds nice and other code in the kernel already does
that: when a failure has been encountered during reading status, you
free up resources right then and there. No need for passing retvals back
and forth. And this would simplify the spaghetti. Which is something
good(tm) on its own!

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records

2019-01-11 Thread Borislav Petkov
On Fri, Jan 11, 2019 at 06:25:21PM +, James Morse wrote:
> We ack it in the corrupt-record case too, because we are done with the
> memory.

Ok, so the only thing that we need to do unconditionally is ACK in order
to free the memory. Or is there an exception to that set of steps in
error handling?

> I think it is. 18.3.2.8 of ACPI v6.2 (search for Generic Hardware Error Source
> version 2", then below the table):
> * OSPM detects error (via interrupt/exception or polling the block status)
> * OSPM copies the error status block
> * OSPM clears the block status field of the error status block
> * OSPM acknowledges the error via Read Ack register
> 
> The ENOENT case is excluded by 'polling the block status'.

Ok, so we signal the absence of an error record with ENOENT.

if (!buf_paddr)
return -ENOENT;

Can that even happen?

Also, in that case, what would happen if we ACK the error anyway? We'd
confuse the firmware?

I sure hope firmware is prepared for spurious ACKs :)

> Unsurprisingly the spec doesn't consider the case that firmware generates
> corrupt records!

You mean the EIO case?

Not surprised at all. But we do not report that record so all good.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records

2019-01-11 Thread Borislav Petkov
On Fri, Jan 11, 2019 at 10:32:23AM -0500, Tyler Baicar wrote:
> The kernel would have no way of knowing what to do here.

What do you mean, there's no way of knowing what to do? It needs to
clear registers so that the next error can get reported properly.

Or of the status read failed and it doesn't need to do anything, then it
shouldn't.

Whatever it is, the kernel either needs to do something in the error
case to clean up, or nothing if the firmware doesn't need anything done
in the error case; *or* ack the error in the success case.

This should all be written down somewhere in that GHES v2
spec/doc/writeup whatever, explaining what the OS is supposed to do to
signal the error has been read by the OS.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records

2019-01-11 Thread Borislav Petkov
On Thu, Jan 10, 2019 at 04:01:27PM -0500, Tyler Baicar wrote:
> On Thu, Jan 10, 2019 at 1:23 PM James Morse  wrote:
> > >>
> > >> +if (is_hest_type_generic_v2(ghes) && 
> > >> ghes_ack_error(ghes->generic_v2))
> > >
> > > Since ghes_ack_error() is always prepended with this check, you could
> > > push it down into the function:
> > >
> > > ghes_ack_error(ghes)
> > > ...
> > >
> > >   if (!is_hest_type_generic_v2(ghes))
> > >   return 0;
> > >
> > > and simplify the two callsites :)
> >
> > Great idea! ...
> >
> > .. huh. Turns out for ghes_proc() we discard any errors other than ENOENT 
> > from
> > ghes_read_estatus() if is_hest_type_generic_v2(). This masks EIO.
> >
> > Most of the error sources discard the result, the worst thing I can find is
> > ghes_irq_func() will return IRQ_HANDLED, instead of IRQ_NONE when we didn't
> > really handle the IRQ. They're registered as SHARED, but I don't have an 
> > example
> > of what goes wrong next.
> >
> > I think this will also stop the spurious handling code kicking in to shut 
> > it up
> > if its broken and screaming. Unlikely, but not impossible.
> >
> > Fixed in a prior patch, with Boris' suggestion, ghes_proc()s tail ends up 
> > look
> > like this:
> > --%<--
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> > index 0321d9420b1e..8d1f9930b159 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -700,18 +708,11 @@ static int ghes_proc(struct ghes *ghes)
> >
> >  out:
> > ghes_clear_estatus(ghes, buf_paddr);
> > +   if (rc != -ENOENT)
> > +   rc_ack = ghes_ack_error(ghes);
> >
> > -   if (rc == -ENOENT)
> > -   return rc;
> > -
> > -   /*
> > -* GHESv2 type HEST entries introduce support for error 
> > acknowledgment,
> > -* so only acknowledge the error if this support is present.
> > -*/
> > -   if (is_hest_type_generic_v2(ghes))
> > -   return ghes_ack_error(ghes->generic_v2);
> > -
> > -   return rc;
> > +   /* If rc and rc_ack failed, return the first one */
> > +   return rc ? rc : rc_ack;
> >  }
> > --%<--
> >
> 
> Looks good to me, I guess there's no harm in acking invalid error status 
> blocks.

Err, why?

I don't know what the firmware glue does on ARM but if I'd have to
remain logical - which is hard to do with firmware - the proper thing to
do would be this:

rc = ghes_read_estatus(ghes, _paddr);
if (rc) {
ghes_reset_hardware();
}

/* clear estatus and bla bla */

/* Now, I'm in the success case: */
 ghes_ack_error();


This way, you have the error path clear of something unexpected happened
when reading the hardware, obvious and separated. ghes_reset_hardware()
clears the registers and does the necessary steps to put the hardware in
good state again so that it can report the next error.

And the success path simply acks the error and does possibly the same
thing. The naming of the functions is important though, to denote what
gets called when.

This way you handle all the cases just fine. No looking at the error
type and blabla.

Right?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 09/25] ACPI / APEI: Generalise the estatus queue's notify code

2019-01-11 Thread Borislav Petkov
On Thu, Jan 10, 2019 at 06:21:21PM +, James Morse wrote:
> Something like:
> ghes_notify_nmi() -> in_nmi_spool_from_list(list) -> 
> in_nmi_queue_one_entry(ghes).

Yah, but make that

ghes_notify_nmi() -> ghes_nmi_spool_from_list(list) -> 
ghes_nmi_queue_one_entry(ghes).

to denote it is the GHES NMI path.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


(ghes|hest)_disable

2019-01-11 Thread Borislav Petkov
Ok,

lemme split this out into a separate thread and add Tony.

On Thu, Jan 10, 2019 at 06:20:35PM +, James Morse wrote:
> > Grrr, what an effing mess that code is! There's hest_disable *and*
> > ghes_disable. Do we really need them both?
> 
> ghes_disable lets you ignore the firmware-first notifications, but still 'use'
> the other error sources:
> drivers/pci/pcie/aer.c picks out the three AER types, and uses 
> apei_hest_parse()
> to know if firmware is controlling AER, even if ghes_disable is set.

Ok, that kinda makes sense.

But look what our sparse documentation says:

hest_disable[ACPI]
Disable Hardware Error Source Table (HEST) support;
corresponding firmware-first mode error processing
logic will be disabled.


and from looking at the code, hest_disable is kinda like the master
switch because it gets evaluated first. Right?

Which sounds to me like we want a generic switch which does:

apei=disable_ff_notifications

to explicitly do exactly that - disable the firmware-first notification
method. And then the master switch will be

apei=disable

And we'll be able to pass whatever options here instead of all those
different _disable switches which need lotsa code staring to figure out
what exactly they even do in the first place.

> x86's arch_apei_enable_cmcff() looks like it disables MCE to get firmware to
> handle them. hest_disable would stop this, but instead ghes_disable keeps 
> that,
> and stops the NOTIFY_NMI being registered.

Yeah, and when you boot with ghes_disable, that would say:

pr_info("HEST: Enabling Firmware First mode for corrected errors.\n");

but there will be no notifications and users will scratch heads.

> (do you consider cmdline arguments as ABI, or hard to justify and hard to 
> remove?)

I don't, frankly. I guess we will have to have a transition period where
we keep them and issue a warning message that users should switch to
"apei=xxx" instead and remove them after a lot of time has passed.

> I don't think its broken enough to justify ripping them out. A user of
> ghes_disable would be someone with broken firmware-first handling of AER. They
> need to know firmware is changing the register values behind their back (so 
> need
> to parse the HEST), but want to ignore the junk notifications. It doesn't 
> sound
> like an unlikely scenario.

Yes, that makes sense.

But I think we should add a generic cmdline arg with suboptions and
document exactly what all those do. Similar to "mce=" on x86 which is
nicely documented in Documentation/x86/x86_64/boot-options.txt.

Right now, only a few people understand what those do and in some of the
cases they do too much/the wrong thing.

Thoughts?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 04/25] ACPI / APEI: Make hest.c manage the estatus memory pool

2018-12-19 Thread Borislav Petkov
On Fri, Dec 14, 2018 at 01:56:16PM +, James Morse wrote:
> /me digs a bit,
> 
> ghes_estatus_pool_init() allocates memory from hest_ghes_dev_register().
> Its caller is behind a 'if (!ghes_disable)' in acpi_hest_init(), and is after
> another 2 calls to apei_hest_parse().
> 
> If ghes_disable is set, we don't call this thing.
> If hest_disable is set, acpi_hest_init() exits early.
> If we don't have a HEST table, acpi_hest_init() exits early.
> 
> ... if the HEST table doesn't have any GHES entries, hest_ghes_dev_register() 
> is
> called with ghes_count==0, and does nothing useful. 
> (kmalloc_alloc_array(0,...)
> great!) But we do call ghes_estatus_pool_init().
> 
> I think a check that ghes_count is non-zero before calling
> hest_ghes_dev_register() is the cleanest way to avoid this.

Grrr, what an effing mess that code is! There's hest_disable *and*
ghes_disable. Do we really need them both?

With my simplifier hat on I wanna say, we should have a single switch -
apei_disable - and kill those other two. What a damn mess that is.

> I wanted the estatus pool to be initialised before creating the platform 
> devices
> in case the order of these things is changed in the future and they get probed
> immediately, before the pool is initialised.

Hmmm.

Actually, I meant flipping those two calls:

rc = ghes_estatus_pool_init(ghes_count);
if (rc)
goto out;

rc = apei_hest_parse(hest_parse_ghes, _arr);
if (rc)
goto err;

to

rc = apei_hest_parse(hest_parse_ghes, _arr);
if (rc)
goto err;

rc = ghes_estatus_pool_init(ghes_count);
if (rc)
goto out;

so as not to alloc the pool unnecessarily if the parsing fails.

Also, AFAICT, the order you have them in now might be a problem anyway
if

apei_hest_parse(hest_parse_ghes, _arr);

fails because then you goto err and and that pool leaks, right?

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records

2018-12-11 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:05:58PM +, James Morse wrote:
> ACPI has a GHESv2 which is used on hardware reduced platforms to
> explicitly acknowledge that the memory for CPER records has been
> consumed. This lets an external agent know it can re-use this
> memory for something else.
> 
> Previously notify_nmi and the estatus queue didn't do this as
> they were never used on hardware reduced platforms. Once we move
> notify_sea over to use the estatus queue, it may become necessary.
> 
> Add the call. This is safe for use in NMI context as the
> read_ack_register is pre-mapped by ghes_new() before the
> ghes can be added to an RCU list, and then found by the
> notification handler.
> 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 366dbdd41ef3..15d94373ba72 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -926,6 +926,10 @@ static int _in_nmi_notify_one(struct ghes *ghes)
>   __process_error(ghes);
>   ghes_clear_estatus(ghes, buf_paddr);
>  
> + if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2))

Since ghes_ack_error() is always prepended with this check, you could
push it down into the function:

ghes_ack_error(ghes)
...

if (!is_hest_type_generic_v2(ghes))
return 0;

and simplify the two callsites :)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 09/25] ACPI / APEI: Generalise the estatus queue's notify code

2018-12-11 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:05:57PM +, James Morse wrote:
> Refactor the estatus queue's pool notification routine from
> NOTIFY_NMI's handlers. This will allow another notification
> method to use the estatus queue without duplicating this code.
> 
> This patch adds rcu_read_lock()/rcu_read_unlock() around the list

s/This patch adds/Add/

> list_for_each_entry_rcu() walker. These aren't strictly necessary as
> the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
> critical section.
> 
> _in_nmi_notify_one() is separate from the rcu-list walker for a later
> caller that doesn't need to walk a list.
> 
> Signed-off-by: James Morse 
> Reviewed-by: Punit Agrawal 
> Tested-by: Tyler Baicar 
> 
> ---
> Changes since v6:
>  * Removed pool grow/remove code as this is no longer necessary.
> 
> Changes since v3:
>  * Removed duplicate or redundant paragraphs in commit message.
>  * Fixed the style of a zero check.
> Changes since v1:
>* Tidied up _in_nmi_notify_one().
> ---
>  drivers/acpi/apei/ghes.c | 63 ++--
>  1 file changed, 41 insertions(+), 22 deletions(-)

...

> +static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
> +{
> + int ret = NMI_DONE;
> +
> + if (!atomic_add_unless(_in_nmi, 1, 1))
> + return ret;
> +
> + if (!ghes_estatus_queue_notified(_nmi))
> + ret = NMI_HANDLED;

So this reads kinda the other way around, at least to me:

"if the queue was *not* notified, the NMI was handled."

Maybe rename to this:

err = process_queue(_nmi);
if (!err)
ret = NMI_HANDLED;

to make it clearer...

And yeah, all those static functions having "ghes_" prefix is just
encumbering readability for no good reason.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 07/25] ACPI / APEI: Remove spurious GHES_TO_CLEAR check

2018-12-11 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:05:55PM +, James Morse wrote:
> ghes_notify_nmi() checks ghes->flags for GHES_TO_CLEAR before going
> on to __process_error(). This is pointless as ghes_read_estatus()
> will always set this flag if it returns success, which was checked
> earlier in the loop. Remove it.
> 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index acf0c37e9af9..f7a0ff1c785a 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -936,9 +936,6 @@ static int ghes_notify_nmi(unsigned int cmd, struct 
> pt_regs *regs)
>   __ghes_panic(ghes);
>   }
>  
> - if (!(ghes->flags & GHES_TO_CLEAR))
> - continue;
> -
>   __process_error(ghes);
>   ghes_clear_estatus(ghes, buf_paddr);
>   }
> -- 

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 06/25] ACPI / APEI: Don't store CPER records physical address in struct ghes

2018-12-11 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:05:54PM +, James Morse wrote:
> When CPER records are found the address of the records is stashed
> in the struct ghes. Once the records have been processed, this
> address is overwritten with zero so that it won't be processed
> again without being re-populated by firmware.
> 
> This goes wrong if a struct ghes can be processed concurrently,
> as can happen at probe time when an NMI occurs. If the NMI arrives
> on another CPU, the probing CPU may call ghes_clear_estatus() on the
> records before the handler had finished with them.
> Even on the same CPU, once the interrupted handler is resumed, it
> will call ghes_clear_estatus() on the NMIs records, this memory may
> have already been re-used by firmware.
> 
> Avoid this stashing by letting the caller hold the address. A
> later patch will do away with the use of ghes->flags in the
> read/clear code too.
> 
> Signed-off-by: James Morse 
> 
> ---
> Changes since v6:
>  * Moved earlier in the series
>  * Added buf_adder = 0 on all the error paths, and test for it in
>ghes_estatus_clear() for extra sanity.
> ---
>  drivers/acpi/apei/ghes.c | 40 +++-
>  include/acpi/ghes.h  |  1 -
>  2 files changed, 23 insertions(+), 18 deletions(-)

...

> @@ -349,17 +350,20 @@ static int ghes_read_estatus(struct ghes *ghes)
>   if (rc)
>   pr_warn_ratelimited(FW_WARN GHES_PFX
>   "Failed to read error status block!\n");
> +
>   return rc;
>  }
>  
> -static void ghes_clear_estatus(struct ghes *ghes)
> +static void ghes_clear_estatus(struct ghes *ghes, u64 buf_paddr)
>  {
>   ghes->estatus->block_status = 0;
>   if (!(ghes->flags & GHES_TO_CLEAR))
>   return;
> - ghes_copy_tofrom_phys(ghes->estatus, ghes->buffer_paddr,
> -   sizeof(ghes->estatus->block_status), 0);
> - ghes->flags &= ~GHES_TO_CLEAR;

< newline here.

> + if (buf_paddr) {

Also, you can save yourself an indendation level:

if (!buf_paddr)
return;

ghes_copy...

> + ghes_copy_tofrom_phys(ghes->estatus, buf_paddr,
> +   sizeof(ghes->estatus->block_status), 0);
> + ghes->flags &= ~GHES_TO_CLEAR;
> + }
>  }

With that addressed:

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 05/25] ACPI / APEI: Make estatus pool allocation a static size

2018-12-11 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:05:53PM +, James Morse wrote:
> Adding new NMI-like notifications duplicates the calls that grow
> and shrink the estatus pool. This is all pretty pointless, as the
> size is capped to 64K. Allocate this for each ghes and drop
> the code that grows and shrinks the pool.
> 
> Suggested-by: Borislav Petkov 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 49 +---
>  drivers/acpi/apei/hest.c |  2 +-
>  include/acpi/ghes.h  |  2 +-
>  3 files changed, 8 insertions(+), 45 deletions(-)

Nice and simple, cool. Thanks for doing that.

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 04/25] ACPI / APEI: Make hest.c manage the estatus memory pool

2018-12-11 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:05:52PM +, James Morse wrote:
> ghes.c has a memory pool it uses for the estatus cache and the estatus
> queue. The cache is initialised when registering the platform driver.
> For the queue, an NMI-like notification has to grow/shrink the pool
> as it is registered and unregistered.
> 
> This is all pretty noisy when adding new NMI-like notifications, it
> would be better to replace this with a static pool size based on the
> number of users.
> 
> As a precursor, move the call that creates the pool from ghes_init(),
> into hest.c. Later this will take the number of ghes entries and
> consolidate the queue allocations.
> Remove ghes_estatus_pool_exit() as hest.c doesn't have anywhere to put
> this.
> 
> The pool is now initialised as part of ACPI's subsys_initcall():
> (acpi_init(), acpi_scan_init(), acpi_pci_root_init(), acpi_hest_init())
> Before this patch it happened later as a GHES specific device_initcall().
> 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 33 ++---
>  drivers/acpi/apei/hest.c |  5 +
>  include/acpi/ghes.h  |  2 ++
>  3 files changed, 13 insertions(+), 27 deletions(-)

...

> diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
> index b1e9f81ebeea..da5fabaeb48f 100644
> --- a/drivers/acpi/apei/hest.c
> +++ b/drivers/acpi/apei/hest.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "apei-internal.h"
>  
> @@ -200,6 +201,10 @@ static int __init hest_ghes_dev_register(unsigned int 
> ghes_count)
>   if (!ghes_arr.ghes_devs)
>   return -ENOMEM;
>  
> + rc = ghes_estatus_pool_init();
> + if (rc)
> + goto out;

Right, this happens before...

> +
>   rc = apei_hest_parse(hest_parse_ghes, _arr);

... this but do we even want to do any memory allocations if we don't
have any HEST tables or we've been disabled by hest_disable?

IOW, we should swap those two calls, methinks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 03/25] ACPI / APEI: Switch estatus pool to use vmalloc memory

2018-12-04 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:05:51PM +, James Morse wrote:
> The ghes code is careful to parse and round firmware's advertised
> memory requirements for CPER records, up to a maximum of 64K.
> However when ghes_estatus_pool_expand() does its work, it splits
> the requested size into PAGE_SIZE granules.
> 
> This means if firmware generates 5K of CPER records, and correctly
> describes this in the table, __process_error() will silently fail as it
> is unable to allocate more than PAGE_SIZE.
> 
> Switch the estatus pool to vmalloc() memory. On x86 vmalloc() memory
> may fault and be fixed up by vmalloc_fault(). To prevent this call
> vmalloc_sync_all() before an NMI handler could discover the memory.
> 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 30 +++---
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index e8503c7d721f..c15264f2dc4b 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -170,40 +170,40 @@ static int ghes_estatus_pool_init(void)
>   return 0;
>  }
>  
> -static void ghes_estatus_pool_free_chunk_page(struct gen_pool *pool,
> +static void ghes_estatus_pool_free_chunk(struct gen_pool *pool,
> struct gen_pool_chunk *chunk,
> void *data)
>  {
> - free_page(chunk->start_addr);
> + vfree((void *)chunk->start_addr);
>  }
>  
>  static void ghes_estatus_pool_exit(void)
>  {
>   gen_pool_for_each_chunk(ghes_estatus_pool,
> - ghes_estatus_pool_free_chunk_page, NULL);
> + ghes_estatus_pool_free_chunk, NULL);
>   gen_pool_destroy(ghes_estatus_pool);
>  }
>  
>  static int ghes_estatus_pool_expand(unsigned long len)
>  {
> - unsigned long i, pages, size, addr;
> - int ret;
> + unsigned long size, addr;
>  
>   ghes_estatus_pool_size_request += PAGE_ALIGN(len);

So here we increment with page-aligned len...

>   size = gen_pool_size(ghes_estatus_pool);
>   if (size >= ghes_estatus_pool_size_request)
>   return 0;
> - pages = (ghes_estatus_pool_size_request - size) / PAGE_SIZE;
> - for (i = 0; i < pages; i++) {
> - addr = __get_free_page(GFP_KERNEL);
> - if (!addr)
> - return -ENOMEM;
> - ret = gen_pool_add(ghes_estatus_pool, addr, PAGE_SIZE, -1);
> - if (ret)
> - return ret;
> - }
>  
> - return 0;
> + addr = (unsigned long)vmalloc(PAGE_ALIGN(len));
> + if (!addr)
> + return -ENOMEM;

... and if we return here due to the ENOMEM, that increment above
remains.

I see you're reworking all that stuff in the next patches which is cool,
thx. So I guess we should leave it as is, as the code before was broken
too.

IOW,

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 02/25] ACPI / APEI: Remove silent flag from ghes_read_estatus()

2018-12-04 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 06:05:50PM +, James Morse wrote:
> Subsequent patches will split up ghes_read_estatus(), at which
> point passing around the 'silent' flag gets annoying. This is to
> suppress prink() messages, which prior to commit 42a0bb3f7138
> ("printk/nmi: generic solution for safe printk in NMI"), were
> unsafe in NMI context.
> 
> This is no longer necessary, remove the flag. printk() messages
> are batched in a per-cpu buffer and printed via irq-work, or a call
> back from panic().
> 
> Signed-off-by: James Morse 
> ---
> Changes since v6:
>  * Moved earlier in the series,
>  * Tinkered with the commit message.
>  * switched to pr_warn_ratelimited() to shut checkpatch up
> 
> shutup checkpatch
> ---
>  drivers/acpi/apei/ghes.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 17/18] mm/memory-failure: increase queued recovery work's priority

2018-10-15 Thread Borislav Petkov
+ Peter.

On Fri, Sep 21, 2018 at 11:17:04PM +0100, James Morse wrote:
> arm64 can take an NMI-like error notification when user-space steps in
> some corrupt memory. APEI's GHES code will call memory_failure_queue()
> to schedule the recovery work. We then return to user-space, possibly
> taking the fault again.
> 
> Currently the arch code unconditionally signals user-space from this
> path, so we don't get stuck in this loop, but the affected process
> never benefits from memory_failure()s recovery work. To fix this we
> need to know the recovery work will run before we get back to user-space.
> 
> Increase the priority of the recovery work by scheduling it on the
> system_highpri_wq, then try to bump the current task off this CPU
> so that the recovery work starts immediately.
> 
> Reported-by: Xie XiuQi 
> Signed-off-by: James Morse 
> Reviewed-by: Punit Agrawal 
> Tested-by: Tyler Baicar 
> Tested-by: gengdongjiu 
> CC: Xie XiuQi 
> CC: gengdongjiu 
> ---
>  mm/memory-failure.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 0cd3de3550f0..4e7b115cea5a 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -56,6 +56,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1454,6 +1455,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, 
> memory_failure_cpu);
>   */
>  void memory_failure_queue(unsigned long pfn, int flags)
>  {
> + int cpu = smp_processor_id();
>   struct memory_failure_cpu *mf_cpu;
>   unsigned long proc_flags;
>   struct memory_failure_entry entry = {
> @@ -1463,11 +1465,14 @@ void memory_failure_queue(unsigned long pfn, int 
> flags)
>  
>   mf_cpu = _cpu_var(memory_failure_cpu);
>   spin_lock_irqsave(_cpu->lock, proc_flags);
> - if (kfifo_put(_cpu->fifo, entry))
> - schedule_work_on(smp_processor_id(), _cpu->work);
> - else
> + if (kfifo_put(_cpu->fifo, entry)) {
> + queue_work_on(cpu, system_highpri_wq, _cpu->work);
> + set_tsk_need_resched(current);
> + preempt_set_need_resched();

What guarantees the workqueue would run before the process? I see this:

``WQ_HIGHPRI``
  Work items of a highpri wq are queued to the highpri
  worker-pool of the target cpu.  Highpri worker-pools are
  served by worker threads with elevated nice level.

but is that enough?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 05/18] ACPI / APEI: Make estatus queue a Kconfig symbol

2018-10-12 Thread Borislav Petkov
On Fri, Oct 12, 2018 at 06:17:48PM +0100, James Morse wrote:
> Ripping out the existing #ifdefs and replacing them with IS_ENABLED() would 
> let
> the compiler work out the estatus stuff is unused, and saves us describing the
> what-uses-it logic in Kconfig.
> 
> But this does expose the x86 nmi stuff on arm64, which doesn't build today.

Gah, that ifdeffery is one big mess. ;-\

One fine day...

> Dragging NMI_HANDLED and friends up to the 'linux' header causes a fair amount
> of noise under arch/x86 (include the new header in 22 files). Adding dummy
> declarations to arm64 fixes this, and doesn't affect the other architectures
> that have an asm/nmi.h
> 
> Alternatively we could leave {un,}register_nmi_handler() under
> CONFIG_HAVE_ACPI_APEI_NMI. I think we need to keep the NOTIFY_NMI kconfig 
> symbol
> around, as its one of the two I can't work out how to fix without the 
> TLBI-IPI.

Hmm, so I just tried the diff below with my arm64 cross compiler and a
defconfig with

CONFIG_ACPI_APEI_GHES=y
CONFIG_EDAC_GHES=y

and it did build fine. What am I missing?

---
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 2b191e09b647..52ae5438edeb 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -4,7 +4,6 @@ config HAVE_ACPI_APEI
 
 config HAVE_ACPI_APEI_NMI
bool
-   select ACPI_APEI_GHES_ESTATUS_QUEUE
 
 config ACPI_APEI
bool "ACPI Platform Error Interface (APEI)"
@@ -34,10 +33,6 @@ config ACPI_APEI_GHES
  by firmware to produce more valuable hardware error
  information for Linux.
 
-config ACPI_APEI_GHES_ESTATUS_QUEUE
-   bool
-   depends on ACPI_APEI_GHES && ARCH_HAVE_NMI_SAFE_CMPXCHG
-
 config ACPI_APEI_PCIEAER
bool "APEI PCIe AER logging/recovering support"
depends on ACPI_APEI && PCIEAER
@@ -48,7 +43,6 @@ config ACPI_APEI_PCIEAER
 config ACPI_APEI_SEA
bool "APEI Synchronous External Abort logging/recovering support"
depends on ARM64 && ACPI_APEI_GHES
-   select ACPI_APEI_GHES_ESTATUS_QUEUE
default y
help
  This option should be enabled if the system supports
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 463c8e6d1bb5..8191d711564b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -683,7 +683,6 @@ static void ghes_estatus_cache_add(
rcu_read_unlock();
 }
 
-#ifdef CONFIG_ACPI_APEI_GHES_ESTATUS_QUEUE
 /*
  * Handlers for CPER records may not be NMI safe. For example,
  * memory_failure_queue() takes spinlocks and calls schedule_work_on().
@@ -862,10 +861,6 @@ static void ghes_nmi_init_cxt(void)
init_irq_work(_proc_irq_work, ghes_proc_in_irq);
 }
 
-#else
-static inline void ghes_nmi_init_cxt(void) { }
-#endif /* CONFIG_ACPI_APEI_GHES_ESTATUS_QUEUE */
-
 static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
 {
int rc;


-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 15/18] ACPI / APEI: Only use queued estatus entry during _in_nmi_notify_one()

2018-10-12 Thread Borislav Petkov
On Fri, Sep 21, 2018 at 11:17:02PM +0100, James Morse wrote:
> Each struct ghes has an worst-case sized buffer for storing the
> estatus. If an error is being processed by ghes_proc() in process
> context this buffer will be in use. If the error source then triggers
> an NMI-like notification, the same buffer will be used by
> _in_nmi_notify_one() to stage the estatus data, before
> __process_error() copys it into a queued estatus entry.
> 
> Merge __process_error()s work into _in_nmi_notify_one() so that
> the queued estatus entry is used from the beginning. Use the
> ghes_peek_estatus() so we know how much memory to allocate from
> the ghes_estatus_pool before we read the records.
> 
> Reported-by: Borislav Petkov 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 45 
>  1 file changed, 22 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 055176ed68ac..a0c10b60ad44 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -722,40 +722,32 @@ static void ghes_print_queued_estatus(void)
>   }
>  }
>  
> -/* Save estatus for further processing in IRQ context */
> -static void __process_error(struct ghes *ghes,
> - struct acpi_hest_generic_status *ghes_estatus)
> +static int _in_nmi_notify_one(struct ghes *ghes, int fixmap_idx)
>  {
> + u64 buf_paddr;
> + int sev, rc = 0;
>   u32 len, node_len;
>   struct ghes_estatus_node *estatus_node;
>   struct acpi_hest_generic_status *estatus;

Please sort function local variables declaration in a reverse christmas
tree order:

 longest_variable_name;
 shorter_var_name;
 even_shorter;
 i;

>  
> - if (ghes_estatus_cached(ghes_estatus))
> - return;
> + rc = ghes_peek_estatus(ghes, fixmap_idx, _paddr, );

Oh ok, maybe not prefix it with "__" - we're using it somewhere else.

> + if (rc)
> + return rc;
>  
> - len = cper_estatus_len(ghes_estatus);
>   node_len = GHES_ESTATUS_NODE_LEN(len);
>  
>   estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
>   if (!estatus_node)
> - return;
> + return -ENOMEM;
>  
>   estatus_node->ghes = ghes;
>   estatus_node->generic = ghes->generic;
>   estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
> - memcpy(estatus, ghes_estatus, len);
> - llist_add(_node->llnode, _estatus_llist);
> -}
> -
> -static int _in_nmi_notify_one(struct ghes *ghes, int fixmap_idx)
> -{
> - int sev;
> - u64 buf_paddr;
> - struct acpi_hest_generic_status *estatus = ghes->estatus;
>  
> - if (ghes_read_estatus(ghes, estatus, _paddr, fixmap_idx)) {
> + if (__ghes_read_estatus(estatus, buf_paddr, len, fixmap_idx)) {
>   ghes_clear_estatus(estatus, buf_paddr, fixmap_idx);
> - return -ENOENT;
> + rc = -ENOENT;
> + goto no_work;
>   }
>  
>   sev = ghes_severity(estatus->error_severity);
> @@ -764,13 +756,20 @@ static int _in_nmi_notify_one(struct ghes *ghes, int 
> fixmap_idx)
>   __ghes_panic(ghes, estatus);
>   }
>  
> - if (!buf_paddr)
> - return 0;
> -
> - __process_error(ghes, estatus);
>   ghes_clear_estatus(estatus, buf_paddr, fixmap_idx);
>  
> - return 0;
> + if (!buf_paddr || ghes_estatus_cached(estatus))
> + goto no_work;
> +
> + llist_add(_node->llnode, _estatus_llist);
> +
> + return rc;
> +
> +no_work:
> + gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
> +   node_len);

Yeah, let it stick out.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 14/18] ACPI / APEI: Split ghes_read_estatus() to read CPER length

2018-10-12 Thread Borislav Petkov
On Fri, Sep 21, 2018 at 11:17:01PM +0100, James Morse wrote:
> ghes_read_estatus() reads the record address, then the record's
> header, then performs some sanity checks before reading the
> records into the provided estatus buffer.
> 
> We either need to know the size of the records before we call
> ghes_read_estatus(), or always provide a worst-case sized buffer,
> as happens today.
> 
> Add a function to peek at the record's header to find the size. This
> will let the NMI path allocate the right amount of memory before reading
> the records, instead of using the worst-case size, and having to copy
> the records.
> 
> Split ghes_read_estatus() to create ghes_peek_estatus() which
> returns the address and size of the CPER records.
> 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 55 ++--
>  1 file changed, 41 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 3028487d43a3..055176ed68ac 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -298,11 +298,12 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 
> paddr, u32 len,
>   }
>  }
>  
> -static int ghes_read_estatus(struct ghes *ghes,
> -  struct acpi_hest_generic_status *estatus,
> -  u64 *buf_paddr, int fixmap_idx)
> +/* read the CPER block returning its address and size */

Make that comment a proper sentence:

"./* ... Read the CPER ... and size. */

> +static int ghes_peek_estatus(struct ghes *ghes, int fixmap_idx,
> +  u64 *buf_paddr, u32 *buf_len)
>  {

I find the functionality split a bit strange:

ghes_peek_estatus() does peek *and* verify sizes. The latter belongs
maybe better in ghes_read_estatus(). Together with the
cper_estatus_check_header() call. Or maybe into a separate

__ghes_check_estatus()

to separate it all nicely.

>   struct acpi_hest_generic *g = ghes->generic;
> + struct acpi_hest_generic_status estatus;
>   u32 len;
>   int rc;
>  
> @@ -317,26 +318,23 @@ static int ghes_read_estatus(struct ghes *ghes,
>   if (!*buf_paddr)
>   return -ENOENT;
>  
> - ghes_copy_tofrom_phys(estatus, *buf_paddr,
> -   sizeof(*estatus), 1, fixmap_idx);
> - if (!estatus->block_status) {
> + ghes_copy_tofrom_phys(, *buf_paddr,
> +   sizeof(estatus), 1, fixmap_idx);
> + if (!estatus.block_status) {
>   *buf_paddr = 0;
>   return -ENOENT;
>   }
>  
>   rc = -EIO;
> - len = cper_estatus_len(estatus);
> - if (len < sizeof(*estatus))
> + len = cper_estatus_len();
> + if (len < sizeof(estatus))
>   goto err_read_block;
>   if (len > ghes->generic->error_block_length)
>   goto err_read_block;
> - if (cper_estatus_check_header(estatus))
> - goto err_read_block;
> - ghes_copy_tofrom_phys(estatus + 1,
> -   *buf_paddr + sizeof(*estatus),
> -   len - sizeof(*estatus), 1, fixmap_idx);
> - if (cper_estatus_check(estatus))
> + if (cper_estatus_check_header())
>   goto err_read_block;
> + *buf_len = len;
> +
>   rc = 0;
>  
>  err_read_block:
> @@ -346,6 +344,35 @@ static int ghes_read_estatus(struct ghes *ghes,
>   return rc;
>  }
>  
> +static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
> +u64 buf_paddr, size_t buf_len,
> +int fixmap_idx)
> +{
> + ghes_copy_tofrom_phys(estatus, buf_paddr, buf_len, 1, fixmap_idx);
> + if (cper_estatus_check(estatus)) {
> + if (printk_ratelimit())
> + pr_warning(FW_WARN GHES_PFX
> +"Failed to read error status block!\n");

Then you won't have to have two identical messages:

"Failed to read error status block!\n"

which, when one sees them, is hard to figure out where exactly in the
code that happened.

> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +static int ghes_read_estatus(struct ghes *ghes,
> +  struct acpi_hest_generic_status *estatus,
> +  u64 *buf_paddr, int fixmap_idx)
> +{
> + int rc;
> + u32 buf_len;
> +
> + rc = ghes_peek_estatus(ghes, fixmap_idx, buf_paddr, _len);

Also, if we have a __ghes_read_estatus() helper now, maybe prefixing
ghes_peek_estatus() with "__" would make sense too...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 13/18] ACPI / APEI: Don't update struct ghes' flags in read/clear estatus

2018-10-12 Thread Borislav Petkov
On Fri, Sep 21, 2018 at 11:17:00PM +0100, James Morse wrote:
> ghes_read_estatus() sets a flag in struct ghes if the buffer of
> CPER records needs to be cleared once the records have been
> processed. This global flags value is a problem if a struct ghes
> can be processed concurrently, as happens at probe time if an
> NMI arrives for the same error source.
> 
> The GHES_TO_CLEAR flags was only set at the same time as
> buffer_paddr, which is now owned by the caller and passed to
> ghes_clear_estatus(). Use this as the flag.
> 
> A non-zero buf_paddr returned by ghes_read_estatus() means
> ghes_clear_estatus() will clear this address. ghes_read_estatus()
> already checks for a read of error_status_address being zero,
> so we can never get CPER records written at zero.
> 
> After this ghes_clear_estatus() no longer needs the struct ghes.
> 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 26 --
>  include/acpi/ghes.h  |  1 -
>  2 files changed, 12 insertions(+), 15 deletions(-)

Nice.

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 11/18] ACPI / APEI: Remove silent flag from ghes_read_estatus()

2018-10-12 Thread Borislav Petkov
On Fri, Sep 21, 2018 at 11:16:58PM +0100, James Morse wrote:
> Subsequent patches will split up ghes_read_estatus(), at which
> point passing around the 'silent' flag gets annoying. This is to
> suppress prink() messages, which prior to 42a0bb3f7138 ("printk/nmi:
> generic solution for safe printk in NMI"), were unsafe in NMI context.

Put that commit onto a separate line:

"... which prior to

  42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")

were unsafe ..."

This way it is immediately visible.

In any case, this patch looks like a cleanup so move it to the beginning
of the queue, I'd say.

> We don't need to do this anymore, remove the flag. printk() messages
> are batched in a per-cpu buffer and printed via irq-work, or a call
> back from panic().
> 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 586689cbc0fd..ba5344d26a39 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -300,7 +300,7 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 
> paddr, u32 len,
>  
>  static int ghes_read_estatus(struct ghes *ghes,
>struct acpi_hest_generic_status *estatus,
> -  int silent, int fixmap_idx)
> +  int fixmap_idx)
>  {
>   struct acpi_hest_generic *g = ghes->generic;
>   u64 buf_paddr;
> @@ -309,7 +309,7 @@ static int ghes_read_estatus(struct ghes *ghes,
>  
>   rc = apei_read(_paddr, >error_status_address);
>   if (rc) {
> - if (!silent && printk_ratelimit())
> + if (printk_ratelimit())

Btw, checkpatch complains here:

WARNING: Prefer printk_ratelimited or pr__ratelimited to printk_ratelimit
#57: FILE: drivers/acpi/apei/ghes.c:312:
+   if (printk_ratelimit())

WARNING: Prefer printk_ratelimited or pr__ratelimited to printk_ratelimit
#66: FILE: drivers/acpi/apei/ghes.c:345:
+   if (rc && printk_ratelimit())


-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 10/18] ACPI / APEI: preparatory split of ghes->estatus

2018-10-12 Thread Borislav Petkov
Nitpick:

Subject: Re: [PATCH v6 10/18] ACPI / APEI: preparatory split of ghes->estatus

Pls have an active formulation in your Subject and start it with a capital
letter, i.e., something like:

"Split ghes->estatus in preparation for... "

On Fri, Sep 21, 2018 at 11:16:57PM +0100, James Morse wrote:
> The NMI-like notifications scribble over ghes->estatus, before
> copying it somewhere else. If this interrupts the ghes_probe() code
> calling ghes_proc() on each struct ghes, the data is corrupted.
> 
> We want the NMI-like notifications to use a queued estatus entry

Pls formulate commit messages in passive voice.

> from the beginning. To that end, break up any use of "ghes->estatus"
> so that all functions take the estatus as an argument.
> 
> This patch is just moving code around, no change in behaviour.
> 
> Signed-off-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 82 ++--
>  1 file changed, 45 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index adf7fd402813..586689cbc0fd 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -298,7 +298,9 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 
> paddr, u32 len,
>   }
>  }
>  
> -static int ghes_read_estatus(struct ghes *ghes, int silent, int fixmap_idx)
> +static int ghes_read_estatus(struct ghes *ghes,
> +  struct acpi_hest_generic_status *estatus,

acpi_hest_generic_status - geez, could this name have been any longer ?!

> +  int silent, int fixmap_idx)
>  {
>   struct acpi_hest_generic *g = ghes->generic;
>   u64 buf_paddr;
> @@ -316,26 +318,26 @@ static int ghes_read_estatus(struct ghes *ghes, int 
> silent, int fixmap_idx)
>   if (!buf_paddr)
>   return -ENOENT;
>  
> - ghes_copy_tofrom_phys(ghes->estatus, buf_paddr,
> -   sizeof(*ghes->estatus), 1, fixmap_idx);
> - if (!ghes->estatus->block_status)
> + ghes_copy_tofrom_phys(estatus, buf_paddr,
> +   sizeof(*estatus), 1, fixmap_idx);

Yeah, let that line stick out - it is easier to follow the code this
way.

> + if (!estatus->block_status)
>   return -ENOENT;
>  
>   ghes->buffer_paddr = buf_paddr;
>   ghes->flags |= GHES_TO_CLEAR;
>  
>   rc = -EIO;
> - len = cper_estatus_len(ghes->estatus);
> - if (len < sizeof(*ghes->estatus))
> + len = cper_estatus_len(estatus);
> + if (len < sizeof(*estatus))
>   goto err_read_block;
>   if (len > ghes->generic->error_block_length)
>   goto err_read_block;
> - if (cper_estatus_check_header(ghes->estatus))
> + if (cper_estatus_check_header(estatus))
>   goto err_read_block;
> - ghes_copy_tofrom_phys(ghes->estatus + 1,
> -   buf_paddr + sizeof(*ghes->estatus),
> -   len - sizeof(*ghes->estatus), 1, fixmap_idx);
> - if (cper_estatus_check(ghes->estatus))
> + ghes_copy_tofrom_phys(estatus + 1,
> +   buf_paddr + sizeof(*estatus),
> +   len - sizeof(*estatus), 1, fixmap_idx);
> + if (cper_estatus_check(estatus))
>   goto err_read_block;
>   rc = 0;
>  
> @@ -346,13 +348,15 @@ static int ghes_read_estatus(struct ghes *ghes, int 
> silent, int fixmap_idx)
>   return rc;
>  }
>  
> -static void ghes_clear_estatus(struct ghes *ghes, int fixmap_idx)
> +static void ghes_clear_estatus(struct ghes *ghes,
> +struct acpi_hest_generic_status *estatus,
> +int fixmap_idx)
>  {
> - ghes->estatus->block_status = 0;
> + estatus->block_status = 0;
>   if (!(ghes->flags & GHES_TO_CLEAR))
>   return;

< newline here.

> - ghes_copy_tofrom_phys(ghes->estatus, ghes->buffer_paddr,
> -   sizeof(ghes->estatus->block_status), 0, 
> fixmap_idx);
> + ghes_copy_tofrom_phys(estatus, ghes->buffer_paddr,
> +   sizeof(estatus->block_status), 0, fixmap_idx);
>   ghes->flags &= ~GHES_TO_CLEAR;
>  }
>  
> @@ -518,9 +522,10 @@ static int ghes_print_estatus(const char *pfx,
>   return 0;
>  }
>  
> -static void __ghes_panic(struct ghes *ghes)
> +static void __ghes_panic(struct ghes *ghes,
> +  struct acpi_hest_generic_status *estatus)

Yeah, let that one stick out too. That struct naming needs slimming.

>  {
> - __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
> + __ghes_print_estatus(KERN_EMERG, ghes->generic, estatus);
>  
>   /* reboot to log the error! */
>   if (!panic_timeout)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu

Re: [PATCH v6 08/18] ACPI / APEI: Move locking to the notification helper

2018-10-12 Thread Borislav Petkov
On Fri, Sep 21, 2018 at 11:16:55PM +0100, James Morse wrote:
> ghes_copy_tofrom_phys() takes different locks depending on in_nmi().
> This doesn't work when we have multiple NMI-like notifications, that
> can interrupt each other.
> 
> Now that NOTIFY_SEA is always called as an NMI, move the lock-taking
> to the notification helper. The helper will always know which lock
> to take. This avoids ghes_copy_tofrom_phys() taking a guess based
> on in_nmi().
> 
> This splits NOTIFY_NMI and NOTIFY_SEA to use different locks. All
> the other notifications use ghes_proc(), and are called in process
> or IRQ context. Move the spin_lock_irqsave() around their ghes_proc()
> calls.

Right, should ghes_proc() be renamed to ghes_proc_irq() now, to be
absolutely clear on the processing context it is operating in?

Other than that:

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 07/18] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface

2018-10-12 Thread Borislav Petkov
On Fri, Sep 21, 2018 at 11:16:54PM +0100, James Morse wrote:
> To split up APEIs in_nmi() path, we need the nmi-like callers to always
> be in_nmi(). Add a helper to do the work and claim the notification.
> 
> When KVM or the arch code takes an exception that might be a RAS
> notification, it asks the APEI firmware-first code whether it wants
> to claim the exception. We can then go on to see if (a future)
> kernel-first mechanism wants to claim the notification, before
> falling through to the existing default behaviour.
> 
> The NOTIFY_SEA code was merged before we had multiple, possibly
> interacting, NMI-like notifications and the need to consider kernel
> first in the future. Make the 'claiming' behaviour explicit.
> 
> As we're restructuring the APEI code to allow multiple NMI-like
> notifications, any notification that might interrupt interrupts-masked
> code must always be wrapped in nmi_enter()/nmi_exit(). This allows APEI
> to use in_nmi() to use the right fixmap entries.
> 
> We mask SError over this window to prevent an asynchronous RAS error
> arriving and tripping 'nmi_enter()'s BUG_ON(in_nmi()).
> 
> Signed-off-by: James Morse 
> Acked-by: Marc Zyngier 
> Tested-by: Tyler Baicar 

...

> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
> index ed46dc188b22..a9b8bba014b5 100644
> --- a/arch/arm64/kernel/acpi.c
> +++ b/arch/arm64/kernel/acpi.c
> @@ -28,8 +28,10 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -257,3 +259,30 @@ pgprot_t __acpi_get_mem_attribute(phys_addr_t addr)
>   return __pgprot(PROT_NORMAL_NC);
>   return __pgprot(PROT_DEVICE_nGnRnE);
>  }
> +
> +/*
> + * Claim Synchronous External Aborts as a firmware first notification.
> + *
> + * Used by KVM and the arch do_sea handler.
> + * @regs may be NULL when called from process context.
> + */
> +int apei_claim_sea(struct pt_regs *regs)
> +{
> + int err = -ENOENT;
> + unsigned long current_flags = arch_local_save_flags();
> +
> + if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
> + return err;

I don't know what side effects arch_local_save_flags() has on ARM but if
we return here, it looks to me like useless work.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 06/18] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing

2018-10-12 Thread Borislav Petkov
On Fri, Sep 21, 2018 at 11:16:53PM +0100, James Morse wrote:
> To split up APEIs in_nmi() path, we need any nmi-like callers to always
> be in_nmi(). KVM shouldn't have to know about this, pull the RAS plumbing
> out into a header file.
> 
> Currently guest synchronous external aborts are claimed as RAS
> notifications by handle_guest_sea(), which is hidden in the arch codes
> mm/fault.c. 32bit gets a dummy declaration in system_misc.h.
> 
> There is going to be more of this in the future if/when we support
> the SError-based firmware-first notification mechanism and/or
> kernel-first notifications for both synchronous external abort and
> SError. Each of these will come with some Kconfig symbols and a
> handful of header files.
> 
> Create a header file for all this.
> 
> This patch gives handle_guest_sea() a 'kvm_' prefix, and moves the
> declarations to kvm_ras.h as preparation for a future patch that moves
> the ACPI-specific RAS code out of mm/fault.c.
> 
> Signed-off-by: James Morse 
> Reviewed-by: Punit Agrawal 
> Acked-by: Marc Zyngier 
> Tested-by: Tyler Baicar 
> ---
>  arch/arm/include/asm/kvm_ras.h   | 14 ++
>  arch/arm/include/asm/system_misc.h   |  5 -
>  arch/arm64/include/asm/kvm_ras.h | 11 +++
>  arch/arm64/include/asm/system_misc.h |  2 --
>  arch/arm64/mm/fault.c|  2 +-
>  virt/kvm/arm/mmu.c   |  4 ++--
>  6 files changed, 28 insertions(+), 10 deletions(-)
>  create mode 100644 arch/arm/include/asm/kvm_ras.h
>  create mode 100644 arch/arm64/include/asm/kvm_ras.h
> 
> diff --git a/arch/arm/include/asm/kvm_ras.h b/arch/arm/include/asm/kvm_ras.h
> new file mode 100644
> index ..aaff56bf338f
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_ras.h
> @@ -0,0 +1,14 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2018 - Arm Ltd

checkpatch is complaining for some reason:

WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
#66: FILE: arch/arm/include/asm/kvm_ras.h:1:
+// SPDX-License-Identifier: GPL-2.0

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 05/18] ACPI / APEI: Make estatus queue a Kconfig symbol

2018-10-04 Thread Borislav Petkov
On Wed, Oct 03, 2018 at 06:50:36PM +0100, James Morse wrote:
> I'm all in favour of letting the compiler work it out, but the existing ghes
> code has #ifdef/#else all over the place. This is 'keeping the style'.

Yeah, but this "style" is not the optimal one and we should
simplify/clean up and fix this thing.

Swapping the order of your statements here:

> The ACPI spec has four ~NMI notifications, so far the support for
> these in Linux has been selectable separately.

Yes, but: distro kernels end up enabling all those options anyway and
distro kernels are 90-ish% of the setups. Which means, this will get
enabled anyway and this additional Kconfig symbol is simply going to be
one automatic reply "Yes".

So let's build it in by default and if someone complains about it, we
can always carve it out. But right now I don't see the need for the
unnecessary separation...

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 00/18] APEI in_nmi() rework

2018-10-04 Thread Borislav Petkov
On Wed, Oct 03, 2018 at 06:50:38PM +0100, James Morse wrote:

...

> The non-ghes HEST entries have a "number of records to pre-allocate" too, we
> could make this memory pool something hest.c looks after, but I can't see if 
> the
> other error sources use those values.

Thanks for the detailed analysis!

> Hmmm, The size is capped to 64K, we could ignore the firmware description of 
> the
> memory requirements, and allocate SZ_64K each time. Doing it per-GHES is still
> the only way to avoid allocating nmi-safe memory for irqs.

Right, so I'm thinking a lot simpler: allocate a pool which should
be large enough to handle all situations and drop all that logic
which recomputes and reallocates pool size. Just a static thing which
JustWorks(tm).

For a couple of reasons:

 - you state it above: all those synchronization issues are gone with a
 prellocated pool

 - 64K per-GHES pool is nothing if you consider the machines this thing
 runs on - fat servers with lotsa memory. And RAS there *is* important.
 And TBH 64K is nothing even on a small client sporting gigabytes of
 memory.

 - code is a lot simpler and cleaner - you don't need all that pool
 expanding and shrinking. I mean, I'm all for smarter solutions if they
 have any clear advantages warranting the complication but this is a
 lot of machinery just so that we can save a couple of KBs. Which, as a
 whole, sounds just too much to me.

But this is just me.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 05/18] ACPI / APEI: Make estatus queue a Kconfig symbol

2018-10-01 Thread Borislav Petkov
On Fri, Sep 21, 2018 at 11:16:52PM +0100, James Morse wrote:
> Now that there are two users of the estatus queue, and likely to be more,
> make it a Kconfig symbol selected by the appropriate notification. We
> can move the ARCH_HAVE_NMI_SAFE_CMPXCHG checks in here too.

Ok, question: why do we need to complicate things at all? I mean, why do
we even need a Kconfig symbol?

This code is being used by two arches now so why not simply build it in
unconditionally and be done with it. The couple of KB saved are simply
not worth the effort, especially if it is going to end up being enabled
on 99% of the setups...

Or?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 04/18] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue

2018-09-28 Thread Borislav Petkov
On Fri, Sep 21, 2018 at 11:16:51PM +0100, James Morse wrote:
> Now that the estatus queue can be used by more than one notification
> method, we can move notifications that have NMI-like behaviour over to
> it, and start abstracting GHES's single in_nmi() path.
> 
> Switch NOTIFY_SEA over to use the estatus queue. This makes it behave
> in the same way as x86's NOTIFY_NMI.
> 
> Signed-off-by: James Morse 
> Reviewed-by: Punit Agrawal 
> Tested-by: Tyler Baicar 
> ---
>  drivers/acpi/apei/ghes.c | 23 +++
>  1 file changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index d7c46236b353..150fb184c7cb 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -58,6 +58,10 @@
>  
>  #define GHES_PFX "GHES: "
>  
> +#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA)
> +#define WANT_NMI_ESTATUS_QUEUE   1
> +#endif

Is that just so that you have shorter ifdeffery lines? Because if so, an
additional level of indirection is silly. Or maybe there's more coming -
I'll see when I continue going through this set. :)

Otherwise looks good - trying to reuse the facilities and all. Better. :)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 00/18] APEI in_nmi() rework

2018-09-25 Thread Borislav Petkov
On Fri, Sep 21, 2018 at 11:16:47PM +0100, James Morse wrote:
> Hello,
> 
> The GHES driver has collected quite a few bugs:
> 
> ghes_proc() at ghes_probe() time can be interrupted by an NMI that
> will clobber the ghes->estatus fields, flags, and the buffer_paddr.
> 
> ghes_copy_tofrom_phys() uses in_nmi() to decide which path to take. arm64's
> SEA taking both paths, depending on what it interrupted.
> 
> There is no guarantee that queued memory_failure() errors will be processed
> before this CPU returns to user-space.
> 
> x86 can't TLBI from interrupt-masked code which this driver does all the
> time.
> 
> 
> This series aims to fix the first three, with an eye to fixing the
> last one with a follow-up series.
> 
> Previous postings included the SDEI notification calls, which I haven't
> finished re-testing. This series is big enough as it is.

Yeah, and everywhere I look, this thing looks overengineered. Like,
for example, what's the purpose of this ghes_esource_prealloc_size()
computing a size each time the pool changes size?

AFAICT, this size can be computed exactly *once* at driver init and be
done with it. Right?

Or am I missing something subtle?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users

2018-05-17 Thread Borislav Petkov
On Wed, May 16, 2018 at 11:38:16AM -0400, Tyler Baicar wrote:
> I haven't seen a deadlock from that, but it looks possible. What if
> the ghes_proc() call in ghes_probe() is moved before the second switch
> statement? That way it is before the NMI/IRQ/poll is setup. At quick
> glance I think that should avoid the deadlock and still provide the
> functionality that call was added for. I can test that out if you all
> agree.

Makes sense but please audit it properly before doing the change. That
code is full of landmines and could use a proper scrubbing first.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users

2018-05-17 Thread Borislav Petkov
On Wed, May 16, 2018 at 03:51:14PM +0100, James Morse wrote:
> The first two overload existing architectural behavior, the third improves all
> this with a third standard option. Its the standard story!

:-)

> I thought this was safe because its just ghes_copy_tofrom_phys()s access to 
> the
> fixmap slots that needs mutual exclusion.
>
> Polled and all the IRQ flavours are kept apart by the spin_lock_irqsave(), and
> the NMIs have their own fixmap entry. (This is fine until there is more than
> once source of NMI)

For example:

ghes_probe()

switch (generic->notify.type) {

...

case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_add(ghes);
}

...

ghes_proc();
  ghes_read_estatus();
 spin_lock_irqsave(_ioremap_lock_irq, flags);

 memcpy...

-> NMI

ghes_notify_nmi();
 ghes_read_estatus();
 ..
   if (in_nmi) {
   raw_spin_lock(_ioremap_lock_nmi);

...
<- NMI

ghes->estatus from above, before the NMI fired, has gotten some nice
scribbling over. AFAICT.

Now, I don't know whether this can happen with the ARM facilities but if
they're NMI-like, I don't see why not.

Which means, that this code is not really reentrant and if should be
fixed to be callable from different contexts, then it should use private
buffers and be careful about locking.

Oh, and that

if (in_nmi)
lock()
else
lock_irqsave()

pattern is really yucky. And it is an explosion waiting to happen.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users

2018-05-16 Thread Borislav Petkov
On Tue, May 08, 2018 at 09:45:01AM +0100, James Morse wrote:
> NOTIFY_NMI is x86's NMI, arm doesn't have anything that behaves in the same 
> way,
> so doesn't use it. The equivalent notifications with NMI-like behaviour are:
> * SEA (synchronous external abort)
> * SEI (SError Interrupt)
> * SDEI (software delegated exception interface)

Oh wow, three! :)

> Alternatively, I can put the fixmap-page and spinlock in some 'struct
> ghes_notification' that only the NMI-like struct-ghes need. This is just 
> moving
> the indirection up a level, but it does pair the lock with the thing it locks,
> and gets rid of assigning spinlock pointers.

Keeping the lock and what it protects in one place certainly sounds
better. I guess you could so something like this:

struct ghes_fixmap {
 union {
  raw_spinlock_t nmi_lock;
   spinlock_t lock;
 };
 void __iomem *(map)(struct ghes_fixmap *);
};

and assign the proper ghes_ioremap function to ->map.

The spin_lock_irqsave() call in ghes_copy_tofrom_phys() is kinda
questionable. Because we should have disabled interrupts so that you can
do

spin_lock(map->lock);

Except that we do get called with IRQs on and looking at that call of
ghes_proc() at the end of ghes_probe(), that's a deadlock waiting to
happen.

And that comes from:

  77b246b32b2c ("acpi: apei: check for pending errors when probing GHES 
entries")

Tyler, this can't work in any context: imagine the GHES NMI or IRQ or
the timer fires while that ghes_proc() runs...

What's up?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users

2018-05-05 Thread Borislav Petkov
On Fri, Apr 27, 2018 at 04:35:05PM +0100, James Morse wrote:
> Arm64 has multiple NMI-like notifications, but ghes.c only has one
> in_nmi() path, risking deadlock if one NMI-like notification can
> interrupt another.
> 
> To support this we need a fixmap entry and lock for each notification
> type. But ghes_probe() attempts to process each struct ghes at probe
> time, to ensure any error that was notified before ghes_probe() was
> called has been done, and the buffer released (and maybe acknowledge
> to firmware) so that future errors can be delivered.
> 
> This means NMI-like notifications need two fixmap entries and locks,
> one for the ghes_probe() time call, and another for the actual NMI
> that could interrupt ghes_probe().
> 
> Split this single path up by adding an NMI fixmap idx and lock into
> the struct ghes. Any notification that can be called as an NMI can
> use these to separate its resources from any other notification it
> may interrupt.
> 
> The majority of notifications occur in IRQ context, so unless its
> called in_nmi(), ghes_copy_tofrom_phys() will use the FIX_APEI_GHES_IRQ
> fixmap entry and the ghes_fixmap_lock_irq lock. This allows
> NMI-notifications to be processed by ghes_probe(), and then taken
> as an NMI.
> 
> The double-underscore version of fix_to_virt() is used because the index
> to be mapped can't be tested against the end of the enum at compile
> time.
> 
> Signed-off-by: James Morse 
> 
> ---
> Changes since v1:
>  * Fixed for ghes_proc() always calling every notification in process context.
>Now only NMI-like notifications need an additional fixmap-slot/lock.

...

> @@ -986,6 +960,8 @@ int ghes_notify_sea(void)
>  
>  static void ghes_sea_add(struct ghes *ghes)
>  {
> + ghes->nmi_fixmap_lock = _fixmap_lock_nmi;
> + ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
>   ghes_estatus_queue_grow_pool(ghes);
>  
>   mutex_lock(_list_mutex);
> @@ -1032,6 +1008,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct 
> pt_regs *regs)
>  
>  static void ghes_nmi_add(struct ghes *ghes)
>  {
> + ghes->nmi_fixmap_lock = _fixmap_lock_nmi;

Ewww, we're assigning the spinlock to a pointer which we'll take later?
Yuck.

Why?

Do I see it correctly that one can have ACPI_HEST_NOTIFY_SEA and
ACPI_HEST_NOTIFY_NMI coexist in parallel on a single system?

If not, you can use a single spinlock.

If yes, then I'd prefer to make it less ugly and do the notification
type check ghes_probe() does:

switch (generic->notify.type)

and take the respective spinlock in ghes_copy_tofrom_phys(). This way it
is a bit better than using a spinlock ptr.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code

2018-05-05 Thread Borislav Petkov
On Fri, Apr 27, 2018 at 04:35:00PM +0100, James Morse wrote:
> To support asynchronous NMI-like notifications on arm64 we need to use
> the estatus-queue. These patches refactor it to allow multiple APEI
> notification types to use it.
> 
> Refactor the estatus queue's pool grow/shrink code and notification
> routine from NOTIFY_NMI's handlers. This will allow another notification
> method to use the estatus queue without duplicating this code.

These two are repeated from patch 1.

> This patch adds rcu_read_lock()/rcu_read_unlock() around the list
> list_for_each_entry_rcu() walker. These aren't strictly necessary as
> the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
> critical section.
> 
> Keep the oops_begin() call for x86, arm64 doesn't have one of these,
> and APEI is the only thing outside arch code calling this..

Next patch removes it so I guess you don't have to talk about it here.

> The existing ghes_estatus_pool_shrink() is folded into the new
> ghes_estatus_queue_shrink_pool() as only the queue uses it.
> 
> _in_nmi_notify_one() is separate from the rcu-list walker for a later
> caller that doesn't need to walk a list.
> 
> Signed-off-by: James Morse 
> Reviewed-by: Punit Agrawal 
> 
> ---
> Changes since v1:
>  * Tidied up _in_nmi_notify_one().
> 
>  drivers/acpi/apei/ghes.c | 100 
> ++-
>  1 file changed, 65 insertions(+), 35 deletions(-)

...

> +static int ghes_estatus_queue_notified(struct list_head *rcu_list)
> +{
> + int ret = -ENOENT;
> + struct ghes *ghes;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(ghes, rcu_list, list) {
> + if (!_in_nmi_notify_one(ghes))
> + ret = 0;
> + }
> + rcu_read_unlock();
> +
> + if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) && ret == 0)

... && !ret

like the rest of the file.

> + irq_work_queue(_proc_irq_work);
> +
> + return ret;
> +}
> +
>  static unsigned long ghes_esource_prealloc_size(
>   const struct acpi_hest_generic *generic)
>  {

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef

2018-05-05 Thread Borislav Petkov
On Fri, Apr 27, 2018 at 04:34:59PM +0100, James Morse wrote:
> To support asynchronous NMI-like notifications on arm64 we need to use
> the estatus-queue. These patches refactor it to allow multiple APEI
> notification types to use it.
> 
> First we move the estatus-queue code higher in the file so that any
> notify_foo() handler can make use of it.
> 
> This patch moves code around ... and makes the following trivial change:
> Freshen the dated comment above ghes_estatus_llist. printk() is no
> longer the issue, its the helpers like memory_failure_queue() that
> still aren't nmi safe.
> 
> Signed-off-by: James Morse <james.mo...@arm.com>
> Reviewed-by: Punit Agrawal <punit.agra...@arm.com>
> 
> Notes for cover letter:
> ghes.c has three things all called 'estatus'. One is a pool of memory
> that has a static size, and is grown/shrunk when new NMI users are
> allocated.
> The second is the cache, this holds recent notifications so we can
> suppress notifications we've already handled.
> The last is the queue, which hold data from NMI notifications (in pool
> memory) that can't be handled immediatly.
> ---
>  drivers/acpi/apei/ghes.c | 265 
> -------
>  1 file changed, 137 insertions(+), 128 deletions(-)

Reviewed-by: Borislav Petkov <b...@suse.de>

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 02/11] ACPI / APEI: Generalise the estatus queue's add/remove and notify code

2018-04-17 Thread Borislav Petkov
On Wed, Mar 28, 2018 at 05:30:55PM +0100, James Morse wrote:
> -%<-
> ACPI / APEI: don't wait to serialise with oops messages when panic()ing
> 
> oops_begin() exists to group printk() messages with the oops message
> printed by die(). To reach this caller we know that platform firmware
> took this error first, then notified the OS via NMI with a 'panic'
> severity.
> 
> Don't wait for another CPU to release the die-lock before we can
> panic(), our only goal is to print this fatal error and panic().
> 
> This code is always called in_nmi(), and since 42a0bb3f7138 ("printk/nmi:
> generic solution for safe printk in NMI"), it has been safe to call
> printk() from this context. Messages are batched in a per-cpu buffer
> and printed via irq-work, or a call back from panic().
> 
> Signed-off-by: James Morse <james.mo...@arm.com>
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 22f6ea5b9ad5..f348e6540960 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -34,7 +34,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -736,9 +735,6 @@ static int _in_nmi_notify_one(struct ghes *ghes)
> 
> sev = ghes_severity(ghes->estatus->error_severity);
> if (sev >= GHES_SEV_PANIC) {
> -#ifdef CONFIG_X86
> -   oops_begin();
> -#endif
> ghes_print_queued_estatus();
> __ghes_panic(ghes);
> }
> -%<-

Acked-by: Borislav Petkov <b...@suse.de>

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 02/11] ACPI / APEI: Generalise the estatus queue's add/remove and notify code

2018-03-27 Thread Borislav Petkov
Hi James,

On Mon, Mar 19, 2018 at 02:29:13PM +, James Morse wrote:
> I don't think the die_lock really helps here, do we really want to wait for a
> remote CPU to finish printing an OOPs about user-space's bad memory accesses,
> before we bring the machine down due to this system-wide fatal RAS error? The
> presence of firmware-first means we know this error, and any other oops are
> unrelated.

Hmm, now that you put it this way...

> I'd like to leave this under the x86-ifdef for now. For arm64 it would be an
> APEI specific arch hook to stop the arch code from printing some messages,

... I'm thinking we should ignore the whole serializing of oopses and
really dump that hw error ASAP. If it really is a fatal error, our main
and only goal is to get it out as fast as possible so that it has the
highest chance to appear on some screen or logging facility and thus the
system can be serviced successfully.

And the other oopses have lower prio.

Hmmm?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 02/11] ACPI / APEI: Generalise the estatus queue's add/remove and notify code

2018-03-08 Thread Borislav Petkov
On Wed, Mar 07, 2018 at 06:15:02PM +, James Morse wrote:
> Today its just x86 and arm64. arm64 doesn't have a hook to do this. I'm happy 
> to
> add an empty declaration or leave it under an ifdef until someone complains
> about any behaviour I missed!

So I did some more staring at the code and I think oops_begin() is
needed mainly, as you point out, to prevent two oops messages from
interleaving. And yap, the other stuff with printk() is not true anymore
because the commit which added oops_begin():

  81e88fdc432a ("ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI 
notification type support")

still saw an NMI-unsafe printk. Which is long taken care of now.

So only the interleaving issue remains.

Which begs the question: how are you guys preventing the interleaving on
arm64? Because arch/arm64/kernel/traps.c:200 grabs the die_lock too, so
interleaving can happen on arm64 too, AFAICT.

And by that logic, you should technically grab that lock here too in
_in_nmi_notify_one().

Or?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 02/11] ACPI / APEI: Generalise the estatus queue's add/remove and notify code

2018-03-01 Thread Borislav Petkov
On Thu, Mar 01, 2018 at 06:06:59PM +, Punit Agrawal wrote:
> You're looking at support for the 32-bit ARM systems.

I know. That's why I'm asking.

> The 64-bit support lives in arch/arm64 and the die() there doesn't
> contain an oops_begin()/oops_end(). But the lack of oops_begin() on
> arm64 doesn't really matter here.

Yap.

> One issue I see with calling die() is that it is defined in different
> includes across various architectures, (e.g., include/asm/kdebug.h for
> x86, include/asm/system_misc.h in arm64, etc.)

I don't think that's insurmountable.

The more important question is, can we do the same set of calls when
panic severity on all architectures which support APEI or should we have
arch-specific ghes_panic() callbacks or so.

As it is now, it would turn into a mess if we start with the ifdeffery
and the different requirements architectures might have...

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 02/11] ACPI / APEI: Generalise the estatus queue's add/remove and notify code

2018-03-01 Thread Borislav Petkov
On Thu, Feb 15, 2018 at 06:55:57PM +, James Morse wrote:
> Keep the oops_begin() call for x86,

That oops_begin() in generic code is such a layering violation, grrr.

> arm64 doesn't have one of these,
> and APEI is the only thing outside arch code calling this..

So looking at:

arch/arm/kernel/traps.c:die()

it does call oops_begin() ... oops_end() just like the x86 version of
die().

I'm wondering if we could move the code to do die() in a prepatch? My
assumption is that all the arches should have die()... A quick grep does
show a bunch of other arches having die()...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 01/11] ACPI / APEI: Move the estatus queue code up, and under its own ifdef

2018-02-23 Thread Borislav Petkov
On Fri, Feb 23, 2018 at 06:02:21PM +, James Morse wrote:
> Sure. I reckon your English grammar is better than mine, is this better?:

Bah, you must be joking :-)

> | In any NMI-like handler, memory from ghes_estatus_pool is used to save
> | estatus, and added to the ghes_estatus_llist. irq_work_queue() causes
> | ghes_proc_in_irq() to run in IRQ context where each estatus in
> | ghes_estatus_llist are processed. Each NMI-like error source must grow

s/are/is/ reads better to me, for some reason :)

> | the ghes_estatus_pool to ensure memory is available.

Other than that, yap, much better!

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 01/11] ACPI / APEI: Move the estatus queue code up, and under its own ifdef

2018-02-20 Thread Borislav Petkov
On Thu, Feb 15, 2018 at 06:55:56PM +, James Morse wrote:
> +#ifdef CONFIG_HAVE_ACPI_APEI_NMI
> +/*
> + * While printk() now has an in_nmi() path, the handling for CPER records
> + * does not. For example, memory_failure_queue() takes spinlocks and calls
> + * schedule_work_on().
> + *
> + * So in any NMI-like handler, we allocate required memory from lock-less
> + * memory allocator (ghes_estatus_pool), save estatus into it, put them into
> + * lock-less list (ghes_estatus_llist), then delay printk into IRQ context 
> via
> + * irq_work (ghes_proc_irq_work).  ghes_estatus_size_request record
> + * required pool size by all NMI error source.

Since you're touching this, pls correct the grammar too, while at it,
and correct them into proper sentences. Also, end function names with
"()". Also the "we" pronoun and tense sounds funny - let's make it
passive.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 00/11] APEI in_nmi() rework and arm64 SDEI wire-up

2018-02-19 Thread Borislav Petkov
On Thu, Feb 15, 2018 at 06:55:55PM +, James Morse wrote:
> Hello!
> 
> The aim of this series is to wire arm64's SDEI into APEI.
> 
> What's SDEI? Its ARM's "Software Delegated Exception Interface" [0]. It's
> used by firmware to tell the OS about firmware-first RAS events.
> 
> These Software exceptions can interrupt anything, so I describe them as
> NMI-like. They aren't the only NMI-like way to notify the OS about
> firmware-first RAS events, the ACPI spec also defines 'NOTFIY_SEA' and
> 'NOTIFY_SEI'.
> 
> (Acronyms: SEA, Synchronous External Abort. The CPU requested some memory,
> but the owner of that memory said no. These are always synchronous with the
> instruction that caused them. SEI, System-Error Interrupt, commonly called
> SError. This is an asynchronous external abort, the memory-owner didn't say no
> at the right point. Collectively these things are called external-aborts
> How is firmware involved? It traps these and re-injects them into the kernel
> once its written the CPER records).

Thank you about those! This is how people should write 0/N introductory
messages with fancy new abbreviations.

:-)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V17 00/11] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64

2017-05-23 Thread Borislav Petkov
On Fri, May 19, 2017 at 02:32:02PM -0600, Tyler Baicar wrote:
> When a memory error, CPU error, PCIe error, or other type of hardware error
> that's covered by RAS occurs, firmware should populate the shared GHES memory
> location with the proper GHES structures to notify the OS of the error.

...

> V17:Rebase on tip
> Change trace event helper function names
> Remove unneeded prefixes from commit text

The whole series:

Reviewed-by: Borislav Petkov <b...@suse.de>

Thanks!

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V16 08/11] efi: print unrecognized CPER section

2017-05-16 Thread Borislav Petkov
On Tue, May 16, 2017 at 10:44:43AM -0600, Baicar, Tyler wrote:
> I meant to respond to this comment after I sent the v16 patch series, but
> you beat me to it :)
> 
> These prefixes are common to all the GHES/CPER printing to the kernel logs.

I don't mean that - I meant to remove them from this particular example
in the commit message along with the timestamps. Because they are simply
useless there.

Of course they should stay in the actual printk calls.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V16 10/11] trace, ras: add ARM processor error trace event

2017-05-16 Thread Borislav Petkov
On Mon, May 15, 2017 at 03:27:59PM -0600, Tyler Baicar wrote:
> Currently there are trace events for the various RAS
> errors with the exception of ARM processor type errors.
> Add a new trace event for such errors so that the user
> will know when they occur. These trace events are
> consistent with the ARM processor error section type
> defined in UEFI 2.6 spec section N.2.4.4.
> 
> Signed-off-by: Tyler Baicar 
> Acked-by: Steven Rostedt 
> Reviewed-by: Xie XiuQi 
> ---
>  drivers/acpi/apei/ghes.c|  6 +-
>  drivers/firmware/efi/cper.c |  1 +
>  drivers/ras/ras.c   |  6 ++
>  include/linux/ras.h |  3 +++
>  include/ras/ras_event.h | 45 
> +
>  5 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 1106722..2dddb3b 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -518,7 +518,11 @@ static void ghes_do_proc(struct ghes *ghes,
>  
>   }
>  #endif
> - else {
> + else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM)) {
> + struct cper_sec_proc_arm *err = 
> acpi_hest_get_payload(gdata);
> +
> + call_arm_trace_event(err);
> + } else {
>   void *err = acpi_hest_get_payload(gdata);
>  
>   call_non_standard_trace_event(_type, fru_id,
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index d5a5855..48a8f69 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define INDENT_SP" "
>  
> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index 57363be..8655ef4 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -20,6 +20,11 @@ void call_non_standard_trace_event(const uuid_le *sec_type,
>   trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
>  }
>  
> +void call_arm_trace_event(struct cper_sec_proc_arm *err)

log_arm_hw_error()

because it is exactly that - a hardware error.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V16 09/11] ras: acpi / apei: generate trace event for unrecognized CPER section

2017-05-16 Thread Borislav Petkov
On Mon, May 15, 2017 at 03:27:58PM -0600, Tyler Baicar wrote:
> The UEFI spec includes non-standard section type support in the
> Common Platform Error Record. This is defined in section N.2.3 of
> UEFI version 2.5.
> 
> Currently if the CPER section's type (UUID) does not match any
> section type that the kernel knows how to parse, a trace event is
> not generated.
> 
> Generate a trace event which contains the raw error data for
> non-standard section type error records.
> 
> Signed-off-by: Tyler Baicar 
> CC: Jonathan (Zhixiong) Zhang 
> Tested-by: Shiju Jose 
> ---
>  drivers/acpi/apei/ghes.c  | 27 +++
>  drivers/ras/ras.c |  9 +
>  include/linux/ras.h   | 12 
>  include/ras/ras_event.h   | 45 +
>  include/uapi/linux/uuid.h |  6 --
>  5 files changed, 93 insertions(+), 6 deletions(-)

This patch doesn't apply cleanly against 4.12-rc1. Please rediff it.

> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index b67dd36..57363be 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -7,11 +7,19 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  #define CREATE_TRACE_POINTS
>  #define TRACE_INCLUDE_PATH ../../include/ras
>  #include 
>  
> +void call_non_standard_trace_event(const uuid_le *sec_type,

You are not calling a non-standard trace event - you're logging it:

log_non_standard_event()

> +  const uuid_le *fru_id, const char *fru_text, const u8 sev,
> +  const u8 *err, const u32 len)

Align arguments at opening brace.

> +{
> + trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
> +}
> +
>  static int __init ras_init(void)
>  {
>   int rc = 0;

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V16 08/11] efi: print unrecognized CPER section

2017-05-16 Thread Borislav Petkov
On Mon, May 15, 2017 at 03:27:57PM -0600, Tyler Baicar wrote:
> UEFI spec allows for non-standard section in Common Platform Error
> Record. This is defined in section N.2.3 of UEFI version 2.5.
> 
> Currently if the CPER section's type (UUID) does not match with
> one of the section types that the kernel knows how to parse, the
> section is skipped. Therefore, user is not able to see
> such CPER data, for instance, error record of non-standard section.
> 
> This change prints out the raw data in hex in the dmesg buffer so
> that non-standard sections are reported to the user. Non-standard
> section type errors should be reported to the user because these
> can include errors which are vendor specific. The data length is
> taken from Error Data length field of Generic Error Data Entry.
> 
> The following is a sample output from dmesg:
> [  140.739180] {1}[Hardware Error]: Hardware error from APEI Generic Hardware 
> Error Source: 2
> [  140.739182] {1}[Hardware Error]: It has been corrected by h/w and requires 
> no further action
> [  140.739191] {1}[Hardware Error]: event severity: corrected
> [  140.739196] {1}[Hardware Error]:  time: precise 2017-03-15 20:37:35
> [  140.739197] {1}[Hardware Error]:  Error 0, type: corrected
> [  140.739203] {1}[Hardware Error]:   section type: unknown, 
> d2e2621c-f936-468d-0d84-15a4ed015c8b
> [  140.739205] {1}[Hardware Error]:   section length: 0x238
> [  140.739210] {1}[Hardware Error]:   : 4d415201 4d492031 453a4d45 
> 435f4343  .RAM1 IMEM:ECC_C
> [  140.739214] {1}[Hardware Error]:   0010: 53515f45 44525f42  
>   E_QSB_RD
> [  140.739217] {1}[Hardware Error]:   0020:    
>   
> [  140.739220] {1}[Hardware Error]:   0030:   0101 
> 0101  
> [  140.739223] {1}[Hardware Error]:   0040:   0005 
>   
> [  140.739226] {1}[Hardware Error]:   0050: 0101  0001 
> 0000  

Let me repeat myself from the last time:

"Kill all those prefixes:

" Hardware error from APEI Generic Hardware Error Source: 2
  It has been corrected by h/w and requires no further action
  event severity: corrected
   time: precise 2017-03-15 20:37:35
   Error 0, type: corrected
section type: unknown, d2e2621c-f936-468d-0d84-15a4ed015c8b
section length: 568 (0x238)
: 4d415201 4d492031 453a4d45 435f4343  .RAM1 IMEM:ECC_C
0010: 53515f45 44525f42    E_QSB_RD
0020:      
0030:   0101 0101  
0040:   0005   
0050: 0101  0001 0000  
"

to the important info only."

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 11/11] arm/arm64: KVM: add guest SEA support

2017-05-08 Thread Borislav Petkov
On Mon, May 08, 2017 at 01:54:44PM -0600, Baicar, Tyler wrote:
> This was discussed in the v12 and v13 patch series. There is existing
> code in kvm_handle_guest_abort for injecting an abort back into the
> guest. We only want to do that if it was an abort that was not handled
> by the firmware first handling. So here we verify that at least one of
> the SEA error sources successfully reported an error record sent from
> the firmware. If there were no errors reported by firmware, then we
> want to continue with the current implementation that will inject the
> virtual abort. (kvm_inject_vabt)

So this needs to be in a comment there. This is generic code in the
sense that it is in drivers/acpi/ and it should say why it is doing that
special thing. I know, SEA is ARM-only, as far as I'm gathering from
reviewing this, but this behavior needs to be documented as it is not
obvious.

Thanks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 06/11] acpi: apei: handle SEA notification type for ARMv8

2017-04-25 Thread Borislav Petkov
On Tue, Apr 25, 2017 at 11:41:39AM -0600, Baicar, Tyler wrote:
> I originally had this as a notifier, but Will requested to remove the
> notifier. That conversation is here: https://lkml.org/lkml/2017/1/18/1018

Yeah, he mentioned on IRC. I just think notifiers would be the cleaner
thing but whatever you guys say.

Just we had a nasty hack on x86 which I got rid of recently:

https://lkml.kernel.org/r/20170406090634.30950-1...@alien8.de

and I wouldn't want you guys to have the same "fun". :)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 06/11] acpi: apei: handle SEA notification type for ARMv8

2017-04-25 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 05:05:18PM -0600, Tyler Baicar wrote:
> ARM APEI extension proposal added SEA (Synchronous External Abort)
> notification type for ARMv8.
> Add a new GHES error source handling function for SEA. If an error
> source's notification type is SEA, then this function can be registered
> into the SEA exception handler. That way GHES will parse and report
> SEA exceptions when they occur.
> An SEA can interrupt code that had interrupts masked and is treated as
> an NMI. To aid this the page of address space for mapping APEI buffers
> while in_nmi() is always reserved, and ghes_ioremap_pfn_nmi() is
> changed to use the helper methods to find the prot_t to map with in
> the same way as ghes_ioremap_pfn_irq().

...

> @@ -518,6 +520,17 @@ static int do_sea(unsigned long addr, unsigned int esr, 
> struct pt_regs *regs)
>   pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>   inf->name, esr, addr);
>  
> + /*
> +  * Synchronous aborts may interrupt code which had interrupts masked.
> +  * Before calling out into the wider kernel tell the interested
> +  * subsystems.
> +  */
> + if (IS_ENABLED(CONFIG_ACPI_APEI_SEA)) {
> + nmi_enter();
> + ghes_notify_sea();
> + nmi_exit();
> + }

Well, the other GHES notification methods use a notifier:
ghes_notify_sci, ghes_notify_nmi. You probably should do that too
instead of calling straight into a driver from arch code.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 04/11] efi: parse ARM processor error

2017-04-25 Thread Borislav Petkov
On Tue, Apr 25, 2017 at 10:05:31AM -0600, Baicar, Tyler wrote:
> That seems like something that should be done outside of these patches (if
> added to the kernel at all). The decoding for this information would all be
> vendor specific, so I'm not sure if we want to pollute the EFI code with
> vendor specific error decoding. Currently we are using the RAS Daemon user
> space tool for the decoding of this information since vendors can easily
> pick up this tool and add an extension for their vendor specific parsing.
> These prints will only happen when the firmware supports the vendor specific
> error information anyway.

The same questions apply here: what do you do if the machine panics
because the error is fatal and you can't get it to run any userspace?
Wouldn't it be good to decode the whole error?

Right now we photograph screens on Intel x86 and feed the typed MCA info
by hand to mcelog. Hardly an optimal situation.

On AMD, there's a decoder which actually can dump the decoded critical
error. (Or could - that's in flux again :-\).

So yes, if stuff is too vendor-specific then you probably don't
want to decode it (or add a vendor-specific decoding module like
edac_mce_amd.ko, for example). But the tables from the UEFI spec,
documenting IP-specific error types which should be valid for most if
not all ARM64 implementations adhering to the spec, would be useful to
decode.

In general, the more we can decode the error in the kernel and the less
we need an external help to do so, the better.

Thanks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 04/11] efi: parse ARM processor error

2017-04-24 Thread Borislav Petkov
On Fri, Apr 21, 2017 at 12:22:09PM -0600, Baicar, Tyler wrote:
> I guess it's not really needed. It just may be useful considering there can
> be numerous error info structures, numerous context info structures, and a
> variable length vendor information section. I can move this print to only in
> the length check failure cases.

And? Why does the user care?

I mean, it is good for debugging when you wanna see you're parsing the
error info data properly but otherwise it doesn't improve the error
reporting one bit.

> Because these are part of the error information structure. I wouldn't think
> FW would populate error information structures that are different versions
> in the same processor error, but it could be possible from the spec (at
> least once there are different versions of the table).

Same argument as above.

> There is an error information 64 bit value in the ARM processor error
> information structure. (UEFI spec 2.6 table 261)

So that's IP-dependent and explained in the following tables. Any plans
on decoding that too?

> Why's that? Dumping this vendor specific error information is similar to the
> unrecognized CPER section reporting which is also meant for vendor specific
> information https://lkml.org/lkml/2017/4/18/751

And how do those naked bytes help the user understand the error happening?

Even in your example you have:

[  140.739210] {1}[Hardware Error]:   : 4d415201 4d492031 453a4d45 
435f4343  .RAM1 IMEM:ECC_C
[  140.739214] {1}[Hardware Error]:   0010: 53515f45 44525f42  
  E_QSB_RD

Which looks like some correctable ECC DRAM error and is actually begging
to be decoded in a human-readable form. So let's do that completely and
not dump partially decoded information.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 03/11] cper: add timestamp print to CPER status printing

2017-04-21 Thread Borislav Petkov
On Fri, Apr 21, 2017 at 12:08:43PM -0600, Baicar, Tyler wrote:
> The timestamp may still be useful when it is imprecise. In the polling case,
> you may only poll every minute or so, so the time may be useful.

Well, what is in the timestamp when !precise? Some random time or some
timestamp from a couple of seconds ago? How do you differentiate what
timestamp is bollocks and what is from a while ago?

Is the imprecise tstamp really close to the time the error happened or
pointing at 1970 - the beginning of unix time? :-)

I'm sure you've picked up by now that we don't trust the firmware one
bit.

> Also, I imagine there could be interrupt based errors happening much faster 
> than the
> FW/OS handshake can happen. Maybe we can just use what I had before but also
> specify imprecise so that it is clear:
> 
> printk("%s%ststamp: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
> (timestamp[3] & 0x1 ? "precise " : "imprecise "),
>  century, year, mon, day, hour, min, sec);

I guess.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 04/11] efi: parse ARM processor error

2017-04-21 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 05:05:16PM -0600, Tyler Baicar wrote:
> Add support for ARM Common Platform Error Record (CPER).
> UEFI 2.6 specification adds support for ARM specific
> processor error information to be reported as part of the
> CPER records. This provides more detail on for processor error logs.
> 
> Signed-off-by: Tyler Baicar 
> CC: Jonathan (Zhixiong) Zhang 
> Reviewed-by: James Morse 
> Reviewed-by: Ard Biesheuvel 
> ---
>  drivers/firmware/efi/cper.c | 135 
> 
>  include/linux/cper.h|  54 ++
>  2 files changed, 189 insertions(+)
> 
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 46585f9..f959185 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
>  static const char * const proc_type_strs[] = {
>   "IA32/X64",
>   "IA64",
> + "ARM",
>  };
>  
>  static const char * const proc_isa_strs[] = {
>   "IA32",
>   "IA64",
>   "X64",
> + "ARM A32/T32",
> + "ARM A64",
>  };
>  
>  static const char * const proc_error_type_strs[] = {
> @@ -184,6 +187,128 @@ static void cper_print_proc_generic(const char *pfx,
>   printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
>  }
>  
> +#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
> +static const char * const arm_reg_ctx_strs[] = {
> + "AArch32 general purpose registers",
> + "AArch32 EL1 context registers",
> + "AArch32 EL2 context registers",
> + "AArch32 secure context registers",
> + "AArch64 general purpose registers",
> + "AArch64 EL1 context registers",
> + "AArch64 EL2 context registers",
> + "AArch64 EL3 context registers",
> + "Misc. system register structure",
> +};
> +
> +static void cper_print_proc_arm(const char *pfx,
> + const struct cper_sec_proc_arm *proc)
> +{
> + int i, len, max_ctx_type;
> + struct cper_arm_err_info *err_info;
> + struct cper_arm_ctx_info *ctx_info;
> + char newpfx[64];
> +
> + printk("%ssection length: %d\n", pfx, proc->section_length);

We need to dump section length because?

> + printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
> +
> + len = proc->section_length - (sizeof(*proc) +
> + proc->err_info_num * (sizeof(*err_info)));
> + if (len < 0) {
> + printk("%ssection length is too small\n", pfx);

Now here we *can* dump it.

> + printk("%sfirmware-generated error record is incorrect\n", pfx);
> + printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
> + return;
> + }
> +
> + if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
> + printk("%sMPIDR: 0x%016llx\n", pfx, proc->mpidr);


< newline here.

Also, what is MPIDR and can it be written in a more user-friendly manner
and not be an abbreviation?

> + if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
> + printk("%serror affinity level: %d\n", pfx,
> + proc->affinity_level);
> + if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
> + printk("%srunning state: 0x%x\n", pfx, proc->running_state);
> + printk("%sPSCI state: %d\n", pfx, proc->psci_state);

One more abbreviation. Please consider whether having the abbreviations
or actually writing them out is more user-friendly.

> + }
> +
> + snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);

That INDENT_SP thing is just silly, someone should kill it.

> +
> + err_info = (struct cper_arm_err_info *)(proc + 1);
> + for (i = 0; i < proc->err_info_num; i++) {
> + printk("%sError info structure %d:\n", pfx, i);
> + printk("%sversion:%d\n", newpfx, err_info->version);
> + printk("%slength:%d\n", newpfx, err_info->length);

< newline here.

Why do we even dump version and info for *every* err_info structure?

> + if (err_info->validation_bits &
> + CPER_ARM_INFO_VALID_MULTI_ERR) {
> + if (err_info->multiple_error == 0)
> + printk("%ssingle error\n", newpfx);
> + else if (err_info->multiple_error == 1)
> + printk("%smultiple errors\n", newpfx);
> + else
> + printk("%smultiple errors count:%u\n",
> + newpfx, err_info->multiple_error);

So this can be simply: "num errors: %d", err_info->multiple_error+1...

Without checking CPER_ARM_INFO_VALID_MULTI_ERR.

> + }

< newline here.

> + if (err_info->validation_bits & CPER_ARM_INFO_VALID_FLAGS) {
> + if (err_info->flags & CPER_ARM_INFO_FLAGS_FIRST)
> + 

Re: [PATCH V15 03/11] cper: add timestamp print to CPER status printing

2017-04-21 Thread Borislav Petkov
On Fri, Apr 21, 2017 at 10:04:35AM -0600, Baicar, Tyler wrote:
> This is basically what I already had in v14...you asked to move it into a
> different if-statement? https://lkml.org/lkml/2017/4/12/397

Well, clearly I've been smoking some nasty potent sh*t. :-\

/me goes and looks at the spec:

"Bit 0 – Timestamp is precise if this bit is set and correlates to the
time of the error event."

So why are we even printing the timestamp when !precise?

IOW, I think we should do:

if (!(timestamp[3] & 0x1))
printk("%stimestamp imprecise\n", pfx);
else {
sec = ..
min = ...

...
}

and print the actual values only when the timestamp is precise.
Otherwise it has *some* values which could just as well be completely
random. And it's not like we're reporting the error tomorrow - it is
mostly a couple of seconds from logging to the fw pushing it out...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 03/11] cper: add timestamp print to CPER status printing

2017-04-21 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 05:05:15PM -0600, Tyler Baicar wrote:
> The ACPI 6.1 spec added a timestamp to the HEST generic data

HEST?

I see the timestamp in

Table 18-343 Generic Error Data Entry

where those things are "One or more Generic Error Data Entry structures
may be recorded in the Generic Error Data Entries field of the Generic
Error Status Block structure."

And those are part of the "18.3.2.7 Generic Hardware Error Source",
i.e., GHES. So why do you say "HEST" above?

> structure. Print the timestamp out when printing out the error
> status information.
> 
> Signed-off-by: Tyler Baicar 
> CC: Jonathan (Zhixiong) Zhang 
> Reviewed-by: James Morse 
> Reviewed-by: Ard Biesheuvel 

Remove those Reviewed-by:s

> ---
>  drivers/firmware/efi/cper.c | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 8328a6f..46585f9 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -32,6 +32,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  
>  #define INDENT_SP" "
> @@ -387,6 +389,29 @@ static void cper_print_pcie(const char *pfx, const 
> struct cper_sec_pcie *pcie,
>   pfx, pcie->bridge.secondary_status, pcie->bridge.control);
>  }
>  
> +static void cper_estatus_timestamp(const char *pfx,

cper_print_tstamp()

> +struct acpi_hest_generic_data_v300 *gdata)
> +{
> + __u8 hour, min, sec, day, mon, year, century, *timestamp;
> +
> + if (gdata->validation_bits & ACPI_HEST_GEN_VALID_TIMESTAMP) {
> + timestamp = (__u8 *)&(gdata->time_stamp);
> + sec   = bcd2bin(timestamp[0]);
> + min   = bcd2bin(timestamp[1]);
> + hour  = bcd2bin(timestamp[2]);
> + day   = bcd2bin(timestamp[4]);
> + mon   = bcd2bin(timestamp[5]);
> + year  = bcd2bin(timestamp[6]);
> + century   = bcd2bin(timestamp[7]);
> +
> + if (*(timestamp + 3) & 0x1)
> + printk("%stimestamp is precise\n", pfx);
> +
> + printk("%stime: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
> + century, year, mon, day, hour, min, sec);

Yeah, about the precise tstamp, you can do something like this:

---
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 46585f92b741..a649884e2264 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -404,10 +404,8 @@ static void cper_estatus_timestamp(const char *pfx,
year  = bcd2bin(timestamp[6]);
century   = bcd2bin(timestamp[7]);
 
-   if (*(timestamp + 3) & 0x1)
-   printk("%stimestamp is precise\n", pfx);
-
-   printk("%stime: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
+   printk("%s%ststamp: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
+   (timestamp[3] & 0x1 ? "precise " : ""), 
century, year, mon, day, hour, min, sec);
}
 }

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 02/11] ras: acpi/apei: cper: add support for generic data v3 structure

2017-04-20 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 05:05:14PM -0600, Tyler Baicar wrote:
> The ACPI 6.1 spec adds a new version of the generic data structure.

which data structure? HEST?

> Add support to handle the new structure as well as properly verify
> and iterate through the generic data entries.
> 
> Signed-off-by: Tyler Baicar 
> CC: Jonathan (Zhixiong) Zhang 
> Reviewed-by: James Morse 
> Reviewed-by: Ard Biesheuvel 

This is clearly a new version of the patch so Reviewed-by tags don't
apply anymore. Ditto for the next one. Remember to remove such tags in
the future when the patches are changed non-trivially in a following
iteration.

> ---
>  drivers/acpi/apei/ghes.c|  6 +++---
>  drivers/firmware/efi/cper.c | 37 ++---
>  include/acpi/ghes.h | 22 ++
>  3 files changed, 47 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 6d87ab7..dfb7dd2 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -429,7 +429,7 @@ static void ghes_handle_memory_failure(struct 
> acpi_hest_generic_data *gdata, int
>   int flags = -1;
>   int sec_sev = ghes_severity(gdata->error_severity);
>   struct cper_sec_mem_err *mem_err;
> - mem_err = (struct cper_sec_mem_err *)(gdata + 1);
> + mem_err = acpi_hest_get_payload(gdata);

struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);

while you're at it.

>   if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
>   return;
> @@ -466,7 +466,7 @@ static void ghes_do_proc(struct ghes *ghes,
>   if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
>CPER_SEC_PLATFORM_MEM)) {
>   struct cper_sec_mem_err *mem_err;
> - mem_err = (struct cper_sec_mem_err *)(gdata+1);
> + mem_err = acpi_hest_get_payload(gdata);

Ditto

+ add a \n here.

>   ghes_edac_report_mem_error(ghes, sev, mem_err);
>  
>   arch_apei_report_mem_error(sev, mem_err);
> @@ -476,7 +476,7 @@ static void ghes_do_proc(struct ghes *ghes,
>   else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
> CPER_SEC_PCIE)) {
>   struct cper_sec_pcie *pcie_err;
> - pcie_err = (struct cper_sec_pcie *)(gdata+1);
> + pcie_err = acpi_hest_get_payload(gdata);

Ditto.

>   if (sev == GHES_SEV_RECOVERABLE &&
>   sec_sev == GHES_SEV_RECOVERABLE &&
>   pcie_err->validation_bits & 
> CPER_PCIE_VALID_DEVICE_ID &&
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index d425374..8328a6f 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define INDENT_SP" "
>  
> @@ -386,8 +387,9 @@ static void cper_print_pcie(const char *pfx, const struct 
> cper_sec_pcie *pcie,
>   pfx, pcie->bridge.secondary_status, pcie->bridge.control);
>  }
>  
> -static void cper_estatus_print_section(
> - const char *pfx, const struct acpi_hest_generic_data *gdata, int sec_no)
> +static void
> +cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data 
> *gdata,
> +int sec_no)
>  {
>   uuid_le *sec_type = (uuid_le *)gdata->section_type;
>   __u16 severity;
> @@ -403,14 +405,18 @@ static void cper_estatus_print_section(
>  
>   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
>   if (!uuid_le_cmp(*sec_type, CPER_SEC_PROC_GENERIC)) {
> - struct cper_sec_proc_generic *proc_err = (void *)(gdata + 1);
> + struct cper_sec_proc_generic *proc_err;
> +
> + proc_err = acpi_hest_get_payload(gdata);

Ditto.

>   printk("%s""section_type: general processor error\n", newpfx);
>   if (gdata->error_data_length >= sizeof(*proc_err))
>   cper_print_proc_generic(newpfx, proc_err);
>   else
>   goto err_section_too_small;
>   } else if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
> - struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
> + struct cper_sec_mem_err *mem_err;
> +
> + mem_err = acpi_hest_get_payload(gdata);

Ditto.

>   printk("%s""section_type: memory error\n", newpfx);
>   if (gdata->error_data_length >=
>   sizeof(struct cper_sec_mem_err_old))
> @@ -419,7 +425,9 @@ static void cper_estatus_print_section(
>   else
>   goto err_section_too_small;
>   } else if (!uuid_le_cmp(*sec_type, CPER_SEC_PCIE)) {
> - struct cper_sec_pcie *pcie = (void *)(gdata + 

Re: [PATCH V15 01/11] acpi: apei: read ack upon ghes record consumption

2017-04-19 Thread Borislav Petkov
On Wed, Apr 19, 2017 at 02:31:13PM -0600, Baicar, Tyler wrote:
> Will do.

You don't necessarily have to reply with "will do" if you agree with the
review.

Also, please wait until I've gone through the whole pile before sending
it again.

Thanks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V15 01/11] acpi: apei: read ack upon ghes record consumption

2017-04-19 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 05:05:13PM -0600, Tyler Baicar wrote:
> A RAS (Reliability, Availability, Serviceability) controller
> may be a separate processor running in parallel with OS
> execution, and may generate error records for consumption by
> the OS. If the RAS controller produces multiple error records,
> then they may be overwritten before the OS has consumed them.
> 
> The Generic Hardware Error Source (GHES) v2 structure
> introduces the capability for the OS to acknowledge the
> consumption of the error record generated by the RAS
> controller. A RAS controller supporting GHESv2 shall wait for
> the acknowledgment before writing a new error record, thus
> eliminating the race condition.
> 
> Add support for parsing of GHESv2 sub-tables as well.
> 
> Signed-off-by: Tyler Baicar 
> CC: Jonathan (Zhixiong) Zhang 
> Reviewed-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 55 
> +---
>  drivers/acpi/apei/hest.c |  7 --
>  include/acpi/ghes.h  |  5 -
>  3 files changed, 61 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 79b3c9c..6d87ab7 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -46,6 +46,7 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -80,6 +81,10 @@
>   ((struct acpi_hest_generic_status *)\
>((struct ghes_estatus_node *)(estatus_node) + 1))
>  
> +#define IS_HEST_TYPE_GENERIC_V2(ghes)\
> + ((struct acpi_hest_header *)ghes->generic)->type == \

This is a nasty hack: casting the ghes->generic pointer to a pointer of its
first member which is a acpi_hest_header.

Why isn't this a nice inline function with proper dereferencing:

static inline bool is_hest_type_generic_v2(struct ghes *ghes)
{
return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
}

?

Also, please integrate scripts/checkpatch.pl in your patch creation
workflow. Some of the warnings/errors *actually* make sense.

>  /*
>   * This driver isn't really modular, however for the time being,
>   * continuing to use module_param is the easiest way to remain
> @@ -240,6 +245,17 @@ static int ghes_estatus_pool_expand(unsigned long len)
>   return 0;
>  }
>  
> +static int map_gen_v2(struct ghes *ghes)
> +{
> + return apei_map_generic_address(>generic_v2->read_ack_register);
> +}
> +
> +static void unmap_gen_v2(struct ghes *ghes)
> +{
> + apei_unmap_generic_address(>generic_v2->read_ack_register);
> + return;
> +}

Like this one, for example:

WARNING: void function return statements are not generally useful
#89: FILE: drivers/acpi/apei/ghes.c:257:
+   return;
+}

> +
>  static struct ghes *ghes_new(struct acpi_hest_generic *generic)
>  {
>   struct ghes *ghes;
> @@ -249,10 +265,17 @@ static struct ghes *ghes_new(struct acpi_hest_generic 
> *generic)
>   ghes = kzalloc(sizeof(*ghes), GFP_KERNEL);
>   if (!ghes)
>   return ERR_PTR(-ENOMEM);
> +
>   ghes->generic = generic;
> + if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
> + rc = map_gen_v2(ghes);
> + if (rc)
> + goto err_free;
> + }
> +
>   rc = apei_map_generic_address(>error_status_address);
>   if (rc)
> - goto err_free;
> + goto err_unmap_read_ack_addr;
>   error_block_length = generic->error_block_length;
>   if (error_block_length > GHES_ESTATUS_MAX_SIZE) {
>   pr_warning(FW_WARN GHES_PFX
> @@ -264,13 +287,16 @@ static struct ghes *ghes_new(struct acpi_hest_generic 
> *generic)
>   ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
>   if (!ghes->estatus) {
>   rc = -ENOMEM;
> - goto err_unmap;
> + goto err_unmap_status_addr;
>   }
>  
>   return ghes;
>  
> -err_unmap:
> +err_unmap_status_addr:
>   apei_unmap_generic_address(>error_status_address);
> +err_unmap_read_ack_addr:
> + if (IS_HEST_TYPE_GENERIC_V2(ghes))
> + unmap_gen_v2(ghes);
>  err_free:
>   kfree(ghes);
>   return ERR_PTR(rc);
> @@ -280,6 +306,8 @@ static void ghes_fini(struct ghes *ghes)
>  {
>   kfree(ghes->estatus);
>   apei_unmap_generic_address(>generic->error_status_address);
> + if (IS_HEST_TYPE_GENERIC_V2(ghes))
> + unmap_gen_v2(ghes);
>  }
>  
>  static inline int ghes_severity(int severity)
> @@ -649,6 +677,21 @@ static void ghes_estatus_cache_add(
>   rcu_read_unlock();
>  }
>  
> +static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)

If you name this function parameter to something shorter, say gv2, for
example...

> +{
> + int rc;
> + u64 val = 0;
> +
> + rc = apei_read(, _v2->read_ack_register);
> + if (rc)
> + return rc;
> +
> + val &= 

Re: [PATCH V14 02/10] ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1

2017-04-13 Thread Borislav Petkov
On Thu, Apr 13, 2017 at 02:30:21PM -0600, Baicar, Tyler wrote:
> I do not agree with this. The struct being passed to this function is
> already named acpi_hest_generic_data in the existing code and all over this
> code is named gdata not just d.

And I'm saying they're too long - the preexisting ones and the ones
you're adding - and impair readability. This whole driver is one
unreadable ugly pile and if I were the maintainer I would never allowed
it in its current form.

But I don't think it really has a maintainer - poor Rafael has to deal
with it because it is under drivers/acpi/ and that whole RAS firmware
crap got thrown over the wall at some point and now we're stuck with it.

So this is just my opinion since he asked me to take a look.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V14 01/10] acpi: apei: read ack upon ghes record consumption

2017-04-12 Thread Borislav Petkov
On Tue, Mar 28, 2017 at 01:30:31PM -0600, Tyler Baicar wrote:
> A RAS (Reliability, Availability, Serviceability) controller
> may be a separate processor running in parallel with OS
> execution, and may generate error records for consumption by
> the OS. If the RAS controller produces multiple error records,
> then they may be overwritten before the OS has consumed them.
> 
> The Generic Hardware Error Source (GHES) v2 structure
> introduces the capability for the OS to acknowledge the
> consumption of the error record generated by the RAS
> controller. A RAS controller supporting GHESv2 shall wait for
> the acknowledgment before writing a new error record, thus
> eliminating the race condition.
> 
> Add support for parsing of GHESv2 sub-tables as well.
> 
> Signed-off-by: Tyler Baicar 
> CC: Jonathan (Zhixiong) Zhang 
> Reviewed-by: James Morse 
> ---
>  drivers/acpi/apei/ghes.c | 49 
> +---
>  drivers/acpi/apei/hest.c |  7 +--
>  include/acpi/ghes.h  |  5 -
>  3 files changed, 55 insertions(+), 6 deletions(-)

...

> @@ -249,10 +254,18 @@ static struct ghes *ghes_new(struct acpi_hest_generic 
> *generic)
>   ghes = kzalloc(sizeof(*ghes), GFP_KERNEL);
>   if (!ghes)
>   return ERR_PTR(-ENOMEM);
> +
>   ghes->generic = generic;
> + if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
> + rc = apei_map_generic_address(
> + >generic_v2->read_ack_register);

Yeah, that linebreak just to keep the 80-cols rule makes the code ugly
and hard to read.

Please put that mapping and unmapping in wrappers called
map_gen_v2(ghes) and unmap_gen_v2(ghes) or so, so that you can call them
wherever needed. Thus should make the flow a bit more understandable
what's going on and you won't have to repeat the unmapping lines in
ghes_fini().

> @@ -649,6 +669,23 @@ static void ghes_estatus_cache_add(
>   rcu_read_unlock();
>  }
>  
> +static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
> +{
> + int rc;
> + u64 val = 0;
> +
> + rc = apei_read(, _v2->read_ack_register);
> + if (rc)
> + return rc;
> + val &= generic_v2->read_ack_preserve <<
> + generic_v2->read_ack_register.bit_offset;
> + val |= generic_v2->read_ack_write <<
> + generic_v2->read_ack_register.bit_offset;

Yeah, let them stick out, it more readable this way. Line spacing is
helpful too:

...
rc = apei_read(, _v2->read_ack_register);
if (rc)
return rc;

val &= generic_v2->read_ack_preserve << 
generic_v2->read_ack_register.bit_offset;
val |= generic_v2->read_ack_write<< 
generic_v2->read_ack_register.bit_offset;

return apei_write(val, _v2->read_ack_register);
}

> + rc = apei_write(val, _v2->read_ack_register);
> +
> + return rc;
> +}
> +
>  static int ghes_proc(struct ghes *ghes)
>  {
>   int rc;
-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V14 03/10] efi: parse ARM processor error

2017-04-12 Thread Borislav Petkov
On Tue, Mar 28, 2017 at 01:30:33PM -0600, Tyler Baicar wrote:
> Add support for ARM Common Platform Error Record (CPER).
> UEFI 2.6 specification adds support for ARM specific
> processor error information to be reported as part of the
> CPER records. This provides more detail on for processor error logs.
> 
> Signed-off-by: Tyler Baicar 
> CC: Jonathan (Zhixiong) Zhang 
> Reviewed-by: James Morse 
> Reviewed-by: Ard Biesheuvel 
> ---
>  drivers/firmware/efi/cper.c | 133 
> 
>  include/linux/cper.h|  54 ++
>  2 files changed, 187 insertions(+)
> 
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 8fa4e23..56aa516 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
>  static const char * const proc_type_strs[] = {
>   "IA32/X64",
>   "IA64",
> + "ARM",
>  };
>  
>  static const char * const proc_isa_strs[] = {
>   "IA32",
>   "IA64",
>   "X64",
> + "ARM A32/T32",
> + "ARM A64",
>  };
>  
>  static const char * const proc_error_type_strs[] = {
> @@ -139,6 +142,18 @@ void cper_print_bits(const char *pfx, unsigned int bits,
>   "corrected",
>  };
>  
> +static const char * const arm_reg_ctx_strs[] = {
> + "AArch32 general purpose registers",
> + "AArch32 EL1 context registers",
> + "AArch32 EL2 context registers",
> + "AArch32 secure context registers",
> + "AArch64 general purpose registers",
> + "AArch64 EL1 context registers",
> + "AArch64 EL2 context registers",
> + "AArch64 EL3 context registers",
> + "Misc. system register structure",
> +};

That...

> +
>  static void cper_print_proc_generic(const char *pfx,
>   const struct cper_sec_proc_generic *proc)
>  {
> @@ -184,6 +199,114 @@ static void cper_print_proc_generic(const char *pfx,
>   printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
>  }
>  
> +static void cper_print_proc_arm(const char *pfx,
> + const struct cper_sec_proc_arm *proc)

... and that function should go into:

#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)

Just put them close together so that you don't have too much ifdeffery.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm