Re: [PATCH v5 0/4] arch/x86: Remove unnecessary dependencies on bootparam.h

2024-01-15 Thread Borislav Petkov
On Mon, Jan 15, 2024 at 11:55:36AM +0100, Ard Biesheuvel wrote:
> But please be aware that we are in the middle of the merge window

Yes, and the merge window has been suspended too:

https://lore.kernel.org/r/CAHk-=wjmwpmxtkein__vnno4tcttzr-8dvvd_obq%2bhjessw...@mail.gmail.com

> right now, and I suspect that the -tip maintainers may have some
> feedback of their own. So give it at least a week or so, and ping this
> thread again to ask how to proceed.

From: Documentation/process/maintainer-tip.rst

"Merge window


Please do not expect large patch series to be handled during the merge
window or even during the week before.  Such patches should be submitted in
mergeable state *at* *least* a week before the merge window opens.
Exceptions are made for bug fixes and *sometimes* for small standalone
drivers for new hardware or minimally invasive patches for hardware
enablement.

During the merge window, the maintainers instead focus on following the
upstream changes, fixing merge window fallout, collecting bug fixes, and
allowing themselves a breath. Please respect that.

The release candidate -rc1 is the starting point for new patches to be
applied which are targeted for the next merge window."

So pls be patient.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [RFC PATCH v2 0/9] Use ERST for persistent storage of MCE and APEI errors

2023-10-26 Thread Borislav Petkov
On Sat, Oct 07, 2023 at 03:15:45PM +0800, Shuai Xue wrote:
> So, IMHO, it's better to add a way to retrieve MCE records through switching
> to the new generation rasdaemon solution.

rasdaemon already collects errors and even saves them in a database of
sorts. No kernel changes needed.

> Sorry for the poor cover letter. I hope the following response can clarify
> the matter.
> 
> Q1: What is the exact problem?
> 
> Traditionally, fatal hardware errors will cause Linux print error log to
> console, e.g. print_mce() or __ghes_print_estatus(), then reboot. With
> Linux, the primary method for obtaining debugging information of a serious
> error or fault is via the kdump mechanism.

Not necessarily - see above.

> In the public cloud scenario, multiple virtual machines run on a
> single physical server, and if that server experiences a failure, it can
> potentially impact multiple tenants. It is crucial for us to thoroughly
> analyze the root causes of each instance failure in order to:
> 
> - Provide customers with a detailed explanation of the outage to reassure 
> them.
> - Collect the characteristics of the failures, such as ECC syndrome, to 
> enable fault prediction.
> - Explore potential solutions to prevent widespread outages.

Huh, are you talking about providing customers with error information
from the *underlying* physical machine which runs the cloud VMs? That
sounds suspicious, to say the least.

AFAICT, all you can tell the VM owner is: yah, the hw had an
uncorrectable error in its memory and crashed. Is that the use case?

To be able to tell the VM owners why it crashed?

> In short, it is necessary to serialize hardware error information available
> for post-mortem debugging.
> 
> Q2: What exactly I wanna do:
> 
> The MCE handler, do_machine_check(), saves the MCE record to persistent
> storage and it is retrieved by mcelog. Mcelog has been deprecated when
> kernel 4.12 released in 2017, and the help of the configuration option
> CONFIG_X86_MCELOG_LEGACY suggest to consider switching to the new
> generation rasdaemon solution. The GHES handler does not support APEI error
> record now.

I think you're confusing things: MCEs do get reported to userspace
through the trace_mc_record tracepoint and rasdaemon opens it and reads
error info from there. And then writes it out to its db. So that works
now.

GHES is something different: it is a fw glue around error reporting so
that you don't have to develop a reporting driver for every platform but
you can use a single one - only the fw glue needs to be added.

The problem with GHES is that it is notoriously buggy and currently
it loads on a single platform only on x86.

ARM are doing something in that area - you're better off talking to
James Morse about it. And he's on Cc.

> To serialize hardware error information available for post-mortem
> debugging:
> - add support to save APEI error record into flash via ERST before go panic,
> - add support to retrieve MCE or APEI error record from the flash and emit
> the related tracepoint after system boot successful again so that rasdaemon
> can collect them

Now that is yet another thing: you want to save error records into
firmware. First of all, you don't really need it if you do kdump as
explained above.

Then, that thing has its own troubles: it is buggy like every firmware
is and it can brick the machine.

I'm not saying it is not useful - there are some use cases for it which
are being worked on but if all you wanna do is dump MCEs to rasdaemon,
that works even now.

But then you have an ARM patch there and I'm confused because MCEs are
x86 thing - ARM has different stuff.

So I think you need to elaborate more here.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [RFC PATCH v2 0/9] Use ERST for persistent storage of MCE and APEI errors

2023-09-28 Thread Borislav Petkov
On Mon, Sep 25, 2023 at 03:44:17PM +0800, Shuai Xue wrote:
> After /dev/mcelog character device deprecated by commit 5de97c9f6d85
> ("x86/mce: Factor out and deprecate the /dev/mcelog driver"), the
> serialized MCE error record, of previous boot in persistent storage is not
> collected via APEI ERST.

You lost me here. /dev/mcelog is deprecated but you can still use it and
apei_write_mce() still happens.

Looking at your patches, you're adding this to ghes so how about you sit
down first and explain your exact use case and what exactly you wanna
do?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v4] x86, efi: never relocate kernel below lowest acceptable address

2019-10-22 Thread Borislav Petkov
On Tue, Oct 22, 2019 at 08:13:56AM +0200, Ard Biesheuvel wrote:
> On Thu, 17 Oct 2019 at 11:30, Kairui Song  wrote:
> >
> > Currently, kernel fails to boot on some HyperV VMs when using EFI.
> > And it's a potential issue on all platforms.
> >
> > It's caused by broken kernel relocation on EFI systems, when below three
> > conditions are met:
> >
> > 1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
> >by the loader.
> > 2. There isn't enough room to contain the kernel, starting from the
> >default load address (eg. something else occupied part the region).
> > 3. In the memmap provided by EFI firmware, there is a memory region
> >starts below LOAD_PHYSICAL_ADDR, and suitable for containing the
> >kernel.
> >
> > EFI stub will perform a kernel relocation when condition 1 is met. But
> > due to condition 2, EFI stub can't relocate kernel to the preferred
> > address, so it fallback to ask EFI firmware to alloc lowest usable memory
> > region, got the low region mentioned in condition 3, and relocated
> > kernel there.
> >
> > It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This
> > is the lowest acceptable kernel relocation address.
> >
> > The first thing goes wrong is in arch/x86/boot/compressed/head_64.S.
> > Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output
> > address if kernel is located below it. Then the relocation before
> > decompression, which move kernel to the end of the decompression buffer,
> > will overwrite other memory region, as there is no enough memory there.
> >
> > To fix it, just don't let EFI stub relocate the kernel to any address
> > lower than lowest acceptable address.
> >
> > Signed-off-by: Kairui Song 
> > Acked-by: Jarkko Sakkinen 
> >
> 
> Ingo, Boris, could you please comment on this?

Yah, the commit message makes more sense now.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3] x86, efi: never relocate kernel below lowest acceptable address

2019-10-16 Thread Borislav Petkov
On Wed, Oct 16, 2019 at 08:23:56AM -0700, Joe Perches wrote:
> ?  examples please.

>From this very thread:

\sEfi\s, \sefi\s, \seFI\s etc should be "EFI"

I'm thinking perhaps start conservatively and catch the most often
misspelled ones in commit messages or comments. "CPU", "SMT", "MCE",
"MCA", "PCI" etc come to mind.

> checkpatch has a db for misspellings, I supposed another for
> acronyms could be added,

Doesn't have to be another one - established acronyms are part of the
dictionary too.

> but how would false positives be avoided?

Perhaps delimited with spaces or non-word chars (\W) and when they're
part of a comment or the commit message...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3] x86, efi: never relocate kernel below lowest acceptable address

2019-10-14 Thread Borislav Petkov
On Mon, Oct 14, 2019 at 11:21:11PM +0300, Jarkko Sakkinen wrote:
> Was there a section in the patch submission documentation to point out
> when people send patches with all the possible twists for an acronym?

I don't think so.

> This is giving me constantly gray hairs with TPM patches.

Well, I'm slowly getting tired of repeating the same crap over and over
again about how important it is to document one's changes and to write
good commit messages. The most repeated answers I'm simply putting into
canned reply templates because, well, saying it once or twice is not
enough anymore. :-\

And yeah, I see your pain. Same here, actually.

In the acronym case, I'd probably add a regex to my patch massaging
script and convert those typos automatically and be done with it.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3] x86, efi: never relocate kernel below lowest acceptable address

2019-10-14 Thread Borislav Petkov
On Sat, Oct 12, 2019 at 11:44:21AM +0800, Kairui Song wrote:
> Currently, kernel fails to boot on some HyperV VMs when using EFI.
> And it's a potential issue on all platforms.
> 
> It's caused a broken kernel relocation on EFI systems, when below three
> conditions are met:
> 
> 1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
>by the loader.
> 2. There isn't enough room to contain the kernel, starting from the
>default load address (eg. something else occupied part the region).
> 3. In the memmap provided by EFI firmware, there is a memory region
>starts below LOAD_PHYSICAL_ADDR, and suitable for containing the
>kernel.
> 
> Efi stub will perform a kernel relocation when condition 1 is met. But
> due to condition 2, efi stub can't relocate kernel to the preferred
> address, so it fallback to query and alloc from EFI firmware for lowest

Your spelling of "EFI" is like a random number generator in this
paragraph: "Efi", "efi" and "EFI". Can you please be more careful when
writing your commit messages? They're not some random text you hurriedly
jot down before sending the patch but a most important description of
why a change is being done.

And if you don't see their importance now, just try doing some git
archeology, trying to understand why a change has been done in the past
and then encounter a commit message two-liner which doesn't say sh*t.
Then you'll start appreciating properly written commit messages.

> usable memory region.
> 
> It's incorrect to use the lowest memory address. In later stage, kernel
> will assume LOAD_PHYSICAL_ADDR as the minimal acceptable relocate address,
> but efi stub will end up relocating kernel below it.

Why don't you simply explain what
choose_random_location()->find_random_virt_addr() does? That's the
problem you're solving, right? KASLR using LOAD_PHYSICAL_ADDR as the
minimum...

> The later kernel decompressing code will forcefully correct the wrong
> kernel load location,

... or do you mean by that the dance in
arch/x86/boot/compressed/head_64.S where we move the kernel temporarily
to LOAD_PHYSICAL_ADDR for the decompression?

You can simply say that here...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2] x86, efi: never relocate kernel below lowest acceptable address

2019-10-11 Thread Borislav Petkov
On Fri, Sep 20, 2019 at 12:05:21AM +0800, Kairui Song wrote:
> Currently, kernel fails to boot on some HyperV VMs when using EFI.
> And it's a potential issue on all platforms.
> 
> It's caused a broken kernel relocation on EFI systems, when below three
> conditions are met:
> 
> 1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
>by the loader.
> 2. There isn't enough room to contain the kernel, starting from the
>default load address (eg. something else occupied part the region).
> 3. In the memmap provided by EFI firmware, there is a memory region
>starts below LOAD_PHYSICAL_ADDR, and suitable for containing the
>kernel.
> 
> Efi stub will perform a kernel relocation when condition 1 is met. But
> due to condition 2, efi stub can't relocate kernel to the preferred
> address, so it fallback to query and alloc from EFI firmware for lowest
> usable memory region.
> 
> It's incorrect to use the lowest memory address. In later stage, kernel
> will assume LOAD_PHYSICAL_ADDR as the minimal acceptable relocate address,
> but efi stub will end up relocating kernel below it.

So far, so good.

> Then before the kernel decompressing. Kernel will do another relocation
> to address not lower than LOAD_PHYSICAL_ADDR, this time the relocate will
> over write the blockage at the default load address, which efi stub tried
> to avoid, and lead to unexpected behavior. Beside, the memory region it
> writes to is not allocated from EFI firmware, which is also wrong.

This paragraph is an unreadable mess and should be rewritten in simple,
declarative sentences.

The patch itself looks ok.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 0/2] Fix crash in cper_estatus_check()

2019-02-19 Thread Borislav Petkov
On Tue, Feb 19, 2019 at 11:00:49AM +0100, Rafael J. Wysocki wrote:
> Boris, any comments here?

For both:

Acked-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2019-02-08 Thread Borislav Petkov
On Tue, Feb 05, 2019 at 10:05:16AM -0500, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> Date: Tue, 5 Feb 2019 10:00:59 -0500
> Subject: [PATCH] x86/mm: Introduce adjustment the padding size for KASLR

"Adjust the padding size for KASLR"

> If the physical memory layout has huge space for hotplug, the padding
> used for the physical memory mapping section is not enough.
> So, such system may crash while memory hot-adding on KASLR enabled system.

Crash why?

Why is the padding not enough?

> For example, SRAT has the following layout, the maximum possible memory
> size is 32TB, and the memory is installed as 2TB actually, then the padding
> size should set 30TB (== possible memory size - actual memory size).
> 
>   SRAT: Node 3 PXM 7 [mem 0x1c00-0x1fff] hotplug

What is that supposed to exemplify: that range is 3T not 2 and that
range start is not at 2T but 28T. So I have absolutely no clue what
you're trying to explain here.

Please go back, take your time and structure your commit message like
this:

Problem is A.

It happens because of B.

Fix it by doing C.

(Potentially do D).

For more detailed info, see
Documentation/process/submitting-patches.rst, Section "2) Describe your
changes".

> This patch introduces adjustment the padding size if the default

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

> padding size isn't enough.
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  Documentation/x86/zero-page.txt   |  1 +
>  arch/x86/boot/compressed/acpi.c   | 19 +++
>  arch/x86/include/uapi/asm/bootparam.h |  2 +-
>  arch/x86/mm/kaslr.c   | 26 +-
>  4 files changed, 42 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
> index 68aed077f..343fe1a90 100644
> --- a/Documentation/x86/zero-page.txt
> +++ b/Documentation/x86/zero-page.txt
> @@ -15,6 +15,7 @@ Offset  Proto   NameMeaning
>  058/008  ALL tboot_addr  Physical address of tboot shared page
>  060/010  ALL ist_infoIntel SpeedStep (IST) BIOS support 
> information
>   (struct ist_info)
> +078/010  ALL possible_mem_addr The possible maximum physical memory 
> address.

Why isn't this called max_phys_addr then?

Also, please explain what it means at the end of this file.

>  080/010  ALL hd0_infohd0 disk parameter, OBSOLETE!!
>  090/010  ALL hd1_infohd1 disk parameter, OBSOLETE!!
>  0A0/010  ALL sys_desc_table  System description table (struct 
> sys_desc_table),
> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> index c5a949335..7dd61b943 100644
> --- a/arch/x86/boot/compressed/acpi.c
> +++ b/arch/x86/boot/compressed/acpi.c
> @@ -288,6 +288,7 @@ int count_immovable_mem_regions(void)
>   struct acpi_subtable_header *sub_table;
>   struct acpi_table_header *table_header;
>   char arg[MAX_ACPI_ARG_LENGTH];
> + unsigned long long possible_addr, max_possible_addr = 0;
>   int num = 0;
>  
>   if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 &&
> @@ -308,10 +309,19 @@ int count_immovable_mem_regions(void)
>   struct acpi_srat_mem_affinity *ma;
>  
>   ma = (struct acpi_srat_mem_affinity *)sub_table;
> - if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && 
> ma->length) {
> - immovable_mem[num].start = ma->base_address;
> - immovable_mem[num].size = ma->length;
> - num++;
> + if (ma->length) {
> + if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> + possible_addr =
> + ma->base_address + ma->length;
> + if (possible_addr > max_possible_addr)
> + max_possible_addr =
> + possible_addr;
> + } else {
> + immovable_mem[num].start =
> + ma->base_address;
> + immovable_mem[num].size = ma->length;
> + num++;
> + }

This piece has some ugly linebreaks which makes it impossible to read.
Perhaps because the variable names you're adding are too long.

You can carve it out in a separate function, for example.

>   if (num >= MAX_NUMNODES*2) {
> @@ -320,6 +330,7 @@ int count_immovable_mem_regions(void)
>   }
>   }
>   table += sub_table->length;

Re: [PATCH 2/2] efi/cper: Avoid possible OOB when checking generic data block

2019-01-23 Thread Borislav Petkov
On Tue, Jan 22, 2019 at 04:09:12PM +, Ross Lagerwall wrote:
> When checking a generic status block, we iterate over all the generic
> data blocks. The loop condition only checks that the start of the
> generic data block is valid (within estatus->data_length) but not the
> whole block. Because the size of data blocks (excluding error data) may
> vary depending on the revision and the revision is contained within the
> data block, ensure that enough of the current data block is valid before
> dereferencing any members otherwise an OOB access may occur if

Please write out the OOB abbreviation in your commit messages.

> estatus->data_length is invalid. This relies on the fact that
> struct acpi_hest_generic_data_v300 is a superset of the earlier version.
> Also rework the other checks to avoid potential underflow.
> 
> Signed-off-by: Ross Lagerwall 
> ---
>  drivers/firmware/efi/cper.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index a7902fccdcfa..7cc18874b9d0 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -546,7 +546,7 @@ EXPORT_SYMBOL_GPL(cper_estatus_check_header);
>  int cper_estatus_check(const struct acpi_hest_generic_status *estatus)
>  {
>   struct acpi_hest_generic_data *gdata;
> - unsigned int data_len, gedata_len;
> + unsigned int data_len, record_len;
>   int rc;
>  
>   rc = cper_estatus_check_header(estatus);
> @@ -555,10 +555,12 @@ int cper_estatus_check(const struct 
> acpi_hest_generic_status *estatus)
>   data_len = estatus->data_length;
>  
>   apei_estatus_for_each_section(estatus, gdata) {
> - gedata_len = acpi_hest_get_error_length(gdata);
> - if (gedata_len > data_len - acpi_hest_get_size(gdata))
> + if (sizeof(struct acpi_hest_generic_data) > data_len)
>   return -EINVAL;

< newline here.

Also, add a new line before the data_len assignment above, in the function.

> - data_len -= acpi_hest_get_record_size(gdata);
> + record_len = acpi_hest_get_record_size(gdata);

record_size so that it matches the function name it is used to compute
this.

Btw, trying to grok this code is making my head spin.

> + if (record_len > data_len)
> + return -EINVAL;

< newline here.

Btw, those checks in the loop you can abstract away into a separate
function so that you end up with something more readable like:

apei_estatus_for_each_section(estatus, gdata) {
record_size = check_hest_record_size(gdata, data_len);
if (!record_size)
return -EINVAL;

data_len -= record_size;
}

for example.

> + data_len -= record_len;
>   }
>   if (data_len)
>   return -EINVAL;
> -- 

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH efi-urgent] Revert "efi: Align 'efi_guid_t' to 64 bits"

2018-12-21 Thread Borislav Petkov
On Fri, Dec 21, 2018 at 12:54:46PM +0100, Ard Biesheuvel wrote:
> This reverts commit 793423cf07e51e3185b8680167115813589c057d.
> 
> The 64-bit alignment affects the size of efi_config_table_32_t,
> which is used as an array type. On the other hand, the former
> byte alignment could trigger alignment faults during firmware
> calls on 32-bit ARM, since UEFI defines efi_guid_t as a struct
> of UINT32 + UINT16 + UINT16 + UINT8[8], and so it may use load/
> store multiple instructions [requiring 32-bit alignment], e.g.,
> on GUID pointers passed as function arguments since UEFI's view
> of the type has implicit 32-bit alignment.
> 
> Let's sort this out for the next release, and revert the change
> for now.
> 
> Fixes: 793423cf07e5 ("efi: Align 'efi_guid_t' to 64 bits")
> Reported-by: Heinrich Schuchardt 
> Cc: Ingo Molnar 
> Cc: Andy Lutomirski 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: H. Peter Anvin 
> Cc: Linus Torvalds 
> Cc: Peter Zijlstra 
> Cc: Qian Cai 
> Cc: Rik van Riel 
> Cc: Thomas Gleixner 
> Signed-off-by: Ard Biesheuvel 
> ---
>  include/linux/efi.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index e6480c805932..100ce4a4aff6 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -48,7 +48,7 @@ typedef u16 efi_char16_t;   /* UNICODE character */
>  typedef u64 efi_physical_addr_t;
>  typedef void *efi_handle_t;
>  
> -typedef guid_t efi_guid_t __aligned(8);
> +typedef guid_t efi_guid_t;
>  
>  #define EFI_GUID(a,b,c,d0,d1,d2,d3,d4,d5,d6,d7) \
>   GUID_INIT(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)
> -- 

It was the top commit on tip:efi/urgent so I've zapped it.

HTH.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-12-06 Thread Borislav Petkov
On Fri, Nov 30, 2018 at 03:04:44PM +0800, lijiang wrote:
> I have noticed the changes on x86, but for IA64, i'm not sure whether it 
> should do the same
> thing, so keep it as before.
> 
> If IA64 people would like to give any comment, that will be appreciated.

I think you should not touch ia64 and not make Tony unnecessarily power
up the old rust.

:-)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [tip:x86/fpu] x86/fpu: Don't export __kernel_fpu_{begin,end}()

2018-12-04 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 11:08:41PM +0100, Borislav Petkov wrote:
> On Mon, Dec 03, 2018 at 10:12:19PM +0100, Ard Biesheuvel wrote:
> > > + * Using the FPU in hardirq is not allowed.
> > 
> > According to the documentation in x86/kernel/fpu/core.c, this is not
> > true. So which one is accurate?
> 
> I think you mean the irq from user mode... Yap, we do allow that.
> 
> Sebastian?

Ok, I've zapped that sentence from the comment until we clarify what
Sebastian meant.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [tip:x86/fpu] x86/fpu: Don't export __kernel_fpu_{begin,end}()

2018-12-03 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 10:12:19PM +0100, Ard Biesheuvel wrote:
> > + * Using the FPU in hardirq is not allowed.
> 
> According to the documentation in x86/kernel/fpu/core.c, this is not
> true. So which one is accurate?

I think you mean the irq from user mode... Yap, we do allow that.

Sebastian?

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table

2018-11-26 Thread Borislav Petkov
On Mon, Nov 26, 2018 at 09:44:46AM -0800, Dave Hansen wrote:
> On 11/23/18 9:12 PM, Lianbo Jiang wrote:
> > These patches add the new I/O resource descriptor 'IORES_DESC_RESERVED'
> > for the iomem resources search interfaces, and in order to make it still
> > work after the new descriptor is added, these codes originally related
> > to 'IORES_DESC_NONE' have been updated.
> 
> I'm having a really hard time figuring out what problem these patches solve.
> 
> What end-user visible behavior does this set change?

Yeah, that was a long process of digging out the "why".

I'd recommend reading the previous threads and, more specifically:

https://lkml.kernel.org/r/cabhmzuuscs3juzusm5y6eyjk6weo7mjj5-eakgvbw0qee%2b3...@mail.gmail.com

in short: we need to export e820 reserved ranges to the second kernel so
that it can talk to PCI devices properly. (And I'm wondering how it did
work until now...)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v11 5/5] x86/boot/KASLR: Walk srat tables to filter immovable memory

2018-11-16 Thread Borislav Petkov
 Subject: Re: [PATCH v11 5/5] x86/boot/KASLR: Walk srat tables to filter 
immovable memory

s/srat/SRAT/g

On Mon, Nov 12, 2018 at 05:46:45PM +0800, Chao Fan wrote:
> KASLR may randomly chooses some positions which are located in movable

choose

> memory regions. This will break memory hotplug feature and make the
> movable memory chosen by KASLR can't be removed.

by KASLR practically immovable.

:)

> The solution is limite KASLR to choose memory regions in immovable

limite?

"to limit"

> node according to SRAT tables.
> 
> If CONFIG_MEMORY_HOTREMOVE enabled, walk through the SRAT memory

   *is* enabled,

> tables and store those immovable memory regions so that KASLR can get
> where to choose for randomization.
> 
> If the amount of immovable memory regions is not zero, which
> means the immovable memory regions existing. Calculate the intersection
> between memory regions from e820/efi memory table and immovable memory
> regions.

This is explaining *what* the patch does and generally doesn't need to
be in the commit messge as people can read it in the patch itself.

> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/kaslr.c | 77 +++-
>  1 file changed, 66 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/kaslr.c 
> b/arch/x86/boot/compressed/kaslr.c
> index b251572e77af..174d2114045e 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -97,6 +97,11 @@ static bool memmap_too_large;
>  /* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */
>  static unsigned long long mem_limit = ULLONG_MAX;
>  
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +/* Store the immovable memory regions */
> +extern struct mem_vector immovable_mem[MAX_NUMNODES*2];
> +#endif

For this and the other occurrences of ifdef CONFIG_MEMORY_HOTREMOVE,
define empty stubs for those functions in a header and remove the
ifdeffery at the call sites.

> +
>  
>  enum mem_avoid_index {
>   MEM_AVOID_ZO_RANGE = 0,
> @@ -413,6 +418,11 @@ static void mem_avoid_init(unsigned long input, unsigned 
> long input_size,
>   /* Mark the memmap regions we need to avoid */
>   handle_mem_options();
>  
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> + /* Mark the immovable regions we need to choose */
> + get_immovable_mem();
> +#endif
> +
>  #ifdef CONFIG_X86_VERBOSE_BOOTUP
>   /* Make sure video RAM can be used. */
>   add_identity_map(0, PMD_SIZE);
> @@ -568,9 +578,9 @@ static unsigned long slots_fetch_random(void)
>   return 0;
>  }
>  
> -static void process_mem_region(struct mem_vector *entry,
> -unsigned long minimum,
> -unsigned long image_size)
> +static void slots_count(struct mem_vector *entry,

That's a strange rename.

__process_mem_region() makes more sense to me.

> + unsigned long minimum,
> + unsigned long image_size)
>  {
>   struct mem_vector region, overlap;
>   unsigned long start_orig, end;
> @@ -646,6 +656,57 @@ static void process_mem_region(struct mem_vector *entry,
>   }
>  }
>  
> +static bool process_mem_region(struct mem_vector *region,
> +unsigned long long minimum,
> +unsigned long long image_size)
> +{
> + int i;
> + /*
> +  * If no immovable memory found, or MEMORY_HOTREMOVE disabled,
> +  * walk all the regions, so use region directely.

"directly"

> +  */
> + if (num_immovable_mem == 0) {

if (!...

> + slots_count(region, minimum, image_size);
> +
> + if (slot_area_index == MAX_SLOT_AREA) {
> + debug_putstr("Aborted e820/efi memmap scan (slot_areas 
> full)!\n");
> + return 1;
> + }
> + return 0;
> + }
> +

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v11 4/5] x86/boot: Dig out SRAT table from RSDP and find immovable memory

2018-11-16 Thread Borislav Petkov
On Mon, Nov 12, 2018 at 05:46:44PM +0800, Chao Fan wrote:
> To avoid KASLR extracting kernel on movable memory, slove the
  ^

Please introduce a spellchecker into your patch creation workflow.

> conflict between KASLR and movable_node feature, dig the SRAT tables

s/dig/determine/ or "compute SRAT table's address" or so.

Also, replace "dig" with a more suitable verb in all your patches.

> from RSDP pointer. Walk the SRAT tables and store the immovable
> memory regions in immovable_mem[].

"... in an array called immovable_mem[]."

> There are three methods to get RSDP pointer: KEXEC condition,
> EFI confition, BIOS condition.

"condition" is not the right word here.

> If KEXEC add 'acpi_rsdp' to cmdline, use it.
> Otherwise, parse EFI table for RSDP.
> Then, search memory for RSDP.
> 
> Imitate from ACPI code, based on acpi_os_get_root_pointer().
> Process: RSDP->RSDT/XSDT->ACPI root table->SRAT.

What?!

This looks like a comment you've added as a note for yourself but not
part of the final commit message. If you wanna explain the process, then
write it out in plain english as if you're explaining it to someone who
doesn't know what you're doing.

> 
> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/Makefile |   4 +
>  arch/x86/boot/compressed/acpitb.c | 139 ++
>  arch/x86/boot/compressed/kaslr.c  |   4 -
>  arch/x86/boot/compressed/misc.h   |  15 
>  4 files changed, 158 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/Makefile 
> b/arch/x86/boot/compressed/Makefile
> index 466f66c8a7f8..b51f7629b8ef 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -84,6 +84,10 @@ ifdef CONFIG_X86_64
>   vmlinux-objs-y += $(obj)/pgtable_64.o
>  endif
>  
> +#if (defined CONFIG_MEMORY_HOTREMOVE) && (defined CONFIG_RANDOMIZE_BASE)
> +vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/acpitb.o
> +#endif

Right, as previously pointed out, this needs that CONFIG_ symbol and
then you can save yourself most (if not all) of the ifdeffery in the
rest of the patchset.

> +
>  $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
>  
>  vmlinux-objs-$(CONFIG_EFI_STUB) += $(obj)/eboot.o $(obj)/efi_stub_$(BITS).o \
> diff --git a/arch/x86/boot/compressed/acpitb.c 
> b/arch/x86/boot/compressed/acpitb.c
> index 5cfb4efa5a19..161f21a7fb3b 100644
> --- a/arch/x86/boot/compressed/acpitb.c
> +++ b/arch/x86/boot/compressed/acpitb.c
> @@ -14,6 +14,11 @@
>  #define BOOT_STRING
>  #include "../string.h"
>  
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +/* Store the immovable memory regions */
> +struct mem_vector immovable_mem[MAX_NUMNODES*2];
> +#endif
> +
>  /* Search EFI table for RSDP table. */
>  static void efi_get_rsdp_addr(acpi_physical_address *rsdp_addr)
>  {
> @@ -226,3 +231,137 @@ static void get_acpi_rsdp(acpi_physical_address 
> *rsdp_addr)
>   }
>  #endif
>  }
> +
> +/*
> + * Used to dig RSDP table from EFI table or BIOS.
> + * If RSDP table found in EFI table, use it. Or search BIOS.
> + * Based on acpi_os_get_root_pointer().
> + */
> +static acpi_physical_address get_rsdp_addr(void)
> +{
> + acpi_physical_address pa = 0;
> +
> + get_acpi_rsdp();
> +
> + if (!pa)
> + efi_get_rsdp_addr();
> +
> + if (!pa)
> + bios_get_rsdp_addr();
> +
> + return pa;
> +}
> +
> +static struct acpi_table_header *get_acpi_srat_table(void)
> +{
> + acpi_physical_address acpi_table;
> + acpi_physical_address root_table;
> + struct acpi_table_header *header;
> + struct acpi_table_rsdp *rsdp;
> + bool acpi_use_rsdt = false;
> + char *signature;
> + char arg[10];
> + u8 *entry;
> + u32 count;
> + u32 size;
> + int i, j;
> + int ret;
> + u32 len;
> +
> + rsdp = (struct acpi_table_rsdp *)get_rsdp_addr();
> + if (!rsdp)
> + return NULL;
> +
> + ret = cmdline_find_option("acpi", arg, sizeof(arg));
> + if (ret == 4 && !strncmp(arg, "rsdt", 4))
> + acpi_use_rsdt = true;
> +
> + /* Get RSDT or XSDT from RSDP. */
> + if (!acpi_use_rsdt &&
> + rsdp->xsdt_physical_address && rsdp->revision > 1) {
> + root_table = rsdp->xsdt_physical_address;
> + size = ACPI_XSDT_ENTRY_SIZE;
> + } else {
> + root_table = rsdp->rsdt_physical_address;
> + size = ACPI_RSDT_ENTRY_SIZE;
> + }

Reorganize that code here to get rid of acpi_use_rsdt.

> +
> + /* Get ACPI root table from RSDT or XSDT.*/
> + header = (struct acpi_table_header *)root_table;
> + len = header->length;

No checking of that header pointer before dereffing it?

If it is NUL, that gives you a very nasty bug to try to debug in the
early code.

> + count = (u32)((len - sizeof(struct acpi_table_header)) / size);

Uuh, no checking for count wrapping around here due to wrong len? That
would give you a *lot* 

Re: [PATCH v11 3/5] x86/boot: Add get_acpi_rsdp() to parse RSDP in cmdlien from kexec

2018-11-14 Thread Borislav Petkov
On Wed, Nov 14, 2018 at 09:54:50AM +0800, Chao Fan wrote:
> CONFIG_KEXEC is only needed in this function.
> 
> When searching RSDP, there are three methods in order:
> 1. When booting from KEXEC, 'acpi_rsdp' is added to cmdline by KEXEC,
>so it can be parsed and used. CONFIG_KEXEC is needed here.

But theoretically acpi_rsdp can be supplied by the first kernel too,
right?

So you don't need that CONFIG_KEXEC here at all?

> >Ok, let's recap: so far, for your use case you need:
> >
> >CONFIG_MEMORY_HOTREMOVE
> >CONFIG_RANDOMIZE_BASE
> >and now
> >CONFIG_KEXEC
> >
> >So, you can clean up all that ifdeffery by defining a new config item
> >CONFIG_EARLY_PARSE_RSDP or so which depends on all those three items and
> >then you can do
> >
> >vmlinux-objs-$(CONFIG_EARLY_PARSE_RSDP) += $(obj)/acpitb.o
> >
> >and get rid of the most of the ifdeffery.

Regardless of CONFIG_KEXEC - you still need to define a CONFIG_ symbol
for your use case. We won't be parsing RSDP early on !MEMORY_HOTREMOVE
machines, which is the majority.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v11 3/5] x86/boot: Add get_acpi_rsdp() to parse RSDP in cmdlien from kexec

2018-11-14 Thread Borislav Petkov
On Wed, Nov 14, 2018 at 02:12:16PM +0800, Chao Fan wrote:
> But isdigit() would be redefine, so:
> 
> diff --git a/include/linux/ctype.h b/include/linux/ctype.h
> index 363b004426db..aba01c385232 100644
> --- a/include/linux/ctype.h
> +++ b/include/linux/ctype.h
> @@ -23,10 +23,12 @@ extern const unsigned char _ctype[];
>  #define isalnum(c) ((__ismask(c)&(_U|_L|_D)) != 0)
>  #define isalpha(c) ((__ismask(c)&(_U|_L)) != 0)
>  #define iscntrl(c) ((__ismask(c)&(_C)) != 0)
> +#ifndef BOOT_STRING
>  static inline int isdigit(int c)
>  {
> return '0' <= c && c <= '9';
>  }
> +#endif
>  #define isgraph(c) ((__ismask(c)&(_P|_U|_L|_D)) != 0)
>  #define islower(c) ((__ismask(c)&(_L)) != 0)
>  #define isprint(c) ((__ismask(c)&(_P|_U|_L|_D|_SP)) != 0)
> 
> Now I can make it.
> I wonder whether this is OK to cover isdigit() with 'BOOT_STRING' tag.

See the beginning of arch/x86/boot/compressed/kaslr.c for a possible way
to disable boot/ctype.h

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v11 3/5] x86/boot: Add get_acpi_rsdp() to parse RSDP in cmdlien from kexec

2018-11-13 Thread Borislav Petkov
On Tue, Nov 13, 2018 at 03:06:16PM -0500, Masayoshi Mizuma wrote:
> I just felt the BOOT_STRING thing in lib/kstrtox.c confuses...
> I'm OK for now if it's applied your below comment.

Well, actually, upon a second look, I don't think that including a .c
file into a header is ok:

diff --git a/arch/x86/boot/string.h b/arch/x86/boot/string.h
index 3d78e27077f4..0ff3edb888e4 100644
--- a/arch/x86/boot/string.h
+++ b/arch/x86/boot/string.h
@@ -30,3 +30,7 @@ extern unsigned long long simple_strtoull(const char *cp, 
char **endp,
  unsigned int base);

 #endif /* BOOT_STRING_H */
+
+#ifdef BOOT_STRING
+#include "../../../lib/kstrtox.c"
+#endif

Chao, why isn't this part of arch/x86/boot/compressed/misc.c ?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v11 3/5] x86/boot: Add get_acpi_rsdp() to parse RSDP in cmdlien from kexec

2018-11-13 Thread Borislav Petkov
On Tue, Nov 13, 2018 at 11:11:11AM -0500, Masayoshi Mizuma wrote:
> I think it's not very good idea to use kstrtoull() in
> arch/x86/boot/compressed/* because some tricks are needed to
> use the function, looks like Chao is trying...

Ok, I had a look at the patch. And frankly, I don't see anything wrong
with the aspect of using kstrtoull() in the compressed stage too and
getting rid of simple_strtoull().

So what are your reservations?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v11 3/5] x86/boot: Add get_acpi_rsdp() to parse RSDP in cmdlien from kexec

2018-11-13 Thread Borislav Petkov
On Mon, Nov 12, 2018 at 05:46:43PM +0800, Chao Fan wrote:
> Imitate setup_acpi_rsdp() for the early_param of 'acpi_rsdp'.
> KEXEC writes the RSDP pointer to cmdline for EFI booting.
> So if 'acpi_rsdp' found in cmdline, use it directely.
> 
> Since function kstrtoull() is needed, include it in
> arch/x86/boot/string.h. To solve the definition conflict
> problem, set BOOT_STRING tag to expose only kstrtoull() and
> functions used by it. Other functions in lib/kstrtox.c will
> be covered.
> 
> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/acpitb.c | 26 ++
>  arch/x86/boot/string.h|  4 
>  lib/kstrtox.c |  4 
>  3 files changed, 34 insertions(+)
> 
> diff --git a/arch/x86/boot/compressed/acpitb.c 
> b/arch/x86/boot/compressed/acpitb.c
> index 50fa65cf824d..5cfb4efa5a19 100644
> --- a/arch/x86/boot/compressed/acpitb.c
> +++ b/arch/x86/boot/compressed/acpitb.c
> @@ -8,6 +8,12 @@
>  #include 
>  #include 
>  
> +#define STATIC
> +#include 
> +
> +#define BOOT_STRING
> +#include "../string.h"
> +
>  /* Search EFI table for RSDP table. */
>  static void efi_get_rsdp_addr(acpi_physical_address *rsdp_addr)
>  {
> @@ -200,3 +206,23 @@ static void bios_get_rsdp_addr(acpi_physical_address 
> *rsdp_addr)
>   *rsdp_addr = (acpi_physical_address)address;
>   }
>  }
> +
> +static void get_acpi_rsdp(acpi_physical_address *rsdp_addr)
> +{
> +#ifdef CONFIG_KEXEC

Ok, why is that CONFIG_KEXEC dependency needed now too?

Ok, let's recap: so far, for your use case you need:

CONFIG_MEMORY_HOTREMOVE
CONFIG_RANDOMIZE_BASE
and now
CONFIG_KEXEC

So, you can clean up all that ifdeffery by defining a new config item
CONFIG_EARLY_PARSE_RSDP or so which depends on all those three items and
then you can do

vmlinux-objs-$(CONFIG_EARLY_PARSE_RSDP) += $(obj)/acpitb.o

and get rid of the most of the ifdeffery.

Yes?

> + unsigned long long res;
> + int len = 0;
> + char *val;
> +
> + val = malloc(19);
> + len = cmdline_find_option("acpi_rsdp", val, 19);
> +

^ Superfluous newline.

> + if (len == -1)
> + return;

That check is not needed since you do > 0 below.

> +
> + if (len > 0) {
> + val[len] = 0;
> + *rsdp_addr = (acpi_physical_address)kstrtoull(val, 16, );
> + }
> +#endif
> +}
> diff --git a/arch/x86/boot/string.h b/arch/x86/boot/string.h
> index 3d78e27077f4..0ff3edb888e4 100644
> --- a/arch/x86/boot/string.h
> +++ b/arch/x86/boot/string.h
> @@ -30,3 +30,7 @@ extern unsigned long long simple_strtoull(const char *cp, 
> char **endp,
> unsigned int base);
>  
>  #endif /* BOOT_STRING_H */
> +
> +#ifdef BOOT_STRING
> +#include "../../../lib/kstrtox.c"
> +#endif
> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
> index 1006bf70bf74..3804db9eed56 100644
> --- a/lib/kstrtox.c
> +++ b/lib/kstrtox.c
> @@ -126,6 +126,8 @@ int kstrtoull(const char *s, unsigned int base, unsigned 
> long long *res)
>  }
>  EXPORT_SYMBOL(kstrtoull);

This needs a comment to explain what is that guard used for.

> +#ifndef BOOT_STRING
> +
>  /**
>   * kstrtoll - convert a string to a long long
>   * @s: The start of the string. The string must be null-terminated, and may 
> also
> @@ -408,3 +410,5 @@ kstrto_from_user(kstrtou16_from_user, kstrtou16,  
> u16);
>  kstrto_from_user(kstrtos16_from_user,kstrtos16,  s16);
>  kstrto_from_user(kstrtou8_from_user, kstrtou8,   u8);
>  kstrto_from_user(kstrtos8_from_user, kstrtos8,   s8);
> +
> +#endif

#endif /* BOOT_STRING */

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v11 3/5] x86/boot: Add get_acpi_rsdp() to parse RSDP in cmdlien from kexec

2018-11-13 Thread Borislav Petkov
On Tue, Nov 13, 2018 at 11:11:11AM -0500, Masayoshi Mizuma wrote:
> I know checkpatch.pl says an warning about simple_strtoull(),
> however, I believe the warning is for simple_strtoull() defined
> in lib/vsprintf.c.

simple_strtoull is deprecated for various reasons. I'll take a look at
Chao's patch soon.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v11 2/5] x86/boot: Add bios_get_rsdp_addr() to search RSDP in memory

2018-11-13 Thread Borislav Petkov
On Tue, Nov 13, 2018 at 10:10:16AM +0800, Chao Fan wrote:
> The 'rover' was named as 'mem_rover', but the length of this line is too
> long. So shorten it as 'rever' so that they can keep in one line.

I still have no clue what "rover" or "rever" means...

> This name came from ACPI driver code, acpi_find_root_pointer().
> Used for the loop. If you have a better name, please tell me.

I would if I knew what it meant. If you don't know either, then name it
to something descriptive so that it is clear what it is when reading the
code.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v11 2/5] x86/boot: Add bios_get_rsdp_addr() to search RSDP in memory

2018-11-12 Thread Borislav Petkov
On Mon, Nov 12, 2018 at 05:46:42PM +0800, Chao Fan wrote:
> Imitate ACPI code to search RSDP pointer from memory.
> Walk memory and check the signature until get the RSDP signature.
> Based on acpi_tb_scan_memory_for_rsdp() and acpi_find_root_pointer().
> If didn't get RSDP from EFI table, will run this function.

That's some very strange english. Please improve.

> Used for later patch to dig out SRAT table and get the memory
> information. And figure out the immovable memory regions
> to avoid KASLR extracts kernel on movable memory, slove the
^^

Please introduce a spellchecker into your patch creation workflow.

> conflict between KASLR and movable_node feature.

Btw, this paragraph could be used for a CONFIG_ item you could define
for your particular use case. Because right now you have funnies like:

+#if (defined CONFIG_MEMORY_HOTREMOVE) && (defined CONFIG_RANDOMIZE_BASE)
+vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/acpitb.o
+#endif

where CONFIG_RANDOMIZE_BASE is repeated for no good reason.

But we'll see - need to get to the end of your patch series first.

> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/acpitb.c | 106 ++
>  1 file changed, 106 insertions(+)
> 
> diff --git a/arch/x86/boot/compressed/acpitb.c 
> b/arch/x86/boot/compressed/acpitb.c
> index 56b54b0e0889..50fa65cf824d 100644
> --- a/arch/x86/boot/compressed/acpitb.c
> +++ b/arch/x86/boot/compressed/acpitb.c
> @@ -94,3 +94,109 @@ static void efi_get_rsdp_addr(acpi_physical_address 
> *rsdp_addr)
>   }
>  #endif
>  }
> +
> +static u8 compute_checksum(u8 *buffer, u32 length)
> +{
> + u8 sum = 0;
> + u8 *end = buffer + length;
> +
> + while (buffer < end)
> + sum = (u8)(sum + *(buffer++));

What's that cast for?

Ah, this is the version in acpi_tb_checksum(). Well, I'd write this
simply as:

sum += *(buffer++);

> +
> + return sum;
> +}
> +
> +/*
> + * Used to search a block of memory for the RSDP signature.
> + * Return Pointer to the RSDP if found, otherwise NULL.

 "Returns pointer... "

> + * Based on acpi_tb_scan_memory_for_rsdp().
> + */
> +static u8 *scan_mem_for_rsdp(u8 *start, u32 length)
> +{
> + struct acpi_table_rsdp *rsdp;
> + u8 *end;
> + u8 *rover;

rover?

> +
> + end = start + length;
> +
> + /* Search from given start address for the requested length */
> + for (rover = start; rover < end; rover += ACPI_RSDP_SCAN_STEP) {
> + /*
> +  * The RSDP signature and checksum must both be correct
> +  * Note: Sometimes there exists more than one RSDP in memory;
> +  * the valid RSDP has a valid checksum, all others have an
> +  * invalid checksum.
> +  */
> + rsdp = (struct acpi_table_rsdp *)rover;
> +
> + /* Nope, BAD Signature */
> + if (!ACPI_VALIDATE_RSDP_SIG(rsdp->signature))
> + continue;
> +
> + /* Check the standard checksum */
> + if (compute_checksum((u8 *) rsdp, ACPI_RSDP_CHECKSUM_LENGTH))
> + continue;
> +
> + /* Check extended checksum if table version >= 2 */
> + if ((rsdp->revision >= 2) &&
> + (compute_checksum((u8 *) rsdp, ACPI_RSDP_XCHECKSUM_LENGTH)))
> + continue;
> +
> + /* Sig and checksum valid, we have found a real RSDP */
> + return rover;
> + }
> + return NULL;
> +}
> +
> +/*
> + * Used to search RSDP physical address.
> + * Based on acpi_find_root_pointer(). Since only use physical address
> + * in this period, so there is no need to do the memory map jobs.

You mean: "All addresses used here are physical."?

"memory map jobs"?

Please be more careful when writing comments which are going to be read
by other people. "jobs" means a lot of things and you don't want "jobs"
in that context here.

> + */
> +static void bios_get_rsdp_addr(acpi_physical_address *rsdp_addr)

Same remark as before: the function is void and you're returning through
its parameter. Make it return acpi_physical_address instead.

> +{
> + struct acpi_table_rsdp *rsdp;
> + u8 *table_ptr;
> + u8 *mem_rover;

rover?

> + u32 address;
> +
> + /*
> +  * Get the location of the Extended BIOS Data Area (EBDA)
> +  * Since we use physical address directely, so

It is "directly" - what about that spellchecker?

> +  * acpi_os_map_memory() and acpi_os_unmap_memory() are
> +  * not needed here.

Why do you even need to say that here?

> +  */
> + table_ptr = (u8 *)ACPI_EBDA_PTR_LOCATION;
> + *(u32 *)(void *) = *(u16 *)(void *)table_ptr;
> + address <<= 4;
> + table_ptr = (u8 *)address;

arch/x86/boot/compressed/acpitb.c: In function ‘bios_get_rsdp_addr’:
arch/x86/boot/compressed/acpitb.c:172:14: warning: cast to pointer from integer 
of different size 

Re: [PATCH v11 1/5] x86/boot: Add efi_get_rsdp_addr() to dig out RSDP from EFI table

2018-11-12 Thread Borislav Petkov
On Mon, Nov 12, 2018 at 05:46:41PM +0800, Chao Fan wrote:
> In order to parse SRAT table and get memory information, RSDP pointer
> should be found. In kernel, there are three methods to get RSDP:
> EFI condition, BIOS condition and KEXEC condition. The first works
> for EFI condition.

"condition"?

Also, please explain shortly what all those abbreviations mean: think
of a person reading your commit message who doesn't have any clue from
ACPI.

> Imitate ACPI code and EFI code to dig RSDP pointer from EFI tables.
> Process: boot_param->systab->efi_config_table->RSDP.
> Based on efi_init(), efi_config_init(), efi_config_parse_tables().
> 
> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/acpitb.c | 96 +++
>  1 file changed, 96 insertions(+)
>  create mode 100644 arch/x86/boot/compressed/acpitb.c
> 
> diff --git a/arch/x86/boot/compressed/acpitb.c 
> b/arch/x86/boot/compressed/acpitb.c
> new file mode 100644
> index ..56b54b0e0889
> --- /dev/null
> +++ b/arch/x86/boot/compressed/acpitb.c
> @@ -0,0 +1,96 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define BOOT_CTYPE_H
> +#include "misc.h"
> +#include "error.h"
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Search EFI table for RSDP table. */
> +static void efi_get_rsdp_addr(acpi_physical_address *rsdp_addr)

This is just silly: the function returns void and has a single parameter
which is an *output* parameter?!

Why isn't the signature

static acpi_physical_address *efi_get_rsdp_addr(void)

instead?

> +{
> +#ifdef CONFIG_EFI
> + efi_system_table_t *systab;
> + bool efi_64 = false;

You're setting it below already, why here too?

> + void *config_tables;
> + struct efi_info *e;
> + char *sig;
> + int size;
> + int i;
> +
> + e = _params->efi_info;
> + sig = (char *)>efi_loader_signature;
> +
> + if (!strncmp(sig, EFI64_LOADER_SIGNATURE, 4))
> + efi_64 = true;
> + else if (!strncmp(sig, EFI32_LOADER_SIGNATURE, 4))
> + efi_64 = false;
> + else {
> + debug_putstr("Wrong EFI loader signature.\n");
> + return;
> + }
> +
> + /* Get systab from boot params. Based on efi_init(). */
> +#ifdef CONFIG_X86_64
> + systab = (efi_system_table_t *)(
> + e->efi_systab | ((__u64)e->efi_systab_hi<<32));

No ugly line breaks with open braces trailing like that, pls - just let
it stick out.

> +#else
> + if (e->efi_systab_hi || e->efi_memmap_hi) {
> + debug_putstr("Table located above 4GB. EFI should be 
> disabled.\n");

You need to say here what really happens here:

debug_putstr("Error getting RSDP address: EFI system table 
located above 4GB.\n");

The same below.

> + return;
> + }
> + systab = (efi_system_table_t *)e->efi_systab;
> +#endif
> +
> + if (!systab)
> + return;
> +
> + /*
> +  * Get EFI tables from systab. Based on efi_config_init() and
> +  * efi_config_parse_tables(). Only dig out the config_table.
> +  */
> + size = efi_64 ? sizeof(efi_config_table_64_t) :
> + sizeof(efi_config_table_32_t);
> +
> + for (i = 0; i < systab->nr_tables; i++) {
> + efi_guid_t guid;
> + unsigned long table;

Put the void *config_tables declaration here.

> +
> + config_tables = (void *)(systab->tables + size * i);
> + if (efi_64) {
> + efi_config_table_64_t *tmp_table;
> +
> + tmp_table = (efi_config_table_64_t *)config_tables;
> + guid = tmp_table->guid;
> + table = tmp_table->table;
> +#ifndef CONFIG_64BIT

Above you have CONFIG_X86_64, here CONFIG_64BIT. Please use one only.

Also, use IS_ENABLED() instead.

> + if (table >> 32) {
> + debug_putstr("Table located above 4G. EFI 
> should be disabled.\n");
> + return;
> + }
> +#endif
> + } else {
> + efi_config_table_32_t *tmp_table;
> +
> + tmp_table = (efi_config_table_32_t *)config_tables;
> + guid = tmp_table->guid;
> + table = tmp_table->table;
> + }
> +
> + /*
> +  * Get RSDP from EFI tables.
> +  * If ACPI20 table found, use it.
> +  * If ACPI20 table not found, but ACPI table found,
> +  * use the ACPI table.
> +  */

That comment is the opposite of what the code does. Also, why is that
comment needed at all? If anything, it should say *why* ACPI_TABLE_GUID
is preferred and then the fallback to ACPI_20_TABLE_GUID is done - not
*what* it does. That's easily visible in the code.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-11-10 Thread Borislav Petkov
On Thu, Nov 08, 2018 at 11:51:29AM +0100, Borislav Petkov wrote:
> A global definition which doesn't need allocation?
> 
> Maybe hpa would have another, better idea...

...and he has: just put that address in a new field in struct
boot_params by converting one of the padding arrays there.

Don't forget to document it in Documentation/x86/zero-page.txt

This way you don't need any of the allocation fun or to use setup_data
at all.

HTH.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-11-08 Thread Borislav Petkov
On Tue, Nov 06, 2018 at 05:21:34PM -0500, Masayoshi Mizuma wrote:
> On Tue, Nov 06, 2018 at 09:45:11PM +0100, Borislav Petkov wrote:
> > On Tue, Nov 06, 2018 at 02:36:38PM -0500, Masayoshi Mizuma wrote:
> > > Yes, I think I can see what you are saying. However, I'm not sure how
> > > I use the setup_data in legacy BIOS environment.
> > 
> > What is "legacy BIOS environment" and what does that have to do with
> > setup_data?
> 
> My proposed patch [1] is useful only for EFI environment. The patch
> allocates a setup_date structure by efi_call_early() and store the
> KASLR data into the memory area.
> 
> +static void setup_kaslr(struct boot_params *params)
> +{
> + struct setup_data *kaslr_data = NULL;
> + struct setup_data *data;
> + unsigned long size;
> + efi_status_t status;
> +
> + size = sizeof(struct setup_data) + sizeof(unsigned long long);
> +
> + status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
> + size, (void **)_data);
> 
> I'm not sure how I allocate such memory on no EFI environment.

A global definition which doesn't need allocation?

Maybe hpa would have another, better idea...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-11-06 Thread Borislav Petkov
On Tue, Nov 06, 2018 at 02:36:38PM -0500, Masayoshi Mizuma wrote:
> Yes, I think I can see what you are saying. However, I'm not sure how
> I use the setup_data in legacy BIOS environment.

What is "legacy BIOS environment" and what does that have to do with
setup_data?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v10 2/7] x86/boot: Copy kstrtoull() to compressed period

2018-11-06 Thread Borislav Petkov
On Mon, Oct 22, 2018 at 05:37:15PM +0800, Chao Fan wrote:
> kstrtoull() lives in 'uncompressed' period, used to
> convert a string to an unsigned long long.
> Copy to 'compressed' so that we can use it to
> convert the memory address from sting to unsigned

sting?

> long long in 'compressed' period.
> 
> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/misc.c | 88 +
>  arch/x86/boot/compressed/misc.h |  4 ++
>  2 files changed, 92 insertions(+)

Why do you need to copy things?

You can link that file into compressed/ as lib/kstrtox.c is a library or
include it similar to what arch/x86/boot/compressed/cmdline.c does.

Still better than copying the code.

> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 008fdc47a29c..40378408d980 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -63,6 +63,10 @@ static inline void debug_puthex(const char *s)
>  
>  #endif
>  
> +#if (defined CONFIG_RANDOMIZE_BASE) && (defined CONFIG_RANDOMIZE_BASE)

CONFIG_RANDOMIZE_BASE twice huh? Once not enough?

:-)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-11-06 Thread Borislav Petkov
On Thu, Oct 25, 2018 at 09:40:51AM -0400, Masayoshi Mizuma wrote:
> I have another idea to solve this issue. Adding a SRAT parsing code
> to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and
> also we don't need a new kernel parameter...
> Dose the idea make sense?

Ok, having swapped the whole thing back into my brain, forget what I
said earlier today.

Didn't we talk about passing info with setup_data to the later kernel
stage? You even had a patch:

https://lkml.kernel.org/r/20181022154204.kagmdb55jtoez4ca@gabell

So what is that "idea" again about adding SRAT parsing code to
arch/x86/mm/kaslr.c?!?!

The intent of passing info with setup_data is to *avoid* parsing SRAT
yet another time and duplicating that code one more time.

IOW:

* The first place that needs SRAT parsing, does the parsing - i.e., the
compressed stage.

* Later stages get information passed to them with setup_data. No second
parsing.

Ok?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v10 1/7] x86/boot: Introduce cmdline_find_option_arg()to detect if option=arg in cmdline

2018-11-06 Thread Borislav Petkov
On Mon, Oct 22, 2018 at 05:37:14PM +0800, Chao Fan wrote:
> Now, there are cmdline_find_option() and cmdline_find_option_bool() in
> cmdline.c. Sometimes, when detecting such as whether 'acpi=off' is
> in cmdline, we need to cmdline_find_option() first, then compare
> the argument. Now splite the operation as a independent function.
> Introduce a new function cmdline_find_option_arg() to detect whether
> option is in command line and the value is arg.

For all future commit messages you write:

Use passive tone in your commit message: no "we", etc.

Also, pls read section "2) Describe your changes" in
Documentation/process/submitting-patches.rst.

> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/cmdline.c | 15 +++
>  arch/x86/boot/compressed/misc.h|  1 +
>  2 files changed, 16 insertions(+)
> 
> diff --git a/arch/x86/boot/compressed/cmdline.c 
> b/arch/x86/boot/compressed/cmdline.c
> index af6cda0b7900..61118c69feb8 100644
> --- a/arch/x86/boot/compressed/cmdline.c
> +++ b/arch/x86/boot/compressed/cmdline.c
> @@ -1,5 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include "misc.h"
> +#define STATIC
> +#include 
>  
>  #if CONFIG_EARLY_PRINTK || CONFIG_RANDOMIZE_BASE || CONFIG_X86_5LEVEL
>  
> @@ -30,5 +32,18 @@ int cmdline_find_option_bool(const char *option)
>  {
>   return __cmdline_find_option_bool(get_cmd_line_ptr(), option);
>  }
> +bool cmdline_find_option_arg(const char *option, const char *arg, int 
> argsize)
> +{
> + char *buffer = malloc(argsize+1);
> + bool find = false;
> + int ret;
> +
> + ret = cmdline_find_option(option, buffer, argsize+1);
> + if (ret == argsize && !strncmp(buffer, arg, argsize))
> + find = true;
> +
> + free(buffer);
> + return find;
> +}

I don't think such wrapper is needed. Simply calling
cmdline_find_option() and then examining the buffer - like other call
sites do - is perfectly fine.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-11-06 Thread Borislav Petkov
On Thu, Oct 25, 2018 at 09:40:51AM -0400, Masayoshi Mizuma wrote:
> My actual use case is for EFI boxes, however, I think it's better to useful
> for legacy BIOS as well because memory hot-plug affinity in SRAT and KASLR
> are available on legacy BIOS.
> Actually, we can create such environment in qemu.

Ah, right, qemu. :)

> I have another idea to solve this issue. Adding a SRAT parsing code
> to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and
> also we don't need a new kernel parameter...
> Dose the idea make sense?

The more automatic stuff we do and we don't have to involve the user,
the better.

However, lemme look at Chao's current patchset first - we should not go
nuts by putting SRAT parsing everywhere :)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-10-25 Thread Borislav Petkov
On Mon, Oct 22, 2018 at 11:42:05AM -0400, Masayoshi Mizuma wrote:
> I'm trying to store the SRAT info and pass it to kernel_randomize_memory(),
> looks like add_e820ext()/parse_setup_data().
> 
> Is the approach useful only EFI environment? I'm not sure how I

Does it matter for non-EFI even?

I mean, you want this code only for your use case - when you have
movable memory and you're doing kexec, yes?

And those machines are all EFI boxes I'd assume...

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-10-16 Thread Borislav Petkov
On Tue, Oct 16, 2018 at 03:54:30PM -0400, Masayoshi Mizuma wrote:
> Ah, sorry, I misunderstood your suggetion...
> In parse_setup_data(), the data is picked up from boot_params,

Yes, this is the pointer to the setup_data linked list head. See

Documentation/ABI/testing/sysfs-kernel-boot_params
Documentation/x86/boot.txt

for more information and basically grep the tree for examples.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-10-16 Thread Borislav Petkov
On Tue, Oct 16, 2018 at 11:13:54AM -0400, Masayoshi Mizuma wrote:
> Hi Boris, Baoquan and all,
> 
> I have created a RFC patch for adjust KASLR padding size.
> This patch is based on Can's v8 patch [1/3], and the Can's patch
> will be changed in the future, so this patch is just RFC.
> 
> Welcome to any comments and suggestions. Thanks!

...

> diff --git a/arch/x86/include/uapi/asm/bootparam.h 
> b/arch/x86/include/uapi/asm/bootparam.h
> index a06cbf019744..c987c725e93a 100644
> --- a/arch/x86/include/uapi/asm/bootparam.h
> +++ b/arch/x86/include/uapi/asm/bootparam.h
> @@ -191,7 +191,8 @@ struct boot_params {
>   __u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
>   __u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];  /* 0x290 */
>   struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 
> */
> - __u8  _pad8[48];/* 0xcd0 */
> + __u8  _pad8[40];/* 0xcd0 */
> + __u64 possible_mem_addr;/* 0xcf8 */

Where in the example I gave you with add_e820ext() do you see members
being added to struct boot_params?

Take a look at it again pls.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 1/3] x86/boot: Add acpitb.c to parse acpi tables

2018-10-16 Thread Borislav Petkov
On Tue, Oct 16, 2018 at 10:48:44AM +0800, Chao Fan wrote:
> Sorry for disturbing you again, I want to make sure this detail with you.
> You mean that I need splite this as a function and put it to
> cmdline.c, right?

Extract that functionality into a generic helper so that
handle_mem_options() and your get_acpi_rsdp() can call it instead
of duplicating the code. Also, why aren't they both using
cmdline_find_option() directly?

If something's missing, extend cmdline_find_option() to serve your
purposes too instead of copying the same code.

Make more sense?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-10-13 Thread Borislav Petkov
On Sat, Oct 13, 2018 at 05:45:51PM -0400, Masayoshi Mizuma wrote:
> Yes, but I think we need to arrange the Chao's SRAT parsing to be used
> from kernel_randomize_memory() because Chao's approach needs the SRAT
> parsing before extract kernel and the padding size calculation needs
> the parsing in start_kernel(), so they are living in different life
> cycle and space.

Or you would like to pass some info from the compressed kernel to kernel
proper?

For example, something like the passing in add_e820ext() and the consumtion
in parse_setup_data():

switch (data_type) {
case SETUP_E820_EXT:
e820__memory_setup_extended(pa_data, data_len);

So info you need for your padding gets collected in
arch/x86/boot/compressed/acpitb.c and you consume it in
kernel_randomize_memory()?

Would that work?

:-)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 1/3] x86/boot: Add acpitb.c to parse acpi tables

2018-10-12 Thread Borislav Petkov
On Fri, Oct 12, 2018 at 05:36:38PM +0800, Chao Fan wrote:
> Prefer to compile out entire functions, rather than portions of functions or
> portions of expressions.  Rather than putting an ifdef in an expression, 
> factor
> out part or all of the expression into a separate helper function and apply 
> the
> conditional to that function.
> 
> So I am puzzled. If my understanding is wrong, please let me know.

If you do it the way I suggested, you simply have one ifdeffery branch
less. And ifdeffery is ugly. So less clutter. Also, this way you do
compile out entire functions too.

HTH.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 1/3] x86/boot: Add acpitb.c to parse acpi tables

2018-10-11 Thread Borislav Petkov
On Wed, Oct 10, 2018 at 04:41:17PM +0800, Chao Fan wrote:
> There is a bug that kaslr may randomly chooses some positions
> which are located in movable memory regions. This will break memory
> hotplug feature and make the movable memory chosen by KASLR can't be
> removed. So dig SRAT table from ACPI tables to get memory information.
> 
> Imitate the ACPI code of parsing ACPI tables to dig and read ACPI
> tables. Since some operations are not needed here, functions are
> simplified. Functions will be used to dig only SRAT tables to get
> information of memory, so that KASLR can the memory in immovable node.
> 
> And also, these functions won't influence the initialization of
> ACPI after start_kernel().
> 
> Since use physical address directely, so acpi_os_map_memory()
> and acpi_os_unmap_memory() are not needed.
> 
> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/Makefile |   2 +
>  arch/x86/boot/compressed/acpitb.c | 405 ++
>  arch/x86/boot/compressed/misc.h   |   8 +
>  3 files changed, 415 insertions(+)
>  create mode 100644 arch/x86/boot/compressed/acpitb.c
> 
> diff --git a/arch/x86/boot/compressed/Makefile 
> b/arch/x86/boot/compressed/Makefile
> index 28764dacf018..1609e4efcaed 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -83,6 +83,8 @@ ifdef CONFIG_X86_64
>   vmlinux-objs-y += $(obj)/pgtable_64.o
>  endif
>  
> +vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/acpitb.o

This should be CONFIG_MEMORY_HOTREMOVE *and* CONFIG_RANDOMIZE_BASE.
Otherwise we don't need all that code.

>  $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
>  
>  vmlinux-objs-$(CONFIG_EFI_STUB) += $(obj)/eboot.o $(obj)/efi_stub_$(BITS).o \
> diff --git a/arch/x86/boot/compressed/acpitb.c 
> b/arch/x86/boot/compressed/acpitb.c
> new file mode 100644
> index ..6b869e3f9780
> --- /dev/null
> +++ b/arch/x86/boot/compressed/acpitb.c
> @@ -0,0 +1,405 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define BOOT_CTYPE_H
> +#include "misc.h"
> +#include "error.h"
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +extern unsigned long get_cmd_line_ptr(void);
> +
> +#define STATIC
> +#include 
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +struct mem_vector {
> + unsigned long long start;
> + unsigned long long size;
> +};
> +/* Store the immovable memory regions */
> +struct mem_vector immovable_mem[MAX_NUMNODES*2];
> +#endif
> +
> +#ifdef CONFIG_EFI
> +/* Search EFI table for rsdp table. */
> +static bool efi_get_rsdp_addr(acpi_physical_address *rsdp_addr)
> +{
> + efi_system_table_t *systab;
> + bool find_rsdp = false;
> + bool efi_64 = false;
> + void *config_tables;
> + struct efi_info *e;
> + char *sig;
> + int size;
> + int i;
> +
> + e = _params->efi_info;
> + sig = (char *)>efi_loader_signature;
> +
> + if (!strncmp(sig, EFI64_LOADER_SIGNATURE, 4))
> + efi_64 = true;
> + else if (!strncmp(sig, EFI32_LOADER_SIGNATURE, 4))
> + efi_64 = false;
> + else {
> + debug_putstr("Wrong EFI loader signature.\n");
> + return false;
> + }
> +
> + /* Get systab from boot params. Based on efi_init(). */
> +#ifdef CONFIG_X86_32

Why the efi_64 detection above but the ifdeffery here? Why not test
efi_64 instead?

> + if (e->efi_systab_hi || e->efi_memmap_hi) {
> + debug_putstr("Table located above 4GB, disabling EFI.\n");

Are you disabling EFI? Where?

Ah, I see, this code is copied from arch/x86/platform/efi/efi.c.

So when copying, fix the user-visible strings too.

> + return false;
> + }
> + systab = (efi_system_table_t *)e->efi_systab;
> +#else
> + systab = (efi_system_table_t *)(
> + e->efi_systab | ((__u64)e->efi_systab_hi<<32));
> +#endif
> +
> + if (systab == NULL)

if (!systab)

Fix all other occurrences.

> + return false;
> +
> + /*
> +  * Get EFI tables from systab. Based on efi_config_init() and
> +  * efi_config_parse_tables(). Only dig the config_table.

dig out

> +  */
> + size = efi_64 ? sizeof(efi_config_table_64_t) :
> + sizeof(efi_config_table_32_t);
> +
> + for (i = 0; i < systab->nr_tables; i++) {
> + efi_guid_t guid;
> + unsigned long table;
> +
> + config_tables = (void *)(systab->tables + size * i);
> + if (efi_64) {
> + efi_config_table_64_t *tmp_table;
> +
> + tmp_table = (efi_config_table_64_t *)config_tables;
> + guid = tmp_table->guid;
> + table = tmp_table->table;
> +#ifndef CONFIG_64BIT
> + if (table >> 32) {
> + debug_putstr("Table located above 4G, disabling 
> EFI.\n");

Fix that.

> + return false;
> + 

Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-10-10 Thread Borislav Petkov
On Wed, Oct 10, 2018 at 04:41:16PM +0800, Chao Fan wrote:
> In the earliest time, I tried to dig ACPI tabls to solve this problem.
> But I didn't splite the code in 'compressed/' and ACPI code, so the patch
> is hard to follow so refused by community.
> Somebody suggest to add a kernel parameter to specify the
> immovable memory so that limit kaslr in these regions. Then I make
> a new patchset. After several versions, Ingo gave a suggestion:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html
> Follow Ingo's suggestion, imitate the ACPI code to parse the acpi
> tables, so that the kaslr can get necessary memory information in
> ACPI tables.
> I think ACPI code is an independent part, so copy the codes
> and functions to 'compressed/' directory, so that kaslr won't
> influence the initialization of ACPI.

You say "copy". I'm still about to look at the code but can those
functions be carved out in a separate compilation unit which ACPI *and*
KASLR can both link with so that there's no duplication?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-10-10 Thread Borislav Petkov
On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote:
> Yes, it's different, but if the SRAT information is available early, then
> the command line parameter can go away because then the required
> information for Masa's problem is available as well.

Exactly. And I'd prefer we delayed the command line parameter until we
figure out we really need it and not expose it to upstream and then
remove it shortly after.

So I'd suggest we move Masa's patches to a separate branch and not send
it up this round.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v8 0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

2018-10-10 Thread Borislav Petkov
On Wed, Oct 10, 2018 at 04:41:16PM +0800, Chao Fan wrote:
> ***Background:
> People reported that kaslr may randomly chooses some positions
> which are located in movable memory regions. This will break memory
> hotplug feature and make the movable memory chosen by KASLR can't be
> removed.
> 
> ***Solutions:
> There should be a method to limit kaslr to choosing immovable memory
> regions, so there are 2 solutions:
> 1) Add a kernel parameter to specify the memory regions.
> 2) Get the information of memory hot-remove, then kaslr will know the
>right regions.
> In method 2, information about memory hot-remove is in ACPI
> tables, which will be parsed after start_kernel(), kaslr can't get
> the information.
> In method 1, users should know the regions address and specify in
> kernel parameter.
> 
> In the earliest time, I tried to dig ACPI tabls to solve this problem.
> But I didn't splite the code in 'compressed/' and ACPI code, so the patch
> is hard to follow so refused by community.
> Somebody suggest to add a kernel parameter to specify the
> immovable memory so that limit kaslr in these regions. Then I make
> a new patchset. After several versions, Ingo gave a suggestion:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html
> Follow Ingo's suggestion, imitate the ACPI code to parse the acpi
> tables, so that the kaslr can get necessary memory information in
> ACPI tables.
> I think ACPI code is an independent part, so copy the codes
> and functions to 'compressed/' directory, so that kaslr won't
> influence the initialization of ACPI.

... and we just picked up

https://lkml.kernel.org/r/20181001140843.26137-1-msys.miz...@gmail.com

and without having looked at the rest of your stuff, if people accept
your solution, we don't need the silly parameter anymore, right?

Which means, we should not rush the whole thing yet until the whole
KASLR vs movable memory gets solved properly.

Ingo, Thomas?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH V2 2/6] x86/efi: Remove __init attribute from memory mapping functions

2018-09-03 Thread Borislav Petkov
On Mon, Sep 03, 2018 at 05:03:56AM +, Prakhya, Sai Praneeth wrote:
> Hmm.. thought that __efi_init might be confusing with the normal __init 
> attribute

How would that be confusing? It has "__efi" prepended?!

All I'm saying is, if you're going to define your own function
attributes, do them generic and short. "_fixup" is too specific IMO. It
also enlarges function definitions unnecessarily.

With "__efi_init" you already denote that it is a special attribute
which has relevance in the EFI code only. What you do about it - the
*fixup* - is the thing you do with the attribute. But you don't have to
have the "what you do" in the attribute name too.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH V2 2/6] x86/efi: Remove __init attribute from memory mapping functions

2018-09-02 Thread Borislav Petkov
On Sun, Sep 02, 2018 at 02:46:30AM -0700, Sai Praneeth Prakhya wrote:
> In order to not keep these functions needlessly when
> "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is not selected, add a new
> __efi_init_fixup attribute whose value changes based on whether the

Why not just __efi_init{,data} ?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] x86/efi: Access EFI MMIO data as unencrypted when SEV is active

2018-07-03 Thread Borislav Petkov
On Tue, Jul 03, 2018 at 04:16:57PM -0500, Brijesh Singh wrote:
> I agree with Ard,  it may be good idea to extend the UEFI spec to
> include encryption information. Having this information may be helpful
> in some cases, e.g if we ever need to map a specific non IO memory as
> unencrypted. So far we have not seen the need for it. But I will ask AMD
> folks working closely with UEFI committee to float this and submit it as
> enhancement in Tianocore BZ.

Except that if the IO memory handling unencrypted changes in future
incarnations, the changes to the spec become moot. I'm just saying...

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86/efi: Access EFI MMIO data as unencrypted when SEV is active

2018-07-03 Thread Borislav Petkov
(dropping stable@ as this is not how you send patches to stable).

On Tue, Jul 03, 2018 at 05:37:18PM +0200, Ard Biesheuvel wrote:
> On 3 July 2018 at 15:32, Brijesh Singh  wrote:
> > SEV guest fails to update the UEFI runtime variables stored in the
> > flash. commit 1379edd59673 ("x86/efi: Access EFI data as encrypted
> > when SEV is active") unconditionally maps all the UEFI runtime data
> > as 'encrypted' (C=1). When SEV is active the UEFI runtime data marked
> > as EFI_MEMORY_MAPPED_IO should be mapped as 'unencrypted' so that both
> > guest and hypervisor can access the data.
> >
> 
> I'm uncomfortable having to carry these heuristics in the kernel. The
> UEFI memory map should be the definitive source of information
> regarding how the OS should map the regions it describes, and if we
> need to guess the encryption status, we are likely to get it wrong at
> least some of the times.

I think the problem here is that IO memory can't be encrypted, at least
at the moment. Thus this patch. I believe future versions will be able
to handle encrypted IO but that's something Brijesh can correct me on.

So it is not really about the UEFI spec but about what the hardware
does/supports currently.

And I don't think that change matters on anything else besides AMD with
SEV enabled...

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] efi/cper: Remove the INDENT_SP silliness

2018-03-29 Thread Borislav Petkov
From: Borislav Petkov <b...@suse.de>

A separate define just to print a space character is silly and
completely unneeded. Remove it.

Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: linux-efi@vger.kernel.org
---
 drivers/firmware/efi/cper-arm.c | 6 ++
 drivers/firmware/efi/cper.c | 6 ++
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c
index 698e5c8e0c8d..502811344e81 100644
--- a/drivers/firmware/efi/cper-arm.c
+++ b/drivers/firmware/efi/cper-arm.c
@@ -30,8 +30,6 @@
 #include 
 #include 
 
-#define INDENT_SP  " "
-
 static const char * const arm_reg_ctx_strs[] = {
"AArch32 general purpose registers",
"AArch32 EL1 context registers",
@@ -283,7 +281,7 @@ void cper_print_proc_arm(const char *pfx,
pfx, proc->psci_state);
}
 
-   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
+   snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
 
err_info = (struct cper_arm_err_info *)(proc + 1);
for (i = 0; i < proc->err_info_num; i++) {
@@ -310,7 +308,7 @@ void cper_print_proc_arm(const char *pfx,
if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO) {
printk("%serror_info: 0x%016llx\n", newpfx,
   err_info->error_info);
-   snprintf(infopfx, sizeof(infopfx), "%s%s", newpfx, 
INDENT_SP);
+   snprintf(infopfx, sizeof(infopfx), "%s ", newpfx);
cper_print_arm_err_info(infopfx, err_info->type,
err_info->error_info);
}
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 5a59b582c9aa..3bf0dca378a6 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -37,8 +37,6 @@
 #include 
 #include 
 
-#define INDENT_SP  " "
-
 static char rcd_decode_str[CPER_REC_LEN];
 
 /*
@@ -433,7 +431,7 @@ cper_estatus_print_section(const char *pfx, struct 
acpi_hest_generic_data *gdata
if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
printk("%s""fru_text: %.20s\n", pfx, gdata->fru_text);
 
-   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
+   snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
if (guid_equal(sec_type, _SEC_PROC_GENERIC)) {
struct cper_sec_proc_generic *proc_err = 
acpi_hest_get_payload(gdata);
 
@@ -510,7 +508,7 @@ void cper_estatus_print(const char *pfx,
   "It has been corrected by h/w "
   "and requires no further action");
printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
-   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
+   snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
 
apei_estatus_for_each_section(estatus, gdata) {
cper_estatus_print_section(newpfx, gdata, sec_no);
-- 
2.13.0

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-03-29 Thread Borislav Petkov
On Sat, Mar 24, 2018 at 01:49:35PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghan...@amd.com>
> 
> Print the fields in the IA32/X64 Processor Error Info Structure.
> 
> Based on UEFI 2.7 Table 253. IA32/X64 Processor Error Information
> Structure.
> 
> Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com>
> ---
> Link:
> https://lkml.kernel.org/r/20180226193904.20532-4-yazen.ghan...@amd.com
> 
> v2->v3:
> * Fix table number in commit message.
> * Don't print raw validation bits.
> 
> v1->v2:
> * Add parantheses around "bits" expression in macro.
> * Fix indentation on multi-line statements.
> 
>  drivers/firmware/efi/cper-x86.c | 50 
> +
>  1 file changed, 50 insertions(+)
> 
> diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c
> index 863f0cd2a0ff..a9ab3bbf7986 100644
> --- a/drivers/firmware/efi/cper-x86.c
> +++ b/drivers/firmware/efi/cper-x86.c
> @@ -3,15 +3,28 @@
>  
>  #include 
>  
> +#define INDENT_SP" "

There's that thing again. So it was a total waste of time discussing
this last time. So let me save my time this time:

NAKed-by: Borislav Petkov <b...@suse.de>

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/8] efi: Decode IA32/X64 Processor Error Section

2018-03-29 Thread Borislav Petkov
On Sat, Mar 24, 2018 at 01:49:34PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam 
> 
> Recognize the IA32/X64 Processor Error Section.
> 
> Do the section decoding in a new "cper-x86.c" file and add this to the
> Makefile depending on a new "UEFI_CPER_X86" config option.
> 
> Print the Local APIC ID and CPUID info from the Processor Error Record.
> 
> The "Processor Error Info" and "Processor Context" fields will be
> decoded in following patches.
> 
> Based on UEFI 2.7 Table 252. Processor Error Record.
> 
> Signed-off-by: Yazen Ghannam 
> ---
> Link:
> https://lkml.kernel.org/r/20180226193904.20532-3-yazen.ghan...@amd.com
> 
> v2->v3:
> * Fix table number in commit message.
> * Don't print raw validation bits.
> 
> v1->v2:
> * Change config option depends to "X86" instead of "X86_32 || X64_64".
> * Remove extra newline in Makefile changes.
> * Drop author copyright line.
> 
>  drivers/firmware/efi/Kconfig|  5 +
>  drivers/firmware/efi/Makefile   |  1 +
>  drivers/firmware/efi/cper-x86.c | 23 +++

I'd prefer if that were:

drivers/firmware/efi/cper/x86.c
drivers/firmware/efi/cper/arm.c

but that's Ard's call.

>  drivers/firmware/efi/cper.c | 10 ++
>  include/linux/cper.h|  2 ++
>  5 files changed, 41 insertions(+)
>  create mode 100644 drivers/firmware/efi/cper-x86.c
> 
> diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
> index 3098410abad8..781a4a337557 100644
> --- a/drivers/firmware/efi/Kconfig
> +++ b/drivers/firmware/efi/Kconfig
> @@ -174,6 +174,11 @@ config UEFI_CPER_ARM
>   depends on UEFI_CPER && ( ARM || ARM64 )
>   default y
>  
> +config UEFI_CPER_X86
> + bool
> + depends on UEFI_CPER && X86
> + default y
> +
>  config EFI_DEV_PATH_PARSER
>   bool
>   depends on ACPI
> diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
> index cb805374f4bc..5f9f5039de50 100644
> --- a/drivers/firmware/efi/Makefile
> +++ b/drivers/firmware/efi/Makefile
> @@ -31,3 +31,4 @@ obj-$(CONFIG_ARM)   += $(arm-obj-y)
>  obj-$(CONFIG_ARM64)  += $(arm-obj-y)
>  obj-$(CONFIG_EFI_CAPSULE_LOADER) += capsule-loader.o
>  obj-$(CONFIG_UEFI_CPER_ARM)  += cper-arm.o
> +obj-$(CONFIG_UEFI_CPER_X86)  += cper-x86.o
> diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c
> new file mode 100644
> index ..863f0cd2a0ff
> --- /dev/null
> +++ b/drivers/firmware/efi/cper-x86.c
> @@ -0,0 +1,23 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2018, Advanced Micro Devices, Inc.
> +
> +#include 
> +
> +/*
> + * We don't need a "CPER_IA" prefix since these are all locally defined.
> + * This will save us a lot of line space.
> + */
> +#define VALID_LAPIC_ID   BIT_ULL(0)
> +#define VALID_CPUID_INFO BIT_ULL(1)
> +
> +void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc)
> +{
> + if (proc->validation_bits & VALID_LAPIC_ID)
> + printk("%sLocal APIC_ID: 0x%llx\n", pfx, proc->lapic_id);
> +
> + if (proc->validation_bits & VALID_CPUID_INFO) {
> + printk("%sCPUID Info:\n", pfx);
> + print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, proc->cpuid,
> +sizeof(proc->cpuid), 0);

That's dumping 48 bytes of static information for every error!
Information which can be dumped only once when collecting system info.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/8] Decode IA32/X64 CPER

2018-03-23 Thread Borislav Petkov
On Fri, Mar 23, 2018 at 12:19:20AM +, Ghannam, Yazen wrote:
> Is it alright if I go ahead and just send the v3 of this set without the MCA 
> decoding?

Sure, but let's not drop the MCA decoding from the todo list. :)

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-14 Thread Borislav Petkov
On Sat, Mar 10, 2018 at 12:33:35AM +, Prakhya, Sai Praneeth wrote:
> Although the discussions were off my understanding, the present issue
> I see is, (and also the motivation for me to do the patch is) when a
> thread tries to execute any efi_runtime_service() we switch to efi_pgd
> (which doesn't have user space mappings) and all other subsystems in
> kernel aren't aware of this switch. This looks like a perfect case for
> kthread.

That's all fine and good but you need to be prepared and handle properly
an NMI while EFI is running.

I have no clue whether the platform delays delivery of NMIs during EFI
runtime services or whatever happens but you need to have all those
cases covered so that no monkey business happens.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-09 Thread Borislav Petkov
On Fri, Mar 09, 2018 at 02:37:59AM +, Prakhya, Sai Praneeth wrote:
> But, I guess, we have some downsides with this design:
> 1. We are doing this to have "no exceptions to use efi_rts_wq", but we will 
> be making
> the common case complicated. i.e. When a user requests to write some efi 
> variable,

So if the pstore use case is so important and special, I think we should
make the EFI path as fast as possible as getting that data to the pstore
is a priority.

> Alternatively, instead of playing around with in_atomic(), we could have 
> wrapper
> functions like efi_write_var_non_wq() which will only be used by pstore. This 
> function
> will not use efi_rts_wq and directly invoke efi_runtime_service. Just an 
> attempt to
> make the code not look messy.

I guess.

If the write-to-pstore case is such a critical one, I guess the
exception is justified.

> That's true! AFAIK, we don't have any issues handling NMI while in efi_pgd.
> We might have issues only when, we are already in efi_pgd, NMI comes along

Can you trigger this? Or is it something hypothetical?

> and NMI handler tries to touch the regions that are not mapped in efi_pgd

If it is not hypothetical, the NMI handler should learn to look at CR3
first and return if CR3 has the efi pgd.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-09 Thread Borislav Petkov
On Thu, Mar 08, 2018 at 05:05:44PM +, Luck, Tony wrote:
> > "Hence, pstore calls efi_runtime_services() without using efi_rts_wq" -
> > that doesn't sound like optimal design to me. I would try to shove them
> > all through the workqueue - not have exceptions.
> 
> But pstore is trying to save the last gasp dying words from a kernel that
> has paniced. There isn't any guarantee that work queue will run. I think
> it is reasonable to have some special case to make sure that we do save
> the information.  But perhaps that special case should be to have pstore
> call directly into the guts of the EFI code and not worry about all these
> fancy switches of "mm" etc.

I guess...

> This is true for the machine check logging case too, but the mitigation is
> that the details of the error persist in the machine check banks across the
> reset ... so all is not lost if the work queue isn't run here.

Well, I'm still hoping to have this wonderful and reliable
logging facility one day where we can dump bytes into a
persistent-across-reboots buffer and analyze them later. And yap, in
that case, I don't mind if the code simply bypasses all the dancing in
the OS and writes those bytes. Even switching pagetables would be too
much for that case - just fire'n'forget writing from ring 0.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-08 Thread Borislav Petkov
On Thu, Mar 08, 2018 at 05:31:03AM +, Prakhya, Sai Praneeth wrote:
> Another warning by checkpatch is "use of in_atomic() in drivers code"

I'm assuming it warns because you're touching files in drivers/ but the
efi fun is not really a driver...

But looking at patch 3, that thing looks like a real mess. Some of the
things - pstore, it seems - do stuff in atomic context and yet you want
to do efi stuff in a workqueue which doesn't stomach atomic context to
begin with.

So if you wanna do workqueue, you should make sure all efi stuff gets
delayed to process context and queued properly. For example, we log
MCEs from atomic context by putting them on a lockless buffer and then
kicking irq_work to queue the work when we return to process context.
Can you do something like that?

"Hence, pstore calls efi_runtime_services() without using efi_rts_wq" -
that doesn't sound like optimal design to me. I would try to shove them
all through the workqueue - not have exceptions.

Then this:

> A potential issue could be, for instance, an NMI interrupt (like perf)
> trying to profile some user data while in efi_pgd.

I can't understand.

How did we handle this until now and why is it a problem all of a
sudden?

Because I don't recall being unable to run perf while efi runtime
services are happening.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-07 Thread Borislav Petkov
On Mon, Mar 05, 2018 at 03:23:09PM -0800, Sai Praneeth Prakhya wrote:
> +#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)  
> \
> +({   \
> + struct efi_runtime_work efi_rts_work;   \
> + \
> + INIT_WORK_ONSTACK(_rts_work.work, efi_call_rts);\
> + efi_rts_work.func = _rts;   \
> + efi_rts_work.arg1 = _arg1;  \
> + efi_rts_work.arg2 = _arg2;  \
> + efi_rts_work.arg3 = _arg3;  \
> + efi_rts_work.arg4 = _arg4;  \
> + efi_rts_work.arg5 = _arg5;  \
> + /*  \
> +  * queue_work() returns 0 if work was already on queue, \
> +  * _ideally_ this should never happen.  \
> +  */ \
> + if (queue_work(efi_rts_wq, _rts_work.work)) \
> + flush_work(_rts_work.work); \
> + else\
> + BUG();  \

So failure to queue that work is such a critical problem that we need
to BUG() and can't possibly continue and shoult not attempt recovery at
all?

IOW, we should always strive to fail gracefully and not shit in pants at
the first sign of trouble.

Even checkpatch warns here:

WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather 
than BUG() or BUG_ON()
#184: FILE: drivers/firmware/efi/runtime-wrappers.c:92:
+   BUG();  \


and by looking at the other output, you should run your patches through
checkpatch. Some of the things make sense like:

WARNING: quoted string split across lines
#97: FILE: drivers/firmware/efi/efi.c:341:
+   pr_err("Failed to create efi_rts_workqueue, EFI runtime 
services "
+  "disabled.\n");

for example.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/8] Decode IA32/X64 CPER

2018-03-01 Thread Borislav Petkov
On Wed, Feb 28, 2018 at 08:58:15PM +, Ghannam, Yazen wrote:
> 1) We keep this set mostly as-is. This would be our fallback if we don't have
> anything better.

Yes, sounds good. We try to decode it as MCE and if we cannot, we dump
the raw CPER record.

> 2) I add the MCA decoding to this set. I was thinking to do this in a separate
> set but maybe it's better to do it all together.

I'm fine if you do it separately, as long as you do it so that we have
user-friendly decoding in the end.

> Number 2 would mean we do a quick check on the CPER to see if it contains
> MCA info. There's no spec-defined way to do this, but we can make a good
> guess by seeing if we have an "MSR register" context and that context has
> an "MSR address" that is an MCA register.

Yap.

> If we think we have MCA info, then we pull as much out of the CPER as we
> can and put it in a struct mce which we then pass to the notifier chain.
> 
> If we don't think we have MCA info, then we fallback to number 1.

Ack.

> At the moment, it seems we'll be using x86 CPER to represent MCA errors
> in BERT since there's no other option in BERT. So I think having number 2
> would catch most, if not all, errors reported with x86 CPER.

Yeah, if you think about it, CPER is a clumsy and totally useless
indirection layer between MCA and the OS. And if the error is of
different type (AER, PCI, whatever), then it wraps around it too with
some dumb table. And that doesn't bring anything - just the need for
more support added to the OS and tools around it. Basically what you're
doing now.

I don't mind the aspect of firmware seeing the errors first and even
attempting to fix them as the firmware knows the platform intimately but
doing everything in firmware just because some misguided souls think
this gives added value but in reality ends up becoming a worse problem,
is simply the wrong wrong thing to do.

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/8] Decode IA32/X64 CPER

2018-02-28 Thread Borislav Petkov
On Wed, Feb 28, 2018 at 03:12:09PM +, Ghannam, Yazen wrote:
> CPER is the format used for BERT, etc. We'll only ever see a CPER if the
> firmware creates it. And it's up to firmware policy what is shared with
> the OS.

Yap, but we should still tie it into our infra.

> My main reason for printing all the info is that it may be too difficult or 
> too
> late to gather that info after the fact. I think this is especially true for 
> boot
> errors, though maybe there's another way that I don't know about
> (re-reading BERT later?).

So that BERT thing is a table, AFAIR. I don't see why it would be a
problem if we read it later, at our leisure and free the record only
when we're done.

Looking at bert.c is entered with a late_initcall() which is nicely late
and we have everything up and ready to process errors then.

> Right. I want to work on getting this more integrated with our existing
> x86 infrastructure. But I don't want to wait until we figure all that out
> before we have some sort of CPER decoding.

Just keep in mind that whenever you expose stuff and userspace starts
using it, it is much harder to change it. So let's do it right pls.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/8] Decode IA32/X64 CPER

2018-02-28 Thread Borislav Petkov
On Mon, Feb 26, 2018 at 01:38:56PM -0600, Yazen Ghannam wrote:
> From: Yazen Ghannam 
> 
> This series adds decoding for the IA32/X64 Common Platform Error Record.

One much more important thing I forgot about yesterday: how is
this thing playing into our RAS reporting, x86 decoding chain, etc
infrastructure?

Is CPER bypassing it completely and the firmware is doing everything
now? I sure hope not.

If not, it needs to tie into our infrastructure and the errors need
to go into the decoding chain where different things look at them and
filter them.

Tony, what are your plans here?

Perhaps we can finally get MCE decoding on Intel too :-)

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 09:32:18PM +, Ghannam, Yazen wrote:
> Not much more readable. It's still vague and confusing to a user and devoid
> of any real info so an expert can't help. And now the information is printed
> arbitrarily, so someone needs to read the source to figure out what it really
> means.

WTF? You need to read the source to figure out what the error is? So
"Corrected Processor Error" is confusing? I think you've been clearly
staring at the spec for waaay too long.

Also, read my mail again:

"Now, I admit that my vesion of the record is not enough to debug it
but it needs to contain only information which is clear and humanly
readable to debug. You can always dump the raw data underneath from the
tracepoint but make the beginning human readable."

> Maybe. But these records are generated by Platform Firmware. Why would
> FW report the error knowing the system is about to die?

WTF more! Dude, are you kidding me?

So the firmware should not report the error if it knows the system is
about to die?!?!

Now you're just making up insane counterarguments, just because.

> Your example still says "Hardware Error"

Oh my, that's the *prefix*.

> and odds are general users won't understand what the error type means
> or what a corrected error is. So it's not much better.

Yeah, and the next thing you'll say is that users won't understand what
"error" means, right?

Geez.

> Exactly! The more info available (usually) the more quickly it can be
> diagnosed.

You still don't understand what I'm trying to explain to you.

It is *not* about diagnosing it - in order to do that you need to
involve people to diagnose it. It is about making the error record as
human readable as possible so that you don't *have* to involve people to
diagnose it in the first place and the user can say, ah ok, corrected
error, no need to do anything.

Or "System Fatal error" - I better replace that part.

> Hardware errors are generally rare and hard to reproduce.

Except when they're not like DRAM ECC floods which are pretty easy to
reproduce.

> So when one does occur we should capture the data and provide it.

Did I say you should not do that?!

I said: make it as human readable as possible and dump the gory hex crap
after it.

> Here are a couple of scenarios based on similar experiences I've had:

Now play that same scenario with the following record format:

[ human readable error record ]
[ full raw error dump ]

> I'll send a V3 set with the following changes:
> 1) Fix table numbering in commit messages.
> 2) Remove "Validation Bits" lines.
> 3) Only print error type GUID for unknown types.
> 
> I think this set should focus on printing the x86 CPER based on the UEFI
> spec and the convention of the other CPER code. CPERs are generated
> by Platform Firmware. So errors are explicitly intended to be viewed by
> the user and all info should be displayed.

You should look up from the spec and realize that real life is much
different.

> I *have* been thinking that it would be nice to take the CPER and pipe it
> through the MCA decoding in arch/x86 and EDAC. This would be really
> nice for when the CPER includes the MCA registers in the Context info.
> So we'd get our usual MCA decoding instead of a binary blob of registers.

That would definitely be a step in the right direction.

> I was thinking that the MCA decoding would be in addition to this. But
> based on Boris's comments, maybe we can make it a default selection.
> For example, if MCA/EDAC decoding is available, use it. Otherwise, print
> the CPER fields in a generic way like we do for the other CPER types.

Yes.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 06:40:13PM +, Ghannam, Yazen wrote:
> Code readability.

Bullshit!

You can't tell me this:

snprintf(newpfx, sizeof(newpfx), "%s ", pfx);

is less readable than:

 snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);

> 1) No one except debug and HW design folks, who will eventually get a
> report from a user.

Hahahha, yeah right.

The only people who get those reports are the maintainers of the code in
the kernel and the distro people who get all the bugs assigned to them.

And if they can't decode the error - it is Tony and me.

HW folks hear about it from us. And we go and decode the damn crap
*every* time. Do you catch my drift now?

> [1.990948] [Hardware Error]:  Error 1, type: corrected 
> [1.995789] [Hardware Error]:  fru_text: ProcessorError 
> [2.000632] [Hardware Error]:   section_type: IA32/X64 processor error 
> [2.005796] [Hardware Error]:   Validation Bits: 0x0207 
> [2.010953] [Hardware Error]:   Local APIC_ID: 0x0 
> [2.015991] [Hardware Error]:   CPUID Info: 
> [2.020747] [Hardware Error]:   : 00800f12  00400800 
>  
> [2.025595] [Hardware Error]:   0010: 76d8320b  178bfbff 
>  
> [2.030423] [Hardware Error]:   0020:    
>  
> [2.035198] [Hardware Error]:   Error Information Structure 0:
> [2.039961] [Hardware Error]:Error Structure Type: 
> a55701f5-e3ef-43de-ac72-249b573fad2c
> [2.049608] [Hardware Error]:Error Structure Type: cache error
> [2.054344] [Hardware Error]:Validation Bits: 0x0001
> [2.059046] [Hardware Error]:Check Information: 0x20540087
> [2.063625] [Hardware Error]: Validation Bits: 0x0087
> [2.068032] [Hardware Error]: Transaction Type: 0, Instruction
> [2.072423] [Hardware Error]: Operation: 5, instruction fetch
> [2.076776] [Hardware Error]: Level: 1
> [2.081073] [Hardware Error]: Overflow: true
> [2.085360] [Hardware Error]:   Context Information Structure 0:
> [2.089661] [Hardware Error]:Register Context Type: MSR Registers 
> (Machine Check and other MSRs)
> [2.098487] [Hardware Error]:Register Array Size: 0x0050
> [2.103113] [Hardware Error]:MSR Address: 0xc0002011
> [2.107742] [Hardware Error]:Register Array:
> [2.112270] [Hardware Error]:: d82a0151 
> 
> [2.116845] [Hardware Error]:0010: d010 
> 00030031
> [2.121228] [Hardware Error]:0020: 000100b0 
> 4a00
> [2.125514] [Hardware Error]:0030:  
> 
> [2.129747] [Hardware Error]:0040:  
> 

Lemme simplify that error record:

[Hardware Error]:  Corrected Processor Error
[Hardware Error]:   APIC_ID: 0x0 | CPUID: 0x17|0x1|0x2
[Hardware Error]:Type: cache error during instruction fetch
[Hardware Error]:cache level 1
[Hardware Error]:Overflow: true

See how much more readable it got! And it is only 5 lines. I can make it
even smaller.

If it were a critical, uncorrectable error, every line counts: imagine
you do the above fat record and the machine freezes at line 5.

Now, I admit that my vesion of the record is not enough to debug it
but it needs to contain only information which is clear and humanly
readable to debug. You can always dump the raw data underneath from the
tracepoint but make the beginning human readable.

Do you know what users say about your error record?

"Err, it says hardware error, is my machine broken? I need to replace my
CPU."

I read that on a weekly basis.

Do you know how expensive support calls are about such errors which are
completely unreadable to people? 20 engineers need to get on a call to
realize it was a dumb correctable error? Btw, this is one of the reasons
why we did the error collector.

So put yourself in the users' shoes, look at the error record and think
hard whether the information displayed is readable to humans.

Btw, decode_error_status() in mce_amd.c is an attempt to explain the
error severity - note the "no action required." thing. It is still not
good enough - people still throw hands in the air and run in headless
chicken mode.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/8] efi: Decode IA32/X64 Processor Error Section

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 06:06:21PM +, Ghannam, Yazen wrote:
> It's not just about arches but record types. A single platform can report
> using arch-specific records, memory records, PCIe records, etc.
> 
> So all the other record types only show fields with a valid bit set and this
> has always been the case. There may be users or tools who expect that
> same behavior.

You know we have this thing called tracepoints for that, don't you?

You define one tracepoint which regurgitates an error record and then
all arches which have the same table, share them.

> Please see the other patches where these are decoded further. As I
> mentioned the cover letter, the decoding is applied incrementally rather
> than having one large patch.

Here's what needs to go:

All the validation bits crap:

printk("%sValidation Bits: 0x%04x\n", pfx, validation_bits);

Instead, you simply dump all entries unconditionally and say whether
they're valid or not. This way you have the same format for each error
record - much easier to work with.

printk("%sCPUID Info:\n", pfx);
print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, proc->cpuid,
   sizeof(proc->cpuid), 0);

Frankly, I have no clue why we should even dump CPUID for *every* error.
It is completely sufficient to dump family/model/stepping like we do in
mce_amd.c. Besides, when debugging, you collect much more info from the
system - CPUID only is not enough.

printk("%sError Structure Type: %pUl\n", newpfx,
   _info->err_type);

Unreadable GUIDs.

What are those even:

if (err_info->validation_bits & INFO_VALID_TARGET_ID) {
printk("%sTarget Identifier: 0x%016llx\n",
   newpfx, err_info->target_id);
}

if (err_info->validation_bits & INFO_VALID_REQUESTOR_ID) {
printk("%sRequestor Identifier: 0x%016llx\n",
   newpfx, err_info->requestor_id);
}

if (err_info->validation_bits & INFO_VALID_RESPONDER_ID) {
printk("%sResponder Identifier: 0x%016llx\n",
   newpfx, err_info->responder_id);
}

some 8-byte ids? Who knows... Either remove them or make them
human-readable, *explaining* what the requestor is and the target is and
so on.

printk("%sRegister Array Size: 0x%04x\n", newpfx,
   ctx_info->reg_arr_size);

I have no clue what that is for.

if (ctx_info->reg_ctx_type == CTX_TYPE_MSR) {
groupsize = 8; /* MSRs are 8 bytes wide. */
printk("%sMSR Address: 0x%08x\n", newpfx,
   ctx_info->msr_addr);
}

What do I need the MSR *addresses* for?

if (ctx_info->reg_ctx_type == CTX_TYPE_MMREG) {
printk("%sMM Register Address: 0x%016llx\n", newpfx,
   ctx_info->mm_reg_addr);
}

memory mapped registers?!?!

> Also, like I said in another thread, we should always print the raw value
> followed by the decoding. That way the raw info is there in case a user
> wants to send the data to the vendor or other expert party.

By that logic we shouldn't be decoding at all - we should be dumping a
fat hex blob.

Again, there's this thing called tracepoints which have exactly that,
you know. You can dump human readable info to dmesg and dump a whole raw
record into the tracepoint.

When the error is critical and we're about to die, the tracepoint
contents goes to dmesg too.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 05:46:54PM +, Ghannam, Yazen wrote:
> I think there's value in following the conventions in a subsystem.

"conventions in a subsystem" my ass. That's brainless copy-pasting.

It was added by

f6edea77c8c8 ("ACPI, APEI, CPER: Cleanup CPER memory error output format")

and then replicated everywhere.

It is simply a dumb way of writing:

snprintf(newpfx, sizeof(newpfx), "%s ", pfx);


> I can change this if you give a reason besides "it's dumb".

Two can play that game: you get to keep it if you give a good reason
why.

> We do map the spec-defined GUIDs in patch 4 of this set. I don't know if 
> there's
> a central place where all vendor-defined GUIDs are listed. I can look into 
> this.

Yes, at least for the most prominent ones.

> And the raw value should still be printed because
> 1) It may represent a type that we can't decode. Maybe a type that's not part 
> of
> the spec.

If we can't decode it, *then* you dump it:

"Unrecognized type: 0x%llx ..."

> 2) It's good to have the raw value for reference. We do this with MCA_STATUS
> where we print the raw value followed by the decoding.

1. No one stares at the raw value if the bits are decoded
2. MCA_STATUS is one register - this error record is huge.

> The structs are all different even though some fields may be the same.

Fair enough. Only if it makes sense.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/8] efi: Decode IA32/X64 Processor Error Section

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 05:27:39PM +, Ghannam, Yazen wrote:
> Sure, we can print the fields unconditionally if Ard is okay with that.
> 
> The issue is that the x86 behavior will be different from all the others, so 
> that
> might be confusing.

Confusing for whom?

Are we sharing tools with other arches or what am I missing?

> This set does decode everything fully so that people can read the error.

No, it doesn't. It dumps

printk("%sValidation Bits: 0x%016llx\n", pfx, proc->validation_bits);

printk("%sError Structure Type: %pUl\n", newpfx,
_info->err_type);

printk("%sCheck Information: 0x%016llx\n", newpfx,
 err_info->check_info);

and so on which are half-baked.

Think of it this way: if you look at the error record and you still need
to look at the spec to decode it, then it is still not good enough.

Users don't care about validation bits, or unreadable GUIDs or WTF is
"Check Information" even?

"This section details the layout of the Processor Error Information
Structure and the detailed check information which is contained within."

And that "Check Information" thing is mentioned only twice in the whole
spec.

StructureErrorType only there in the table.

Very informative that.

So no, users don't care about any of that internal crap - they wanna
know what is wrong with their box and that should be written in plain
english.

> I already mentioned in the Context info patch that there could be
> further decoding which we can do in the future.

Then decode only those pieces fully now which you can do now and when
you add something in the future, add it then with the full decoding
functionality. No fields which need additional decoding with the spec
opened on one side.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/8] efi: Decode IA32/X64 Cache, TLB, and Bus Check structures

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 03:33:44PM +, Ghannam, Yazen wrote:
> I agree which is why I've included the Validation Bits. A user can then
> check the Validation Bits for any field that is of interest but missing.

No, no half-baked fields.

So you know drivers/edac/mce_amd.c, right? And you know the lengths we
go to make an error record as readable as possible?

So please apply the same logic and intent here too.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 03:25:06PM +, Ghannam, Yazen wrote:
> This is the same as the other CPER code.

Dude, turn on brain!

So if it is in the other CPER code, we should copy it, no matter how
dumb it is?!?

> Also, the spec allows platform-defined GUIDs. So we should always print this
> even if the GUID is not defined by the spec.

We need to have a way to map the GUID to a hw part. Dumb numbers mean shit
because the error record is worthless.

> The Check Information will be decoded further in another patch.
> 
> I don't think there's much else to decode for the ID fields.

Again, those numbers can't help decoding the error, no need to dump
them. Or we find a way to make sense out of that info.

> Other tables may have the same fields but the offsets are usually different.
> So it may be more trouble than it's worth trying to unify the different 
> tables.

If the structs are the same, you can use generic functions for dumping -
the offsets are meaningless then.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/8] efi: Decode IA32/X64 Processor Error Section

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 03:13:11PM +, Ghannam, Yazen wrote:
> I decided to print the Validation Bits as a sanity check for whomever is 
> looking
> at this. Since we only print fields with a valid bit, it may be confusing for 
> users
> who don't know why fields are missing.

I suggested what to do about that already: print the fields unconditionally.

> Also, I don't think we should be interpreting the spec for the user. We should
> print the fields as they and users can refer back to the spec for more 
> information.

This is a very very bad idea.

You can just as well dump binary blobs and then add a user script which
deciphers that. Oh wait, we had that, it is called mcelog and it is a
huge pain trying to decode an error properly.

By that same logic, we could simply dump 64-bit MCA MSR values.

And everytime we get an error, we have to go dig out the spec. Hell no!
We had that, we know it is awful, we won't have it again.

You want to decode everything and fully in the kernel so that you can
have a ready error record which people can simply *read* from dmesg and
know what the error is.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/8] efi: Decode IA32/X64 Cache, TLB, and Bus Check structures

2018-02-27 Thread Borislav Petkov
On Mon, Feb 26, 2018 at 01:39:01PM -0600, Yazen Ghannam wrote:
> +static void print_err_info(const char *pfx, u8 err_type, u64 check)
> +{
> + u16 validation_bits = CHECK_VALID_BITS(check);
> +
> + printk("%sValidation Bits: 0x%04x\n", pfx, validation_bits);
> +
> + if (err_type == ERR_TYPE_MS)
> + return;
> +
> + if (validation_bits & CHECK_VALID_TRANS_TYPE) {
> + u8 trans_type = CHECK_TRANS_TYPE(check);
> +
> + printk("%sTransaction Type: %u, %s\n", pfx, trans_type,
> +trans_type < ARRAY_SIZE(ia_check_trans_type_strs) ?
> +ia_check_trans_type_strs[trans_type] : "unknown");
> + }
> +
> + if (validation_bits & CHECK_VALID_OPERATION) {
> + u8 op = CHECK_OPERATION(check);
> +
> + /*
> +  * CACHE has more operation types than TLB or BUS, though the
> +  * name and the order are the same.
> +  */
> + u8 max_ops = (err_type == ERR_TYPE_CACHE) ? 9 : 7;
> +
> + printk("%sOperation: %u, %s\n", pfx, op,
> +op < max_ops ? ia_check_op_strs[op] : "unknown");
> + }
> +
> + if (validation_bits & CHECK_VALID_LEVEL)
> + printk("%sLevel: %llu\n", pfx, CHECK_LEVEL(check));
> +
> + if (validation_bits & CHECK_VALID_PCC)
> + print_bool("Processor Context Corrupt", pfx, check, CHECK_PCC);

I think we want to print PCC here unconditionally and say:

PCC: (yes|no|invalid)

I don't think the absence of PCC in the error record is a good enough
hint that the PCC field is invalid.

Ditto for the rest and transaction type above too. I think it would be
much easier if we have fixed fields error record.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/8] efi: Decode UEFI-defined IA32/X64 Error Structure GUIDs

2018-02-27 Thread Borislav Petkov
On Mon, Feb 26, 2018 at 01:39:00PM -0600, Yazen Ghannam wrote:
> @@ -45,6 +81,11 @@ void cper_print_proc_ia(const char *pfx, const struct 
> cper_sec_proc_ia *proc)
>   printk("%sError Structure Type: %pUl\n", newpfx,
>  _info->err_type);
>  
> + err_type = cper_get_err_type(_info->err_type);
> + printk("%sError Structure Type: %s\n", newpfx,
> +err_type < ARRAY_SIZE(cper_proc_error_type_strs) ?
> +cper_proc_error_type_strs[err_type] : "unknown");

Ah, there it is, much better. Now this tells us what component it is.

So cper-arm.c does the same thing and there's a similar thing in cper.c -
cper_print_proc_generic().

Let's not tri-plicate that code pls and have a generic function instead.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Borislav Petkov
On Mon, Feb 26, 2018 at 01:38:59PM -0600, Yazen Ghannam wrote:
> From: Yazen Ghannam 
> 
> Print the fields in the IA32/X64 Processor Error Info Structure.
> 
> Based on UEFI 2.7 Table 256. IA32/X64 Processor Error Information
> Structure.
> 
> Signed-off-by: Yazen Ghannam 
> ---
> Link:
> https://lkml.kernel.org/r/20180223200333.6410-4-yazen.ghan...@amd.com
> 
> v1->v2:
> * Add parantheses around "bits" expression in macro.
> * Fix indentation on multi-line statements.
> 
>  drivers/firmware/efi/cper-x86.c | 53 
> +
>  1 file changed, 53 insertions(+)
> 
> diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c
> index 9da0d981178f..417bd4e500a7 100644
> --- a/drivers/firmware/efi/cper-x86.c
> +++ b/drivers/firmware/efi/cper-x86.c
> @@ -3,15 +3,28 @@
>  
>  #include 
>  
> +#define INDENT_SP" "

That's just silly.

> +
>  /*
>   * We don't need a "CPER_IA" prefix since these are all locally defined.
>   * This will save us a lot of line space.
>   */
>  #define VALID_LAPIC_ID   BIT_ULL(0)
>  #define VALID_CPUID_INFO BIT_ULL(1)
> +#define VALID_PROC_ERR_INFO_NUM(bits)(((bits) & GENMASK_ULL(7, 2)) 
> >> 2)
> +
> +#define INFO_VALID_CHECK_INFOBIT_ULL(0)
> +#define INFO_VALID_TARGET_ID BIT_ULL(1)
> +#define INFO_VALID_REQUESTOR_ID  BIT_ULL(2)
> +#define INFO_VALID_RESPONDER_ID  BIT_ULL(3)
> +#define INFO_VALID_IPBIT_ULL(4)
>  
>  void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc)
>  {
> + int i;
> + struct cper_ia_err_info *err_info;
> + char newpfx[64];
> +
>   printk("%sValidation Bits: 0x%016llx\n", pfx, proc->validation_bits);
>  
>   if (proc->validation_bits & VALID_LAPIC_ID)
> @@ -22,4 +35,44 @@ void cper_print_proc_ia(const char *pfx, const struct 
> cper_sec_proc_ia *proc)
>   print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, proc->cpuid,
>  sizeof(proc->cpuid), 0);
>   }
> +
> + snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
> +
> + err_info = (struct cper_ia_err_info *)(proc + 1);
> + for (i = 0; i < VALID_PROC_ERR_INFO_NUM(proc->validation_bits); i++) {
> + printk("%sError Information Structure %d:\n", pfx, i);
> +
> + printk("%sError Structure Type: %pUl\n", newpfx,
> +_info->err_type);

That dumps a GUID - how do I find out what type that GUID refers to?

> +
> + printk("%sValidation Bits: 0x%016llx\n",
> +newpfx, err_info->validation_bits);

As before, no need for those.

> +
> + if (err_info->validation_bits & INFO_VALID_CHECK_INFO) {
> + printk("%sCheck Information: 0x%016llx\n", newpfx,
> +err_info->check_info);
> + }
> +
> + if (err_info->validation_bits & INFO_VALID_TARGET_ID) {
> + printk("%sTarget Identifier: 0x%016llx\n",
> +newpfx, err_info->target_id);
> + }
> +
> + if (err_info->validation_bits & INFO_VALID_REQUESTOR_ID) {
> + printk("%sRequestor Identifier: 0x%016llx\n",
> +newpfx, err_info->requestor_id);
> + }
> +
> + if (err_info->validation_bits & INFO_VALID_RESPONDER_ID) {
> + printk("%sResponder Identifier: 0x%016llx\n",
> +newpfx, err_info->responder_id);
> + }

Those look like would need an additional decoding. Can we do that here
too?

> +
> + if (err_info->validation_bits & INFO_VALID_IP) {
> + printk("%sInstruction Pointer: 0x%016llx\n",
> +newpfx, err_info->ip);

The only thing that makes sense so far.

> + }
> +
> + err_info++;
> + }

Also, it is worth checking how much duplication there is with
drivers/firmware/efi/cper-arm.c and unifying the common pieces into
functions in cper-common.c or so.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/8] efi: Decode IA32/X64 Processor Error Section

2018-02-27 Thread Borislav Petkov
On Mon, Feb 26, 2018 at 01:38:58PM -0600, Yazen Ghannam wrote:
> + * We don't need a "CPER_IA" prefix since these are all locally defined.
> + * This will save us a lot of line space.
> + */
> +#define VALID_LAPIC_ID   BIT_ULL(0)
> +#define VALID_CPUID_INFO BIT_ULL(1)
> +
> +void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc)
> +{
> + printk("%sValidation Bits: 0x%016llx\n", pfx, proc->validation_bits);

Ok, so this...

> +
> + if (proc->validation_bits & VALID_LAPIC_ID)
> + printk("%sLocal APIC_ID: 0x%llx\n", pfx, proc->lapic_id);
> +
> + if (proc->validation_bits & VALID_CPUID_INFO) {
> + printk("%sCPUID Info:\n", pfx);
> + print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, proc->cpuid,
> +sizeof(proc->cpuid), 0);

... and this are semi-decoded information bits which I'd have to go open
the spec and continue decoding.

Can we please change the whole approach of not simply dumping such
fields but decode them fully. We want that error information to be
helpful to the user and she should be able to immediately understand
what type of error it is.

Validation Bits and a CPUID hexdump simply makes you go look at the spec
again and such dumps are only useful as a debugging aid but nothing
more.

The APIC ID dump is how this should be done - properly and fully decoded
error info which can be used immediately for diagnosing the error.

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/8] efi: Fix IA32/X64 Processor Error Record definition

2018-02-27 Thread Borislav Petkov
On Mon, Feb 26, 2018 at 01:38:57PM -0600, Yazen Ghannam wrote:
> From: Yazen Ghannam 
> 
> Based on UEFI 2.7 Table 255. Processor Error Record, the "Local APIC_ID"

My pdf says this is table 252.

> field is 8 bytes but Linux defines this field as 1 byte.
> 
> Fix this in the struct cper_sec_proc_ia definition.
> 
> Signed-off-by: Yazen Ghannam 
> ---
> Link:
> https://lkml.kernel.org/r/20180223200333.6410-2-yazen.ghan...@amd.com
> 
> v1->v2:
> * No changes.
> 
>  include/linux/cper.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/cper.h b/include/linux/cper.h
> index d14ef4e77c8a..4b5f8459b403 100644
> --- a/include/linux/cper.h
> +++ b/include/linux/cper.h
> @@ -381,7 +381,7 @@ struct cper_sec_proc_generic {
>  /* IA32/X64 Processor Error Section */
>  struct cper_sec_proc_ia {
>   __u64   validation_bits;
> - __u8lapic_id;
> + __u64   lapic_id;
>   __u8cpuid[48];

Ok, that processor error record has a variable length structure at byte
offset 64 and we don't have it in this struct.

I guess I'll see it in the following patches but right now it looks
like that "Processor Error Info" thing is simply situated after that
Processor Error Record so we are supposed to simply find the info at
offset 64...

/me continues reading...

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] efivars: reading variables can generate SMIs

2018-02-16 Thread Borislav Petkov
On Fri, Feb 16, 2018 at 10:48:32AM -0800, Joe Konno wrote:
>  We may see some other patches or RFCs about caching and/or shadowing
> variable values in efivarfs to reduce the number of direct EFI reads,
> with the goal of reducing how many SMIs are generated.

So if you do the caching scheme, the question about narrowing
permissions becomes moot...

> Any obvious EFI variables that userspace tools have come to depend on--
> those which normal, unprivileged users need to read-- are helpful inputs
> to this discussion.

... which solves the aspect of not breaking userspace nicely.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] efivars: reading variables can generate SMIs

2018-02-16 Thread Borislav Petkov
On Fri, Feb 16, 2018 at 10:58:47AM +, Ard Biesheuvel wrote:
> By your own reasoning above, that's a no-no as well.

I'm sure we can come up with some emulation - the same way we did the
BIOS emulation.

> But thanks for your input. Anyone else got something constructive to 
> contribute?

The not-breaking userspace is constructive contribution. The last
paragraph is my usual rant.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] efivars: reading variables can generate SMIs

2018-02-16 Thread Borislav Petkov
On Fri, Feb 16, 2018 at 10:41:45AM +, Ard Biesheuvel wrote:
> On 15 February 2018 at 18:22, Joe Konno  wrote:
> > From: Joe Konno 
> >
> > It was pointed out that normal, unprivileged users reading certain EFI
> > variables (through efivarfs) can generate SMIs. Given these nodes are 
> > created
> > with 0644 permissions, normal users could generate a lot of SMIs. By
> > restricting permissions a bit (patch 1), we can make it harder for normal 
> > users
> > to generate spurious SMIs.
> >
> > A normal user could generate lots of SMIs by reading the efivarfs in a 
> > trivial
> > loop:
> >
> > ```
> > while true; do
> > cat /sys/firmware/efi/evivars/* > /dev/null
> > done
> > ```
> >
> > Patch 1 in this series limits read and write permissions on efivarfs to the
> > owner/superuser. Group and world cannot access.
> >
> > Patch 2 is for consistency and hygiene. If we restrict permissions for 
> > either
> > efivarfs or efi/vars, the other interface should get the same treatment.
> >
> 
> I am inclined to apply this as a fix, but I will give the x86 guys a
> chance to respond as well.

That stinking pile EFI never ceases to amaze me...

Just one question: by narrowing permissions this way, aren't you
breaking some userspace which reads those?

And if you do, then that's a no-no.

Which then would mean that you'd have to come up with some caching
scheme to protect the firmware from itself.

Or we could simply admit that EFI is a piece of crap, kill it and
start anew, this time much more conservatively and not add a whole OS
underneath the actual OS.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Part1 PATCH v4 07/17] x86/efi: Access EFI data as encrypted when SEV is active

2017-09-17 Thread Borislav Petkov
On Sat, Sep 16, 2017 at 07:34:08AM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <thomas.lenda...@amd.com>
> 
> EFI data is encrypted when the kernel is run under SEV. Update the
> page table references to be sure the EFI memory areas are accessed
> encrypted.
> 
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: "H. Peter Anvin" <h...@zytor.com>
> Cc: Borislav Petkov <b...@suse.de>
> Cc: Andy Lutomirski <l...@kernel.org>
> Cc: Matt Fleming <m...@codeblueprint.co.uk>
> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
> Cc: linux-efi@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: x...@kernel.org
> Signed-off-by: Tom Lendacky <thomas.lenda...@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
> ---
>  arch/x86/platform/efi/efi_64.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)

Reviewed-by: Borislav Petkov <b...@suse.de>

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-09-15 Thread Borislav Petkov
On Fri, Sep 15, 2017 at 09:48:53AM -0500, Brijesh Singh wrote:
> I see the similar issue with non SEV guest with my simple patch below.
> Guest will reboot as soon as it tries to enable the key.

Can't do it there as the pagetable is not setup yet and you're probably
getting a #PF on any of the derefs down the static_key_enable() path.

I guess one possible place to enable the static key would be in
mem_encrypt_init() where everything should be set up already.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-09-15 Thread Borislav Petkov
On Fri, Sep 15, 2017 at 09:13:00AM -0500, Brijesh Singh wrote:
> thanks for the suggestion Boris, it will make patch much simpler.
> I will try this out.

It won't build - this was supposed to show the general idea.

I need to figure out the include hell first.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-09-04 Thread Borislav Petkov
On Fri, Sep 01, 2017 at 05:52:13PM -0500, Brijesh Singh wrote:
>  So far, we have not seen the need for having such functions except
> this cases. The approach we have right now works just fine and not
> sure if its worth adding new functions.

Then put the call to kvm_map_hv_shared_decrypted() into
kvm_smp_prepare_boot_cpu() to denote that you're executing this whole
stuff only once during guest init.

Now you're doing additional jumping-through-hoops with that once static
var just so you can force something which needs to execute only once but
gets called in a per-CPU path.

See what I mean?

> Thoughts ?
> 
> [1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of 
> memory

Add

[core]
abbrev = 12

to the core section of your .gitconfig.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 17/17] X86/KVM: Clear encryption attribute when SEV is active

2017-08-31 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:57PM -0500, Brijesh Singh wrote:
> The guest physical memory area holding the struct pvclock_wall_clock and
> struct pvclock_vcpu_time_info are shared with the hypervisor. Hypervisor
> periodically updates the contents of the memory. When SEV is active, we
> must clear the encryption attributes from the shared memory pages so that
> both hypervisor and guest can access the data.
> 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/entry/vdso/vma.c  |  5 ++--
>  arch/x86/kernel/kvmclock.c | 64 
> +++---
>  2 files changed, 58 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
> index 726355c..ff50251 100644
> --- a/arch/x86/entry/vdso/vma.c
> +++ b/arch/x86/entry/vdso/vma.c
> @@ -114,10 +114,11 @@ static int vvar_fault(const struct vm_special_mapping 
> *sm,
>   struct pvclock_vsyscall_time_info *pvti =
>   pvclock_pvti_cpu0_va();
>   if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
> - ret = vm_insert_pfn(
> + ret = vm_insert_pfn_prot(
>   vma,
>   vmf->address,
> - __pa(pvti) >> PAGE_SHIFT);
> + __pa(pvti) >> PAGE_SHIFT,
> + pgprot_decrypted(vma->vm_page_prot));
>   }
>   } else if (sym_offset == image->sym_hvclock_page) {
>   struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();

Yuck, that vvar_fault() function is one unreadable mess.

> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index d889676..f3a8101 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -45,7 +46,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
>  
>  /* The hypervisor will put information about time periodically here */
>  static struct pvclock_vsyscall_time_info *hv_clock;
> -static struct pvclock_wall_clock wall_clock;
> +static struct pvclock_wall_clock *wall_clock;
>  
>  struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
>  {
> @@ -64,15 +65,18 @@ static void kvm_get_wallclock(struct timespec *now)
>   int low, high;
>   int cpu;
>  
> - low = (int)__pa_symbol(_clock);
> - high = ((u64)__pa_symbol(_clock) >> 32);
> + if (!wall_clock)
> + return;

Hmm, so if you return here, @now will remain unchanged so how is the
caller to know that ->get_wallclock() failed?

Maybe a WARN_ON_ONCE() at least...?

Dunno, what's the policy in kvm if the kvmclock init fails?

Paolo? Radim?

Because it does say:

printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
msr_kvm_system_time, msr_kvm_wall_clock);

too early. We can error out later and users will still think it is using
kvmclock ...

Hmmm.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-08-30 Thread Borislav Petkov
On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
> I was trying to avoid mixing early and no-early set_memory_decrypted() but if
> feedback is: use early_set_memory_decrypted() only if its required otherwise
> use set_memory_decrypted() then I can improve the logic in next rev. thanks

Yes, I think you should use the early versions when you're, well,
*early* :-) But get rid of that for_each_possible_cpu() and do it only
on the current CPU, as this is a per-CPU path anyway. If you need to
do it on *every* CPU and very early, then you need a separate function
which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-08-29 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:56PM -0500, Brijesh Singh wrote:
> Some KVM specific MSR's (steal-time, asyncpf, avic_eio) allocates per-CPU

   MSRs

> variable at compile time and share its physical address with hypervisor.

That sentence needs changing - the MSRs don't allocate - for them gets
allocated.

> It presents a challege when SEV is active in guest OS, when SEV is active,
> the guest memory is encrypted with guest key hence hypervisor will not
> able to modify the guest memory. When SEV is active, we need to clear the
> encryption attribute (aka C-bit) of shared physical addresses so that both
> guest and hypervisor can access the data.

This whole paragraph needs rewriting.

> To solve this problem, I have tried these three options:
> 
> 1) Convert the static per-CPU to dynamic per-CPU allocation and when SEV
> is detected clear the C-bit from the page table. But while doing so I
> found that per-CPU dynamic allocator was not ready when kvm_guest_cpu_init
> was called.
> 
> 2) Since the C-bit works on PAGE_SIZE hence add some extra padding to
> 'struct kvm-steal-time' to make it PAGE_SIZE and then at runtime

"to make it PAGE_SIZE"?

I know what it means but it reads strange and needs more restraint when
rewriting it. :)

> clear the encryption attribute of the full PAGE. The downside of this -
> we need to modify structure which may break the compatibility.
> 
> 3) Define a new per-CPU section (.data..percpu.hv_shared) which will be
> used to hold the compile time shared per-CPU variables. When SEV is
> detected we map this section without C-bit.
> 
> This patch implements #3.

>From Documentation/process/submitting-patches.rst:

 "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
  instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
  to do frotz", as if you are giving orders to the codebase to change
  its behaviour."

Also, never say "This patch" in a commit message of a patch. It is
tautologically useless.

> It introduces a new DEFINE_PER_CPU_HV_SHAHRED

There's no DEFINE_PER_CPU_HV_SHAHRED. Typo.

> macro to create a compile time per-CPU variable. When SEV is detected we
> clear the C-bit from the shared per-CPU variable.
> 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/kernel/kvm.c | 46 
> ---
>  include/asm-generic/vmlinux.lds.h |  3 +++
>  include/linux/percpu-defs.h   | 12 ++
>  3 files changed, 58 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 71c17a5..1f6fec8 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg)
>  
>  early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
>  
> -static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
> -static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
> +static DEFINE_PER_CPU_HV_SHARED(struct kvm_vcpu_pv_apf_data, apf_reason) 
> __aligned(64);
> +static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, steal_time) 
> __aligned(64);
>  static int has_steal_clock = 0;
>  
>  /*
> @@ -303,7 +303,7 @@ static void kvm_register_steal_time(void)
>   cpu, (unsigned long long) slow_virt_to_phys(st));
>  }
>  
> -static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
> +static DEFINE_PER_CPU_HV_SHARED(unsigned long, kvm_apic_eoi) = 
> KVM_PV_EOI_DISABLED;
>  
>  static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
>  {
> @@ -319,11 +319,51 @@ static notrace void kvm_guest_apic_eoi_write(u32 reg, 
> u32 val)
>   apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
>  }
>  
> +/* NOTE: function is marked as __ref because it is used by __init functions 
> */

No need for that comment.

What should you look into is why do you need to call the early versions:

" * producing a warning (of course, no warning does not mean code is
 * correct, so optimally document why the __ref is needed and why it's OK)."

And we do have the normal set_memory_decrypted() etc helpers so why
aren't we using those?

If you need to use the early ones too, then you probably need to
differentiate this in the callers by passing a "bool early", which calls
the proper flavor.

> +static int __ref kvm_map_hv_shared_decrypted(void)
> +{
> + static int once, ret;
> + int cpu;
> +
> + if (once)
> + return ret;

So this function gets called per-CPU but you need to do this ugly "once"
thing - i.e., global function called in a per-CPU context.

Why can't you do that mapping only on the current CPU and then
when that function is called on the next CPU, it will do the same thing
on that next CPU?

> + /*
> +  * Iterate through all possible CPU's and clear the C-bit from
> +  * percpu variables.
> +  */
> + for_each_possible_cpu(cpu) {
> + struct kvm_vcpu_pv_apf_data *apf;
> +   

Re: [RFC Part1 PATCH v3 15/17] x86: Add support for changing memory encryption attribute in early boot

2017-08-28 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:55PM -0500, Brijesh Singh wrote:
> Some KVM-specific custom MSRs shares the guest physical address with

s/shares/share/

> hypervisor.

"the hypervisor."

> When SEV is active, the shared physical address must be mapped
> with encryption attribute cleared so that both hypervsior and guest can
> access the data.
> 
> Add APIs to change memory encryption attribute in early boot code.
> 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/include/asm/mem_encrypt.h |  17 ++
>  arch/x86/mm/mem_encrypt.c  | 117 
> +
>  2 files changed, 134 insertions(+)

...

> +static int __init early_set_memory_enc_dec(resource_size_t paddr,
> +unsigned long size, bool enc)
> +{
> + unsigned long vaddr, vaddr_end, vaddr_next;
> + unsigned long psize, pmask;
> + int split_page_size_mask;
> + pte_t *kpte;
> + int level;
> +
> + vaddr = (unsigned long)__va(paddr);
> + vaddr_next = vaddr;
> + vaddr_end = vaddr + size;
> +
> + /*
> +  * We are going to change the physical page attribute from C=1 to C=0
> +  * or vice versa. Flush the caches to ensure that data is written into
> +  * memory with correct C-bit before we change attribute.
> +  */
> + clflush_cache_range(__va(paddr), size);
> +
> + for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> + kpte = lookup_address(vaddr, );
> + if (!kpte || pte_none(*kpte))
> + return 1;

Return before flushing TLBs? Perhaps you mean

ret = 1;
goto out;

here and out does

__flush_tlb_all();
return ret;

?

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 14/17] x86/boot: Add early boot support when running with SEV active

2017-08-25 Thread Borislav Petkov
Btw,

I don't see our SEV-specific chicken bit which disables SEV only.

Do we need it? If so, maybe something like

mem_encrypt=sme_only

or so.

Or is the mem_encrypt=off chicken bit enough?

What about the use case where you want SME but no encrypted guests?

A bunch of hmmm.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 14/17] x86/boot: Add early boot support when running with SEV active

2017-08-23 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:54PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> Early in the boot process, add checks to determine if the kernel is
> running with Secure Encrypted Virtualization (SEV) active.
> 
> Checking for SEV requires checking that the kernel is running under a
> hypervisor (CPUID 0x0001, bit 31), that the SEV feature is available
> (CPUID 0x801f, bit 1) and then check a non-interceptable SEV MSR
> (0xc0010131, bit 0).
> 
> This check is required so that during early compressed kernel booting the
> pagetables (both the boot pagetables and KASLR pagetables (if enabled) are
> updated to include the encryption mask so that when the kernel is
> decompressed into encrypted memory.

, it can boot properly.

:)

> After the kernel is decompressed and continues booting the same logic is
> used to check if SEV is active and set a flag indicating so.  This allows
> us to distinguish between SME and SEV, each of which have unique
> differences in how certain things are handled: e.g. DMA (always bounce
> buffered with SEV) or EFI tables (always access decrypted with SME).
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/boot/compressed/Makefile  |   2 +
>  arch/x86/boot/compressed/head_64.S |  16 +
>  arch/x86/boot/compressed/mem_encrypt.S | 103 
> +
>  arch/x86/boot/compressed/misc.h|   2 +
>  arch/x86/boot/compressed/pagetable.c   |   8 ++-
>  arch/x86/include/asm/mem_encrypt.h |   3 +
>  arch/x86/include/asm/msr-index.h   |   3 +
>  arch/x86/include/uapi/asm/kvm_para.h   |   1 -
>  arch/x86/mm/mem_encrypt.c  |  42 +++---
>  9 files changed, 169 insertions(+), 11 deletions(-)
>  create mode 100644 arch/x86/boot/compressed/mem_encrypt.S
> 
> diff --git a/arch/x86/boot/compressed/Makefile 
> b/arch/x86/boot/compressed/Makefile
> index 2c860ad..d2fe901 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -72,6 +72,8 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o 
> $(obj)/misc.o \
>   $(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
>   $(obj)/piggy.o $(obj)/cpuflags.o
>  
> +vmlinux-objs-$(CONFIG_X86_64) += $(obj)/mem_encrypt.o

There's a

ifdef CONFIG_X86_64

a couple of lines below. Just put it there.

...

> +++ b/arch/x86/boot/compressed/mem_encrypt.S
> @@ -0,0 +1,103 @@
> +/*
> + * AMD Memory Encryption Support
> + *
> + * Copyright (C) 2017 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> + .text
> + .code32
> +ENTRY(get_sev_encryption_bit)
> + xor %eax, %eax
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + push%ebx
> + push%ecx
> + push%edx
> +
> + /* Check if running under a hypervisor */
> + movl$1, %eax
> + cpuid
> + bt  $31, %ecx   /* Check the hypervisor bit */
> + jnc .Lno_sev
> +
> + movl$0x8000, %eax   /* CPUID to check the highest leaf */
> + cpuid
> + cmpl$0x801f, %eax   /* See if 0x801f is available */
> + jb  .Lno_sev
> +
> + /*
> +  * Check for the SEV feature:
> +  *   CPUID Fn8000_001F[EAX] - Bit 1
> +  *   CPUID Fn8000_001F[EBX] - Bits 5:0
> +  * Pagetable bit position used to indicate encryption
> +  */
> + movl$0x801f, %eax
> + cpuid
> + bt  $1, %eax/* Check if SEV is available */
> + jnc .Lno_sev
> +
> + movl$MSR_F17H_SEV, %ecx /* Read the SEV MSR */
> + rdmsr
> + bt  $MSR_F17H_SEV_ENABLED_BIT, %eax /* Check if SEV is active */
> + jnc .Lno_sev
> +
> + /*
> +  * Get memory encryption information:
> +  */

The side-comment is enough. This one above can go.

> + movl%ebx, %eax
> + andl$0x3f, %eax /* Return the encryption bit location */
> + jmp .Lsev_exit
> +
> +.Lno_sev:
> + xor %eax, %eax
> +
> +.Lsev_exit:
> + pop %edx
> + pop %ecx
> + pop %ebx
> +
> +#endif   /* CONFIG_AMD_MEM_ENCRYPT */
> +
> + ret
> +ENDPROC(get_sev_encryption_bit)
> +
> + .code64
> +ENTRY(get_sev_encryption_mask)
> + xor %rax, %rax
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + push%rbp
> + push%rdx
> +
> + movq%rsp, %rbp  /* Save current stack pointer */
> +
> + callget_sev_encryption_bit  /* Get the encryption bit position */

So we get to call get_sev_encryption_bit() here again and noodle through
the CPUID discovery and 

Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-08-22 Thread Borislav Petkov
On Wed, Jul 26, 2017 at 03:07:14PM -0500, Brijesh Singh wrote:
> Are you commenting on amount of code duplication ? If so, I can certainly 
> improve
> and use the similar macro used into header file to generate the functions 
> body.

So the argument about having CONFIG_AMD_MEM_ENCRYPT disabled doesn't
bring a whole lot because distro kernels will all have it enabled.

Optimally, it would be best if when SEV is enabled, we patch those IO
insns but we can't patch at arbitrary times - we just do it once, at
pre-SMP time.

And from looking at the code, we do set sev_enabled very early, as
part of __startup_64() -> sme_enable() so I guess we can make that
set a synthetic X86_FEATURE_ bit and then patch REP IN/OUT* with a
CALL, similar to what we do in arch/x86/include/asm/arch_hweight.h with
POPCNT.

But there you need to pay attention to registers being clobbered, see

  f5967101e9de ("x86/hweight: Get rid of the special calling convention")

Yap, it does sound a bit more complex but if done right, we will be
patching all call sites the same way we patch hweight*() calls and there
should be no change to kernel size...

As always, the devil is in the detail.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 11/17] x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory pages

2017-08-01 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:51PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> In order for memory pages to be properly mapped when SEV is active, we
> need to use the PAGE_KERNEL protection attribute as the base protection.
> This will insure that memory mapping of, e.g. ACPI tables, receives the
> proper mapping attributes.
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/mm/ioremap.c  | 28 
>  include/linux/ioport.h |  3 +++
>  kernel/resource.c  | 17 +
>  3 files changed, 48 insertions(+)
> 
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index c0be7cf..7b27332 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -69,6 +69,26 @@ static int __ioremap_check_ram(unsigned long start_pfn, 
> unsigned long nr_pages,
>   return 0;
>  }
>  
> +static int __ioremap_res_desc_other(struct resource *res, void *arg)
> +{
> + return (res->desc != IORES_DESC_NONE);
> +}
> +
> +/*
> + * This function returns true if the target memory is marked as
> + * IORESOURCE_MEM and IORESOURCE_BUSY and described as other than
> + * IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
> + */
> +static bool __ioremap_check_if_mem(resource_size_t addr, unsigned long size)
> +{
> + u64 start, end;
> +
> + start = (u64)addr;
> + end = start + size - 1;
> +
> + return (walk_mem_res(start, end, NULL, __ioremap_res_desc_other) == 1);
> +}
> +
>  /*
>   * Remap an arbitrary physical address space into the kernel virtual
>   * address space. It transparently creates kernel huge I/O mapping when
> @@ -146,7 +166,15 @@ static void __iomem *__ioremap_caller(resource_size_t 
> phys_addr,
>   pcm = new_pcm;
>   }
>  
> + /*
> +  * If the page being mapped is in memory and SEV is active then
> +  * make sure the memory encryption attribute is enabled in the
> +  * resulting mapping.
> +  */
>   prot = PAGE_KERNEL_IO;
> + if (sev_active() && __ioremap_check_if_mem(phys_addr, size))
> + prot = pgprot_encrypted(prot);

Hmm, so this function already does walk_system_ram_range() a bit
earlier and now on SEV systems we're going to do it again. Can we make
walk_system_ram_range() return a distinct value for SEV systems and act
accordingly in __ioremap_caller() instead of repeating the operation?

It looks to me like we could...

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 09/17] resource: Consolidate resource walking code

2017-07-28 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:49PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> The walk_iomem_res_desc(), walk_system_ram_res() and walk_system_ram_range()
> functions each have much of the same code.  Create a new function that
> consolidates the common code from these functions in one place to reduce
> the amount of duplicated code.
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  kernel/resource.c | 53 ++---
>  1 file changed, 26 insertions(+), 27 deletions(-)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 9b5f044..7b20b3e 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -397,9 +397,30 @@ static int find_next_iomem_res(struct resource *res, 
> unsigned long desc,
>   res->start = p->start;
>   if (res->end > p->end)
>   res->end = p->end;
> + res->desc = p->desc;
>   return 0;

I must be going blind: where are we using that res->desc?

> +static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
> +  bool first_level_children_only,

Btw, that variable name is insanely long.

The rest looks ok to me, thanks for the cleanup!

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 08/17] x86/efi: Access EFI data as encrypted when SEV is active

2017-07-28 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:48PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> EFI data is encrypted when the kernel is run under SEV. Update the
> page table references to be sure the EFI memory areas are accessed
> encrypted.
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/platform/efi/efi_64.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index 12e8388..1ecb3f6 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -369,7 +370,10 @@ int __init efi_setup_page_tables(unsigned long 
> pa_memmap, unsigned num_pages)
>* as trim_bios_range() will reserve the first page and isolate it away
>* from memory allocators anyway.
>*/
> - if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
> + pf = _PAGE_RW;
> + if (sev_active())
> + pf |= _PAGE_ENC;

\n here

> + if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
>   pr_err("Failed to create 1:1 mapping for the first page!\n");
>   return 1;
>   }
> @@ -412,6 +416,9 @@ static void __init __map_region(efi_memory_desc_t *md, 
> u64 va)
>   if (!(md->attribute & EFI_MEMORY_WB))
>   flags |= _PAGE_PCD;
>  
> + if (sev_active())
> + flags |= _PAGE_ENC;
> +
>   pfn = md->phys_addr >> PAGE_SHIFT;
>   if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
>   pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
> @@ -511,6 +518,9 @@ static int __init efi_update_mappings(efi_memory_desc_t 
> *md, unsigned long pf)
>   pgd_t *pgd = efi_pgd;
>   int err1, err2;
>  
> + if (sev_active())
> + pf |= _PAGE_ENC;

Move this assignment to the caller efi_update_mem_attr() where pf is being
set...

> +
>   /* Update the 1:1 mapping */
>   pfn = md->phys_addr >> PAGE_SHIFT;
>   err1 = kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, md->num_pages, 
> pf);
> @@ -589,6 +599,9 @@ void __init efi_runtime_update_mappings(void)
>   (md->type != EFI_RUNTIME_SERVICES_CODE))
>   pf |= _PAGE_RW;
>  
> + if (sev_active())
> + pf |= _PAGE_ENC;

... just like here.

> +
>   efi_update_mappings(md, pf);

In general, I'm not totally excited about that sprinkling of if
(sev_active())... :-\

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 07/17] x86/mm: Include SEV for encryption memory attribute changes

2017-07-27 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:47PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> The current code checks only for sme_active() when determining whether
> to perform the encryption attribute change.  Include sev_active() in this
> check so that memory attribute changes can occur under SME and SEV.
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/mm/pageattr.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index dfb7d65..b726b23 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int 
> numpages, bool enc)
>   unsigned long start;
>   int ret;
>  
> - /* Nothing to do if the SME is not active */
> - if (!sme_active())
> + /* Nothing to do if SME and SEV are not active */
> + if (!sme_active() && !sev_active())

This is the second place which does

if (!SME && !SEV)

I wonder if, instead of sprinking those, we should have a

if (mem_enc_active())

or so which unifies all those memory encryption logic tests and makes
the code more straightforward for readers who don't have to pay
attention to SME vs SEV ...

Just a thought.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 03/17] x86/mm: Secure Encrypted Virtualization (SEV) support

2017-07-27 Thread Borislav Petkov
On Wed, Jul 26, 2017 at 11:47:32AM -0500, Tom Lendacky wrote:
> If it's made static then the sme_active()/sev_active() inline functions
> would need to be turned into functions within the mem_encrypt.c file. So
> there's a trade-off to do that, which is the better one?

Simple: why do we have functions if the variables are exported?

The reasoning for sme_me_mask is more or less obvious but for sev_enabled...

IOW, either make the bool static and unlinine the function - this way
you're free to change how you determine whether SEV is enabled later as
callers will be using the function.

Or, if it doesn't really matter because you can always change callers
later, simply drop sev_active() the function and use a bool sev_active
everywhere.

> The kernel needs to distinguish between running under SME and running
> under SEV. SME and SEV are similar but not the same. The trampoline code
> is a good example.  Before paging is activated, SME will access all
> memory as decrypted, but SEV will access all memory as encrypted.  So
> when APs are being brought up under SME the trampoline area cannot be
> encrypted, whereas under SEV the trampoline area must be encrypted.

I guess you're sensing by now that we need this clarification in a
comment above it...

:-)

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 06/17] x86/mm: Use encrypted access of boot related data with SEV

2017-07-27 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:46PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> When Secure Encrypted Virtualization (SEV) is active, boot data (such as
> EFI related data, setup data) is encrypted and needs to be accessed as
> such when mapped. Update the architecture override in early_memremap to
> keep the encryption attribute when mapping this data.
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/mm/ioremap.c | 44 
>  1 file changed, 32 insertions(+), 12 deletions(-)

...

> @@ -590,10 +598,15 @@ bool arch_memremap_can_ram_remap(resource_size_t 
> phys_addr, unsigned long size,
>   if (flags & MEMREMAP_DEC)
>   return false;
>  
> - if (memremap_is_setup_data(phys_addr, size) ||
> - memremap_is_efi_data(phys_addr, size) ||
> - memremap_should_map_decrypted(phys_addr, size))
> - return false;
> + if (sme_active()) {
> + if (memremap_is_setup_data(phys_addr, size) ||
> + memremap_is_efi_data(phys_addr, size) ||
> + memremap_should_map_decrypted(phys_addr, size))
> + return false;
> + } else if (sev_active()) {
> + if (memremap_should_map_decrypted(phys_addr, size))
> + return false;
> + }
>  
>   return true;
>  }

I guess this function's hind part can be simplified to:

if (sme_active()) {
if (memremap_is_setup_data(phys_addr, size) ||
memremap_is_efi_data(phys_addr, size))
return false;
}

return ! memremap_should_map_decrypted(phys_addr, size);
}

> @@ -608,15 +621,22 @@ pgprot_t __init 
> early_memremap_pgprot_adjust(resource_size_t phys_addr,
>unsigned long size,
>pgprot_t prot)

And this one in a similar manner...

>  {
> - if (!sme_active())
> + if (!sme_active() && !sev_active())
>   return prot;

... and you don't need that check...

> - if (early_memremap_is_setup_data(phys_addr, size) ||
> - memremap_is_efi_data(phys_addr, size) ||
> - memremap_should_map_decrypted(phys_addr, size))
> - prot = pgprot_decrypted(prot);
> - else
> - prot = pgprot_encrypted(prot);
> + if (sme_active()) {

... if you're going to do it here too.

> + if (early_memremap_is_setup_data(phys_addr, size) ||
> + memremap_is_efi_data(phys_addr, size) ||
> + memremap_should_map_decrypted(phys_addr, size))
> + prot = pgprot_decrypted(prot);
> + else
> + prot = pgprot_encrypted(prot);
> + } else if (sev_active()) {

And here.

> + if (memremap_should_map_decrypted(phys_addr, size))
> + prot = pgprot_decrypted(prot);
> + else
> + prot = pgprot_encrypted(prot);
> + }

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 05/17] x86, realmode: Don't decrypt trampoline area under SEV

2017-07-26 Thread Borislav Petkov
 Subject: x86/realmode: ...

On Mon, Jul 24, 2017 at 02:07:45PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> When SEV is active the trampoline area will need to be in encrypted
> memory so only mark the area decrypted if SME is active.
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/realmode/init.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 1f71980..c7eeca7 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -63,9 +63,11 @@ static void __init setup_real_mode(void)
>   /*
>* If SME is active, the trampoline area will need to be in
>* decrypted memory in order to bring up other processors
> -  * successfully.
> +  * successfully. For SEV the trampoline area needs to be in
> +  * encrypted memory, so only do this for SME.

Or simply say:

"It is not needed for SEV."

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 04/17] x86/mm: Don't attempt to encrypt initrd under SEV

2017-07-26 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:44PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> When SEV is active the initrd/initramfs will already have already been
> placed in memory encyrpted so do not try to encrypt it.
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/kernel/setup.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 0bfe0c1..01d56a1 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -379,9 +379,11 @@ static void __init reserve_initrd(void)
>* If SME is active, this memory will be marked encrypted by the
>* kernel when it is accessed (including relocation). However, the
>* ramdisk image was loaded decrypted by the bootloader, so make
> -  * sure that it is encrypted before accessing it.
> +  * sure that it is encrypted before accessing it. For SEV the
> +  * ramdisk will already be encyrpted, so only do this for SME.
   ^
Please introduce a spellchecker into your patch creation workflow.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

2017-07-25 Thread Borislav Petkov
On Tue, Jul 25, 2017 at 10:29:40AM -0500, Tom Lendacky wrote:
> But early_identify_cpu() calls get_cpu_cap() which will check for cpuid
> leaf 0x8008 support and set x86_phys_bits.

Right, but it can't be less than 32, can it? And if it is more than 32
bits, then it probably doesn't really matter on 32-bit. Unless it is
less than 36 bits and you do PAE...

> I'll try to build and run a 32-bit kernel and see how this all flows.

Yeah, that would be good.

Thanks.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

2017-07-25 Thread Borislav Petkov
On Tue, Jul 25, 2017 at 09:58:54AM -0500, Tom Lendacky wrote:
> True, but it is more about being accurate and making sure the value is
> correct where ever it may be used.

So early_identify_cpu() initializes phys_bits to 32 on 32-bit.
Subtracting it there would actually make actively it wrong, AFAICT.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV) descrption

2017-07-24 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:41PM -0500, Brijesh Singh wrote:

Subject: Re: [RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure 
Encrypted Virtualization (SEV) descrption

 ^^

Please introduce a spellchecker into your workflow.

> Update amd-memory-encryption document describing the AMD Secure Encrypted

"Update the AMD memory encryption document...

The patch has the proper URL already.

> Virtualization (SEV) feature.
> 
> Signed-off-by: Brijesh Singh 
> ---
>  Documentation/x86/amd-memory-encryption.txt | 29 
> ++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/x86/amd-memory-encryption.txt 
> b/Documentation/x86/amd-memory-encryption.txt
> index f512ab7..747df07 100644
> --- a/Documentation/x86/amd-memory-encryption.txt
> +++ b/Documentation/x86/amd-memory-encryption.txt
> @@ -1,4 +1,5 @@
> -Secure Memory Encryption (SME) is a feature found on AMD processors.
> +Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) are
> +features found on AMD processors.
>  
>  SME provides the ability to mark individual pages of memory as encrypted 
> using
>  the standard x86 page tables.  A page that is marked encrypted will be
> @@ -6,6 +7,12 @@ automatically decrypted when read from DRAM and encrypted 
> when written to
>  DRAM.  SME can therefore be used to protect the contents of DRAM from 
> physical
>  attacks on the system.
>  
> +SEV enables running encrypted virtual machine (VMs) in which the code and 
> data

 machines

> +of the virtual machine are secured so that decrypted version is available 
> only

... of the guest VM ...   ... so that a decrypted ...

> +within the VM itself. SEV guest VMs have concept of private and shared 
> memory.

have *the* concept - you need to use
definite and indefinite articles in your
text.

> +Private memory is encrypted with the guest-specific key, while shared memory
> +may be encrypted with hypervisor key.

And here you explain that the hypervisor key is the same key which we
use in SME. So that people can make the connection.

> +
>  A page is encrypted when a page table entry has the encryption bit set (see
>  below on how to determine its position).  The encryption bit can also be
>  specified in the cr3 register, allowing the PGD table to be encrypted. Each
> @@ -19,11 +26,20 @@ so that the PGD is encrypted, but not set the encryption 
> bit in the PGD entry
>  for a PUD which results in the PUD pointed to by that entry to not be
>  encrypted.
>  
> -Support for SME can be determined through the CPUID instruction. The CPUID
> -function 0x801f reports information related to SME:
> +When SEV is enabled, certain type of memory (namely insruction pages and 
> guest

When SEV is enabled, instruction pages and guest page tables are ...

> +page tables) are always treated as private. Due to security reasons all DMA

security reasons??

> +operations inside the guest must be performed on shared memory. Since the
> +memory encryption bit is only controllable by the guest OS when it is 
> operating

 ... is controlled ...

> +in 64-bit or 32-bit PAE mode, in all other modes the SEV hardware forces 
> memory

... forces the 
memory ...

> +encryption bit to 1.
> +
> +Support for SME and SEV can be determined through the CPUID instruction. The
> +CPUID function 0x801f reports information related to SME:
>  
>   0x801f[eax]:
>   Bit[0] indicates support for SME
> + 0x81f[eax]:

There's a 0 missing and you don't really need it as it is already above.

> + Bit[1] indicates support for SEV
>   0x801f[ebx]:
>   Bits[5:0]  pagetable bit number used to activate memory
>  encryption
> @@ -39,6 +55,13 @@ determine if SME is enabled and/or to enable memory 
> encryption:
>   Bit[23]   0 = memory encryption features are disabled
> 1 = memory encryption features are enabled
>  
> +If SEV is supported, MSR 0xc0010131 (MSR_F17H_SEV) can be used to determine 
> if

If this MSR is going to be part of the architecture - and I really think
it is - then call it MSR_AMD64_SEV.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 04/38] x86/CPU/AMD: Add the Secure Memory Encryption CPU feature

2017-07-10 Thread Borislav Petkov
On Tue, Jul 11, 2017 at 01:07:46AM -0400, Brian Gerst wrote:
> > If I make the scattered feature support conditional on CONFIG_X86_64
> > (based on comment below) then cpu_has() will always be false unless
> > CONFIG_X86_64 is enabled. So this won't need to be wrapped by the
> > #ifdef.
> 
> If you change it to use cpu_feature_enabled(), gcc will see that it is
> disabled and eliminate the dead code at compile time.

Just do this:

   if (cpu_has(c, X86_FEATURE_SME)) {
   if (IS_ENABLED(CONFIG_X86_32)) {
   clear_cpu_cap(c, X86_FEATURE_SME);
   } else {
   u64 msr;

   /* Check if SME is enabled */
  rdmsrl(MSR_K8_SYSCFG, msr);
  if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
  clear_cpu_cap(c, X86_FEATURE_SME);
   }
   }

so that it is explicit that we disable it on 32-bit and we can save us
the ifdeffery elsewhere.

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 34/36] x86/mm: Add support to encrypt the kernel in-place

2017-06-26 Thread Borislav Petkov
On Fri, Jun 23, 2017 at 12:44:46PM -0500, Tom Lendacky wrote:
> Normally the __p4d() macro would be used and that would be ok whether
> CONFIG_X86_5LEVEL is defined or not. But since __p4d() is part of the
> paravirt ops path I have to use native_make_p4d().

So __p4d is in !CONFIG_PARAVIRT path.

Regardless, we use the native_* variants in generic code to mean, not
paravirt. Just define it in a separate patch like the rest of the p4*
machinery and use it in your code. Sooner or later someone else will
need it.

> True, 5-level will only be turned on for specific hardware which is why
> I originally had this as only 4-level pagetables. But in a comment from
> you back on the v5 version you said it needed to support 5-level. I
> guess we should have discussed this more,

AFAIR, I said something along the lines of "what about 5-level page
tables?" and whether we care.

> but I also thought that should our hardware ever support 5-level
> paging in the future then this would be good to go.

There it is :-)

> The macros work great if you are not running identity mapped. You could
> use p*d_offset() to move easily through the tables, but those functions
> use __va() to generate table virtual addresses. I've seen where
> boot/compressed/pagetable.c #defines __va() to work with identity mapped
> pages but that would only work if I create a separate file just for this
> function.
> 
> Given when this occurs it's very similar to what __startup_64() does in
> regards to the IS_ENABLED(CONFIG_X86_5LEVEL) checks.

Ok.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   5   >