RE: Why does memblock only refer to E820 table and not EFI Memory Map?

2019-07-23 Thread Prakhya, Sai Praneeth
> > On x86 platforms, there are two sources through which kernel learns
> > about physical memory in the system namely E820 table and EFI Memory
> > Map. Each table describes which regions of system memory is usable by
> > kernel and which regions should be preserved (i.e. reserved regions
> > that typically have BIOS code/data) so that no other component in the
> > system could read/write to these regions. I think they are duplicating
> > the information and hence I have couple of questions regarding these
> >
> > 1. I see that only E820 table is being consumed by kernel [1] (i.e.
> > memblock subsystem in kernel) to distinguish between "usable" vs "reserved"
> regions.
> > Assume someone has called memblock_alloc(), the memblock subsystem
> > would service the caller by allocating memory from "usable" regions
> > and it knows this *only* from E820 table [2] (it does not check if EFI
> > Memory Map also says that this region is usable as well). So, why
> > isn't the kernel taking EFI Memory Map into consideration? (I see that
> > it does happen only when "add_efi_memmap" kernel command line arg is
> > passed i.e. passing this argument updates E820 table based on EFI
> > Memory Map) [3]. The problem I see with memblock not taking EFI Memory
> > Map into consideration is that, we are ignoring the main purpose for which 
> > EFI
> Memory Map exists.
> 
> https://blog.fpmurphy.com/2012/08/uefi-memory-v-e820-memory.html
> Probably above blog can explain some background.

Thanks a lot! Dave. The link was helpful, it did explain that Linus and HPA 
were 
not very happy with EFI and things were going good with E820 and hence it was 
given 
more preference compared to EFI.

But sadly, I am not 100% convinced yet :( (just my thoughts)
This decision was made a decade ago when EFI wasn't stable. Now that UEFI is 
the defacto 
on most of the x86 platforms (and since I believe UEFI is getting better) I am 
still unable to 
digest that kernel throws away EFI Memory Map (unless explicitly asked by 
"add_efi_memap")

Regards,
Sai


RE: [PATCH v2] x86/efi: fix a -Wtype-limits compilation warning

2019-06-19 Thread Prakhya, Sai Praneeth
> Compiling a kernel with W=1 generates this warning,
> 
> arch/x86/platform/efi/quirks.c:731:16: warning: comparison of unsigned
> expression >= 0 is always true [-Wtype-limits]
> 
> Fixes: 3425d934fc03 ("efi/x86: Handle page faults occurring while running EFI
> runtime services")
> Signed-off-by: Qian Cai 
> ---
> 
> v2: Add a "Fixes" tag.

Makes sense.
Thanks for the fix Qian Cai.

Regards,
Sai


RE: kexec crash on OVMF i386 + x86_64 kernel (Re: [PATCH v4] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel)

2019-04-17 Thread Prakhya, Sai Praneeth
> Added efi people.
> 
> I remember previously Sai did some efi32 tests for kexec, but I'm not sure if 
> he
> tested EFI32 + 64bit kernel.
> 
> Kexec status is not certain because I'm not sure anyone tesed and reported
> issues for that.

No.. I didn't test kexec on EFI32 + 64bit kernel and I also haven't tested 
kexec'ing from 
kexec'd kernel.

Regards,
Sai


RE: EFI reboot vs. ACPI reboot (was: Re: [tip:x86/urgent] x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T)

2019-04-17 Thread Prakhya, Sai Praneeth
> > >> That seems counter-intuitive to me: if no full ACPI hardware is
> > >> implemented then we should assume reduced ACPI functionality, i.e.
> > >> if
> > >the
> > >> EFI runtime is otherwise available we should default to it.
> > >
> > >It's a bit confusing, but my loose understanding is that previous
> > >versions of the ACPI spec required system implementors to implement
> > >the whole thing; but that's increasingly impractical today, e.g. with
> > >ARM systems coming along, which do not gel well with some of the
> > >historical x86-rooted design aspects that spilled over into ACPI. The
> > >V5 spec introduces reduced mode as an opt-in new feature, but for
> > >compatibility with pre-V5 implementations it needs to consider "full
> > >hardware" mode as the default.
> > >
> > >> Feel free to send a patch that makes EFI reboot the default one
> > >> under these circumstances,
> > >
> > >Just to check, you mean: EFI reboot (and shutdown) become the default
> > >methods when the machine is booted in EFI mode, and EFI stuff has not
> > >been disabled with a kernel parameter?
> > >Even when running in full hardware ACPI mode.
> > >
> > >Thanks
> > >Daniel
> >
> > This, I believe, is known to not work.
> 
> It definitely used to be the case that EFI reboot was unreliable, but I don't 
> know
> the details.

Sorry! Ard.
I don't have the enough historical knowledge to state that EFI reboot is 
unreliable 
on most of the x86 systems but I did come across two systems that had buggy 
implementation 
of EFI reboot and hence I have added code to deal with EFI runtime services 
that cause page faults.

I will ask around internally to check if it is a good idea to make EFI reboot 
the default method 
when the machine is booted in EFI mode.

> I have added Sai to cc, he may be able to provide a bit more context
> here, since he added the code to deal with page faults during EFI runtime 
> service
> invocations (which was inspired by a EFI reboot issue IIRC). That feature may
> have improved the situation, but I am not confident at all that it fixes all
> systems.

I believe, the code I added should improve the situation but I am hesitant to 
say that it fixes 
*all* the systems because the code addresses only one type of bug namely EFI 
Reboot accessing 
illegal memory and I have no idea about other buggy implementations.

Regards,
Sai


RE: [PATCH 02/10] x86/efi: Return error status if mapping EFI regions fail

2019-02-04 Thread Prakhya, Sai Praneeth
> > > efi_map_region() creates VA mappings for an given EFI region using
> > > any one of the two helper functions (namely __map_region() and
> old_map_region()).
> > > These helper functions *could* fail while creating mappings and
> > > presently their return value is not checked. Not checking for the
> > > return value of these functions might create issues because after
> > > these functions return "md->virt_addr" is set to the requested
> > > virtual address (so it's assumed that these functions always succeed
> > > which is not quite true). This assumption leads to "md->virt_addr"
> > > having invalid mapping should any of
> > > __map_region() or old_map_region() fail.
> > >
> > > Hence, check for the return value of these functions and if indeed
> > > they fail, turn off EFI Runtime Services forever because kernel
> > > cannot prioritize among EFI regions.
> > >
[...]
> > > -void __init old_map_region(efi_memory_desc_t *md)
> > > +int __init old_map_region(efi_memory_desc_t *md)
> > >  {
> > >   u64 start_pfn, end_pfn, end;
> > >   unsigned long size;
> > > @@ -601,10 +601,14 @@ void __init old_map_region(efi_memory_desc_t
> *md)
> > >   va = efi_ioremap(md->phys_addr, size,
> > >md->type, md->attribute);
> > >
> > > - md->virt_addr = (u64) (unsigned long) va;
> > > - if (!va)
> > > + if (!va) {
> > >   pr_err("ioremap of 0x%llX failed!\n",
> > >  (unsigned long long)md->phys_addr);
> > > + return -ENOMEM;
> > > + }
> > > +
> > > + md->virt_addr = (u64)(unsigned long)va;
> > > + return 0;
> >
> > Just wondering, shouldn't the failure path set ->virt_addr to
> > something safe, just in case a caller doesn't check the error and relies on 
> > it?
> >
> > That's because in this commit we've now changed it from 0 to undefined.
> >
> 
> Indeed. We don't usually rely on the value of ->virt_addr when
> EFI_RUNTIME_SERVICES is unset, but there is some sysfs code, and perhaps
> some other places where we do reference ->virt_addr, and not assigning it at 
> all
> is obviously wrong, and potentially hazardous.
>

Ok.. makes sense.
Do you think md->virt_addr = 0 for fail case is ok?

> > > +int __init efi_map_region_fixed(efi_memory_desc_t *md) { return 0;
> > > +}
> >
> > Inline functions should be marked inline ...
> >
> > >   if (efi_va < EFI_VA_END) {
> > > - pr_warn(FW_WARN "VA address range overflow!\n");
> > > - return;
> > > + pr_err(FW_WARN "VA address range overflow!\n");
> > > + return -ENOMEM;
> > >   }
> > >
> > >   /* Do the VA map */
> > > - __map_region(md, efi_va);
> > > + if (__map_region(md, efi_va))
> > > + return -ENOMEM;
> > > +
> > >   md->virt_addr = efi_va;
> > > + return 0;
> >
> > Same error return problem of leaving ->virt_addr undefined.
> >

Sure! Will fix it in V2.

> > Note that I also fixed up the grammar and readability of the changelog
> > - see the updated version below.

Thanks for fixing :)

> >
> > Thanks,
> >
> > Ingo
> >
> > =>
> > Subject: x86/efi: Return error status if mapping of EFI regions fails
> > From: Ard Biesheuvel 
> > Date: Sat, 2 Feb 2019 10:41:11 +0100
> >
> > From: Sai Praneeth Prakhya 
> >
> > efi_map_region() creates VA mappings for a given EFI region using one
> > of the two helper functions (namely __map_region() and old_map_region()).
> >
> > These helper functions could fail while creating mappings and
> > presently their return value is not checked.
> >
> > Not checking for the return value of these functions might create
> > bugs, because after these functions return "md->virt_addr" is set to
> > the requested virtual address (so it's assumed that these functions
> > always succeed which is not quite true). This assumption leads to
> > "md->virt_addr" having invalid mapping, should any of __map_region()
> > or old_map_region() fail.
> >
> > Hence, check for the return value of these functions and if indeed
> > they fail, turn off EFI Runtime Services forever because kernel cannot
> > prioritize among EFI regions.
> >
> > This also fixes the comment "FIXME: add error handling" in
> > kexec_enter_virtual_mode().
> >
> 
> Thanks Ingo.
> 
> Sai, could you please respin this and use Ingo's updated version of the commit
> log?

Sure! I will send a V2 with the mentioned changes.

Regards,
Sai


RE: [PATCH] x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE

2018-12-28 Thread Prakhya, Sai Praneeth
> > Acked-by: Ard Biesheuvel 
> >
> > with the sidenote that I won't be able to test this myself until
> > monday at the earliest.
> 
> OK, so I have tested both efi=old_map and mixed mode before and after
> applying this patch, using QEMU/KVM with 64-bit and 32-bit builds of OVMF
> [respectively]
> 
> efi=old_map is indeed broken before and needs this patch.
> 
> Mixed mode works just fine both before and after applying the patch, but I
> suggest we keep this patch as-is and address mixed mode later if needed (I
> spotted a couple of things in the boot log that may need some attention but 
> I'm
> not sure if the issue is in Linux or in OVMF)

Thanks for testing this Ard.

For testing Mixed mode, I have used QEMU/KVM with 64-bit and 32-bit OVMF 
and mixed mode is broken for me without this patch.

I am using an older OVMF build and probably that might be the reason for us 
seeing 
different behaviors. Also, kernel config file might have some role to play.. I 
guess.

Regards,
Sai


RE: [PATCH] efi: memattr: don't bail on zero VA if it equals the region's PA

2018-12-26 Thread Prakhya, Sai Praneeth
> The EFI memory attributes code cross-references the EFI memory map with the
> more granular EFI memory attributes table to ensure that they are in sync 
> before
> applying the strict permissions to the regions it describes.
> 
> Since we always install virtual mappings for the EFI runtime regions to which
> these strict permissions apply, we currently perform a sanity check on the EFI
> memory descriptor, and ensure that the EFI_MEMORY_RUNTIME bit is set, and
> that the virtual address has been assigned.
> 
> However, in cases where a runtime region exists at physical address 0x0, and 
> the
> virtual mapping equals the physical mapping, e.g., when running in mixed mode
> on x86, we encounter a memory descriptor with the runtime attribute and
> virtual address 0x0, and incorrectly draw the conclusion that a runtime region
> exists for which no virtual mapping was installed, and give up altogether. The
> consequence of this is that firmware mappings retain their read-write-execute
> permissions, making the system more vulnerable to attacks.
> 
> So let's only bail if the virtual address of 0x0 has been assigned to a 
> physical
> region that does not reside at address 0x0.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  drivers/firmware/efi/memattr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c
> index 8f289dbc237c..58452fde92cc 100644
> --- a/drivers/firmware/efi/memattr.c
> +++ b/drivers/firmware/efi/memattr.c
> @@ -91,7 +91,7 @@ static bool entry_is_valid(const efi_memory_desc_t *in,
> efi_memory_desc_t *out)
> 
>   if (!(md->attribute & EFI_MEMORY_RUNTIME))
>   continue;
> - if (md->virt_addr == 0) {
> + if (md->virt_addr == 0 && md->phys_addr != 0) {
>   /* no virtual mapping has been installed by the stub */
>   break;
>   }

Thanks for the explanation Ard.
This change looks good to me.

Presently, on x86, even when the mapping fails we don't have any scenario where 
in 
md->phys_addr != 0 && md->virt_addr == 0

Regards,
Sai


RE: [PATCH] x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE

2018-12-22 Thread Prakhya, Sai Praneeth
> 
> * Sai Praneeth Prakhya  wrote:
> 
> > Commit d5052a7130a6 ("x86/efi: Unmap EFI boot services code/data
> > regions from efi_pgd") forgets to take two EFI modes into
> > consideration namely EFI_OLD_MEMMAP and EFI_MIXED_MODE.
> 
> So the commit sha1 ended up being this one in tip:efi/core:
> 
>   08cfb38f3ef4: x86/efi: Unmap EFI boot services code/data regions from
> efi_pgd

Ya.. sorry! I took the sha1 from efi tree.

> 
> > EFI_OLD_MEMMAP is a legacy way of mapping EFI regions into
> > swapper_pg_dir using ioremap() and init_memory_mapping(). This feature
> > can be enabled by passing "efi=old_map" as kernel command line
> > argument. But,
> > efi_unmap_pages() unmaps EFI boot services code/data regions *only*
> > from efi_pgd and hence cannot be used for unmapping EFI boot services
> > code/data regions from swapper_pg_dir.


[.]


> > +   /*
> > +* EFI mixed mode has all RAM mapped to access arguments while
> making
> > +* EFI runtime calls, hence don't unmap EFI boot services code/data
> > +* regions.
> > +*/
> > +   if (!efi_is_native() && IS_ENABLED(CONFIG_EFI_MIXED))
> > +   return;
> 
> I suppose old_mmap and mixed mode stopped working altogether after the
> unmapping changes?

Yes.. that's true.

> What are the symptoms, instant reboots, crasher, or some
> more benign behavior like non-working runtime EFI functionality?

I noticed a kernel panic due to page fault.

AFAIK, it happens because efi_pgd is not defined for old_map and mixed mode.
Hence efi_unmap_pages() which calls __change_page_attr() unmaps mappings 
from swapper_pg_dir (please note that this my theory but I haven't checked if
this is really true).

> 
> If Ard acks this I'll apply it immediately, as these bugs look like 
> show-stoppers for
> merging the EFI tree into v4.21.
> 
> Thanks,
> 
>   Ingo


RE: [PATCH V2 0/3] Fix EFI memory map leaks

2018-12-05 Thread Prakhya, Sai Praneeth
> > Although majority of the changes are made to
> > drivers/firmware/efi/memmap.c file (which is common across
> > architectures), this bug is only limited to x86_64 machines and hence this 
> > patch
> set shouldn't effect any other architectures.
> 
> I will give this a test run on the arm64 test hardware available at my end. I 
> have
> shared some minor comments on the patchset - see respective patch threads for
> the same.

Thanks a lot! for the comments  and taking time to test the patches :)
Really appreciate your effort :)

Regards,
Sai


RE: [PATCH V2 2/3] x86/efi: Fix EFI memory map leaks

2018-12-05 Thread Prakhya, Sai Praneeth
> > -static void __init efi_clean_memmap(void)
> > +void __init efi_clean_memmap(void)
> >   {
> > -   efi_memory_desc_t *out = efi.memmap.map;
> > -   const efi_memory_desc_t *in = out;
> > -   const efi_memory_desc_t *end = efi.memmap.map_end;
> > -   int i, n_removal;
> > -
> > -   for (i = n_removal = 0; in < end; i++) {
> > -   if (efi_memmap_entry_valid(in, i)) {
> > -   if (out != in)
> > -   memcpy(out, in, efi.memmap.desc_size);
> > -   out = (void *)out + efi.memmap.desc_size;
> > -   } else {
> > +   void *out;
> > +   efi_memory_desc_t *md;
> > +   unsigned int i = 0, n_removal = 0;
> 
> Perhaps we can use 'unsigned int i = n_removal = 0;', which appears more
> readable.

Hmm.. that actually throws an compilation error saying that "n_removal is 
undeclared".
So, let's stick with unsigned int i = 0, n_removal = 0;

> 
> > +   struct efi_memory_map new_memmap;
> > +
> > +   for_each_efi_memory_desc(md) {
> > +   if (!efi_memmap_entry_valid(md, i))
> > n_removal++;
> > -   }
> > -   in = (void *)in + efi.memmap.desc_size;
> > }
> >
> > -   if (n_removal > 0) {
> > -   u64 size = efi.memmap.nr_map - n_removal;
> > +   if (n_removal == 0)
> 
> and 'if (!n_removal)' above, to stay in sync with the rest of the code in 
> this file.

Sure! Thanks for the catch. I will fix it.

Regards,
Sai


RE: [PATCH V2 1/3] efi: Introduce efi_memmap_free() and efi_memmap_unmap_and_free()

2018-12-05 Thread Prakhya, Sai Praneeth
> > +/**
> > + * efi_memmap_free - Free memory pointed by new_memmap.map
> > + * @new_memmap: Structure that describes EFI memory map.
> > + *
> > + * Memory is freed depending on the type of allocation performed.
> > + */
> > +static void __init efi_memmap_free(struct efi_memory_map new_memmap)
> 
> ^^ Its not very clear what you mean by the term 'new_memmap' here.
> Also can we pass a pointer to struct efi_memory_map to this function rather
> than passing the entire struct.

Sure! I will change it.

> > +static void __init __efi_memmap_unmap(struct efi_memory_map
> > +new_memmap) {
> > +   if (!new_memmap.late) {
> > +   unsigned long size;
> > +
> > +   size = new_memmap.desc_size * new_memmap.nr_map;
> > +   early_memunmap(new_memmap.map, size);
> 
> Nitpick: May be we can avoid the extra local variable size and directly pass
> 'new_memmap.desc_size * new_memmap.nr_map' while calling
> early_memunmap(). I think the code would still be readable and the calculation
> seems self-explanatory.

Yes, we could do that but that would make early_memunmap() exceed 80 characters 
limit.
Personally, I feel single line code is more readable when compared to split 
lines 
and this local variable would anyways be gone once the function returns.

> > +/**
> > + * Unmap and free existing EFI Memory Map i.e. efi.memmap  */ void
> > +__init efi_memmap_unmap_and_free(void) {
> > +   if (!efi_enabled(EFI_MEMMAP)) {
> > +   WARN(1, "EFI_MEMMAP is not enabled");
> 
> Nitpick: Do we really need a WARN() here? May be we can do away with the
> same.

My thought behind adding a WARN() is to let the developer know that he's using 
the
API in wrong scenario i.e. trying to unmap/free memory map when it's not 
present.
If we just return, the developer might think that his code isn't buggy when he 
tries to 
unmap an non-existent EFI memory map.

Regards,
Sai


RE: [PATCH V2 0/3] Fix EFI memory map leaks

2018-12-05 Thread Prakhya, Sai Praneeth
> Since we're being pedantic, it also makes sense to decide now whether 'area'
> refers to all [discontiguous] regions or just one of them. I'd say the 
> former, and
> use 'region' for the latter, i.e., an area may be made up of several regions, 
> but
> only one exists of each type.
> 
> > Parameters and local variables should be named unambiguously following
> > these concepts as well. There should be no opaque 'new' variables
> > *especially* where the primary role of the efi_memmap_insert() API
> > call is to add a new *entry* - it should be 'new_efi_map_table' (or
> > 'new_table' where it's unambiguous) instead, and new entries added
> > should be 'new_entry' - etc.
> >
> > Once this is reorganized and clarified it should be a lot more easy to
> > review 'table freeing' patches. I can help out with patches if there's
> > no conceptual objections.

I have a question,

As this table freeing patches do some cleanup of some efi functions, namely, 
efi_clean_memmap(), efi_fake_memmap(), efi_arch_mem_reserve(), 
efi_memmap_install() and __efi_enter_virtual_mode(), I am not very sure if 
you want to do the name changing now and later review the patches or review the 
patches and later do the name changing.

As you pointed out that reviewing them now is pain, which I agree, but my 
concern is 
you might be looking at code and might as well change names that might 
eventually 
be gone because of these table freeing patches.

Regards,
Sai


RE: [PATCH V2 2/3] x86/efi: Fix EFI memory map leaks

2018-12-05 Thread Prakhya, Sai Praneeth
> > if (efi_enabled(EFI_MEMMAP)) {
> > +   /*
> > +* efi_clean_memmap() uses memblock_phys_alloc() to allocate
> > +* memory for new EFI memmap and hence will work only after
> > +* e820__memblock_setup()
> > +*/
> > +   efi_clean_memmap();
> > efi_fake_memmap();
> > efi_find_mirror();
> > efi_esrt_init();
> 
> I'd also suggest a namespace cleanup before we do any material
> modifications:
> 
> 'efi_esrt_init()' is the proper pattern to follow, it's prefixed by efi_, then
> followed by the more generic subsystem (_esrt) and then by the functionality
> (_init). This is a good, hierarchical, top-down nomenclature that makes it 
> easy to
> grep for ESRT functionality by typing 'git grep esrt'.
> 
> The same is not true of the memmap functionality: 'git grep efi_memmap_'
> doesn't do the right thing.
> 
> So I think this should be renamed to:
> 
>   efi_memmap_clean()
>   efi_memmap_insert()
>   efi_memmap_free()
>   efi_memmap_print()
>   efi_memmap_fake()
>   etc.
> 
> While at it, there's another area that could be improved:
> 
>  - efi_memmap_fake() is a bit weird naming: it's not really 'fake',
>presumably the specified table is still very much real, otherwise it
>won't result in a working kernel.
> 
>What that functionality *really* is about is user-defined entries. Why
>not name it in such a fashion? efi_memmap_init_user_defined() or so?

>From the example you gave about efi_esrt_init(), the naming of efi memmap 
related functions does look messy to me now.. and yes, a namespace clean up 
might be good.

Regards,
Sai


RE: [PATCH V2 1/3] efi: Introduce efi_memmap_free() and efi_memmap_unmap_and_free()

2018-12-05 Thread Prakhya, Sai Praneeth
> > +static void __init efi_memmap_free(struct efi_memory_map new_memmap)
> > +{
> > +   phys_addr_t start, end;
> > +   unsigned long size = new_memmap.nr_map * new_memmap.desc_size;
> > +   unsigned int order = get_order(size);
> > +
> > +   start = new_memmap.phys_map;
> > +   end = start + size;
> > +   if (new_memmap.late) {
> > +   __free_pages(pfn_to_page(PHYS_PFN(start)), order);
> > +   return;
> > +   }
> > +
> > +   if (memblock_free(start, size))
> > +   pr_err("Failed to free mem from %pa to %pa\n", , );
> 
> Why is this rather large structure passed in by value and not by reference?

My bad.. I will change it to pass by reference.

> Also, 'new_memmap' naming is confusing - by this time this is a rather old
> memmap table we are going to free, right? So naming it 'memmap' would be
> fine, right?

Yes, that should be fine. I will change it.

Regards,
Sai


RE: [GIT PULL 00/11] EFI updates

2018-11-30 Thread Prakhya, Sai Praneeth
> On Thu, 29 Nov 2018 at 19:27, Prakhya, Sai Praneeth
>  wrote:
> >
> > Hi Ard,
> >
> > While building from next branch of efi tree, I noticed the below warning. 
> > Could
> you please check the same on your side?
> >
> >   CC  lib/list_debug.o
> > drivers/firmware/efi/efi.c: In function 'efi_mem_reserve_persistent':
> > drivers/firmware/efi/efi.c:1000:6: warning: unused variable 'rsvsize' [-
> Wunused-variable]
> >   int rsvsize = EFI_MEMRESERVE_SIZE(1);
> >   ^~~
> >   CC  lib/bitrev.o
> >   CC  net/core/sock_reuseport.o
> >
> 
> Thanks Sai
> 
> Ingo spotted it as well and fixed it up before merging.

Great! :)

Regards,
Sai


RE: [GIT PULL 00/11] EFI updates

2018-11-29 Thread Prakhya, Sai Praneeth
Hi Ard,

While building from next branch of efi tree, I noticed the below warning. Could 
you please check the same on your side?

  CC  lib/list_debug.o
drivers/firmware/efi/efi.c: In function 'efi_mem_reserve_persistent':
drivers/firmware/efi/efi.c:1000:6: warning: unused variable 'rsvsize' 
[-Wunused-variable]
  int rsvsize = EFI_MEMRESERVE_SIZE(1);
  ^~~
  CC  lib/bitrev.o
  CC  net/core/sock_reuseport.o

> -Original Message-
> From: Ard Biesheuvel [mailto:ard.biesheu...@linaro.org]
> Sent: Thursday, November 29, 2018 9:12 AM
> To: linux-efi@vger.kernel.org; Ingo Molnar ; Thomas
> Gleixner 
> Cc: Ard Biesheuvel ; linux-ker...@vger.kernel.org;
> Andy Lutomirski ; Arend van Spriel
> ; Bhupesh Sharma ;
> Borislav Petkov ; Hansen, Dave ; Eric
> Snowberg ; Hans de Goede
> ; Joe Perches ; Jon Hunter
> ; Julien Thierry ; Marc
> Zyngier ; Nathan Chancellor
> ; Peter Zijlstra ; Prakhya,
> Sai Praneeth ; Sedat Dilek
> ; YiFei Zhu 
> Subject: [GIT PULL 00/11] EFI updates
> 
> The following changes since commit
> 976b489120cdab2b1b3a41ffa14661db43d58190:
> 
>   efi: Prevent GICv3 WARN() by mapping the memreserve table before first use
> (2018-11-27 13:50:20 +0100)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-next
> 
> for you to fetch changes up to 1d29afdbf7ae878a23627ebee81efcd213f11749:
> 
>   efi/x86: earlyprintk - Fix infinite loop on some screen widths (2018-11-28
> 17:58:42 +0100)
>

Regards,
Sai


RE: [PATCH V3 1/3] x86/mm/pageattr: Introduce helper function to unmap EFI boot services

2018-11-04 Thread Prakhya, Sai Praneeth
> Ideally, after kernel assumes control of the platform, firmware shouldn't 
> access
> EFI boot services code/data regions. But, it's noticed that this is not so 
> true in
> many x86 platforms. Hence, during boot, kernel reserves EFI boot services
> code/data regions [1] and maps [2] them to efi_pgd so that call to
> set_virtual_address_map() doesn't fail.
> After returning from set_virtual_address_map(), kernel frees the reserved
> regions [3] but they still remain mapped. Hence, introduce
> kernel_unmap_pages_in_pgd() which will later be used to unmap EFI boot
> services code/data regions.
> 
> While at it modify kernel_map_pages_in_pgd() by 1. Adding __init modifier
> because it's always used *only* during boot.
> 2. Add a warning if it's used after SMP is initialized because it uses
>__flush_tlb_all() which flushes mappings only on current CPU.
> 
> Unmapping EFI boot services code/data regions will result in clearing
> PAGE_PRESENT bit and it shouldn't bother L1TF cases because it's already
> handled by protnone_mask() at arch/x86/include/asm/pgtable-invert.h.
> 
> [1] efi_reserve_boot_services()
> [2] efi_map_region() -> __map_region() -> kernel_map_pages_in_pgd() [3]
> efi_free_boot_services()
> 
> Signed-off-by: Sai Praneeth Prakhya 
> Cc: Borislav Petkov 
> Cc: Ingo Molnar 
> Cc: Andy Lutomirski 
> Cc: Dave Hansen 
> Cc: Bhupesh Sharma 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: Ard Biesheuvel 

Wasn't sure (and hence didn't) if I should add Acked-by Ingo because this patch 
changed from V2.

Regards,
Sai


RE: [PATCH V2 1/2] x86/efi: Unmap EFI boot services code/data regions from efi_pgd

2018-10-29 Thread Prakhya, Sai Praneeth
> >  /*
> >   * The testcases use internal knowledge of the implementation that 
> > shouldn't
> >   * be exposed to the rest of the kernel. Include these directly here.
> > diff --git a/arch/x86/platform/efi/quirks.c
> > b/arch/x86/platform/efi/quirks.c index 669babcaf245..fb1c44b11235
> > 100644
> > --- a/arch/x86/platform/efi/quirks.c
> > +++ b/arch/x86/platform/efi/quirks.c
> 
> While you are at it, can you please split the EFI part out into a separate 
> patch?

Sure! I will do it in V3.

Regards,
Sai


RE: [PATCH V2 1/2] x86/efi: Unmap EFI boot services code/data regions from efi_pgd

2018-10-29 Thread Prakhya, Sai Praneeth
> > In general it's good to do on list because others can learn from the
> > answers as well.
> 
> So you somehow managed to keep everyone and the lists in CC nevertheless.
> Did not pay attention to the CC list until I got my reply on the list :)

Sorry! my bad.. yes, the conversation did happen on the list and it's by 
accident.
> 
> > > > > > +   retval = __change_page_attr_set_clr(, 0);

[]

> > > So some questions I have are,
> > > 1. What did Peter Z mean here? "How is that not a TLB invalidation bug ?"
> >
> > __flush_tlb_all() flushes only the mapping on the current CPU. So in a
> > SMP environment it's not sufficient.

Thanks for the explanation. The issue makes sense to me now.

[.]

> > I'll have a second look at the whole thing and reply on list.
> 
> So actually for the problem at hand __flush_tlb_all() works, because that is
> called way before the secondary CPUs are up.
> 
> Ditto for kernel_map_pages_in_pgd().

Yes, you are right. Both the functions are *always* called before SMP 
initialization
and as you rightly pointed out they should be marked with __init.

> But both functions want to be marked
> __init and aside of that they both should contain a warning when they are 
> called
> after the secondary CPUs have been brought up.

Makes sense too.. I will include these changes in V3.

Regards,
Sai


RE: [PATCH V2 1/2] x86/efi: Unmap EFI boot services code/data regions from efi_pgd

2018-10-29 Thread Prakhya, Sai Praneeth
Hi Thomas and Peter,

[off the mailing list because I wasn't sure if it's a good idea to spam others 
with my questions]

> > > + /*
> > > +  * The typical sequence for unmapping is to find a pte through
> > > +  * lookup_address_in_pgd() (ideally, it should never return NULL
> because
> > > +  * the address is already mapped) and change it's protections.
> > > +  * As pfn is the *target* of a mapping, it's not useful while unmapping.
> > > +  */
> > > + struct cpa_data cpa = {
> > > + .vaddr  = ,
> > > + .pgd= pgd,
> > > + .numpages   = numpages,
> > > + .mask_set   = __pgprot(0),
> > > + .mask_clr   = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> > > + .flags  = 0,
> > > + };
> > > +
> > > + retval = __change_page_attr_set_clr(, 0);
> > > + __flush_tlb_all();
> >
> > So this looks like you copied it from kernel_map_pages_in_pgd() which
> > has been discussed before to be not sufficient, but it can't be
> > changed right now due to locking issues.
>
> Managed to confuse myself. The place which cannot be changed is a different
> one, but still for your call site __flush_tlb_all() might not be sufficient.

Since the mappings are changed, I thought a flush_tlb() might be needed.
But, (to be honest), I don't completely understand the x86/mm code. So, could 
you 
please elaborate the issue more for my better understanding?

So some questions I have are,
1. What did Peter Z mean here? "How is that not a TLB invalidation bug ?"
2. I assumed kernel_map_pages_in_pgd() to be a good code but you said that 
it has some locking issues. So, could you please elaborate more on that.

Or, could you please provide me some pointers in the source code that I can 
take a look at so that I could understand the issue much better.

Regards,
Sai


RE: [PATCH V2 1/2] x86/efi: Unmap EFI boot services code/data regions from efi_pgd

2018-10-29 Thread Prakhya, Sai Praneeth
> > +int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
> > + unsigned long numpages)
> > +{
> > +   int retval;
> > +
> > +   /*
> > +* The typical sequence for unmapping is to find a pte through
> > +* lookup_address_in_pgd() (ideally, it should never return NULL
> because
> > +* the address is already mapped) and change it's protections.
> > +* As pfn is the *target* of a mapping, it's not useful while unmapping.
> > +*/
> > +   struct cpa_data cpa = {
> > +   .vaddr  = ,
> > +   .pgd= pgd,
> > +   .numpages   = numpages,
> > +   .mask_set   = __pgprot(0),
> > +   .mask_clr   = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> > +   .flags  = 0,
> > +   };
> > +
> > +   retval = __change_page_attr_set_clr(, 0);
> > +   __flush_tlb_all();
> 
> So, just to clarify, 'pfn' is kept at 0 here? Might make sense to write it out
> explicitly like 'flags', even if it's not used by 
> __change_page_attr_set_clr().

Sure! Makes sense. I will add it.

Regards,
Sai


RE: [PATCH] efi: Fix debugobjects warning on efi_rts_work

2018-10-23 Thread Prakhya, Sai Praneeth



> -Original Message-
> From: Waiman Long [mailto:long...@redhat.com]
> Sent: Tuesday, October 23, 2018 7:18 AM
> To: Ard Biesheuvel 
> Cc: linux-efi@vger.kernel.org; linux-ker...@vger.kernel.org; Prakhya, Sai
> Praneeth ; Waiman Long
> 
> Subject: [PATCH] efi: Fix debugobjects warning on efi_rts_work
> 
> The commit 9dbbedaa6171 ("efi: Make efi_rts_work accessible to efi page fault
> handler") converted efi_rts_work from an auto variable to a global variable.
> However, when submitting the work, INIT_WORK_ONSTACK() was still used
> causing the following complaint from debugobjects:
> 
> ODEBUG: object ed27b500 is NOT on stack c7d38760, but
> annotated.
> 
> Change the macro to just INIT_WORK() to eliminate the warning.
> 
> Fixes: 9dbbedaa6171 ("efi: Make efi_rts_work accessible to efi page fault
> handler")

Thanks for fixing this :)
Looks good to me.

Regards,
Sai


RE: [PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd

2018-10-22 Thread Prakhya, Sai Praneeth
> On 10/21/2018 06:57 PM, Ingo Molnar wrote:
> > Does the CPU _ever_ look look at the PFN if the page is
> > !_PAGE_PRESENT, for example speculatively? If yes then what is the
> > recommended value for the pfn - zero perhaps?
> 
> I'll never say never. :)
> 
> For L1TF[1], we know the CPU did exactly this; it ignored the _PAGE_PRESENT
> bit when fetching data from the L1.  That's what is worked around with the 
> gunk
> in arch/x86/include/asm/pgtable-invert.h.
> 
> I think Andi plugged the code in here at a low enough level in the page table
> manipulation that pageattr.c should inherit it without doing anything 
> explicit.
> But, Sai, you might want to double-check this.
> 
> 1.
> https://www.intel.com/content/www/us/en/architecture-and-
> technology/l1tf.html

Sure! Dave. Thanks for the link. I will take a look at it.

Regards,
Sai


RE: [PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd

2018-10-22 Thread Prakhya, Sai Praneeth
> > The one bit that is odd is the cpa->pfn field - for unmapped pages
> > that's totally uninteresting and I'm wondering whether setting it to 0
> > wouldn't be better.
> >
> > Does the CPU _ever_ look look at the PFN if the page is
> > !_PAGE_PRESENT, for example speculatively? If yes then what is the
> > recommended value for the pfn - zero perhaps?
> >
> 
> This is L1TF, right?  Isn't all ones the recommended value?
> 
> I would love to see EFI start treating its mm just like any other mm at some
> point, though.  That is, it should not modify any mappings in the kernel 
> range,
> and it could use the regular mm modification APIs for the user range.  But 
> maybe
> this is a pipe dream.

Sounds good to me. I have some more fixes for EFI code and as soon as they are 
done, 
I will start looking into this.

Regards,
Sai


RE: [PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd

2018-10-21 Thread Prakhya, Sai Praneeth
> > +int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> > + unsigned long numpages)
> > +{
> > +   int retval;
> > +
> > +   struct cpa_data cpa = {
> > +   .vaddr = ,
> > +   .pfn = pfn,
> > +   .pgd = pgd,
> > +   .numpages = numpages,
> > +   .mask_set = __pgprot(0),
> > +   .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
> > +   .flags = 0,
> > +   };
> > +
> > +   retval = __change_page_attr_set_clr(, 0);
> > +   __flush_tlb_all();
> > +
> > +   return retval;
> > +}
> 
> That's certainly a creative use of __change_page_attr_set_clr() by EFI used 
> for
> mapping in pages so far (kernel_map_pages_in_pgd()), and now used for
> unmapping as well. Doesn't look wrong, just a bit weird as part of CPA.
> 

Haha.. yes.. I copied from kernel_map_pages_in_pgd()

> Could you please write the initializer in an easier to read fashion:
> 
>   struct cpa_data cpa = {
>   .vaddr  = ,
>   .pfn= pfn,
>   .pgd= pgd,
>   .numpages   = numpages,
>   .mask_set   = __pgprot(0),
>   .mask_clr   = __pgprot(_PAGE_PRESENT | _PAGE_RW),
>   .flags  = 0,
>   };
> 
> ?

Sure!

> 
> The one bit that is odd is the cpa->pfn field - for unmapped pages that's 
> totally
> uninteresting and I'm wondering whether setting it to 0 wouldn't be better.
> 
> Does the CPU _ever_ look look at the PFN if the page is !_PAGE_PRESENT, for
> example speculatively? If yes then what is the recommended value for the pfn -
> zero perhaps?
> 
> Also note that if for whatever reason the PFN range of the EFI boot area gets
> hot-unplugged, we'd have outright invalid PFNs - although this is probably 
> very
> unlikely from a platform perspective.
> 
> > +/*
> > + * Apart from having VA mappings for efi boot services code/data
> > +regions,
> > + * (duplicate) 1:1 mappings were also created as a catch for buggy
> > +firmware. So,
> > + * unmap both 1:1 and VA mappings.
> > + */
> 
> Speling nits:
> 
> - please capitalize 'EFI' consistently.
> - s/catch/quirk ?
> 

Sure! I will fix them

> BTW., are the 1:1 'boot mappings' a buggy firmware quirk, or something
> required by the EFI spec? (or both? ;-)
> 

It's a quirk for buggy firmware.
According to EFI spec, EFI Boot Services code/data regions shouldn't be 
accessed 
after calling exit_boot_services(). This call is typically performed by 
bootloader 
(grub) or efi_stub.

> > +static void __init efi_unmap_pages(efi_memory_desc_t *md) {
> > +   pgd_t *pgd = efi_mm.pgd;
> > +   u64 pfn = md->phys_addr >> PAGE_SHIFT;
> 
> Note that this md->phys_addr isn't really meaningful once it gets unmapped.
> 

Yes, makes sense. In efi_free_boot_services(), after freeing up the memory and 
unmapping, a new memory map is created (which has only EFI Runtime regions) 
and hence we can safely assume that this memory descriptor and md->phys_addr 
would never be used.

> > +
> > +   if (kernel_unmap_pages_in_pgd(pgd, pfn, md->phys_addr, md-
> >num_pages))
> > +   pr_err("Failed to unmap 1:1 mapping: PA 0x%llx -> VA
> 0x%llx!\n",
> > +  md->phys_addr, md->virt_addr);
> > +
> > +   if (kernel_unmap_pages_in_pgd(pgd, pfn, md->virt_addr, md-
> >num_pages))
> > +   pr_err("Failed to unmap VA mapping: PA 0x%llx -> VA
> 0x%llx!\n",
> > +  md->phys_addr, md->virt_addr);
> 
> Please keep pr_err()'s in a single line. (and ignore checkpatch.)
>

Sure!

> > +}
> > +
> >  void __init efi_free_boot_services(void)  {
> > phys_addr_t new_phys, new_size;
> > @@ -415,6 +434,13 @@ void __init efi_free_boot_services(void)
> > }
> >
> > free_bootmem_late(start, size);
> > +
> > +   /*
> > +* Before calling set_virtual_address_map(), boot services
> > +* code/data regions were mapped as a catch for buggy
> firmware.
> > +* Unmap them from efi_pgd as they have already been freed.
> > +*/
> > +   efi_unmap_pages(md);
> 
> Ditto.
> 
> BTW., the ordering here is wrong: we should unmap any virtual aliases from
> pagetables _before_ we free the underlying memory. The ordering is probably
> harmless in this case but overall a good practice.

Sure! Makes sense. I will fix it in V2.

Regards,
Sai


RE: [PATCH V6 2/2] x86/efi: Add efi page fault handler to recover from page faults caused by the firmware

2018-09-12 Thread Prakhya, Sai Praneeth
> On 12 September 2018 at 20:56, Thomas Gleixner  wrote:
> > On Tue, 11 Sep 2018, Sai Praneeth Prakhya wrote:
> >> diff --git a/drivers/firmware/efi/runtime-wrappers.c
> b/drivers/firmware/efi/runtime-wrappers.c
> >> index b18b2d864c2c..7455277a3b65 100644
> >> --- a/drivers/firmware/efi/runtime-wrappers.c
> >> +++ b/drivers/firmware/efi/runtime-wrappers.c
> >> @@ -61,6 +61,11 @@ struct efi_runtime_work efi_rts_work;
> >>  ({   \
> >>   efi_rts_work.status = EFI_ABORTED;  \
> >>   \
> >> + if (!efi_enabled(EFI_RUNTIME_SERVICES)) {   \
> >> + pr_info("Aborting! EFI Runtime Services disabled\n");   \
> >
> > This probaby wants to be pr_info_ince().
> >

Yes.. that makes sense.

> 
> I can fix that up when applying.
>

Thanks a lot! Ard.
 
> > Other than that:
> >
> >   Reviewed-by: Thomas Gleixner 
> 
> Thanks! I will queue this up in efi/next


RE: [PATCH V6 0/2] Add efi page fault handler to recover from page

2018-09-12 Thread Prakhya, Sai Praneeth
> > This issue was reported by Al Stone when he saw that reboot via EFI
> > hangs the machine. Upon debugging, I found that it's
> > efi_reset_system() that's touching memory regions which it shouldn't.
> > To reproduce the same behavior, I have hacked OVMF and made
> > efi_reset_system() buggy. Along with efi_reset_system(), I have also
> > modified get_next_high_mono_count() and set_virtual_address_map().
> > They illegally access both boot time and other efi regions.
> >
> > Testing the patch set:
> > --
> > 1. Download buggy firmware from here [1].
> > 2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
> > Add reboot=efi to the kernel command line arguments and after the
> > kernel is up and running, type "reboot". The kernel should hang while
> rebooting.
> > 3. With the same setup, boot kernel after applying patches and the
> > reboot should work fine. Also please notice warning/error messages
> > printed by kernel.
> >
> 
> Did you test these patches with other buggy runtime services?

Yes, I did. I have modified efi runtime service GetNextHighMonotonicCount 
and made it buggy, when invoked from FWTS test suites the efi page fault 
handler works as expected (i.e. freezing efi_rts_wq and disabling efi runtime 
services forever).

Regards,
Sai


RE: [PATCH V5 0/2] Add efi page fault handler to recover from page

2018-09-10 Thread Prakhya, Sai Praneeth
Hi All,

> Changes from V4 to V5:
> --
> 1. Drop config option that enables efi page fault handler, instead make
>it default.
> 2. Call schedule() in an infinite loop to account for spurious wake ups.
> 3. Introduce "NONE" as an efi runtime service function identifier so that
>it could be used in efi_recover_from_page_fault() to check if the page
>fault was indeed triggered by an efi runtime service.

Please note that apart from dropping the config knob and accounting for 
spurious wake ups, I have added a third change.

I realized that when kernel command line option efi=old_map is passed, the efi 
page fault handler in V4 wouldn't work because, it checks for efi_mm.

In the efi page fault handler, I have also added an explicit check for x86_64 
because 
AFAIK, x86_32 bit machines do not suffer from this problem (Boris, please 
correct me if I am wrong).
Also, I was unable to trigger page faults with buggy OVMF for x86_32.

Just wanted to state the third change explicitly here so that it doesn't escape 
reviewers eyes.

Regards,
Sai


RE: [PATCH V4 0/3] Add efi page fault handler to recover from page

2018-09-07 Thread Prakhya, Sai Praneeth
> Thanks Sai for this work. I think this a step in the right direction.
> I tested this on qemu x86_64 with OVMF firmware modified to access some
> random address in the EFI_Reserved_Region. I was able to reboot the qemu
> instance successfully with the patches (see logs below) while without the
> patchset, reboot earlier used to get stuck.
> 
> So, feel free to add:
> Tested-by: Bhupesh Sharma 
> 

Thanks a lot Bhupesh, for trying the patches and as you said, the patches need 
a lot 
more testing on real machines.

> Qemu Console Logs:
> ---
> 
> # reboot
> 
> 
> 
> [   11.44] [ cut here ]
> [   11.400137] [Firmware Bug]: Page fault caused by firmware at PA: 0x7e924100
> [   11.400484] WARNING: CPU: 0 PID:  at
> arch/x86/platform/efi/quirks.c:691
> efi_recover_from_page_fault+0x3b/0xf0
> [   11.400751] Modules linked in:
> [   11.400992] CPU: 0 PID:  Comm: init Not tainted 4.18.0-rc5+ #1
> [   11.401146] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 0.0.0 02/06/2015
> [   11.401397] RIP: 0010:efi_recover_from_page_fault+0x3b/0xf0

[snipped stack trace]

> [   11.410378]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   11.410554] ---[ end trace ad3d0a220a88a45b ]---
> [   11.410742] efi: efi_reset_system() buggy! Reboot through BIOS

Thanks for the log, it looks good to me.


RE: [PATCH V4 3/3] x86/efi: Introduce EFI_PAGE_FAULT_HANDLER

2018-09-07 Thread Prakhya, Sai Praneeth
> > > Why make this optional?
> >
> > I made it as a config option in RFC because the page fault handler was
> > complicated and touching many parts (it had lots of code change and I
> > didn't want to break any existing functionality). Now that it's
> > simple, I don't think we need the config option.
> >
> > Without efi page fault handler, any page fault caused by firmware
> > should panic kernel but with this patch I think we are just improving 
> > existing
> condition (ideally).
> >
> > So, if Thomas, Ingo, Andy, Ard and Boris are ok.. I will make it as
> > default (i.e. without config).
> >
> > Regards,
> > Sai
> >
> Also, some distributions already have specific ways to handle buggy firmwares
> which can be at times dependent on the underlying hardware and the firmware
> versions.
> 
> I would suggest that we enable this under a CONFIG for the first round and 
> once
> it is tested with wider variety of x86 machines which have buggy or orphaned
> firmware and linux (and reboot) works fine with them, we can drop the CONFIG
> option in future and enable this by default.

Sounds fair to me, but, I would like to wait for someone experienced to make 
the final call.

Regards,
Sai


RE: [PATCH V4 0/3] Add efi page fault handler to recover from page

2018-09-07 Thread Prakhya, Sai Praneeth
> > The efi page fault handler will check if the access is by
> > efi_reset_system().
> > 1. If so, then the efi page fault handler will reboot the machine
> >through BIOS and not through efi_reset_system().
> > 2. If not, then the efi page fault handler will freeze efi_rts_wq and
> >schedules a new process.
> >
> 
> Thanks Sai! I am pretty happy how this patch set turned out. It still 
> requires the
> blessing of the x86 maintainers, of course, but from my pov, this is good to 
> go
> (but I will fold patch #3 into #2)

Hopefully, patch #3 goes away too.. in V5.

Regards,
Sai


RE: [PATCH V4 2/3] x86/efi: Add efi page fault handler to recover from page faults caused by the firmware

2018-09-07 Thread Prakhya, Sai Praneeth
> > +   if (efi_rts_work.efi_rts_id == RESET_SYSTEM) {
> > +   pr_info("efi_reset_system() buggy! Reboot through BIOS\n");
> > +   machine_real_restart(MRR_BIOS);
> > +   return 0;
> > +   }
> > +
> > +   /* Firmware has caused page fault, hence, freeze efi_rts_wq. */
> > +   set_current_state(TASK_UNINTERRUPTIBLE);
> 
> This doesn't freeze it, as such, it just sets the state.

True! Thanks for pointing it out. I will update the comment.

> > +
> > +   /*
> > +* Before calling EFI Runtime Service, the kernel has switched the
> > +* calling process to efi_mm. Hence, switch back to task_mm.
> > +*/
> > +   arch_efi_call_virt_teardown();
> > +
> > +   /* Signal error status to the efi caller process */
> > +   efi_rts_work.status = EFI_ABORTED;
> > +   complete(_rts_work.efi_rts_comp);
> > +
> > +   clear_bit(EFI_RUNTIME_SERVICES, );
> > +   pr_info("Froze efi_rts_wq and disabled EFI Runtime Services\n");
> 
> > +   schedule();
> 
> So what happens when we get a spurious wakeup and return from this?
> 
> Quite possibly you want something like:
> 
>   for (;;) {
>   set_current_state(TASK_IDLE);
>   schedule();
>   }
> 
> here. The TASK_UNINTERRUPTIBLE thing will cause the load-avg to spike; is that
> what you want?

Yes, makes sense. TASK_IDLE seems more appropriate. I will change it.

Regards,
Sai


RE: [PATCH V4 3/3] x86/efi: Introduce EFI_PAGE_FAULT_HANDLER

2018-09-07 Thread Prakhya, Sai Praneeth
> > There may exist some buggy UEFI firmware implementations that might
> > access efi regions other than EFI_RUNTIME_SERVICES_ even
> > after the kernel has assumed control of the platform. This violates
> > UEFI specification.
> >
> > If selected, this debug option will print a warning message if the
> > UEFI firmware tries to access any memory region which it shouldn't.
> > Along with the warning, the efi page fault handler will also try to
> > recover from the page fault triggered by the firmware so that the
> > machine doesn't hang.
> 
> Why make this optional?

I made it as a config option in RFC because the page fault handler was 
complicated and touching many parts (it had lots of code change and I didn't 
want 
to break any existing functionality). Now that it's simple, I don't think we 
need 
the config option.

Without efi page fault handler, any page fault caused by firmware should panic 
kernel but with this patch I think we are just improving existing condition 
(ideally).

So, if Thomas, Ingo, Andy, Ard and Boris are ok.. I will make it as default 
(i.e. without
config).

Regards,
Sai


RE: [PATCH V3 3/5] x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware

2018-09-06 Thread Prakhya, Sai Praneeth
> >> I agree. Keep it simple. If the EFI crap fails, then assist with the
> >> reboot and otherwise just kill it.
> >
> > The reasons for saving old memory map are (in my view, these are the
> > less important ones because they are very unlikely to happen)
> >
> > 1. Make sure that a memory descriptor exists for the physical address
> > that was faulted on (EFI Memory Map could sometime have holes).
> > Assuming a case that the physical address that caused page fault
> > doesn't have a valid efi memory descriptor, the efi page fault handler 
> > shouldn't
> take any action because it hasn't triaged the problem yet.
> >
> > 2. Make sure that the faulted physical address is _not_ efi runtime service
> code/data region.
> > Efi runtime service code/data regions are always mapped by kernel in
> > efi_pgd and accesses to these regions should _never_ page fault.
> > Assuming that something like this happens, efi page fault handler
> > shouldn't take any action because it's not any illegal access by firmware 
> > but it's
> a kernel bug.
> >
> 
> What about attempts to modify code regions or attempts to execute data
> regions? What kind of fault will that trigger, and are they being handled at 
> the
> moment?

AFAIK, at least in the x86 world, attempts to write to read only regions or 
attempts 
to execute XP (execute protected) pages will result in page fault and I don't 
think 
we are handling them.

> 
> As I pointed out, EFI runtime services code may legally access the stack or
> dereference pointer arguments, but could still contain bugs that result in 
> out of
> bounds accesses or writes to read-only regions.

Yes, agreed. In fact, I did see these bugs.

> So I don't really care about the address of the illegal access, any fault that
> occurs while running in the firmware should be treated the same.

Ok.. makes sense.

> In fact, cross
> referencing the value of IP with RuntimeServicesCode regions may be more
> useful

This is to verify that firmware is indeed executing code from 
RuntimeServicesCode
regions when it faulted. Is that correct? Or did you mean something else?

> > That said, a more important reason (in my view) is to print out the
> > memory descriptor that we faulted on. This is a *proof* showing that
> > it's buggy firmware that caused page fault and hence is not a kernel
> > bug. This proof is important because whenever a stack trace is printed
> > with some efi function, kernel is the usual suspect and hence we need to 
> > show
> that it's not kernel fault. It could also help firmware engineers to fix the 
> bug
> easily.
> >
> > dmesg would show something like this when buggy efi_reset_system()
> accesses reserved region:
> >
> > [  296.141511] efi: EFI Memory Descriptor for offending PA is:
> > [  296.141844] efi: [Reserved   |   |  |  |  |  |  |  |   
> > |WB|WT|WC|UC]
> range=[0x7e915000-0x7e933fff] (0MB)
> > [  296.142522] efi: efi_reset_system() buggy! Reboot through BIOS
> >
> > So, I would be concerned if we miss this proof.
> >
> 
> You can dump the entire memory map by putting efi=debug on the kernel
> command line, so all we need to do is report the physical address, and you can
> easily figure out for yourself which memory map entry covers it.

That's true. In fact, that's how I debugged this issue and hence thought that 
it might be 
useful to have all that info at one place (i.e. in efi page fault handler).
But, as you said, to make the code look simpler, I will roll out a V4 without 
saving 
original memory map.

Regards,
Sai


RE: [PATCH V3 3/5] x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware

2018-09-05 Thread Prakhya, Sai Praneeth
> On Wed, 5 Sep 2018, Ard Biesheuvel wrote:
> > On 5 September 2018 at 14:56, Peter Zijlstra  wrote:
> > > On Wed, Sep 05, 2018 at 02:27:49PM +0200, Ard Biesheuvel wrote:
> > >> Would we still need to preserve the old memory map in that case?
> > >
> > > I thought the reason for having this was being able to know the
> > > fault is in an EFI area. But of course, I'm not wel versed in this
> > > whole EFI crapola.
> >
> > I'm not entirely sure whether that really matters. The EFI services
> > access the stack and can access byref/pointer arguments which are not
> > covered by the EFI memory map as runtime services code/data, and so
> > they can trigger page faults by running off the vmapped stack or
> > writing to const byref arguments.
> >
> > EFI runtime services using boot services regions after they are no
> > longer available are a known source of headaches, but I don't see why
> > we should restrict ourselves to such cases if we bother to wire up
> > fault handling specifically for EFI services calls.
> >
> > So any page or permission fault occurring in the context of a EFI
> > runtime services invocation should be treated the same, I think.
> 
> I agree. Keep it simple. If the EFI crap fails, then assist with the reboot 
> and
> otherwise just kill it.

The reasons for saving old memory map are
(in my view, these are the less important ones because they are very unlikely 
to happen)

1. Make sure that a memory descriptor exists for the physical address that was 
faulted on (EFI Memory Map could sometime have holes). Assuming a case that the 
physical address that caused page fault doesn't have a valid efi memory 
descriptor, the 
efi page fault handler shouldn't take any action because it hasn't triaged the 
problem yet.

2. Make sure that the faulted physical address is _not_ efi runtime service 
code/data region.
Efi runtime service code/data regions are always mapped by kernel in efi_pgd 
and accesses 
to these regions should _never_ page fault. Assuming that something like this 
happens, 
efi page fault handler shouldn't take any action because it's not any illegal 
access by firmware 
but it's a kernel bug.

Generally, the above two scenarios should never happen. I am just being 
paranoid and wanted 
to make sure that the efi page fault handler is fixing the right firmware bug 
that I came across 
and not something else. I also agree that, we could make the patch set and efi 
page fault handler 
much simpler by not saving old memory map. So, I am OK if we are not checking 
for the above 
two scenarios. If they are really needed, we could add them later.

That said, a more important reason (in my view) is to print out the memory 
descriptor that 
we faulted on. This is a *proof* showing that it's buggy firmware that caused 
page fault and 
hence is not a kernel bug. This proof is important because whenever a stack 
trace is printed 
with some efi function, kernel is the usual suspect and hence we need to show 
that it's not 
kernel fault. It could also help firmware engineers to fix the bug easily.

dmesg would show something like this when buggy efi_reset_system() accesses 
reserved region:

[  296.141511] efi: EFI Memory Descriptor for offending PA is:
[  296.141844] efi: [Reserved   |   |  |  |  |  |  |  |   |WB|WT|WC|UC] 
range=[0x7e915000-0x7e933fff] (0MB)
[  296.142522] efi: efi_reset_system() buggy! Reboot through BIOS

So, I would be concerned if we miss this proof.

Regards,
Sai


RE: [PATCH V3 2/5] efi: Introduce __efi_init attribute

2018-09-04 Thread Prakhya, Sai Praneeth
Hi Boris and Ard,

> Buggy firmware could illegally access some efi regions even after the kernel 
> has
> assumed control of the platform. When
> "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is enabled, the efi page fault
> handler will detect and recover from these illegal accesses.
> efi_md_typeattr_format() and memory_type_name are used by the efi page
> fault handler to print information about memory descriptor that was illegally
> accessed. As the page fault handler is present during/after kernel boot it 
> doesn't
> have an __init attribute, but
> efi_md_typeattr_format() has it and thus during kernel build, "WARNING:
> modpost: Found * section mismatch(es)" build warning is observed. To fix it,
> remove __init attribute for efi_md_typeattr_format().
> 
> In order to not keep efi_md_typeattr_format() and memory_type_name
> needlessly when "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is not selected,
> add a new __efi_init attribute whose value changes based on whether the config
> option is selected or not.

In previous versions (i.e. up to V2), where we handled 
EFI_BOOT_SERVICES_ 
regions differently, it made sense to have a separate attribute like __efi_init 
because many 
function definitions were modified. From V3, do you think it's still OK to have 
__efi_init or 
should I just remove __init attribute (and not have __efi_init) for 
efi_md_typeattr_format() 
and memory_type_name because we are just modifying two.

Regards,
Sai


RE: [PATCH V2 4/6] x86/efi: Add efi page fault handler to fixup/recover from page faults caused by firmware

2018-09-03 Thread Prakhya, Sai Praneeth
> > The efi specific page fault handler offers us two advantages:
> > 1. Avoid panics/hangs caused by buggy firmware.
> > 2. Shout loud that the firmware is buggy and hence is not a kernel bug.
> >
> > Finally, this new mapping will not impact a reboot from kexec, as
> > kexec is only concerned about runtime memory regions.
> 
> No. This is just a horrible hack to make completely bogus firmware work and
> never fixed.
> 

Yes, that's true.

> The proper thing to do is to have a minimal page fault handler which does:
> 
>  1) Yell loudly if that ever happens
> 
>  2) Handles the reboot request gracefully
> 
>  3) Freeze and disable the EFI mess for all other cases
> 
> That does not require any hackery to make these mappings work from atomic
> context and keeps the mess confined to the EFI code where it belongs.
> 
> Ideally we just blacklist the offending system and be done with it.

This makes sense to me. I will implement the above suggested and as said should 
avoid the need for making mappings work from atomic context.

Regards,
Sai


RE: [PATCH V2 5/6] x86/mm: If in_atomic(), allocate pages without sleeping

2018-09-03 Thread Prakhya, Sai Praneeth
> Thanks for CC'ing me. I wonder why the world and some more is on CC, but
> x...@kernel.org is NOT.
>

Sorry! about that. I will CC you and x86 from V3.

> > On Sun, Sep 02, 2018 at 02:46:33AM -0700, Sai Praneeth Prakhya wrote:
> 
> > > @@ -926,7 +926,13 @@ static void unmap_pud_range(p4d_t *p4d,
> > > unsigned long start, unsigned long end)
> > >
> > >  static int alloc_pte_page(pmd_t *pmd)  {
> > > - pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
> > > + pte_t *pte;
> > > +
> > > + if (in_atomic())
> > > + pte = (pte_t *)get_zeroed_page(GFP_ATOMIC);
> > > + else
> > > + pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
> > > +
> > >   if (!pte)
> > >   return -1;
> > >
> >
> > This looks like tinkering to me..
> 
> Yes it is and it's not going to happen.

Haha.. ok.. I will take your suggestion in the 4th patch and that should avoid 
this.

Regards,
Sai


RE: [PATCH V2 2/6] x86/efi: Remove __init attribute from memory mapping functions

2018-09-03 Thread Prakhya, Sai Praneeth
> On Mon, Sep 03, 2018 at 05:03:56AM +0000, Prakhya, Sai Praneeth wrote:
> > Hmm.. thought that __efi_init might be confusing with the normal
> > __init attribute
> 
> How would that be confusing? It has "__efi" prepended?!
> 
> All I'm saying is, if you're going to define your own function attributes, do 
> them
> generic and short. "_fixup" is too specific IMO. It also enlarges function
> definitions unnecessarily.
> 

Yes.. agreed that the function definitions did enlarge.

> With "__efi_init" you already denote that it is a special attribute which has
> relevance in the EFI code only. What you do about it - the
> *fixup* - is the thing you do with the attribute. But you don't have to have 
> the
> "what you do" in the attribute name too.

Ok.. makes sense. Will roll a V3 with __efi_init.

Regards,
Sai


RE: [PATCH V2 2/6] x86/efi: Remove __init attribute from memory mapping functions

2018-09-02 Thread Prakhya, Sai Praneeth
> On Sun, Sep 02, 2018 at 02:46:30AM -0700, Sai Praneeth Prakhya wrote:
> > In order to not keep these functions needlessly when
> > "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is not selected, add a new
> > __efi_init_fixup attribute whose value changes based on whether the
> 
> Why not just __efi_init{,data} ?

Hmm.. thought that __efi_init might be confusing with the normal __init 
attribute
and hence added "fixup", so that it's explicit that we are trying to fixup the 
normal 
__init attribute. __efi_init_data doesn't seem appropriate because we are 
fixing 
both functions and data.

Regards,
Sai


RE: [PATCH V1 6/6] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

2018-08-10 Thread Prakhya, Sai Praneeth
> > +config EFI_WARN_ON_ILLEGAL_ACCESSES
> 
> How about shortening the name to just 'EFI_WARN_ON_ILLEGAL_ACCESS'?
> 

Makes sense.. I will shorten it.

> > +   bool "Warn about illegal memory accesses by firmware"
> > +   depends on EFI
> 
> From a distribution p-o-v, I would suggest that we set this CONFIG option 
> only if
> we are in the EXPERT mode, as this need more thrashing with the behaviour we
> see on old, buggy EFI firmwares available on very old x86 machines. So
> something like:
>bool "Warn about illegal memory accesses by firmware" if EXPERT
> 

Agreed. Although the feature is intended to warn about all these buggy 
firmware's
instead of silently working around (as we are doing presently), it needs more 
testing.
Hence, as you said, I think it's safe to have it as an EXPERT config option.

> > +   help
> > + Enable this debug feature so that the kernel can detect illegal
> > + memory accesses by firmware and issue a warning. Also,
> > + 1. If the illegally accessed region is 
> > EFI_BOOT_SERVICES_,
> > +the kernel fixes it up by mapping the requested region.

[]

> > diff --git a/init/main.c b/init/main.c index
> > 3b4ada11ed52..dce0520861a1 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -730,7 +730,8 @@ asmlinkage __visible void __init start_kernel(void)
> > arch_post_acpi_subsys_init();
> > sfi_init_late();
> >
> > -   if (efi_enabled(EFI_RUNTIME_SERVICES)) {
> > +   if (efi_enabled(EFI_RUNTIME_SERVICES) &&
> > +   !IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES)) {
> 
> Since this is an arch agnostic code leg, do we really want to introduce a 
> generic
> check for CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
> here, without checking for whether we are actually running this on a
> x86 hardware, or alternatively we can consider moving
> CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES out of 'arch/x86/Kconfig' to
> 'arch/Kconfig' so that later it can be used on other archs like ARM64 as well.
> 
> I think the later would be more useful..

Thanks for bringing this up. I see your point. I will refrain from polluting 
architecture
agnostic code. As we don't need efi_free_boot_services() at all, if 
EFI_WARN_ON_ILLEGAL_ACCESSES
is enabled, probably making changes to include/linux/efi.h would be better, I 
guess..

Moving this to arch/Kconfig could be too futuristic.. I guess.. because I am 
not sure
if this problem exists on ARM machines, if so, probably Ard could suggest 
something here.

Regards,
Sai


RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-08-07 Thread Prakhya, Sai Praneeth
> > 1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above
> > solution doesn't work if efi_reset_system() is buggy. We cannot
> > neglect it either because that's the issue faced by Al Stone and hence
> > I started the patch set. So, I am thinking to use long jump technique 
> > *only* for
> efi_reset_system(). Would you be OK with that?
> > Or please propose a new solution.
> 
> The long jump thing is super ugly.  I'd be okay with it if it were somehow
> factored out into a testable mechanism, but it's a nasty complicated bit of
> assembly, it will only be used very very rarely, and it more or less 
> duplicates
> switch_to().

Yes, I agree that it's complicated assembly. But I tested it by hacking OVMF.
And.. you are correct that it's very very rarely used.. so probably there is no 
point adding complicated code for it.

> 
> Could you just do a fallback reboot from the fault handler if
> efi_reset_system() fails?
>

Sure! I will try that and will update you on how that goes. Hopefully, well.
 
> >
> > 2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got
> > this working with the existing "efi_rts_wq" setup. Please note that,
> > the existing "efi_rts_wq", has only one worker thread. Could you
> > please brief me on why we should have a dedicated kthread and not leverage
> "efi_rts_wq" (I am unclear on this).
> 
> Two reasons:
> 
> 1. If you used a dedicated thread, then you could probably just set that 
> thread's -
> >mm to the EFI mm rather than manually switching the mm.

That means, we don't need efi_switch_mm() anymore and could just get away
by changing efi_kthread ->mm = efi_mm.
Is that what you meant? Just wanted to make sure, I understood you correctly.

> 
> 2. If you used a dedicated thread, you would be a lot closer to being able to 
> call
> into EFI at CPL 3.  In x86 Linux (and probably most or all other arches), the 
> kernel
> never calls into CPL 3.  Instead it
> *returns* into CPL 3.  So you would set up a thread and make that thread's
> task_pt_regs() have the correct context for a little stub that calls into EFI 
> and
> then does "syscall".  Then you would return all the way out of the kernel 
> into the
> stub.

I got it.. not 100% though.. but I will surely work on this because it sounds 
very interesting to me.

But still, I have a question, this article https://lwn.net/Articles/23634/ says 
that
"each workqueue has one or more dedicated worker threads (one per CPU, by 
default) associated with it"
So, do you think we still need our own efi dedicated kthread?

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-08-07 Thread Prakhya, Sai Praneeth
> > As discussed in previous mails, I have implemented the below:
> >
> > If kernel detects an illegal access by firmware to any efi region
> > other than efi boot time code/data, a. It freezes "efi_rts_wq" (efi
> > runtime services work queue).
> > b. Signals the requested process that an error occurred.
> > c. Disables EFI Runtime Services forever.
> > d. Explicitly calls the scheduler to schedule another process.
> >
> > But, I have couple of questions:
> >
> > 1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above
> > solution doesn't work if efi_reset_system() is buggy. We cannot
> > neglect it either because that's the issue faced by Al Stone and hence
> > I started the patch set. So, I am thinking to use long jump technique 
> > *only* for
> efi_reset_system(). Would you be OK with that?
> > Or please propose a new solution.
> >
> 
> What would be the point of that? Can you reasonably expect the kernel to still
> run reliably after it has taken itself down, called
> efi_reset_system() and failed?
> 

The problem with efi_reset_system() being buggy is,
If user calls "reboot", kernel would call efi_reset_system() and if unhandled, 
kernel 
hangs and never reboots. But, if handled and returning error (either using long 
jump 
or some other technique), kernel could continue rebooting through other fall 
backs. 
I think, this is true only for x86 though.

Please note that native_machine_emergency_restart() would call "BOOT_BIOS", 
if efi_reboot() returns in arch/x86/kernel/reboot

> > 2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got
> > this working with the existing "efi_rts_wq" setup. Please note that,
> > the existing "efi_rts_wq", has only one worker thread. Could you
> > please brief me on why we should have a dedicated kthread and not leverage
> "efi_rts_wq" (I am unclear on this).
> >
> 
> Does every work queue have its own dedicated kernel thread?

Hmm.. I think so.. I referred to the below articles while working on 
"efi_rts_wq"

They say
1. Each workqueue has one or more dedicated worker threads (one per CPU, by 
default) associated with it.
2. Tasks running out of a special-purpose workqueue can sleep indefinitely, 
since they will not interfere with tasks in any other queue.

[1] https://lwn.net/Articles/23634/
[2] https://lwn.net/Articles/11360/

Regards,
Sai


RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-08-07 Thread Prakhya, Sai Praneeth
Hi All,

> > >> The main problem is that we have just merged Sai's code to use a
> > >> work queue for invoking EFI services, and killing kworker threads
> > >> is obviously not going to make EFI any new friends.
> > >>
> > >> So I guess we should probably rework that code to use a dedicated
> > >> kthread, and just freeze it when it performs an illegal memory
> > >> access, and signal the completion in a way that makes the calling
> > >> thread understand that a) the call failed and b) runtime services
> > >> are no longer
> > available.
> > >
> > > Yes, this makes sense to me.
> > > Initially I did use a dedicated kthread for efi but moved to work
> > > queues later so
> > that the synchronization is well handled. I am ok to rework on the
> > patches, could we ask Ingo to hold onto efi_workqueue patches?
> > >
> >
> > I am fine with keeping them. We will have a different approach in
> > v4.19 than in subsequent kernels, but the workqueue approach is still
> > better than nothing at all.

As discussed in previous mails, I have implemented the below:

If kernel detects an illegal access by firmware to any efi region other than 
efi boot time code/data,
a. It freezes "efi_rts_wq" (efi runtime services work queue).
b. Signals the requested process that an error occurred.
c. Disables EFI Runtime Services forever.
d. Explicitly calls the scheduler to schedule another process.

But, I have couple of questions:

1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above solution 
doesn't work if efi_reset_system() is buggy. We cannot neglect it either 
because 
that's the issue faced by Al Stone and hence I started the patch set. So, I am 
thinking 
to use long jump technique *only* for efi_reset_system(). Would you be OK with 
that?
Or please propose a new solution.

2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got this 
working with 
the existing "efi_rts_wq" setup. Please note that, the existing "efi_rts_wq", 
has only one 
worker thread. Could you please brief me on why we should have a dedicated 
kthread 
and not leverage "efi_rts_wq" (I am unclear on this).

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Prakhya, Sai Praneeth
> >> The main problem is that we have just merged Sai's code to use a work
> >> queue for invoking EFI services, and killing kworker threads is
> >> obviously not going to make EFI any new friends.
> >>
> >> So I guess we should probably rework that code to use a dedicated
> >> kthread, and just freeze it when it performs an illegal memory
> >> access, and signal the completion in a way that makes the calling
> >> thread understand that a) the call failed and b) runtime services are no 
> >> longer
> available.
> >
> > Yes, this makes sense to me.
> > Initially I did use a dedicated kthread for efi but moved to work queues 
> > later so
> that the synchronization is well handled. I am ok to rework on the patches, 
> could
> we ask Ingo to hold onto efi_workqueue patches?
> >
> 
> I am fine with keeping them. We will have a different approach in
> v4.19 than in subsequent kernels, but the workqueue approach is still better 
> than
> nothing at all.

Makes sense to me.



RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Prakhya, Sai Praneeth
> > The basic idea is that, when you get an exception from a context that
> > is busted (i.e. it wasn't user code with a signal handler or kernel
> > code with a fixup), you can't return from the exception handler.  That
> > leaves two choices:
> >
> > 1. Kill the task without returning.  You could call do_exit(), or,
> > roughly equivalently, let yourself OOPS (fall through to die(), etc).
> > Then you have to make sure that the code that called into the thread
> > can handle it by signaling some completion first or whatever you need
> > to do.
> >
> > 2. Sleep forever.  For example, set the task state to
> > TASK_UNINTERRUPTIBLE and reschedule.
> >
> > If the thread is a real kthread, (1) may be a bad idea. I’m not sure a
> > kthread can tolerate do_exit(), since the kthread core plays awful
> > games involving storing important data structures on the stack. Also,
> > with (2), it might be possible to enable some after-the-fact
> > debugging, since the full crashing state is still there.
> >
> 
> The main problem is that we have just merged Sai's code to use a work queue
> for invoking EFI services, and killing kworker threads is obviously not going 
> to
> make EFI any new friends.
> 
> So I guess we should probably rework that code to use a dedicated kthread, and
> just freeze it when it performs an illegal memory access, and signal the
> completion in a way that makes the calling thread understand that a) the call
> failed and b) runtime services are no longer available.

Yes, this makes sense to me.
Initially I did use a dedicated kthread for efi but moved to work queues later 
so that the synchronization is well handled. I am ok to rework on the patches, 
could we ask Ingo to hold onto efi_workqueue patches?

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Prakhya, Sai Praneeth
I have added some x86/intel folks to cc.

I am fine with these patches, and I think it is useful to be able to
detect and recover from buggy UEFI implementations that use boot time
regions at runtime.

However, I need help from the x86 maintainers/developers to review
this so please cc them on these patches.

I'm okay with the general concept, but I'm not really thrilled by the 
longjmp-like approach.

Wasn't there a bunch of talk of having an actual kernel thread (kefid?) that 
makes runtime services calls?  Did that actually get implemented?  IMO a much 
nicer approach would be to handle the page fault by killing the thread, more or 
less like how we kill unruly user threads.  (And it's yet another step toward 
calling EFI runtime services at CPL 3!)

Hi Andy,

Thanks for the feedback ☺.

We have efi_kthread implemented and I did briefly think about killing the 
efi_kthread approach, but I thought it might not be possible (I might be wrong) 
because, we are in page fault handler and if we kill efi_kthread, the page 
fault handler still returns to firmware (because a firmware instruction caused 
page fault) and firmware will try to perform illegal access again thinking that 
the page fault handler might have fixed the fault. So, I took this approach of 
jumping out of firmware.

Please let me know If you think I missed something.

Regards,
Sai


RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Prakhya, Sai Praneeth
> > AFAIK, efi runtime services are not reentrant. With this in mind, if 
> > something
> like above happens, I have completely turned off EFI runtime services in 
> kernel.
> Is that OK? Or should we keep them enabled hoping to catch further illegal
> accesses (assuming that this feature is not used in production kernels).
> >
> 
> I think it is reasonable to turn off services after that. The only problem is 
> that
> distros will never be able to enable this, given that it may break systems 
> that are
> working fine today.

Actually these patches shouldn't break any existing behavior. Below are the 
possible illegal accesses.

1. If the illegal access was to boot time region, presently, it works during 
kernel boot but not after kernel boot, because we free boot time regions after 
set_virtual_address_map() is called. Please see 
efi_reserve/free_boot_services(). With the patches, we save boot time regions 
forever and hence illegal access could be fixed even after kernel boot. So, 
distros shouldn't see anything different here.

2. If the illegal access was to any other region except boot time region, 
presently, kernel panics both during and after kernel boot (this is the case 
reported by Al Stone). With these patches, we exit firmware context and hence 
fixup page fault handler. So, distros here, instead of seeing a kernel panic 
would see EFI Runtime Services disabled.

Regards,
Sai


RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-25 Thread Prakhya, Sai Praneeth
> I have added some x86/intel folks to cc.
> 
> I am fine with these patches, and I think it is useful to be able to detect 
> and
> recover from buggy UEFI implementations that use boot time regions at
> runtime.
> 
> However, I need help from the x86 maintainers/developers to review this so
> please cc them on these patches.

Hi Ard,

Sure! I will keep them cc'ed.

Could you also please let me know you thoughts on this approach

If the illegal access occurs to any EFI region other than EFI boot time regions 
(Eg: EFI conventional memory or EFI loader code/data), these patches will exit 
firmware context and return to kernel i.e. we are adjusting RIP and RSP in efi 
page fault handler and leaving runtime service execution abruptly. Is that OK?

This code in "[PATCH RFC 4/8] x86/efi: Add page fault handler to fixup/recover 
from page faults caused by firmware"
+   regs->sp = xmm_regs_rsp;
+   regs->ip = exit_fw_ctx_rip;
+   exited_fw_ctx = true;
+   clear_bit(EFI_RUNTIME_SERVICES, );
+   pr_info("Exited Firmware context and disabled EFI Runtime Services\n");

AFAIK, efi runtime services are not reentrant. With this in mind, if something 
like above happens, I have completely turned off EFI runtime services in 
kernel. Is that OK? Or should we keep them enabled hoping to catch further 
illegal accesses (assuming that this feature is not used in production kernels).

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

RE: [RFC PATCH] x86/efi: drop task_lock() from efi_switch_mm()

2018-07-24 Thread Prakhya, Sai Praneeth
> > Because local_save_flags() does not disable interrupts??
> 
> now that you say so, it does make sense…
> 
> > > Anyway, I would still like to get rid of task_lock() in efi_switch_mm().
> > > Any objections to that?
> >
> > No, not at all.
> okay.

I don’t have any either and thanks for the clear commit message. It makes sense
why task_lock()/task_unlock() should go away.

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

RE: [PATCH 2/8] efi/x86: Use non-blocking SetVariable() for efi_delete_dummy_variable()

2018-07-15 Thread Prakhya, Sai Praneeth
> > diff --git a/arch/x86/platform/efi/quirks.c
> > b/arch/x86/platform/efi/quirks.c index 36c1f8b9f7e0..6af39dc40325
> > 100644
> > --- a/arch/x86/platform/efi/quirks.c
> > +++ b/arch/x86/platform/efi/quirks.c
> > @@ -105,12 +105,11 @@ early_param("efi_no_storage_paranoia",
> > setup_storage_paranoia);  */  void efi_delete_dummy_variable(void)  {
> > -   efi.set_variable((efi_char16_t *)efi_dummy_name,
> > -_DUMMY_GUID,
> > -EFI_VARIABLE_NON_VOLATILE |
> > -EFI_VARIABLE_BOOTSERVICE_ACCESS |
> > -EFI_VARIABLE_RUNTIME_ACCESS,
> > -0, NULL);
> > +   efi.set_variable_nonblocking((efi_char16_t *)efi_dummy_name,
> > +_DUMMY_GUID,
> > +EFI_VARIABLE_NON_VOLATILE |
> > +EFI_VARIABLE_BOOTSERVICE_ACCESS |
> > +EFI_VARIABLE_RUNTIME_ACCESS, 0, NULL);
> >  }
> 
> Just wondering, what is the full stack trace of the splat? It sounds a bit 
> surprising
> to me that such type of EFI code is used from the idle thread.

Sorry! for the confusing commit message. Kernel warns about scheduling from 
idle thread only when "efi_rts_wq" is
used to invoke efi_runtime_services(). So, presently, this doesn't happen on 
mainline kernel. Support for "efi_rts_wq"
is added by commit 3eb420e70d87 (efi: Use a work queue to invoke EFI Runtime 
Services).

With v4.18-rc5 kernel, the stack trace looks as below:
Please note that it's not just a warning but a kernel panic due to NULL pointer 
dereference.
If I remember correctly, I noticed "bad: scheduling from the idle thread!" 
warning during development phase (probably with v4.15 or v4.16 kernels).

[0.075052] BUG: unable to handle kernel NULL pointer dereference at 
01c2
[0.076000] PGD 0 P4D 0 
[0.076000] Oops:  [#1] SMP PTI
[0.076000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc5-efitest+ 
#216
[0.076000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
0.0.0 02/06/2015
[0.076000] RIP: 0010:__queue_work+0x41/0x5f0
[0.076000] Code: fd 48 83 ec 10 8b 35 2e e2 79 01 89 7c 24 04 85 f6 74 17 
65 48 8b 04 25 40 4f 01 00 8b 88 54 0c 00 00 85 c9 0f 84 b5 02 00 00 <41> f6 84 
24 c2 01 00 00 01 0f 85 f7 03 00 00 48 bd eb 83 b5 80 46 
[0.076000] RSP: :82603cf0 EFLAGS: 00010046
[0.076000] RAX: 8262a7c0 RBX: 0246 RCX: 
[0.076000] RDX:  RSI: 0001 RDI: 2000
[0.076000] RBP: 82603da0 R08:  R09: 0001
[0.076000] R10:  R11:  R12: 
[0.076000] R13: 2000 R14: 82603da0 R15: 
[0.076000] FS:  () GS:88007e00() 
knlGS:
[0.076000] CS:  0010 DS:  ES:  CR0: 80050033
[0.076000] CR2: 01c2 CR3: 05a24001 CR4: 000606b0
[0.076000] Call Trace:
[0.076000]  queue_work_on+0x33/0x70
[0.076000]  virt_efi_set_variable+0x11f/0x160
[0.076000]  ? efi_call_virt_check_flags+0x80/0x80
[0.076000]  efi_delete_dummy_variable+0x8c/0xb0
[0.076000]  ? efi_enter_virtual_mode+0x42c/0x4e0
[0.076000]  efi_enter_virtual_mode+0x42c/0x4e0
[0.076000]  start_kernel+0x456/0x4f4
[0.076000]  secondary_startup_64+0xa5/0xb0
[0.076000] Modules linked in:
[0.076000] CR2: 01c2
[0.076000] ---[ end trace 5a03876c3be00272 ]---
[0.076000] RIP: 0010:__queue_work+0x41/0x5f0
[0.076000] Code: fd 48 83 ec 10 8b 35 2e e2 79 01 89 7c 24 04 85 f6 74 17 
65 48 8b 04 25 40 4f 01 00 8b 88 54 0c 00 00 85 c9 0f 84 b5 02 00 00 <41> f6 84 
24 c2 01 00 00 01 0f 85 f7 03 00 00 48 bd eb 83 b5 80 46 
[0.076000] RSP: :82603cf0 EFLAGS: 00010046
[0.076000] RAX: 8262a7c0 RBX: 0246 RCX: 
[0.076000] RDX:  RSI: 0001 RDI: 2000
[0.076000] RBP: 82603da0 R08:  R09: 0001
[0.076000] R10:  R11:  R12: 
[0.076000] R13: 2000 R14: 82603da0 R15: 
[0.076000] FS:  () GS:88007e00() 
knlGS:
[0.076000] CS:  0010 DS:  ES:  CR0: 80050033
[0.076000] CR2: 01c2 CR3: 05a24001 CR4: 000606b0
[0.076000] Kernel panic - not syncing: Attempted to kill the idle task!
[0.076000] ---[ end Kernel panic - not syncing: Attempted to kill the idle 
task! ]---

Regards,
Sai
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/6] efi/x86 mixed mode cleanups

2018-07-12 Thread Prakhya, Sai Praneeth
> >> Patch #6 helps unused code paths to be optimized away on
> >> configurations that don't need them (32-bit only and 64-bit only)
> >
> > I have tested this patch set on qemu and I see mixed mode kernel not 
> > booting.
> > My test setup is:
> > Running qemu-system-x86_64 with 32 bit OVMF and x86_64 kernel built with
> Mixed mode enabled.
> > I am using elilinux as bootloader.
> >
> > Upon further investigation, I found that the issue isn't related to this 
> > patch set.
> > It's introduced with commit 2e6eb40ca5eb ("Fix incorrect invocation of
> > PciIO->Attributes") Reverting the change does help in booting mixed mode
> kernel.
> >
> 
> Hello Sai,
> 
> This is likely due to the fact that you are missing this patch
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=efi/urgen
> t=e296701800f30d260a66f8aa1971b5b1bc3d2f81
> 
> Could you please double check? Thanks.

Thanks for updating me on this. I now see why my testing broke.
I think, the above patch is only in TIP tree and didn't yet make to Linus's 
tree and is also not part of efi tree next branch (these were the trees I am 
testing on).

Regards,
Sai


RE: [PATCH 1/6] efi/x86: prevent reentrant firmware calls in mixed mode

2018-07-12 Thread Prakhya, Sai Praneeth
> The UEFI spec does not permit runtime services to be called reentrantly, and 
> so
> it is up to the OS to provide proper locking around such calls.
> 
> For the native case, this was fixed a long time ago, but for the mixed mode 
> case,
> no locking is done whatsoever. Note that the calls are made with preemption
> and interrupts disabled, so only SMP configurations are affected by this 
> issue.
> 
> So add a spinlock and grab it when invoking a UEFI runtime service in mixed
> mode. We will also need to provide non-blocking versions of SetVariable() and
> QueryVariableInfo(), so add those as well.
> 
> Signed-off-by: Ard Biesheuvel 

FWIW, this patch looks good to me.

Regards,
Sai
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/6] efi/x86 mixed mode cleanups

2018-07-12 Thread Prakhya, Sai Praneeth
Hi Ard,

> This series contains some fixes and cleanups for mixed-mode UEFI on x86.
> 
> Patch #1 adds the missing locking in the runtime service wrapper code for 
> mixed
> mode. This was found by inspection rather than having been reported but could
> be a candidate for -stable nonetheless.
> 
> Patch #2 merges the remaining 32/64-bit specific parts of the setup_efi_pci
> routine.
> 
> Patches #3 and #4 do the same for the UGA draw protocol discovery routines.
> 
> Patch #5 fixes a latent bug in the UGA draw code.
> 
> Patch #6 helps unused code paths to be optimized away on configurations that
> don't need them (32-bit only and 64-bit only)

I have tested this patch set on qemu and I see mixed mode kernel not booting.
My test setup is:
Running qemu-system-x86_64 with 32 bit OVMF and x86_64 kernel built with Mixed 
mode enabled.
I am using elilinux as bootloader.

Upon further investigation, I found that the issue isn't related to this patch 
set.
It's introduced with commit 2e6eb40ca5eb ("Fix incorrect invocation of 
PciIO->Attributes")
Reverting the change does help in booting mixed mode kernel.

Regards,
Sai
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/6] efi: Introduce efi_memmap_free() to free memory allocated by efi_memmap_alloc()

2018-07-05 Thread Prakhya, Sai Praneeth
> > Signed-off-by: Sai Praneeth Prakhya 
> > Suggested-by: Ard Biesheuvel 
> > Cc: Lee Chun-Yi 
> > Cc: Dave Young 
> > Cc: Borislav Petkov 
> > Cc: Laszlo Ersek 
> > Cc: Jan Kiszka 
> > Cc: Dave Hansen 
> > Cc: Bhupesh Sharma 
> > Cc: Nicolai Stange 
> > Cc: Naresh Bhat 
> > Cc: Ricardo Neri 
> > Cc: Peter Zijlstra 
> > Cc: Taku Izumi 
> > Cc: Ravi Shankar 
> > Cc: Matt Fleming 
> > Cc: Dan Williams 
> > Cc: Ard Biesheuvel 
> 
> Do you really think you should cc all these people?
> 

Sorry! I get that..

> > ---
> >  drivers/firmware/efi/memmap.c | 28 
> >  include/linux/efi.h   |  8 
> >  2 files changed, 36 insertions(+)
> >

[snip]

> I am now thinking we should perhaps hide the allocation type from the callers
> rather than passing around implementation details of the allocator, and have
> explicit alloc/free operations.
> 
> Could we instead have a single 'efi_realloc_memmap()' function that takes care
> of all of this?

Sure! I will give it a try.

Regards,
Sai


RE: [PATCH] efi: Free existing memory map before installing new memory map

2018-06-27 Thread Prakhya, Sai Praneeth
> > Also, could you please clarify if there is any specific reason why
> > memory allocated using memblock_reserve() shouldn't be freed. I mean,
> > not with memblock_free() but I think we could make it _available_
> > using free_bootmem() (or something similar, please correct me if this is not
> the right API).
> 
> On arm64, the memory map is provided to the core kernel by the stub, and after
> kexec, a pointer to the same memory map will be passed to the next kernel. So
> the kernel does not 'own' that allocation, and it should not free it or 
> overwrite
> it.

Thanks for the reply. It confirms that the issue is only on x86 systems.
I see that arm64 doesn't call efi_memmap_alloc() and hence there is no concept 
of allocating memory for new memory map and installing it (so no memory leak).

Regards,
Sai


RE: [kbuild-all] [PATCH] efi: Free existing memory map before installing new memory map

2018-06-27 Thread Prakhya, Sai Praneeth
> >Since efi_memmap_free() is a recently introduced API [1], please note
> >that the patch is based on efi tree
> >@https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
> 
> I noticed the official efi tree recored in MAINTAINERS is
> git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git,
> are they identical?
> 
> EXTENSIBLE FIRMWARE INTERFACE (EFI)
> M:  Matt Fleming 
> L:  linux-efi@vger.kernel.org
> T:  git git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git
> S:  Maintained
> F:  Documentation/efi-stub.txt
> F:  arch/ia64/kernel/efi.c
> F:  arch/x86/boot/compressed/eboot.[ch]
> F:  arch/x86/include/asm/efi.h
> F:  arch/x86/platform/efi/*
> F:  drivers/firmware/efi/*
> F:  include/linux/efi*.h

Hi Xiaolong,

Ard replaced Matt as EFI maintainer. Please see updated MAINTAINERS file for 
more details.
https://elixir.bootlin.com/linux/v4.18-rc2/source/MAINTAINERS

EXTENSIBLE FIRMWARE INTERFACE (EFI)
M:  Ard Biesheuvel 
L:  linux-efi@vger.kernel.org
T:  git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
S:  Maintained
F:  Documentation/efi-stub.txt
F:  arch/*/kernel/efi.c
F:  arch/x86/boot/compressed/eboot.[ch]
F:  arch/*/include/asm/efi.h
F:  arch/x86/platform/efi/
F:  drivers/firmware/efi/
F:  include/linux/efi*.h
F:  arch/arm/boot/compressed/efi-header.S
F:  arch/arm64/kernel/efi-entry.S

Regards,
Sai
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] efi: Free existing memory map before installing new memory map

2018-06-26 Thread Prakhya, Sai Praneeth
> > +   /* Free the memory allocated to the existing memory map */
> > +   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
> > + efi.memmap.late);
> > +
> > data.phys_map = addr;
> > data.size = efi.memmap.desc_size * nr_map;
> > data.desc_version = efi.memmap.desc_version;
> > --
> > 2.7.4
> >
> 
> If only it were so simple :-)
> 
> At this point, efi.memmap.phys_map could point to memory that was allocated
> early, allocated late or simply passed to the OS at boot time by the stub (in
> which case it was memblock_reserve()d but not memblock_alloc()d, and it
> should not be freed)
> 

Yes, completely agree that there could be three types of allocations for 
memmap. 
I thought, 
efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, efi.memmap.late);

should work because the previous type of allocation should have been recorded 
in efi.memmap.late.
But, now I see this will fail for memblock_reserved() memory because it will be 
mistaken to 
memblock_alloced() (I assumed both are almost similar :().

> So only allocations made with efi_memmap_alloc() should be freed here.

Makes sense and I think that also means efi_memmap_free() should be called from 
function 
that called efi_memmap_alloc() and not efi_memmap_install().

> I'm not sure /how/ we should keep track of that: perhaps it is simply a 
> matter of
> replacing the boolean 'late' with an enum that describes where the memory
> came from that phys_map points to.

I did try changing boolean late to enum and it seems to be working fine. I will 
do more 
testing/clean up and will submit a patch for review.

Also, could you please clarify if there is any specific reason why memory 
allocated 
using memblock_reserve() shouldn't be freed. I mean, not with memblock_free() 
but I 
think we could make it _available_ using free_bootmem() (or something similar, 
please 
correct me if this is not the right API). If we allocate and install a new 
memory map (as 
in case with efi_fake_memmap()), I think we should free the memory used by 
memory map 
originally passed by EFI stub, because, at any point of time there should only 
be one active 
memory map. If we don't free the original memory map passed by EFI stub, we 
will be having
two and hence will be leaking memory.

Regards,
Sai


RE: [PATCH] efi: Free existing memory map before installing new memory map

2018-06-26 Thread Prakhya, Sai Praneeth
> 
> Hi Sai,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.18-rc2 next-20180625] [if your patch is applied 
> to
> the wrong git tree, please drop us a note to help improve the system]

Since efi_memmap_free() is a recently introduced API [1], please note that the 
patch 
is based on efi tree 
@https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git

[1] https://lkml.org/lkml/2018/6/22/39

Regards,
Sai
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V5 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-06-06 Thread Prakhya, Sai Praneeth
> >> I have tested V4 series on ThunderX2 Val1 and Saber platform by
> >> integrating the patches with LUV project.
> >> I will test V5 version soon and let you know the outcome.
> >
> > Thanks a lot! for taking time to test the patch set, Naresh. Will be 
> > waiting to
> see the result.
> 
> Sai, I did upgrade LUV kernel to v4.17, apply and test your patches by running
> LUV on QEMU and ThunderX2 hardware. All efi test results looks ok.

Thanks a lot! for the update :)

> But as Ard pointed, I also noticed hang issue after executing reboot command 
> on console
> on real hardware.

Ard has fixed the issue before applying and the latest patches can be found at:
https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git/log/?h=next

Regards,
Sai


RE: [PATCH V5 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services

2018-06-05 Thread Prakhya, Sai Praneeth
> >> I noticed that -unsurprisingly- reboot no longer works with these changes.
> >>
> >> I will fix up the patch, and revert the efi_reset_system() change,
> >> both here and below.
> >
> > Could you please let me know what the bug is here? I am unable to see
> > it right away :( I have tested reboot on qemu x86_64 by passing
> > "reboot=efi" as command line arg and saw that reboot is working fine.
> >
> 
> My arm64 hangs at reboot or power off, unless i revert the ResetSystem() part.
>

That's interesting. Not sure how the behavior could be different between x86 
and arm64.
Maybe worth debugging further?
As you said, reverting ResetSystem() part makes things work, we cannot suspect 
a 
buggy efi_reset_system() (I came across buggy efi_reset_system() 
implementations 
on some x86 systems).

> But given that it is both risky (relying on a kthread running a workqueue in 
> the
> shutdown path) and unnecessary (ResetSystem() is not supposed to return, and 
> is
> only called by the kernel when the whole system has already been torn down), I
> think the main reason is simply that there is no reason to add it.

Makes sense.
I think, before calling firmware ResetSystem() kernel has already shut down all 
other parts 
and hence as the control will not transfer back to kernel (other than 
ResetSystem() failure), 
we could safely assume that no kernel code would access user space while in 
efi_reset_system() and hence we don't need to take the work queue path.

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

RE: [PATCH V5 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services

2018-06-05 Thread Prakhya, Sai Praneeth
> > +   case RESET_SYSTEM:
> > +   __efi_call_virt(reset_system, *(int *)arg1,
> > +   *(efi_status_t *)arg2,
> > +   *(unsigned long *)arg3,
> > +   (efi_char16_t *)arg4);
> > +   break;
> 
> I noticed that -unsurprisingly- reboot no longer works with these changes.
> 
> I will fix up the patch, and revert the efi_reset_system() change, both here 
> and
> below.

Could you please let me know what the bug is here? I am unable to see it right 
away :(
I have tested reboot on qemu x86_64 by passing "reboot=efi" as command line arg 
and 
saw that reboot is working fine.

> > @@ -340,7 +441,8 @@ static void virt_efi_reset_system(int reset_type,
> > "could not get exclusive access to the firmware\n");
> > return;
> > }
> > -   __efi_call_virt(reset_system, reset_type, status, data_size, data);
> > +   efi_queue_work(RESET_SYSTEM, _type, , _size, data,
> > +  NULL);
> > up(_runtime_lock);
> >  }

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

RE: [PATCH V5 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-06-03 Thread Prakhya, Sai Praneeth
> > Testing:
> > 
> > Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
> > (qemu only). Will appreciate the effort if someone could test the
> > patches on real ARM/ARM64 machines.
> > LUV: https://01.org/linux-uefi-validation
> 
> I have tested V4 series on ThunderX2 Val1 and Saber platform by integrating 
> the
> patches with LUV project.
> I will test V5 version soon and let you know the outcome.

Thanks a lot! for taking time to test the patch set, Naresh. Will be waiting to 
see the result.

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

RE: [PATCH V5 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-06-01 Thread Prakhya, Sai Praneeth
> >> Testing:
> >> 
> >> Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
> >> (qemu only). Will appreciate the effort if someone could test the
> >> patches on real ARM/ARM64 machines.
> 
> I would give the latest v5 a try on my ARM64 qualcomm board as well.
> WIll get back with the test results soon.
> 

Sure! Testing is always welcomed :)
and if you get a chance please try on UV300 machine too.. (probably it might 
find issues 
as happened with mm_struct patches).

Regards,
Sai


RE: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-05-27 Thread Prakhya, Sai Praneeth
> > Another follow on question is, does every firmware support both
> > blocking and non-blocking variants (specially legacy EFI firmware)? I
> > am worried about this because, presently efi_delete_dummy_variable()
> > uses set_variable() and
> > query_variable_info() but if I change efi_delete_dummy_variable() to
> > use non-blocking variants and if they aren’t supported, then, I guess,
> > efi_delete_dummy_variable() might fail :(
> >
> > So, could you please clarify on that?
> >
> 
> I don't follow. Why should it make any difference to the firmware whether the
> OS routines blocks or gives up? We always honor the mutual exclusion between
> different invocations of runtime services, and the firmware itself has no
> awareness of the kind of scheduling the OS needs to do to ensure this.

Sorry! my bad.. I thought firmware (with EFI System table revision > 2.0 ) 
offers two 
types of efi run time services, a blocking variant and a non-blocking variant. 
But, now I 
noticed in the spec that there is only set_variable() but _no_ 
set_variable_nonblocking(). 
Same with query_variable_info(). The same is also seen in runtime-wrappers.c 
file. 
Both the blocking and non-blocking variants call the same efi runtime service. 
I see that 
non-blocking() variants are just an additional feature (API) offered by OS.

Regards,
Sai


RE: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-05-27 Thread Prakhya, Sai Praneeth
> > One more question again, if we are sure that non-blocking variants
> > will _always_ be called in atomic context, then, we got it covered.
> > Because, in
> > set_variable() and query_variable_info() (both blocking and
> > non-blocking) we check for in_atomic() and if so, we don't use efi_rts_wq
> (please refer to patch 3).
> >
> > If you think, there might be a probability of calling non-blocking
> > efi_rts out of atomic context, then, sure! Let's make them never use
> efi_rts_wq.
> >
> 
> This is not about what happens to be the current situation. It is about the 
> API.
> 
> The non-blocking functions should never block, period. They either fail 
> gracefully
> or perform their duties without sleeping.

Yes, that makes sense.

> 
> In this particular case, I think it is useful to have a guaranteed 
> non-blocking
> version, not only to delete the dummy EFI variable, but potentially in other
> future cases as well, given that they can be called much earlier in the boot 
> (when
> the perf/%cr3 issue is not a concern to begin with)

Thanks for making it more clear :)
I will change the non-blocking variants _not_ to use efi_rts_wq and as you 
suggested 
make efi_delete_dummy_variable() use non-blocking variants (that should also 
make it 
local to arch/x86).

Another follow on question is, does every firmware support both blocking and 
non-blocking variants (specially legacy EFI firmware)? I am worried about 
this because, presently efi_delete_dummy_variable() uses set_variable() and 
query_variable_info() but if I change efi_delete_dummy_variable() to use 
non-blocking 
variants and if they aren’t supported, then, I guess, 
efi_delete_dummy_variable() might 
fail :(

So, could you please clarify on that?

Regards,
Sai


RE: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-05-26 Thread Prakhya, Sai Praneeth
> > Assume some user requested to execute some non-blocking variant of
> > efi_rts and the kernel hasn't called efi_call_virt() yet, but was
> > scheduled out. IOW, even though user requests for non-blocking efi call, we
> might still block. Am I right?
> >
> 
> No, that is the whole point. These functions may be called from atomic 
> context,
> which is why they trylock() and give up rather than block on the semaphore if 
> a rt
> services call is already in progress. E.g.,
> 
> /*
>  * efivar_entry_set_nonblocking - call set_variable_nonblocking()
>  *
>  * This function is guaranteed to not block and is suitable for calling
>  * from crash/panic handlers.
>  *
>  * Crucially, this function will not block if it cannot acquire
>  * efivars_lock. Instead, it returns -EBUSY.
>  */
>

One more question again, if we are sure that non-blocking variants will
_always_ be called in atomic context, then, we got it covered. Because, in
set_variable() and query_variable_info() (both blocking and non-blocking) we 
check
for in_atomic() and if so, we don't use efi_rts_wq (please refer to patch 3).

If you think, there might be a probability of calling non-blocking efi_rts out 
of atomic
context, then, sure! Let's make them never use efi_rts_wq.

Regards,
Sai
N�r��yb�X��ǧv�^�)޺{.n�+{�y^n�r���z���h�&���G���h�(�階�ݢj"���m��z�ޖ���f���h���~�m�

RE: [PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-05-25 Thread Prakhya, Sai Praneeth
> > Changes from V3 to V4:
> > --
> > 1. As suggested by Peter, use completions instead of flush_work() as the
> >   former is cheaper
> > 2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
> >   wasn't able to find a better alternative to keep this change local to
> >   arch/x86.
> >
> 
> Two questions:
> - Should the non-blocking variants of the query and set_variable_store use the
> work queue? Doesn't that make them blocking?

That's a good question . I think you are right, calling non-blocking variants 
of efi_rts 
using work queues makes them blocking. But, I have a follow on question.

Assume some user requested to execute some non-blocking variant of efi_rts and 
the kernel hasn't called efi_call_virt() yet, but was scheduled out. IOW, even 
though 
user requests for non-blocking efi call, we might still block. Am I right?

With efi_rts_wq, I think, I have increased the window of getting blocked. With 
efi_rts_wq, 
kernel should explicitly call schedule() to run firmware and the chances of 
getting blocked 
are much more.

Expect this increased window, I think firmware should be executed as before.

So, can you please explain me the difference between blocking and non-blocking 
variants
from kernel perspective?
(the way we get locks are different down_interruptible() vs down_trylock())

> - If the non-blocking set_variable() does not use the work queue, can we just 
> call
> it from efi_delete_dummy_variable(), and keep the calls where they are?

Yes, I think we can do that (if we don't use efi_rts_wq for non-blocking 
variants).

Regards,
Sai


RE: [PATCH V3 2/3] efi: Introduce efi_queue_work() to queue any efi_runtime_service() on efi_rts_wq

2018-05-22 Thread Prakhya, Sai Praneeth
> On Mon, May 21, 2018 at 08:13:03PM -0700, Sai Praneeth Prakhya wrote:
> > +   /*  \
> > +* queue_work() returns 0 if work was already on queue, \
> > +* _ideally_ this should never happen.  \
> > +*/ \
> > +   if (queue_work(efi_rts_wq, _rts_work.work))
>   \
> > +   flush_work(_rts_work.work);
>   \
> 
> Since you're _always_ going to wait for it, it is _much_ cheaper to put a
> completion in your actual work and wait for that.

Sure! I will change it.
I also noticed that flush_work() in turn calls wait_for_completion().

Will also wait for a couple of days before posting V4.

Regards,
Sai
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Query regarding SetVirtualAddressMap()

2018-05-21 Thread Prakhya, Sai Praneeth
> > AFAIK, ExitBootServices() means that boot services are no longer
> > needed by OS/bootloader and hence firmware can terminate them. Does it
> > also mean that the system is in runtime mode..? (I don't think so, as, I 
> > didn't
> find it in UEFI spec).
> >
> 
> Yes
> 
> > Also, could you please enlighten me on, Why calling
> > SetVirtualAddressMap() is optional, why not make it mandatory?
> >
> 
> Because it depends on the OS whether it needs it or not. Also, there will 
> always
> be a window after calling EBS() and before calling SVAM() where we are at
> runtime and no virtual mapping is installed yet, so mandating SVAM() is a bit
> ambiguous anyway.

Thanks for clarifying my doubts :)

Regards,
Sai
N�r��yb�X��ǧv�^�)޺{.n�+{�y^n�r���z���h�&���G���h�(�階�ݢj"���m��z�ޖ���f���h���~�m�

RE: Query regarding SetVirtualAddressMap()

2018-05-17 Thread Prakhya, Sai Praneeth
+ Ricardo

> > > I recently noticed in UEFI spec v2.7, section 2.3.4, Calling
> > > Conventions for X64, that “All selectors set to be flat with virtual
> > > = physical address. If the UEFI OS loader or OS used
> > > SetVirtualAddressMap() to relocate the runtime services in a virtual
> > > address space, then this condition does not have to be met. My
> > > understanding of it is as shown below. Please correct me if it’s wrong.
> > >
> > > 1.  It’s not mandatory for the OS to call SetVirtualAddressMap()
> > >
> > > 2.  If it’s not called, the OS can still use EFI Runtime Services in
> > > physical mode (i.e. PA == VA).
> > >
> > >
> > >
> > > For x86_64, I did a quick test with qemu and OVMF. I have commented
> > > out call to SetVirtualAddressMap() from kernel and noticed that
> > > everything works fine i.e. efi variables have the normal
> > > functionality and I ran LUV (Linux UEFI Validation). Not calling
> > > SetVirtualAddressMap() doesn’t break anything because on x86 we have
> > > 1:1 mappings. So, could someone please let me know why we might
> > > still need
> > to call SetVirtualAddressMap()?
> > >
> >
> > You are right, and the same applies to arm64. However, Apple Mac EFI
> > is known not to tolerate PA==VA at runtime,

Some doubts again,
When SetVirtualAddressMap() isn't called, how does Apple Mac EFI realize that 
the system is in runtime mode and how does it realize that it should now start 
using VA instead of PA?

Extending it further, if SetVirtualAddressMap() isn't called, any platform 
firmware 
shouldn't be aware that the system is in runtime mode.. right?

Or does the firmware behavior change on call to ExitBootServices()?

AFAIK, ExitBootServices() means that boot services are no longer needed by 
OS/bootloader and hence firmware can terminate them. Does it also mean that 
the system is in runtime mode..? (I don't think so, as, I didn't find it in 
UEFI spec).

Also, could you please enlighten me on,
Why calling SetVirtualAddressMap() is optional, why not make it mandatory?

> > and there may be other x86
> > firmware implementations that don't work correctly in this case, since
> > they are only tested with MS Windows, which also calls
> SetVirtualAddressMap().

Regards,
Sai
N�r��yb�X��ǧv�^�)޺{.n�+{�y^n�r���z���h�&���G���h�(�階�ݢj"���m��z�ޖ���f���h���~�m�

RE: Query regarding SetVirtualAddressMap()

2018-05-15 Thread Prakhya, Sai Praneeth
> > I recently noticed in UEFI spec v2.7, section 2.3.4, Calling
> > Conventions for X64, that “All selectors set to be flat with virtual =
> > physical address. If the UEFI OS loader or OS used
> > SetVirtualAddressMap() to relocate the runtime services in a virtual
> > address space, then this condition does not have to be met. My
> > understanding of it is as shown below. Please correct me if it’s wrong.
> >
> > 1.  It’s not mandatory for the OS to call SetVirtualAddressMap()
> >
> > 2.  If it’s not called, the OS can still use EFI Runtime Services in
> > physical mode (i.e. PA == VA).
> >
> >
> >
> > For x86_64, I did a quick test with qemu and OVMF. I have commented
> > out call to SetVirtualAddressMap() from kernel and noticed that
> > everything works fine i.e. efi variables have the normal functionality
> > and I ran LUV (Linux UEFI Validation). Not calling
> > SetVirtualAddressMap() doesn’t break anything because on x86 we have
> > 1:1 mappings. So, could someone please let me know why we might still need
> to call SetVirtualAddressMap()?
> >
> 
> You are right, and the same applies to arm64. However, Apple Mac EFI is known
> not to tolerate PA==VA at runtime, and there may be other x86 firmware
> implementations that don't work correctly in this case, since they are only 
> tested
> with MS Windows, which also calls SetVirtualAddressMap().
> 
> So the safe option is to still call it.

Thanks for the explanation Ard, it makes sense now :)

Regards,
Sai


RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-09 Thread Prakhya, Sai Praneeth
> > That's true! AFAIK, we don't have any issues handling NMI while in efi_pgd.
> > We might have issues only when, we are already in efi_pgd, NMI comes
> > along
> 
> Can you trigger this? Or is it something hypothetical?
> 

AFAIK, it's hypothetical. I did try to trigger the issue, but failed [1].
Maybe, I need to have some more constraints [2].

[1] https://lkml.org/lkml/2017/8/23/715
[2] https://lkml.org/lkml/2017/8/25/469

> > and NMI handler tries to touch the regions that are not mapped in
> > efi_pgd
> 
> If it is not hypothetical, the NMI handler should learn to look at CR3 first 
> and
> return if CR3 has the efi pgd.

This solution and it's variants were discussed here [1], [2] and for varied 
reasons 
the community had decided to go with "Everything EFI as kthread" approach [3] 
[4].

Although the discussions were off my understanding, the present issue I see is, 
(and also the motivation for me to do the patch is)
when a thread tries to execute any  efi_runtime_service() we switch to efi_pgd 
(which doesn't have user space mappings) and all other subsystems in kernel 
aren't aware of this switch. This looks like a perfect case for kthread.

Kthread by definition doesn’t have user space mappings and if we run 
efi_runtime_services()
in a kthread context and if any other subsystem tries to access user space 
mappings 
while in efi_kthread, it's terminally broken [5].

There were several issues Andy, Peter and Mark raised.
One such (hypothetical) case is accessing user space from the back of an 
interrupt (NMI).
Others include
1. Issue specific to ARM because it runs efi_runtime_services() with interrupts 
enabled [6]
2. Interrupt taken while mmap_sem() is held for write that tries to access user 
memory [7]
3. If EFI were to have IO memory mapped at a "user" address, perf could end up 
reading it [8]

[1] https://lkml.org/lkml/2017/8/15/757
[2] https://lkml.org/lkml/2017/8/16/487
[3] https://lkml.org/lkml/2017/8/21/573
[4] https://lkml.org/lkml/2017/8/16/540

[5] https://lkml.org/lkml/2017/8/17/667
[6] https://lkml.org/lkml/2017/8/16/176
[7] https://lkml.org/lkml/2017/8/17/667
[8] https://lkml.org/lkml/2017/8/21/427

Regards,
Sai


RE: [PATCH 07/12] efi: Use efi_mm in x86 as well as ARM

2018-03-09 Thread Prakhya, Sai Praneeth
> > diff --git a/include/linux/efi.h b/include/linux/efi.h index
> > f5083aa72eae..f1b7d68ac460 100644
> > --- a/include/linux/efi.h
> > +++ b/include/linux/efi.h
> > @@ -966,6 +966,8 @@ extern struct efi {
> > unsigned long flags;
> >  } efi;
> >
> > +extern struct mm_struct efi_mm;
> > +
> >  static inline int
> >  efi_guidcmp (efi_guid_t left, efi_guid_t right)  {
> 
> Ugh, I can see three problems with this patch:
> 
> 1)
> 
> Why is the low level asm/efi.h header polluted with two of the biggest header
> files in existence, to add a type to _another_ header (efi.h)?
> 
> 2)
> 
> Why is  included if what is being relied on is mm_struct?
> 
> 3)
> 
> But even  looks unnecessary in efi.h, a simple forward
> declaration of mm_struct would do ...
> 
> The high level MM and sched headers should be added to the actual .c files 
> that
> make use of them.

Ok, makes sense.
Sorry! for that. I will fix the issues.

Regards,
Sai
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-08 Thread Prakhya, Sai Praneeth
> > Another warning by checkpatch is "use of in_atomic() in drivers code"
> 
> I'm assuming it warns because you're touching files in drivers/ but the efi 
> fun is
> not really a driver...

True! That makes sense :)

> 
> But looking at patch 3, that thing looks like a real mess. Some of the things 
> -
> pstore, it seems - do stuff in atomic context and yet you want to do efi 
> stuff in a
> workqueue which doesn't stomach atomic context to begin with.
> 
> So if you wanna do workqueue, you should make sure all efi stuff gets delayed
> to process context and queued properly. For example, we log MCEs from atomic
> context by putting them on a lockless buffer and then kicking irq_work to 
> queue
> the work when we return to process context.
> Can you do something like that?
> 

I think we can do this, it's is a good idea. I looked at this approach and saw 
that
in oops_end() function, part of arch/x86/kernel/dumpstack, between oops_exit()
and panic() (here we are not in atomic context, so, I think we can use work 
queues)
we could have something like efi_flush_buffer() which will flush the buffer and
queue the work to efi_rts_wq.

But, I guess, we have some downsides with this design:
1. We are doing this to have "no exceptions to use efi_rts_wq", but we will be 
making
the common case complicated. i.e. When a user requests to write some efi 
variable,
we will first write it to a buffer and then flush the buffer using efi_rts_wq. 
Instead, we
could have written the variable directly.
Maybe, you meant, we should use this buffer only while pstore and not during 
normal
case (which sounds reasonably OK).
2. It doesn't look rational that, when we are already going down, we schedule 
(because
we will be invoking efi_runtime_services() through work queue) to log some 
stuff.
Not, sure if that's happening in other parts of kernel or if it's OK to do that.

I will try the suggested approach and will keep this thread posted.

> "Hence, pstore calls efi_runtime_services() without using efi_rts_wq" - that
> doesn't sound like optimal design to me. I would try to shove them all through
> the workqueue - not have exceptions.
> 

Alternatively, instead of playing around with in_atomic(), we could have wrapper
functions like efi_write_var_non_wq() which will only be used by pstore. This 
function
will not use efi_rts_wq and directly invoke efi_runtime_service. Just an 
attempt to
make the code not look messy.

> Then this:
> 
> > A potential issue could be, for instance, an NMI interrupt (like perf)
> > trying to profile some user data while in efi_pgd.
> 
> I can't understand.
> 
> How did we handle this until now and why is it a problem all of a sudden?
> 
> Because I don't recall being unable to run perf while efi runtime services are
> happening.
> 

That's true! AFAIK, we don't have any issues handling NMI while in efi_pgd.
We might have issues only when, we are already in efi_pgd, NMI comes along
and NMI handler tries to touch the regions that are not mapped in efi_pgd
(Eg: User space part of process address space) and using kthread inherently
means that we will not have any user space.

Regards,
Sai


RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-08 Thread Prakhya, Sai Praneeth
/*
> >> > +* Since we process only one efi_runtime_service() at a time, an
> >> > +* ordered workqueue (which creates only one execution context)
> >> > +* should suffice all our needs.
> >> > +*/
> >> > +   efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue",
> >> > + 0);
> >>
> >> efi_rts_wq or efi_rts_workqueue?
> >>
> >> > +   if (!efi_rts_wq) {
> >> > +   pr_err("Failed to create efi_rts_workqueue, EFI runtime 
> >> > services "
> >>
> >> Same here.
> >
> > Sure! I will make it consistent with "efi_rts_wq". Just tried to be
> > more verbose with names :)
> >
> 
> It is not a big deal, but using the exact same name is better for the 
> purposes of
> grepping and things like that :-)

Yes, that makes sense.

> By the way, check the commit title/message, there are some others there too.

Sure! I will make changes to commit/title messages too.

Regards,
Sai


RE: [PATCH V2 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization

2018-03-08 Thread Prakhya, Sai Praneeth
> >  void __init efi_enter_virtual_mode(void) diff --git
> > a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index
> > cd42f66a7c85..838b8efe639c 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -328,6 +328,12 @@ static int __init efisubsys_init(void)
> > if (!efi_enabled(EFI_BOOT))
> > return 0;
> >
> > +   /*
> > +* Clean DUMMY object calls EFI Runtime Service, set_variable(), so
> > +* it should be invoked only after efi_rts_workqueue is ready.
> > +*/
> > +   efi_delete_dummy_variable();
> > +
> 
> Is there any way to keep this local to arch/x86?
>

I think, we can definitely do that. I will retake a look at the possibilities
and will update this thread.

> > /* We register the efi directory at /sys/firmware/efi */
> > efi_kobj = kobject_create_and_add("efi", firmware_kobj);
> > if (!efi_kobj) {
> > diff --git a/include/linux/efi.h b/include/linux/efi.h index
> > f5083aa72eae..c4efb3ef0dfa 100644
> > --- a/include/linux/efi.h
> > +++ b/include/linux/efi.h
> > @@ -992,6 +992,7 @@ extern efi_status_t efi_query_variable_store(u32



RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-07 Thread Prakhya, Sai Praneeth
+Cc Miguel Ojeda

> > > +({   
> > > \
> > > + struct efi_runtime_work efi_rts_work;   \
> > > + \
> > > + INIT_WORK_ONSTACK(_rts_work.work, efi_call_rts);\
> > > + efi_rts_work.func = _rts;   \
> > > + efi_rts_work.arg1 = _arg1;  \
> > > + efi_rts_work.arg2 = _arg2;  \
> > > + efi_rts_work.arg3 = _arg3;  \
> > > + efi_rts_work.arg4 = _arg4;  \
> > > + efi_rts_work.arg5 = _arg5;  \
> > > + /*  \
> > > +  * queue_work() returns 0 if work was already on queue, \
> > > +  * _ideally_ this should never happen.  \
> > > +  */ \
> > > + if (queue_work(efi_rts_wq, _rts_work.work))
> > \
> > > + flush_work(_rts_work.work);
> > \
> > > + else\
> > > + BUG();  \
> >
> > So failure to queue that work is such a critical problem that we need
> > to BUG() and can't possibly continue and shoult not attempt recovery at all?
> >
> 
> I think it's not critical, we can just return error status.
> I think the problem in itself is not at all critical but when I initially 
> thought about
> why the problem could have occurred, it sounded like one i.e. ideally (if the
> system is running fine) we should always be able to queue work. Failure to 
> queue
> means that the previous work is already on queue and that shouldn't be the
> case.
> So, thought, maybe something bad had happened already (just doubtful).
> 
> But, I see your point. BUG() sounds more like an over kill. Instead of fixing 
> an
> existing problem, this patch could completely take down the system.
> 
> > IOW, we should always strive to fail gracefully and not shit in pants
> > at the first sign of trouble.
> >
> 
> Yes, that makes sense. I will remove BUG() in V3 (in the two places that I
> introduced).
> 
> > Even checkpatch warns here:
> >
> > WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code
> > rather than BUG() or BUG_ON()
> > #184: FILE: drivers/firmware/efi/runtime-wrappers.c:92:
> > +   BUG();  \
> >
> 
> Sure! I will fix this
> 
> >
> > and by looking at the other output, you should run your patches
> > through checkpatch. Some of the things make sense like:
> >
> > WARNING: quoted string split across lines
> > #97: FILE: drivers/firmware/efi/efi.c:341:
> > +   pr_err("Failed to create efi_rts_workqueue, EFI runtime 
> > services "
> > +  "disabled.\n");
> >
> > for example.
> >
> 
> I will fix this one too.
> 
> Another warning by checkpatch is "use of in_atomic() in drivers code"
> Do you think it's OK to check if were are "in_atomic()" in drivers code.
> I wasn't able to decide on other alternative, if possible, could you please 
> suggest
> one?
> 
> Regards,
> Sai
N�r��yb�X��ǧv�^�)޺{.n�+{�y^n�r���z���h�&���G���h�(�階�ݢj"���m��z�ޖ���f���h���~�m�

RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-07 Thread Prakhya, Sai Praneeth
> > +({ \
> > +   struct efi_runtime_work efi_rts_work;   \
> > +   \
> > +   INIT_WORK_ONSTACK(_rts_work.work, efi_call_rts);\
> > +   efi_rts_work.func = _rts;   \
> > +   efi_rts_work.arg1 = _arg1;  \
> > +   efi_rts_work.arg2 = _arg2;  \
> > +   efi_rts_work.arg3 = _arg3;  \
> > +   efi_rts_work.arg4 = _arg4;  \
> > +   efi_rts_work.arg5 = _arg5;  \
> > +   /*  \
> > +* queue_work() returns 0 if work was already on queue, \
> > +* _ideally_ this should never happen.  \
> > +*/ \
> > +   if (queue_work(efi_rts_wq, _rts_work.work))
>   \
> > +   flush_work(_rts_work.work);
>   \
> > +   else\
> > +   BUG();  \
> 
> So failure to queue that work is such a critical problem that we need to BUG()
> and can't possibly continue and shoult not attempt recovery at all?
> 

I think it's not critical, we can just return error status.
I think the problem in itself is not at all critical but when I initially 
thought about
why the problem could have occurred, it sounded like one i.e. ideally (if the 
system
is running fine) we should always be able to queue work. Failure to queue means
that the previous work is already on queue and that shouldn't be the case.
So, thought, maybe something bad had happened already (just doubtful).
 
But, I see your point. BUG() sounds more like an over kill. Instead of fixing 
an existing
problem, this patch could completely take down the system.

> IOW, we should always strive to fail gracefully and not shit in pants at the 
> first
> sign of trouble.
>

Yes, that makes sense. I will remove BUG() in V3 (in the two places that I 
introduced).

> Even checkpatch warns here:
> 
> WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code
> rather than BUG() or BUG_ON()
> #184: FILE: drivers/firmware/efi/runtime-wrappers.c:92:
> +   BUG();  \
>

Sure! I will fix this
 
> 
> and by looking at the other output, you should run your patches through
> checkpatch. Some of the things make sense like:
> 
> WARNING: quoted string split across lines
> #97: FILE: drivers/firmware/efi/efi.c:341:
> +   pr_err("Failed to create efi_rts_workqueue, EFI runtime 
> services "
> +  "disabled.\n");
> 
> for example.
> 

I will fix this one too.

Another warning by checkpatch is "use of in_atomic() in drivers code"
Do you think it's OK to check if were are "in_atomic()" in drivers code.
I wasn't able to decide on other alternative, if possible, could you please 
suggest one?

Regards,
Sai
N�r��yb�X��ǧv�^�)޺{.n�+{�y^n�r���z���h�&���G���h�(�階�ݢj"���m��z�ޖ���f���h���~�m�

RE: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-07 Thread Prakhya, Sai Praneeth
> >> > pstore writes could potentially be invoked in interrupt context and
> >> > it uses set_variable<>() and query_variable_info<>() to store logs.
> >> > If we invoke efi_runtime_services() through efi_rts_wq while in
> >> > atomic() kernel issues a warning ("scheduling wile in atomic") and
> >> > prints stack trace. One way to overcome this is to not make the
> >> > caller process wait for the worker thread to finish. This approach
> >> > breaks pstore i.e. the log messages aren't written to efi
> >> > variables. Hence, pstore calls
> >> > efi_runtime_services() without using efi_rts_wq or in other words
> >> > efi_rts_wq will be used unconditionally for all the
> >> > efi_runtime_services() except set_variable<>() and
> >> > query_variable_info<>()
> >>
> >> Can't NMIs still come in during this?
> >>
> >> ... or do we assume that since something has already gone wrong, this
> >> doesn't matter?
> >>
> >
> > I think they can come. AFAIK, pstore (if enabled) runs only when we crashed.
> > Since, we are still executing stuff to log messages and NMI's can't be
> > masked, there is still a possibility for NMI's to occur (please correct me 
> > if I am
> wrong).
> > But, as you said earlier, I guess it doesn't matter because anyways we are
> going down.
> 
> The problem is that we are not always in a "already going down"
> condition for typical set_variable and query_variable_info calls.

That's correct.

> So are we actually fixing anything with this patchset?

When we are _not_ in interrupt context (eg: process context)
we still use efi_rts_wq to invoke *all* efi_runtime_services().
This solves *someone trying to access user space* while in EFI runtime services
mapping problem. A instance could be, some user space process requests us to
execute efi_runtime_service(), so, kernel switches to efi_pgd (which doesn’t 
have
user space part of process address space) and while executing 
efi_runtime_service()
perf NMI comes along to profile user data.

> In other words if the NMI vs EFI
> mapping problem requires the workqueue context then we can't have any EFI
> calls outside of that context.  Am I missing how this scheme addresses that
> problem?

I think so, because, we are not trying to solve NMI vs EFI Runtime Service 
mappings.
AFAIK, they both work together (for x86_64). The problem is, as stated above,
"someone trying to access user space while executing EFI runtime services".
The problem exists because EFI runtime services mappings don’t have user space
part of process address space.

I think the problem still persists when we are already in interrupt context and 
then
we were requested to execute some efi_runtime_service() and then NMI happens
and that NMI handler touches user space.

Please let me know if my explanation didn’t make sense or if I misunderstood
your question.

Ard and Matt, please correct me if I stated something that is incorrect.


RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-07 Thread Prakhya, Sai Praneeth
> > +struct workqueue_struct *efi_rts_wq;
> > +
> >  static bool disable_runtime;
> >  static int __init setup_noefi(char *arg)  { @@ -329,6 +331,19 @@
> > static int __init efisubsys_init(void)
> > return 0;
> >
> > /*
> > +* Since we process only one efi_runtime_service() at a time, an
> > +* ordered workqueue (which creates only one execution context)
> > +* should suffice all our needs.
> > +*/
> > +   efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0);
> 
> efi_rts_wq or efi_rts_workqueue?
> 
> > +   if (!efi_rts_wq) {
> > +   pr_err("Failed to create efi_rts_workqueue, EFI runtime 
> > services "
> 
> Same here.

Sure! I will make it consistent with "efi_rts_wq". Just tried to be more verbose
with names :)

[...]

> > +#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)
> > \
> > +({ \
> > +   struct efi_runtime_work efi_rts_work;   \
> > +   \
> > +   INIT_WORK_ONSTACK(_rts_work.work, efi_call_rts);\
> > +   efi_rts_work.func = _rts;   \
> > +   efi_rts_work.arg1 = _arg1;  \
> > +   efi_rts_work.arg2 = _arg2;  \
> > +   efi_rts_work.arg3 = _arg3;  \
> > +   efi_rts_work.arg4 = _arg4;  \
> > +   efi_rts_work.arg5 = _arg5;  \
> > +   /*  \
> > +* queue_work() returns 0 if work was already on queue, \
> > +* _ideally_ this should never happen.  \
> > +*/ \
> > +   if (queue_work(efi_rts_wq, _rts_work.work)) \
> > +   flush_work(_rts_work.work); \
> > +   else\
> > +   BUG();  \
> 
> Thanks for the change! One remark, I would just do:

Sorry! but I am planning to remove BUG(). Looks like it could defeat the purpose
of patch. Please see Boris comments on the other thread.

[...]

> > +/*
> > + * efi_runtime_work:   Details of EFI Runtime Service work
> > + * @func:  EFI Runtime Service function identifier
> > + * @arg<1-5>:  EFI Runtime Service function arguments
> > + * @status:Status of executing EFI Runtime Service
> > + */
> > +struct efi_runtime_work {
> > +   u8 func;
> > +   void *arg1;
> > +   void *arg2;
> > +   void *arg3;
> > +   void *arg4;
> > +   void *arg5;
> > +   efi_status_t status;
> > +   struct work_struct work;
> > +};
> 
> Why is efi_runtime_work in the .h at all?
> 

Thanks for the catch. I will move it to runtime-wrappers.c file and will make it
static too. It isn't being used in any other place.

> Please CC me for the next version! :-)

Sure! Sorry for that. I should have done in V2.

Regards,
Sai


RE: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-07 Thread Prakhya, Sai Praneeth
05, 2018 at 03:23:10PM -0800, Sai Praneeth Prakhya wrote:
> > From: Sai Praneeth 
> >
> > Presently, efi_runtime_services() are executed by firmware in process
> > context. To execute efi_runtime_service(), kernel switches the page
> > directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have
> > any user space mappings. A potential issue could be, for instance, an
> > NMI interrupt (like perf) trying to profile some user data while in efi_pgd.
> 
> It might be worth pointing out that this could result in disaster (e.g.
> if the frame pointer happens to point at MMIO in the EFI runtime services
> mappings).
> 

Sorry! I didn't get it. I would like to add this point, so could you please
explain it more?

> [...]
> 
> > pstore writes could potentially be invoked in interrupt context and it
> > uses set_variable<>() and query_variable_info<>() to store logs. If we
> > invoke efi_runtime_services() through efi_rts_wq while in atomic()
> > kernel issues a warning ("scheduling wile in atomic") and prints stack
> > trace. One way to overcome this is to not make the caller process wait
> > for the worker thread to finish. This approach breaks pstore i.e. the
> > log messages aren't written to efi variables. Hence, pstore calls
> > efi_runtime_services() without using efi_rts_wq or in other words
> > efi_rts_wq will be used unconditionally for all the
> > efi_runtime_services() except set_variable<>() and
> > query_variable_info<>()
> 
> Can't NMIs still come in during this?
> 
> ... or do we assume that since something has already gone wrong, this doesn't
> matter?
>

I think they can come. AFAIK, pstore (if enabled) runs only when we crashed.
Since, we are still executing stuff to log messages and NMI's can't be masked,
there is still a possibility for NMI's to occur (please correct me if I am 
wrong).
But, as you said earlier, I guess it doesn't matter because anyways we are 
going down.
 
> Otherwise, this doesn't seem to break basic stuff on arm64 platforms. I can 
> boot
> up, read the EFI RTC, and reboot. I ahd lockdep and KASAN enabled, I received
> no splats from either.
> 
> FWIW:
> 
> Tested-by: Mark Rutland  
> Thanks,
> Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-07 Thread Prakhya, Sai Praneeth
-0800, Sai Praneeth Prakhya wrote:
> > @@ -329,6 +331,19 @@ static int __init efisubsys_init(void)
> > return 0;
> >
> > /*
> > +* Since we process only one efi_runtime_service() at a time, an
> > +* ordered workqueue (which creates only one execution context)
> > +* should suffice all our needs.
> > +*/
> > +   efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0);
> > +   if (!efi_rts_wq) {
> > +   pr_err("Failed to create efi_rts_workqueue, EFI runtime services
> "
> > +  "disabled.\n");
> > +   clear_bit(EFI_RUNTIME_SERVICES, );
> > +   return 0;
> > +   }
> 
> I'm a little worried that something might sample this flag between it being 
> set in
> an early_initcall (arm_enable_runtime_services), and cleared in a 
> subsys_initcall
> here.
> 
> However, nothing seems to do that so far, so maybe that's ok...
> 

Thanks for raising this. I will take a look at initcalls.

> [...]
> 
> > +/* efi_runtime_service() function identifiers */ enum {
> > +   GET_TIME,
> > +   SET_TIME,
> > +   GET_WAKEUP_TIME,
> > +   SET_WAKEUP_TIME,
> > +   GET_VARIABLE,
> > +   GET_NEXT_VARIABLE,
> > +   SET_VARIABLE,
> > +   SET_VARIABLE_NONBLOCKING,
> > +   QUERY_VARIABLE_INFO,
> > +   QUERY_VARIABLE_INFO_NONBLOCKING,
> > +   GET_NEXT_HIGH_MONO_COUNT,
> > +   RESET_SYSTEM,
> > +   UPDATE_CAPSULE,
> > +   QUERY_CAPSULE_CAPS,
> > +};
> 
> Can we please give this enum a name

Sure! Added in V3.

> 
> [...]
> 
> > +/*
> > + * efi_runtime_work:   Details of EFI Runtime Service work
> > + * @func:  EFI Runtime Service function identifier
> > + * @arg<1-5>:  EFI Runtime Service function arguments
> > + * @status:Status of executing EFI Runtime Service
> > + */
> > +struct efi_runtime_work {
> > +   u8 func;
> 
> ... and use it here rather than an opaque u8? I realise that means placing the
> enum in .
> 

Actually, with Miguel comments, I am considering making this struct static and 
moving
it to runtime-wrappers.c, since "struct efi_runtime_work" isn't really being 
used anywhere
except runtime-wrappers.c. Please see in V3.

> > +   void *arg1;
> > +   void *arg2;
> > +   void *arg3;
> > +   void *arg4;
> > +   void *arg5;
> > +   efi_status_t status;
> > +   struct work_struct work;
> > +};
> 
> Thanks,
> Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-05 Thread Prakhya, Sai Praneeth
> > Presently, efi_runtime_services() are executed by firmware in process
> > context. To execute efi_runtime_service(), kernel switches the page
> > directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have
> > any user space mappings. A potential issue could be, for instance, an
> > NMI interrupt (like perf) trying to profile some user data while in efi_pgd.
> >
> > A solution for this issue could be to use kthread to run
> > efi_runtime_service(). When a user/kernel thread requests to execute
> > efi_runtime_service(), kernel off-loads this work to kthread which in
> > turn uses efi_pgd. Anything that tries to touch user space addresses
> > while in kthread is terminally broken. This patch adds support to efi
> > subsystem to handle all calls to efi_runtime_services() using a work
> > queue (which in turn uses kthread).
> >
> > Implementation summary:
> > ---
> > 1. When user/kernel thread requests to execute efi_runtime_service(),
> > enqueue work to efi_rts_workqueue.
> > 2. Caller thread waits until the work is finished because it's
> > dependent on the return status of efi_runtime_service().
> >
> > Semantics to pack arguments in efi_runtime_work (has void pointers):
> > 1. If argument is a pointer (of any type), pass it as is.
> > 2. If argument is a value (of any type), address of the value is
> > passed.
> >
> > Introduce a handler function (called efi_call_rts()) that
> > a. understands efi_runtime_work and
> > b. invokes the appropriate efi_runtime_service() with the
> > appropriate arguments
> >
> > Semantics followed by efi_call_rts() to understand efi_runtime_work:
> > 1. If argument was a pointer, recast it from void pointer to original
> > pointer type.
> > 2. If argument was a value, recast it from void pointer to original
> > pointer type and dereference it.
> >
> > pstore writes could potentially be invoked in interrupt context and it
> > uses set_variable<>() and query_variable_info<>() to store logs. If we
> > invoke efi_runtime_services() through efi_rts_wq while in atomic()
> > kernel issues a warning ("scheduling wile in atomic") and prints stack
> > trace. One way to overcome this is to not make the caller process wait
> > for the worker thread to finish. This approach breaks pstore i.e. the
> > log messages aren't written to efi variables. Hence, pstore calls
> > efi_runtime_services() without using efi_rts_wq or in other words
> > efi_rts_wq will be used unconditionally for all the
> > efi_runtime_services() except set_variable<>() and
> > query_variable_info<>()
> 
> 
> Is there a place in the system reboot path where we can try to flush these
> asynchronous pstore writes from interrupt context?

I don't think so because, the issue is not with the pstore writes but with 
pstore
using efi as backing store. Anything could register as pstore backend, eg: RAM,
ACPI-ERST etc.. and AFAIK, they don’t use work queues to store logs. Now that
efi_runtime_services() uses work queues, we unfortunately have to have this 
hack.

> It seems unfortunate that
> we need to have this wide exception for all
> set_variable() calls.

True, basically any efi_runtime_service() that might get called in interrupt 
context.
I am not very happy to have the hack too, but didn’t find other way.

Either that or switch to an explicit "emergency mode" where
> we stop caring about protecting the system from EFI runtime code because
> we're already crashing.

Should we care about extra warning (scheduling while in atomic) when we are 
already
crashing? This sounds kind of debatable. I will wait for feedback from 
community it they
think it's OK or maybe a better solution.

Regards,
Sai


RE: [PATCH V1 2/3] efi: Introduce efi_rts_workqueue and necessary infrastructure to invoke all efi_runtime_services()

2018-02-25 Thread Prakhya, Sai Praneeth
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index ac5db5f8dbbf..4714b305ca90 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -76,6 +76,8 @@ static unsigned long *efi_tables[] = {
> > _attr_table,
> >  };
> >
> > +struct workqueue_struct *efi_rts_wq;
> > +
> >  static bool disable_runtime;
> >  static int __init setup_noefi(char *arg)  { @@ -329,6 +331,15 @@
> > static int __init efisubsys_init(void)
> > if (!efi_enabled(EFI_BOOT))
> > return 0;
> >
> > +   /* Create a work queue to run EFI Runtime Services */
> > +   efi_rts_wq = create_workqueue("efi_rts_workqueue");
> 
> Please consider alloc_workqueue() instead with the appropriate flags, since
> create_workqueue() and friends are deprecated.

Sure! Will change in V2.

> 
> > +   if (!efi_rts_wq) {
> > +   pr_err("Failed to create efi_rts_workqueue, EFI runtime 
> > services "
> > +  "disabled.\n");
> > +   clear_bit(EFI_RUNTIME_SERVICES, );
> > +   return 0;
> > +   }
> > +
> > /*
> >  * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
> >  * it should be invoked only after efi_rts_workqueue is ready.
> > diff --git a/drivers/firmware/efi/runtime-wrappers.c
> > b/drivers/firmware/efi/runtime-wrappers.c
> > index ae54870b2788..5cdb787da5d3 100644
> > --- a/drivers/firmware/efi/runtime-wrappers.c
> > +++ b/drivers/firmware/efi/runtime-wrappers.c
> > @@ -1,6 +1,14 @@
> >  /*
> >   * runtime-wrappers.c - Runtime Services function call wrappers
> >   *
> > + * Implementation summary:
> > + * ---
> > + * 1. When user/kernel thread requests to execute
> > + efi_runtime_service(),
> > + * enqueue work to efi_rts_workqueue.
> > + * 2. Caller thread waits until the work is finished because it's
> > + * dependent on the return status and execution of efi_runtime_service().
> > + * For instance, get_variable() and get_next_variable().
> > + *
> >   * Copyright (C) 2014 Linaro Ltd. 
> >   *
> >   * Split off from arch/x86/platform/efi/efi.c @@ -22,6 +30,8 @@
> > #include   #include   #include
> > 
> > +#include 
> > +
> >  #include 
> >
> >  /*
> > @@ -33,6 +43,50 @@
> >  #define __efi_call_virt(f, args...) \
> > __efi_call_virt_pointer(efi.systab->runtime, f, args)
> >
> > +/* Each EFI Runtime Service is represented with a unique number */
> > +#define GET_TIME   0
> > +#define SET_TIME   1
> > +#define GET_WAKEUP_TIME2
> > +#define SET_WAKEUP_TIME3
> > +#define GET_VARIABLE   4
> > +#define GET_NEXT_VARIABLE  5
> > +#define SET_VARIABLE   6
> > +#define SET_VARIABLE_NONBLOCKING   7
> > +#define QUERY_VARIABLE_INFO8
> > +#define QUERY_VARIABLE_INFO_NONBLOCKING9
> > +#define GET_NEXT_HIGH_MONO_COUNT   10
> > +#define RESET_SYSTEM   11
> > +#define UPDATE_CAPSULE 12
> > +#define QUERY_CAPSULE_CAPS 13
> 
> An enum would be better, given these are just internal, contiguous IDs.
> 

Makes sense.

> > +
> > +/*
> > + * efi_queue_work: Queue efi_runtime_service() and wait until it's done
> > + * @rts:   efi_runtime_service() function identifier
> > + * @rts_arg<1-5>:  efi_runtime_service() function arguments
> > + *
> > + * Accesses to efi_runtime_services() are serialized by a binary
> > + * semaphore (efi_runtime_lock) and caller waits until the work is
> > + * finished, hence _only_ one work is queued at a time. So,
> > +queue_work()
> > + * should never fail.
> > + */
> > +#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)
> > \
> > +({ \
> > +   struct efi_runtime_work efi_rts_work;   \
> > +   efi_rts_work.status = EFI_ABORTED;  \
> > +   \
> > +   INIT_WORK_ONSTACK(_rts_work.work, efi_call_rts);\
> > +   efi_rts_work.func = _rts;   \
> > +   efi_rts_work.arg1 = _arg1;  \
> > +   efi_rts_work.arg2 = _arg2;  \
> > +   efi_rts_work.arg3 = _arg3;  \
> > +   efi_rts_work.arg4 = _arg4;  \
> > +   efi_rts_work.arg5 = _arg5;  \
> > +   if (queue_work(efi_rts_wq, _rts_work.work))

RE: [PATCH 0/3] Use mm_struct and switch_mm() instead of manually

2017-12-18 Thread Prakhya, Sai Praneeth
> > Changes in V3:
> > 1. When CPUMASK_OFFSTACK is enabled, switch_mm_irqs_off() sets cpumask
> > by calling cpumask_set_cpu(). This panics kernel as efi_mm is not
> > initialized, therefore initialize efi_mm in efi_alloc_page_tables().
> 
> Thanks for the v3.
> 
> I confirmed that the issue I saw with the v2 when I enabled 'efi=debug' on the
> sgi-uv 300 machine (i.e the NULL pointer access while accessing
> mm_cpumask(next), in the function call
> 'switch_mm_irqs_off') is fixed in the v3.
> 
> Also as I noted during the v2 review, introducing the 'efi_switch_mm() ' 
> helper
> instead of manually twiddling with %cr3 seems more cleaner (having personally
> debugged this leg several times on the underlying x86 EFI machines).
> 
> So in addition to me testing this on the sgi-uv300 and Dell Optiplex EFI 
> enabled
> machine, please feel free to add:
> 
> Reviewed-by: Bhupesh Sharma 
> 

Hi  Bhupesh,

Thanks for the review and re-iterating the usefulness of this patch set.

Regards,
Sai


RE: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of manually

2017-09-11 Thread Prakhya, Sai Praneeth
> So, in summary it seems that the primary kernel boot _fails_ with your
> v2 patchset on the real hardware for me irrespective of whether I use Matt's
> tree or Linus's tree:
> 
> a) I would suggest that you perform some more checks on real hardware as
> qemu boot tests sometimes do not expose the problems we might see when
> booting a kernel on efi capable hardware.
> 
> b) Also do note that both Matt's tree and Linus's tree work fine on this 
> hardware
> for me (with the 'nopcid' added to the bootargs)
> 

Hi Bhupesh,

Thanks a lot! for the detailed explanation. I will address these issues in V3.

Regards,
Sai


RE: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of manually

2017-09-06 Thread Prakhya, Sai Praneeth


> -Original Message-
> From: Sai Praneeth Prakhya [mailto:sai.praneeth.prak...@intel.com]
> Sent: Tuesday, September 5, 2017 7:43 PM
> To: Bhupesh Sharma 
> Cc: linux-efi@vger.kernel.org; linux-ker...@vger.kernel.org; Matt Fleming
> ; Ard Biesheuvel ;
> j...@suse.com; Borislav Petkov ; Luck, Tony
> ; l...@kernel.org; m...@redhat.com; Neri, Ricardo
> ; Shankar, Ravi V 
> Subject: Re: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of
> manually
> 
> On Tue, 2017-09-05 at 19:21 -0700, Sai Praneeth Prakhya wrote:
> > > I get a similar crash on Qemu with linus's master branch and the V2
> > > applied on top of it. Here are the details of my test environment:
> > >
> > > 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
> > > edk2.git/ovmf-x64
> > >
> > > 2. I used linus's master branch (HEAD - commit:
> > > b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
> > > of the same.
> > >
> > > 3. I use the following qemu command line to launch the test:
> > >
> > > # /usr/local/bin/qemu-system-x86_64 --version QEMU emulator version
> > > 2.9.50 (v2.9.0-526-g76d20ea) Copyright (c) 2003-2017 Fabrice Bellard
> > > and the QEMU Project developers
> > >
> > > # /usr/local/bin/qemu-system-x86_64 -enable-kvm  -net nic -net tap
> > > -m $MEMSIZE -nographic -drive
> > > file=$DISK_IMAGE,if=virtio,format=qcow2
> > > -vga std -boot c -cpu host -kernel $KERNEL -append
> > > "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81"  -initrd
> > > $INITRAMFS -bios $OVMF_FW_PATH
> > >
> > > And here is the crash log:
> > >
> > > [0.006054] general protection fault:  [#1] SMP
> > > [0.006459] Modules linked in:
> > > [0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
> > > [0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > > BIOS 0.0.0 02/06/2015
> > > [0.007000] task: 81e0f480 task.stack: 81e0
> > > [0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
> > > [0.007000] RSP: :81e03d80 EFLAGS: 00010086
> > > [0.007000] RAX: 80007d084000 RBX:  RCX:
> 77ff8000
> > > [0.007000] RDX: 7d084000 RSI: 8000 RDI:
> 00019a00
> > > [0.007000] RBP: 81e03dc0 R08:  R09:
> 88007d085000
> > > [0.007000] R10: 81e03dd8 R11: 7d095063 R12:
> 81e5c6a0
> > > [0.007000] R13: 81ed4f40 R14: 0030 R15:
> 0001
> > > [0.007000] FS:  () GS:88007d40()
> > > knlGS:
> > > [0.007000] CS:  0010 DS:  ES:  CR0: 80050033
> > > [0.007000] CR2: 88007d754000 CR3: 0220a000 CR4:
> 000406b0
> > > [0.007000] Call Trace:
> > > [0.007000]  switch_mm+0xd/0x20
> > > [0.007000]  ? switch_mm+0xd/0x20
> > > [0.007000]  efi_switch_mm+0x3e/0x4a
> > > [0.007000]  efi_call_phys_prolog+0x28/0x1ac
> > > [0.007000]  efi_enter_virtual_mode+0x35a/0x48f
> > > [0.007000]  start_kernel+0x332/0x3b8
> > > [0.007000]  x86_64_start_reservations+0x2a/0x2c
> > > [0.007000]  x86_64_start_kernel+0x178/0x18b
> > > [0.007000]  secondary_startup_64+0xa5/0xa5
> > > [0.007000]  ? secondary_startup_64+0xa5/0xa5
> > > [0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
> > > 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
> > > f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87
> > > fb 7e 89
> > > [0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: 81e03d80
> > > [0.007000] ---[ end trace bfa55bf4e4765255 ]---
> > > [0.007000] Kernel panic - not syncing: Attempted to kill the idle 
> > > task!
> > > [0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
> > > the idle task!
> > >
> > > 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
> > > firmware and 64-bit x86 kernel) with your patches, the primary
> > > kernel boots fine on Qemu:
> > >
> > > ovmf firmware used in this case - edk2.git/ovmf-ia32
> > >
> > > 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
> > > case in point 3 above), I see the primary kernel boots fine on Qemu
> > > as well.
> > >
> > > Regards,
> > > Bhupesh
> >
> > Hi Bhupesh,
> >
> > Thanks a lot for the detailed explanation. They are helpful to
> > reproduce the issue quickly. From my initial debug, I think that AMD
> > SME + efi_mm_struct patches + -cpu host (in qemu) are required to
> > reproduce the issue on qemu.
> >
> > I have tried the following combinations (all tests are on qemu):
> > On Linus's tree:
> > 1. With  SME and  efi_mm and  -cpu host -> panics 2. With  SME and
> > efi_mm and !-cpu host -> boots 3. With  SME and !efi_mm and  -cpu host
> > -> boots 4. With  SME 

RE: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of manually

2017-09-03 Thread Prakhya, Sai Praneeth
> >
> > Thanks for this v2.
> > Introducing the 'efi_switch_mm() ' helper instead of manually
> > twiddling with %cr3 seems more cleaner.
> >
> > I have tested this patchset on a x86_64 machine with the following
> > configurations:
> >
> > 1. Primary kernel boot with efi=old_map 2. Primary kernel boot with
> > new efi map
> >
> > While it seems that efi=old_map passes, the new efi map boot fails for
> > me on both the two x86 machine (Dell 3050MT and a SGI - UV300 machine.
> >
> > It seems we are hitting a NULL pointer deference in
> > 'efi_call_phys_prolog' while accessing '_mm'.
> >
> > Please see the log below for details:
> >
> > [0.020006] BUG: unable to handle kernel NULL pointer dereference
> > at   (null)
> > [0.021000] IP: switch_mm_irqs_off+0x44/0x270
> > [0.021000] Call Trace:
> > [0.021000]  switch_mm+0x20/0x30
> > [0.021000]  efi_switch_mm+0x49/0x60
> > [0.021000]  efi_call_phys_prolog+0x56/0x1b3
> > [0.021000]  efi_enter_virtual_mode+0x3a9/0x520
> > [0.021000]  start_kernel+0x424/0x4c8
> > [0.021000]  ? set_init_arg+0x5a/0x5a
> > [0.021000]  ? early_idt_handler_array+0x120/0x120
> > [0.021000]  x86_64_start_reservations+0x29/0x2b
> > [0.021000]  x86_64_start_kernel+0x151/0x174
> > [0.021000]  secondary_startup_64+0x9f/0x9f
> > [0.021000] Code: 2d 82 51 d9 4f 65 c7 05 0f 65 da 4f 01 00 00 00
> > 48 39 f7 0f 84 14 01 00 00 65 48 89 35 f6 64 da 4f 48 8b 86 e8 02 00
> > 00 45 89 ed  4c 0f ab 28 bf 00 00 00 80 48 03 7e 50 48 8b 05 27 b0
> > b9 00
> > [0.021000] RIP: switch_mm_irqs_off+0x44/0x270 RSP: b0e035d0
> > [0.021000] CR2: 
> > [0.021000] ---[ end trace fb94349305e1cb8b ]---
> > [0.021000] Kernel panic - not syncing: Fatal exception
> > [0.021000] ---[ end Kernel panic - not syncing: Fatal exception
> >
> 
> And I forgot to mention that I tried the patchset both with the efi/next and
> linus's trees and saw the same result.
> 
> I would be happy to help in case you need further details of the test 
> environment
> or need help in testing this crash further.
> 
> Regards,
> Bhupesh

Hi Bhupesh,

Thanks for trying the patches and raising the concern.
Could you also please also give a try on qemu? (if reproducible, we will be 
having a common platform to start debugging)
I have tested this patch set on qemu and real machines (different from one's 
you tried) in our lab and didn’t notice this issue.

Regards,
Sai
N�r��yb�X��ǧv�^�)޺{.n�+{�y^n�r���z���h�&���G���h�(�階�ݢj"���m��z�ޖ���f���h���~�m�

RE: [PATCH] x86/efi-bgrt: Fix kernel panic when mapping BGRT data

2015-12-10 Thread Prakhya, Sai Praneeth

>>On Thu, Dec 10, 2015 at 10:27:01AM -0800, Sai Praneeth Prakhya wrote:
>> From: Sai Praneeth 
>> 
>> Starting with this commit 35eb8b81edd4 ("x86/efi: Build our own page 
>> table structures") efi regions have a separate page directory called 
>> "efi_pgd". In order to access any efi region we have to first shift 
>>%cr3 to this page table. In the bgrt code we are trying to copy 
>> bgrt_header and image, but these regions fall under "EFI_BOOT_SERVICES_DATA"
>> and to access these regions we have to shift %cr3 to efi_pgd and not 
>> doing so will cause page fault as shown below.
>> 
>> [0.251599] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
>> [0.259126] Freeing SMP alternatives memory: 32K (8230e000 - 
>> 82316000)
>> [0.271803] BUG: unable to handle kernel paging request at 
>> fffefce35002
>> [0.279740] IP: [] efi_bgrt_init+0x144/0x1fd
>> [0.286383] PGD 300f067 PUD 0
>> [0.289879] Oops:  [#1] SMP
>> [0.293566] Modules linked in:
>> [0.297039] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
>> 4.4.0-rc1-eywa-eywa-built-in-47041+ #2
>> [0.306619] Hardware name: Intel Corporation Skylake Client 
>> platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.R00.B104.B01.150114 
>> 11/11/2015
>> [0.320925] task: 820134c0 ti: 8200 task.ti: 
>> 8200
>> [0.329420] RIP: 0010:[]  [] 
>> efi_bgrt_init+0x144/0x1fd
>> [0.338821] RSP: :82003f18  EFLAGS: 00010246
>> [0.344852] RAX: fffefce35000 RBX: fffefce35000 RCX: 
>> fffefce2b000
>> [0.352952] RDX: 8a82b000 RSI: 8235bb80 RDI: 
>> 8a835000
>> [0.361050] RBP: 82003f30 R08: 8a865000 R09: 
>> ff202850
>> [0.369149] R10: 811ad62f R11:  R12: 
>> 
>> [0.377248] R13: 88016dbaea40 R14: 822622c0 R15: 
>> 82003fb0
>> [0.385348] FS:  () GS:88016d80() 
>> knlGS:
>> [0.394533] CS:  0010 DS:  ES:  CR0: 80050033
>> [0.401054] CR2: fffefce35002 CR3: 0300c000 CR4: 
>> 003406f0
>> [0.409153] DR0:  DR1:  DR2: 
>> 
>> [0.417252] DR3:  DR6: fffe0ff0 DR7: 
>> 0400
>> [0.425350] Stack:
>> [0.427638]   82256900 88016dbaea40 
>> 82003f40
>> [0.436086]  821bbce0 82003f88 8219c0c2 
>> 
>> [0.444533]  8219ba4a 822622c0 00083000 
>> 
>> [0.452978] Call Trace:
>> [0.455763]  [] efi_late_init+0x9/0xb
>> [0.461697]  [] start_kernel+0x463/0x47f
>> [0.467928]  [] ? set_init_arg+0x55/0x55
>> [0.474159]  [] ? early_idt_handler_array+0x120/0x120
>> [0.481669]  [] x86_64_start_reservations+0x2a/0x2c
>> [0.488982]  [] x86_64_start_kernel+0x13d/0x14c
>> [0.495897] Code: 00 41 b4 01 48 8b 78 28 e8 09 36 01 00 48 85 c0 48 89 
>> c3 75 13 48 c7 c7 f8 ac d3 81 31 c0 e8 d7 3b fb fe e9 b5 00 00 00 45 84 e4 
>> <44> 8b 6b 02 74 0d be 06 00 00 00 48 89 df e8 ae 34 0$
>> [0.518151] RIP  [] efi_bgrt_init+0x144/0x1fd
>> [0.524888]  RSP 
>> [0.528851] CR2: fffefce35002
>> [0.532615] ---[ end trace 7b06521e6ebf2aea ]---
>> [0.537852] Kernel panic - not syncing: Attempted to kill the idle task!
>> 
>> As said above one way to fix this bug is to shift %cr3 to efi_pgd but 
>> we are not doing that way because it leaks inner details of how we 
>> switch to EFI page tables into a new call site and it also adds duplicate 
>> code.
>> Instead, we remove the call to efi_lookup_mapped_addr() and always 
>> perform early_mem*() instead of early_io*() because we want to remap 
>> RAM regions and not I/O regions.
>> 
>> We also delete efi_lookup_mapped_addr() because it is impossible to 
>> use it without also doing the page table switch to efi_pgd.

>The original motivation for efi_lookup_mapped_addr came from early_ioremap 
>printing a warning if used on an address range >already mapped as RAM.  Does 
>early_mem* handle that case correctly without a warning?  

Thanks a lot Josh for letting me know that. I don't think early_memremap() does 
that because early_memremap() and early_ioremap() both use __early_ioremap() 
but with different page protections (and I am not sure how those protections 
effect warning, but I will check that). Waiting for comments from Matt and 
Boris.

>Because not all firmware places the BGRT image in boot services memory; some 
>firmware places the BGRT image variously in BIOS >reserved memory, ACPI 
>reclaim >space, or other strange places.
>
>- Josh Triplett

I think we should not support buggy firmware implementations because it's same 
as encouraging them, instead we could let user know that he has got a buggy 
firmware and we skip bgrt code as if