Re: arm64/efistub boot error with CONFIG_GCC_PLUGIN_STACKLEAK

2019-08-15 Thread Mark Rutland
On Thu, Aug 15, 2019 at 02:21:26PM +0300, Ard Biesheuvel wrote:
> On Thu, 15 Aug 2019 at 14:03, Mark Rutland  wrote:
> > On Thu, Aug 15, 2019 at 05:56:27AM -0400, skodde wrote:
> > > Hi,
> > >
> > > I've enabled CONFIG_GCC_PLUGIN_STACKLEAK on 5.2.8 for an arm64
> > > macchiatobin board and I get the following error when loading the
> > > kernel (using grub-efi on top of edk ii):
> > >
> > > EFI stub: Booting Linux Kernel...
> > > EFI stub: ERROR: efi_get_random_bytes() failed
> > > EFI stub: ERROR: Failed to relocate kernel
> > >
> > > The kernel boots fine with that option disabled, but strangely
> > > presents the same error when disabling only CONFIG_RANDOMIZE_BASE.
> >
> > That shouldn't be possible, given the IS_ENABLED(CONFIG_RANDOMIZE_BASE)
> > guard around the efi_get_random_bytes() call, so something sounds wrong.
> >
> > Are you certain that you're running the same kernel Image that you
> > rebuilt?
> >
> > Ard, do you reckon it would be worth adding the UTS_RELEASE and
> > UTS_VERSION to the " Booting Linux Kernel..." string? It would make
> > debugging that potential issue easier.
> 
> Use of the UTS_xxx macros already triggers an annoying number of
> object rebuilds every time you change anything entirely unrelated in
> your kernel sources, so I'd prefer to avoid this tbh.

Fair enough; saves me writing a patch! :)

Mark.


Re: arm64/efistub boot error with CONFIG_GCC_PLUGIN_STACKLEAK

2019-08-15 Thread Mark Rutland
On Thu, Aug 15, 2019 at 05:56:27AM -0400, skodde wrote:
> Hi,
> 
> I've enabled CONFIG_GCC_PLUGIN_STACKLEAK on 5.2.8 for an arm64
> macchiatobin board and I get the following error when loading the
> kernel (using grub-efi on top of edk ii):
> 
> EFI stub: Booting Linux Kernel...
> EFI stub: ERROR: efi_get_random_bytes() failed
> EFI stub: ERROR: Failed to relocate kernel
> 
> The kernel boots fine with that option disabled, but strangely
> presents the same error when disabling only CONFIG_RANDOMIZE_BASE.

That shouldn't be possible, given the IS_ENABLED(CONFIG_RANDOMIZE_BASE)
guard around the efi_get_random_bytes() call, so something sounds wrong.

Are you certain that you're running the same kernel Image that you
rebuilt?

Ard, do you reckon it would be worth adding the UTS_RELEASE and
UTS_VERSION to the " Booting Linux Kernel..." string? It would make
debugging that potential issue easier.

> Let me know if I can provide more info or do some tests.

Maybe there's a problem with stale objects. If you're not doing so
already, could you try a clean build with CONFIG_RANDOMIZE_BASE
deselected?

Thanks,
Mark.


Re: [PATCH v8 2/5] arm64: replace -pg with CC_FLAGS_FTRACE in efi Makefiles

2019-04-08 Thread Mark Rutland
On Fri, Feb 08, 2019 at 04:10:11PM +0100, Torsten Duwe wrote:
>   In preparation for arm64 supporting ftrace built on other compiler
>   options, let's have makefiles remove the $(CC_FLAGS_FTRACE)
>   flags, whatever these may be, rather than assuming '-pg'.
>   While at it, fix arm32 as well.
> 
>   There should be no functional change as a result of this patch.
> 
> Signed-off-by: Torsten Duwe 

Ard, would you be happy if we were to take this via the arm64 tree?

Assuming so, can we have your ack?

Thanks,
Mark.

> 
> ---
>  drivers/firmware/efi/libstub/Makefile |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -16,9 +16,9 @@ cflags-$(CONFIG_X86)+= -m$(BITS) -D__K
>  
>  # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
>  # disable the stackleak plugin
> -cflags-$(CONFIG_ARM64)   := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie 
> \
> -$(DISABLE_STACKLEAK_PLUGIN)
> -cflags-$(CONFIG_ARM) := $(subst -pg,,$(KBUILD_CFLAGS)) \
> +cflags-$(CONFIG_ARM64)   := $(subst 
> $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> +-fpie $(DISABLE_STACKLEAK_PLUGIN)
> +cflags-$(CONFIG_ARM) := $(subst 
> $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
>  -fno-builtin -fpic \
>  $(call cc-option,-mno-single-pic-base)
>  


Re: [PATCH v8 2/5] arm64: replace -pg with CC_FLAGS_FTRACE in efi Makefiles

2019-02-11 Thread Mark Rutland
On Fri, Feb 08, 2019 at 04:10:11PM +0100, Torsten Duwe wrote:
>   In preparation for arm64 supporting ftrace built on other compiler
>   options, let's have makefiles remove the $(CC_FLAGS_FTRACE)
>   flags, whatever these may be, rather than assuming '-pg'.
>   While at it, fix arm32 as well.
> 
>   There should be no functional change as a result of this patch.
> 
> Signed-off-by: Torsten Duwe 

With the indentation removed from the commit text:

Reviewed-by: Mark Rutland 

Mark.

> 
> ---
>  drivers/firmware/efi/libstub/Makefile |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -16,9 +16,9 @@ cflags-$(CONFIG_X86)+= -m$(BITS) -D__K
>  
>  # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
>  # disable the stackleak plugin
> -cflags-$(CONFIG_ARM64)   := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie 
> \
> -$(DISABLE_STACKLEAK_PLUGIN)
> -cflags-$(CONFIG_ARM) := $(subst -pg,,$(KBUILD_CFLAGS)) \
> +cflags-$(CONFIG_ARM64)   := $(subst 
> $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> +-fpie $(DISABLE_STACKLEAK_PLUGIN)
> +cflags-$(CONFIG_ARM) := $(subst 
> $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
>  -fno-builtin -fpic \
>  $(call cc-option,-mno-single-pic-base)
>  


Re: [PATCH] efi/libstub/arm64: handle randomized TEXT_OFFSET

2018-04-24 Thread Mark Rutland
On Tue, Apr 24, 2018 at 01:11:40PM +0200, Ard Biesheuvel wrote:
> Hi Mark,
> 
> On 24 April 2018 at 13:00, Mark Rutland <mark.rutl...@arm.com> wrote:
> > When CONFIG_RANDOMIZE_TEXT_OFFSET is selected, TEXT_OFFSET is an
> > arbitrary multiple of PAGE_SIZE in the interval [0, 2MB).
> >
> > The EFI stub doesn't accuont for this, and only handles the case where
> 
> 'account'
> 
> If you agree, I will add something here to clarify that the newly
> chosen offset should retain the misalignment of TEXT_OFFSET relative
> to EFI_KIMG_ALIGN, because it took me a while to figure that out.

That makes sense to me. Sorry for the clumsy wording!

> 
> Other than that,
> 
> Reviewed-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> 
> I will queue this as a fix

Great; thanks!

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-06 Thread Mark Rutland
On Mon, Mar 05, 2018 at 03:23:10PM -0800, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth <sai.praneeth.prak...@intel.com>
> 
> Presently, efi_runtime_services() are executed by firmware in process
> context. To execute efi_runtime_service(), kernel switches the page
> directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
> user space mappings. A potential issue could be, for instance, an NMI
> interrupt (like perf) trying to profile some user data while in efi_pgd.

It might be worth pointing out that this could result in disaster (e.g.
if the frame pointer happens to point at MMIO in the EFI runtime
services mappings).

[...]

> pstore writes could potentially be invoked in interrupt context and it
> uses set_variable<>() and query_variable_info<>() to store logs. If we
> invoke efi_runtime_services() through efi_rts_wq while in atomic()
> kernel issues a warning ("scheduling wile in atomic") and prints stack
> trace. One way to overcome this is to not make the caller process wait
> for the worker thread to finish. This approach breaks pstore i.e. the
> log messages aren't written to efi variables. Hence, pstore calls
> efi_runtime_services() without using efi_rts_wq or in other words
> efi_rts_wq will be used unconditionally for all the
> efi_runtime_services() except set_variable<>() and
> query_variable_info<>()

Can't NMIs still come in during this?

... or do we assume that since something has already gone wrong, this
doesn't matter?

Otherwise, this doesn't seem to break basic stuff on arm64 platforms. I
can boot up, read the EFI RTC, and reboot. I ahd lockdep and KASAN
enabled, I received no splats from either.

FWIW:

Tested-by: Mark Rutland <mark.rutl...@arm.com [arm64]

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-06 Thread Mark Rutland
On Mon, Mar 05, 2018 at 03:23:09PM -0800, Sai Praneeth Prakhya wrote:
> @@ -329,6 +331,19 @@ static int __init efisubsys_init(void)
>   return 0;
>  
>   /*
> +  * Since we process only one efi_runtime_service() at a time, an
> +  * ordered workqueue (which creates only one execution context)
> +  * should suffice all our needs.
> +  */
> + efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0);
> + if (!efi_rts_wq) {
> + pr_err("Failed to create efi_rts_workqueue, EFI runtime 
> services "
> +"disabled.\n");
> + clear_bit(EFI_RUNTIME_SERVICES, );
> + return 0;
> + }

I'm a little worried that something might sample this flag between it
being set in an early_initcall (arm_enable_runtime_services), and
cleared in a subsys_initcall here.

However, nothing seems to do that so far, so maybe that's ok...

[...]

> +/* efi_runtime_service() function identifiers */
> +enum {
> + GET_TIME,
> + SET_TIME,
> + GET_WAKEUP_TIME,
> + SET_WAKEUP_TIME,
> + GET_VARIABLE,
> + GET_NEXT_VARIABLE,
> + SET_VARIABLE,
> + SET_VARIABLE_NONBLOCKING,
> + QUERY_VARIABLE_INFO,
> + QUERY_VARIABLE_INFO_NONBLOCKING,
> + GET_NEXT_HIGH_MONO_COUNT,
> + RESET_SYSTEM,
> + UPDATE_CAPSULE,
> + QUERY_CAPSULE_CAPS,
> +};

Can we please give this enum a name

[...]

> +/*
> + * efi_runtime_work: Details of EFI Runtime Service work
> + * @func:EFI Runtime Service function identifier
> + * @arg<1-5>:EFI Runtime Service function arguments
> + * @status:  Status of executing EFI Runtime Service
> + */
> +struct efi_runtime_work {
> + u8 func;

... and use it here rather than an opaque u8? I realise that means
placing the enum in .

> + void *arg1;
> + void *arg2;
> + void *arg3;
> + void *arg4;
> + void *arg5;
> + efi_status_t status;
> + struct work_struct work;
> +};

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] efi/arm*: Only register page tables when they exist

2018-01-16 Thread Mark Rutland
Currently the arm/arm64 runtime code registers the runtime servies
pagetables with ptdump regardless of whether runtime services page
tables have been created.

As efi_mm.pgd is NULL in these cases, attempting to dump the efi page
tables results in a NULL pointer dereference in the ptdump code:

/sys/kernel/debug# cat efi_page_tables
[  479.522600] Unable to handle kernel NULL pointer dereference at virtual 
address 
[  479.522715] Mem abort info:
[  479.522764]   ESR = 0x9606
[  479.522850]   Exception class = DABT (current EL), IL = 32 bits
[  479.522899]   SET = 0, FnV = 0
[  479.522937]   EA = 0, S1PTW = 0
[  479.528200] Data abort info:
[  479.528230]   ISV = 0, ISS = 0x0006
[  479.528317]   CM = 0, WnR = 0
[  479.528317] user pgtable: 4k pages, 48-bit VAs, pgd = 64ab0cb0
[  479.528449] [] *pgd=fbbe4003, *pud=fb66e003, 
*pmd=
[  479.528600] Internal error: Oops: 9606 [#1] PREEMPT SMP
[  479.528664] Modules linked in:
[  479.528699] CPU: 0 PID: 2457 Comm: cat Not tainted 
4.15.0-rc3-00065-g2ad2ee7ecb5c-dirty #7
[  479.528799] Hardware name: FVP Base (DT)
[  479.528899] pstate: 0049 (nzcv daif +PAN -UAO)
[  479.528941] pc : walk_pgd.isra.1+0x20/0x1d0
[  479.529011] lr : ptdump_walk_pgd+0x30/0x50
[  479.529105] sp : 0bf4bc20
[  479.529185] x29: 0bf4bc20 x28: 9d22e000
[  479.529271] x27: 0002 x26: 80007b4c63c0
[  479.529358] x25: 014000c0 x24: 80007c098900
[  479.529445] x23: 0bf4beb8 x22: 
[  479.529532] x21: 0bf4bd70 x20: 0001
[  479.529618] x19: 0bf4bcb0 x18: 
[  479.529760] x17: 0041a1c8 x16: 082139d8
[  479.529800] x15: 9d3c6030 x14: 9d2527f4
[  479.529924] x13: 03f3 x12: 0038
[  479.53] x11: 0003 x10: 0101010101010101
[  479.530099] x9 : 17e94050 x8 : 003f
[  479.530226] x7 :  x6 : 
[  479.530313] x5 : 0001 x4 : 
[  479.530416] x3 : 09069fd8 x2 : 
[  479.530500] x1 :  x0 : 
[  479.530599] Process cat (pid: 2457, stack limit = 0x5d1b0e6f)
[  479.530660] Call trace:
[  479.530746]  walk_pgd.isra.1+0x20/0x1d0
[  479.530833]  ptdump_walk_pgd+0x30/0x50
[  479.530907]  ptdump_show+0x10/0x20
[  479.530920]  seq_read+0xc8/0x470
[  479.531023]  full_proxy_read+0x60/0x90
[  479.531100]  __vfs_read+0x18/0x100
[  479.531180]  vfs_read+0x88/0x160
[  479.531267]  SyS_read+0x48/0xb0
[  479.531299]  el0_svc_naked+0x20/0x24
[  479.531400] Code: 91400420 f90033a0 a90707a2 f9403fa0 (f940)
[  479.531499] ---[ end trace bfe8e28d8acb2b67 ]---
Segmentation fault

Let's avoid this problem by only registering the tables after their
successful creation, which is also less confusing when EFI runtime
services are not in use.

Signed-off-by: Mark Rutland <mark.rutl...@arm.com>
Reported-by: Will Deacon <will.dea...@arm.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: linux-efi@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
 drivers/firmware/efi/arm-runtime.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/firmware/efi/arm-runtime.c 
b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3d6315..86a1ad17a32e 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -54,6 +54,9 @@ static struct ptdump_info efi_ptdump_info = {
 
 static int __init ptdump_init(void)
 {
+   if (!efi_enabled(EFI_RUNTIME_SERVICES))
+   return 0;
+
return ptdump_debugfs_register(_ptdump_info, "efi_page_tables");
 }
 device_initcall(ptdump_init);
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND PATCH v4 2/2] arm64: Add software workaround for Falkor erratum 1041

2017-12-11 Thread Mark Rutland
Hi,

On Sun, Dec 10, 2017 at 08:03:43PM -0600, Shanker Donthineni wrote:
> +/**
> + * Errata workaround prior to disable MMU. Insert an ISB immediately prior
> + * to executing the MSR that will change SCTLR_ELn[M] from a value of 1 to 0.
> + */
> + .macro pre_disable_mmu_workaround
> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
> + isb
> +alternative_else_nop_endif
> +#endif
> + .endm

There's really no need for this to be an alternative. It makes the
kernel larger and more complex due to all the altinstr data and probing
code.

As Will suggested last time [1], please just use the ifdef, and always
compile-in the extra ISB if CONFIG_QCOM_FALKOR_ERRATUM_E1041 is
selected. Get rid of the alternatives and probing code.

All you need here is:

/*
 * Some Falkor parts make errant speculative instruction fetches
 * when SCTLR_ELx.M is cleared. An ISB before the write to
 * SCTLR_ELx prevents this.
 */
.macro pre_disable_mmu_workaround
#ifdef
isb
#endif
.endm

> +
> + .macro pre_disable_mmu_early_workaround
> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
> + isb
> +#endif
> + .endm
> +

... and we don't need a special early variant.

Thanks,
Mark.

[1] https://lkml.kernel.org/r/20171201112457.ge18...@arm.com
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 19/27] x86: assembly, make some functions local

2017-10-25 Thread Mark Rutland
On Wed, Oct 25, 2017 at 04:21:48PM +0200, Jiri Slaby wrote:
> Hi,
> 
> On 10/06/2017, 03:21 PM, Mark Rutland wrote:
> > If the aim of this series is to introduce something that architectures
> > use consistently, then can we please actually poke other architectures
> > about it? e.g. send this to linux-arch, with a cover letter explaining
> > the idea and asking maintainers to take a look.
> 
> Sure, with v5, I will.

Thanks; that's much appreciated, and I look forward to it!

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 19/27] x86: assembly, make some functions local

2017-10-06 Thread Mark Rutland
Hi Jiri,

I can see that these serve a useful purpose (as they are necessary for
asm validation encessary for livepatching), and I am not personally
averse to the new annotations.

However, I am concerned that as-is, this is going to create more
problems for !x86 architectures. More on that below.

On Fri, Oct 06, 2017 at 02:53:08PM +0200, Jiri Slaby wrote:
> On 10/04/2017, 09:33 AM, Ard Biesheuvel wrote:
> > On 4 October 2017 at 08:22, Jiri Slaby  wrote:
> >> On 10/02/2017, 02:48 PM, Ard Biesheuvel wrote:
> >>> On 2 October 2017 at 10:12, Jiri Slaby  wrote:
>  There is a couple of assembly functions, which are invoked only locally
>  in the file they are defined. In C, we mark them "static". In assembly,
>  annotate them using SYM_{FUNC,CODE}_START_LOCAL (and switch their
>  ENDPROC to SYM_{FUNC,CODE}_END too). Whether FUNC or CODE depends on
>  ENDPROC/END for a particular function (C or non-C).
> >>>
> >>> I wasn't cc'ed on the cover letter, so I am missing the rationale of
> >>> replacing ENTRY/ENDPROC with other macros.
> >>
> >> There was no cover letter. I am attaching what is in PATCH 1/27 instead:
> >> Introduce new C macros for annotations of functions and data in
> >> assembly. There is a long-standing mess in macros like ENTRY, END,
> >> ENDPROC and similar. They are used in different manners and sometimes
> >> incorrectly.
> > 
> > I must say, I don't share this sentiment.
> > 
> > In arm64, we use ENTRY/ENDPROC for functions with external linkage,
> > and the bare symbol name/ENDPROC for functions with local linkage. I
> > guess we could add ENDOBJECT if we wanted to, but we never really felt
> > the need.
> 
> Yes and this is exactly the reason for the new macros. Now, it's a
> complete mess. One arch does this, another does that. 

If the aim of this series is to introduce something that architectures
use consistently, then can we please actually poke other architectures
about it? e.g. send this to linux-arch, with a cover letter explaining
the idea and asking maintainers to take a look.

I think one reason that ENTRY/END/ENDPROC are used inconsistently is
that they're insufficiently documented. So people assume (inconsistent)
semantics, and often cargo-cult usage without thinking. To avoid that,
could we please document how these new macros should (and should not) be
used?

That way, we have a much better chance of consistency, and it's easier
to figure out if the intended semantics are necessary/sufficient.

Otherwise, I'm worried that bits of this will be inconsistently and
incorrectly cargo-culted into other architectures, making matters worse.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with cr3

2017-08-16 Thread Mark Rutland
On Wed, Aug 16, 2017 at 11:07:10AM +0100, Will Deacon wrote:
> On Wed, Aug 16, 2017 at 10:53:38AM +0100, Mark Rutland wrote:
> > On Wed, Aug 16, 2017 at 10:31:12AM +0100, Ard Biesheuvel wrote:
> > > (+ Mark, Will)
> > > 
> > > On 15 August 2017 at 22:46, Andy Lutomirski <l...@kernel.org> wrote:
> > > > On Tue, Aug 15, 2017 at 12:18 PM, Sai Praneeth Prakhya
> > > > <sai.praneeth.prak...@intel.com> wrote:
> > > >> +/*
> > > >> + * Makes the calling kernel thread switch to/from efi_mm context
> > > >> + * Can be used from SetVirtualAddressMap() or during efi runtime calls
> > > >> + * (Note: This routine is heavily inspired from use_mm)
> > > >> + */
> > > >> +void efi_switch_mm(struct mm_struct *mm)
> > > >> +{
> > > >> +   struct task_struct *tsk = current;
> > > >> +
> > > >> +   task_lock(tsk);
> > > >> +   efi_scratch.prev_mm = tsk->active_mm;
> > > >> +   if (efi_scratch.prev_mm != mm) {
> > > >> +   mmgrab(mm);
> > > >> +   tsk->active_mm = mm;
> > > >> +   }
> > > >> +   switch_mm(efi_scratch.prev_mm, mm, NULL);
> > > >> +   task_unlock(tsk);
> > > >> +
> > > >> +   if (efi_scratch.prev_mm != mm)
> > > >> +   mmdrop(efi_scratch.prev_mm);
> > > >
> > > > I'm confused.  You're mmdropping an mm that you are still keeping a
> > > > pointer to.  This is also a bit confusing in the case where you do
> > > > efi_switch_mm(efi_scratch.prev_mm).
> > > >
> > > > This whole manipulation seems fairly dangerous to me for another
> > > > reason -- you're taking a user thread (I think) and swapping out its
> > > > mm to something that the user in question should *not* have access to.
> > > > What if a perf interrupt happens while you're in the alternate mm?
> > > > What if you segfault and dump core?  Should we maybe just have a flag
> > > > that says "this cpu is using a funny mm", assert that the flag is
> > > > clear when scheduling, and teach perf, coredumps, etc not to touch
> > > > user memory when the flag is set?
> > > 
> > > It appears we may have introduced this exact issue on arm64 and ARM by
> > > starting to run the UEFI runtime services with interrupts enabled.
> > > (perf does not use NMI on ARM, so the issue did not exist beforehand)
> > > 
> > > Mark, Will, any thoughts?
> > 
> > Yup, I can cause perf to take samples from the EFI FW code, so that's
> > less than ideal.
> 
> But that should only happen if you're profiling EL1, right, which needs
> root privileges? (assuming the skid issue is solved -- not sure what
> happened to those patches after they broke criu).

I *think* that only needs perf_event_paranoid < 1, rather than root.

It's certianly not accessible by default to most users (e.g. my Ubuntu
fs sets this to 2, and IIRC Debian go to a much more stringent
non-upstream paranoid level).
 
> > The "funny mm" flag sounds like a good idea to me, though given recent
> > pain with sampling in the case of skid, I don't know exactly what we
> > should do if/when we take an overflow interrupt while in EFI.
> 
> I don't think special-casing perf interrupts is the right thing to do here.
> If we're concerned about user-accesses being made off the back of interrupts
> taken whilst in EFI, then we should probably either swizzle back in the
> user page table on the IRQ path or postpone handling it until we're done
> with the firmware.

Doing that for every IRQ feels odd, especially as the result would be
sampling something that wasn't executed, potentially repeatedly, giveing
bogus info.

> Having a flag feels a bit weird: would the uaccess routines return
> -EFAULT if it's set?

I'd expect we'd abort at a higher level, not taking any sample. i.e.
we'd have the core overflow handler check in_funny_mm(), and if so, skip
the sample, as with the skid case.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with cr3

2017-08-16 Thread Mark Rutland
On Wed, Aug 16, 2017 at 10:31:12AM +0100, Ard Biesheuvel wrote:
> (+ Mark, Will)
> 
> On 15 August 2017 at 22:46, Andy Lutomirski  wrote:
> > On Tue, Aug 15, 2017 at 12:18 PM, Sai Praneeth Prakhya
> >  wrote:
> >> +/*
> >> + * Makes the calling kernel thread switch to/from efi_mm context
> >> + * Can be used from SetVirtualAddressMap() or during efi runtime calls
> >> + * (Note: This routine is heavily inspired from use_mm)
> >> + */
> >> +void efi_switch_mm(struct mm_struct *mm)
> >> +{
> >> +   struct task_struct *tsk = current;
> >> +
> >> +   task_lock(tsk);
> >> +   efi_scratch.prev_mm = tsk->active_mm;
> >> +   if (efi_scratch.prev_mm != mm) {
> >> +   mmgrab(mm);
> >> +   tsk->active_mm = mm;
> >> +   }
> >> +   switch_mm(efi_scratch.prev_mm, mm, NULL);
> >> +   task_unlock(tsk);
> >> +
> >> +   if (efi_scratch.prev_mm != mm)
> >> +   mmdrop(efi_scratch.prev_mm);
> >
> > I'm confused.  You're mmdropping an mm that you are still keeping a
> > pointer to.  This is also a bit confusing in the case where you do
> > efi_switch_mm(efi_scratch.prev_mm).
> >
> > This whole manipulation seems fairly dangerous to me for another
> > reason -- you're taking a user thread (I think) and swapping out its
> > mm to something that the user in question should *not* have access to.
> > What if a perf interrupt happens while you're in the alternate mm?
> > What if you segfault and dump core?  Should we maybe just have a flag
> > that says "this cpu is using a funny mm", assert that the flag is
> > clear when scheduling, and teach perf, coredumps, etc not to touch
> > user memory when the flag is set?
> 
> It appears we may have introduced this exact issue on arm64 and ARM by
> starting to run the UEFI runtime services with interrupts enabled.
> (perf does not use NMI on ARM, so the issue did not exist beforehand)
> 
> Mark, Will, any thoughts?

Yup, I can cause perf to take samples from the EFI FW code, so that's
less than ideal.

The "funny mm" flag sounds like a good idea to me, though given recent
pain with sampling in the case of skid, I don't know exactly what we
should do if/when we take an overflow interrupt while in EFI.

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi: arm: Don't mark ACPI reclaim memory as MEMBLOCK_NOMAP

2017-06-05 Thread Mark Rutland
On Mon, Jun 05, 2017 at 08:04:35AM +, Ard Biesheuvel wrote:
> On ARM, regions of memory that are described by UEFI as having special
> significance to the firmware itself are omitted from the linear mapping.
> This is necessary since we cannot guarantee that alternate mappings of
> the same physical region will use attributes that are compatible with
> the ones we use for the linear mapping, and aliases with mismatched
> attributes are prohibited by the architecture.
> 
> The above does not apply to ACPI reclaim regions: such regions have no
> special significance to the firmware, and it is up to the OS to decide
> whether or not to preserve them after it has consumed their contents,
> and for how long, after which time the OS can use the memory in any way
> it likes. In the Linux case, such regions are preserved indefinitely,
> and are simply treated the same way as other 'reserved' memory types.
> 
> Punching holes into the linear mapping causes page table fragmentation,
> which increases TLB pressure, and so we should avoid doing so if we can.
> So add a special case for regions of type EFI_ACPI_RECLAIM_MEMORY, and
> memblock_reserve() them instead of marking them MEMBLOCK_NOMAP.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

Sounds sane to me. FWIW:

Acked-by: Mark Rutland <mark.rutl...@arm.com>

Thanks,
Mark.

> ---
>  drivers/firmware/efi/arm-init.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
> index 1027d7b44358..0aa4ce7b4fbb 100644
> --- a/drivers/firmware/efi/arm-init.c
> +++ b/drivers/firmware/efi/arm-init.c
> @@ -159,6 +159,7 @@ static __init int is_usable_memory(efi_memory_desc_t *md)
>   switch (md->type) {
>   case EFI_LOADER_CODE:
>   case EFI_LOADER_DATA:
> + case EFI_ACPI_RECLAIM_MEMORY:
>   case EFI_BOOT_SERVICES_CODE:
>   case EFI_BOOT_SERVICES_DATA:
>   case EFI_CONVENTIONAL_MEMORY:
> @@ -211,6 +212,10 @@ static __init void reserve_regions(void)
>  
>   if (!is_usable_memory(md))
>   memblock_mark_nomap(paddr, size);
> +
> + /* keep ACPI reclaim memory intact for kexec etc. */
> + if (md->type == EFI_ACPI_RECLAIM_MEMORY)
> + memblock_reserve(paddr, size);
>   }
>   }
>  }
> -- 
> 2.9.3
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/libstub: arm/arm64: don't use TASK_SIZE when randomising the RT space

2017-04-10 Thread Mark Rutland
On Mon, Apr 10, 2017 at 11:30:38AM +0100, Ard Biesheuvel wrote:
> As reported by James, Catalin and Mark, commit e69176d68d26
> ("ef/libstub/arm/arm64: Randomize the base of the UEFI rt services
> region") results in a crash in the firmware regardless of whether KASLR
> is in effect or not, and whether the firmware implements EFI_RNG_PROTOCOL
> or not.
> 
> Mark has identified the root cause to be the inappropriate use of
> TASK_SIZE in the stub, which arm64 defines as
> 
>   #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
> TASK_SIZE_32 : TASK_SIZE_64)
> 
> and testing thread flags at this point results in the dereference of
> pointers in uninitialized structures.
> 
> So instead, introduce a preprocessor symbol EFI_RT_VIRTUAL_LIMIT and
> define it to TASK_SIZE_64 on arm64 and TASK_SIZE on ARM, both of which
> are compile time constants. Also, change the 'headroom' variable to
> static const to force an error if this changes in the future.
> 
> Cc: James Morse <james.mo...@arm.com>
> Cc: Mark Rutland <mark.rutl...@arm.com>
> Cc: Catalin Marinas <catalin.mari...@arm.com>
> Cc: Matt Fleming <m...@codeblueprint.co.uk>
> Cc: Ingo Molnar <mi...@kernel.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

With this patch applied atop of next-20170410, a defconfig arm64 kernel
built with the Linaro 15.08 toolchain boots happily for me on Juno R1.

So FWIW:

Tested-by: Mark Rutland <mark.rutl...@arm.com>

Thanks,
Mark.

> ---
> 
> Apologies for the breakage. On the systems I have tested, sp_el0 apparently
> pointed somewhere sane when I inadvertently dereferenced it, and the
> resulting addresses looked sufficiently random to me.
> 
> Ingo, once we have some confirmation that this makes the problem go away,
> could you please take this straight into efi/core? Thanks. 
> 
>  drivers/firmware/efi/libstub/arm-stub.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/firmware/efi/libstub/arm-stub.c 
> b/drivers/firmware/efi/libstub/arm-stub.c
> index 1e45ec51b094..34010ff3b77e 100644
> --- a/drivers/firmware/efi/libstub/arm-stub.c
> +++ b/drivers/firmware/efi/libstub/arm-stub.c
> @@ -32,6 +32,12 @@
>  #define EFI_RT_VIRTUAL_BASE  SZ_512M
>  #define EFI_RT_VIRTUAL_SIZE  SZ_512M
>  
> +#ifdef CONFIG_ARM64
> +#define EFI_RT_VIRTUAL_LIMIT TASK_SIZE_64
> +#else
> +#define EFI_RT_VIRTUAL_LIMIT TASK_SIZE
> +#endif
> +
>  static u64 virtmap_base = EFI_RT_VIRTUAL_BASE;
>  
>  efi_status_t efi_open_volume(efi_system_table_t *sys_table_arg,
> @@ -236,8 +242,9 @@ unsigned long efi_entry(void *handle, efi_system_table_t 
> *sys_table,
>* shift of 21 bit positions into account when scaling
>* the headroom value using a 32-bit random value.
>*/
> - u64 headroom = TASK_SIZE - EFI_RT_VIRTUAL_BASE -
> -EFI_RT_VIRTUAL_SIZE;
> + static const u64 headroom = EFI_RT_VIRTUAL_LIMIT -
> + EFI_RT_VIRTUAL_BASE -
> + EFI_RT_VIRTUAL_SIZE;
>   u32 rnd;
>  
>   status = efi_get_random_bytes(sys_table, sizeof(rnd),
> -- 
> 2.9.3
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] ef/libstub: arm/arm64: randomize the base of the UEFI rt services region

2017-04-10 Thread Mark Rutland
On Fri, Apr 07, 2017 at 04:51:16PM +0100, Ard Biesheuvel wrote:
> On 7 April 2017 at 16:47, James Morse  wrote:
> > On 24/03/17 13:24, Ard Biesheuvel wrote:
> >> Update the allocation logic for the virtual mapping of the UEFI runtime
> >> services to start from a randomized base address if KASLR is in effect,
> >> and if the UEFI firmware exposes an implementation of EFI_RNG_PROTOCOL.
> >>
> >> This makes it more difficult to predict the location of exploitable
> >> data structures in the runtime UEFI firmware, which increases robustness
> >> against attacks. Note that these regions are only mapped during the
> >> time a runtime service call is in progress, and only on a single CPU
> >> at a time, bit give the lack of a downside, let's enable it nonetheless.
> >
> > With next-20170407 on Seattle Overdrive, I get an SError[0] on boot:

> > * Revert "ef/libstub/arm/arm64: Randomize the base of the UEFI rt services 
> > region"
> >
> > At which point the machine start booting to a prompt again, (its noisier 
> > than
> > usual but looks like double-printing).

> That is quite interesting, to be honest, because that patch should
> effectively be a NOP on systems that do not implement
> EFI_RNG_PROTOCOL.

FWIW, I'm also seeing a crash as a result of this patch, on a Juno R1
with an EFI_RNG_PROTOCOL implementation, with next-20170410 defconfig.
I'm using the Linaro 15.08 toolchain.


EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table


WARNING: this system is using an unsafe pseudo-random implementation of
RngLib!



Synchronous Exception at 0xF80FC7E8


If I replace the !nokaslr() in efi_entry() with false, my kernel boots.

If I replace !nokaslr() with true, it fails to boot, even with the call
to efi_get_random_bytes() replaced with status = EFI_ERROR.

The problem appears to be the use of TASK_SIZE, since on arm64 that's:

#define TASK_SIZE   (test_thread_flag(TIF_32BIT) ? \
TASK_SIZE_32 : TASK_SIZE_64)

... which would mean that the stub would be trying to dereference whatever it
found in sp_el0. Looking at objdump:

0160 :
 ...
 4b4:   9400bl  0 
 4b8:   35000280cbnzw0, 508 
 4bc:   d5384100mrs x0, sp_el0
 4c0:   f944ldr x4, [x0]
...

So I think we should revert this for now.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: xen: Implement EFI reset_system callback

2017-04-06 Thread Mark Rutland
[Adding the EFI maintainers]

Tl;DR: Xen's EFI wrappery doesn't implement reset_system, so when
invoked on arm64 we get a NULL dereference.

On Thu, Apr 06, 2017 at 04:39:13PM +0100, Julien Grall wrote:
> On 06/04/17 16:20, Daniel Kiper wrote:
> >On Thu, Apr 06, 2017 at 04:38:24PM +0200, Juergen Gross wrote:
> >>On 06/04/17 16:27, Daniel Kiper wrote:
> >>>On Thu, Apr 06, 2017 at 09:32:32AM +0100, Julien Grall wrote:
> Hi Juergen,
> 
> On 06/04/17 07:23, Juergen Gross wrote:
> >On 05/04/17 21:49, Boris Ostrovsky wrote:
> >>On 04/05/2017 02:14 PM, Julien Grall wrote:
> >>>The x86 code has theoritically a similar issue, altought EFI does not
> >>>seem to be the preferred method. I have left it unimplemented on x86 
> >>>and
> >>>CCed Linux Xen x86 maintainers to know their view here.
> >>
> >>(+Daniel)
> >>
> >>This could be a problem for x86 as well, at least theoretically.
> >>xen_machine_power_off() may call pm_power_off(), which is 
> >>efi.reset_system.
> >>
> >>So I think we should have a similar routine there.
> >
> >+1
> >
> >I don't see any problem with such a routine added, in contrast to
> >potential "reboots" instead of power off without it.
> >
> >So I think this dummy xen_efi_reset_system() should be added to
> >drivers/xen/efi.c instead.
> 
> I will resend the patch during day with xen_efi_reset_system moved
> to common code and implement the x86 counterpart (thought, I will
> not be able to test it).
> >>>
> >>>I think that this is ARM specific issue. On x86 machine_restart() calls
> >>>xen_restart(). Hence, everything works. So, I think that it should be
> >>>fixed only for ARM. Anyway, please CC me when you send a patch.
> >>
> >>What about xen_machine_power_off() (as stated in Boris' mail)?
> >
> >Guys what do you think about that:
> >
> >--- a/drivers/firmware/efi/reboot.c
> >+++ b/drivers/firmware/efi/reboot.c
> >@@ -55,7 +55,7 @@ static void efi_power_off(void)
> >
> > static int __init efi_shutdown_init(void)
> > {
> >-   if (!efi_enabled(EFI_RUNTIME_SERVICES))
> >+   if (!efi_enabled(EFI_RUNTIME_SERVICES) || efi_enabled(EFI_PARAVIRT))
> >return -ENODEV;
> >
> >if (efi_poweroff_required())
> >
> >
> >Julien, for ARM64 please take a look at 
> >arch/arm64/kernel/efi.c:efi_poweroff_required(void).
> >
> >I hope that tweaks for both files should solve our problem.
> 
> This sounds good for power off (I haven't tried to power off DOM0
> yet).

Please, let's keep the Xen knowledge constrained to the Xen EFI wrapper,
rather than spreading it further.

IMO, given reset_system is a *mandatory* function, the Xen wrapper
should provide an implementation.

I don't see why you can't implement a wrapper that calls the usual Xen
poweroff/reset functions.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] efi/libstub: arm/arm64: disable debug prints on 'quiet' cmdline arg

2017-03-24 Thread Mark Rutland
On Fri, Mar 24, 2017 at 01:24:09PM +, Ard Biesheuvel wrote:
> The EFI stub currently prints a number of diagnostic messages that do
> not carry a lot of information. Since these prints are not controlled
> by 'loglevel' or other command line parameters, and since they appear on
> the EFI framebuffer as well (if enabled), it would be nice if we could
> turn them off.
> 
> So let's add support for the 'quiet' command line parameter in the stub,
> and disable the non-error prints if it is passed.
> 
> Cc: Matt Fleming <m...@codeblueprint.co.uk>
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> ---
> This was previously sent out as a separate patch. The difference is that this
> version looks for 'quiet' rather than 'efi=debug', and preserves the noisy
> console as the default.

This addresses my prior concern, so FWIW:

Acked-by: Mark Rutland <mark.rutl...@arm.com>

Mark.

>  drivers/firmware/efi/libstub/arm-stub.c| 12 ++--
>  drivers/firmware/efi/libstub/arm32-stub.c  |  2 ++
>  drivers/firmware/efi/libstub/efi-stub-helper.c |  9 +
>  drivers/firmware/efi/libstub/efistub.h |  7 +++
>  drivers/firmware/efi/libstub/secureboot.c  |  2 ++
>  include/linux/efi.h|  3 ---
>  6 files changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/firmware/efi/libstub/arm-stub.c 
> b/drivers/firmware/efi/libstub/arm-stub.c
> index fcd34057dc1c..6f522e3091af 100644
> --- a/drivers/firmware/efi/libstub/arm-stub.c
> +++ b/drivers/firmware/efi/libstub/arm-stub.c
> @@ -116,8 +116,6 @@ unsigned long efi_entry(void *handle, efi_system_table_t 
> *sys_table,
>   if (sys_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
>   goto fail;
>  
> - pr_efi(sys_table, "Booting Linux Kernel...\n");
> -
>   status = check_platform_features(sys_table);
>   if (status != EFI_SUCCESS)
>   goto fail;
> @@ -151,6 +149,12 @@ unsigned long efi_entry(void *handle, efi_system_table_t 
> *sys_table,
>   goto fail;
>   }
>  
> + status = efi_parse_options(cmdline_ptr);
> + if (status != EFI_SUCCESS)
> + pr_efi_err(sys_table, "Failed to parse EFI cmdline options\n");
> +
> + pr_efi(sys_table, "Booting Linux Kernel...\n");
> +
>   si = setup_graphics(sys_table);
>  
>   status = handle_kernel_image(sys_table, image_addr, _size,
> @@ -162,10 +166,6 @@ unsigned long efi_entry(void *handle, efi_system_table_t 
> *sys_table,
>   goto fail_free_cmdline;
>   }
>  
> - status = efi_parse_options(cmdline_ptr);
> - if (status != EFI_SUCCESS)
> - pr_efi_err(sys_table, "Failed to parse EFI cmdline options\n");
> -
>   secure_boot = efi_get_secureboot(sys_table);
>  
>   /*
> diff --git a/drivers/firmware/efi/libstub/arm32-stub.c 
> b/drivers/firmware/efi/libstub/arm32-stub.c
> index 18a8b5eb55e7..becbda445913 100644
> --- a/drivers/firmware/efi/libstub/arm32-stub.c
> +++ b/drivers/firmware/efi/libstub/arm32-stub.c
> @@ -9,6 +9,8 @@
>  #include 
>  #include 
>  
> +#include "efistub.h"
> +
>  efi_status_t check_platform_features(efi_system_table_t *sys_table_arg)
>  {
>   int block;
> diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c 
> b/drivers/firmware/efi/libstub/efi-stub-helper.c
> index 1575b566cd4a..93685f79f9e8 100644
> --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
> +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
> @@ -33,11 +33,16 @@
>  static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE;
>  
>  static int __section(.data) __nokaslr;
> +static int __section(.data) __quiet;
>  
>  int __pure nokaslr(void)
>  {
>   return __nokaslr;
>  }
> +int __pure is_quiet(void)
> +{
> + return __quiet;
> +}
>  
>  #define EFI_MMAP_NR_SLACK_SLOTS  8
>  
> @@ -428,6 +433,10 @@ efi_status_t efi_parse_options(char const *cmdline)
>   if (str == cmdline || (str && str > cmdline && *(str - 1) == ' '))
>   __nokaslr = 1;
>  
> + str = strstr(cmdline, "quiet");
> + if (str == cmdline || (str && str > cmdline && *(str - 1) == ' '))
> + __quiet = 1;
> +
>   /*
>* If no EFI parameters were specified on the cmdline we've got
>* nothing to do.
> diff --git a/drivers/firmware/efi/libstub/efistub.h 
> b/drivers/firmware/efi/libstub/efistub.h
> index a7a2a2c3f199..83f268c05007 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -25,6 +25,13 @@
&g

Re: [PATCH] efi/libstub: arm/arm64: make debug prints dependent on efi=debug

2017-03-21 Thread Mark Rutland
On Mon, Mar 20, 2017 at 08:57:05PM +, Ard Biesheuvel wrote:
> The EFI stub currently prints a number of diagnostic messages that do
> not carry a lot of information. Since these prints are not controlled
> by 'loglevel' or other command line parameters, and since they appear on
> the EFI framebuffer as well (if enabled), it would be nice if we could
> turn them off.
> 
> So let's only print them if 'efi=debug' is passed on the command line,
> or when CONFIG_DEBUG_EFI is set in the kernel build configuration.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  drivers/firmware/efi/libstub/arm32-stub.c  |  2 ++
>  drivers/firmware/efi/libstub/efi-stub-helper.c | 23 ++-
>  drivers/firmware/efi/libstub/efistub.h |  9 +
>  drivers/firmware/efi/libstub/secureboot.c  |  2 ++
>  include/linux/efi.h|  3 ---
>  5 files changed, 27 insertions(+), 12 deletions(-)

> diff --git a/drivers/firmware/efi/libstub/efistub.h 
> b/drivers/firmware/efi/libstub/efistub.h
> index 71c4d0e3c4ed..5ed16e6e166f 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -24,6 +24,15 @@
>  #define EFI_ALLOC_ALIGN  EFI_PAGE_SIZE
>  #endif
>  
> +extern int __pure debug_enabled(void);
> +
> +#define pr_efi(sys_table, msg)   do {
> \
> + if (debug_enabled()) efi_printk(sys_table, "EFI stub: "msg);\
> +} while (0)

I was going to say that it's somewhat unfortunate to lose some of these
messages by default, but I guess this only adds one additional
round-trip when debugging a remote issue, and gives us the freedom to
add more comprehensive debug messages in future.

Is the idea is to make the EFI stub completely silent in the successful
case, or is the aim just to minimise the number of messages?

If it's only the latter, we could log a message regarding the existence
of efi=debug in the default case, which might help with bug reports.

No strong feelings from me.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/13] Xen: EFI: Parse DT parameters for Xen specific UEFI

2015-11-17 Thread Mark Rutland
On Tue, Nov 17, 2015 at 12:25:58PM +0100, Ard Biesheuvel wrote:
> On 17 November 2015 at 10:57,   wrote:
> > From: Shannon Zhao 
> >
> > Add a new function to parse DT parameters for Xen specific UEFI just
> > like the way for normal UEFI. Then it could reuse the existing codes.
> >
> > Signed-off-by: Shannon Zhao 
> > ---
> >  drivers/firmware/efi/efi.c | 67 
> > ++
> >  1 file changed, 62 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index d6144e3..629bd06 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -24,6 +24,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  struct efi __read_mostly efi = {
> > .mps= EFI_INVALID_TABLE_ADDR,
> > @@ -488,12 +489,60 @@ static __initdata struct {
> > UEFI_PARAM("MemMap Desc. Version", "linux,uefi-mmap-desc-ver", 
> > desc_ver)
> >  };
> >
> > +static __initdata struct {
> > +   const char name[32];
> > +   const char propname[32];
> > +   int offset;
> > +   int size;
> > +} xen_dt_params[] = {
> > +   UEFI_PARAM("System Table", "xen,uefi-system-table", system_table),
> > +   UEFI_PARAM("MemMap Address", "xen,uefi-mmap-start", mmap),
> > +   UEFI_PARAM("MemMap Size", "xen,uefi-mmap-size", mmap_size),
> > +   UEFI_PARAM("MemMap Desc. Size", "xen,uefi-mmap-desc-size", 
> > desc_size),
> > +   UEFI_PARAM("MemMap Desc. Version", "xen,uefi-mmap-desc-ver", 
> > desc_ver)
> > +};
> > +
> 
> We discussed (and agreed afair) that dropping the "linux," prefix from
> the DT properties was an acceptable change. If we do that, and reuse
> the same names in the xen version (i.e., drop the "xen," prefix), we
> could make this change a *lot* simpler.

Regardless of if we drop the "linux," prefix from the existing strings,
I think we need the "xen," prefix here.

The xen EFI interface comes with additional caveats, and we need to
treat it separately.

Unless you're suggesting that /hypervisor/uefi-* is handled differently
to /chosen/uefi-*?

I think I'd still prefer the "xen," prefix regardless.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 3/3] arm64/efi: isolate EFI stub from the kernel proper

2015-10-30 Thread Mark Rutland
[Adding Catalin to the To line]

On Fri, Oct 30, 2015 at 01:17:24PM +0100, Ard Biesheuvel wrote:
> On 27 October 2015 at 23:44, Jeremy Linton  wrote:
> > On 10/26/2015 09:20 PM, Ard Biesheuvel wrote:
> >>
> >> Thanks for the report. The following patch should fix it
> >
> >
> > Ard,
> >
> > Thanks, it does fix the build problem...
> >
> > Now to get the darn thing to boot (not related to this AFAIK).
> >
> 
> Catalin,
> 
> Would you like me to resend this as a separate patch?

Ard's referring to the patch in a prior reply [1].

Mark.

[1] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-October/381074.html
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [v2] arm64: efi: make sure vmlinux load address aligned on 2MB

2015-10-29 Thread Mark Rutland
On Thu, Oct 29, 2015 at 11:59:46AM +0900, Ard Biesheuvel wrote:
> On 29 October 2015 at 03:21, Mark Rutland <mark.rutl...@arm.com> wrote:
> > On Wed, Oct 28, 2015 at 01:12:36PM -0500, Timur Tabi wrote:
> >> On 10/28/2015 01:08 PM, Mark Rutland wrote:
> >>
> >> >arm64: efi: ensure kernel is loaded at correct address
> >> >
> >> >The kernel image needs to be loaded text_offset_bytes from a 2M-aligned
> >> >base, per Documentation/arm64/booting.txt. If loaded at the wrong offset
> >> >modulo 2M, __create_page_tables will create incorrect page tables.
> >> >
> >> >The EFI stub implicitly assumes that dram_base (i.e. the lowest address
> >> >with a EFI_MEMORY_WB attribute) is 2M-aligned, and tries to load the
> >> >kernel at dram_base + TEXT_OFFSET. If dram_base is not 2M-aligned, the
> >> >kernel will be loaded at the wrong offset from 2M.
> >>
> >> Thanks, I'll use that.  I messed up a couple other things, so I need
> >> to send out a v3 anyway.
> >>
> >> >>-   *image_addr = *reserve_addr = dram_base + TEXT_OFFSET;
> >> >>+   *image_addr = *reserve_addr =
> >> >>+   round_up(dram_base, SZ_2M) + TEXT_OFFSET;
> >> >
> >> >We also need to fix the test for whether we need to relocate the kernel:
> >> >(*image_addr != (dram_base + TEXT_OFFSET)).
> >> >
> >> >When dram_base is not 2M aligned, that is broken, and it's been broken
> >> >since it was introduced in commit 3c7f255039a2ad6e ("arm64: efi: add EFI
> >> >stub") in v3.16.
> >> >
> >> >It's a bit hideous to fix the general case, though, it seems.
> >>
> >> Um, so I should I do something more in my v3 patch, or is this a
> >> change for a different patch?
> >
> > I think there should be a single patch, but please hold off v3 for a day
> > or so. I think there a few more edge cases here, and I'm currently
> > investigating.
> >
> 
> Apologies for the drive-by nature of my contributions to this thread.
> I am currently travelling.
> 
> I think the below should address both issues (and I even tried to
> compile it this time)
> 
> -8<-
> diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c
> index 816120ece6bc..78dfbd34b6bf 100644
> --- a/arch/arm64/kernel/efi-stub.c
> +++ b/arch/arm64/kernel/efi-stub.c
> @@ -25,10 +25,20 @@
> unsigned long kernel_size, kernel_memsize = 0;
> unsigned long nr_pages;
> void *old_image_addr = (void *)*image_addr;
> +   unsigned long preferred_offset;
> +
> +   /*
> +* The preferred offset of the kernel Image is TEXT_OFFSET bytes 
> beyond
> +* a 2 MB aligned base, which itself may be lower than dram_base, as
> +* long as the resulting offset equals or exceeds it.
> +*/
> +   preferred_offset = round_down(dram_base, SZ_2M) + TEXT_OFFSET;
> +   if (preferred_offset < dram_base)
> +   preferred_offset += SZ_2M;
> 
> /* Relocate the image, if required. */
> kernel_size = _edata - _text;
> -   if (*image_addr != (dram_base + TEXT_OFFSET)) {
> +   if (*image_addr != preferred_offset) {
> kernel_memsize = kernel_size + (_end - _edata);
> 
> /*
> @@ -42,7 +52,7 @@
>  * Mustang), we can still place the kernel at the address
>  * 'dram_base + TEXT_OFFSET'.
>  */
> -   *image_addr = *reserve_addr = dram_base + TEXT_OFFSET;
> +   *image_addr = *reserve_addr = preferred_offset;
> nr_pages = round_up(kernel_memsize, EFI_ALLOC_ALIGN) /
>EFI_PAGE_SIZE;
> status = efi_call_early(allocate_pages, EFI_ALLOCATE_ADDRESS,
> 

This looks good to me, and I've given it a spin on Juno (though I
haven't fiddled with dram_base). I trust that you will respin this as a
patch when you get the chance.

There is another (existing) problem I spotted, in that we'll sometimes
move the kernel to a worse address. If the kernel was loaded at a valid
address (i.e. image_addr % SZ_2MB == TEXT_OFFSET), but not at the
preferred offset, we try to relocate it, even if it's already at the
lowest possible address.

That doesn't break boot, so it's not as big a problem, and it's probably
better sovled with the split VA stuff.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: efi: make sure vmlinux load address aligned on 2MBytes

2015-10-28 Thread Mark Rutland
On Tue, Oct 27, 2015 at 09:13:22PM -0500, Timur Tabi wrote:
> Ard Biesheuvel wrote:
> >No. I screwed up the patch since the trailing ) should not be there,
> >but we do need to round first, and only then add the offset.
> 
> But if the offset is not a multiple of 2MB, won't the address passed
> to allocate_pages be unaligned?  I'll test your patch on our system,
> but I thought the problem was that the allocated address must be
> aligned.

The kernel needs to be loaded at an address which is text_offset bytes
past a 2M-aligned base. It is not loaded at the 2M-aligned base itself.

Ard's patch correctly findd a 2M-aligned base, then adds the text
offset.

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: efi: make sure vmlinux load address aligned on 2MBytes

2015-10-28 Thread Mark Rutland
On Wed, Oct 28, 2015 at 12:11:57PM -0500, Timur Tabi wrote:
> On 10/27/2015 09:06 PM, Ard Biesheuvel wrote:
> >
> >diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c
> >index 816120ece6bc..a60ce249cfc0 100644
> >--- a/arch/arm64/kernel/efi-stub.c
> >+++ b/arch/arm64/kernel/efi-stub.c
> >@@ -42,7 +42,8 @@
> >  * Mustang), we can still place the kernel at the address
> >  * 'dram_base + TEXT_OFFSET'.
> >  */
> >-   *image_addr = *reserve_addr = dram_base + TEXT_OFFSET;
> >+   *image_addr = *reserve_addr = round_up(dram_base, SZ_2M) +
> >+  TEXT_OFFSET);
> > nr_pages = round_up(kernel_memsize, EFI_ALLOC_ALIGN) /
> >EFI_PAGE_SIZE;
> > status = efi_call_early(allocate_pages, 
> > EFI_ALLOCATE_ADDRESS,
> 
> Tested-by: Timur Tabi <ti...@codeaurora.org>
> Tested-by: Shanker Donthineni <shank...@codeaurora.org>

I assume with the trailing ')' removed. ;)

With that:

Acked-by: Mark Rutland <mark.rutl...@arm.com>

> However, I think this formatting is easier to read:
> 
>   *image_addr = *reserve_addr =
>   round_up(dram_base, SZ_2M) + TEXT_OFFSET;

I have no strong feeling on this either way.

> This does make the kernel boot, but we suspect that there may be
> another problem.  We need to investigate it, but we have a suspicion
> that the EFI stub is trying to allocate from the Runtime Data block,
> and the alignment adjustment "fixes" the problem by moving the
> pointer to Conventional Memory.

I don't follow. EFI_BOOT_SERVICES.AllocatePages() should only give us
pages which are available, so it shouldn't ever return pages which are
runtime data -- it would fail and we'd fall back to efi_low_alloc().

Could you elaborate?

> Anyway, is there any chance we can get this fix into 4.3?  I'd hate
> to have 4.3 released knowing that it's broken on our hardware.

It's probably too late now, but it should certainly be Cc'd stable and
get backported.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [v2] arm64: efi: make sure vmlinux load address aligned on 2MB

2015-10-28 Thread Mark Rutland
On Wed, Oct 28, 2015 at 01:12:36PM -0500, Timur Tabi wrote:
> On 10/28/2015 01:08 PM, Mark Rutland wrote:
> 
> >arm64: efi: ensure kernel is loaded at correct address
> >
> >The kernel image needs to be loaded text_offset_bytes from a 2M-aligned
> >base, per Documentation/arm64/booting.txt. If loaded at the wrong offset
> >modulo 2M, __create_page_tables will create incorrect page tables.
> >
> >The EFI stub implicitly assumes that dram_base (i.e. the lowest address
> >with a EFI_MEMORY_WB attribute) is 2M-aligned, and tries to load the
> >kernel at dram_base + TEXT_OFFSET. If dram_base is not 2M-aligned, the
> >kernel will be loaded at the wrong offset from 2M.
> 
> Thanks, I'll use that.  I messed up a couple other things, so I need
> to send out a v3 anyway.
> 
> >>-   *image_addr = *reserve_addr = dram_base + TEXT_OFFSET;
> >>+   *image_addr = *reserve_addr =
> >>+   round_up(dram_base, SZ_2M) + TEXT_OFFSET;
> >
> >We also need to fix the test for whether we need to relocate the kernel:
> >(*image_addr != (dram_base + TEXT_OFFSET)).
> >
> >When dram_base is not 2M aligned, that is broken, and it's been broken
> >since it was introduced in commit 3c7f255039a2ad6e ("arm64: efi: add EFI
> >stub") in v3.16.
> >
> >It's a bit hideous to fix the general case, though, it seems.
> 
> Um, so I should I do something more in my v3 patch, or is this a
> change for a different patch?

I think there should be a single patch, but please hold off v3 for a day
or so. I think there a few more edge cases here, and I'm currently
investigating.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: efi: make sure vmlinux load address aligned on 2MBytes

2015-10-28 Thread Mark Rutland
On Wed, Oct 28, 2015 at 04:02:23PM -0500, Timur Tabi wrote:
> On 10/28/2015 12:26 PM, Mark Rutland wrote:
> >>>This does make the kernel boot, but we suspect that there may be
> >>>another problem.  We need to investigate it, but we have a suspicion
> >>>that the EFI stub is trying to allocate from the Runtime Data block,
> >>>and the alignment adjustment "fixes" the problem by moving the
> >>>pointer to Conventional Memory.
> >I don't follow. EFI_BOOT_SERVICES.AllocatePages() should only give us
> >pages which are available, so it shouldn't ever return pages which are
> >runtime data -- it would fail and we'd fall back to efi_low_alloc().
> >
> >Could you elaborate?
> 
> So we're still debugging this internally, but it turns out that
> dram_base is equal to 0x400082, which also happens to be the
> start of a Runtime Data block:
> 
>   0x00400082-0x00400085 [Runtime Data   |RUN|XP|  |  |
> |WB|WT|WC|UC]
> 
> I think this is not supposed to happen.

It's perfectly valid for that to be detected as dram_base, and the stub may
call AllocatePages() for that region, but AllocatePages() shouldn't
successfully allocate from there.

The stub should fall back to efi_low_alloc, walking through the memory map
until it finds a large enough region to allocate from, with some subsequent
AllocatePages() call eventually succeeding.

Is that not what you're seeing?

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 0/6] KASAN for arm64

2015-10-09 Thread Mark Rutland
On Fri, Oct 09, 2015 at 12:32:18PM +0300, Andrey Ryabinin wrote:
[...]

> I thought the EFI stub isolation patches create a copy of mem*() functions in 
> the stub,
> but they are just create aliases with __efistub_ prefix.
> 
> We only need to create some more aliases for KASAN.
> The following patch on top of the EFI stub isolation series works for me.
> 
> 
> Signed-off-by: Andrey Ryabinin 
> ---
>  arch/arm64/kernel/image.h | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
> index e083af0..6eb8fee 100644
> --- a/arch/arm64/kernel/image.h
> +++ b/arch/arm64/kernel/image.h
> @@ -80,6 +80,12 @@ __efistub_strcmp   = __pi_strcmp;
>  __efistub_strncmp= __pi_strncmp;
>  __efistub___flush_dcache_area= __pi___flush_dcache_area;
>  
> +#ifdef CONFIG_KASAN
> +__efistub___memcpy   = __pi_memcpy;
> +__efistub___memmove  = __pi_memmove;
> +__efistub___memset   = __pi_memset;
> +#endif

Ard's v4 stub isolation series has these aliases [1], as the stub
requires these aliases regardless of KASAN in order to link.

Thanks,
Mark.

[1] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-October/375708.html
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 0/6] KASAN for arm64

2015-10-09 Thread Mark Rutland
On Fri, Oct 09, 2015 at 01:18:09PM +0300, Andrey Ryabinin wrote:
> 2015-10-09 12:48 GMT+03:00 Mark Rutland <mark.rutl...@arm.com>:
> > On Fri, Oct 09, 2015 at 12:32:18PM +0300, Andrey Ryabinin wrote:
> > [...]
> >
> >> I thought the EFI stub isolation patches create a copy of mem*() functions 
> >> in the stub,
> >> but they are just create aliases with __efistub_ prefix.
> >>
> >> We only need to create some more aliases for KASAN.
> >> The following patch on top of the EFI stub isolation series works for me.
> >>
> >>
> >> Signed-off-by: Andrey Ryabinin <ryabinin@gmail.com>
> >> ---
> >>  arch/arm64/kernel/image.h | 6 ++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
> >> index e083af0..6eb8fee 100644
> >> --- a/arch/arm64/kernel/image.h
> >> +++ b/arch/arm64/kernel/image.h
> >> @@ -80,6 +80,12 @@ __efistub_strcmp   = __pi_strcmp;
> >>  __efistub_strncmp= __pi_strncmp;
> >>  __efistub___flush_dcache_area= __pi___flush_dcache_area;
> >>
> >> +#ifdef CONFIG_KASAN
> >> +__efistub___memcpy   = __pi_memcpy;
> >> +__efistub___memmove  = __pi_memmove;
> >> +__efistub___memset   = __pi_memset;
> >> +#endif
> >
> > Ard's v4 stub isolation series has these aliases [1], as the stub
> > requires these aliases regardless of KASAN in order to link.
> 
> Stub isolation series has __efistub_memcpy, not __efistub___memcpy
> (two additional '_').

Ah, I see, sorry for my sloppy reading.

> The thing is, KASAN provides own implementation of memcpy() which
> checks memory before access.
> The original 'memcpy()' becomes __memcpy(), so we could still use it.

Ok.

> In code that not instrumented by KASAN (like the EFI stub) we replace
> KASAN's memcpy() with the original __mempcy():
> #define memcpy() __memcpy()

I'm a little confused by this. Surely that doesn't override implicit
calls generated by the compiler, leaving us with a mixture of calls to
memcpy and __memcpy?

That doesn't matter for the stub, as both __efistub_mem* and
__efistub___mem* would point at __pe_mem*, but doesn't that matter for
other users that shouldn't be instrumented?

Is that not a problem, or do we inhibit/override that somehow?

> So with CONFIG_KASAN=y the EFI stub uses __memcpy, thus we need to
> create the __efistub___memcpy alias.

Ok, that makes sense to me.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 0/6] KASAN for arm64

2015-10-08 Thread Mark Rutland
On Thu, Oct 08, 2015 at 01:36:09PM +0300, Andrey Ryabinin wrote:
> 2015-10-07 13:04 GMT+03:00 Catalin Marinas :
> > On Thu, Sep 17, 2015 at 12:38:06PM +0300, Andrey Ryabinin wrote:
> >> As usual patches available in git
> >>   git://github.com/aryabinin/linux.git kasan/arm64v6
> >>
> >> Changes since v5:
> >>  - Rebase on top of 4.3-rc1
> >>  - Fixed EFI boot.
> >>  - Updated Doc/features/KASAN.
> >
> > I tried to merge these patches (apart from the x86 one which is already
> > merged) but it still doesn't boot on Juno as an EFI application.
> >
> 
> 4.3-rc1 was ok and 4.3-rc4 is not. Break caused by 0ce3cc008ec04
> ("arm64/efi: Fix boot crash by not padding between EFI_MEMORY_RUNTIME
> regions")
> It introduced sort() call in efi_get_virtmap().
> sort() is generic kernel function and it's instrumented, so we crash
> when KASAN tries to access shadow in sort().
> 
> [+CC efi some guys]
> 
> Comment in drivers/firmware/efi/libstub/Makefile says that EFI stub
> executes with MMU disabled:
> # The stub may be linked into the kernel proper or into a separate
> boot binary,
> # but in either case, it executes before the kernel does (with MMU
> disabled) so
> # things like ftrace and stack-protector are likely to cause trouble if 
> left
> # enabled, even if doing so doesn't break the build.
> 
> But in arch/arm64/kernel/efi-entry.S:
> * We arrive here from the EFI boot manager with:
> *
> ** CPU in little-endian mode
> ** MMU on with identity-mapped RAM
> 
> So is MMU enabled in ARM64 efi-stub?

The stub is executed as an EFI application, which means that the MMU is
on, and the page tables are an idmap owned by the EFI implementation.

> If yes, we could solve this issue by mapping KASAN early shadow in efi stub.

As the page tables are owned by the implemenation and not the kernel, we
cannot alter them (at least not until we've called ExitBootServices(),
which happens relatively late).

Can we not build the stub without ASAN protections?

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-14 Thread Mark Rutland
On Sat, Sep 12, 2015 at 12:36:55PM +0100, Daniel Kiper wrote:
> On Fri, Sep 11, 2015 at 05:25:59PM +0100, Mark Rutland wrote:
> > On Fri, Sep 11, 2015 at 01:46:43PM +0100, Daniel Kiper wrote:
> > > On Thu, Sep 10, 2015 at 05:23:02PM +0100, Mark Rutland wrote:
> 
> [...]
> 
> > > > What's troublesome with the boot services?
> > > >
> > > > What can't be simulated?
> > >
> > > How do you want to access bare metal EFI boot services from dom0 if they
> > > were shutdown long time ago before loading dom0 image?
> >
> > I don't want to.
> >
> > I asked "What can't be simulated?" because I assumed everything
> > necessary/mandatory could be simulated without needinng access to any
> > real EFI boot services.
> >
> > As far as I can see all that's necessary is to provide a compatible
> > interface.
> 
> Could you be more precise what do you need? Please enumerate. UEFI spec has
> more than 2500 pages and I do not think that we need all stuff in dom0.
> 
> > > What do you need from EFI boot services in dom0?
> >
> > The ability to call ExitBootServices() and SetVirtualAddressMap() on a
> > _virtual_ address map for _virtual_ services provided by the hypervisor.
> 
> I am confused. Why do you need that? Please remember, EFI is owned and
> operated by Xen hypervisor. dom0 does not have direct access to EFI.

Let's take a step back.

My objection here is to passing the Dom0 kernel properties as if it were
booted with direct access to a full UEFI, then later fixing that up
(when Xen is detected and we apply its hypercall EFI implementation).

If the kernel cannot use EFI natively, why pretend to the kernel that it
can? The hypercall implementation is _not_ EFI (though it provides
access to some services).

The two ways I can see providing Dom0 with EFI services are:

* Have Xen create shims for any services, in which any hypercalls live,
  and pass these to the kernel with a virtual system table. This keeps
  the interface to the kernel the same regardless of Xen.

* Have the kernel detect Xen EFI capability via Xen, without passing the
  usual native EFI parameters. This can then be installed into the
  kernel in a Xen-specific manner, and we know from the outset that
  Xen-specific caveats apply.

As per my original email, I'm not against the renaming of the stub
parameters if we standardise the rest of the details, but I believe
that's orthogonal to the Xen Dom0 case.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-11 Thread Mark Rutland
> >> Considering that the EFI support is just for Dom0, and Dom0 (at
> >> the time) had to be PV anyway, it was the more natural solution to
> >> expose the interface via hypercalls, the more that this allows better
> >> control over what is and primarily what is not being exposed to
> >> Dom0. With the wrapper approach we'd be back to the same
> >> problem (discussed elsewhere) of which EFI version to surface: The
> >> host one would impose potentially missing extensions, while the
> >> most recent hypervisor known one might imply hiding valuable
> >> information from Dom0. Plus there are incompatible changes like
> >> the altered meaning of EFI_MEMORY_WP in 2.5.
> > 
> > I'm not sure I follow how hypercalls solve any impedance mismatch here;
> > you're still expecting Dom0 to call up to Xen in order to perform calls,
> > and all I suggested was a different location for those hypercalls.
> > 
> > If Xen is happy to make such calls blindly, why does it matter if the
> > hypercall was in the kernel binary or an external shim?
> 
> Because there could be new entries in SystemTable->RuntimeServices
> (expected and blindly but validly called by the kernel). Even worse
> (because likely harder to deal with) would be new fields in other
> structures.

Any of these could cause Xen to blow up, while Xen could always provide
a known-safe (but potentially sub-optimal) view to the kernel by
default.

> > Incompatible changes are a spec problem regardless of how this is
> > handled.
> 
> Not necessarily - we don't expose the memory map (we'd have to
> if we were to mimic EFI for Dom0), and hence the mentioned issue
> doesn't exist in our model.

We have to expose _some_ memory map, so I don't follow this point.

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-11 Thread Mark Rutland
> It feels like this discussion is going in circles.
> 
> When we discussed this six months ago, we already concluded that,
> since UEFI is the only specified way that the presence of ACPI is
> advertised on an ARM system, we need to emulate UEFI to some extent.

My understanding from the last time I was present at such a discussion
was that the emulation was complete from the kernel's PoV (i.e. no
special case handling was required). 

Evidently I misunderstood.

One of the main points of rationale for requiring EFI was that we'd have
a well-defined system state as per the EFI (and ACPI) standards. We'd
know we had the EFI memory map, services, etc (with the memory map and
configuration tables being the most important elements). We didn't want
to have to try to reconcile a DT memory map and ACPI, for instance.

That is somewhat (though admitedly not entirely) broken if we require a
set of rules to be applied beyond what the standards mandate.  That is
broken if we require a non-standard set of rules to be applied, and
limits what we can do in the !Xen case.

> So we need the EFI system table to expose the UEFI configuration table
> that carries the ACPI root pointer.
> 
> Since ACPI support also relies on the UEFI memory map (I think?), we
> need that as well.
> 
> These two items are exactly what we pass via the UEFI DT properties,
> so we should indeed promote the current de-facto binding to a proper
> binding, and renaming the properties makes sense in that context.

I agree that we need to sort these out.

> I agree that this should also include a description of the expected
> state of the firmware, i.e., that ExitBootServices() has been called,
> and that the memory map has been populated with virtual address, which
> have been installed using SetVirtualAddressMap() if they differ from
> the physical addresses. (The current implementation on the kernel side
> is perfectly capable of dealing with a 1:1 mapping).
> 
> Beyond that, there is no point in pretending to be a full UEFI
> implementation, imo. Boot services are not required, nor are runtime
> services (only the current EFI init code on arm needs to be modified
> to deal with a NULL runtime services pointer)

I'm not keen on this because it involves applying Xen-specific caveats
atop of what the UEFI spec says (e.g. as runtime services might be NULL
under Xen). As the spec and Xen evolve, those caveats shift, and that's
going to be fragile for all users regardleses of Xen.

If Xen needs special-casing, then I'd rather that Xen were detected
first and provided us with what it knows is safe for us to use, rather
than we tip-toe around until we're sure Xen isn't present and/or doesn't
need additional constraints met.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-11 Thread Mark Rutland
On Fri, Sep 11, 2015 at 01:46:43PM +0100, Daniel Kiper wrote:
> On Thu, Sep 10, 2015 at 05:23:02PM +0100, Mark Rutland wrote:
> > > > C) When you could go:
> > > >
> > > >DT -> Discover Xen -> Xen-specific stuff -> Xen-specific EFI/ACPI 
> > > > discovery
> > >
> > > I take you mean discovering Xen with the usual Xen hypervisor node on
> > > device tree. I think that C) is a good option actually. I like it. Not
> > > sure why we didn't think about this earlier. Is there anything EFI or
> > > ACPI which is needed before Xen support is discovered by
> > > arch/arm64/kernel/setup.c:setup_arch -> xen_early_init()?
> >
> > Currently lots (including the memory map). With the stuff to support
> > SPCR, the ACPI discovery would be moved before xen_early_init().
> >
> > > If not, we could just go for this. A lot of complexity would go away.
> >
> > I suspect this would still be fairly complex, but would at least prevent
> > the Xen-specific EFI handling from adversely affecting the native case.
> >
> > > > D) If you want to be generic:
> > > >EFI -> EFI application -> EFI tables -> ACPI tables -> Xen-specific 
> > > > stuff
> > > >   \--/
> > > >(virtualize these, provide shims to Dom0, but handle
> > > > everything in Xen itself)
> > >
> > > I think that this is good in theory but could turn out to be a lot of
> > > work in practice. We could probably virtualize the RuntimeServices but
> > > the BootServices are troublesome.
> >
> > What's troublesome with the boot services?
> >
> > What can't be simulated?
> 
> How do you want to access bare metal EFI boot services from dom0 if they
> were shutdown long time ago before loading dom0 image?

I don't want to.

I asked "What can't be simulated?" because I assumed everything
necessary/mandatory could be simulated without needinng access to any
real EFI boot services.

As far as I can see all that's necessary is to provide a compatible
interface.

> What do you need from EFI boot services in dom0?

The ability to call ExitBootServices() and SetVirtualAddressMap() on a
_virtual_ address map for _virtual_ services provided by the hypervisor.
A console so that I can log things early on.

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-10 Thread Mark Rutland
Hi,

I'm not necessarily opposed to the renaming, but I think that this is
the least important thing to standardize for this to work.

On Thu, Sep 10, 2015 at 09:41:56AM +0100, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> These EFI stub parameters are used to internal communication between EFI
> stub and Linux kernel and EFI stub creates these parameters. But for Xen
> on ARM when booting with UEFI, Xen will create a minimal DT providing
> these parameters for Dom0 and Dom0 is not only Linux kernel, but also
> other OS (such as FreeBSD) which will be used in the future.

Currently the Linux EFI stub and kernel have a symbiotic relationship,
because they're intimately coupled and we don't have kexec (yet) on EFI
platforms to loosen that coupling.

If an agent other than the (kernel-specific) stub is going to provide
this to the kernel, then we need more specified than just the property
names.

That at least includes the following:

* The state of boot services (we currently have the EFI stub call
  ExitBootServices(), and I believe this is crucial to the plan for
  kexec).

* The state of the address map (we currently have the EFI stub call
  SetVirtualAddressMap()).

* The virtual address range(s) that SetVirtualAddressMap() may map
  elements to (this logic is currently in the EFI stub, and this matches
  the expectations of the kernel that it is tied to).

> So here we plan to standardize the names by dropping the prefix
> "linux," and make them common to other OS. Also this will not break
> the compatibility since these parameters are used to internal
> communication between EFI stub and kernel.

For the moment this is true, but will not be once we have kexec, so
there's a dependency (or anti-dependency) there.

> Signed-off-by: Shannon Zhao 
> ---
> Look at [1] for the discussion about this in Xen ML. The purpose of this
> patch is to standardize the names to make Linux ARM kernel work on Xen
> with UEFI. Also it hopes other OS(e.g. FreeBSD), which will be used as
> Dom0 on Xen, could support this mechanism as well.
> 
> [1]http://lists.xenproject.org/archives/html/xen-devel/2015-08/msg02250.html

Per this post, it looks like to pass a DTB to the kernel Xen already
needs to know it's a Linux kernel...

I wasn't aware that there was a common standard for arm(64) kernels
other than a PE/COFF EFI application.

Does Xen not talk to EFI itself and/or give the kernel a virtual EFI
interface?

Thanks,
Mark.

>  Documentation/arm/uefi.txt | 10 +-
>  drivers/firmware/efi/efi.c | 10 +-
>  drivers/firmware/efi/libstub/fdt.c | 10 +-
>  3 files changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/arm/uefi.txt b/Documentation/arm/uefi.txt
> index d60030a..8c83243 100644
> --- a/Documentation/arm/uefi.txt
> +++ b/Documentation/arm/uefi.txt
> @@ -45,18 +45,18 @@ following parameters:
>  
> 
>  Name  | Size   | Description
>  
> 
> -linux,uefi-system-table   | 64-bit | Physical address of the UEFI System 
> Table.
> +uefi-system-table | 64-bit | Physical address of the UEFI System 
> Table.
>  
> 
> -linux,uefi-mmap-start | 64-bit | Physical address of the UEFI memory map,
> +uefi-mmap-start   | 64-bit | Physical address of the UEFI memory map,
>|| populated by the UEFI GetMemoryMap() 
> call.
>  
> 
> -linux,uefi-mmap-size  | 32-bit | Size in bytes of the UEFI memory map
> +uefi-mmap-size| 32-bit | Size in bytes of the UEFI memory map
>|| pointed to in previous entry.
>  
> 
> -linux,uefi-mmap-desc-size | 32-bit | Size in bytes of each entry in the UEFI
> +uefi-mmap-desc-size   | 32-bit | Size in bytes of each entry in the UEFI
>|| memory map.
>  
> 
> -linux,uefi-mmap-desc-ver  | 32-bit | Version of the mmap descriptor format.
> +uefi-mmap-desc-ver| 32-bit | Version of the mmap descriptor format.
>  
> 
>  linux,uefi-stub-kern-ver  | string | Copy of linux_banner from build.
>  
> 
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index d6144e3..3878715 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -481,11 +481,11 @@ static __initdata struct {
>  

Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-10 Thread Mark Rutland
On Thu, Sep 10, 2015 at 12:37:57PM +0100, Stefano Stabellini wrote:
> On Thu, 10 Sep 2015, Mark Rutland wrote:
> > > > Does Xen not talk to EFI itself and/or give the kernel a virtual EFI
> > > > interface?
> > > 
> > > Xen talks to EFI itself but the interface provided to dom0 is somewhat
> > > different: there are no BootServices (Xen calls ExitBootServices before
> > > running the kernel), and the RuntimeServices go via hypercalls (see
> > > drivers/xen/efi.c).
> > 
> > That's somewhat hideous; a non Xen-aware OS wouild presumably die if
> > trying to use any runtime services the normal way? I'm not keen on
> > describing things that the OS cannot use.
>  
> I agree that is somewhat hideous, but a non-Xen aware OS traditionally
> has never been able to even boot as Dom0. On ARM it can, but it still
> wouldn't be very useful (one couldn't use it to start other guests).

Sure, but it feels odd to provide the usual information in this manner
if it cannot be used. If you require Xen-specific code to make things
work, I would imagine this information could be dciscovered in a
Xen-specific manner.

> > Why can't Xen give a virtual EFI interface to Dom0 / guests? e.g.
> > create pages of RuntimeServicesCode that are trivial assembly shims
> > doing hypercalls, and plumb these into the virtual EFI memory map and
> > tables?
> > 
> > That would keep things sane for any guest, allow for easy addition of
> > EFI features, and you could even enter the usual EFI entry point,
> > simulate ExitBootServices(), SetVirtualAddressMap(), and allow the guest
> > to make things sane for itself...
> 
> That's the way it was done on x86 and now we have common code both in
> Linux (drivers/xen/efi.c) and Xen (xen/common/efi) which implement this
> scheme.

This code is not currently used on arm. It might live in a location
where it may be shared, but that doesn't mean that it's common code yet.

> Switching to a different solution for ARM, would mean diverging
> with x86, which is not nice, or reimplementing the x86 solution too,
> which is expensive.
> 
> BTW I think that the idea you proposed was actually considered at the
> time and deemed hard to implement, if I recall correctly.

I appreciate that divergence is painful. We already diverge in other
respects (e.g. lack of PV page tables) because things that used to be
the case on x86 never applied to ARM.

It would be interesting to see why that was the case for x86, and
whether that applies to ARM.

> In any case this should be separate from the shim ABI discussion.

I disagree; I think this is very much relevant to the ABI discussion.
That's not to say that I insist on a particular approach, but I think
that they need to be considered together.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-10 Thread Mark Rutland
> > Does Xen not talk to EFI itself and/or give the kernel a virtual EFI
> > interface?
> 
> Xen talks to EFI itself but the interface provided to dom0 is somewhat
> different: there are no BootServices (Xen calls ExitBootServices before
> running the kernel), and the RuntimeServices go via hypercalls (see
> drivers/xen/efi.c).

That's somewhat hideous; a non Xen-aware OS wouild presumably die if
trying to use any runtime services the normal way? I'm not keen on
describing things that the OS cannot use.

Why can't Xen give a virtual EFI interface to Dom0 / guests? e.g.
create pages of RuntimeServicesCode that are trivial assembly shims
doing hypercalls, and plumb these into the virtual EFI memory map and
tables?

That would keep things sane for any guest, allow for easy addition of
EFI features, and you could even enter the usual EFI entry point,
simulate ExitBootServices(), SetVirtualAddressMap(), and allow the guest
to make things sane for itself...

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm64/efi: don't pad between EFI_MEMORY_RUNTIME regions

2015-09-10 Thread Mark Rutland
Hi,

FWIW I gave this a spin on Seattle and Juno and saw no regressions (both
are pre-2.5 EFI though).

I have some concerns below.

On Wed, Sep 09, 2015 at 08:06:54AM +0100, Ard Biesheuvel wrote:
> The new Properties Table feature introduced in UEFIv2.5 may split
> memory regions that cover PE/COFF memory images into separate code
> and data regions. Since these regions only differ in the type (runtime
> code vs runtime data) and the permission bits, but not in the memory
> type attributes (UC/WC/WT/WB), the spec does not require them to be
> aligned to 64 KB.

We should require those to be 64k-aligned for permissions too. I can
imagine vendors getting permissions wrong but things happening to work
for a 64k kernel (where I assume we have to use the superset of all
permissions within a 64k page).

> As the relative offset of PE/COFF .text and .data segments cannot be
> changed on the fly, this means that we can no longer pad out those
> regions to be mappable using 64 KB pages.
> Unfortunately, there is no annotation in the UEFI memory map that
> identifies data regions that were split off from a code region, so we
> must apply this logic to all adjacent runtime regions whose attributes
> only differ in the permission bits.
> 
> So instead of rounding each memory region to 64 KB alignment at both
> ends, only round down regions that are not directly preceded by another
> runtime region with the same type attributes. Since the UEFI spec does
> not mandate that the memory map be sorted, this means we also need to
> sort it first.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
> 
> As discussed off list, this is the arm64 side of what we should backport
> to stable to prevent firmware that adheres to the current version of the
> UEFI v2.5 spec with the memprotect feature enabled from blowing up the system
> upon the first OS call into the runtime services.
> 
> For arm64, we already map things in order, but since the spec does not mandate
> a sorted memory map, we need to sort it to be sure. This also allows us to
> easily find adjacent regions with < 64 KB granularity, which the current 
> version
> of the spec allows if they only differ in permission bits (which the spec says
> are 'unused' on AArch64, which could be interpreted as 'allowed but ignored').
> 
> Changes since v1:
> - Ensure that we don't inadvertently set the XN bit on the preceding region at
>   mapping time if we the OS is running with >4 KB pages.
>   
>  arch/arm64/kernel/efi.c |  3 +-
>  drivers/firmware/efi/libstub/arm-stub.c | 62 +++-
>  2 files changed, 49 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> index e8ca6eaedd02..13671a9cf016 100644
> --- a/arch/arm64/kernel/efi.c
> +++ b/arch/arm64/kernel/efi.c
> @@ -258,7 +258,8 @@ static bool __init efi_virtmap_init(void)
>*/
>   if (!is_normal_ram(md))
>   prot = __pgprot(PROT_DEVICE_nGnRE);
> - else if (md->type == EFI_RUNTIME_SERVICES_CODE)
> + else if (md->type == EFI_RUNTIME_SERVICES_CODE ||
> +  !PAGE_ALIGNED(md->phys_addr))
>   prot = PAGE_KERNEL_EXEC;

This looks coarser than necessary. For memory organised like:

0x - 0xF000 (60KiB) : EFI_RUNTIME_SERVICES_CODE 
0xF000 - 0x0002 (68KiB) : EFI_RUNTIME_SERVICES_DATA

We should be able to make the last 64K non-executable, but with this all
128K is executable, unless I've missed something?

Maybe we could do a two-step pass, first mapping the data as
not-executable, then mapping any code pages executable (overriding any
overlapping portions, but only for the overlapping parts).

>   else
>   prot = PAGE_KERNEL;
> diff --git a/drivers/firmware/efi/libstub/arm-stub.c 
> b/drivers/firmware/efi/libstub/arm-stub.c
> index e29560e6b40b..cb4e9c4de952 100644
> --- a/drivers/firmware/efi/libstub/arm-stub.c
> +++ b/drivers/firmware/efi/libstub/arm-stub.c
> @@ -13,6 +13,7 @@
>   */
>  
>  #include 
> +#include 

Sort isn't an inline in this header. I thought it wasn't safe to call
arbitary kernel functions from the stub?

>  #include 
>  
>  #include "efistub.h"
> @@ -305,6 +306,13 @@ fail:
>   */
>  #define EFI_RT_VIRTUAL_BASE  0x4000
>  
> +static int cmp_mem_desc(const void *a, const void *b)
> +{
> + const efi_memory_desc_t *left = a, *right = b;
> +
> + return (left->phys_addr > right->phys_addr) ? 1 : -1;
> +}

Nit: please chose names to make the relationship between these clearer.
e.g. s/left/mem_a/, s/right/mem_b/.

> +
>  /*
>   * efi_get_virtmap() - create a virtual mapping for the EFI memory map
>   *
> @@ -316,34 +324,58 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, 
> unsigned long map_size,
>unsigned long desc_size, efi_memory_desc_t *runtime_map,
>int *count)
>  {
> + static const u64 mem_type_mask = 

Re: [PATCH v2] arm64/efi: don't pad between EFI_MEMORY_RUNTIME regions

2015-09-10 Thread Mark Rutland
> >> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> >> index e8ca6eaedd02..13671a9cf016 100644
> >> --- a/arch/arm64/kernel/efi.c
> >> +++ b/arch/arm64/kernel/efi.c
> >> @@ -258,7 +258,8 @@ static bool __init efi_virtmap_init(void)
> >>*/
> >>   if (!is_normal_ram(md))
> >>   prot = __pgprot(PROT_DEVICE_nGnRE);
> >> - else if (md->type == EFI_RUNTIME_SERVICES_CODE)
> >> + else if (md->type == EFI_RUNTIME_SERVICES_CODE ||
> >> +  !PAGE_ALIGNED(md->phys_addr))
> >>   prot = PAGE_KERNEL_EXEC;
> >
> > This looks coarser than necessary. For memory organised like:
> >
> > 0x - 0xF000 (60KiB) : EFI_RUNTIME_SERVICES_CODE
> > 0xF000 - 0x0002 (68KiB) : EFI_RUNTIME_SERVICES_DATA
> >
> > We should be able to make the last 64K non-executable, but with this all
> > 128K is executable, unless I've missed something?
> >
> 
> In theory, yes. But considering that
> 
> a) this only affects 64 KB pages kernels, and
> b) this patch is intended for -stable
> 
> I chose to keep it simple and ignore this, and just relax the
> permissions for any region that is not aligned to 64 KB.
> 
> Since these regions are only mapped during Runtime Services calls, the
> window for abuse is not that large.

Ok, that does sound reasonable.

> > Maybe we could do a two-step pass, first mapping the data as
> > not-executable, then mapping any code pages executable (overriding any
> > overlapping portions, but only for the overlapping parts).
> >
> 
> Let me have a go at that.

Cheers!

> >>   else
> >>   prot = PAGE_KERNEL;
> >> diff --git a/drivers/firmware/efi/libstub/arm-stub.c 
> >> b/drivers/firmware/efi/libstub/arm-stub.c
> >> index e29560e6b40b..cb4e9c4de952 100644
> >> --- a/drivers/firmware/efi/libstub/arm-stub.c
> >> +++ b/drivers/firmware/efi/libstub/arm-stub.c
> >> @@ -13,6 +13,7 @@
> >>   */
> >>
> >>  #include 
> >> +#include 
> >
> > Sort isn't an inline in this header. I thought it wasn't safe to call
> > arbitary kernel functions from the stub?
> >
> 
> We call string functions, cache maintenance functions, libfdt
> functions etc etc so it seems not everyone got the memo :-)
> 
> I agree that treating vmlinux both as a static library and as a
> payload from the stub's pov is a bit sloppy, and I do remember
> discussing this, but for the life of me, I can't remember the exact
> issue, other than the use of adrp/add and adrp/ldr pairs, which we
> fixed by setting the PE/COFF section alignment to 4 KB.

I only had a vague recollection that there was a problem, which I
thought was more to do with potential use of absolute kernel virtual
addresses, which would be incorrect in the context of an EFI
application.

Digging a bit, the stub code itself is safe due to commit
f4f75ad5741fe033 ("efi: efistub: Convert into static library"), but that
isn't necessarily true of anything it calls (libfdt uses callbacks in
several places). I think the cache functions we call are all raw asm
which is position-oblivious.

We do seem to be ok so far, however. Maybe we just need to keep an eye
out.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-10 Thread Mark Rutland
On Thu, Sep 10, 2015 at 02:52:25PM +0100, Stefano Stabellini wrote:
> On Thu, 10 Sep 2015, Mark Rutland wrote:
> > On Thu, Sep 10, 2015 at 12:37:57PM +0100, Stefano Stabellini wrote:
> > > On Thu, 10 Sep 2015, Mark Rutland wrote:
> > > > > > Does Xen not talk to EFI itself and/or give the kernel a virtual EFI
> > > > > > interface?
> > > > > 
> > > > > Xen talks to EFI itself but the interface provided to dom0 is somewhat
> > > > > different: there are no BootServices (Xen calls ExitBootServices 
> > > > > before
> > > > > running the kernel), and the RuntimeServices go via hypercalls (see
> > > > > drivers/xen/efi.c).
> > > > 
> > > > That's somewhat hideous; a non Xen-aware OS wouild presumably die if
> > > > trying to use any runtime services the normal way? I'm not keen on
> > > > describing things that the OS cannot use.
> > >  
> > > I agree that is somewhat hideous, but a non-Xen aware OS traditionally
> > > has never been able to even boot as Dom0. On ARM it can, but it still
> > > wouldn't be very useful (one couldn't use it to start other guests).
> > 
> > Sure, but it feels odd to provide the usual information in this manner
> > if it cannot be used. If you require Xen-specific code to make things
> > work, I would imagine this information could be dciscovered in a
> > Xen-specific manner.
> 
> We need ACPI (or Device Tree) to find that Xen is available on the
> platform, so we cannot use Xen-specific code to get the ACPI tables.

I don't understand. The proposition already involves passing a custom DT
to the OS, implying that Xen knows how to boot that OS, and how to pass
it a DTB.

Consider:

A) In the DT-only case, we go:

   DT -> Discover Xen -> Xen-specific stuff


B) The proposition is that un the ACPI case we go:

   DT -> EFI tables -> ACPI tables -> Discover Xen -> Xen-specific stuff -> 
override EFI/ACPI stuff
 \---/
  (be really cautious here)


C) When you could go:

   DT -> Discover Xen -> Xen-specific stuff -> Xen-specific EFI/ACPI discovery


D) If you want to be generic:
   EFI -> EFI application -> EFI tables -> ACPI tables -> Xen-specific stuff
  \--/
   (virtualize these, provide shims to Dom0, but handle
everything in Xen itself)


E) Partially-generic option:
   EFI -> EFI application -> Xen detected by registered GUID -> Xen-specific 
EFI bootloader stuff -> OS in Xen-specific configuration


> > > In any case this should be separate from the shim ABI discussion.
> > 
> > I disagree; I think this is very much relevant to the ABI discussion.
> > That's not to say that I insist on a particular approach, but I think
> > that they need to be considered together.
> 
> Let's suppose Xen didn't expose any RuntimeServices at all, would that
> make it easier to discuss about the EFI stub parameters?

It would simply the protocol specific to Xen, certainly.

> In the grant scheme of things, they are not that important, as Ian
> wrote what is important is how to pass the RSDP.

Unfortunately we're still going to have to care about this eventually,
even if for something like kexec. So we still need to spec out the state
of things if this is going to be truly generic.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-10 Thread Mark Rutland
On Thu, Sep 10, 2015 at 01:55:25PM +0100, Jan Beulich wrote:
> >>> On 10.09.15 at 13:37, <stefano.stabell...@eu.citrix.com> wrote:
> > On Thu, 10 Sep 2015, Mark Rutland wrote:
> >> Why can't Xen give a virtual EFI interface to Dom0 / guests? e.g.
> >> create pages of RuntimeServicesCode that are trivial assembly shims
> >> doing hypercalls, and plumb these into the virtual EFI memory map and
> >> tables?
> >> 
> >> That would keep things sane for any guest, allow for easy addition of
> >> EFI features, and you could even enter the usual EFI entry point,
> >> simulate ExitBootServices(), SetVirtualAddressMap(), and allow the guest
> >> to make things sane for itself...
> > 
> > That's the way it was done on x86 and now we have common code both in
> > Linux (drivers/xen/efi.c) and Xen (xen/common/efi) which implement this
> > scheme.  Switching to a different solution for ARM, would mean diverging
> > with x86, which is not nice, or reimplementing the x86 solution too,
> > which is expensive.
> > 
> > BTW I think that the idea you proposed was actually considered at the
> > time and deemed hard to implement, if I recall correctly.
> 
> Considering that the EFI support is just for Dom0, and Dom0 (at
> the time) had to be PV anyway, it was the more natural solution to
> expose the interface via hypercalls, the more that this allows better
> control over what is and primarily what is not being exposed to
> Dom0. With the wrapper approach we'd be back to the same
> problem (discussed elsewhere) of which EFI version to surface: The
> host one would impose potentially missing extensions, while the
> most recent hypervisor known one might imply hiding valuable
> information from Dom0. Plus there are incompatible changes like
> the altered meaning of EFI_MEMORY_WP in 2.5.

I'm not sure I follow how hypercalls solve any impedance mismatch here;
you're still expecting Dom0 to call up to Xen in order to perform calls,
and all I suggested was a different location for those hypercalls.

If Xen is happy to make such calls blindly, why does it matter if the
hypercall was in the kernel binary or an external shim?

Incompatible changes are a spec problem regardless of how this is
handled.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm64/efi: don't pad between EFI_MEMORY_RUNTIME regions

2015-09-10 Thread Mark Rutland
> OK so what we could do is the following:
> 
> 8<--
> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> index e8ca6eaedd02..39fa2a70a7f1 100644
> --- a/arch/arm64/kernel/efi.c
> +++ b/arch/arm64/kernel/efi.c
> @@ -233,6 +233,7 @@ void __init efi_init(void)
>  static bool __init efi_virtmap_init(void)
>  {
> efi_memory_desc_t *md;
> +   u64 prev_end = 0;
> 
> for_each_efi_memory_desc(, md) {
> u64 paddr, npages, size;
> @@ -256,13 +257,26 @@ static bool __init efi_virtmap_init(void)
>  * executable, everything else can be mapped with the XN bits
>  * set.
>  */
> -   if (!is_normal_ram(md))
> +   if (!is_normal_ram(md)) {
> prot = __pgprot(PROT_DEVICE_nGnRE);
> -   else if (md->type == EFI_RUNTIME_SERVICES_CODE)
> +   } else if (md->type == EFI_RUNTIME_SERVICES_CODE) {
> prot = PAGE_KERNEL_EXEC;
> -   else
> +   } else {
> +   /*
> +* If we are running with >4 KB pages and the current
> +* region shares a page frame with the preceding one,
> +* we should not map the leading page again since 
> doing
> +* so may take its executable permissions away.
> +*/
> +   if (PAGE_SIZE > EFI_PAGE_SIZE && paddr < prev_end) {
> +   paddr += PAGE_SIZE;
> +   size -= PAGE_SIZE;
> +   if (!size)
> +   continue;
> +   }
> prot = PAGE_KERNEL;
> -
> +   }
> +   prev_end = paddr + size;
> create_pgd_mapping(_mm, paddr, md->virt_addr, size, prot);
> }
> return true;
> 8<--
> 
> This will ensure that only the pages that are shared between 2 or more
> regions may have their permissions upgraded, but only if any of these
> regions requires it.
> 
> I prefer the much simpler previous version, though, and I think it is
> more suitable for -stable. I can always follow up with an improvement
> like this for v4.3-late.

Ok. Let's go with your previous version for now.

Could you put something in the commit message about this limitation, so
that we don't forget?

> >> >>   else
> >> >>   prot = PAGE_KERNEL;
> >> >> diff --git a/drivers/firmware/efi/libstub/arm-stub.c 
> >> >> b/drivers/firmware/efi/libstub/arm-stub.c
> >> >> index e29560e6b40b..cb4e9c4de952 100644
> >> >> --- a/drivers/firmware/efi/libstub/arm-stub.c
> >> >> +++ b/drivers/firmware/efi/libstub/arm-stub.c
> >> >> @@ -13,6 +13,7 @@
> >> >>   */
> >> >>
> >> >>  #include 
> >> >> +#include 
> >> >
> >> > Sort isn't an inline in this header. I thought it wasn't safe to call
> >> > arbitary kernel functions from the stub?
> >> >
> >>
> >> We call string functions, cache maintenance functions, libfdt
> >> functions etc etc so it seems not everyone got the memo :-)
> >>
> >> I agree that treating vmlinux both as a static library and as a
> >> payload from the stub's pov is a bit sloppy, and I do remember
> >> discussing this, but for the life of me, I can't remember the exact
> >> issue, other than the use of adrp/add and adrp/ldr pairs, which we
> >> fixed by setting the PE/COFF section alignment to 4 KB.
> >
> > I only had a vague recollection that there was a problem, which I
> > thought was more to do with potential use of absolute kernel virtual
> > addresses, which would be incorrect in the context of an EFI
> > application.
> >
> 
> That was it, of course. Unlike the x86 stub, which is built with -fPIC
> (as is the ARM decompressor, btw), the arm64 kernel is position
> dependent. Fortunately, the small code model is mostly position
> independent by default, but it would be good if we could spot any
> problems at build time.
> 
> > Digging a bit, the stub code itself is safe due to commit
> > f4f75ad5741fe033 ("efi: efistub: Convert into static library"), but that
> 
> libstub is linked into vmlinux so that does not make a different at all

Ok. I assumed that inter-stub references would still be relative, but I
have no idea what the linker would do.

> > We do seem to be ok so far, however. Maybe we just need to keep an eye
> > out.
> >
> 
> I'd much rather restrict the code that goes into the stub somehow than
> deal with any absolute references. Perhaps we could reuse some of the
> section mismatch code in some way to tag certain code as stub-safe and
> do a verification pass on the binary.

It would be great if we could, though I'm not really sure how that would
work.

Imagine stub_foo calls fdt_blah, which calls kern_baz by absolute
address. Normally the latter call is fine, but not when the original

Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-10 Thread Mark Rutland
> > C) When you could go:
> > 
> >DT -> Discover Xen -> Xen-specific stuff -> Xen-specific EFI/ACPI 
> > discovery
> 
> I take you mean discovering Xen with the usual Xen hypervisor node on
> device tree. I think that C) is a good option actually. I like it. Not
> sure why we didn't think about this earlier. Is there anything EFI or
> ACPI which is needed before Xen support is discovered by
> arch/arm64/kernel/setup.c:setup_arch -> xen_early_init()?

Currently lots (including the memory map). With the stuff to support
SPCR, the ACPI discovery would be moved before xen_early_init().

> If not, we could just go for this. A lot of complexity would go away.

I suspect this would still be fairly complex, but would at least prevent
the Xen-specific EFI handling from adversely affecting the native case.

> > D) If you want to be generic:
> >EFI -> EFI application -> EFI tables -> ACPI tables -> Xen-specific stuff
> >   \--/
> >(virtualize these, provide shims to Dom0, but handle
> > everything in Xen itself)
> 
> I think that this is good in theory but could turn out to be a lot of
> work in practice. We could probably virtualize the RuntimeServices but
> the BootServices are troublesome.

What's troublesome with the boot services?

What can't be simulated?

> > E) Partially-generic option:
> >EFI -> EFI application -> Xen detected by registered GUID -> 
> > Xen-specific EFI bootloader stuff -> OS in Xen-specific configuration
> > 
> > 
> > > > > In any case this should be separate from the shim ABI discussion.
> > > > 
> > > > I disagree; I think this is very much relevant to the ABI discussion.
> > > > That's not to say that I insist on a particular approach, but I think
> > > > that they need to be considered together.
> > > 
> > > Let's suppose Xen didn't expose any RuntimeServices at all, would that
> > > make it easier to discuss about the EFI stub parameters?
> > 
> > It would simply the protocol specific to Xen, certainly.
> > 
> > > In the grant scheme of things, they are not that important, as Ian
> > > wrote what is important is how to pass the RSDP.
> > 
> > Unfortunately we're still going to have to care about this eventually,
> > even if for something like kexec. So we still need to spec out the state
> > of things if this is going to be truly generic.
>  
> Fair enough. My position is that if we restrict this to RuntimeServices,
> it might be possible, but I still prefer C).

Regardless of what we do we still need a well-defined state here, which
brings us back to the initial problem eventually.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm64/efi: prefer AllocatePages() over efi_low_alloc() for vmlinux

2015-07-24 Thread Mark Rutland
On Fri, Jul 24, 2015 at 12:38:27PM +0100, Ard Biesheuvel wrote:
 When allocating memory for the kernel image, try the AllocatePages()
 boot service to obtain memory at the preferred offset of
 'dram_base + TEXT_OFFSET', and only revert to efi_low_alloc() if that
 fails. This is the only way to allocate at the base of DRAM if DRAM
 starts at 0x0, since efi_low_alloc() refuses to allocate at 0x0.
 
 Tested-by: Haojian Zhuang haojian.zhu...@linaro.org
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org

Reviewed-by: Mark Rutland mark.rutl...@arm.com

Mark.

 ---
 v2:
 - reshuffle code flow to make it more logical, and have only a single
   memcpy() invocation at the end of the function
 ---
  arch/arm64/kernel/efi-stub.c | 41 -
  1 file changed, 32 insertions(+), 9 deletions(-)
 
 diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c
 index f5374065ad53..816120ece6bc 100644
 --- a/arch/arm64/kernel/efi-stub.c
 +++ b/arch/arm64/kernel/efi-stub.c
 @@ -13,7 +13,7 @@
  #include asm/efi.h
  #include asm/sections.h
  
 -efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table,
 +efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table_arg,
   unsigned long *image_addr,
   unsigned long *image_size,
   unsigned long *reserve_addr,
 @@ -23,21 +23,44 @@ efi_status_t __init 
 handle_kernel_image(efi_system_table_t *sys_table,
  {
   efi_status_t status;
   unsigned long kernel_size, kernel_memsize = 0;
 + unsigned long nr_pages;
 + void *old_image_addr = (void *)*image_addr;
  
   /* Relocate the image, if required. */
   kernel_size = _edata - _text;
   if (*image_addr != (dram_base + TEXT_OFFSET)) {
   kernel_memsize = kernel_size + (_end - _edata);
 - status = efi_low_alloc(sys_table, kernel_memsize + TEXT_OFFSET,
 -SZ_2M, reserve_addr);
 +
 + /*
 +  * First, try a straight allocation at the preferred offset.
 +  * This will work around the issue where, if dram_base == 0x0,
 +  * efi_low_alloc() refuses to allocate at 0x0 (to prevent the
 +  * address of the allocation to be mistaken for a FAIL return
 +  * value or a NULL pointer). It will also ensure that, on
 +  * platforms where the [dram_base, dram_base + TEXT_OFFSET)
 +  * interval is partially occupied by the firmware (like on APM
 +  * Mustang), we can still place the kernel at the address
 +  * 'dram_base + TEXT_OFFSET'.
 +  */
 + *image_addr = *reserve_addr = dram_base + TEXT_OFFSET;
 + nr_pages = round_up(kernel_memsize, EFI_ALLOC_ALIGN) /
 +EFI_PAGE_SIZE;
 + status = efi_call_early(allocate_pages, EFI_ALLOCATE_ADDRESS,
 + EFI_LOADER_DATA, nr_pages,
 + (efi_physical_addr_t *)reserve_addr);
   if (status != EFI_SUCCESS) {
 - pr_efi_err(sys_table, Failed to relocate kernel\n);
 - return status;
 + kernel_memsize += TEXT_OFFSET;
 + status = efi_low_alloc(sys_table_arg, kernel_memsize,
 +SZ_2M, reserve_addr);
 +
 + if (status != EFI_SUCCESS) {
 + pr_efi_err(sys_table_arg, Failed to relocate 
 kernel\n);
 + return status;
 + }
 + *image_addr = *reserve_addr + TEXT_OFFSET;
   }
 - memcpy((void *)*reserve_addr + TEXT_OFFSET, (void *)*image_addr,
 -kernel_size);
 - *image_addr = *reserve_addr + TEXT_OFFSET;
 - *reserve_size = kernel_memsize + TEXT_OFFSET;
 + memcpy((void *)*image_addr, old_image_addr, kernel_size);
 + *reserve_size = kernel_memsize;
   }
  
  
 -- 
 1.9.1
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64/efi: prefer AllocatePages() over efi_low_alloc() for vmlinux

2015-07-24 Thread Mark Rutland
Hi Ard,

On Fri, Jul 24, 2015 at 10:41:53AM +0100, Ard Biesheuvel wrote:
 When allocating memory for the kernel image, try the AllocatePages()
 boot service to obtain memory at the preferred offset of
 'dram_base + TEXT_OFFSET', and only revert to efi_low_alloc() if that
 fails. This is the only way to allocate at the base of DRAM if DRAM
 starts at 0x0, since efi_low_alloc() refuses to allocate at 0x0.

 Tested-by: Haojian Zhuang haojian.zhu...@linaro.org
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
  arch/arm64/kernel/efi-stub.c | 47 
 
  1 file changed, 39 insertions(+), 8 deletions(-)
 
 diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c
 index f5374065ad53..c8df74d14368 100644
 --- a/arch/arm64/kernel/efi-stub.c
 +++ b/arch/arm64/kernel/efi-stub.c
 @@ -13,7 +13,7 @@
  #include asm/efi.h
  #include asm/sections.h
  
 -efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table,
 +efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table_arg,

Any reason for the _arg addition?

   unsigned long *image_addr,
   unsigned long *image_size,
   unsigned long *reserve_addr,
 @@ -23,21 +23,52 @@ efi_status_t __init 
 handle_kernel_image(efi_system_table_t *sys_table,
  {
   efi_status_t status;
   unsigned long kernel_size, kernel_memsize = 0;
 + unsigned long nr_pages;
  
   /* Relocate the image, if required. */
   kernel_size = _edata - _text;
   if (*image_addr != (dram_base + TEXT_OFFSET)) {
   kernel_memsize = kernel_size + (_end - _edata);
 - status = efi_low_alloc(sys_table, kernel_memsize + TEXT_OFFSET,
 -SZ_2M, reserve_addr);
 +
 + //
 + // First, try a straight allocation at the preferred offset.
 + // This will work around the issue where, if dram_base == 0x0,
 + // efi_low_alloc() refuses to allocate at 0x0 (to prevent the
 + // address of the allocation to be mistaken for a FAIL return
 + // value or a NULL pointer). It will also ensure that, on
 + // platforms where the [dram_base, dram_base + TEXT_OFFSET)
 + // interval is partially occupied by the firmware (like on APM
 + // Mustang), we can still place the kernel at the address
 + // 'dram_base + TEXT_OFFSET'.
 + //

/*
 * Nit: please use the standard comment style
 */

 + *reserve_addr = dram_base + TEXT_OFFSET;
 + nr_pages = round_up(kernel_memsize, EFI_ALLOC_ALIGN) /
 +EFI_PAGE_SIZE;
 + status = efi_call_early(allocate_pages, EFI_ALLOCATE_ADDRESS,
 + EFI_LOADER_DATA, nr_pages,
 + (efi_physical_addr_t *)reserve_addr);
 + if (status == EFI_SUCCESS) {
 + memcpy((void *)*reserve_addr, (void *)*image_addr,
 +kernel_size);
 + *image_addr = *reserve_addr;
 + *reserve_size = kernel_memsize;
 + } else {
 + status = efi_low_alloc(sys_table_arg,
 +kernel_memsize + TEXT_OFFSET,
 +SZ_2M, reserve_addr);
 +
 + if (status == EFI_SUCCESS) {
 + memcpy((void *)*reserve_addr + TEXT_OFFSET,
 +(void *)*image_addr,
 +kernel_size);
 + *image_addr = *reserve_addr + TEXT_OFFSET;
 + *reserve_size = kernel_memsize + TEXT_OFFSET;
 + }
 + }
   if (status != EFI_SUCCESS) {
 - pr_efi_err(sys_table, Failed to relocate kernel\n);
 + pr_efi_err(sys_table_arg, Failed to relocate 
 kernel\n);
   return status;
   }
 - memcpy((void *)*reserve_addr + TEXT_OFFSET, (void *)*image_addr,
 -kernel_size);

Could we have a new_image_addr assigned in each case, and keep the
common memcpy here, followed by assignment to *image_addr? That would
save a couple of lines and guarantee the two cases stay in sync.

Otherwise this looks good to me.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64/efi: use UEFI ResetSystem() Runtime Service for system reset

2015-03-05 Thread Mark Rutland
Hi Ard,

On Thu, Mar 05, 2015 at 12:51:11PM +, Ard Biesheuvel wrote:
 If UEFI Runtime Services are available, the ResetSystem() service should
 be preferred over direct PSCI calls or other methods to reset the system.
 The reason is that the UpdateCapsule() UEFI Runtime Service, which is used
 to perform firmware updates, relies on this.
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
 I sent roughly the same patch ~6 months ago, but at the time, we were still
 waiting for the restart notifier call chain patches to land. Since that code
 got rejected, I am proposing this again. Note that efi_enabled(x) always
 evaluates to 'false' on !CONFIG_EFI.

Also, efi_reboot is a static inline for !CONFIG_EFI, so I can't see any
possibility of a build failure.

 This fixes reboot on my Seattle [although it doesn't make it any faster :-)]
 
  arch/arm64/kernel/process.c | 9 +
  1 file changed, 9 insertions(+)
 
 diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
 index fde9923af859..a52bc0c316a8 100644
 --- a/arch/arm64/kernel/process.c
 +++ b/arch/arm64/kernel/process.c
 @@ -21,6 +21,7 @@
  #include stdarg.h
  
  #include linux/compat.h
 +#include linux/efi.h
  #include linux/export.h
  #include linux/sched.h
  #include linux/kernel.h
 @@ -150,6 +151,14 @@ void machine_restart(char *cmd)
   local_irq_disable();
   smp_send_stop();
  
 + /*
 +  * Prefer reboot via EFI if available, so that capsule updates [which
 +  * rely on UEFI's ResetSystem() being called with the return value of
 +  * UpdateCapsule()] have a chance of working as expected.
 +  */
 + if (efi_enabled(EFI_RUNTIME_SERVICES))
 + efi_reboot(reboot_mode, NULL);

I expect that the particulars of the UpdateCapsule() will be handled
within efi_reboot and won't require any additions here. So the comment
could just be trimmed to say that UpdateCapsule() depends on the system
being reset with ResetSystem().

Also, we need to make sure we call efi_poweroff to make UpdateCapsule()
work when shutting the machine down (behind the scenes efi_poweroff
calls ResetSystem(EfiResetShutdown, ...)).

For that I think adding the following to arch/arm64/kernel/efi.c is
sufficient:

/*
 * UpdateCapsule() depends on the system being shutdown via
 * ResetSystem().
 */
bool efi_poweroff_required(void)
{
return efi_enabled(EFI_RUNTIME_SERVICES);
}

I've given the patch a spin (with and without that addition) on Juno and
Seattle. So with that folded in:

Tested-by: Mark Rutland mark.rutl...@arm.com
Reviewed-by: Mark Rutland mark.rutl...@arm.com

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi: fix boundary checking in efi_high_alloc()

2015-02-23 Thread Mark Rutland
Hi Matt,

 From 1e7295b5d4c5226a160a9167e61b581e388f7f9a Mon Sep 17 00:00:00 2001
 From: Yinghai Lu ying...@kernel.org
 Date: Thu, 19 Feb 2015 20:18:03 -0800
 Subject: [PATCH] efi/libstub: Fix boundary checking in efi_high_alloc()
 
 While adding support loading kernel and initrd above 4G to grub2 in legacy
 mode, I was referring to efi_high_alloc().
 That will allocate buffer for kernel and then initrd, and initrd will
 use kernel buffer start as limit.
 
 During testing found two buffers will be overlapped when initrd size is
 very big like 400M.
 
 It turns out efi_high_alloc() boundary checking is not right.
 end - size will be the new start, and should not compare new
 start with max, we need to make sure end is smaller than max.
 
 [ Basically, with the current efi_high_alloc() code it's possible to
   allocate memory above 'max', because efi_high_alloc() doesn't check
   that the tail of the allocation is below 'max'.
 
   If you have an EFI memory map with a single entry that looks like so,
 
[0xc000-0xc0004000]
 
   And want to allocate 0x3000 bytes below 0xc0003000 the current code
   will allocate [0xc0001000-0xc0004000], not [0xc000-0xc0003000]
   like you would expect. - Matt ]
 
 Signed-off-by: Yinghai Lu ying...@kernel.org
 Cc: sta...@vger.kernel.org

I've convinced myself that the new logic is sound, and with this patch
applied atop of v4.0-rc1 I don't see regressions on the platforms I have
access to. So:

Reviewed-by: Mark Rutland mark.rutl...@arm.com
Tested-by: Mark Rutland mark.rutl...@arm.com

Ard, Leif:

On a related note, I think that the logic for deciding where to place
the kernel and DTB isn't quite right. The kernel needs to be in the same
naturally-aligned 512M region as the DTB in order to be able to map it,
but the kernel could get relocated above the max address we'll consider
for the DTB if there isn't sufficient space for the kernel between
dram_base and dram_base + 512M.

We should try to use the fixmap to map the DTB so it can be located
anywhere in physical memory. That will make things easier for the stub
and other loaders.

Thanks,
Mark.

 Signed-off-by: Matt Fleming matt.flem...@intel.com
 ---
  drivers/firmware/efi/libstub/efi-stub-helper.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c 
 b/drivers/firmware/efi/libstub/efi-stub-helper.c
 index 9bd9fbb5bea8..c927bccd92bd 100644
 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
 +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
 @@ -170,12 +170,12 @@ again:
   start = desc-phys_addr;
   end = start + desc-num_pages * (1UL  EFI_PAGE_SHIFT);
  
 - if ((start + size)  end || (start + size)  max)
 - continue;
 -
 - if (end - size  max)
 + if (end  max)
   end = max;
  
 + if ((start + size)  end)
 + continue;
 +
   if (round_down(end - size, align)  start)
   continue;
  
 -- 
 1.9.3
 
 -- 
 Matt Fleming, Intel Open Source Technology Center
 --
 To unsubscribe from this list: send the line unsubscribe linux-efi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi: get_memory_map: add sufficient slack for memory descriptors

2015-02-13 Thread Mark Rutland
On Fri, Feb 13, 2015 at 04:04:48PM +, Matt Fleming wrote:
 On Thu, 12 Feb, at 11:31:02PM, Ard Biesheuvel wrote:
  
  Actually, looking again at the original patch, it appears that my
  analysis was incorrect regarding the possibility that the loop would
  never terminate. The only thing that could happen if desc_size 
  sizeof(efi_memory_desc_t) is that you need two iterations instead of
  one to get a pool allocation that is of sufficient size.
  So perhaps it is better to just revert the patch.
  
  My apologies for the hassle.
 
 This is what I've got queued up,

Looks sane to me:

Acked-by: Mark Rutland mark.rutl...@arm.com

Thanks,
Mark.

 
 ---
 
 From 3f281b98ffc99e604a3988aa93304a3a591eeeb5 Mon Sep 17 00:00:00 2001
 From: Matt Fleming matt.flem...@intel.com
 Date: Fri, 13 Feb 2015 15:46:56 +
 Subject: [PATCH] Revert efi/libstub: Call get_memory_map() to obtain map and
  desc sizes
 
 This reverts commit d1a8d66b9177105e898e73716f97eb61842c457a.
 
 Ard reported a boot failure when running UEFI under Qemu and Xen and
 experimenting with various Tianocore build options,
 
  As it turns out, when allocating room for the UEFI memory map using
   UEFI's AllocatePool (), it may result in two new memory map entries
   being created, for instance, when using Tianocore's preallocated region
   feature. For example, the following region
 
   0x5ead5000-0x5ebf [Conventional Memory|   |  |  |  |  
 |WB|WT|WC|UC]
 
   may be split like this
 
   0x5ead5000-0x5eae2fff [Conventional Memory|   |  |  |  |  
 |WB|WT|WC|UC]
   0x5eae3000-0x5eae4fff [Loader Data|   |  |  |  |  
 |WB|WT|WC|UC]
   0x5eae5000-0x5ebf [Conventional Memory|   |  |  |  |  
 |WB|WT|WC|UC]
 
   if the preallocated Loader Data region was chosen to be right in the
   middle of the original free space.
 
   After patch d1a8d66b9177 (efi/libstub: Call get_memory_map() to
   obtain map and desc sizes), this is not being dealt with correctly
   anymore, as the existing logic to allocate room for a single additional
   entry has become insufficient.
 
 Mark requested to reinstate the old loop we had before commit
 d1a8d66b9177, which grows the memory map buffer until it's big enough to
 hold the EFI memory map.
 
 Cc: Ard Biesheuvel ard.biesheu...@linaro.org
 Cc: Mark Rutland mark.rutl...@arm.com
 Signed-off-by: Matt Fleming matt.flem...@intel.com
 ---
  drivers/firmware/efi/libstub/efi-stub-helper.c | 16 ++--
  1 file changed, 6 insertions(+), 10 deletions(-)
 
 diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c 
 b/drivers/firmware/efi/libstub/efi-stub-helper.c
 index d073e3946383..9bd9fbb5bea8 100644
 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
 +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
 @@ -66,29 +66,25 @@ efi_status_t efi_get_memory_map(efi_system_table_t 
 *sys_table_arg,
   unsigned long key;
   u32 desc_version;
  
 - *map_size = 0;
 - *desc_size = 0;
 - key = 0;
 - status = efi_call_early(get_memory_map, map_size, NULL,
 - key, desc_size, desc_version);
 - if (status != EFI_BUFFER_TOO_SMALL)
 - return EFI_LOAD_ERROR;
 -
 + *map_size = sizeof(*m) * 32;
 +again:
   /*
* Add an additional efi_memory_desc_t because we're doing an
* allocation which may be in a new descriptor region.
*/
 - *map_size += *desc_size;
 + *map_size += sizeof(*m);
   status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
   *map_size, (void **)m);
   if (status != EFI_SUCCESS)
   goto fail;
  
 + *desc_size = 0;
 + key = 0;
   status = efi_call_early(get_memory_map, map_size, m,
   key, desc_size, desc_version);
   if (status == EFI_BUFFER_TOO_SMALL) {
   efi_call_early(free_pool, m);
 - return EFI_LOAD_ERROR;
 + goto again;
   }
  
   if (status != EFI_SUCCESS)
 -- 
 1.9.3
 
 -- 
 Matt Fleming, Intel Open Source Technology Center
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi: get_memory_map: add sufficient slack for memory descriptors

2015-02-12 Thread Mark Rutland
On Thu, Feb 12, 2015 at 05:24:19AM +, Ard Biesheuvel wrote:
 As it turns out, when allocating room for the UEFI memory map using
 UEFI's AllocatePool (), it may result in two new memory map entries
 being created, for instance, when using Tianocore's preallocated region
 feature. For example, the following region
 
 0x5ead5000-0x5ebf [Conventional Memory|   |  |  |  |   
 |WB|WT|WC|UC]
 
 may be split like this
 
 0x5ead5000-0x5eae2fff [Conventional Memory|   |  |  |  |   
 |WB|WT|WC|UC]
 0x5eae3000-0x5eae4fff [Loader Data|   |  |  |  |   
 |WB|WT|WC|UC]
 0x5eae5000-0x5ebf [Conventional Memory|   |  |  |  |   
 |WB|WT|WC|UC]
 
 if the preallocated Loader Data region was chosen to be right in the
 middle of the original free space.
 
 After patch d1a8d66b9177 (efi/libstub: Call get_memory_map() to
 obtain map and desc sizes), this is not being dealt with correctly
 anymore, as the existing logic to allocate room for a single additional
 entry has become insufficient.
 
 So instead, add room for two additional entries instead.
 
 Fixes: d1a8d66b9177 (efi/libstub: Call get_memory_map() to obtain map and 
 desc sizes)
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
  drivers/firmware/efi/libstub/efi-stub-helper.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c 
 b/drivers/firmware/efi/libstub/efi-stub-helper.c
 index af5d63c7cc53..ca0b07ed3b14 100644
 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
 +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
 @@ -84,10 +84,10 @@ efi_status_t efi_get_memory_map(efi_system_table_t 
 *sys_table_arg,
   return EFI_LOAD_ERROR;
  
   /*
 -  * Add an additional efi_memory_desc_t because we're doing an
 -  * allocation which may be in a new descriptor region.
 +  * Add room for two additional efi_memory_desc_t entries because we're
 +  * doing an allocation which may be in a new descriptor region.

It might be worth noting that a existing regions can be
split/reorganised here, otherwise it's a little difficult to deduce from
the comment why to regions are needed.

*/
 - *map_size += *desc_size;
 + *map_size += *desc_size * 2;

Can we forsee any cases where we might need more than two additional
descs? Is it perhaps adding a little more slack now?

Otherwise this looks fine to me.

Thanks,
Mark.

   status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
   *map_size, (void **)m);
   if (status != EFI_SUCCESS)
 -- 
 1.8.3.2
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi: get_memory_map: add sufficient slack for memory descriptors

2015-02-12 Thread Mark Rutland
On Thu, Feb 12, 2015 at 10:39:46AM +, Ard Biesheuvel wrote:
 On 12 February 2015 at 18:22, Mark Rutland mark.rutl...@arm.com wrote:
  On Thu, Feb 12, 2015 at 05:24:19AM +, Ard Biesheuvel wrote:
  As it turns out, when allocating room for the UEFI memory map using
  UEFI's AllocatePool (), it may result in two new memory map entries
  being created, for instance, when using Tianocore's preallocated region
  feature. For example, the following region
 
  0x5ead5000-0x5ebf [Conventional Memory|   |  |  |  |   
  |WB|WT|WC|UC]
 
  may be split like this
 
  0x5ead5000-0x5eae2fff [Conventional Memory|   |  |  |  |   
  |WB|WT|WC|UC]
  0x5eae3000-0x5eae4fff [Loader Data|   |  |  |  |   
  |WB|WT|WC|UC]
  0x5eae5000-0x5ebf [Conventional Memory|   |  |  |  |   
  |WB|WT|WC|UC]
 
  if the preallocated Loader Data region was chosen to be right in the
  middle of the original free space.
 
  After patch d1a8d66b9177 (efi/libstub: Call get_memory_map() to
  obtain map and desc sizes), this is not being dealt with correctly
  anymore, as the existing logic to allocate room for a single additional
  entry has become insufficient.
 
  So instead, add room for two additional entries instead.
 
  Fixes: d1a8d66b9177 (efi/libstub: Call get_memory_map() to obtain map and 
  desc sizes)
  Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
  ---
   drivers/firmware/efi/libstub/efi-stub-helper.c | 6 +++---
   1 file changed, 3 insertions(+), 3 deletions(-)
 
  diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c 
  b/drivers/firmware/efi/libstub/efi-stub-helper.c
  index af5d63c7cc53..ca0b07ed3b14 100644
  --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
  +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
  @@ -84,10 +84,10 @@ efi_status_t efi_get_memory_map(efi_system_table_t 
  *sys_table_arg,
return EFI_LOAD_ERROR;
 
/*
  -  * Add an additional efi_memory_desc_t because we're doing an
  -  * allocation which may be in a new descriptor region.
  +  * Add room for two additional efi_memory_desc_t entries because 
  we're
  +  * doing an allocation which may be in a new descriptor region.
 
  It might be worth noting that a existing regions can be
  split/reorganised here, otherwise it's a little difficult to deduce from
  the comment why to regions are needed.
 
 
 OK
 
 */
  - *map_size += *desc_size;
  + *map_size += *desc_size * 2;
 
  Can we forsee any cases where we might need more than two additional
  descs? Is it perhaps adding a little more slack now?
 
 
 I don't see how doing a single allocation could result in a single
 free region to be split into more than 1 occupied region + 2 free
 regions.
 So no, I don't think it is ...

Fair enough. If we're surprised later we can always fix things up.

With the comment fix up:

Acked-by: Mark Rutland mark.rutl...@arm.com

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi: get_memory_map: add sufficient slack for memory descriptors

2015-02-12 Thread Mark Rutland
On Thu, Feb 12, 2015 at 02:56:51PM +, Ard Biesheuvel wrote:
 On 12 February 2015 at 22:47, Matt Fleming m...@codeblueprint.co.uk wrote:
  On Thu, 12 Feb, at 06:39:46PM, Ard Biesheuvel wrote:
 
  I don't see how doing a single allocation could result in a single
  free region to be split into more than 1 occupied region + 2 free
  regions.
  So no, I don't think it is ...
 
  I don't think that's a guarantee we can make, nor is it something we
  should rely upon.
 
  Please explain the user-visible failure that this patch fixes. Does your
  machine refuse to boot?
 
 I am running UEFI under QEMU and Xen primarily at the moment, and
 experimenting with various build options in Tianocore, One of the
 options is preallocating and freeing blocks of various memory types,
 in a way that should result in the final number of distinct regions to
 be much lower. It could result however in a free memory region to be
 carved up in three instead of two, and that is a failure I have seen
 occur.

The simple answer is that the machine will fail to boot, beause the
efi_get_memory_map helper will give up after one go, and propagate the
error. The arm-stub will give up when the error is encountered.

  Why is the 'goto again' loop insufficient in
  handling this scenario?
 
 
 Yes, that should solve it as well, so if you prefer I reinstate that,
 I can respin the patch. There is a theoretical possibility that it
 would take more than just one more iteration, but that is highly
 unlikely and it should still always complete.

Please reinstate the loop. It will make this far less fragile.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64/efi: handle potential failure to remap memory map

2015-01-15 Thread Mark Rutland
On Thu, Jan 15, 2015 at 12:01:06PM +, Ard Biesheuvel wrote:
 When remapping the UEFI memory map using ioremap_cache(), we
 have to deal with potential failure. Note that, even if the
 common case is for ioremap_cache() to return the existing linear
 mapping of the memory map, we cannot rely on that to be always the
 case, e.g., in the presence of a mem= kernel parameter.
 
 At the same time, remove a stale comment and move the memmap code
 together.
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org

Acked-by: Mark Rutland mark.rutl...@arm.com

This should probably be picked up before my mem= patch, which I will be
resending shortly. I'll mention that when posting.

I guess Catalin should pick this up given the other portions are in the
arm64 tree, and this falls under arch/arm64.

Thanks,
Mark.

 ---
  arch/arm64/kernel/efi.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)
 
 diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
 index a98415b5979c..c9cb0fbe7aa4 100644
 --- a/arch/arm64/kernel/efi.c
 +++ b/arch/arm64/kernel/efi.c
 @@ -229,19 +229,21 @@ static int __init arm64_enable_runtime_services(void)
   return -1;
   }
  
 - mapsize = memmap.map_end - memmap.map;
 -
   if (efi_runtime_disabled()) {
   pr_info(EFI runtime services will be disabled.\n);
   return -1;
   }
  
   pr_info(Remapping and enabling EFI services.\n);
 - /* replace early memmap mapping with permanent mapping */
 +
 + mapsize = memmap.map_end - memmap.map;
   memmap.map = (__force void *)ioremap_cache((phys_addr_t)memmap.phys_map,
  mapsize);
 + if (!memmap.map) {
 + pr_err(Failed to remap EFI memory map\n);
 + return -1;
 + }
   memmap.map_end = memmap.map + mapsize;
 -
   efi.memmap = memmap;
  
   efi.systab = (__force void *)ioremap_cache(efi_system_table,
 -- 
 1.8.3.2
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi: stub: call get_memory_map() to obtain map and desc sizes

2015-01-09 Thread Mark Rutland
On Thu, Jan 08, 2015 at 07:09:22PM +, Ard Biesheuvel wrote:
 On 8 January 2015 at 19:04, Mark Rutland mark.rutl...@arm.com wrote:
  Hi Ard,
 
  On Thu, Jan 08, 2015 at 05:51:47PM +, Ard Biesheuvel wrote:
  This fixes two minor issues in the implementation of get_memory_map():
  - Currently, it assumes that sizeof(efi_memory_desc_t) == desc_size,
which is usually true, but not mandated by the spec. (This was added
intentionally to allow future additions to the definition of
efi_memory_desc_t). The way the loop is implemented currently, the
added slack space may be insufficient if desc_size is larger, which in
some corner cases could result in the loop never terminating.
  - It allocates 32 efi_memory_desc_t entries first (again, using the size
of the struct instead of desc_size), and frees and reallocates if it
turns out to be insufficient. Few implementations of UEFI have such small
memory maps, which results in a unnecessary allocate/free pair on each
invocation.
 
  Fix this by calling the get_memory_map() boot service first with a '0'
  input value for map size to retrieve the map size and desc size from the
  firmware and only then perform the allocation, using desc_size rather
  than sizeof(efi_memory_desc_t).
 
  Is the desc_size guaranteed to be set up correctly if the size is too
  small?
 
  As far as I can see, for that case the spec only mandates that
  MemoryMapSize is updated nd EFI_BUFFER_TOO_SMALL is returned.
 
  It's not clear to me whether DescriptorSize or DescriptorVersion are
  initialised in cases other than success.
 
 
 The way I read it, descriptor size and descriptor version are always
 returned, e.g., as opposed to MapKey, which is only returned on
 success, and the spec mentions that specifically.

I agree that that would be the sensible reading of the spec. I just fear
that there's room for an implementor to read it slightly differently,
and cause pain for us.

 We could ask for clarification just to be sure.

I think we should.

Given it could take a while for any conclusion to be reached and
published, I'm happy to go with the patch below in the meantime, so long
as the issue gets raised.

Cheers,
Mark.

 
 -- 
 Ard.
 
 
  Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
  ---
   drivers/firmware/efi/libstub/efi-stub-helper.c | 16 ++--
   1 file changed, 10 insertions(+), 6 deletions(-)
 
  diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c 
  b/drivers/firmware/efi/libstub/efi-stub-helper.c
  index e766df60fbfb..caf91eab0bfc 100644
  --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
  +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
  @@ -75,25 +75,29 @@ efi_status_t efi_get_memory_map(efi_system_table_t 
  *sys_table_arg,
unsigned long key;
u32 desc_version;
 
  - *map_size = sizeof(*m) * 32;
  -again:
  + *map_size = 0;
  + *desc_size = 0;
  + key = 0;
  + status = efi_call_early(get_memory_map, map_size, NULL,
  + key, desc_size, desc_version);
  + if (status != EFI_BUFFER_TOO_SMALL)
  + return EFI_LOAD_ERROR;
  +
/*
 * Add an additional efi_memory_desc_t because we're doing an
 * allocation which may be in a new descriptor region.
 */
  - *map_size += sizeof(*m);
  + *map_size += *desc_size;
status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
*map_size, (void **)m);
if (status != EFI_SUCCESS)
goto fail;
 
  - *desc_size = 0;
  - key = 0;
status = efi_call_early(get_memory_map, map_size, m,
key, desc_size, desc_version);
if (status == EFI_BUFFER_TOO_SMALL) {
efi_call_early(free_pool, m);
  - goto again;
  + return EFI_LOAD_ERROR;
}
 
if (status != EFI_SUCCESS)
  --
  1.8.3.2
 
  --
  To unsubscribe from this list: send the line unsubscribe linux-efi in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-efi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 6/8] arm64/efi: move SetVirtualAddressMap() to UEFI stub

2015-01-05 Thread Mark Rutland
Hi Ard,

I have a few (minor) comments below.

On Mon, Dec 22, 2014 at 10:59:02AM +, Ard Biesheuvel wrote:
 In order to support kexec, the kernel needs to be able to deal with the
 state of the UEFI firmware after SetVirtualAddressMap() has been called.
 To avoid having separate code paths for non-kexec and kexec, let's move
 the call to SetVirtualAddressMap() to the stub: this will guarantee us
 that it will only be called once (since the stub is not executed during
 kexec), and ensures that the UEFI state is identical between kexec and
 normal boot.
 
 This implies that the layout of the virtual mapping needs to be created
 by the stub as well. All regions are rounded up to a naturally aligned
 multiple of 64 KB (for compatibility with 64k pages kernels) and recorded
 in the UEFI memory map. The kernel proper reads those values and installs
 the mappings in a dedicated set of page tables that are swapped in during
 UEFI Runtime Services calls.
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
  arch/arm64/include/asm/efi.h   |  20 +++-
  arch/arm64/kernel/efi.c| 223 
 -
  arch/arm64/kernel/setup.c  |   1 +
  drivers/firmware/efi/libstub/fdt.c | 137 ++-
  4 files changed, 270 insertions(+), 111 deletions(-)

[...]

 +static void efi_set_pgd(struct mm_struct *mm)
 +{
 +   cpu_switch_mm(mm-pgd, mm);
 +   flush_tlb_all();
 +   if (icache_is_aivivt())
 +   __flush_icache_all();
 +}

Do we have any idea how often we call runtime services?

I assume not all that often (read the RTC at boot, set/get variables).

If we're nuking the TLBs and I-cache a lot we'll probably need to
reserve an asid for the EFI virtmap.

[...]

 +void __init efi_virtmap_init(void)
 +{
 +   efi_memory_desc_t *md;
 +
 +   if (!efi_enabled(EFI_BOOT))
 +   return;
 +
 +   for_each_efi_memory_desc(memmap, md) {
 +   u64 paddr, npages, size;
 +   pgprot_t prot;
 +
 +   if (!(md-attribute  EFI_MEMORY_RUNTIME))
 +   continue;
 +   if (WARN(md-virt_addr == 0,
 +UEFI virtual mapping incomplete or missing -- no 
 entry found for 0x%llx\n,
 +md-phys_addr))
 +   return;

I wonder if we might want to use a different address to signal a bad
mapping (e.g. 0UL-1 as it's not even EFI_PAGE_SIZE aligned), just in
case we have to handle a valid use of zero in future for some reason --
perhaps coming from another OS.

 +
 +   paddr = md-phys_addr;
 +   npages = md-num_pages;
 +   memrange_efi_to_native(paddr, npages);
 +   size = npages  PAGE_SHIFT;
 +
 +   pr_info(  EFI remap 0x%012llx = %p\n,

Why not use %016llx? We might only have 48-bit PAs currently, but we may
as well keep the VA and PA equally long when printing out -- that'll
also help to identify issues with garbage in the upper 16 bits of the
PA field.

[...]

 +/*
 + * This is the base address at which to start allocating virtual memory 
 ranges
 + * for UEFI Runtime Services. This is a userland range so that we can use any
 + * allocation we choose, and eliminate the risk of a conflict after kexec.
 + */
 +#define EFI_RT_VIRTUAL_BASE0x4000

Nit: I'm not keen on the term userland here. You can map stuff to EL0
in the high half of the address space if you wanted, so TTBR0/TTBR1
aren't architecturally user/kernel.

s/userland/low half/, perhaps?

It might also be worth pointing out that the arbitrary base address
isn't zero so as to be less likely to be an idmap.

 +
 +static void update_memory_map(efi_memory_desc_t *memory_map,
 + unsigned long map_size, unsigned long desc_size,
 + int *count)
 +{
 +   u64 efi_virt_base = EFI_RT_VIRTUAL_BASE;
 +   efi_memory_desc_t *out = (void *)memory_map + map_size;
 +   int l;
 +
 +   for (l = 0; l  map_size; l += desc_size) {
 +   efi_memory_desc_t *in = (void *)memory_map + l;
 +   u64 paddr, size;
 +
 +   if (!(in-attribute  EFI_MEMORY_RUNTIME))
 +   continue;
 +
 +   /*
 +* Make the mapping compatible with 64k pages: this allows
 +* a 4k page size kernel to kexec a 64k page size kernel and
 +* vice versa.
 +*/
 +   paddr = round_down(in-phys_addr, SZ_64K);
 +   size = round_up(in-num_pages * EFI_PAGE_SIZE +
 +   in-phys_addr - paddr, SZ_64K);
 +
 +   /*
 +* Avoid wasting memory on PTEs by choosing a virtual base 
 that
 +* is compatible with section mappings if this region has the
 +* appropriate size and physical alignment. (Sections are 2 MB
 +* on 4k granule kernels)
 +  

Re: [PATCH v2 09/10] arm64/efi: ignore unusable regions instead of reserving them

2014-11-11 Thread Mark Rutland
On Tue, Nov 11, 2014 at 05:12:09PM +, Mark Salter wrote:
 On Tue, 2014-11-11 at 10:42 -0500, Mark Salter wrote:
  On Mon, 2014-11-10 at 08:31 +0100, Ard Biesheuvel wrote:
   On 10 November 2014 05:11, Mark Salter msal...@redhat.com wrote:
On Thu, 2014-11-06 at 15:13 +0100, Ard Biesheuvel wrote:
This changes the way memblocks are installed based on the contents
   of
the UEFI memory map. Formerly, all regions would be
   memblock_add()'ed,
after which unusable regions would be memblock_reserve()'d as well.
To simplify things, but also to allow access to the unusable
   regions
through mmap(/dev/mem), even with CONFIG_STRICT_DEVMEM set, change
this so that only usable regions are memblock_add()'ed in the first
place.
   
This patch is crashing 64K pagesize kernels during boot. I'm not
   exactly
sure why, though. Here is what I get on an APM Mustang box:
   
   
   Ah, yes, I meant to mention this patch
   
   https://git.kernel.org/cgit/linux/kernel/git/glikely/linux.git/commit/?id=8cccffc52694938fc88f3d90bc7fed8460e27191
   
   in the cover letter, which addresses this issue at least for the DT
   case.
   
  
  That isn't the problem. In general, with 64K kernel pages, you can't be
  sure if you leave something you need out of the kernel linear mapping.

Regardless of 64k pages you can never assume that anything will be in
the linear mapping due to the current relationship between the start of
the linear map and the address the kernel was loaded at.

  If you have Loader Code/Data regions begin and/or end at something other
  than a 64K boundary and that region is adjacent to a region not being
  added, then you end up leaving out the unaligned bits from the linear
  mapping. This could be bits of the initramfs or devicetree.
  
  What I don't get with this failure is that it is an alignment fault
  which should be masked at EL1 for the kernel. The same unaligned
  access happens without this patch and it doesn't generate a fault.
  
 
 Ah, but unaligned accesses are not ignored for device memory.
 I have this in include/acpi/acpi_io.h:
 
 static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
   acpi_size size)
 {
 #ifdef CONFIG_ARM64
   if (!page_is_ram(phys  PAGE_SHIFT))
   return ioremap(phys, size);
 #endif
 
return ioremap_cache(phys, size);
 }
 
 Because the table isn't in the linear mapping, it fails the
 page_is_ram() test and it gits mapped with ioremap() leading to
 the alignment fault.
 
 If I take out the code inside the #ifdef, I get a different
 fault:
 
 [0.350057] Unhandled fault: synchronous external abort (0x9610) at 
 0xfefae6f4
 [0.358704] pgd = fe000116
 [0.362276] [fefae6f4] *pgd=004001370003, 
 *pud=004001370003, *pmd=004001370003, *pte=02c00040011a0713
 [0.373746] Internal error: : 9610 [#1] SMP
 [0.378484] Modules linked in:
 [0.381601] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc4+ #15
 [0.388248] Hardware name: APM X-Gene Mustang board (DT)
 [0.393738] task: fe03dbe1 ti: fe03dbf0 task.ti: 
 fe03dbf0
 [0.401503] PC is at acpi_ex_system_memory_space_handler+0x238/0x2e0
 [0.408160] LR is at acpi_ex_system_memory_space_handler+0x130/0x2e0
 
 That happens because AML is trying to access a hardware register
 which has been mapped as normal memory.

Which is why removing the check in the ifdef is completely nonsensical.
We already knew we can't map everything cacheable -- regardless of what
AML does the CPU can prefetch from anything mapped cacheable (or simply
executable) at any time.

 So, we need a way to tell a table in ram from an io address in AML.
 And page_is_ram() no longer cuts it if the tables are not in the
 linear mapping.

If the kernel were loaded at an address above the tables, it would fail
similarly. So the page_is_ram was never sufficient to ensure tables
would be mapped cacheable.

As we haven't decoupled the kernel text mapping from the linear mapping,
and that doesn't look to be happening any time soon, we can't fix up
page_is_ram -- we need something else entirely.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 09/10] arm64/efi: ignore unusable regions instead of reserving them

2014-11-11 Thread Mark Rutland
On Tue, Nov 11, 2014 at 05:55:24PM +, Ard Biesheuvel wrote:
 On 11 November 2014 18:44, Mark Rutland mark.rutl...@arm.com wrote:
  On Tue, Nov 11, 2014 at 05:12:09PM +, Mark Salter wrote:
  On Tue, 2014-11-11 at 10:42 -0500, Mark Salter wrote:
   On Mon, 2014-11-10 at 08:31 +0100, Ard Biesheuvel wrote:
On 10 November 2014 05:11, Mark Salter msal...@redhat.com wrote:
 On Thu, 2014-11-06 at 15:13 +0100, Ard Biesheuvel wrote:
 This changes the way memblocks are installed based on the contents
of
 the UEFI memory map. Formerly, all regions would be
memblock_add()'ed,
 after which unusable regions would be memblock_reserve()'d as well.
 To simplify things, but also to allow access to the unusable
regions
 through mmap(/dev/mem), even with CONFIG_STRICT_DEVMEM set, change
 this so that only usable regions are memblock_add()'ed in the first
 place.

 This patch is crashing 64K pagesize kernels during boot. I'm not
exactly
 sure why, though. Here is what I get on an APM Mustang box:

   
Ah, yes, I meant to mention this patch
   
https://git.kernel.org/cgit/linux/kernel/git/glikely/linux.git/commit/?id=8cccffc52694938fc88f3d90bc7fed8460e27191
   
in the cover letter, which addresses this issue at least for the DT
case.
   
  
   That isn't the problem. In general, with 64K kernel pages, you can't be
   sure if you leave something you need out of the kernel linear mapping.
 
  Regardless of 64k pages you can never assume that anything will be in
  the linear mapping due to the current relationship between the start of
  the linear map and the address the kernel was loaded at.
 
   If you have Loader Code/Data regions begin and/or end at something other
   than a 64K boundary and that region is adjacent to a region not being
   added, then you end up leaving out the unaligned bits from the linear
   mapping. This could be bits of the initramfs or devicetree.
  
   What I don't get with this failure is that it is an alignment fault
   which should be masked at EL1 for the kernel. The same unaligned
   access happens without this patch and it doesn't generate a fault.
  
 
  Ah, but unaligned accesses are not ignored for device memory.
  I have this in include/acpi/acpi_io.h:
 
  static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
acpi_size size)
  {
  #ifdef CONFIG_ARM64
if (!page_is_ram(phys  PAGE_SHIFT))
return ioremap(phys, size);
  #endif
 
 return ioremap_cache(phys, size);
  }
 
  Because the table isn't in the linear mapping, it fails the
  page_is_ram() test and it gits mapped with ioremap() leading to
  the alignment fault.
 
  If I take out the code inside the #ifdef, I get a different
  fault:
 
  [ 0.350057] Unhandled fault: synchronous external abort (0x9610) at 
  0xfefae6f4
  [0.358704] pgd = fe000116
  [0.362276] [fefae6f4] *pgd=004001370003, 
  *pud=004001370003, *pmd=004001370003, *pte=02c00040011a0713
  [0.373746] Internal error: : 9610 [#1] SMP
  [0.378484] Modules linked in:
  [0.381601] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc4+ #15
  [0.388248] Hardware name: APM X-Gene Mustang board (DT)
  [0.393738] task: fe03dbe1 ti: fe03dbf0 task.ti: 
  fe03dbf0
  [0.401503] PC is at acpi_ex_system_memory_space_handler+0x238/0x2e0
  [0.408160] LR is at acpi_ex_system_memory_space_handler+0x130/0x2e0
 
  That happens because AML is trying to access a hardware register
  which has been mapped as normal memory.
 
  Which is why removing the check in the ifdef is completely nonsensical.
 
 I am sure we are all in agreement on this part.
 
  We already knew we can't map everything cacheable -- regardless of what
  AML does the CPU can prefetch from anything mapped cacheable (or simply
  executable) at any time.
 
  So, we need a way to tell a table in ram from an io address in AML.
  And page_is_ram() no longer cuts it if the tables are not in the
  linear mapping.
 
  If the kernel were loaded at an address above the tables, it would fail
  similarly. So the page_is_ram was never sufficient to ensure tables
  would be mapped cacheable.
 
  As we haven't decoupled the kernel text mapping from the linear mapping,
  and that doesn't look to be happening any time soon, we can't fix up
  page_is_ram -- we need something else entirely.
 
 
 Well, if you look at the series, and in particular at the /dev/mem
 handling, there is already some code that classifies physical
 addresses based on whether they appear in the UEFI memory map, and
 with which attributes. I suppose reimplementing page_is_ram() [whose
 default implementation is conveniently __weak] to return true for
 EFI_MEMORY_WB ranges and false for everything else would do the trick
 here, and would arguably be more elegant than matching the string
 System RAM

Re: [PATCH v2 04/10] arm64/efi: invert UEFI memory region reservation logic

2014-10-28 Thread Mark Rutland
On Tue, Oct 28, 2014 at 04:18:37PM +, Ard Biesheuvel wrote:
 Instead of reserving the memory regions based on which types we know
 need to be reserved, consider only regions of the following types as
 free for general use by the OS:
 
 EFI_LOADER_CODE
 EFI_LOADER_DATA
 EFI_BOOT_SERVICES_CODE
 EFI_BOOT_SERVICES_DATA
 EFI_CONVENTIONAL_MEMORY
 
 Note that this also fixes a problem with the original code, which would
 misidentify a EFI_RUNTIME_SERVICES_DATA region as not reserved if it
 does not have the EFI_MEMORY_RUNTIME attribute set. However, it is
 perfectly legal for the firmware not to request a virtual mapping for
 EFI_RUNTIME_SERVICES_DATA regions that contain configuration tables, in
 which case the EFI_MEMORY_RUNTIME attribute would not be set.
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
  arch/arm64/kernel/efi.c | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)
 
 diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
 index 95c49ebc660d..2e829148fb36 100644
 --- a/arch/arm64/kernel/efi.c
 +++ b/arch/arm64/kernel/efi.c
 @@ -125,17 +125,17 @@ out:
   */
  static __init int is_reserve_region(efi_memory_desc_t *md)
  {
 - if (!is_normal_ram(md))
 + switch (md-type) {
 + case EFI_LOADER_CODE:
 + case EFI_LOADER_DATA:
 + case EFI_BOOT_SERVICES_CODE:
 + case EFI_BOOT_SERVICES_DATA:
 + case EFI_CONVENTIONAL_MEMORY:
   return 0;
 -
 - if (md-attribute  EFI_MEMORY_RUNTIME)
 - return 1;
 -
 - if (md-type == EFI_ACPI_RECLAIM_MEMORY ||
 - md-type == EFI_RESERVED_TYPE)
 - return 1;
 -
 - return 0;
 + default:
 + break;
 + }
 + return is_normal_ram(md);

Just to check: did we figure out if UnusableMemory was allowed to have
EFI_MEMORY_WB attributes? If it isn't, this looks fine to me.

If it is, then we will need to remove that memory (rather than reserving
it) to prevent speculative accesses.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 04/10] arm64/efi: invert UEFI memory region reservation logic

2014-10-28 Thread Mark Rutland
On Tue, Oct 28, 2014 at 05:08:29PM +, Ard Biesheuvel wrote:
 On 28 October 2014 17:47, Mark Rutland mark.rutl...@arm.com wrote:
  On Tue, Oct 28, 2014 at 04:18:37PM +, Ard Biesheuvel wrote:
  Instead of reserving the memory regions based on which types we know
  need to be reserved, consider only regions of the following types as
  free for general use by the OS:
 
  EFI_LOADER_CODE
  EFI_LOADER_DATA
  EFI_BOOT_SERVICES_CODE
  EFI_BOOT_SERVICES_DATA
  EFI_CONVENTIONAL_MEMORY
 
  Note that this also fixes a problem with the original code, which would
  misidentify a EFI_RUNTIME_SERVICES_DATA region as not reserved if it
  does not have the EFI_MEMORY_RUNTIME attribute set. However, it is
  perfectly legal for the firmware not to request a virtual mapping for
  EFI_RUNTIME_SERVICES_DATA regions that contain configuration tables, in
  which case the EFI_MEMORY_RUNTIME attribute would not be set.
 
  Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
  ---
   arch/arm64/kernel/efi.c | 20 ++--
   1 file changed, 10 insertions(+), 10 deletions(-)
 
  diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
  index 95c49ebc660d..2e829148fb36 100644
  --- a/arch/arm64/kernel/efi.c
  +++ b/arch/arm64/kernel/efi.c
  @@ -125,17 +125,17 @@ out:
*/
   static __init int is_reserve_region(efi_memory_desc_t *md)
   {
  - if (!is_normal_ram(md))
  + switch (md-type) {
  + case EFI_LOADER_CODE:
  + case EFI_LOADER_DATA:
  + case EFI_BOOT_SERVICES_CODE:
  + case EFI_BOOT_SERVICES_DATA:
  + case EFI_CONVENTIONAL_MEMORY:
return 0;
  -
  - if (md-attribute  EFI_MEMORY_RUNTIME)
  - return 1;
  -
  - if (md-type == EFI_ACPI_RECLAIM_MEMORY ||
  - md-type == EFI_RESERVED_TYPE)
  - return 1;
  -
  - return 0;
  + default:
  + break;
  + }
  + return is_normal_ram(md);
 
  Just to check: did we figure out if UnusableMemory was allowed to have
  EFI_MEMORY_WB attributes? If it isn't, this looks fine to me.
 
  If it is, then we will need to remove that memory (rather than reserving
  it) to prevent speculative accesses.
 
 
 The spec does not mention at all how EfiUnusableMemory should be used,
 and I would assume any such regions to have the EFI_MEMORY_WB
 attribute set, as it is carved out of the normal system RAM, and the
 way Tianocore/EDK2 implements it at least, all those attributes
 (including the write-protect/execute-protect ones) are copied straight
 from the underlying regions and never set to reflect the nature of the
 actual contents.

Ok. That's precisely the case I was concerned about.

 However, for 3.20 I intend to propose another change to this code, so
 that only non-reserved, usable memory gets memblock_add()'ed in the
 first place, and I suppose this should cover your concern as well. The
 reason for doing that is to allow tools like dmidecode and lshw access
 to the SMBIOS and other tables through /dev/mem, which is currently
 disallowed when STRICT_DEVMEM is set.

So long as said memory is not later passed to memblock_reserve, that
should be ok for the EfiUnusableMemory case. I guess we haven't actually
seen such memory yet anyhow?

I not all that keen on the usage of /dev/mem for those given the
availability of other interfaces.

Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/10] arm64/efi: use UEFI memory map unconditionally if available

2014-10-23 Thread Mark Rutland
On Wed, Oct 22, 2014 at 06:06:56PM +0100, Mark Salter wrote:
 On Wed, 2014-10-22 at 16:21 +0200, Ard Biesheuvel wrote:
  On systems that boot via UEFI, all memory nodes are deleted from the
  device tree, and instead, the size and location of system RAM is derived
  from the UEFI memory map. This is handled by reserve_regions, which not only
  reserves parts of memory that UEFI declares as reserved, but also installs
  the memblocks that cover the remaining usable memory.
  
  Currently, reserve_regions() is only called if uefi_init() succeeds.
  However, it does not actually depend on anything that uefi_init() does,
  and not calling reserve_regions() results in a broken boot, so it is
  better to just call it unconditionally.
  
  Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
  ---
   arch/arm64/kernel/efi.c | 11 ---
   1 file changed, 4 insertions(+), 7 deletions(-)
  
  diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
  index 51522ab0c6da..4cec21b1ecdd 100644
  --- a/arch/arm64/kernel/efi.c
  +++ b/arch/arm64/kernel/efi.c
  @@ -313,8 +313,7 @@ void __init efi_init(void)
  memmap.desc_size = params.desc_size;
  memmap.desc_version = params.desc_ver;
   
  -   if (uefi_init()  0)
  -   return;
  +   WARN_ON(uefi_init()  0);
   
  reserve_regions();
   }
 
 It also looks like EFI_BOOT flag will be set even if uefi_init fails.
 If uefi_init fails, we only need reserve_regions() for the purpose
 of adding memblocks. Otherwise, we end up wasting a lot of memory.

What memory are we wasting in that case? Surely the only items that we
could choose to throw away if we failed uefi_init are
EFI_ACPI_RECLAIM_MEMORY and EFI_MEMORY_RUNTIME?

We might want to keep those around so we can kexec into a kernel where
we can make use of them. Surely they shouldn't take up a significant
proportion of the available memory?

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/10] arm64/efi: use UEFI memory map unconditionally if available

2014-10-23 Thread Mark Rutland
[...]

   It also looks like EFI_BOOT flag will be set even if uefi_init fails.
   If uefi_init fails, we only need reserve_regions() for the purpose
   of adding memblocks. Otherwise, we end up wasting a lot of memory.
  
  What memory are we wasting in that case? Surely the only items that we
  could choose to throw away if we failed uefi_init are
  EFI_ACPI_RECLAIM_MEMORY and EFI_MEMORY_RUNTIME?
  
  We might want to keep those around so we can kexec into a kernel where
  we can make use of them. Surely they shouldn't take up a significant
  proportion of the available memory?
 
 In addition, reserve_regions() also reserves BOOT_SERVICES (which gets
 freed up later after set_virtual_address_map(). That won't happen if
 uefi_init() fails. In any case, if uefi_init() fails, we won't be able
 to find the ACPI tables kexec kernel won't be able to either.

If you need the ACPI tables, the first kernel won't boot. However, if we
fail to map the runtime services data and code, that doesn't mean the
next kernel can't. We might have failed to map the runtime services due
to an early_ioremap bug or similar, and that could be fixed in the
kernel we're about to kexec to.

If we're going to nuke runtime regions, we should nuke them in the
memory map descriptors as a courtesy to the next kernel. Otherwise we
could be more courteous and simply leave them be (which is my
preference).

 The BOOT_SERVICES stuff is the big chunk. There was some discussion of
 not reserving that but no one has gotten around to writing the patch
 to clean out the code that does it.

I've just spent some time doing so; patch below. It boots happily on my
Juno (with 4KiB pages at least), but I haven't given it much of a stress
test.

I wasn't sure if unusable memory could show up with EFI_MEMORY_WB
attributes. If so, this patch still won't prevent that from being mapped
as cacheable, but that should be easy to fix.

Thanks,
Mark.

8
From 581d5bac5b4a6f93e23737012b71c58d809af6bb Mon Sep 17 00:00:00 2001
From: Mark Rutland mark.rutl...@arm.com
Date: Thu, 23 Oct 2014 19:41:55 +0100
Subject: [PATCH] arm64: efi: Simplify memory descriptor handling

Currently discovering the memory map from UEFI is a multi-stage affair,
where the boot services code and data are kept around for much longer
than necessary. Additionally, potential overlap between memory and IO
mappings at kernel page granularity is not handled.

This patch attempts to solve both of these issues, with the addition and
filtering of memory regions being handled in a single function.

First, all normal memory (i.e. anything which can be mapped writeback
cacheable) is added to memblock. Second, IO and reserved regions are
filtered out of memblock at kernel page granularity, to prevent
potential conflicting mappings.

As some reserved regions must be present in the idmap these are
reserved. Everything else that's not normal memory is removed,
preventing the possibility of a cacheable mapping covering something it
shouldn't.

Signed-off-by: Mark Rutland mark.rutl...@arm.com
Cc: Ard Biesheuvel ard.biesheu...@linaro.org
Cc: Catalin Marinas catalin.mari...@arm.com
Cc: Leif Lindholm leif.lindh...@linaro.org
Cc: Mark Salter msal...@redhat.com
---
 arch/arm64/kernel/efi.c | 181 +++-
 1 file changed, 42 insertions(+), 139 deletions(-)

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 03aaa99..8873c28 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -154,161 +154,66 @@ static __init int is_reserve_region(efi_memory_desc_t 
*md)
return 0;
 }
 
-static __init void reserve_regions(void)
+static __init void efi_add_memory(void)
 {
efi_memory_desc_t *md;
-   u64 paddr, npages, size;
+   u64 paddr, size;
 
if (uefi_debug)
-   pr_info(Processing EFI memory map:\n);
+   pr_info(EFI: adding normal memory:\n);
 
for_each_efi_memory_desc(memmap, md) {
+   if (!is_normal_ram(md))
+   continue;
+
paddr = md-phys_addr;
-   npages = md-num_pages;
+   size = md-num_pages  EFI_PAGE_SHIFT;
 
if (uefi_debug)
-   pr_info(  0x%012llx-0x%012llx [%s],
-   paddr, paddr + (npages  EFI_PAGE_SHIFT) - 1,
+   pr_info(  0x%012llx-0x%012llx [%s]\n,
+   paddr, size - 1,
memory_type_name[md-type]);
 
-   memrange_efi_to_native(paddr, npages);
-   size = npages  PAGE_SHIFT;
+   early_init_dt_add_memory_arch(paddr, size);
+   }
+
+   /*
+* Regions describe in an EFI_MEMORY_DESCRIPTOR are only guaranteed to
+* be 4KiB aligned but the kernel page size could be larger, and
+* regions with different attributes could fall into the same kernel
+* page. Thus we must remove any memory in the same

Re: [PATCH 01/10] arm64/efi: efistub: jump to 'stext' directly, not through the header

2014-10-22 Thread Mark Rutland
Hi Ard,

On Wed, Oct 22, 2014 at 03:21:44PM +0100, Ard Biesheuvel wrote:
 After the EFI stub has done its business, it jumps into the kernel by
 branching to offset #0 of the loaded Image, which is where it expects
 to find the header containing a 'branch to stext' instruction.
 
 However, the UEFI spec 2.1.1 states the following regarding PE/COFF
 image loading:
 A UEFI image is loaded into memory through the LoadImage() Boot
 Service. This service loads an image with a PE32+ format into memory.
 This PE32+ loader is required to load all sections of the PE32+ image
 into memory.
 
 In other words, it is /not/ required to load parts of the image that are
 not covered by a PE/COFF section, so it may not have loaded the header
 at the expected offset, as it is not covered by any PE/COFF section.
 
 So instead, jump to 'stext' directly, which is at the base of the
 PE/COFF .text section, by supplying a symbol 'stext_offset' to
 efi-entry.o which contains the relative offset of stext into the Image.
 Also replace other open coded calculations of the same value with a
 reference to 'stext_offset'
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org

Given the constraints you describe above, and prior discussions, this
looks sane to me:

Acked-by: Mark Rutland mark.rutl...@arm.com

Mark.

 ---
 v3:
 - rebased onto 3.17+
 - added spec reference to commit message
 
 v2:
 - drop :lo12: relocation against stext_offset in favor of using a literal
   '=stext_offset' which is safer
 ---
  arch/arm64/kernel/efi-entry.S |  3 ++-
  arch/arm64/kernel/head.S  | 10 ++
  2 files changed, 8 insertions(+), 5 deletions(-)
 
 diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S
 index 619b1dd7bcde..a0016d3a17da 100644
 --- a/arch/arm64/kernel/efi-entry.S
 +++ b/arch/arm64/kernel/efi-entry.S
 @@ -61,7 +61,8 @@ ENTRY(efi_stub_entry)
*/
   mov x20, x0 // DTB address
   ldr x0, [sp, #16]   // relocated _text address
 - mov x21, x0
 + ldr x21, =stext_offset
 + add x21, x0, x21
  
   /*
* Flush dcache covering current runtime addresses
 diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
 index 0a6e4f924df8..8c06c9d269d2 100644
 --- a/arch/arm64/kernel/head.S
 +++ b/arch/arm64/kernel/head.S
 @@ -132,6 +132,8 @@ efi_head:
  #endif
  
  #ifdef CONFIG_EFI
 + .globl  stext_offset
 + .setstext_offset, stext - efi_head
   .align 3
  pe_header:
   .ascii  PE
 @@ -155,7 +157,7 @@ optional_header:
   .long   0   // SizeOfInitializedData
   .long   0   // SizeOfUninitializedData
   .long   efi_stub_entry - efi_head   // AddressOfEntryPoint
 - .long   stext - efi_head// BaseOfCode
 + .long   stext_offset// BaseOfCode
  
  extra_header_fields:
   .quad   0   // ImageBase
 @@ -172,7 +174,7 @@ extra_header_fields:
   .long   _end - efi_head // SizeOfImage
  
   // Everything before the kernel image is considered part of the header
 - .long   stext - efi_head// SizeOfHeaders
 + .long   stext_offset// SizeOfHeaders
   .long   0   // CheckSum
   .short  0xa // Subsystem (EFI application)
   .short  0   // DllCharacteristics
 @@ -217,9 +219,9 @@ section_table:
   .byte   0
   .byte   0   // end of 0 padding of section name
   .long   _end - stext// VirtualSize
 - .long   stext - efi_head// VirtualAddress
 + .long   stext_offset// VirtualAddress
   .long   _edata - stext  // SizeOfRawData
 - .long   stext - efi_head// PointerToRawData
 + .long   stext_offset// PointerToRawData
  
   .long   0   // PointerToRelocations (0 for executables)
   .long   0   // PointerToLineNumbers (0 for executables)
 -- 
 1.8.3.2
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/10] arm64/efi: set PE/COFF section alignment to 4 KB

2014-10-22 Thread Mark Rutland
On Wed, Oct 22, 2014 at 03:21:45PM +0100, Ard Biesheuvel wrote:
 Position independent AArch64 code needs to be linked and loaded at the
 same relative offset from a 4 KB boundary, or adrp/add and adrp/ldr
 pairs will not work correctly. (This is how PC relative symbol
 references with a 4 GB reach are emitted)
 
 We need to declare this in the PE/COFF header, otherwise the PE/COFF
 loader may load the Image and invoke the stub at an offset which
 violates this rule.
 
 Reviewed-by: Roy Franz roy.fr...@linaro.org
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org

Acked-by: Mark Rutland mark.rutl...@arm.com

Mark.

 ---
 v2: added comment explaining '.align 12' in head.S
 ---
  arch/arm64/kernel/head.S | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)
 
 diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
 index 8c06c9d269d2..8ae84d8c2a8c 100644
 --- a/arch/arm64/kernel/head.S
 +++ b/arch/arm64/kernel/head.S
 @@ -161,7 +161,7 @@ optional_header:
  
  extra_header_fields:
   .quad   0   // ImageBase
 - .long   0x20// SectionAlignment
 + .long   0x1000  // SectionAlignment
   .long   0x8 // FileAlignment
   .short  0   // MajorOperatingSystemVersion
   .short  0   // MinorOperatingSystemVersion
 @@ -228,7 +228,15 @@ section_table:
   .short  0   // NumberOfRelocations  (0 for executables)
   .short  0   // NumberOfLineNumbers  (0 for executables)
   .long   0xe0500020  // Characteristics (section flags)
 - .align 5
 +
 + /*
 +  * EFI will load stext onwards at the 4k section alignment
 +  * described in the PE/COFF header. To ensure that instruction
 +  * sequences using an adrp and a :lo12: immediate will function
 +  * correctly at this alignment, we must ensure that stext is
 +  * placed at a 4k boundary in the Image to begin with.
 +  */
 + .align 12
  #endif
  
  ENTRY(stext)
 -- 
 1.8.3.2
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64/efi: set PE/COFF section alignment to 4 KB

2014-10-10 Thread Mark Rutland
Hi Ard,

On Fri, Oct 10, 2014 at 10:25:24AM +0100, Ard Biesheuvel wrote:
 Position independent AArch64 code needs to be linked and loaded at the same
 relative offset from a 4 KB boundary, or adrp/add and adrp/ldr pairs will
 not work correctly. (This is how PC relative symbol references with a 4 GB
 reach are emitted)
 
 We need to declare this in the PE/COFF header, otherwise the PE/COFF loader
 may load the Image and invoke the stub at an offset which violates this rule.

Has this been observed happening, or was this just found by inspection?

 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
  arch/arm64/kernel/head.S | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
 index 0a6e4f924df8..5e83e5b8a9de 100644
 --- a/arch/arm64/kernel/head.S
 +++ b/arch/arm64/kernel/head.S
 @@ -159,7 +159,7 @@ optional_header:
  
  extra_header_fields:
   .quad   0   // ImageBase
 - .long   0x20// SectionAlignment
 + .long   0x1000  // SectionAlignment
   .long   0x8 // FileAlignment
   .short  0   // MajorOperatingSystemVersion
   .short  0   // MinorOperatingSystemVersion
 @@ -226,7 +226,7 @@ section_table:
   .short  0   // NumberOfRelocations  (0 for executables)
   .short  0   // NumberOfLineNumbers  (0 for executables)
   .long   0xe0500020  // Characteristics (section flags)
 - .align 5
 + .align 12

Can we get a comment explaining why stext needs the additional
alignment? Something like:

/*
 * EFI will load stext onwards at the 4k section alignment
 * described in the PE/COFF header. To ensure that instruction
 * sequences using an adrp and a :lo12: immediate will function
 * correctly at this alignment, we must ensure that stext is
 * placed at a 4k boundary in the Image to begin with.
 */
.align 12

Otherwise this looks sane to me.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] arm64/efi: efistub: jump to 'stext' directly, not through the header

2014-10-10 Thread Mark Rutland
On Thu, Oct 09, 2014 at 08:03:52PM +0100, Ard Biesheuvel wrote:
 On 9 October 2014 19:23, Mark Rutland mark.rutl...@arm.com wrote:
  Hi Ard,
 
  On Wed, Oct 08, 2014 at 03:11:27PM +0100, Ard Biesheuvel wrote:
  After the EFI stub has done its business, it jumps into the kernel by
  branching to offset #0 of the loaded Image, which is where it expects
  to find the header containing a 'branch to stext' instruction.
 
  However, the UEFI spec 2.1.1 states the following regarding PE/COFF
  image loading:
  A UEFI image is loaded into memory through the LoadImage() Boot
  Service. This service loads an image with a PE32+ format into memory.
  This PE32+ loader is required to load all sections of the PE32+ image
  into memory.
 
  In other words, it is /not/ required to load parts of the image that are
  not covered by a PE/COFF section, so it may not have loaded the header
  at the expected offset, as it is not covered by any PE/COFF section.
 
  What does this mean for handle_kernel_image? Given we might not have
  _text through to _stext mapped, do we not need to take that into
  account?
 
 
 Actually, handle_kernel_image() does not interpret the header, it just
 copies it along with the rest of the image if it needs to be
 relocated, so I don't see an issue there.

Sorry, I wasn't clear enough with my concern. My concern was whether we
had any guarantee _something_ was mapped for the address range covering
efi_head to stext.

So long as _something_ is mapped there, we're ok -- handle_kernel_image
will just copy some garbage along with the usable portion of the kernel.

But if the EFI loader is allowed to load stext at the precise start of
RAM (or anywhere not in the idmap), in attempting the copy we'd try to
access unmapped addresses.

So if that's a possibility, we need to shrink the copy to cover stext
to _edata rather than _text to edata.

Does that make sense?

 However, I do remember Mark
 Salter mentioning that there is at least one other location that needs
 to be fixed up if this concern is valid. Mark?
 
  Also, have we seen problems on any systems yet?
 
 
 No, I am not aware of any occurrences of this exact issue, this is
 just one of the things I spotted while working on this code.

Ok. I was just curious as to how urgent this was.

 But I think we mostly agree that branching through the header relies
 on behavior of the PE/COFF loader that is not covered by the spec.

Yes. We should not rely on unspecified behaviour.

Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] arm64/efi: efistub: jump to 'stext' directly, not through the header

2014-10-10 Thread Mark Rutland
On Fri, Oct 10, 2014 at 12:52:32PM +0100, Ard Biesheuvel wrote:
 On 10 October 2014 12:49, Mark Rutland mark.rutl...@arm.com wrote:
  On Thu, Oct 09, 2014 at 08:03:52PM +0100, Ard Biesheuvel wrote:
  On 9 October 2014 19:23, Mark Rutland mark.rutl...@arm.com wrote:
   Hi Ard,
  
   On Wed, Oct 08, 2014 at 03:11:27PM +0100, Ard Biesheuvel wrote:
   After the EFI stub has done its business, it jumps into the kernel by
   branching to offset #0 of the loaded Image, which is where it expects
   to find the header containing a 'branch to stext' instruction.
  
   However, the UEFI spec 2.1.1 states the following regarding PE/COFF
   image loading:
   A UEFI image is loaded into memory through the LoadImage() Boot
   Service. This service loads an image with a PE32+ format into memory.
   This PE32+ loader is required to load all sections of the PE32+ image
   into memory.
  
   In other words, it is /not/ required to load parts of the image that are
   not covered by a PE/COFF section, so it may not have loaded the header
   at the expected offset, as it is not covered by any PE/COFF section.
  
   What does this mean for handle_kernel_image? Given we might not have
   _text through to _stext mapped, do we not need to take that into
   account?
  
 
  Actually, handle_kernel_image() does not interpret the header, it just
  copies it along with the rest of the image if it needs to be
  relocated, so I don't see an issue there.
 
  Sorry, I wasn't clear enough with my concern. My concern was whether we
  had any guarantee _something_ was mapped for the address range covering
  efi_head to stext.
 
  So long as _something_ is mapped there, we're ok -- handle_kernel_image
  will just copy some garbage along with the usable portion of the kernel.
 
 
 Indeed.
 
  But if the EFI loader is allowed to load stext at the precise start of
  RAM (or anywhere not in the idmap), in attempting the copy we'd try to
  access unmapped addresses.
 
  So if that's a possibility, we need to shrink the copy to cover stext
  to _edata rather than _text to edata.
 
  Does that make sense?
 
 
 That cannot happen. The PE/COFF .text section's positive relative
 virtual offset ensures that the memory image has room for the header,
 it's just not guaranteed that anything gets copied there.

Ok. If we're guaranteed to have some space there, we're fine.

I'm probably being a bit thick here, but where is the positive relative
virtual offset in the header? Which field defines that?

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] arm64/efi: efistub: jump to 'stext' directly, not through the header

2014-10-10 Thread Mark Rutland
On Fri, Oct 10, 2014 at 02:27:46PM +0100, Ard Biesheuvel wrote:
 On 10 October 2014 15:03, Mark Rutland mark.rutl...@arm.com wrote:
  [...]
 
But if the EFI loader is allowed to load stext at the precise start of
RAM (or anywhere not in the idmap), in attempting the copy we'd try to
access unmapped addresses.
   
So if that's a possibility, we need to shrink the copy to cover stext
to _edata rather than _text to edata.
   
Does that make sense?
   
  
   That cannot happen. The PE/COFF .text section's positive relative
   virtual offset ensures that the memory image has room for the header,
   it's just not guaranteed that anything gets copied there.
  
   Ok. If we're guaranteed to have some space there, we're fine.
  
   I'm probably being a bit thick here, but where is the positive relative
   virtual offset in the header? Which field defines that?
  
 
  The fields VirtualSize, VirtualAddress (the field I was referring to),
  SizeOfRawData and PointerToRawData define the relation between the
  file layout and the memory layout of the .text section (line 219 and
  up in head.S)
 
  I guess my confusion is over the semantics of the VirtualAddress field.
  If it's treated as an offset, what is that offset relative to in memory?
  And what defines that the space covered by that offset is accessible?
 
 
 The PE/COFF spec 8.3 describes VirtualAddress as
 
 
 For executable images, the address of the first byte of the section
 relative to the image base when the section is loaded into memory.
 
 
 ImageBase is a field itself in the PE/COFF header, described as
 
 
 The preferred address of the first byte of image when loaded into
 memory; must be a multiple of 64 K.
 

Thanks for the info, this now makes a lot more sense to me.

So that means the .text section is not the start of the image, but is
offset by (stext - efi_head) bytes from the start, covering the header
(regardless of whether it is actually present in the loaded image).

 The SizeOfImage field is described as
 
 
 The size (in bytes) of the image, including all headers, as the image
 is loaded in memory.
 
 
 My interpretation is that memory needs to be allocated for the header
 as well as all sections that have a VirtualSize (including sections
 like BSS which don't have a payload in the file)

That would match what I understand from reading the above, though it
strikes me as odd that space needs to be allocated for the headers if
they aren't guaranteed to be copied -- it's not defined where they would
live in the image.

  In our current definition, the memory offset and the file offset are
  identical (which this patch redefines as 'stext_offset'). The virtual
  size covers the entire static memory footprint of Image (minus the
  header). whereas the SizeOfRawData contains the size of the payload in
  the file (again, minus the header). The balance is zero initialized by
  the loader.
 
  I can see why this guarantees there is space for stext to _end, but I
  don't understand how this guarantees there is a valid mapping for the
  region that would otherwise be _head to stext.
 
 
 The allocation itself is defined in terms of ImageBase/SizeOfImage
 (although ImageBase is only a preferred offset)
 How this allocation is populated with data (and where the holes are)
 is described by the sections.

Ok. Thanks for putting this information together (it's remarkably
useful), and thanks for putting up with my ignorance of PE/COFF.

Cheers,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64/efi: set PE/COFF section alignment to 4 KB

2014-10-10 Thread Mark Rutland
On Fri, Oct 10, 2014 at 11:37:03AM +0100, Ard Biesheuvel wrote:
 On 10 October 2014 12:33, Mark Rutland mark.rutl...@arm.com wrote:
  Hi Ard,
 
  On Fri, Oct 10, 2014 at 10:25:24AM +0100, Ard Biesheuvel wrote:
  Position independent AArch64 code needs to be linked and loaded at the same
  relative offset from a 4 KB boundary, or adrp/add and adrp/ldr pairs will
  not work correctly. (This is how PC relative symbol references with a 4 GB
  reach are emitted)
 
  We need to declare this in the PE/COFF header, otherwise the PE/COFF loader
  may load the Image and invoke the stub at an offset which violates this 
  rule.
 
  Has this been observed happening, or was this just found by inspection?
 
 
 This is also something found by inspection, or rather, by the
 discussion going on in the other thread. I am not aware of any PE/COFF
 loaders that may choose an offset that is not 4 KB aligned, even if
 the header we give it appears to allow it.
 
  Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
  ---
   arch/arm64/kernel/head.S | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
 
  diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
  index 0a6e4f924df8..5e83e5b8a9de 100644
  --- a/arch/arm64/kernel/head.S
  +++ b/arch/arm64/kernel/head.S
  @@ -159,7 +159,7 @@ optional_header:
 
   extra_header_fields:
.quad   0   // ImageBase
  - .long   0x20// SectionAlignment
  + .long   0x1000  // SectionAlignment

Looking at this again, I'm more confused than I was to begin with.

Surely we know exactly where the .text section will be loaded because of
its VirtualAddress? If that were the case, we can drop the .align 12 as
we already load it at the offset any arp or :lo12: immediate will have
been built for.

I don't follow how SectionAlignment interacts with a given section's
VirtualAddress.

Thanks,
Mark.

.long   0x8 // FileAlignment
.short  0   // 
  MajorOperatingSystemVersion
.short  0   // 
  MinorOperatingSystemVersion
  @@ -226,7 +226,7 @@ section_table:
.short  0   // NumberOfRelocations  (0 for executables)
.short  0   // NumberOfLineNumbers  (0 for executables)
.long   0xe0500020  // Characteristics (section flags)
  - .align 5
  + .align 12
 
  Can we get a comment explaining why stext needs the additional
  alignment? Something like:
 
  /*
   * EFI will load stext onwards at the 4k section alignment
   * described in the PE/COFF header. To ensure that instruction
   * sequences using an adrp and a :lo12: immediate will function
   * correctly at this alignment, we must ensure that stext is
   * placed at a 4k boundary in the Image to begin with.
   */
  .align 12
 
 
 OK
 
 -- 
 Ard.
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64/efi: set PE/COFF section alignment to 4 KB

2014-10-10 Thread Mark Rutland
On Fri, Oct 10, 2014 at 03:50:49PM +0100, Ard Biesheuvel wrote:
 On 10 October 2014 16:09, Mark Rutland mark.rutl...@arm.com wrote:
  On Fri, Oct 10, 2014 at 11:37:03AM +0100, Ard Biesheuvel wrote:
  On 10 October 2014 12:33, Mark Rutland mark.rutl...@arm.com wrote:
   Hi Ard,
  
   On Fri, Oct 10, 2014 at 10:25:24AM +0100, Ard Biesheuvel wrote:
   Position independent AArch64 code needs to be linked and loaded at the 
   same
   relative offset from a 4 KB boundary, or adrp/add and adrp/ldr pairs 
   will
   not work correctly. (This is how PC relative symbol references with a 4 
   GB
   reach are emitted)
  
   We need to declare this in the PE/COFF header, otherwise the PE/COFF 
   loader
   may load the Image and invoke the stub at an offset which violates this 
   rule.
  
   Has this been observed happening, or was this just found by inspection?
  
 
  This is also something found by inspection, or rather, by the
  discussion going on in the other thread. I am not aware of any PE/COFF
  loaders that may choose an offset that is not 4 KB aligned, even if
  the header we give it appears to allow it.
 
   Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
   ---
arch/arm64/kernel/head.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
  
   diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
   index 0a6e4f924df8..5e83e5b8a9de 100644
   --- a/arch/arm64/kernel/head.S
   +++ b/arch/arm64/kernel/head.S
   @@ -159,7 +159,7 @@ optional_header:
  
extra_header_fields:
 .quad   0   // ImageBase
   - .long   0x20// SectionAlignment
   + .long   0x1000  // SectionAlignment
 
  Looking at this again, I'm more confused than I was to begin with.
 
 
 :-)
 
  Surely we know exactly where the .text section will be loaded because of
  its VirtualAddress? If that were the case, we can drop the .align 12 as
  we already load it at the offset any arp or :lo12: immediate will have
  been built for.
 
 
 No, not quite. It only tells us what the /offset/ should be of the
 section from the ImageBase chosen by the loader, not what the
 alignment of ImageBase itself should be. For instance, with a section
 VirtualAddress of 0x1000 and a SectionAlignment of 0x400, the section
 could legally be loaded @ 0x1400 or 0x1800 (for ImageBase == 0x400 or
 0x800, respectively)

I see.

I had myself confused between the image and sections, and (mistakenly)
thought we had control over the alignment of the image as opposed to
sections. I thought the loader chose an ImageBase that was sufficiently
aligned, then just loaded each segment at the requested offset from
that. It sounds like the loader actually has to try to reconcile the
offset of each section against each other to determine an ImageBase to
use.

Given that the only thing we can control the alignment of is the .text
section (with an offset applied below that), aligning the .text section
to 4k sounds right.

Thanks for bearing with me!

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] arm64/efi: efistub: jump to 'stext' directly, not through the header

2014-10-09 Thread Mark Rutland
Hi Ard,

On Wed, Oct 08, 2014 at 03:11:27PM +0100, Ard Biesheuvel wrote:
 After the EFI stub has done its business, it jumps into the kernel by
 branching to offset #0 of the loaded Image, which is where it expects
 to find the header containing a 'branch to stext' instruction.
 
 However, the UEFI spec 2.1.1 states the following regarding PE/COFF
 image loading:
 A UEFI image is loaded into memory through the LoadImage() Boot
 Service. This service loads an image with a PE32+ format into memory.
 This PE32+ loader is required to load all sections of the PE32+ image
 into memory.
 
 In other words, it is /not/ required to load parts of the image that are
 not covered by a PE/COFF section, so it may not have loaded the header
 at the expected offset, as it is not covered by any PE/COFF section.

What does this mean for handle_kernel_image? Given we might not have
_text through to _stext mapped, do we not need to take that into
account?

Also, have we seen problems on any systems yet?

Otherwise, this looks like a good fix for hte problem.

Thanks,
Mark.

 So instead, jump to 'stext' directly, which is at the base of the
 PE/COFF .text section, by supplying a symbol 'stext_offset' to
 efi-entry.o which contains the relative offset of stext into the Image.
 Also replace other open coded calculations of the same value with a
 reference to 'stext_offset'
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
 Changes since v2:
 - rebased onto 3.17+
 - added spec reference to commit message
 
 Changes since v1:
 - drop :lo12: relocation against stext_offset in favor of using a literal
   '=stext_offset' which is safer
 
  arch/arm64/kernel/efi-entry.S |  3 ++-
  arch/arm64/kernel/head.S  | 10 ++
  2 files changed, 8 insertions(+), 5 deletions(-)
 
 diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S
 index 619b1dd7bcde..a0016d3a17da 100644
 --- a/arch/arm64/kernel/efi-entry.S
 +++ b/arch/arm64/kernel/efi-entry.S
 @@ -61,7 +61,8 @@ ENTRY(efi_stub_entry)
*/
   mov x20, x0 // DTB address
   ldr x0, [sp, #16]   // relocated _text address
 - mov x21, x0
 + ldr x21, =stext_offset
 + add x21, x0, x21
  
   /*
* Flush dcache covering current runtime addresses
 diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
 index 0a6e4f924df8..8c06c9d269d2 100644
 --- a/arch/arm64/kernel/head.S
 +++ b/arch/arm64/kernel/head.S
 @@ -132,6 +132,8 @@ efi_head:
  #endif
  
  #ifdef CONFIG_EFI
 + .globl  stext_offset
 + .setstext_offset, stext - efi_head
   .align 3
  pe_header:
   .ascii  PE
 @@ -155,7 +157,7 @@ optional_header:
   .long   0   // SizeOfInitializedData
   .long   0   // SizeOfUninitializedData
   .long   efi_stub_entry - efi_head   // AddressOfEntryPoint
 - .long   stext - efi_head// BaseOfCode
 + .long   stext_offset// BaseOfCode
  
  extra_header_fields:
   .quad   0   // ImageBase
 @@ -172,7 +174,7 @@ extra_header_fields:
   .long   _end - efi_head // SizeOfImage
  
   // Everything before the kernel image is considered part of the header
 - .long   stext - efi_head// SizeOfHeaders
 + .long   stext_offset// SizeOfHeaders
   .long   0   // CheckSum
   .short  0xa // Subsystem (EFI application)
   .short  0   // DllCharacteristics
 @@ -217,9 +219,9 @@ section_table:
   .byte   0
   .byte   0   // end of 0 padding of section name
   .long   _end - stext// VirtualSize
 - .long   stext - efi_head// VirtualAddress
 + .long   stext_offset// VirtualAddress
   .long   _edata - stext  // SizeOfRawData
 - .long   stext - efi_head// PointerToRawData
 + .long   stext_offset// PointerToRawData
  
   .long   0   // PointerToRelocations (0 for executables)
   .long   0   // PointerToLineNumbers (0 for executables)
 -- 
 1.8.3.2
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/44] qnap-poweroff: Drop reference to pm_power_off from devicetree bindings

2014-10-07 Thread Mark Rutland
On Tue, Oct 07, 2014 at 06:28:09AM +0100, Guenter Roeck wrote:
 Replace reference to pm_power_off (which is an implementation detail)
 and replace it with a more generic description of the driver's functionality.
 
 Cc: Rob Herring robh...@kernel.org
 Cc: Pawel Moll pawel.m...@arm.com
 Cc: Mark Rutland mark.rutl...@arm.com
 Signed-off-by: Guenter Roeck li...@roeck-us.net
 ---
  Documentation/devicetree/bindings/power_supply/qnap-poweroff.txt | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/Documentation/devicetree/bindings/power_supply/qnap-poweroff.txt 
 b/Documentation/devicetree/bindings/power_supply/qnap-poweroff.txt
 index af25e77..1e2260a 100644
 --- a/Documentation/devicetree/bindings/power_supply/qnap-poweroff.txt
 +++ b/Documentation/devicetree/bindings/power_supply/qnap-poweroff.txt
 @@ -3,8 +3,8 @@
  QNAP NAS devices have a microcontroller controlling the main power
  supply. This microcontroller is connected to UART1 of the Kirkwood and
  Orion5x SoCs. Sending the character 'A', at 19200 baud, tells the
 -microcontroller to turn the power off. This driver adds a handler to
 -pm_power_off which is called to turn the power off.
 +microcontroller to turn the power off. This driver installs a handler
 +to power off the system.

I'd remove the last sentence -- the driver is also independent of the
HW, and the description of how the power off works at the HW level is
sufficient.

With that:

Acked-by: Mark Rutland mark.rutl...@arm.com

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/44] gpio-poweroff: Drop reference to pm_power_off from devicetree bindings

2014-10-07 Thread Mark Rutland
On Tue, Oct 07, 2014 at 06:28:08AM +0100, Guenter Roeck wrote:
 pm_power_off is an implementation detail. Replace it with a more generic
 description of the driver's functionality.
 
 Cc: Rob Herring robh...@kernel.org
 Cc: Pawel Moll pawel.m...@arm.com
 Cc: Mark Rutland mark.rutl...@arm.com

Acked-by: Mark Rutland mark.rutl...@arm.com

 Signed-off-by: Guenter Roeck li...@roeck-us.net
 ---
  Documentation/devicetree/bindings/gpio/gpio-poweroff.txt | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)
 
 diff --git a/Documentation/devicetree/bindings/gpio/gpio-poweroff.txt 
 b/Documentation/devicetree/bindings/gpio/gpio-poweroff.txt
 index d4eab92..c95a1a6 100644
 --- a/Documentation/devicetree/bindings/gpio/gpio-poweroff.txt
 +++ b/Documentation/devicetree/bindings/gpio/gpio-poweroff.txt
 @@ -2,12 +2,12 @@ Driver a GPIO line that can be used to turn the power off.
  
  The driver supports both level triggered and edge triggered power off.
  At driver load time, the driver will request the given gpio line and
 -install a pm_power_off handler. If the optional properties 'input' is
 -not found, the GPIO line will be driven in the inactive
 +install a handler to power off the system. If the optional properties
 +'input' is not found, the GPIO line will be driven in the inactive
  state. Otherwise its configured as an input.
  
 -When the pm_power_off is called, the gpio is configured as an output,
 -and drive active, so triggering a level triggered power off
 +When the the poweroff handler is called, the gpio is configured as an
 +output, and drive active, so triggering a level triggered power off
  condition. This will also cause an inactive-active edge condition, so
  triggering positive edge triggered power off. After a delay of 100ms,
  the GPIO is set to inactive, thus causing an active-inactive edge,
 @@ -24,7 +24,7 @@ Required properties:
  
  Optional properties:
  - input : Initially configure the GPIO line as an input. Only reconfigure
 -  it to an output when the pm_power_off function is called. If this optional
 +  it to an output when the poweroff handler is called. If this optional
property is not specified, the GPIO is initialized as an output in its
inactive state.
  
 -- 
 1.9.1
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/44] mfd: as3722: Drop reference to pm_power_off from devicetree bindings

2014-10-07 Thread Mark Rutland
On Tue, Oct 07, 2014 at 05:21:11PM +0100, Rob Landley wrote:
 On 10/07/14 00:28, Guenter Roeck wrote:
  Devicetree bindings are supposed to be operating system independent
  and should thus not describe how a specific functionality is implemented
  in Linux.
 
 So your argument is that linux/Documentation/devicetree/bindings should
 not be specific to Linux. Merely hosted in the Linux kernel source
 repository.

Precisely. If nothing else as a general guideline this keeps us honest,
and prevents us from embedding arbitrary implementation details into
bidnings that cause pain later when we want to change things at either
end.

There are already otehr users of these bindings, so we can't really
claim they're strictly Linux-specific anyhow.

 Well that's certainly a point of view.

As far as I am aware, it's the point of view shared by the device tree
maintainers, and it's been that way for a while.

I don't really follow your concern. For one thing were this followed
more strictly this file wouldn't need patching at all to correct for
this Linux-internal rework...

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/arm64: fix fdt-related memory reservation

2014-09-08 Thread Mark Rutland
Hi Mark,

On Mon, Sep 08, 2014 at 02:31:42PM +0100, Mark Salter wrote:
 Commit 86c8b27a01cf:
  arm64: ignore DT memreserve entries when booting in UEFI mode
 
 prevents early_init_fdt_scan_reserved_mem() from being called for
 arm64 kernels booting via UEFI. This was done because the kernel
 will use the UEFI memory map to determine reserved memory regions.
 That approach has problems in that early_init_fdt_scan_reserved_mem()
 also reserves the FDT itself and any node-specific reserved memory.
 By chance of some kernel configs, the FDT may be overwritten before
 it can be unflattened and the kernel will fail to boot. More subtle
 problems will result if the FDT has node specific reserved memory
 which is not really reserved.

That doesn't sound like fun; apologies for allowing such brokenness
through in the first place.

[...]

 + /*
 +  * Delete all memory reserve map entries. When booting via UEFI,
 +  * kernel will use the UEFI memory map to find reserved regions.
 +  */
 + num_rsv = fdt_num_mem_rsv(fdt);
 + for (i = 0; i  num_rsv; i++)
 + fdt_del_mem_rsv(fdt, i);

I don't think that's right. Won't the memreserve entries shift down by
one each time we call fdt_del_mem_rsv?

Shouldn't this be something like:

while (fdt_num_mem_rsv(fdt))
fdt_del_mem_rsv(fdt, 0);

Or we could count downwards.

Otherwise, the general approach sounds sane to me, so with that bug
fixed or disproven:

Acked-by: Mark Rutland mark.rutl...@arm.com

Given this is mostly in the EFI stub I expect that this will go via the
EFI tree?

Mark.

 +
   node = fdt_subnode_offset(fdt, 0, chosen);
   if (node  0) {
   node = fdt_add_subnode(fdt, 0, chosen);
 -- 
 1.8.3.1
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-efi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/arm64: fix fdt-related memory reservation

2014-09-08 Thread Mark Rutland
On Mon, Sep 08, 2014 at 03:21:05PM +0100, Mark Salter wrote:
 On Mon, 2014-09-08 at 15:06 +0100, Mark Rutland wrote:
  Hi Mark,
  
  On Mon, Sep 08, 2014 at 02:31:42PM +0100, Mark Salter wrote:
   Commit 86c8b27a01cf:
arm64: ignore DT memreserve entries when booting in UEFI mode
   
   prevents early_init_fdt_scan_reserved_mem() from being called for
   arm64 kernels booting via UEFI. This was done because the kernel
   will use the UEFI memory map to determine reserved memory regions.
   That approach has problems in that early_init_fdt_scan_reserved_mem()
   also reserves the FDT itself and any node-specific reserved memory.
   By chance of some kernel configs, the FDT may be overwritten before
   it can be unflattened and the kernel will fail to boot. More subtle
   problems will result if the FDT has node specific reserved memory
   which is not really reserved.
  
  That doesn't sound like fun; apologies for allowing such brokenness
  through in the first place.
 
 Heh. It was obvious that DT unflattening was broken, but bisecting
 didn't help much because I kept finding patches which when reverted
 made the problem go away even though they obviously weren't the
 cause.

Yeah, those types of bugs are never fun. I recall a similar situation
with the __INIT annotation in the versatile pen code. 

  [...]
  
   + /*
   +  * Delete all memory reserve map entries. When booting via UEFI,
   +  * kernel will use the UEFI memory map to find reserved regions.
   +  */
   + num_rsv = fdt_num_mem_rsv(fdt);
   + for (i = 0; i  num_rsv; i++)
   + fdt_del_mem_rsv(fdt, i);
  
  I don't think that's right. Won't the memreserve entries shift down by
  one each time we call fdt_del_mem_rsv?
  
  Shouldn't this be something like:
  
  while (fdt_num_mem_rsv(fdt))
  fdt_del_mem_rsv(fdt, 0);
  
  Or we could count downwards.
  
 
 Sigh. Yes, you are right. I only tested with one reserved region.
 I think counting down would be the way to go. I'll send a fixed
 patch shortly.

Ok, that sounds fine by me; make sure you add my ack :)

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] arm64/efi: efistub: don't abort if base of DRAM is occupied

2014-08-20 Thread Mark Rutland
On Wed, Aug 20, 2014 at 06:10:57PM +0100, Matt Fleming wrote:
 On Thu, 14 Aug, at 12:32:05PM, Mark Rutland wrote:
  On Wed, Jul 30, 2014 at 11:59:04AM +0100, Ard Biesheuvel wrote:
   If we cannot relocate the kernel Image to its preferred offset of base of 
   DRAM
   plus TEXT_OFFSET, instead relocate it to the lowest available 2 MB 
   boundary plus
   TEXT_OFFSET. We may lose a bit of memory at the low end, but we can still
   proceed normally otherwise.
   
   Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
   Acked-by: Mark Salter msal...@redhat.com
  
  Acked-by: Mark Rutland mark.rutl...@arm.com
 
 Ard, who's picking this up?

Will's already taken this into arm64/devel [1,2] with the intention of
waiting for v3.18 [3]. Per Leif's comment [4] that might have to be
bumped.

Will?

Mark.

[1] https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/commit/?h=devel
[2] 
https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/commit/?h=develid=12f002aa8d96b90e6e37cc2b0d9a6a3cacdf8ef5
[3] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/279655.html
[4] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/279771.html
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] arm64/efi: efistub: cover entire static mem footprint in PE/COFF .text

2014-08-14 Thread Mark Rutland
On Wed, Jul 30, 2014 at 11:59:03AM +0100, Ard Biesheuvel wrote:
 The static memory footprint of a kernel Image at boot is larger than the
 Image file itself. Things like .bss data and initial page tables are allocated
 statically but populated dynamically so their content is not contained in the
 Image file.
 
 However, if EFI (or GRUB) has loaded the Image at precisely the desired offset
 of base of DRAM + TEXT_OFFSET, the Image will be booted in place, and we have
 to make sure that the allocation done by the PE/COFF loader is large enough.
 
 Fix this by growing the PE/COFF .text section to cover the entire static
 memory footprint. The part of the section that is not covered by the payload
 will be zero initialised by the PE/COFF loader.
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 Acked-by: Mark Salter msal...@redhat.com

This looks sane to me and it seems we do the same for x86 as of
c7fb93ec51d4 (x86/efi: Include a .bss section within the PE/COFF
headers). So:

Acked-by: Mark Rutland mark.rutl...@arm.com

 ---
  arch/arm64/kernel/head.S | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
 index 144f10567f82..b6ca95aee348 100644
 --- a/arch/arm64/kernel/head.S
 +++ b/arch/arm64/kernel/head.S
 @@ -151,7 +151,7 @@ optional_header:
   .short  0x20b   // PE32+ format
   .byte   0x02// MajorLinkerVersion
   .byte   0x14// MinorLinkerVersion
 - .long   _edata - stext  // SizeOfCode
 + .long   _end - stext// SizeOfCode
   .long   0   // SizeOfInitializedData
   .long   0   // SizeOfUninitializedData
   .long   efi_stub_entry - efi_head   // AddressOfEntryPoint
 @@ -169,7 +169,7 @@ extra_header_fields:
   .short  0   // MinorSubsystemVersion
   .long   0   // Win32VersionValue
  
 - .long   _edata - efi_head   // SizeOfImage
 + .long   _end - efi_head // SizeOfImage
  
   // Everything before the kernel image is considered part of the header
   .long   stext - efi_head// SizeOfHeaders
 @@ -216,7 +216,7 @@ section_table:
   .byte   0
   .byte   0
   .byte   0   // end of 0 padding of section name
 - .long   _edata - stext  // VirtualSize
 + .long   _end - stext// VirtualSize
   .long   stext - efi_head// VirtualAddress
   .long   _edata - stext  // SizeOfRawData
   .long   stext - efi_head// PointerToRawData
 -- 
 1.8.3.2
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-efi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] arm64/efi: efistub: don't abort if base of DRAM is occupied

2014-08-14 Thread Mark Rutland
On Wed, Jul 30, 2014 at 11:59:04AM +0100, Ard Biesheuvel wrote:
 If we cannot relocate the kernel Image to its preferred offset of base of DRAM
 plus TEXT_OFFSET, instead relocate it to the lowest available 2 MB boundary 
 plus
 TEXT_OFFSET. We may lose a bit of memory at the low end, but we can still
 proceed normally otherwise.
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 Acked-by: Mark Salter msal...@redhat.com

Acked-by: Mark Rutland mark.rutl...@arm.com

 ---
  arch/arm64/kernel/efi-stub.c | 16 ++--
  1 file changed, 6 insertions(+), 10 deletions(-)
 
 diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c
 index 1317fef8dde9..d27dd982ff26 100644
 --- a/arch/arm64/kernel/efi-stub.c
 +++ b/arch/arm64/kernel/efi-stub.c
 @@ -28,20 +28,16 @@ efi_status_t handle_kernel_image(efi_system_table_t 
 *sys_table,
   kernel_size = _edata - _text;
   if (*image_addr != (dram_base + TEXT_OFFSET)) {
   kernel_memsize = kernel_size + (_end - _edata);
 - status = efi_relocate_kernel(sys_table, image_addr,
 -  kernel_size, kernel_memsize,
 -  dram_base + TEXT_OFFSET,
 -  PAGE_SIZE);
 + status = efi_low_alloc(sys_table, kernel_memsize + TEXT_OFFSET,
 +SZ_2M, reserve_addr);
   if (status != EFI_SUCCESS) {
   pr_efi_err(sys_table, Failed to relocate kernel\n);
   return status;
   }
 - if (*image_addr != (dram_base + TEXT_OFFSET)) {
 - pr_efi_err(sys_table, Failed to alloc kernel 
 memory\n);
 - efi_free(sys_table, kernel_memsize, *image_addr);
 - return EFI_LOAD_ERROR;
 - }
 - *image_size = kernel_memsize;
 + memcpy((void *)*reserve_addr + TEXT_OFFSET, (void *)*image_addr,
 +kernel_size);
 + *image_addr = *reserve_addr + TEXT_OFFSET;
 + *reserve_size = kernel_memsize + TEXT_OFFSET;
   }
  
  
 -- 
 1.8.3.2
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] arm64: spin-table: handle unmapped cpu-release-addrs

2014-07-31 Thread Mark Rutland
On Thu, Jul 31, 2014 at 10:45:15AM +0100, Will Deacon wrote:
 On Wed, Jul 30, 2014 at 08:17:02PM +0100, Ard Biesheuvel wrote:
  ]On 30 July 2014 13:30, Will Deacon will.dea...@arm.com wrote:
   On Wed, Jul 30, 2014 at 11:59:02AM +0100, Ard Biesheuvel wrote:
   From: Mark Rutland mark.rutl...@arm.com
  
   In certain cases the cpu-release-addr of a CPU may not fall in the
   linear mapping (e.g. when the kernel is loaded above this address due to
   the presence of other images in memory). This is problematic for the
   spin-table code as it assumes that it can trivially convert a
   cpu-release-addr to a valid VA in the linear map.
  
   This patch modifies the spin-table code to use a temporary cached
   mapping to write to a given cpu-release-addr, enabling us to support
   addresses regardless of whether they are covered by the linear mapping.
  
   Signed-off-by: Mark Rutland mark.rutl...@arm.com
   Tested-by: Mark Salter msal...@redhat.com
   [ardb: added (__force void *) cast]
   Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
   ---
arch/arm64/kernel/smp_spin_table.c | 22 +-
1 file changed, 17 insertions(+), 5 deletions(-)
  
   I'm nervous about this. What if the spin table sits in the same physical 
   64k
   frame as a read-sensitive device and we're running with 64k pages?
  
  
  Actually, booting.txt requires cpu-release-addr to point to a
  /memreserve/d part of memory, which implies DRAM (or you wouldn't have
  to memreserve it)
  That means it should always be covered by the linear mapping, unless
  it is located before Image in DRAM, which is the case addressed by
  this patch.
 
 But if it's located before before the Image in DRAM and isn't covered by
 the linear mapping, then surely the /memreserve/ is pointless too? In which
 case, this looks like we're simply trying to cater for platforms that aren't
 following booting.txt (which may need updating if we need to handle this).

No. The DT is describing the memory which is present, and the subset
thereof which should not be used under normal circumstances. That's a
static property of the system.

Where the OS happens to get loaded and what it is able to address is a
dynamic property of the OS (and possibly the bootloader). The DT cannot
have knowledge of this.

It's always true that the OS should not blindly use memreserve'd memory.
The fact that it cannot address it in the linear mapping is orthogonal.

Cheers,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] arm64: spin-table: handle unmapped cpu-release-addrs

2014-07-31 Thread Mark Rutland
On Thu, Jul 31, 2014 at 11:04:39AM +0100, Will Deacon wrote:
 On Thu, Jul 31, 2014 at 10:58:54AM +0100, Mark Rutland wrote:
  On Thu, Jul 31, 2014 at 10:45:15AM +0100, Will Deacon wrote:
   On Wed, Jul 30, 2014 at 08:17:02PM +0100, Ard Biesheuvel wrote:
]On 30 July 2014 13:30, Will Deacon will.dea...@arm.com wrote:
 On Wed, Jul 30, 2014 at 11:59:02AM +0100, Ard Biesheuvel wrote:
 From: Mark Rutland mark.rutl...@arm.com

 In certain cases the cpu-release-addr of a CPU may not fall in the
 linear mapping (e.g. when the kernel is loaded above this address 
 due to
 the presence of other images in memory). This is problematic for the
 spin-table code as it assumes that it can trivially convert a
 cpu-release-addr to a valid VA in the linear map.

 This patch modifies the spin-table code to use a temporary cached
 mapping to write to a given cpu-release-addr, enabling us to support
 addresses regardless of whether they are covered by the linear 
 mapping.

 Signed-off-by: Mark Rutland mark.rutl...@arm.com
 Tested-by: Mark Salter msal...@redhat.com
 [ardb: added (__force void *) cast]
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
  arch/arm64/kernel/smp_spin_table.c | 22 +-
  1 file changed, 17 insertions(+), 5 deletions(-)

 I'm nervous about this. What if the spin table sits in the same 
 physical 64k
 frame as a read-sensitive device and we're running with 64k pages?


Actually, booting.txt requires cpu-release-addr to point to a
/memreserve/d part of memory, which implies DRAM (or you wouldn't have
to memreserve it)
That means it should always be covered by the linear mapping, unless
it is located before Image in DRAM, which is the case addressed by
this patch.
   
   But if it's located before before the Image in DRAM and isn't covered by
   the linear mapping, then surely the /memreserve/ is pointless too? In 
   which
   case, this looks like we're simply trying to cater for platforms that 
   aren't
   following booting.txt (which may need updating if we need to handle this).
  
  No. The DT is describing the memory which is present, and the subset
  thereof which should not be used under normal circumstances. That's a
  static property of the system.
  
  Where the OS happens to get loaded and what it is able to address is a
  dynamic property of the OS (and possibly the bootloader). The DT cannot
  have knowledge of this.
  
  It's always true that the OS should not blindly use memreserve'd memory.
  The fact that it cannot address it in the linear mapping is orthogonal.
 
 In which case, I think asserting that /memreserve/ implies DRAM is pretty
 fragile and not actually enforced anywhere. Sure, we can say `don't do
 that', but I'd prefer to have the kernel detect this dynamically.

I think the boot protocol needs an update to allow a cpu-release-addr
not covered by linear mapping. There are reasons that the kernel might
not be loaded at the start of RAM, and I think relying on the
cpu-release-addr addresses lying in the linear mapping is a limitation
we need to address. Given that I also think we should allow for
cpu-release-addrs outside of the range desribed by memory nodes (and
therefore not requiring any /mremreserve/).

I do not think we should rely on being able to address the
cpu-release-addr with a normal cacheable mapping. If the
cpu-release-addr falls outside of the memory described by the memory
node(s) then we have no idea where it lives. Currently this falls in
normal memory, but mandating that feels odd.

The sole purpose of /memreserve/ is to describe areas in physical memory
that memory should not be used for general allocation. I don't think it
makes any sense to derive any information from /memreserve/ other than
the fact said addresses shouldn't be poked arbitarily. If we allow
cpu-release-addrs outside of memory, then we won't have a /memreserve/
anyhow.

So the question becomes can or can't we always detect when we already
have a mapping that covers a cpu-release-addr?

 Does dtc check that the /memreserve/ region is actually a subset of the
 memory node?

I don't beleive it does. It's probably a sensible warning, but as far as
I am aware the only time the memory reservation table will be read in
any OS is to poke holes in its memory allocation pool(s).

Cheers,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] arm64: spin-table: handle unmapped cpu-release-addrs

2014-07-30 Thread Mark Rutland
On Wed, Jul 30, 2014 at 01:00:40PM +0100, Ard Biesheuvel wrote:
 On 30 July 2014 13:30, Will Deacon will.dea...@arm.com wrote:
  On Wed, Jul 30, 2014 at 11:59:02AM +0100, Ard Biesheuvel wrote:
  From: Mark Rutland mark.rutl...@arm.com
 
  In certain cases the cpu-release-addr of a CPU may not fall in the
  linear mapping (e.g. when the kernel is loaded above this address due to
  the presence of other images in memory). This is problematic for the
  spin-table code as it assumes that it can trivially convert a
  cpu-release-addr to a valid VA in the linear map.
 
  This patch modifies the spin-table code to use a temporary cached
  mapping to write to a given cpu-release-addr, enabling us to support
  addresses regardless of whether they are covered by the linear mapping.
 
  Signed-off-by: Mark Rutland mark.rutl...@arm.com
  Tested-by: Mark Salter msal...@redhat.com
  [ardb: added (__force void *) cast]
  Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
  ---
   arch/arm64/kernel/smp_spin_table.c | 22 +-
   1 file changed, 17 insertions(+), 5 deletions(-)
 
  I'm nervous about this. What if the spin table sits in the same physical 64k
  frame as a read-sensitive device and we're running with 64k pages?
 
 
 I see what you mean. This is potentially hairy, as EFI already
 ioremap_cache()s everything known to it as normal DRAM, so using plain
 ioremap() here if pfn_valid() returns false for cpu-release-addr's PFN
 may still result in mappings with different attributes for the same
 region. So how should we decide whether to call ioremap() or
 ioremap_cache() in this case?

If we're careful about handling mismatched attributes we might be able
to get away with always using a device mapping.

I'll need to have a think about that, I'm not sure on the architected
cache behaviour in such a case.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] arm64: spin-table: handle unmapped cpu-release-addrs

2014-07-30 Thread Mark Rutland
On Wed, Jul 30, 2014 at 01:42:58PM +0100, Will Deacon wrote:
 On Wed, Jul 30, 2014 at 01:30:29PM +0100, Mark Rutland wrote:
  On Wed, Jul 30, 2014 at 01:00:40PM +0100, Ard Biesheuvel wrote:
   On 30 July 2014 13:30, Will Deacon will.dea...@arm.com wrote:
On Wed, Jul 30, 2014 at 11:59:02AM +0100, Ard Biesheuvel wrote:
From: Mark Rutland mark.rutl...@arm.com
   
In certain cases the cpu-release-addr of a CPU may not fall in the
linear mapping (e.g. when the kernel is loaded above this address due 
to
the presence of other images in memory). This is problematic for the
spin-table code as it assumes that it can trivially convert a
cpu-release-addr to a valid VA in the linear map.
   
This patch modifies the spin-table code to use a temporary cached
mapping to write to a given cpu-release-addr, enabling us to support
addresses regardless of whether they are covered by the linear mapping.
   
Signed-off-by: Mark Rutland mark.rutl...@arm.com
Tested-by: Mark Salter msal...@redhat.com
[ardb: added (__force void *) cast]
Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm64/kernel/smp_spin_table.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)
   
I'm nervous about this. What if the spin table sits in the same 
physical 64k
frame as a read-sensitive device and we're running with 64k pages?
   
   
   I see what you mean. This is potentially hairy, as EFI already
   ioremap_cache()s everything known to it as normal DRAM, so using plain
   ioremap() here if pfn_valid() returns false for cpu-release-addr's PFN
   may still result in mappings with different attributes for the same
   region. So how should we decide whether to call ioremap() or
   ioremap_cache() in this case?
  
  If we're careful about handling mismatched attributes we might be able
  to get away with always using a device mapping.
 
 Even then, I think ioremap hits a WARN_ON if pfn_valid.

Ok, that's that idea dead then.

  I'll need to have a think about that, I'm not sure on the architected
  cache behaviour in such a case.
 
 Of we just skip the cache flush if !pfn_valid.

I don't think that's always safe given Ard's comment that the EFI code
will possibly have a mapping covering the region created by
ioremap_cache.

Ard, what exactly does the EFI code map with ioremap_cache, and why?

Cheers,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: ignore DT memreserve entries when booting in UEFI mode

2014-07-28 Thread Mark Rutland
On Mon, Jul 28, 2014 at 07:03:03PM +0100, Leif Lindholm wrote:
 UEFI provides its own method for marking regions to reserve, via the
 memory map which is also used to initialise memblock. So when using the
 UEFI memory map, ignore any memreserve entries present in the DT.

It's worth noting that no-one is relying on this at present, and were
they it would imply that their UEFI implementation is broken (providing
a dodgy memory map).

So before people start doing dodgy things, let's, close that hole. If we
need a way of modifying the memory map we should come up with a more
general plan.

 Reported-by: Mark Rutland mark.rutl...@arm.com
 Signed-off-by: Leif Lindholm leif.lindh...@linaro.org
 ---
  arch/arm64/kernel/efi.c |2 ++
  arch/arm64/mm/init.c|4 +++-
  2 files changed, 5 insertions(+), 1 deletion(-)
 
 diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
 index 14db1f6..7ad17b2 100644
 --- a/arch/arm64/kernel/efi.c
 +++ b/arch/arm64/kernel/efi.c
 @@ -188,6 +188,8 @@ static __init void reserve_regions(void)
   if (uefi_debug)
   pr_cont(\n);
   }
 +
 + set_bit(EFI_MEMMAP, efi.flags);

Nothing outside of arch/x86 seems to hang off this, so using it here
looks fine.

  }
  
  
 diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
 index e90c542..58dbf2e 100644
 --- a/arch/arm64/mm/init.c
 +++ b/arch/arm64/mm/init.c
 @@ -32,6 +32,7 @@
  #include linux/of_fdt.h
  #include linux/dma-mapping.h
  #include linux/dma-contiguous.h
 +#include linux/efi.h
  
  #include asm/sections.h
  #include asm/setup.h
 @@ -151,7 +152,8 @@ void __init arm64_memblock_init(void)
   memblock_reserve(__pa(swapper_pg_dir), SWAPPER_DIR_SIZE);
   memblock_reserve(__pa(idmap_pg_dir), IDMAP_DIR_SIZE);
  
 - early_init_fdt_scan_reserved_mem();
 + if (!efi_enabled(EFI_MEMMAP))
 + early_init_fdt_scan_reserved_mem();

And this is a false stub for !EFI, so that looks fine.

FWIW:

Reviewed-by: Mark Rutland mark.rutl...@arm.com

I wonder if we have any other flat tree uses we need to be careful of
when booting via EFI.

Thanks
Mark.

  
   /* 4GB maximum for 32-bit only capable devices */
   if (IS_ENABLED(CONFIG_ZONE_DMA))
 -- 
 1.7.10.4
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 09/10] arm64/efi: enable minimal UEFI Runtime Services for big endian

2014-07-23 Thread Mark Rutland
Hi Ard,

This is certainly a neat feature, and I definitely want to be able to
boot BE kernels via UEFI.

However, I'm wary of calling EFI in a physical (i.e. idmap with dcaches
off) context. I'm not sure anyone else does that, and I'm not sure
whether that's going to work (both because of the cache maintenance
requirements and the expectations of a given UEFI implementation w.r.t.
memory cacheability).

I'd hoped we'd be able to use a LE EL0 context to call the runtime
services in, but I'm not sure that's possible by the spec :(

As I understand it, we shouldn't need these runtime services to simply
boot a BE kernel.

On Mon, Jul 21, 2014 at 04:16:24PM +0100, Ard Biesheuvel wrote:
 This enables the UEFI Runtime Services needed to manipulate the
 non-volatile variable store when running under a BE kernel.
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
  arch/arm64/include/asm/efi.h   |   2 +
  arch/arm64/kernel/efi-be-call.S|  55 
  arch/arm64/kernel/efi-be-runtime.c | 104 
 +
  arch/arm64/kernel/efi.c|  15 ++
  4 files changed, 176 insertions(+)
  create mode 100644 arch/arm64/kernel/efi-be-call.S
  create mode 100644 arch/arm64/kernel/efi-be-runtime.c
 
 diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
 index a34fd3b12e2b..44e642b6ab61 100644
 --- a/arch/arm64/include/asm/efi.h
 +++ b/arch/arm64/include/asm/efi.h
 @@ -44,4 +44,6 @@ extern void efi_idmap_init(void);
  
  #define efi_call_early(f, ...) sys_table_arg-boottime-f(__VA_ARGS__)
  
 +extern void efi_be_runtime_setup(void);
 +
  #endif /* _ASM_EFI_H */
 diff --git a/arch/arm64/kernel/efi-be-call.S b/arch/arm64/kernel/efi-be-call.S
 new file mode 100644
 index ..24c92a4c352b
 --- /dev/null
 +++ b/arch/arm64/kernel/efi-be-call.S
 @@ -0,0 +1,55 @@
 +
 +#include linux/linkage.h
 +
 + .text
 + .align  3
 +ENTRY(efi_be_phys_call)
 + /*
 +  * Entered at physical address with 1:1 mapping enabled.
 +  */
 + stp x29, x30, [sp, #-96]!
 + mov x29, sp
 + str x27, [sp, #16]
 +
 + ldr x8, =efi_be_phys_call   // virt address of this function
 + adr x9, efi_be_phys_call// phys address of this function
 + sub x9, x8, x9  // calculate virt to phys offset in x9
 +
 + /* preserve all inputs */
 + stp x0, x1, [sp, #32]
 + stp x2, x3, [sp, #48]
 + stp x4, x5, [sp, #64]
 + stp x6, x7, [sp, #80]
 +
 + /* get phys address of stack */
 + sub sp, sp, x9
 +
 + /* switch to LE, disable MMU and D-cache but leave I-cache enabled */
 + mrs x27, sctlr_el1
 + bic x8, x27, #1  2// clear SCTLR.C
 + msr sctlr_el1, x8
 +
 + bl  flush_cache_all

What is the cache flush for?

The only thing that flush_cache_all can do is empty the local
architected caches, and it can only do that when said caches are
disabled. Any other use is unsafe; we have no guarantee that the cache
is empty (or even clean), and we have no guarantee that prior writes
have made it to the PoC.

Even when the caches are disabled, flush_cache_all can only guarantee
that the local architected caches are empty. There is no guarantee that
the dirty data made it to the PoC.

 +
 + /* restore inputs but rotated by 1 register */
 + ldp x7, x0, [sp, #32]
 + ldp x1, x2, [sp, #48]
 + ldp x3, x4, [sp, #64]
 + ldp x5, x6, [sp, #80]
 +
 + bic x8, x27, #1  2// clear SCTLR.C
 + bic x8, x8, #1  0 // clear SCTLR.M
 + bic x8, x8, #1  25// clear SCTLR.EE
 + msr sctlr_el1, x8
 + isb
 +
 + blr x7

Is it safe to call EFI functions with the D-cache disabled?

Do the functions not care about the memory attributes for their own sue
(e.g. for exclusives)? 

Do they not care about IO? If IO to/from storage for variables is
cache-coherent EFI and the device won't have a coherent view of memory.

 +
 + /* switch back to BE and reenable MMU and D-cache */
 + msr sctlr_el1, x27
 +

Missing ISB?

 + mov sp, x29
 + ldr x27, [sp, #16]
 + ldp x29, x30, [sp], #96
 + ret
 +ENDPROC(efi_be_phys_call)
 diff --git a/arch/arm64/kernel/efi-be-runtime.c 
 b/arch/arm64/kernel/efi-be-runtime.c
 new file mode 100644
 index ..62e6c441b77b
 --- /dev/null
 +++ b/arch/arm64/kernel/efi-be-runtime.c
 @@ -0,0 +1,104 @@
 +
 +#include linux/efi.h
 +#include linux/spinlock.h
 +#include asm/efi.h
 +#include asm/neon.h
 +#include asm/tlbflush.h
 +
 +static efi_runtime_services_t *runtime;
 +static efi_status_t (*efi_be_call)(phys_addr_t func, ...);
 +
 +static DEFINE_SPINLOCK(efi_be_rt_lock);
 +
 +static unsigned long efi_be_call_pre(void)
 +{
 + unsigned long flags;
 +
 + kernel_neon_begin();
 + spin_lock_irqsave(efi_be_rt_lock, flags);

At this point we might still have DA_F unmasked, and I don't 

Re: [RFC PATCH] arm64/efi: efistub: reuse EFI mapping for Image if it is lower

2014-07-16 Thread Mark Rutland
Hi Ard,

On Tue, Jul 15, 2014 at 05:50:30PM +0100, Ard Biesheuvel wrote:
 The EFI loader may load Image at any available offset. This means Image may
 reside very close to or at the base of DRAM, in which case the relocation
 done by efi_entry() results in Image being moved up in memory, which is
 undesirable (memory below the kernel is currently not usable).

Rather than complicating this path I would prefer that we fixed not
being able to address memory below stext - TEXT_OFFSET. That would
simplify the bootloader's job regardless of UEFI, as any loader can then
simply place the kernel at a 2MB-aligned address + TEXT_OFFSET (the
kernel would have to not span a 512MB boundary to keep the initial page
tables simple, but that's still a weaker requirement than what we
currently need).

The approach taken in this patch seems sane for the current way the VA
space is laid out, but if we're not currently wasting an absurd amount
of memory I'd rather we got the VA layout fixed such that we can address
memory below the kernel image.

How much memory are we seeing wasted as a result of the existing
relocation code, and how often?

[...]

   /*
 +  * Check whether the original allocation done by the EFI loader is more
 +  * favorable (i.e., closer to the base of DRAM) than the new one created
 +  * by efi_entry(). If so, reuse the original one.
 +  */
 + adr x3, 0f
 + cmp x3, x0
 + bgt 1f  // this image is loaded higher, so just proceed normally
 +
 + /*
 +  * Jump into the relocated image so we can move the original Image to
 +  * a 2 MB boundary + TEXT_OFFSET inside the original mapping without
 +  * overwriting ourselves.
 +  */
 + sub x3, x3, x22 // subtract current base offset
 + add x3, x3, x0  // add base offset of relocated image
 + br  x3  // jump to next line, but in relocated image

At this point I don't believe the necessary icache maintenance + isb
have occurred to make the new image visible to the I-side and to ensure
we don't have any stale prefetched instructions in the pipeline. That
still seems to happen later. Have I missed something?

 +
 +0:   mov x1, x0  // copy from relocated Image
 + sub x0, x22, #1 // copy to base of original Image
 + orr x0, x0, #(SZ_2M - 1)// .. rounded up to 2 MB
 + ldr x3, =(TEXT_OFFSET + 1)  // .. plus TEXT_OFFSET
 + add x0, x0, x3
 + mov x2, x24 // copy 'size of Image' bytes
 + bl  memcpy
 + add x21, x0, x23// add stext_offset to entry point
 +1:
 + /*
* Flush dcache covering current runtime addresses
* of kernel text/data. Then flush all of icache.
*/
 - adrpx1, _text
 - add x1, x1, #:lo12:_text
 - adrpx2, _edata
 - add x2, x2, #:lo12:_edata
 - sub x1, x2, x1
 -
 + mov x1, x24 // size of Image
   bl  __flush_dcache_area
   ic  ialluis

Cheers,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm64/efi: efistub: jump to 'stext' directly, not through the header

2014-07-16 Thread Mark Rutland
On Wed, Jul 16, 2014 at 03:51:37PM +0100, Mark Salter wrote:
 On Tue, 2014-07-15 at 12:58 +0200, Ard Biesheuvel wrote:
  After the EFI stub has done its business, it jumps into the kernel by 
  branching
  to offset #0 of the loaded Image, which is where it expects to find the 
  header
  containing a 'branch to stext' instruction.
  
  However, the header is not covered by any PE/COFF section, so the header may
  not actually be loaded at the expected offset. So instead, jump to 'stext'
  directly, which is at the base of the PE/COFF .text section, by supplying a
  symbol 'stext_offset' to efi-entry.o which contains the relative offset of
  stext into the Image. Also replace other open coded calculations of the same
  value with a reference to 'stext_offset'
 
 Have you actually seen a situation where the header isn't there?
 Isn't the kernel header actually part of the pe/coff file and
 firmware loads the whole file into RAM?

From my understanding of Ard's earlier comments, this part isn't
guaranteed per the UEFI spec.

I would rather we weren't relying on implementation details.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64/efi: efistub: jump to 'stext' directly, not through the header

2014-07-15 Thread Mark Rutland
Hi Ard,

On Tue, Jul 15, 2014 at 10:10:02AM +0100, Ard Biesheuvel wrote:
 After the EFI stub has done its business, it jumps into the kernel by 
 branching
 to offset #0 of the loaded Image, which is where it expects to find the header
 containing a 'branch to stext' instruction.
 However, the header is not covered by any PE/COFF section, so the header may
 not actually be loaded at the expected offset. So instead, jump to 'stext'
 directly, which is at the base of the PE/COFF .text section.

It would be nice to point out in the commit message that the other
changes in the patch are just cleanup to use stext_offset rather than
open-coding it.

 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
  arch/arm64/kernel/efi-entry.S |  2 +-
  arch/arm64/kernel/head.S  | 10 ++
  2 files changed, 7 insertions(+), 5 deletions(-)
 
 diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S
 index 619b1dd7bcde..6ef541731d9e 100644
 --- a/arch/arm64/kernel/efi-entry.S
 +++ b/arch/arm64/kernel/efi-entry.S
 @@ -61,7 +61,7 @@ ENTRY(efi_stub_entry)
*/
   mov x20, x0 // DTB address
   ldr x0, [sp, #16]   // relocated _text address
 - mov x21, x0
 + add x21, x0, #:lo12:stext_offset

I think we can drop the :lo12: here, which will allow us to have a
warning if stext_offset is unexpectedly large (I believe this will
currently silently mask bits were that to happen?).

Other than that, this looks like a sensible thing to do given that we
cannot rely on the header being present.

Cheers,
Mark.

  
   /*
* Flush dcache covering current runtime addresses
 diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
 index a2c1195abb7f..78ddae28b090 100644
 --- a/arch/arm64/kernel/head.S
 +++ b/arch/arm64/kernel/head.S
 @@ -137,6 +137,8 @@ efi_head:
  #endif
  
  #ifdef CONFIG_EFI
 + .globl  stext_offset
 + .setstext_offset, stext - efi_head
   .align 3
  pe_header:
   .ascii  PE
 @@ -160,7 +162,7 @@ optional_header:
   .long   0   // SizeOfInitializedData
   .long   0   // SizeOfUninitializedData
   .long   efi_stub_entry - efi_head   // AddressOfEntryPoint
 - .long   stext - efi_head// BaseOfCode
 + .long   stext_offset// BaseOfCode
  
  extra_header_fields:
   .quad   0   // ImageBase
 @@ -177,7 +179,7 @@ extra_header_fields:
   .long   _edata - efi_head   // SizeOfImage
  
   // Everything before the kernel image is considered part of the header
 - .long   stext - efi_head// SizeOfHeaders
 + .long   stext_offset// SizeOfHeaders
   .long   0   // CheckSum
   .short  0xa // Subsystem (EFI application)
   .short  0   // DllCharacteristics
 @@ -222,9 +224,9 @@ section_table:
   .byte   0
   .byte   0   // end of 0 padding of section name
   .long   _edata - stext  // VirtualSize
 - .long   stext - efi_head// VirtualAddress
 + .long   stext_offset// VirtualAddress
   .long   _edata - stext  // SizeOfRawData
 - .long   stext - efi_head// PointerToRawData
 + .long   stext_offset// PointerToRawData
  
   .long   0   // PointerToRelocations (0 for executables)
   .long   0   // PointerToLineNumbers (0 for executables)
 -- 
 1.8.3.2
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/arm64: efistub: don't abort if base of DRAM is occupied

2014-07-15 Thread Mark Rutland
On Mon, Jul 14, 2014 at 07:40:48PM +0100, Mark Salter wrote:
 On Mon, 2014-07-14 at 17:25 +0200, Ard Biesheuvel wrote:
  If we fail to relocate the kernel Image to its preferred offset of 
  TEXT_OFFSET
  bytes above the base of DRAM, accept the lowest alternative mapping 
  available
  instead of aborting. We may lose a bit of memory at the low end, but we can
  still proceed normally otherwise.
 
 This breaks APM Mustang because the spin-table holding pen for secondary
 CPUs is marked as reserved memory in the TEXT_OFFSET area and the kernel
 placement using your patch makes it unreachable by kernel. Here is a
 patch I've been working with to solve the same problem:

I'm not sure that this is strictly speaking an issue with UEFI or the
relocation strategy (which sounds sane to me). I believe we could easily
hit similar issues with spin-table elsewhere, and I think we can fix
this more generally without complicating the EFI stub.

As I see it, we have two issues here:

1) The linear mapping starts at VA:PAGE_OFFSET+TEXT_OFFSET /
   PA:PHYS_OFFSET+TEXT_OFFSET, and we cannot access memory below this
   start address. This seems like a general issue we need to address, as
   it forces bootloader code to go through a tricky/impossible dance to
   get the kernel as close to the start of RAM as possible.

2) We cannot access a given cpu-release-addr if it is not in the linear
   mapping. This is the problem we're encountering now.

We can solve (2) now by using a temporary mapping to write to the
cpu-release-addr. Does the below patch (untested) fix your issue with
spin-table?

For (1) we need to rework the arm64 VA layout to decouple the kernel
text mapping from the linear map, but that's a lot more work. 

Cheers,
Mark.

8
From 73812b654a07f497f71bd38dfb4a6753fb0ad23e Mon Sep 17 00:00:00 2001
From: Mark Rutland mark.rutl...@arm.com
Date: Tue, 15 Jul 2014 11:32:53 +0100
Subject: [PATCH] arm64: spin-table: handle unmapped cpu-release-addrs

In certain cases the cpu-release-addr of a CPU may not fall in the
linear mapping (e.g. when the kernel is loaded above this address due to
the presence of other images in memory). This is problematic for the
spin-table code as it assumes that it can trivially convert a
cpu-release-addr to a valid VA in the linear map.

This patch modifies the spin-table code to use a temporary cached
mapping to write to a given cpu-release-addr, enabling us to support
addresses regardless of whether they are covered by the linear mapping.

Signed-off-by: Mark Rutland mark.rutl...@arm.com
---
 arch/arm64/kernel/smp_spin_table.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/smp_spin_table.c 
b/arch/arm64/kernel/smp_spin_table.c
index 0347d38..70181c1 100644
--- a/arch/arm64/kernel/smp_spin_table.c
+++ b/arch/arm64/kernel/smp_spin_table.c
@@ -20,6 +20,7 @@
 #include linux/init.h
 #include linux/of.h
 #include linux/smp.h
+#include linux/types.h
 
 #include asm/cacheflush.h
 #include asm/cpu_ops.h
@@ -65,12 +66,21 @@ static int smp_spin_table_cpu_init(struct device_node *dn, 
unsigned int cpu)
 
 static int smp_spin_table_cpu_prepare(unsigned int cpu)
 {
-   void **release_addr;
+   __le64 __iomem *release_addr;
 
if (!cpu_release_addr[cpu])
return -ENODEV;
 
-   release_addr = __va(cpu_release_addr[cpu]);
+   /*
+* The cpu-release-addr may or may not be inside the linear mapping.
+* As ioremap_cache will either give us a new mapping or reuse the
+* existing linear mapping, we can use it to cover both cases. In
+* either case the memory will be MT_NORMAL.
+*/
+   release_addr = ioremap_cache(cpu_release_addr[cpu],
+sizeof(*release_addr));
+   if (!release_addr)
+   return -ENOMEM;
 
/*
 * We write the release address as LE regardless of the native
@@ -79,15 +89,16 @@ static int smp_spin_table_cpu_prepare(unsigned int cpu)
 * boot-loader's endianess before jumping. This is mandated by
 * the boot protocol.
 */
-   release_addr[0] = (void *) cpu_to_le64(__pa(secondary_holding_pen));
-
-   __flush_dcache_area(release_addr, sizeof(release_addr[0]));
+   writeq_relaxed(__pa(secondary_holding_pen), release_addr);
+   __flush_dcache_area(release_addr, sizeof(*release_addr));
 
/*
 * Send an event to wake up the secondary CPU.
 */
sev();
 
+   iounmap(release_addr);
+
return 0;
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/arm64: efistub: don't abort if base of DRAM is occupied

2014-07-15 Thread Mark Rutland
On Tue, Jul 15, 2014 at 11:02:22AM +0100, Leif Lindholm wrote:
 On Mon, Jul 14, 2014 at 02:40:48PM -0400, Mark Salter wrote:
  On Mon, 2014-07-14 at 17:25 +0200, Ard Biesheuvel wrote:
   If we fail to relocate the kernel Image to its preferred offset of 
   TEXT_OFFSET
   bytes above the base of DRAM, accept the lowest alternative mapping 
   available
   instead of aborting. We may lose a bit of memory at the low end, but we 
   can
   still proceed normally otherwise.
  
  This breaks APM Mustang because the spin-table holding pen for secondary
  CPUs is marked as reserved memory in the TEXT_OFFSET area and the kernel
  placement using your patch makes it unreachable by kernel. Here is a
  patch I've been working with to solve the same problem:
 
 Hmm. The thing I don't like about the below approach is that it hard
 wires in the memory below TEXT_OFFSET cannot be used aspect, beyond
 the current prectical limitation.
 
 Since we are likely to see platforms with UEFI memory in use around
 start of RAM, that is a limitation we should probably try to get rid of.

This isn't just an issue for UEFI. There are other reasons one might
want to load a kernel away from the start of RAM while still wanting to
address said RAM(e.g. kdump).

We should address that.

[...]

  @@ -273,6 +282,10 @@ static void __init free_boot_services(void)
  continue;
  }
   
  +   /* Don't free anything below kernel */
  +   if (md-phys_addr  PHYS_OFFSET)
  +   continue;
  +
 
 Is the spin table area really allocated as BOOT_SERVICES_*?

If that is the case, this platform is _broken_. The spin-table memory
(both the code and the mailboxes) needs to live around forever in case
you don't boot all of the secondaries.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64/efi: efistub: jump to 'stext' directly, not through the header

2014-07-15 Thread Mark Rutland
On Tue, Jul 15, 2014 at 12:49:07PM +0100, Ard Biesheuvel wrote:
 On 15 July 2014 13:31, Mark Rutland mark.rutl...@arm.com wrote:
   diff --git a/arch/arm64/kernel/efi-entry.S 
   b/arch/arm64/kernel/efi-entry.S
   index 619b1dd7bcde..6ef541731d9e 100644
   --- a/arch/arm64/kernel/efi-entry.S
   +++ b/arch/arm64/kernel/efi-entry.S
   @@ -61,7 +61,7 @@ ENTRY(efi_stub_entry)
  */
 mov x20, x0 // DTB address
 ldr x0, [sp, #16]   // relocated _text address
   - mov x21, x0
   + add x21, x0, #:lo12:stext_offset
  
   I think we can drop the :lo12: here, which will allow us to have a
   warning if stext_offset is unexpectedly large (I believe this will
   currently silently mask bits were that to happen?).
  
 
  There is no alternative lo12 relocation that errors out when the value
  does not fit, so it would have to use a literal instead.
 
  Ah, that's a shame. What happens when the value doesn't fit if the
  linker / assembler don't error out?
 
 
 Obviously, it will jump into the wrong place if stext ever gets pushed
 beyond a 4k boundary, which is not that likely (current value is
 0x160, we may want to up it to 0x200 at some point if we need to
 increase the PE/COFF section alignment, which some versions of the
 (poor) PE/COFF documentation say should be 0x200 at the minimum)
 
 However, doing ldr x21, =stext_offset works fine as well
 
  That sounds like a toolchain bug if they're silently doing the wrong
  thing.
 
 
 Meh, don't think so, really. You get what you pay for, and :lo12: just
 isn't supposed to care. (It's mainly used with adrp to get a +/- 4gb
 PC relative reach)

We appear to have a misunderstanding.

I was suggesting we use:

add x21, x0, #stext_offset

... and I thought you meant that wouldn't produce an error if the
immediate was out of range. I understand that :lo12: cuts the top bits
off, and for its intended use that makes sense.

From testing I see see that gas isn't happy to accept an undefined
symbol as an immediate without :lo12:, and I can see why that makes
sense. My suggestion was bad, sorry.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] efi/arm64: efistub: don't abort if base of DRAM is occupied

2014-07-15 Thread Mark Rutland
On Tue, Jul 15, 2014 at 03:23:26PM +0100, Mark Salter wrote:
 On Tue, 2014-07-15 at 14:54 +0100, Leif Lindholm wrote:
  On Tue, Jul 15, 2014 at 09:11:00AM -0400, Mark Salter wrote:
   On Tue, 2014-07-15 at 11:02 +0100, Leif Lindholm wrote:
 @@ -273,6 +282,10 @@ static void __init free_boot_services(void)
   continue;
   }
  
 + /* Don't free anything below kernel */
 + if (md-phys_addr  PHYS_OFFSET)
 + continue;
 +

Is the spin table area really allocated as BOOT_SERVICES_*?
   
   No. It is EFI_RESERVED_TYPE. But if UEFI is allowed below the kernel,
   then there could be BS code/data memory which we'd want to ignore.
  
  Well, if it is boot service code/data - then there is no need for us
  to keep it around after ExitBootServices().
  
  /
  Leif
 
 One would think, but EFI has proven to be less than strictly compliant
 in that regard in the past. I'm inclined to keep the boot services
 around until after SetVirtualAddressMap just in case.

Why should we add a work around for a potential bug that doesn't exist
yet?

That just provides fertile ground for such a bug to spring into
existence and for people to ignore it when bringing up their SoC. The
comment doesn't explain the rationale and the code doesn't make sense
given a sane implementation.

For the moment it's better to be strict, IMO. Otherwise there are plenty
of other potential bugs we could attempt to work around to enable people
to write firmware with even lower standards...

If we have to work around something then we should have an actual issue
to work around first.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] arm64/efi: efistub: get text offset and image size from the Image header

2014-07-14 Thread Mark Rutland
Hi Ard,

On Mon, Jul 14, 2014 at 05:17:51PM +0100, Ard Biesheuvel wrote:
 The EFI stub for arm64 needs to behave like an ordinary bootloader in the 
 sense
 that it needs to use the EFI environment and the Image header at runtime and
 not rely on the linker or preprocessor to produce values for text offset,
 image size and kernel size.

Could you elaborate on _why_ we can't do that, given it's linked into
the kernel Image?

Are we splitting the stub from the kernel? What's going on?

 This patch also fixes the corner case where Image happens to be loaded
 at exactly the right offset, but the allocation is actually too small
 to satisfy the requirement imposed by image_size as set in the header.

It's not really imposed by image_size. While it's described by
image_size it's imposed by the existence of the BSS section and the
initial page tables.

 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
  arch/arm64/kernel/Makefile   |  2 --
  arch/arm64/kernel/efi-stub.c | 29 -
  2 files changed, 16 insertions(+), 15 deletions(-)
 
 diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
 index cdaedad3afe5..99b676eeeb0f 100644
 --- a/arch/arm64/kernel/Makefile
 +++ b/arch/arm64/kernel/Makefile
 @@ -4,8 +4,6 @@
  
  CPPFLAGS_vmlinux.lds := -DTEXT_OFFSET=$(TEXT_OFFSET)
  AFLAGS_head.o:= -DTEXT_OFFSET=$(TEXT_OFFSET)
 -CFLAGS_efi-stub.o:= -DTEXT_OFFSET=$(TEXT_OFFSET) \
 --I$(src)/../../../scripts/dtc/libfdt
  
  CFLAGS_REMOVE_ftrace.o = -pg
  CFLAGS_REMOVE_insn.o = -pg
 diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c
 index 9b61d66e2d20..4ba90b2ef677 100644
 --- a/arch/arm64/kernel/efi-stub.c
 +++ b/arch/arm64/kernel/efi-stub.c
 @@ -11,8 +11,7 @@
   */
  #include linux/efi.h
  #include asm/efi.h
 -#include asm/sections.h
 -
 +#include asm/image_hdr.h
  
  efi_status_t handle_kernel_image(efi_system_table_t *sys_table,
unsigned long *image_addr,
 @@ -23,24 +22,28 @@ efi_status_t handle_kernel_image(efi_system_table_t 
 *sys_table,
efi_loaded_image_t *image)
  {
   efi_status_t status;
 - unsigned long kernel_size, kernel_memsize = 0;
 + struct arm64_image_hdr *hdr = (struct arm64_image_hdr *)*image_addr;
 +
 + /* make sure image_addr points to an arm64 kernel Image */
 + if (!arm64_image_hdr_check(hdr)) {
 + pr_efi_err(sys_table, Kernel Image header check failed\n);
 + return EFI_LOAD_ERROR;
 + }

Surely this should always be the case if the stub is linked into the
kernel?

It would be nice to know the rationale for this.

  
   /* Relocate the image, if required. */
 - kernel_size = _edata - _text;
 - if (*image_addr != (dram_base + TEXT_OFFSET)) {
 - kernel_memsize = kernel_size + (_end - _edata) + TEXT_OFFSET;
 - status = efi_low_alloc(sys_table, kernel_memsize, SZ_2M,
 + if (*image_addr != (dram_base + hdr-text_offset) ||
 + image-image_size  hdr-image_size) {

As far as I can tell the size of the Image (image-image_size) is always
going to be less than the effective run time image size
(hdr-image_size).

The SizeOfImage field in head.S which I assume image-image_size is
derived from (not having a UEFI spec in front of me) seems to cover
everything up to _edata but skips the BSS, and initial page tables, but
this is covered by the header's image_size field.

Am I missing something? Surely we _always_ expect image-image_size to
be smaller than hdr-image_size?

Cheers,
Mark.

 + *reserve_size = hdr-text_offset + hdr-image_size;
 + status = efi_low_alloc(sys_table, *reserve_size, SZ_2M,
  reserve_addr);
   if (status != EFI_SUCCESS) {
   pr_efi_err(sys_table, Failed to relocate kernel\n);
 + *reserve_size = 0;
   return status;
   }
 - memcpy((void *)*reserve_addr + TEXT_OFFSET, (void *)*image_addr,
 -kernel_size);
 - *image_addr = *reserve_addr + TEXT_OFFSET;
 - *reserve_size = kernel_memsize;
 + memcpy((void *)*reserve_addr + hdr-text_offset,
 +(void *)*image_addr, image-image_size);
 + *image_addr = *reserve_addr + hdr-text_offset;
   }
 -
 -
   return EFI_SUCCESS;
  }
 -- 
 1.8.3.2
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-efi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] arm64: add C struct definition for Image header

2014-07-14 Thread Mark Rutland
Hi Ard,

On Mon, Jul 14, 2014 at 05:17:50PM +0100, Ard Biesheuvel wrote:
 In order to be able to interpret the Image header from C code, we need a
 struct definition that reflects the specification for Image headers as laid
 out in Documentation/arm64/booting.txt.
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
  arch/arm64/include/asm/image_hdr.h | 75 
 ++
  1 file changed, 75 insertions(+)
  create mode 100644 arch/arm64/include/asm/image_hdr.h
 
 diff --git a/arch/arm64/include/asm/image_hdr.h 
 b/arch/arm64/include/asm/image_hdr.h
 new file mode 100644
 index ..9dc74b672124
 --- /dev/null
 +++ b/arch/arm64/include/asm/image_hdr.h
 @@ -0,0 +1,75 @@
 +/*
 + * image_hdr.h - C struct definition of the arm64 Image header format
 + *
 + * Copyright (C) 2014 Linaro Ltd
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 as
 + * published by the Free Software Foundation.
 + */
 +
 +#ifndef __ASM_IMAGE_HDR_H
 +#define __ASM_IMAGE_HDR_H
 +
 +#ifdef __KERNEL__
 +#include linux/types.h
 +#else
 +#include stdint.h
 +#endif
 +
 +/**
 + * struct arm64_image_hdr - arm64 kernel Image binary header format
 + * @code0:   entry point, first instruction to jump to when booting
 + *   the Image
 + * @code1:   second instruction of entry point
 + * @text_offset: mandatory offset (little endian) beyond a 2 megabyte
 + *   aligned boundary that needs to be taken into account
 + *   when deciding where to load the image
 + * @image_size:  total size (little endian) of the memory area 
 (counted
 + *   from the offset where the image was loaded) that may be
 + *   used by the booting kernel before memory reservations
 + *   can be honoured
 + * @flags:   little endian bit field
 + *   Bit 0:  Kernel endianness.  1 if BE, 0 if LE.
 + *   Bits 1-63:  Reserved.
 + * @res2:reserved, must be 0
 + * @res3:reserved, must be 0
 + * @res4:reserved, must be 0
 + * @magic:   Magic number (little endian): ARM\x64
 + * @res5:reserved (used for PE COFF offset)
 + *
 + * This definition reflects the definition in 
 Documentation/arm64/booting.txt in
 + * the Linux source tree.
 + */

Can we not just say See Documentation/arm64/booting.txt rather than
duplicating the description?

 +struct arm64_image_hdr {
 + uint32_t code0;
 + uint32_t code1;
 + uint64_t text_offset;
 + uint64_t image_size;
 + uint64_t flags;
 + uint64_t res2;
 + uint64_t res3;
 + uint64_t res4;
 + uint32_t magic;
 + uint32_t res5;
 +};
 +
 +static const union {
 + uint8_t le_val[4];
 + uint32_tcpu_val;
 +} arm64_image_hdr_magic = {
 + .le_val = ARM\x64
 +};
 +
 +/**
 + * int arm64_image_hdr_check() - check whether hdr points to an arm64 Image
 + * @hdr: pointer to an arm64 Image
 + *
 + * Return: 1 if check is successful, 0 otherwise
 + */
 +static inline int arm64_image_hdr_check(struct arm64_image_hdr const *hdr)
 +{
 + return hdr-magic == arm64_image_hdr_magic.cpu_val;
 +}

Rather than the arm64_image_hdr_magic union trick above, could we not
just use the magic inline with a memcmp, e.g.

static inline int arm64_image_hdr_check(struct arm64_image_hdr const *hdr)
{
return !memcmp(hdr-magic, ARM\x64, 4);
}

I'm a little hesitant to expose this to userspace in case the size of
the structure grows and userspace starts relying on it having a
particular length (though perhaps that's unfounded).

It's also a little unfortunate that we lose the nice endianness
annotations here, as it gives a wrong impression of the structure as a
set of native-endian fields.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] arm64/efi: efistub: get text offset and image size from the Image header

2014-07-14 Thread Mark Rutland
On Mon, Jul 14, 2014 at 06:35:32PM +0100, Ard Biesheuvel wrote:
 On 14 July 2014 18:54, Mark Rutland mark.rutl...@arm.com wrote:
  Hi Ard,
 
  On Mon, Jul 14, 2014 at 05:17:51PM +0100, Ard Biesheuvel wrote:
  The EFI stub for arm64 needs to behave like an ordinary bootloader in the 
  sense
  that it needs to use the EFI environment and the Image header at runtime 
  and
  not rely on the linker or preprocessor to produce values for text offset,
  image size and kernel size.
 
  Could you elaborate on _why_ we can't do that, given it's linked into
  the kernel Image?
 
  Are we splitting the stub from the kernel? What's going on?
 
 
 Perhaps Leif can chime in here? I was under the impression that you
 were aligned in this regard.

Perhaps we were, and I can see reasons for doing this (e.g. as a step
towards making it possible to boot a BE kernel using UEFI), but the
commit message doesn't mention any such reason. For the sake of my poor
recollection and that of others, some words in the commit message would
be helpful.

  This patch also fixes the corner case where Image happens to be loaded
  at exactly the right offset, but the allocation is actually too small
  to satisfy the requirement imposed by image_size as set in the header.
 
  It's not really imposed by image_size. While it's described by
  image_size it's imposed by the existence of the BSS section and the
  initial page tables.
 
 
 Yes, that is true. But from stub POV, it doesn't matter /why/
 image_size has a particular value.

Sure, the stub is a piece of software with no particular knowledge of
anything. However, those of us reading the commit message are not, and
for our sake it would be nice to have a description in case we care as
to /why/ image_size has a particular value and why that value happens to
be useful here.

As far as I can see, we needed to allocate memory for the BSS even
before the image_size field was introduced, so this isn't a new
requirement.

All that's needed is a change to the wording of the commit message to
explain that we're trying to allocate space for (runtime-initialised)
data between _edata and _end rather than how we find out how much space
we need for that (reading the image_size field).

 
  Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
  ---
   arch/arm64/kernel/Makefile   |  2 --
   arch/arm64/kernel/efi-stub.c | 29 -
   2 files changed, 16 insertions(+), 15 deletions(-)
 
  diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
  index cdaedad3afe5..99b676eeeb0f 100644
  --- a/arch/arm64/kernel/Makefile
  +++ b/arch/arm64/kernel/Makefile
  @@ -4,8 +4,6 @@
 
   CPPFLAGS_vmlinux.lds := -DTEXT_OFFSET=$(TEXT_OFFSET)
   AFLAGS_head.o:= -DTEXT_OFFSET=$(TEXT_OFFSET)
  -CFLAGS_efi-stub.o:= -DTEXT_OFFSET=$(TEXT_OFFSET) \
  --I$(src)/../../../scripts/dtc/libfdt
 
   CFLAGS_REMOVE_ftrace.o = -pg
   CFLAGS_REMOVE_insn.o = -pg
  diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c
  index 9b61d66e2d20..4ba90b2ef677 100644
  --- a/arch/arm64/kernel/efi-stub.c
  +++ b/arch/arm64/kernel/efi-stub.c
  @@ -11,8 +11,7 @@
*/
   #include linux/efi.h
   #include asm/efi.h
  -#include asm/sections.h
  -
  +#include asm/image_hdr.h
 
   efi_status_t handle_kernel_image(efi_system_table_t *sys_table,
 unsigned long *image_addr,
  @@ -23,24 +22,28 @@ efi_status_t handle_kernel_image(efi_system_table_t 
  *sys_table,
 efi_loaded_image_t *image)
   {
efi_status_t status;
  - unsigned long kernel_size, kernel_memsize = 0;
  + struct arm64_image_hdr *hdr = (struct arm64_image_hdr *)*image_addr;
  +
  + /* make sure image_addr points to an arm64 kernel Image */
  + if (!arm64_image_hdr_check(hdr)) {
  + pr_efi_err(sys_table, Kernel Image header check failed\n);
  + return EFI_LOAD_ERROR;
  + }
 
  Surely this should always be the case if the stub is linked into the
  kernel?
 
  It would be nice to know the rationale for this.
 
 
 Well, there is an actual issue addressed by this: the PE/COFF .text
 section covers everything from stext to _edata, so it does not cover
 the header itself. In fact, PE/COFF does not allow the headers
 themselves to be covered by a section (or at least, the Tianocore/EDK2
 UEFI implementation does not). So the pointer points *outside* of the
 .text section, and we are assuming that whatever is at that [negative]
 offset in the file was also copied to the same offset in memory.(For
 instance, there is no reason to assume that all implementations of EFI
 will copy data that is not part of any section to RAM when booting an
 image located in NOR flash)

If we are making assumptions which aren't always valid, then why the
hell are we making them in the first place, and then trying to validate
them? How does reading an address that we have absolutely no guarantee

Re: [PATCH v2 1/3] Documentation: arm: add UEFI support documentation

2013-10-03 Thread Mark Rutland
Hi Leif,

On Thu, Oct 03, 2013 at 12:24:39PM +0100, Leif Lindholm wrote:
 This patch provides documentation of the [U]EFI runtime services and
 configuration features.
 
 Cc: linux-...@vger.kernel.org
 Signed-off-by: Leif Lindholm leif.lindh...@linaro.org
 Acked-by: Rob Landley r...@landley.net
 ---
  Documentation/arm/00-INDEX |3 +++
  Documentation/arm/uefi.txt |   47 
 
  2 files changed, 50 insertions(+)
  create mode 100644 Documentation/arm/uefi.txt
 
 diff --git a/Documentation/arm/00-INDEX b/Documentation/arm/00-INDEX
 index 4978456..87e01d1 100644
 --- a/Documentation/arm/00-INDEX
 +++ b/Documentation/arm/00-INDEX
 @@ -36,3 +36,6 @@ nwfpe/
   - NWFPE floating point emulator documentation
  swp_emulation
   - SWP/SWPB emulation handler/logging description
 +
 +uefi.txt
 + - [U]EFI configuration and runtime services documentation
 diff --git a/Documentation/arm/uefi.txt b/Documentation/arm/uefi.txt
 new file mode 100644
 index 000..e6e4d41
 --- /dev/null
 +++ b/Documentation/arm/uefi.txt
 @@ -0,0 +1,47 @@
 +UEFI, the Unified Extensible Firmware Interface is a speifcication
 +governing the behaviours of compatible firmware interfaces. It is
 +maintained by the UEFI Forum - http://www.uefi.org/.
 +
 +Since UEFI is an evolution of its predecessor 'EFI', the terms EFI and
 +UEFI are used somewhat interchangeably in this document and associated
 +source code.
 +
 +The implementation depends on receiving the UEFI runtime memory map and a
 +pointer to the System Table in a Flattened Device Tree - so is only available
 +with CONFIG_OF.
 +
 +It parses the FDT /chosen node for the following parameters:
 +- 'linux,efi-system-table':
 +  Physical address of the system table. (required)
 +  64-bit value since an ARMv7 plattform may support LPAE, and to facilitate

s/plattform/platform/

 +  code sharing with arm64. Top 32 bits will be ignored, since UEFI 
 specification
 +  mandates a 1:1 mapping of all RAM.

You could use something like #size-cells to describe how big this is
going to be. Is this Linux-specific -- it looks like something provided
by EFI rather than the kernel itself.

 +- 'linux,efi-mmap':
 +  The EFI memory map as an embedded property. (required)
 +  An array of type EFI_MEMORY_DESCRIPTOR as described by the UEFI
 +  specification, current version described in Linux by efi_memory_desc_t.
 +  The memory map is represented in little-endian, not DT, byte order.
 +  This map needs to contain at least the regions to be preserved for runtime
 +  services, but would normally just be the map retreieved by calling UEFI
 +  GetMemoryMap() immediately before ExitBootServices().

This is a little scary. If the format is so complicated, should it
really be embedded? How big is this likely to be?

Given that this is in a format defined externally, this isn't really
Linux-specific. Maybe we need an efi pseudo-vendor prefix.

 +- 'linux,efi-mmap-desc-size':
 +  Size of each descriptor in the memory map. (override default)

What units is this in? How many u32 cells does this take up (one
presumably)?

 +- 'linux,efi-mmap-desc-ver':
 +  Memory descriptor format version. (override default)

Type, format, valid values and their meaning?

Thanks,
Mark.

 +
 +It also depends on early_memremap() to parse the UEFI configuration tables.
 +
 +For actually enabling [U]EFI support, enable:
 +- CONFIG_EFI=y
 +- CONFIG_EFI_VARS=y or m
 +
 +After the kernel has mapped the required regions into its address space,
 +a SetVirtualAddressMap() call is made into UEFI in order to update
 +relocations. This call must be performed with all the code in a 1:1
 +mapping. This implementation achieves this by temporarily disabling the
 +MMU for the duration of this call. This can only be done safely:
 +- before secondary CPUs are brought online.
 +- after early_initcalls have completed, since it uses setup_mm_for_reboot().
 +
 +For verbose debug messages, specify 'uefi_debug' on the kernel command
 +line.
 -- 
 1.7.10.4
 
 
` ___
 linux-arm-kernel mailing list
 linux-arm-ker...@lists.infradead.org
 http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
 
--
To unsubscribe from this list: send the line unsubscribe linux-efi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html