Re: arm64/efistub boot error with CONFIG_GCC_PLUGIN_STACKLEAK
On Thu, Aug 15, 2019 at 02:21:26PM +0300, Ard Biesheuvel wrote: > On Thu, 15 Aug 2019 at 14:03, Mark Rutland wrote: > > On Thu, Aug 15, 2019 at 05:56:27AM -0400, skodde wrote: > > > Hi, > > > > > > I've enabled CONFIG_GCC_PLUGIN_STACKLEAK on 5.2.8 for an arm64 > > > macchiatobin board and I get the following error when loading the > > > kernel (using grub-efi on top of edk ii): > > > > > > EFI stub: Booting Linux Kernel... > > > EFI stub: ERROR: efi_get_random_bytes() failed > > > EFI stub: ERROR: Failed to relocate kernel > > > > > > The kernel boots fine with that option disabled, but strangely > > > presents the same error when disabling only CONFIG_RANDOMIZE_BASE. > > > > That shouldn't be possible, given the IS_ENABLED(CONFIG_RANDOMIZE_BASE) > > guard around the efi_get_random_bytes() call, so something sounds wrong. > > > > Are you certain that you're running the same kernel Image that you > > rebuilt? > > > > Ard, do you reckon it would be worth adding the UTS_RELEASE and > > UTS_VERSION to the " Booting Linux Kernel..." string? It would make > > debugging that potential issue easier. > > Use of the UTS_xxx macros already triggers an annoying number of > object rebuilds every time you change anything entirely unrelated in > your kernel sources, so I'd prefer to avoid this tbh. Fair enough; saves me writing a patch! :) Mark.
Re: arm64/efistub boot error with CONFIG_GCC_PLUGIN_STACKLEAK
On Thu, Aug 15, 2019 at 05:56:27AM -0400, skodde wrote: > Hi, > > I've enabled CONFIG_GCC_PLUGIN_STACKLEAK on 5.2.8 for an arm64 > macchiatobin board and I get the following error when loading the > kernel (using grub-efi on top of edk ii): > > EFI stub: Booting Linux Kernel... > EFI stub: ERROR: efi_get_random_bytes() failed > EFI stub: ERROR: Failed to relocate kernel > > The kernel boots fine with that option disabled, but strangely > presents the same error when disabling only CONFIG_RANDOMIZE_BASE. That shouldn't be possible, given the IS_ENABLED(CONFIG_RANDOMIZE_BASE) guard around the efi_get_random_bytes() call, so something sounds wrong. Are you certain that you're running the same kernel Image that you rebuilt? Ard, do you reckon it would be worth adding the UTS_RELEASE and UTS_VERSION to the " Booting Linux Kernel..." string? It would make debugging that potential issue easier. > Let me know if I can provide more info or do some tests. Maybe there's a problem with stale objects. If you're not doing so already, could you try a clean build with CONFIG_RANDOMIZE_BASE deselected? Thanks, Mark.
Re: [PATCH v8 2/5] arm64: replace -pg with CC_FLAGS_FTRACE in efi Makefiles
On Fri, Feb 08, 2019 at 04:10:11PM +0100, Torsten Duwe wrote: > In preparation for arm64 supporting ftrace built on other compiler > options, let's have makefiles remove the $(CC_FLAGS_FTRACE) > flags, whatever these may be, rather than assuming '-pg'. > While at it, fix arm32 as well. > > There should be no functional change as a result of this patch. > > Signed-off-by: Torsten Duwe Ard, would you be happy if we were to take this via the arm64 tree? Assuming so, can we have your ack? Thanks, Mark. > > --- > drivers/firmware/efi/libstub/Makefile |6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > --- a/drivers/firmware/efi/libstub/Makefile > +++ b/drivers/firmware/efi/libstub/Makefile > @@ -16,9 +16,9 @@ cflags-$(CONFIG_X86)+= -m$(BITS) -D__K > > # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly > # disable the stackleak plugin > -cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie > \ > -$(DISABLE_STACKLEAK_PLUGIN) > -cflags-$(CONFIG_ARM) := $(subst -pg,,$(KBUILD_CFLAGS)) \ > +cflags-$(CONFIG_ARM64) := $(subst > $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \ > +-fpie $(DISABLE_STACKLEAK_PLUGIN) > +cflags-$(CONFIG_ARM) := $(subst > $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \ > -fno-builtin -fpic \ > $(call cc-option,-mno-single-pic-base) >
Re: [PATCH v8 2/5] arm64: replace -pg with CC_FLAGS_FTRACE in efi Makefiles
On Fri, Feb 08, 2019 at 04:10:11PM +0100, Torsten Duwe wrote: > In preparation for arm64 supporting ftrace built on other compiler > options, let's have makefiles remove the $(CC_FLAGS_FTRACE) > flags, whatever these may be, rather than assuming '-pg'. > While at it, fix arm32 as well. > > There should be no functional change as a result of this patch. > > Signed-off-by: Torsten Duwe With the indentation removed from the commit text: Reviewed-by: Mark Rutland Mark. > > --- > drivers/firmware/efi/libstub/Makefile |6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > --- a/drivers/firmware/efi/libstub/Makefile > +++ b/drivers/firmware/efi/libstub/Makefile > @@ -16,9 +16,9 @@ cflags-$(CONFIG_X86)+= -m$(BITS) -D__K > > # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly > # disable the stackleak plugin > -cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie > \ > -$(DISABLE_STACKLEAK_PLUGIN) > -cflags-$(CONFIG_ARM) := $(subst -pg,,$(KBUILD_CFLAGS)) \ > +cflags-$(CONFIG_ARM64) := $(subst > $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \ > +-fpie $(DISABLE_STACKLEAK_PLUGIN) > +cflags-$(CONFIG_ARM) := $(subst > $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \ > -fno-builtin -fpic \ > $(call cc-option,-mno-single-pic-base) >
Re: [PATCH] efi/libstub/arm64: handle randomized TEXT_OFFSET
On Tue, Apr 24, 2018 at 01:11:40PM +0200, Ard Biesheuvel wrote: > Hi Mark, > > On 24 April 2018 at 13:00, Mark Rutland <mark.rutl...@arm.com> wrote: > > When CONFIG_RANDOMIZE_TEXT_OFFSET is selected, TEXT_OFFSET is an > > arbitrary multiple of PAGE_SIZE in the interval [0, 2MB). > > > > The EFI stub doesn't accuont for this, and only handles the case where > > 'account' > > If you agree, I will add something here to clarify that the newly > chosen offset should retain the misalignment of TEXT_OFFSET relative > to EFI_KIMG_ALIGN, because it took me a while to figure that out. That makes sense to me. Sorry for the clumsy wording! > > Other than that, > > Reviewed-by: Ard Biesheuvel <ard.biesheu...@linaro.org> > > I will queue this as a fix Great; thanks! Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services
On Mon, Mar 05, 2018 at 03:23:10PM -0800, Sai Praneeth Prakhya wrote: > From: Sai Praneeth <sai.praneeth.prak...@intel.com> > > Presently, efi_runtime_services() are executed by firmware in process > context. To execute efi_runtime_service(), kernel switches the page > directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any > user space mappings. A potential issue could be, for instance, an NMI > interrupt (like perf) trying to profile some user data while in efi_pgd. It might be worth pointing out that this could result in disaster (e.g. if the frame pointer happens to point at MMIO in the EFI runtime services mappings). [...] > pstore writes could potentially be invoked in interrupt context and it > uses set_variable<>() and query_variable_info<>() to store logs. If we > invoke efi_runtime_services() through efi_rts_wq while in atomic() > kernel issues a warning ("scheduling wile in atomic") and prints stack > trace. One way to overcome this is to not make the caller process wait > for the worker thread to finish. This approach breaks pstore i.e. the > log messages aren't written to efi variables. Hence, pstore calls > efi_runtime_services() without using efi_rts_wq or in other words > efi_rts_wq will be used unconditionally for all the > efi_runtime_services() except set_variable<>() and > query_variable_info<>() Can't NMIs still come in during this? ... or do we assume that since something has already gone wrong, this doesn't matter? Otherwise, this doesn't seem to break basic stuff on arm64 platforms. I can boot up, read the EFI RTC, and reboot. I ahd lockdep and KASAN enabled, I received no splats from either. FWIW: Tested-by: Mark Rutland <mark.rutl...@arm.com [arm64] Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()
On Mon, Mar 05, 2018 at 03:23:09PM -0800, Sai Praneeth Prakhya wrote: > @@ -329,6 +331,19 @@ static int __init efisubsys_init(void) > return 0; > > /* > + * Since we process only one efi_runtime_service() at a time, an > + * ordered workqueue (which creates only one execution context) > + * should suffice all our needs. > + */ > + efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0); > + if (!efi_rts_wq) { > + pr_err("Failed to create efi_rts_workqueue, EFI runtime > services " > +"disabled.\n"); > + clear_bit(EFI_RUNTIME_SERVICES, ); > + return 0; > + } I'm a little worried that something might sample this flag between it being set in an early_initcall (arm_enable_runtime_services), and cleared in a subsys_initcall here. However, nothing seems to do that so far, so maybe that's ok... [...] > +/* efi_runtime_service() function identifiers */ > +enum { > + GET_TIME, > + SET_TIME, > + GET_WAKEUP_TIME, > + SET_WAKEUP_TIME, > + GET_VARIABLE, > + GET_NEXT_VARIABLE, > + SET_VARIABLE, > + SET_VARIABLE_NONBLOCKING, > + QUERY_VARIABLE_INFO, > + QUERY_VARIABLE_INFO_NONBLOCKING, > + GET_NEXT_HIGH_MONO_COUNT, > + RESET_SYSTEM, > + UPDATE_CAPSULE, > + QUERY_CAPSULE_CAPS, > +}; Can we please give this enum a name [...] > +/* > + * efi_runtime_work: Details of EFI Runtime Service work > + * @func:EFI Runtime Service function identifier > + * @arg<1-5>:EFI Runtime Service function arguments > + * @status: Status of executing EFI Runtime Service > + */ > +struct efi_runtime_work { > + u8 func; ... and use it here rather than an opaque u8? I realise that means placing the enum in . > + void *arg1; > + void *arg2; > + void *arg3; > + void *arg4; > + void *arg5; > + efi_status_t status; > + struct work_struct work; > +}; Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] efi/arm*: Only register page tables when they exist
Currently the arm/arm64 runtime code registers the runtime servies pagetables with ptdump regardless of whether runtime services page tables have been created. As efi_mm.pgd is NULL in these cases, attempting to dump the efi page tables results in a NULL pointer dereference in the ptdump code: /sys/kernel/debug# cat efi_page_tables [ 479.522600] Unable to handle kernel NULL pointer dereference at virtual address [ 479.522715] Mem abort info: [ 479.522764] ESR = 0x9606 [ 479.522850] Exception class = DABT (current EL), IL = 32 bits [ 479.522899] SET = 0, FnV = 0 [ 479.522937] EA = 0, S1PTW = 0 [ 479.528200] Data abort info: [ 479.528230] ISV = 0, ISS = 0x0006 [ 479.528317] CM = 0, WnR = 0 [ 479.528317] user pgtable: 4k pages, 48-bit VAs, pgd = 64ab0cb0 [ 479.528449] [] *pgd=fbbe4003, *pud=fb66e003, *pmd= [ 479.528600] Internal error: Oops: 9606 [#1] PREEMPT SMP [ 479.528664] Modules linked in: [ 479.528699] CPU: 0 PID: 2457 Comm: cat Not tainted 4.15.0-rc3-00065-g2ad2ee7ecb5c-dirty #7 [ 479.528799] Hardware name: FVP Base (DT) [ 479.528899] pstate: 0049 (nzcv daif +PAN -UAO) [ 479.528941] pc : walk_pgd.isra.1+0x20/0x1d0 [ 479.529011] lr : ptdump_walk_pgd+0x30/0x50 [ 479.529105] sp : 0bf4bc20 [ 479.529185] x29: 0bf4bc20 x28: 9d22e000 [ 479.529271] x27: 0002 x26: 80007b4c63c0 [ 479.529358] x25: 014000c0 x24: 80007c098900 [ 479.529445] x23: 0bf4beb8 x22: [ 479.529532] x21: 0bf4bd70 x20: 0001 [ 479.529618] x19: 0bf4bcb0 x18: [ 479.529760] x17: 0041a1c8 x16: 082139d8 [ 479.529800] x15: 9d3c6030 x14: 9d2527f4 [ 479.529924] x13: 03f3 x12: 0038 [ 479.53] x11: 0003 x10: 0101010101010101 [ 479.530099] x9 : 17e94050 x8 : 003f [ 479.530226] x7 : x6 : [ 479.530313] x5 : 0001 x4 : [ 479.530416] x3 : 09069fd8 x2 : [ 479.530500] x1 : x0 : [ 479.530599] Process cat (pid: 2457, stack limit = 0x5d1b0e6f) [ 479.530660] Call trace: [ 479.530746] walk_pgd.isra.1+0x20/0x1d0 [ 479.530833] ptdump_walk_pgd+0x30/0x50 [ 479.530907] ptdump_show+0x10/0x20 [ 479.530920] seq_read+0xc8/0x470 [ 479.531023] full_proxy_read+0x60/0x90 [ 479.531100] __vfs_read+0x18/0x100 [ 479.531180] vfs_read+0x88/0x160 [ 479.531267] SyS_read+0x48/0xb0 [ 479.531299] el0_svc_naked+0x20/0x24 [ 479.531400] Code: 91400420 f90033a0 a90707a2 f9403fa0 (f940) [ 479.531499] ---[ end trace bfe8e28d8acb2b67 ]--- Segmentation fault Let's avoid this problem by only registering the tables after their successful creation, which is also less confusing when EFI runtime services are not in use. Signed-off-by: Mark Rutland <mark.rutl...@arm.com> Reported-by: Will Deacon <will.dea...@arm.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: linux-efi@vger.kernel.org Cc: linux-ker...@vger.kernel.org --- drivers/firmware/efi/arm-runtime.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c index 1cc41c3d6315..86a1ad17a32e 100644 --- a/drivers/firmware/efi/arm-runtime.c +++ b/drivers/firmware/efi/arm-runtime.c @@ -54,6 +54,9 @@ static struct ptdump_info efi_ptdump_info = { static int __init ptdump_init(void) { + if (!efi_enabled(EFI_RUNTIME_SERVICES)) + return 0; + return ptdump_debugfs_register(_ptdump_info, "efi_page_tables"); } device_initcall(ptdump_init); -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND PATCH v4 2/2] arm64: Add software workaround for Falkor erratum 1041
Hi, On Sun, Dec 10, 2017 at 08:03:43PM -0600, Shanker Donthineni wrote: > +/** > + * Errata workaround prior to disable MMU. Insert an ISB immediately prior > + * to executing the MSR that will change SCTLR_ELn[M] from a value of 1 to 0. > + */ > + .macro pre_disable_mmu_workaround > +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041 > +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041 > + isb > +alternative_else_nop_endif > +#endif > + .endm There's really no need for this to be an alternative. It makes the kernel larger and more complex due to all the altinstr data and probing code. As Will suggested last time [1], please just use the ifdef, and always compile-in the extra ISB if CONFIG_QCOM_FALKOR_ERRATUM_E1041 is selected. Get rid of the alternatives and probing code. All you need here is: /* * Some Falkor parts make errant speculative instruction fetches * when SCTLR_ELx.M is cleared. An ISB before the write to * SCTLR_ELx prevents this. */ .macro pre_disable_mmu_workaround #ifdef isb #endif .endm > + > + .macro pre_disable_mmu_early_workaround > +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041 > + isb > +#endif > + .endm > + ... and we don't need a special early variant. Thanks, Mark. [1] https://lkml.kernel.org/r/20171201112457.ge18...@arm.com -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 19/27] x86: assembly, make some functions local
On Wed, Oct 25, 2017 at 04:21:48PM +0200, Jiri Slaby wrote: > Hi, > > On 10/06/2017, 03:21 PM, Mark Rutland wrote: > > If the aim of this series is to introduce something that architectures > > use consistently, then can we please actually poke other architectures > > about it? e.g. send this to linux-arch, with a cover letter explaining > > the idea and asking maintainers to take a look. > > Sure, with v5, I will. Thanks; that's much appreciated, and I look forward to it! Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 19/27] x86: assembly, make some functions local
Hi Jiri, I can see that these serve a useful purpose (as they are necessary for asm validation encessary for livepatching), and I am not personally averse to the new annotations. However, I am concerned that as-is, this is going to create more problems for !x86 architectures. More on that below. On Fri, Oct 06, 2017 at 02:53:08PM +0200, Jiri Slaby wrote: > On 10/04/2017, 09:33 AM, Ard Biesheuvel wrote: > > On 4 October 2017 at 08:22, Jiri Slabywrote: > >> On 10/02/2017, 02:48 PM, Ard Biesheuvel wrote: > >>> On 2 October 2017 at 10:12, Jiri Slaby wrote: > There is a couple of assembly functions, which are invoked only locally > in the file they are defined. In C, we mark them "static". In assembly, > annotate them using SYM_{FUNC,CODE}_START_LOCAL (and switch their > ENDPROC to SYM_{FUNC,CODE}_END too). Whether FUNC or CODE depends on > ENDPROC/END for a particular function (C or non-C). > >>> > >>> I wasn't cc'ed on the cover letter, so I am missing the rationale of > >>> replacing ENTRY/ENDPROC with other macros. > >> > >> There was no cover letter. I am attaching what is in PATCH 1/27 instead: > >> Introduce new C macros for annotations of functions and data in > >> assembly. There is a long-standing mess in macros like ENTRY, END, > >> ENDPROC and similar. They are used in different manners and sometimes > >> incorrectly. > > > > I must say, I don't share this sentiment. > > > > In arm64, we use ENTRY/ENDPROC for functions with external linkage, > > and the bare symbol name/ENDPROC for functions with local linkage. I > > guess we could add ENDOBJECT if we wanted to, but we never really felt > > the need. > > Yes and this is exactly the reason for the new macros. Now, it's a > complete mess. One arch does this, another does that. If the aim of this series is to introduce something that architectures use consistently, then can we please actually poke other architectures about it? e.g. send this to linux-arch, with a cover letter explaining the idea and asking maintainers to take a look. I think one reason that ENTRY/END/ENDPROC are used inconsistently is that they're insufficiently documented. So people assume (inconsistent) semantics, and often cargo-cult usage without thinking. To avoid that, could we please document how these new macros should (and should not) be used? That way, we have a much better chance of consistency, and it's easier to figure out if the intended semantics are necessary/sufficient. Otherwise, I'm worried that bits of this will be inconsistently and incorrectly cargo-culted into other architectures, making matters worse. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with cr3
On Wed, Aug 16, 2017 at 11:07:10AM +0100, Will Deacon wrote: > On Wed, Aug 16, 2017 at 10:53:38AM +0100, Mark Rutland wrote: > > On Wed, Aug 16, 2017 at 10:31:12AM +0100, Ard Biesheuvel wrote: > > > (+ Mark, Will) > > > > > > On 15 August 2017 at 22:46, Andy Lutomirski <l...@kernel.org> wrote: > > > > On Tue, Aug 15, 2017 at 12:18 PM, Sai Praneeth Prakhya > > > > <sai.praneeth.prak...@intel.com> wrote: > > > >> +/* > > > >> + * Makes the calling kernel thread switch to/from efi_mm context > > > >> + * Can be used from SetVirtualAddressMap() or during efi runtime calls > > > >> + * (Note: This routine is heavily inspired from use_mm) > > > >> + */ > > > >> +void efi_switch_mm(struct mm_struct *mm) > > > >> +{ > > > >> + struct task_struct *tsk = current; > > > >> + > > > >> + task_lock(tsk); > > > >> + efi_scratch.prev_mm = tsk->active_mm; > > > >> + if (efi_scratch.prev_mm != mm) { > > > >> + mmgrab(mm); > > > >> + tsk->active_mm = mm; > > > >> + } > > > >> + switch_mm(efi_scratch.prev_mm, mm, NULL); > > > >> + task_unlock(tsk); > > > >> + > > > >> + if (efi_scratch.prev_mm != mm) > > > >> + mmdrop(efi_scratch.prev_mm); > > > > > > > > I'm confused. You're mmdropping an mm that you are still keeping a > > > > pointer to. This is also a bit confusing in the case where you do > > > > efi_switch_mm(efi_scratch.prev_mm). > > > > > > > > This whole manipulation seems fairly dangerous to me for another > > > > reason -- you're taking a user thread (I think) and swapping out its > > > > mm to something that the user in question should *not* have access to. > > > > What if a perf interrupt happens while you're in the alternate mm? > > > > What if you segfault and dump core? Should we maybe just have a flag > > > > that says "this cpu is using a funny mm", assert that the flag is > > > > clear when scheduling, and teach perf, coredumps, etc not to touch > > > > user memory when the flag is set? > > > > > > It appears we may have introduced this exact issue on arm64 and ARM by > > > starting to run the UEFI runtime services with interrupts enabled. > > > (perf does not use NMI on ARM, so the issue did not exist beforehand) > > > > > > Mark, Will, any thoughts? > > > > Yup, I can cause perf to take samples from the EFI FW code, so that's > > less than ideal. > > But that should only happen if you're profiling EL1, right, which needs > root privileges? (assuming the skid issue is solved -- not sure what > happened to those patches after they broke criu). I *think* that only needs perf_event_paranoid < 1, rather than root. It's certianly not accessible by default to most users (e.g. my Ubuntu fs sets this to 2, and IIRC Debian go to a much more stringent non-upstream paranoid level). > > The "funny mm" flag sounds like a good idea to me, though given recent > > pain with sampling in the case of skid, I don't know exactly what we > > should do if/when we take an overflow interrupt while in EFI. > > I don't think special-casing perf interrupts is the right thing to do here. > If we're concerned about user-accesses being made off the back of interrupts > taken whilst in EFI, then we should probably either swizzle back in the > user page table on the IRQ path or postpone handling it until we're done > with the firmware. Doing that for every IRQ feels odd, especially as the result would be sampling something that wasn't executed, potentially repeatedly, giveing bogus info. > Having a flag feels a bit weird: would the uaccess routines return > -EFAULT if it's set? I'd expect we'd abort at a higher level, not taking any sample. i.e. we'd have the core overflow handler check in_funny_mm(), and if so, skip the sample, as with the skid case. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with cr3
On Wed, Aug 16, 2017 at 10:31:12AM +0100, Ard Biesheuvel wrote: > (+ Mark, Will) > > On 15 August 2017 at 22:46, Andy Lutomirskiwrote: > > On Tue, Aug 15, 2017 at 12:18 PM, Sai Praneeth Prakhya > > wrote: > >> +/* > >> + * Makes the calling kernel thread switch to/from efi_mm context > >> + * Can be used from SetVirtualAddressMap() or during efi runtime calls > >> + * (Note: This routine is heavily inspired from use_mm) > >> + */ > >> +void efi_switch_mm(struct mm_struct *mm) > >> +{ > >> + struct task_struct *tsk = current; > >> + > >> + task_lock(tsk); > >> + efi_scratch.prev_mm = tsk->active_mm; > >> + if (efi_scratch.prev_mm != mm) { > >> + mmgrab(mm); > >> + tsk->active_mm = mm; > >> + } > >> + switch_mm(efi_scratch.prev_mm, mm, NULL); > >> + task_unlock(tsk); > >> + > >> + if (efi_scratch.prev_mm != mm) > >> + mmdrop(efi_scratch.prev_mm); > > > > I'm confused. You're mmdropping an mm that you are still keeping a > > pointer to. This is also a bit confusing in the case where you do > > efi_switch_mm(efi_scratch.prev_mm). > > > > This whole manipulation seems fairly dangerous to me for another > > reason -- you're taking a user thread (I think) and swapping out its > > mm to something that the user in question should *not* have access to. > > What if a perf interrupt happens while you're in the alternate mm? > > What if you segfault and dump core? Should we maybe just have a flag > > that says "this cpu is using a funny mm", assert that the flag is > > clear when scheduling, and teach perf, coredumps, etc not to touch > > user memory when the flag is set? > > It appears we may have introduced this exact issue on arm64 and ARM by > starting to run the UEFI runtime services with interrupts enabled. > (perf does not use NMI on ARM, so the issue did not exist beforehand) > > Mark, Will, any thoughts? Yup, I can cause perf to take samples from the EFI FW code, so that's less than ideal. The "funny mm" flag sounds like a good idea to me, though given recent pain with sampling in the case of skid, I don't know exactly what we should do if/when we take an overflow interrupt while in EFI. Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi: arm: Don't mark ACPI reclaim memory as MEMBLOCK_NOMAP
On Mon, Jun 05, 2017 at 08:04:35AM +, Ard Biesheuvel wrote: > On ARM, regions of memory that are described by UEFI as having special > significance to the firmware itself are omitted from the linear mapping. > This is necessary since we cannot guarantee that alternate mappings of > the same physical region will use attributes that are compatible with > the ones we use for the linear mapping, and aliases with mismatched > attributes are prohibited by the architecture. > > The above does not apply to ACPI reclaim regions: such regions have no > special significance to the firmware, and it is up to the OS to decide > whether or not to preserve them after it has consumed their contents, > and for how long, after which time the OS can use the memory in any way > it likes. In the Linux case, such regions are preserved indefinitely, > and are simply treated the same way as other 'reserved' memory types. > > Punching holes into the linear mapping causes page table fragmentation, > which increases TLB pressure, and so we should avoid doing so if we can. > So add a special case for regions of type EFI_ACPI_RECLAIM_MEMORY, and > memblock_reserve() them instead of marking them MEMBLOCK_NOMAP. > > Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> Sounds sane to me. FWIW: Acked-by: Mark Rutland <mark.rutl...@arm.com> Thanks, Mark. > --- > drivers/firmware/efi/arm-init.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c > index 1027d7b44358..0aa4ce7b4fbb 100644 > --- a/drivers/firmware/efi/arm-init.c > +++ b/drivers/firmware/efi/arm-init.c > @@ -159,6 +159,7 @@ static __init int is_usable_memory(efi_memory_desc_t *md) > switch (md->type) { > case EFI_LOADER_CODE: > case EFI_LOADER_DATA: > + case EFI_ACPI_RECLAIM_MEMORY: > case EFI_BOOT_SERVICES_CODE: > case EFI_BOOT_SERVICES_DATA: > case EFI_CONVENTIONAL_MEMORY: > @@ -211,6 +212,10 @@ static __init void reserve_regions(void) > > if (!is_usable_memory(md)) > memblock_mark_nomap(paddr, size); > + > + /* keep ACPI reclaim memory intact for kexec etc. */ > + if (md->type == EFI_ACPI_RECLAIM_MEMORY) > + memblock_reserve(paddr, size); > } > } > } > -- > 2.9.3 > -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/libstub: arm/arm64: don't use TASK_SIZE when randomising the RT space
On Mon, Apr 10, 2017 at 11:30:38AM +0100, Ard Biesheuvel wrote: > As reported by James, Catalin and Mark, commit e69176d68d26 > ("ef/libstub/arm/arm64: Randomize the base of the UEFI rt services > region") results in a crash in the firmware regardless of whether KASLR > is in effect or not, and whether the firmware implements EFI_RNG_PROTOCOL > or not. > > Mark has identified the root cause to be the inappropriate use of > TASK_SIZE in the stub, which arm64 defines as > > #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \ > TASK_SIZE_32 : TASK_SIZE_64) > > and testing thread flags at this point results in the dereference of > pointers in uninitialized structures. > > So instead, introduce a preprocessor symbol EFI_RT_VIRTUAL_LIMIT and > define it to TASK_SIZE_64 on arm64 and TASK_SIZE on ARM, both of which > are compile time constants. Also, change the 'headroom' variable to > static const to force an error if this changes in the future. > > Cc: James Morse <james.mo...@arm.com> > Cc: Mark Rutland <mark.rutl...@arm.com> > Cc: Catalin Marinas <catalin.mari...@arm.com> > Cc: Matt Fleming <m...@codeblueprint.co.uk> > Cc: Ingo Molnar <mi...@kernel.org> > Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> With this patch applied atop of next-20170410, a defconfig arm64 kernel built with the Linaro 15.08 toolchain boots happily for me on Juno R1. So FWIW: Tested-by: Mark Rutland <mark.rutl...@arm.com> Thanks, Mark. > --- > > Apologies for the breakage. On the systems I have tested, sp_el0 apparently > pointed somewhere sane when I inadvertently dereferenced it, and the > resulting addresses looked sufficiently random to me. > > Ingo, once we have some confirmation that this makes the problem go away, > could you please take this straight into efi/core? Thanks. > > drivers/firmware/efi/libstub/arm-stub.c | 11 +-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/firmware/efi/libstub/arm-stub.c > b/drivers/firmware/efi/libstub/arm-stub.c > index 1e45ec51b094..34010ff3b77e 100644 > --- a/drivers/firmware/efi/libstub/arm-stub.c > +++ b/drivers/firmware/efi/libstub/arm-stub.c > @@ -32,6 +32,12 @@ > #define EFI_RT_VIRTUAL_BASE SZ_512M > #define EFI_RT_VIRTUAL_SIZE SZ_512M > > +#ifdef CONFIG_ARM64 > +#define EFI_RT_VIRTUAL_LIMIT TASK_SIZE_64 > +#else > +#define EFI_RT_VIRTUAL_LIMIT TASK_SIZE > +#endif > + > static u64 virtmap_base = EFI_RT_VIRTUAL_BASE; > > efi_status_t efi_open_volume(efi_system_table_t *sys_table_arg, > @@ -236,8 +242,9 @@ unsigned long efi_entry(void *handle, efi_system_table_t > *sys_table, >* shift of 21 bit positions into account when scaling >* the headroom value using a 32-bit random value. >*/ > - u64 headroom = TASK_SIZE - EFI_RT_VIRTUAL_BASE - > -EFI_RT_VIRTUAL_SIZE; > + static const u64 headroom = EFI_RT_VIRTUAL_LIMIT - > + EFI_RT_VIRTUAL_BASE - > + EFI_RT_VIRTUAL_SIZE; > u32 rnd; > > status = efi_get_random_bytes(sys_table, sizeof(rnd), > -- > 2.9.3 > -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] ef/libstub: arm/arm64: randomize the base of the UEFI rt services region
On Fri, Apr 07, 2017 at 04:51:16PM +0100, Ard Biesheuvel wrote: > On 7 April 2017 at 16:47, James Morsewrote: > > On 24/03/17 13:24, Ard Biesheuvel wrote: > >> Update the allocation logic for the virtual mapping of the UEFI runtime > >> services to start from a randomized base address if KASLR is in effect, > >> and if the UEFI firmware exposes an implementation of EFI_RNG_PROTOCOL. > >> > >> This makes it more difficult to predict the location of exploitable > >> data structures in the runtime UEFI firmware, which increases robustness > >> against attacks. Note that these regions are only mapped during the > >> time a runtime service call is in progress, and only on a single CPU > >> at a time, bit give the lack of a downside, let's enable it nonetheless. > > > > With next-20170407 on Seattle Overdrive, I get an SError[0] on boot: > > * Revert "ef/libstub/arm/arm64: Randomize the base of the UEFI rt services > > region" > > > > At which point the machine start booting to a prompt again, (its noisier > > than > > usual but looks like double-printing). > That is quite interesting, to be honest, because that patch should > effectively be a NOP on systems that do not implement > EFI_RNG_PROTOCOL. FWIW, I'm also seeing a crash as a result of this patch, on a Juno R1 with an EFI_RNG_PROTOCOL implementation, with next-20170410 defconfig. I'm using the Linaro 15.08 toolchain. EFI stub: Booting Linux Kernel... EFI stub: Using DTB from configuration table WARNING: this system is using an unsafe pseudo-random implementation of RngLib! Synchronous Exception at 0xF80FC7E8 If I replace the !nokaslr() in efi_entry() with false, my kernel boots. If I replace !nokaslr() with true, it fails to boot, even with the call to efi_get_random_bytes() replaced with status = EFI_ERROR. The problem appears to be the use of TASK_SIZE, since on arm64 that's: #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \ TASK_SIZE_32 : TASK_SIZE_64) ... which would mean that the stub would be trying to dereference whatever it found in sp_el0. Looking at objdump: 0160 : ... 4b4: 9400bl 0 4b8: 35000280cbnzw0, 508 4bc: d5384100mrs x0, sp_el0 4c0: f944ldr x4, [x0] ... So I think we should revert this for now. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64: xen: Implement EFI reset_system callback
[Adding the EFI maintainers] Tl;DR: Xen's EFI wrappery doesn't implement reset_system, so when invoked on arm64 we get a NULL dereference. On Thu, Apr 06, 2017 at 04:39:13PM +0100, Julien Grall wrote: > On 06/04/17 16:20, Daniel Kiper wrote: > >On Thu, Apr 06, 2017 at 04:38:24PM +0200, Juergen Gross wrote: > >>On 06/04/17 16:27, Daniel Kiper wrote: > >>>On Thu, Apr 06, 2017 at 09:32:32AM +0100, Julien Grall wrote: > Hi Juergen, > > On 06/04/17 07:23, Juergen Gross wrote: > >On 05/04/17 21:49, Boris Ostrovsky wrote: > >>On 04/05/2017 02:14 PM, Julien Grall wrote: > >>>The x86 code has theoritically a similar issue, altought EFI does not > >>>seem to be the preferred method. I have left it unimplemented on x86 > >>>and > >>>CCed Linux Xen x86 maintainers to know their view here. > >> > >>(+Daniel) > >> > >>This could be a problem for x86 as well, at least theoretically. > >>xen_machine_power_off() may call pm_power_off(), which is > >>efi.reset_system. > >> > >>So I think we should have a similar routine there. > > > >+1 > > > >I don't see any problem with such a routine added, in contrast to > >potential "reboots" instead of power off without it. > > > >So I think this dummy xen_efi_reset_system() should be added to > >drivers/xen/efi.c instead. > > I will resend the patch during day with xen_efi_reset_system moved > to common code and implement the x86 counterpart (thought, I will > not be able to test it). > >>> > >>>I think that this is ARM specific issue. On x86 machine_restart() calls > >>>xen_restart(). Hence, everything works. So, I think that it should be > >>>fixed only for ARM. Anyway, please CC me when you send a patch. > >> > >>What about xen_machine_power_off() (as stated in Boris' mail)? > > > >Guys what do you think about that: > > > >--- a/drivers/firmware/efi/reboot.c > >+++ b/drivers/firmware/efi/reboot.c > >@@ -55,7 +55,7 @@ static void efi_power_off(void) > > > > static int __init efi_shutdown_init(void) > > { > >- if (!efi_enabled(EFI_RUNTIME_SERVICES)) > >+ if (!efi_enabled(EFI_RUNTIME_SERVICES) || efi_enabled(EFI_PARAVIRT)) > >return -ENODEV; > > > >if (efi_poweroff_required()) > > > > > >Julien, for ARM64 please take a look at > >arch/arm64/kernel/efi.c:efi_poweroff_required(void). > > > >I hope that tweaks for both files should solve our problem. > > This sounds good for power off (I haven't tried to power off DOM0 > yet). Please, let's keep the Xen knowledge constrained to the Xen EFI wrapper, rather than spreading it further. IMO, given reset_system is a *mandatory* function, the Xen wrapper should provide an implementation. I don't see why you can't implement a wrapper that calls the usual Xen poweroff/reset functions. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] efi/libstub: arm/arm64: disable debug prints on 'quiet' cmdline arg
On Fri, Mar 24, 2017 at 01:24:09PM +, Ard Biesheuvel wrote: > The EFI stub currently prints a number of diagnostic messages that do > not carry a lot of information. Since these prints are not controlled > by 'loglevel' or other command line parameters, and since they appear on > the EFI framebuffer as well (if enabled), it would be nice if we could > turn them off. > > So let's add support for the 'quiet' command line parameter in the stub, > and disable the non-error prints if it is passed. > > Cc: Matt Fleming <m...@codeblueprint.co.uk> > Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> > --- > This was previously sent out as a separate patch. The difference is that this > version looks for 'quiet' rather than 'efi=debug', and preserves the noisy > console as the default. This addresses my prior concern, so FWIW: Acked-by: Mark Rutland <mark.rutl...@arm.com> Mark. > drivers/firmware/efi/libstub/arm-stub.c| 12 ++-- > drivers/firmware/efi/libstub/arm32-stub.c | 2 ++ > drivers/firmware/efi/libstub/efi-stub-helper.c | 9 + > drivers/firmware/efi/libstub/efistub.h | 7 +++ > drivers/firmware/efi/libstub/secureboot.c | 2 ++ > include/linux/efi.h| 3 --- > 6 files changed, 26 insertions(+), 9 deletions(-) > > diff --git a/drivers/firmware/efi/libstub/arm-stub.c > b/drivers/firmware/efi/libstub/arm-stub.c > index fcd34057dc1c..6f522e3091af 100644 > --- a/drivers/firmware/efi/libstub/arm-stub.c > +++ b/drivers/firmware/efi/libstub/arm-stub.c > @@ -116,8 +116,6 @@ unsigned long efi_entry(void *handle, efi_system_table_t > *sys_table, > if (sys_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE) > goto fail; > > - pr_efi(sys_table, "Booting Linux Kernel...\n"); > - > status = check_platform_features(sys_table); > if (status != EFI_SUCCESS) > goto fail; > @@ -151,6 +149,12 @@ unsigned long efi_entry(void *handle, efi_system_table_t > *sys_table, > goto fail; > } > > + status = efi_parse_options(cmdline_ptr); > + if (status != EFI_SUCCESS) > + pr_efi_err(sys_table, "Failed to parse EFI cmdline options\n"); > + > + pr_efi(sys_table, "Booting Linux Kernel...\n"); > + > si = setup_graphics(sys_table); > > status = handle_kernel_image(sys_table, image_addr, _size, > @@ -162,10 +166,6 @@ unsigned long efi_entry(void *handle, efi_system_table_t > *sys_table, > goto fail_free_cmdline; > } > > - status = efi_parse_options(cmdline_ptr); > - if (status != EFI_SUCCESS) > - pr_efi_err(sys_table, "Failed to parse EFI cmdline options\n"); > - > secure_boot = efi_get_secureboot(sys_table); > > /* > diff --git a/drivers/firmware/efi/libstub/arm32-stub.c > b/drivers/firmware/efi/libstub/arm32-stub.c > index 18a8b5eb55e7..becbda445913 100644 > --- a/drivers/firmware/efi/libstub/arm32-stub.c > +++ b/drivers/firmware/efi/libstub/arm32-stub.c > @@ -9,6 +9,8 @@ > #include > #include > > +#include "efistub.h" > + > efi_status_t check_platform_features(efi_system_table_t *sys_table_arg) > { > int block; > diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c > b/drivers/firmware/efi/libstub/efi-stub-helper.c > index 1575b566cd4a..93685f79f9e8 100644 > --- a/drivers/firmware/efi/libstub/efi-stub-helper.c > +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c > @@ -33,11 +33,16 @@ > static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE; > > static int __section(.data) __nokaslr; > +static int __section(.data) __quiet; > > int __pure nokaslr(void) > { > return __nokaslr; > } > +int __pure is_quiet(void) > +{ > + return __quiet; > +} > > #define EFI_MMAP_NR_SLACK_SLOTS 8 > > @@ -428,6 +433,10 @@ efi_status_t efi_parse_options(char const *cmdline) > if (str == cmdline || (str && str > cmdline && *(str - 1) == ' ')) > __nokaslr = 1; > > + str = strstr(cmdline, "quiet"); > + if (str == cmdline || (str && str > cmdline && *(str - 1) == ' ')) > + __quiet = 1; > + > /* >* If no EFI parameters were specified on the cmdline we've got >* nothing to do. > diff --git a/drivers/firmware/efi/libstub/efistub.h > b/drivers/firmware/efi/libstub/efistub.h > index a7a2a2c3f199..83f268c05007 100644 > --- a/drivers/firmware/efi/libstub/efistub.h > +++ b/drivers/firmware/efi/libstub/efistub.h > @@ -25,6 +25,13 @@ &g
Re: [PATCH] efi/libstub: arm/arm64: make debug prints dependent on efi=debug
On Mon, Mar 20, 2017 at 08:57:05PM +, Ard Biesheuvel wrote: > The EFI stub currently prints a number of diagnostic messages that do > not carry a lot of information. Since these prints are not controlled > by 'loglevel' or other command line parameters, and since they appear on > the EFI framebuffer as well (if enabled), it would be nice if we could > turn them off. > > So let's only print them if 'efi=debug' is passed on the command line, > or when CONFIG_DEBUG_EFI is set in the kernel build configuration. > > Signed-off-by: Ard Biesheuvel> --- > drivers/firmware/efi/libstub/arm32-stub.c | 2 ++ > drivers/firmware/efi/libstub/efi-stub-helper.c | 23 ++- > drivers/firmware/efi/libstub/efistub.h | 9 + > drivers/firmware/efi/libstub/secureboot.c | 2 ++ > include/linux/efi.h| 3 --- > 5 files changed, 27 insertions(+), 12 deletions(-) > diff --git a/drivers/firmware/efi/libstub/efistub.h > b/drivers/firmware/efi/libstub/efistub.h > index 71c4d0e3c4ed..5ed16e6e166f 100644 > --- a/drivers/firmware/efi/libstub/efistub.h > +++ b/drivers/firmware/efi/libstub/efistub.h > @@ -24,6 +24,15 @@ > #define EFI_ALLOC_ALIGN EFI_PAGE_SIZE > #endif > > +extern int __pure debug_enabled(void); > + > +#define pr_efi(sys_table, msg) do { > \ > + if (debug_enabled()) efi_printk(sys_table, "EFI stub: "msg);\ > +} while (0) I was going to say that it's somewhat unfortunate to lose some of these messages by default, but I guess this only adds one additional round-trip when debugging a remote issue, and gives us the freedom to add more comprehensive debug messages in future. Is the idea is to make the EFI stub completely silent in the successful case, or is the aim just to minimise the number of messages? If it's only the latter, we could log a message regarding the existence of efi=debug in the default case, which might help with bug reports. No strong feelings from me. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] Xen: EFI: Parse DT parameters for Xen specific UEFI
On Tue, Nov 17, 2015 at 12:25:58PM +0100, Ard Biesheuvel wrote: > On 17 November 2015 at 10:57,wrote: > > From: Shannon Zhao > > > > Add a new function to parse DT parameters for Xen specific UEFI just > > like the way for normal UEFI. Then it could reuse the existing codes. > > > > Signed-off-by: Shannon Zhao > > --- > > drivers/firmware/efi/efi.c | 67 > > ++ > > 1 file changed, 62 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > > index d6144e3..629bd06 100644 > > --- a/drivers/firmware/efi/efi.c > > +++ b/drivers/firmware/efi/efi.c > > @@ -24,6 +24,7 @@ > > #include > > #include > > #include > > +#include > > > > struct efi __read_mostly efi = { > > .mps= EFI_INVALID_TABLE_ADDR, > > @@ -488,12 +489,60 @@ static __initdata struct { > > UEFI_PARAM("MemMap Desc. Version", "linux,uefi-mmap-desc-ver", > > desc_ver) > > }; > > > > +static __initdata struct { > > + const char name[32]; > > + const char propname[32]; > > + int offset; > > + int size; > > +} xen_dt_params[] = { > > + UEFI_PARAM("System Table", "xen,uefi-system-table", system_table), > > + UEFI_PARAM("MemMap Address", "xen,uefi-mmap-start", mmap), > > + UEFI_PARAM("MemMap Size", "xen,uefi-mmap-size", mmap_size), > > + UEFI_PARAM("MemMap Desc. Size", "xen,uefi-mmap-desc-size", > > desc_size), > > + UEFI_PARAM("MemMap Desc. Version", "xen,uefi-mmap-desc-ver", > > desc_ver) > > +}; > > + > > We discussed (and agreed afair) that dropping the "linux," prefix from > the DT properties was an acceptable change. If we do that, and reuse > the same names in the xen version (i.e., drop the "xen," prefix), we > could make this change a *lot* simpler. Regardless of if we drop the "linux," prefix from the existing strings, I think we need the "xen," prefix here. The xen EFI interface comes with additional caveats, and we need to treat it separately. Unless you're suggesting that /hypervisor/uefi-* is handled differently to /chosen/uefi-*? I think I'd still prefer the "xen," prefix regardless. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 3/3] arm64/efi: isolate EFI stub from the kernel proper
[Adding Catalin to the To line] On Fri, Oct 30, 2015 at 01:17:24PM +0100, Ard Biesheuvel wrote: > On 27 October 2015 at 23:44, Jeremy Lintonwrote: > > On 10/26/2015 09:20 PM, Ard Biesheuvel wrote: > >> > >> Thanks for the report. The following patch should fix it > > > > > > Ard, > > > > Thanks, it does fix the build problem... > > > > Now to get the darn thing to boot (not related to this AFAIK). > > > > Catalin, > > Would you like me to resend this as a separate patch? Ard's referring to the patch in a prior reply [1]. Mark. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-October/381074.html -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [v2] arm64: efi: make sure vmlinux load address aligned on 2MB
On Thu, Oct 29, 2015 at 11:59:46AM +0900, Ard Biesheuvel wrote: > On 29 October 2015 at 03:21, Mark Rutland <mark.rutl...@arm.com> wrote: > > On Wed, Oct 28, 2015 at 01:12:36PM -0500, Timur Tabi wrote: > >> On 10/28/2015 01:08 PM, Mark Rutland wrote: > >> > >> >arm64: efi: ensure kernel is loaded at correct address > >> > > >> >The kernel image needs to be loaded text_offset_bytes from a 2M-aligned > >> >base, per Documentation/arm64/booting.txt. If loaded at the wrong offset > >> >modulo 2M, __create_page_tables will create incorrect page tables. > >> > > >> >The EFI stub implicitly assumes that dram_base (i.e. the lowest address > >> >with a EFI_MEMORY_WB attribute) is 2M-aligned, and tries to load the > >> >kernel at dram_base + TEXT_OFFSET. If dram_base is not 2M-aligned, the > >> >kernel will be loaded at the wrong offset from 2M. > >> > >> Thanks, I'll use that. I messed up a couple other things, so I need > >> to send out a v3 anyway. > >> > >> >>- *image_addr = *reserve_addr = dram_base + TEXT_OFFSET; > >> >>+ *image_addr = *reserve_addr = > >> >>+ round_up(dram_base, SZ_2M) + TEXT_OFFSET; > >> > > >> >We also need to fix the test for whether we need to relocate the kernel: > >> >(*image_addr != (dram_base + TEXT_OFFSET)). > >> > > >> >When dram_base is not 2M aligned, that is broken, and it's been broken > >> >since it was introduced in commit 3c7f255039a2ad6e ("arm64: efi: add EFI > >> >stub") in v3.16. > >> > > >> >It's a bit hideous to fix the general case, though, it seems. > >> > >> Um, so I should I do something more in my v3 patch, or is this a > >> change for a different patch? > > > > I think there should be a single patch, but please hold off v3 for a day > > or so. I think there a few more edge cases here, and I'm currently > > investigating. > > > > Apologies for the drive-by nature of my contributions to this thread. > I am currently travelling. > > I think the below should address both issues (and I even tried to > compile it this time) > > -8<- > diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c > index 816120ece6bc..78dfbd34b6bf 100644 > --- a/arch/arm64/kernel/efi-stub.c > +++ b/arch/arm64/kernel/efi-stub.c > @@ -25,10 +25,20 @@ > unsigned long kernel_size, kernel_memsize = 0; > unsigned long nr_pages; > void *old_image_addr = (void *)*image_addr; > + unsigned long preferred_offset; > + > + /* > +* The preferred offset of the kernel Image is TEXT_OFFSET bytes > beyond > +* a 2 MB aligned base, which itself may be lower than dram_base, as > +* long as the resulting offset equals or exceeds it. > +*/ > + preferred_offset = round_down(dram_base, SZ_2M) + TEXT_OFFSET; > + if (preferred_offset < dram_base) > + preferred_offset += SZ_2M; > > /* Relocate the image, if required. */ > kernel_size = _edata - _text; > - if (*image_addr != (dram_base + TEXT_OFFSET)) { > + if (*image_addr != preferred_offset) { > kernel_memsize = kernel_size + (_end - _edata); > > /* > @@ -42,7 +52,7 @@ > * Mustang), we can still place the kernel at the address > * 'dram_base + TEXT_OFFSET'. > */ > - *image_addr = *reserve_addr = dram_base + TEXT_OFFSET; > + *image_addr = *reserve_addr = preferred_offset; > nr_pages = round_up(kernel_memsize, EFI_ALLOC_ALIGN) / >EFI_PAGE_SIZE; > status = efi_call_early(allocate_pages, EFI_ALLOCATE_ADDRESS, > This looks good to me, and I've given it a spin on Juno (though I haven't fiddled with dram_base). I trust that you will respin this as a patch when you get the chance. There is another (existing) problem I spotted, in that we'll sometimes move the kernel to a worse address. If the kernel was loaded at a valid address (i.e. image_addr % SZ_2MB == TEXT_OFFSET), but not at the preferred offset, we try to relocate it, even if it's already at the lowest possible address. That doesn't break boot, so it's not as big a problem, and it's probably better sovled with the split VA stuff. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64: efi: make sure vmlinux load address aligned on 2MBytes
On Tue, Oct 27, 2015 at 09:13:22PM -0500, Timur Tabi wrote: > Ard Biesheuvel wrote: > >No. I screwed up the patch since the trailing ) should not be there, > >but we do need to round first, and only then add the offset. > > But if the offset is not a multiple of 2MB, won't the address passed > to allocate_pages be unaligned? I'll test your patch on our system, > but I thought the problem was that the allocated address must be > aligned. The kernel needs to be loaded at an address which is text_offset bytes past a 2M-aligned base. It is not loaded at the 2M-aligned base itself. Ard's patch correctly findd a 2M-aligned base, then adds the text offset. Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64: efi: make sure vmlinux load address aligned on 2MBytes
On Wed, Oct 28, 2015 at 12:11:57PM -0500, Timur Tabi wrote: > On 10/27/2015 09:06 PM, Ard Biesheuvel wrote: > > > >diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c > >index 816120ece6bc..a60ce249cfc0 100644 > >--- a/arch/arm64/kernel/efi-stub.c > >+++ b/arch/arm64/kernel/efi-stub.c > >@@ -42,7 +42,8 @@ > > * Mustang), we can still place the kernel at the address > > * 'dram_base + TEXT_OFFSET'. > > */ > >- *image_addr = *reserve_addr = dram_base + TEXT_OFFSET; > >+ *image_addr = *reserve_addr = round_up(dram_base, SZ_2M) + > >+ TEXT_OFFSET); > > nr_pages = round_up(kernel_memsize, EFI_ALLOC_ALIGN) / > >EFI_PAGE_SIZE; > > status = efi_call_early(allocate_pages, > > EFI_ALLOCATE_ADDRESS, > > Tested-by: Timur Tabi <ti...@codeaurora.org> > Tested-by: Shanker Donthineni <shank...@codeaurora.org> I assume with the trailing ')' removed. ;) With that: Acked-by: Mark Rutland <mark.rutl...@arm.com> > However, I think this formatting is easier to read: > > *image_addr = *reserve_addr = > round_up(dram_base, SZ_2M) + TEXT_OFFSET; I have no strong feeling on this either way. > This does make the kernel boot, but we suspect that there may be > another problem. We need to investigate it, but we have a suspicion > that the EFI stub is trying to allocate from the Runtime Data block, > and the alignment adjustment "fixes" the problem by moving the > pointer to Conventional Memory. I don't follow. EFI_BOOT_SERVICES.AllocatePages() should only give us pages which are available, so it shouldn't ever return pages which are runtime data -- it would fail and we'd fall back to efi_low_alloc(). Could you elaborate? > Anyway, is there any chance we can get this fix into 4.3? I'd hate > to have 4.3 released knowing that it's broken on our hardware. It's probably too late now, but it should certainly be Cc'd stable and get backported. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [v2] arm64: efi: make sure vmlinux load address aligned on 2MB
On Wed, Oct 28, 2015 at 01:12:36PM -0500, Timur Tabi wrote: > On 10/28/2015 01:08 PM, Mark Rutland wrote: > > >arm64: efi: ensure kernel is loaded at correct address > > > >The kernel image needs to be loaded text_offset_bytes from a 2M-aligned > >base, per Documentation/arm64/booting.txt. If loaded at the wrong offset > >modulo 2M, __create_page_tables will create incorrect page tables. > > > >The EFI stub implicitly assumes that dram_base (i.e. the lowest address > >with a EFI_MEMORY_WB attribute) is 2M-aligned, and tries to load the > >kernel at dram_base + TEXT_OFFSET. If dram_base is not 2M-aligned, the > >kernel will be loaded at the wrong offset from 2M. > > Thanks, I'll use that. I messed up a couple other things, so I need > to send out a v3 anyway. > > >>- *image_addr = *reserve_addr = dram_base + TEXT_OFFSET; > >>+ *image_addr = *reserve_addr = > >>+ round_up(dram_base, SZ_2M) + TEXT_OFFSET; > > > >We also need to fix the test for whether we need to relocate the kernel: > >(*image_addr != (dram_base + TEXT_OFFSET)). > > > >When dram_base is not 2M aligned, that is broken, and it's been broken > >since it was introduced in commit 3c7f255039a2ad6e ("arm64: efi: add EFI > >stub") in v3.16. > > > >It's a bit hideous to fix the general case, though, it seems. > > Um, so I should I do something more in my v3 patch, or is this a > change for a different patch? I think there should be a single patch, but please hold off v3 for a day or so. I think there a few more edge cases here, and I'm currently investigating. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64: efi: make sure vmlinux load address aligned on 2MBytes
On Wed, Oct 28, 2015 at 04:02:23PM -0500, Timur Tabi wrote: > On 10/28/2015 12:26 PM, Mark Rutland wrote: > >>>This does make the kernel boot, but we suspect that there may be > >>>another problem. We need to investigate it, but we have a suspicion > >>>that the EFI stub is trying to allocate from the Runtime Data block, > >>>and the alignment adjustment "fixes" the problem by moving the > >>>pointer to Conventional Memory. > >I don't follow. EFI_BOOT_SERVICES.AllocatePages() should only give us > >pages which are available, so it shouldn't ever return pages which are > >runtime data -- it would fail and we'd fall back to efi_low_alloc(). > > > >Could you elaborate? > > So we're still debugging this internally, but it turns out that > dram_base is equal to 0x400082, which also happens to be the > start of a Runtime Data block: > > 0x00400082-0x00400085 [Runtime Data |RUN|XP| | | > |WB|WT|WC|UC] > > I think this is not supposed to happen. It's perfectly valid for that to be detected as dram_base, and the stub may call AllocatePages() for that region, but AllocatePages() shouldn't successfully allocate from there. The stub should fall back to efi_low_alloc, walking through the memory map until it finds a large enough region to allocate from, with some subsequent AllocatePages() call eventually succeeding. Is that not what you're seeing? Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 0/6] KASAN for arm64
On Fri, Oct 09, 2015 at 12:32:18PM +0300, Andrey Ryabinin wrote: [...] > I thought the EFI stub isolation patches create a copy of mem*() functions in > the stub, > but they are just create aliases with __efistub_ prefix. > > We only need to create some more aliases for KASAN. > The following patch on top of the EFI stub isolation series works for me. > > > Signed-off-by: Andrey Ryabinin> --- > arch/arm64/kernel/image.h | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h > index e083af0..6eb8fee 100644 > --- a/arch/arm64/kernel/image.h > +++ b/arch/arm64/kernel/image.h > @@ -80,6 +80,12 @@ __efistub_strcmp = __pi_strcmp; > __efistub_strncmp= __pi_strncmp; > __efistub___flush_dcache_area= __pi___flush_dcache_area; > > +#ifdef CONFIG_KASAN > +__efistub___memcpy = __pi_memcpy; > +__efistub___memmove = __pi_memmove; > +__efistub___memset = __pi_memset; > +#endif Ard's v4 stub isolation series has these aliases [1], as the stub requires these aliases regardless of KASAN in order to link. Thanks, Mark. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-October/375708.html -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 0/6] KASAN for arm64
On Fri, Oct 09, 2015 at 01:18:09PM +0300, Andrey Ryabinin wrote: > 2015-10-09 12:48 GMT+03:00 Mark Rutland <mark.rutl...@arm.com>: > > On Fri, Oct 09, 2015 at 12:32:18PM +0300, Andrey Ryabinin wrote: > > [...] > > > >> I thought the EFI stub isolation patches create a copy of mem*() functions > >> in the stub, > >> but they are just create aliases with __efistub_ prefix. > >> > >> We only need to create some more aliases for KASAN. > >> The following patch on top of the EFI stub isolation series works for me. > >> > >> > >> Signed-off-by: Andrey Ryabinin <ryabinin@gmail.com> > >> --- > >> arch/arm64/kernel/image.h | 6 ++ > >> 1 file changed, 6 insertions(+) > >> > >> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h > >> index e083af0..6eb8fee 100644 > >> --- a/arch/arm64/kernel/image.h > >> +++ b/arch/arm64/kernel/image.h > >> @@ -80,6 +80,12 @@ __efistub_strcmp = __pi_strcmp; > >> __efistub_strncmp= __pi_strncmp; > >> __efistub___flush_dcache_area= __pi___flush_dcache_area; > >> > >> +#ifdef CONFIG_KASAN > >> +__efistub___memcpy = __pi_memcpy; > >> +__efistub___memmove = __pi_memmove; > >> +__efistub___memset = __pi_memset; > >> +#endif > > > > Ard's v4 stub isolation series has these aliases [1], as the stub > > requires these aliases regardless of KASAN in order to link. > > Stub isolation series has __efistub_memcpy, not __efistub___memcpy > (two additional '_'). Ah, I see, sorry for my sloppy reading. > The thing is, KASAN provides own implementation of memcpy() which > checks memory before access. > The original 'memcpy()' becomes __memcpy(), so we could still use it. Ok. > In code that not instrumented by KASAN (like the EFI stub) we replace > KASAN's memcpy() with the original __mempcy(): > #define memcpy() __memcpy() I'm a little confused by this. Surely that doesn't override implicit calls generated by the compiler, leaving us with a mixture of calls to memcpy and __memcpy? That doesn't matter for the stub, as both __efistub_mem* and __efistub___mem* would point at __pe_mem*, but doesn't that matter for other users that shouldn't be instrumented? Is that not a problem, or do we inhibit/override that somehow? > So with CONFIG_KASAN=y the EFI stub uses __memcpy, thus we need to > create the __efistub___memcpy alias. Ok, that makes sense to me. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 0/6] KASAN for arm64
On Thu, Oct 08, 2015 at 01:36:09PM +0300, Andrey Ryabinin wrote: > 2015-10-07 13:04 GMT+03:00 Catalin Marinas: > > On Thu, Sep 17, 2015 at 12:38:06PM +0300, Andrey Ryabinin wrote: > >> As usual patches available in git > >> git://github.com/aryabinin/linux.git kasan/arm64v6 > >> > >> Changes since v5: > >> - Rebase on top of 4.3-rc1 > >> - Fixed EFI boot. > >> - Updated Doc/features/KASAN. > > > > I tried to merge these patches (apart from the x86 one which is already > > merged) but it still doesn't boot on Juno as an EFI application. > > > > 4.3-rc1 was ok and 4.3-rc4 is not. Break caused by 0ce3cc008ec04 > ("arm64/efi: Fix boot crash by not padding between EFI_MEMORY_RUNTIME > regions") > It introduced sort() call in efi_get_virtmap(). > sort() is generic kernel function and it's instrumented, so we crash > when KASAN tries to access shadow in sort(). > > [+CC efi some guys] > > Comment in drivers/firmware/efi/libstub/Makefile says that EFI stub > executes with MMU disabled: > # The stub may be linked into the kernel proper or into a separate > boot binary, > # but in either case, it executes before the kernel does (with MMU > disabled) so > # things like ftrace and stack-protector are likely to cause trouble if > left > # enabled, even if doing so doesn't break the build. > > But in arch/arm64/kernel/efi-entry.S: > * We arrive here from the EFI boot manager with: > * > ** CPU in little-endian mode > ** MMU on with identity-mapped RAM > > So is MMU enabled in ARM64 efi-stub? The stub is executed as an EFI application, which means that the MMU is on, and the page tables are an idmap owned by the EFI implementation. > If yes, we could solve this issue by mapping KASAN early shadow in efi stub. As the page tables are owned by the implemenation and not the kernel, we cannot alter them (at least not until we've called ExitBootServices(), which happens relatively late). Can we not build the stub without ASAN protections? Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters
On Sat, Sep 12, 2015 at 12:36:55PM +0100, Daniel Kiper wrote: > On Fri, Sep 11, 2015 at 05:25:59PM +0100, Mark Rutland wrote: > > On Fri, Sep 11, 2015 at 01:46:43PM +0100, Daniel Kiper wrote: > > > On Thu, Sep 10, 2015 at 05:23:02PM +0100, Mark Rutland wrote: > > [...] > > > > > What's troublesome with the boot services? > > > > > > > > What can't be simulated? > > > > > > How do you want to access bare metal EFI boot services from dom0 if they > > > were shutdown long time ago before loading dom0 image? > > > > I don't want to. > > > > I asked "What can't be simulated?" because I assumed everything > > necessary/mandatory could be simulated without needinng access to any > > real EFI boot services. > > > > As far as I can see all that's necessary is to provide a compatible > > interface. > > Could you be more precise what do you need? Please enumerate. UEFI spec has > more than 2500 pages and I do not think that we need all stuff in dom0. > > > > What do you need from EFI boot services in dom0? > > > > The ability to call ExitBootServices() and SetVirtualAddressMap() on a > > _virtual_ address map for _virtual_ services provided by the hypervisor. > > I am confused. Why do you need that? Please remember, EFI is owned and > operated by Xen hypervisor. dom0 does not have direct access to EFI. Let's take a step back. My objection here is to passing the Dom0 kernel properties as if it were booted with direct access to a full UEFI, then later fixing that up (when Xen is detected and we apply its hypercall EFI implementation). If the kernel cannot use EFI natively, why pretend to the kernel that it can? The hypercall implementation is _not_ EFI (though it provides access to some services). The two ways I can see providing Dom0 with EFI services are: * Have Xen create shims for any services, in which any hypercalls live, and pass these to the kernel with a virtual system table. This keeps the interface to the kernel the same regardless of Xen. * Have the kernel detect Xen EFI capability via Xen, without passing the usual native EFI parameters. This can then be installed into the kernel in a Xen-specific manner, and we know from the outset that Xen-specific caveats apply. As per my original email, I'm not against the renaming of the stub parameters if we standardise the rest of the details, but I believe that's orthogonal to the Xen Dom0 case. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters
> >> Considering that the EFI support is just for Dom0, and Dom0 (at > >> the time) had to be PV anyway, it was the more natural solution to > >> expose the interface via hypercalls, the more that this allows better > >> control over what is and primarily what is not being exposed to > >> Dom0. With the wrapper approach we'd be back to the same > >> problem (discussed elsewhere) of which EFI version to surface: The > >> host one would impose potentially missing extensions, while the > >> most recent hypervisor known one might imply hiding valuable > >> information from Dom0. Plus there are incompatible changes like > >> the altered meaning of EFI_MEMORY_WP in 2.5. > > > > I'm not sure I follow how hypercalls solve any impedance mismatch here; > > you're still expecting Dom0 to call up to Xen in order to perform calls, > > and all I suggested was a different location for those hypercalls. > > > > If Xen is happy to make such calls blindly, why does it matter if the > > hypercall was in the kernel binary or an external shim? > > Because there could be new entries in SystemTable->RuntimeServices > (expected and blindly but validly called by the kernel). Even worse > (because likely harder to deal with) would be new fields in other > structures. Any of these could cause Xen to blow up, while Xen could always provide a known-safe (but potentially sub-optimal) view to the kernel by default. > > Incompatible changes are a spec problem regardless of how this is > > handled. > > Not necessarily - we don't expose the memory map (we'd have to > if we were to mimic EFI for Dom0), and hence the mentioned issue > doesn't exist in our model. We have to expose _some_ memory map, so I don't follow this point. Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters
> It feels like this discussion is going in circles. > > When we discussed this six months ago, we already concluded that, > since UEFI is the only specified way that the presence of ACPI is > advertised on an ARM system, we need to emulate UEFI to some extent. My understanding from the last time I was present at such a discussion was that the emulation was complete from the kernel's PoV (i.e. no special case handling was required). Evidently I misunderstood. One of the main points of rationale for requiring EFI was that we'd have a well-defined system state as per the EFI (and ACPI) standards. We'd know we had the EFI memory map, services, etc (with the memory map and configuration tables being the most important elements). We didn't want to have to try to reconcile a DT memory map and ACPI, for instance. That is somewhat (though admitedly not entirely) broken if we require a set of rules to be applied beyond what the standards mandate. That is broken if we require a non-standard set of rules to be applied, and limits what we can do in the !Xen case. > So we need the EFI system table to expose the UEFI configuration table > that carries the ACPI root pointer. > > Since ACPI support also relies on the UEFI memory map (I think?), we > need that as well. > > These two items are exactly what we pass via the UEFI DT properties, > so we should indeed promote the current de-facto binding to a proper > binding, and renaming the properties makes sense in that context. I agree that we need to sort these out. > I agree that this should also include a description of the expected > state of the firmware, i.e., that ExitBootServices() has been called, > and that the memory map has been populated with virtual address, which > have been installed using SetVirtualAddressMap() if they differ from > the physical addresses. (The current implementation on the kernel side > is perfectly capable of dealing with a 1:1 mapping). > > Beyond that, there is no point in pretending to be a full UEFI > implementation, imo. Boot services are not required, nor are runtime > services (only the current EFI init code on arm needs to be modified > to deal with a NULL runtime services pointer) I'm not keen on this because it involves applying Xen-specific caveats atop of what the UEFI spec says (e.g. as runtime services might be NULL under Xen). As the spec and Xen evolve, those caveats shift, and that's going to be fragile for all users regardleses of Xen. If Xen needs special-casing, then I'd rather that Xen were detected first and provided us with what it knows is safe for us to use, rather than we tip-toe around until we're sure Xen isn't present and/or doesn't need additional constraints met. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters
On Fri, Sep 11, 2015 at 01:46:43PM +0100, Daniel Kiper wrote: > On Thu, Sep 10, 2015 at 05:23:02PM +0100, Mark Rutland wrote: > > > > C) When you could go: > > > > > > > >DT -> Discover Xen -> Xen-specific stuff -> Xen-specific EFI/ACPI > > > > discovery > > > > > > I take you mean discovering Xen with the usual Xen hypervisor node on > > > device tree. I think that C) is a good option actually. I like it. Not > > > sure why we didn't think about this earlier. Is there anything EFI or > > > ACPI which is needed before Xen support is discovered by > > > arch/arm64/kernel/setup.c:setup_arch -> xen_early_init()? > > > > Currently lots (including the memory map). With the stuff to support > > SPCR, the ACPI discovery would be moved before xen_early_init(). > > > > > If not, we could just go for this. A lot of complexity would go away. > > > > I suspect this would still be fairly complex, but would at least prevent > > the Xen-specific EFI handling from adversely affecting the native case. > > > > > > D) If you want to be generic: > > > >EFI -> EFI application -> EFI tables -> ACPI tables -> Xen-specific > > > > stuff > > > > \--/ > > > >(virtualize these, provide shims to Dom0, but handle > > > > everything in Xen itself) > > > > > > I think that this is good in theory but could turn out to be a lot of > > > work in practice. We could probably virtualize the RuntimeServices but > > > the BootServices are troublesome. > > > > What's troublesome with the boot services? > > > > What can't be simulated? > > How do you want to access bare metal EFI boot services from dom0 if they > were shutdown long time ago before loading dom0 image? I don't want to. I asked "What can't be simulated?" because I assumed everything necessary/mandatory could be simulated without needinng access to any real EFI boot services. As far as I can see all that's necessary is to provide a compatible interface. > What do you need from EFI boot services in dom0? The ability to call ExitBootServices() and SetVirtualAddressMap() on a _virtual_ address map for _virtual_ services provided by the hypervisor. A console so that I can log things early on. Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters
Hi, I'm not necessarily opposed to the renaming, but I think that this is the least important thing to standardize for this to work. On Thu, Sep 10, 2015 at 09:41:56AM +0100, Shannon Zhao wrote: > From: Shannon Zhao> > These EFI stub parameters are used to internal communication between EFI > stub and Linux kernel and EFI stub creates these parameters. But for Xen > on ARM when booting with UEFI, Xen will create a minimal DT providing > these parameters for Dom0 and Dom0 is not only Linux kernel, but also > other OS (such as FreeBSD) which will be used in the future. Currently the Linux EFI stub and kernel have a symbiotic relationship, because they're intimately coupled and we don't have kexec (yet) on EFI platforms to loosen that coupling. If an agent other than the (kernel-specific) stub is going to provide this to the kernel, then we need more specified than just the property names. That at least includes the following: * The state of boot services (we currently have the EFI stub call ExitBootServices(), and I believe this is crucial to the plan for kexec). * The state of the address map (we currently have the EFI stub call SetVirtualAddressMap()). * The virtual address range(s) that SetVirtualAddressMap() may map elements to (this logic is currently in the EFI stub, and this matches the expectations of the kernel that it is tied to). > So here we plan to standardize the names by dropping the prefix > "linux," and make them common to other OS. Also this will not break > the compatibility since these parameters are used to internal > communication between EFI stub and kernel. For the moment this is true, but will not be once we have kexec, so there's a dependency (or anti-dependency) there. > Signed-off-by: Shannon Zhao > --- > Look at [1] for the discussion about this in Xen ML. The purpose of this > patch is to standardize the names to make Linux ARM kernel work on Xen > with UEFI. Also it hopes other OS(e.g. FreeBSD), which will be used as > Dom0 on Xen, could support this mechanism as well. > > [1]http://lists.xenproject.org/archives/html/xen-devel/2015-08/msg02250.html Per this post, it looks like to pass a DTB to the kernel Xen already needs to know it's a Linux kernel... I wasn't aware that there was a common standard for arm(64) kernels other than a PE/COFF EFI application. Does Xen not talk to EFI itself and/or give the kernel a virtual EFI interface? Thanks, Mark. > Documentation/arm/uefi.txt | 10 +- > drivers/firmware/efi/efi.c | 10 +- > drivers/firmware/efi/libstub/fdt.c | 10 +- > 3 files changed, 15 insertions(+), 15 deletions(-) > > diff --git a/Documentation/arm/uefi.txt b/Documentation/arm/uefi.txt > index d60030a..8c83243 100644 > --- a/Documentation/arm/uefi.txt > +++ b/Documentation/arm/uefi.txt > @@ -45,18 +45,18 @@ following parameters: > > > Name | Size | Description > > > -linux,uefi-system-table | 64-bit | Physical address of the UEFI System > Table. > +uefi-system-table | 64-bit | Physical address of the UEFI System > Table. > > > -linux,uefi-mmap-start | 64-bit | Physical address of the UEFI memory map, > +uefi-mmap-start | 64-bit | Physical address of the UEFI memory map, >|| populated by the UEFI GetMemoryMap() > call. > > > -linux,uefi-mmap-size | 32-bit | Size in bytes of the UEFI memory map > +uefi-mmap-size| 32-bit | Size in bytes of the UEFI memory map >|| pointed to in previous entry. > > > -linux,uefi-mmap-desc-size | 32-bit | Size in bytes of each entry in the UEFI > +uefi-mmap-desc-size | 32-bit | Size in bytes of each entry in the UEFI >|| memory map. > > > -linux,uefi-mmap-desc-ver | 32-bit | Version of the mmap descriptor format. > +uefi-mmap-desc-ver| 32-bit | Version of the mmap descriptor format. > > > linux,uefi-stub-kern-ver | string | Copy of linux_banner from build. > > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > index d6144e3..3878715 100644 > --- a/drivers/firmware/efi/efi.c > +++ b/drivers/firmware/efi/efi.c > @@ -481,11 +481,11 @@ static __initdata struct { >
Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters
On Thu, Sep 10, 2015 at 12:37:57PM +0100, Stefano Stabellini wrote: > On Thu, 10 Sep 2015, Mark Rutland wrote: > > > > Does Xen not talk to EFI itself and/or give the kernel a virtual EFI > > > > interface? > > > > > > Xen talks to EFI itself but the interface provided to dom0 is somewhat > > > different: there are no BootServices (Xen calls ExitBootServices before > > > running the kernel), and the RuntimeServices go via hypercalls (see > > > drivers/xen/efi.c). > > > > That's somewhat hideous; a non Xen-aware OS wouild presumably die if > > trying to use any runtime services the normal way? I'm not keen on > > describing things that the OS cannot use. > > I agree that is somewhat hideous, but a non-Xen aware OS traditionally > has never been able to even boot as Dom0. On ARM it can, but it still > wouldn't be very useful (one couldn't use it to start other guests). Sure, but it feels odd to provide the usual information in this manner if it cannot be used. If you require Xen-specific code to make things work, I would imagine this information could be dciscovered in a Xen-specific manner. > > Why can't Xen give a virtual EFI interface to Dom0 / guests? e.g. > > create pages of RuntimeServicesCode that are trivial assembly shims > > doing hypercalls, and plumb these into the virtual EFI memory map and > > tables? > > > > That would keep things sane for any guest, allow for easy addition of > > EFI features, and you could even enter the usual EFI entry point, > > simulate ExitBootServices(), SetVirtualAddressMap(), and allow the guest > > to make things sane for itself... > > That's the way it was done on x86 and now we have common code both in > Linux (drivers/xen/efi.c) and Xen (xen/common/efi) which implement this > scheme. This code is not currently used on arm. It might live in a location where it may be shared, but that doesn't mean that it's common code yet. > Switching to a different solution for ARM, would mean diverging > with x86, which is not nice, or reimplementing the x86 solution too, > which is expensive. > > BTW I think that the idea you proposed was actually considered at the > time and deemed hard to implement, if I recall correctly. I appreciate that divergence is painful. We already diverge in other respects (e.g. lack of PV page tables) because things that used to be the case on x86 never applied to ARM. It would be interesting to see why that was the case for x86, and whether that applies to ARM. > In any case this should be separate from the shim ABI discussion. I disagree; I think this is very much relevant to the ABI discussion. That's not to say that I insist on a particular approach, but I think that they need to be considered together. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters
> > Does Xen not talk to EFI itself and/or give the kernel a virtual EFI > > interface? > > Xen talks to EFI itself but the interface provided to dom0 is somewhat > different: there are no BootServices (Xen calls ExitBootServices before > running the kernel), and the RuntimeServices go via hypercalls (see > drivers/xen/efi.c). That's somewhat hideous; a non Xen-aware OS wouild presumably die if trying to use any runtime services the normal way? I'm not keen on describing things that the OS cannot use. Why can't Xen give a virtual EFI interface to Dom0 / guests? e.g. create pages of RuntimeServicesCode that are trivial assembly shims doing hypercalls, and plumb these into the virtual EFI memory map and tables? That would keep things sane for any guest, allow for easy addition of EFI features, and you could even enter the usual EFI entry point, simulate ExitBootServices(), SetVirtualAddressMap(), and allow the guest to make things sane for itself... Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] arm64/efi: don't pad between EFI_MEMORY_RUNTIME regions
Hi, FWIW I gave this a spin on Seattle and Juno and saw no regressions (both are pre-2.5 EFI though). I have some concerns below. On Wed, Sep 09, 2015 at 08:06:54AM +0100, Ard Biesheuvel wrote: > The new Properties Table feature introduced in UEFIv2.5 may split > memory regions that cover PE/COFF memory images into separate code > and data regions. Since these regions only differ in the type (runtime > code vs runtime data) and the permission bits, but not in the memory > type attributes (UC/WC/WT/WB), the spec does not require them to be > aligned to 64 KB. We should require those to be 64k-aligned for permissions too. I can imagine vendors getting permissions wrong but things happening to work for a 64k kernel (where I assume we have to use the superset of all permissions within a 64k page). > As the relative offset of PE/COFF .text and .data segments cannot be > changed on the fly, this means that we can no longer pad out those > regions to be mappable using 64 KB pages. > Unfortunately, there is no annotation in the UEFI memory map that > identifies data regions that were split off from a code region, so we > must apply this logic to all adjacent runtime regions whose attributes > only differ in the permission bits. > > So instead of rounding each memory region to 64 KB alignment at both > ends, only round down regions that are not directly preceded by another > runtime region with the same type attributes. Since the UEFI spec does > not mandate that the memory map be sorted, this means we also need to > sort it first. > > Signed-off-by: Ard Biesheuvel> --- > > As discussed off list, this is the arm64 side of what we should backport > to stable to prevent firmware that adheres to the current version of the > UEFI v2.5 spec with the memprotect feature enabled from blowing up the system > upon the first OS call into the runtime services. > > For arm64, we already map things in order, but since the spec does not mandate > a sorted memory map, we need to sort it to be sure. This also allows us to > easily find adjacent regions with < 64 KB granularity, which the current > version > of the spec allows if they only differ in permission bits (which the spec says > are 'unused' on AArch64, which could be interpreted as 'allowed but ignored'). > > Changes since v1: > - Ensure that we don't inadvertently set the XN bit on the preceding region at > mapping time if we the OS is running with >4 KB pages. > > arch/arm64/kernel/efi.c | 3 +- > drivers/firmware/efi/libstub/arm-stub.c | 62 +++- > 2 files changed, 49 insertions(+), 16 deletions(-) > > diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c > index e8ca6eaedd02..13671a9cf016 100644 > --- a/arch/arm64/kernel/efi.c > +++ b/arch/arm64/kernel/efi.c > @@ -258,7 +258,8 @@ static bool __init efi_virtmap_init(void) >*/ > if (!is_normal_ram(md)) > prot = __pgprot(PROT_DEVICE_nGnRE); > - else if (md->type == EFI_RUNTIME_SERVICES_CODE) > + else if (md->type == EFI_RUNTIME_SERVICES_CODE || > + !PAGE_ALIGNED(md->phys_addr)) > prot = PAGE_KERNEL_EXEC; This looks coarser than necessary. For memory organised like: 0x - 0xF000 (60KiB) : EFI_RUNTIME_SERVICES_CODE 0xF000 - 0x0002 (68KiB) : EFI_RUNTIME_SERVICES_DATA We should be able to make the last 64K non-executable, but with this all 128K is executable, unless I've missed something? Maybe we could do a two-step pass, first mapping the data as not-executable, then mapping any code pages executable (overriding any overlapping portions, but only for the overlapping parts). > else > prot = PAGE_KERNEL; > diff --git a/drivers/firmware/efi/libstub/arm-stub.c > b/drivers/firmware/efi/libstub/arm-stub.c > index e29560e6b40b..cb4e9c4de952 100644 > --- a/drivers/firmware/efi/libstub/arm-stub.c > +++ b/drivers/firmware/efi/libstub/arm-stub.c > @@ -13,6 +13,7 @@ > */ > > #include > +#include Sort isn't an inline in this header. I thought it wasn't safe to call arbitary kernel functions from the stub? > #include > > #include "efistub.h" > @@ -305,6 +306,13 @@ fail: > */ > #define EFI_RT_VIRTUAL_BASE 0x4000 > > +static int cmp_mem_desc(const void *a, const void *b) > +{ > + const efi_memory_desc_t *left = a, *right = b; > + > + return (left->phys_addr > right->phys_addr) ? 1 : -1; > +} Nit: please chose names to make the relationship between these clearer. e.g. s/left/mem_a/, s/right/mem_b/. > + > /* > * efi_get_virtmap() - create a virtual mapping for the EFI memory map > * > @@ -316,34 +324,58 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, > unsigned long map_size, >unsigned long desc_size, efi_memory_desc_t *runtime_map, >int *count) > { > + static const u64 mem_type_mask =
Re: [PATCH v2] arm64/efi: don't pad between EFI_MEMORY_RUNTIME regions
> >> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c > >> index e8ca6eaedd02..13671a9cf016 100644 > >> --- a/arch/arm64/kernel/efi.c > >> +++ b/arch/arm64/kernel/efi.c > >> @@ -258,7 +258,8 @@ static bool __init efi_virtmap_init(void) > >>*/ > >> if (!is_normal_ram(md)) > >> prot = __pgprot(PROT_DEVICE_nGnRE); > >> - else if (md->type == EFI_RUNTIME_SERVICES_CODE) > >> + else if (md->type == EFI_RUNTIME_SERVICES_CODE || > >> + !PAGE_ALIGNED(md->phys_addr)) > >> prot = PAGE_KERNEL_EXEC; > > > > This looks coarser than necessary. For memory organised like: > > > > 0x - 0xF000 (60KiB) : EFI_RUNTIME_SERVICES_CODE > > 0xF000 - 0x0002 (68KiB) : EFI_RUNTIME_SERVICES_DATA > > > > We should be able to make the last 64K non-executable, but with this all > > 128K is executable, unless I've missed something? > > > > In theory, yes. But considering that > > a) this only affects 64 KB pages kernels, and > b) this patch is intended for -stable > > I chose to keep it simple and ignore this, and just relax the > permissions for any region that is not aligned to 64 KB. > > Since these regions are only mapped during Runtime Services calls, the > window for abuse is not that large. Ok, that does sound reasonable. > > Maybe we could do a two-step pass, first mapping the data as > > not-executable, then mapping any code pages executable (overriding any > > overlapping portions, but only for the overlapping parts). > > > > Let me have a go at that. Cheers! > >> else > >> prot = PAGE_KERNEL; > >> diff --git a/drivers/firmware/efi/libstub/arm-stub.c > >> b/drivers/firmware/efi/libstub/arm-stub.c > >> index e29560e6b40b..cb4e9c4de952 100644 > >> --- a/drivers/firmware/efi/libstub/arm-stub.c > >> +++ b/drivers/firmware/efi/libstub/arm-stub.c > >> @@ -13,6 +13,7 @@ > >> */ > >> > >> #include > >> +#include > > > > Sort isn't an inline in this header. I thought it wasn't safe to call > > arbitary kernel functions from the stub? > > > > We call string functions, cache maintenance functions, libfdt > functions etc etc so it seems not everyone got the memo :-) > > I agree that treating vmlinux both as a static library and as a > payload from the stub's pov is a bit sloppy, and I do remember > discussing this, but for the life of me, I can't remember the exact > issue, other than the use of adrp/add and adrp/ldr pairs, which we > fixed by setting the PE/COFF section alignment to 4 KB. I only had a vague recollection that there was a problem, which I thought was more to do with potential use of absolute kernel virtual addresses, which would be incorrect in the context of an EFI application. Digging a bit, the stub code itself is safe due to commit f4f75ad5741fe033 ("efi: efistub: Convert into static library"), but that isn't necessarily true of anything it calls (libfdt uses callbacks in several places). I think the cache functions we call are all raw asm which is position-oblivious. We do seem to be ok so far, however. Maybe we just need to keep an eye out. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters
On Thu, Sep 10, 2015 at 02:52:25PM +0100, Stefano Stabellini wrote: > On Thu, 10 Sep 2015, Mark Rutland wrote: > > On Thu, Sep 10, 2015 at 12:37:57PM +0100, Stefano Stabellini wrote: > > > On Thu, 10 Sep 2015, Mark Rutland wrote: > > > > > > Does Xen not talk to EFI itself and/or give the kernel a virtual EFI > > > > > > interface? > > > > > > > > > > Xen talks to EFI itself but the interface provided to dom0 is somewhat > > > > > different: there are no BootServices (Xen calls ExitBootServices > > > > > before > > > > > running the kernel), and the RuntimeServices go via hypercalls (see > > > > > drivers/xen/efi.c). > > > > > > > > That's somewhat hideous; a non Xen-aware OS wouild presumably die if > > > > trying to use any runtime services the normal way? I'm not keen on > > > > describing things that the OS cannot use. > > > > > > I agree that is somewhat hideous, but a non-Xen aware OS traditionally > > > has never been able to even boot as Dom0. On ARM it can, but it still > > > wouldn't be very useful (one couldn't use it to start other guests). > > > > Sure, but it feels odd to provide the usual information in this manner > > if it cannot be used. If you require Xen-specific code to make things > > work, I would imagine this information could be dciscovered in a > > Xen-specific manner. > > We need ACPI (or Device Tree) to find that Xen is available on the > platform, so we cannot use Xen-specific code to get the ACPI tables. I don't understand. The proposition already involves passing a custom DT to the OS, implying that Xen knows how to boot that OS, and how to pass it a DTB. Consider: A) In the DT-only case, we go: DT -> Discover Xen -> Xen-specific stuff B) The proposition is that un the ACPI case we go: DT -> EFI tables -> ACPI tables -> Discover Xen -> Xen-specific stuff -> override EFI/ACPI stuff \---/ (be really cautious here) C) When you could go: DT -> Discover Xen -> Xen-specific stuff -> Xen-specific EFI/ACPI discovery D) If you want to be generic: EFI -> EFI application -> EFI tables -> ACPI tables -> Xen-specific stuff \--/ (virtualize these, provide shims to Dom0, but handle everything in Xen itself) E) Partially-generic option: EFI -> EFI application -> Xen detected by registered GUID -> Xen-specific EFI bootloader stuff -> OS in Xen-specific configuration > > > In any case this should be separate from the shim ABI discussion. > > > > I disagree; I think this is very much relevant to the ABI discussion. > > That's not to say that I insist on a particular approach, but I think > > that they need to be considered together. > > Let's suppose Xen didn't expose any RuntimeServices at all, would that > make it easier to discuss about the EFI stub parameters? It would simply the protocol specific to Xen, certainly. > In the grant scheme of things, they are not that important, as Ian > wrote what is important is how to pass the RSDP. Unfortunately we're still going to have to care about this eventually, even if for something like kexec. So we still need to spec out the state of things if this is going to be truly generic. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters
On Thu, Sep 10, 2015 at 01:55:25PM +0100, Jan Beulich wrote: > >>> On 10.09.15 at 13:37, <stefano.stabell...@eu.citrix.com> wrote: > > On Thu, 10 Sep 2015, Mark Rutland wrote: > >> Why can't Xen give a virtual EFI interface to Dom0 / guests? e.g. > >> create pages of RuntimeServicesCode that are trivial assembly shims > >> doing hypercalls, and plumb these into the virtual EFI memory map and > >> tables? > >> > >> That would keep things sane for any guest, allow for easy addition of > >> EFI features, and you could even enter the usual EFI entry point, > >> simulate ExitBootServices(), SetVirtualAddressMap(), and allow the guest > >> to make things sane for itself... > > > > That's the way it was done on x86 and now we have common code both in > > Linux (drivers/xen/efi.c) and Xen (xen/common/efi) which implement this > > scheme. Switching to a different solution for ARM, would mean diverging > > with x86, which is not nice, or reimplementing the x86 solution too, > > which is expensive. > > > > BTW I think that the idea you proposed was actually considered at the > > time and deemed hard to implement, if I recall correctly. > > Considering that the EFI support is just for Dom0, and Dom0 (at > the time) had to be PV anyway, it was the more natural solution to > expose the interface via hypercalls, the more that this allows better > control over what is and primarily what is not being exposed to > Dom0. With the wrapper approach we'd be back to the same > problem (discussed elsewhere) of which EFI version to surface: The > host one would impose potentially missing extensions, while the > most recent hypervisor known one might imply hiding valuable > information from Dom0. Plus there are incompatible changes like > the altered meaning of EFI_MEMORY_WP in 2.5. I'm not sure I follow how hypercalls solve any impedance mismatch here; you're still expecting Dom0 to call up to Xen in order to perform calls, and all I suggested was a different location for those hypercalls. If Xen is happy to make such calls blindly, why does it matter if the hypercall was in the kernel binary or an external shim? Incompatible changes are a spec problem regardless of how this is handled. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] arm64/efi: don't pad between EFI_MEMORY_RUNTIME regions
> OK so what we could do is the following: > > 8<-- > diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c > index e8ca6eaedd02..39fa2a70a7f1 100644 > --- a/arch/arm64/kernel/efi.c > +++ b/arch/arm64/kernel/efi.c > @@ -233,6 +233,7 @@ void __init efi_init(void) > static bool __init efi_virtmap_init(void) > { > efi_memory_desc_t *md; > + u64 prev_end = 0; > > for_each_efi_memory_desc(, md) { > u64 paddr, npages, size; > @@ -256,13 +257,26 @@ static bool __init efi_virtmap_init(void) > * executable, everything else can be mapped with the XN bits > * set. > */ > - if (!is_normal_ram(md)) > + if (!is_normal_ram(md)) { > prot = __pgprot(PROT_DEVICE_nGnRE); > - else if (md->type == EFI_RUNTIME_SERVICES_CODE) > + } else if (md->type == EFI_RUNTIME_SERVICES_CODE) { > prot = PAGE_KERNEL_EXEC; > - else > + } else { > + /* > +* If we are running with >4 KB pages and the current > +* region shares a page frame with the preceding one, > +* we should not map the leading page again since > doing > +* so may take its executable permissions away. > +*/ > + if (PAGE_SIZE > EFI_PAGE_SIZE && paddr < prev_end) { > + paddr += PAGE_SIZE; > + size -= PAGE_SIZE; > + if (!size) > + continue; > + } > prot = PAGE_KERNEL; > - > + } > + prev_end = paddr + size; > create_pgd_mapping(_mm, paddr, md->virt_addr, size, prot); > } > return true; > 8<-- > > This will ensure that only the pages that are shared between 2 or more > regions may have their permissions upgraded, but only if any of these > regions requires it. > > I prefer the much simpler previous version, though, and I think it is > more suitable for -stable. I can always follow up with an improvement > like this for v4.3-late. Ok. Let's go with your previous version for now. Could you put something in the commit message about this limitation, so that we don't forget? > >> >> else > >> >> prot = PAGE_KERNEL; > >> >> diff --git a/drivers/firmware/efi/libstub/arm-stub.c > >> >> b/drivers/firmware/efi/libstub/arm-stub.c > >> >> index e29560e6b40b..cb4e9c4de952 100644 > >> >> --- a/drivers/firmware/efi/libstub/arm-stub.c > >> >> +++ b/drivers/firmware/efi/libstub/arm-stub.c > >> >> @@ -13,6 +13,7 @@ > >> >> */ > >> >> > >> >> #include > >> >> +#include > >> > > >> > Sort isn't an inline in this header. I thought it wasn't safe to call > >> > arbitary kernel functions from the stub? > >> > > >> > >> We call string functions, cache maintenance functions, libfdt > >> functions etc etc so it seems not everyone got the memo :-) > >> > >> I agree that treating vmlinux both as a static library and as a > >> payload from the stub's pov is a bit sloppy, and I do remember > >> discussing this, but for the life of me, I can't remember the exact > >> issue, other than the use of adrp/add and adrp/ldr pairs, which we > >> fixed by setting the PE/COFF section alignment to 4 KB. > > > > I only had a vague recollection that there was a problem, which I > > thought was more to do with potential use of absolute kernel virtual > > addresses, which would be incorrect in the context of an EFI > > application. > > > > That was it, of course. Unlike the x86 stub, which is built with -fPIC > (as is the ARM decompressor, btw), the arm64 kernel is position > dependent. Fortunately, the small code model is mostly position > independent by default, but it would be good if we could spot any > problems at build time. > > > Digging a bit, the stub code itself is safe due to commit > > f4f75ad5741fe033 ("efi: efistub: Convert into static library"), but that > > libstub is linked into vmlinux so that does not make a different at all Ok. I assumed that inter-stub references would still be relative, but I have no idea what the linker would do. > > We do seem to be ok so far, however. Maybe we just need to keep an eye > > out. > > > > I'd much rather restrict the code that goes into the stub somehow than > deal with any absolute references. Perhaps we could reuse some of the > section mismatch code in some way to tag certain code as stub-safe and > do a verification pass on the binary. It would be great if we could, though I'm not really sure how that would work. Imagine stub_foo calls fdt_blah, which calls kern_baz by absolute address. Normally the latter call is fine, but not when the original
Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters
> > C) When you could go: > > > >DT -> Discover Xen -> Xen-specific stuff -> Xen-specific EFI/ACPI > > discovery > > I take you mean discovering Xen with the usual Xen hypervisor node on > device tree. I think that C) is a good option actually. I like it. Not > sure why we didn't think about this earlier. Is there anything EFI or > ACPI which is needed before Xen support is discovered by > arch/arm64/kernel/setup.c:setup_arch -> xen_early_init()? Currently lots (including the memory map). With the stuff to support SPCR, the ACPI discovery would be moved before xen_early_init(). > If not, we could just go for this. A lot of complexity would go away. I suspect this would still be fairly complex, but would at least prevent the Xen-specific EFI handling from adversely affecting the native case. > > D) If you want to be generic: > >EFI -> EFI application -> EFI tables -> ACPI tables -> Xen-specific stuff > > \--/ > >(virtualize these, provide shims to Dom0, but handle > > everything in Xen itself) > > I think that this is good in theory but could turn out to be a lot of > work in practice. We could probably virtualize the RuntimeServices but > the BootServices are troublesome. What's troublesome with the boot services? What can't be simulated? > > E) Partially-generic option: > >EFI -> EFI application -> Xen detected by registered GUID -> > > Xen-specific EFI bootloader stuff -> OS in Xen-specific configuration > > > > > > > > > In any case this should be separate from the shim ABI discussion. > > > > > > > > I disagree; I think this is very much relevant to the ABI discussion. > > > > That's not to say that I insist on a particular approach, but I think > > > > that they need to be considered together. > > > > > > Let's suppose Xen didn't expose any RuntimeServices at all, would that > > > make it easier to discuss about the EFI stub parameters? > > > > It would simply the protocol specific to Xen, certainly. > > > > > In the grant scheme of things, they are not that important, as Ian > > > wrote what is important is how to pass the RSDP. > > > > Unfortunately we're still going to have to care about this eventually, > > even if for something like kexec. So we still need to spec out the state > > of things if this is going to be truly generic. > > Fair enough. My position is that if we restrict this to RuntimeServices, > it might be possible, but I still prefer C). Regardless of what we do we still need a well-defined state here, which brings us back to the initial problem eventually. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] arm64/efi: prefer AllocatePages() over efi_low_alloc() for vmlinux
On Fri, Jul 24, 2015 at 12:38:27PM +0100, Ard Biesheuvel wrote: When allocating memory for the kernel image, try the AllocatePages() boot service to obtain memory at the preferred offset of 'dram_base + TEXT_OFFSET', and only revert to efi_low_alloc() if that fails. This is the only way to allocate at the base of DRAM if DRAM starts at 0x0, since efi_low_alloc() refuses to allocate at 0x0. Tested-by: Haojian Zhuang haojian.zhu...@linaro.org Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org Reviewed-by: Mark Rutland mark.rutl...@arm.com Mark. --- v2: - reshuffle code flow to make it more logical, and have only a single memcpy() invocation at the end of the function --- arch/arm64/kernel/efi-stub.c | 41 - 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c index f5374065ad53..816120ece6bc 100644 --- a/arch/arm64/kernel/efi-stub.c +++ b/arch/arm64/kernel/efi-stub.c @@ -13,7 +13,7 @@ #include asm/efi.h #include asm/sections.h -efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table, +efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table_arg, unsigned long *image_addr, unsigned long *image_size, unsigned long *reserve_addr, @@ -23,21 +23,44 @@ efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table, { efi_status_t status; unsigned long kernel_size, kernel_memsize = 0; + unsigned long nr_pages; + void *old_image_addr = (void *)*image_addr; /* Relocate the image, if required. */ kernel_size = _edata - _text; if (*image_addr != (dram_base + TEXT_OFFSET)) { kernel_memsize = kernel_size + (_end - _edata); - status = efi_low_alloc(sys_table, kernel_memsize + TEXT_OFFSET, -SZ_2M, reserve_addr); + + /* + * First, try a straight allocation at the preferred offset. + * This will work around the issue where, if dram_base == 0x0, + * efi_low_alloc() refuses to allocate at 0x0 (to prevent the + * address of the allocation to be mistaken for a FAIL return + * value or a NULL pointer). It will also ensure that, on + * platforms where the [dram_base, dram_base + TEXT_OFFSET) + * interval is partially occupied by the firmware (like on APM + * Mustang), we can still place the kernel at the address + * 'dram_base + TEXT_OFFSET'. + */ + *image_addr = *reserve_addr = dram_base + TEXT_OFFSET; + nr_pages = round_up(kernel_memsize, EFI_ALLOC_ALIGN) / +EFI_PAGE_SIZE; + status = efi_call_early(allocate_pages, EFI_ALLOCATE_ADDRESS, + EFI_LOADER_DATA, nr_pages, + (efi_physical_addr_t *)reserve_addr); if (status != EFI_SUCCESS) { - pr_efi_err(sys_table, Failed to relocate kernel\n); - return status; + kernel_memsize += TEXT_OFFSET; + status = efi_low_alloc(sys_table_arg, kernel_memsize, +SZ_2M, reserve_addr); + + if (status != EFI_SUCCESS) { + pr_efi_err(sys_table_arg, Failed to relocate kernel\n); + return status; + } + *image_addr = *reserve_addr + TEXT_OFFSET; } - memcpy((void *)*reserve_addr + TEXT_OFFSET, (void *)*image_addr, -kernel_size); - *image_addr = *reserve_addr + TEXT_OFFSET; - *reserve_size = kernel_memsize + TEXT_OFFSET; + memcpy((void *)*image_addr, old_image_addr, kernel_size); + *reserve_size = kernel_memsize; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64/efi: prefer AllocatePages() over efi_low_alloc() for vmlinux
Hi Ard, On Fri, Jul 24, 2015 at 10:41:53AM +0100, Ard Biesheuvel wrote: When allocating memory for the kernel image, try the AllocatePages() boot service to obtain memory at the preferred offset of 'dram_base + TEXT_OFFSET', and only revert to efi_low_alloc() if that fails. This is the only way to allocate at the base of DRAM if DRAM starts at 0x0, since efi_low_alloc() refuses to allocate at 0x0. Tested-by: Haojian Zhuang haojian.zhu...@linaro.org Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/efi-stub.c | 47 1 file changed, 39 insertions(+), 8 deletions(-) diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c index f5374065ad53..c8df74d14368 100644 --- a/arch/arm64/kernel/efi-stub.c +++ b/arch/arm64/kernel/efi-stub.c @@ -13,7 +13,7 @@ #include asm/efi.h #include asm/sections.h -efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table, +efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table_arg, Any reason for the _arg addition? unsigned long *image_addr, unsigned long *image_size, unsigned long *reserve_addr, @@ -23,21 +23,52 @@ efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table, { efi_status_t status; unsigned long kernel_size, kernel_memsize = 0; + unsigned long nr_pages; /* Relocate the image, if required. */ kernel_size = _edata - _text; if (*image_addr != (dram_base + TEXT_OFFSET)) { kernel_memsize = kernel_size + (_end - _edata); - status = efi_low_alloc(sys_table, kernel_memsize + TEXT_OFFSET, -SZ_2M, reserve_addr); + + // + // First, try a straight allocation at the preferred offset. + // This will work around the issue where, if dram_base == 0x0, + // efi_low_alloc() refuses to allocate at 0x0 (to prevent the + // address of the allocation to be mistaken for a FAIL return + // value or a NULL pointer). It will also ensure that, on + // platforms where the [dram_base, dram_base + TEXT_OFFSET) + // interval is partially occupied by the firmware (like on APM + // Mustang), we can still place the kernel at the address + // 'dram_base + TEXT_OFFSET'. + // /* * Nit: please use the standard comment style */ + *reserve_addr = dram_base + TEXT_OFFSET; + nr_pages = round_up(kernel_memsize, EFI_ALLOC_ALIGN) / +EFI_PAGE_SIZE; + status = efi_call_early(allocate_pages, EFI_ALLOCATE_ADDRESS, + EFI_LOADER_DATA, nr_pages, + (efi_physical_addr_t *)reserve_addr); + if (status == EFI_SUCCESS) { + memcpy((void *)*reserve_addr, (void *)*image_addr, +kernel_size); + *image_addr = *reserve_addr; + *reserve_size = kernel_memsize; + } else { + status = efi_low_alloc(sys_table_arg, +kernel_memsize + TEXT_OFFSET, +SZ_2M, reserve_addr); + + if (status == EFI_SUCCESS) { + memcpy((void *)*reserve_addr + TEXT_OFFSET, +(void *)*image_addr, +kernel_size); + *image_addr = *reserve_addr + TEXT_OFFSET; + *reserve_size = kernel_memsize + TEXT_OFFSET; + } + } if (status != EFI_SUCCESS) { - pr_efi_err(sys_table, Failed to relocate kernel\n); + pr_efi_err(sys_table_arg, Failed to relocate kernel\n); return status; } - memcpy((void *)*reserve_addr + TEXT_OFFSET, (void *)*image_addr, -kernel_size); Could we have a new_image_addr assigned in each case, and keep the common memcpy here, followed by assignment to *image_addr? That would save a couple of lines and guarantee the two cases stay in sync. Otherwise this looks good to me. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64/efi: use UEFI ResetSystem() Runtime Service for system reset
Hi Ard, On Thu, Mar 05, 2015 at 12:51:11PM +, Ard Biesheuvel wrote: If UEFI Runtime Services are available, the ResetSystem() service should be preferred over direct PSCI calls or other methods to reset the system. The reason is that the UpdateCapsule() UEFI Runtime Service, which is used to perform firmware updates, relies on this. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- I sent roughly the same patch ~6 months ago, but at the time, we were still waiting for the restart notifier call chain patches to land. Since that code got rejected, I am proposing this again. Note that efi_enabled(x) always evaluates to 'false' on !CONFIG_EFI. Also, efi_reboot is a static inline for !CONFIG_EFI, so I can't see any possibility of a build failure. This fixes reboot on my Seattle [although it doesn't make it any faster :-)] arch/arm64/kernel/process.c | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index fde9923af859..a52bc0c316a8 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -21,6 +21,7 @@ #include stdarg.h #include linux/compat.h +#include linux/efi.h #include linux/export.h #include linux/sched.h #include linux/kernel.h @@ -150,6 +151,14 @@ void machine_restart(char *cmd) local_irq_disable(); smp_send_stop(); + /* + * Prefer reboot via EFI if available, so that capsule updates [which + * rely on UEFI's ResetSystem() being called with the return value of + * UpdateCapsule()] have a chance of working as expected. + */ + if (efi_enabled(EFI_RUNTIME_SERVICES)) + efi_reboot(reboot_mode, NULL); I expect that the particulars of the UpdateCapsule() will be handled within efi_reboot and won't require any additions here. So the comment could just be trimmed to say that UpdateCapsule() depends on the system being reset with ResetSystem(). Also, we need to make sure we call efi_poweroff to make UpdateCapsule() work when shutting the machine down (behind the scenes efi_poweroff calls ResetSystem(EfiResetShutdown, ...)). For that I think adding the following to arch/arm64/kernel/efi.c is sufficient: /* * UpdateCapsule() depends on the system being shutdown via * ResetSystem(). */ bool efi_poweroff_required(void) { return efi_enabled(EFI_RUNTIME_SERVICES); } I've given the patch a spin (with and without that addition) on Juno and Seattle. So with that folded in: Tested-by: Mark Rutland mark.rutl...@arm.com Reviewed-by: Mark Rutland mark.rutl...@arm.com Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi: fix boundary checking in efi_high_alloc()
Hi Matt, From 1e7295b5d4c5226a160a9167e61b581e388f7f9a Mon Sep 17 00:00:00 2001 From: Yinghai Lu ying...@kernel.org Date: Thu, 19 Feb 2015 20:18:03 -0800 Subject: [PATCH] efi/libstub: Fix boundary checking in efi_high_alloc() While adding support loading kernel and initrd above 4G to grub2 in legacy mode, I was referring to efi_high_alloc(). That will allocate buffer for kernel and then initrd, and initrd will use kernel buffer start as limit. During testing found two buffers will be overlapped when initrd size is very big like 400M. It turns out efi_high_alloc() boundary checking is not right. end - size will be the new start, and should not compare new start with max, we need to make sure end is smaller than max. [ Basically, with the current efi_high_alloc() code it's possible to allocate memory above 'max', because efi_high_alloc() doesn't check that the tail of the allocation is below 'max'. If you have an EFI memory map with a single entry that looks like so, [0xc000-0xc0004000] And want to allocate 0x3000 bytes below 0xc0003000 the current code will allocate [0xc0001000-0xc0004000], not [0xc000-0xc0003000] like you would expect. - Matt ] Signed-off-by: Yinghai Lu ying...@kernel.org Cc: sta...@vger.kernel.org I've convinced myself that the new logic is sound, and with this patch applied atop of v4.0-rc1 I don't see regressions on the platforms I have access to. So: Reviewed-by: Mark Rutland mark.rutl...@arm.com Tested-by: Mark Rutland mark.rutl...@arm.com Ard, Leif: On a related note, I think that the logic for deciding where to place the kernel and DTB isn't quite right. The kernel needs to be in the same naturally-aligned 512M region as the DTB in order to be able to map it, but the kernel could get relocated above the max address we'll consider for the DTB if there isn't sufficient space for the kernel between dram_base and dram_base + 512M. We should try to use the fixmap to map the DTB so it can be located anywhere in physical memory. That will make things easier for the stub and other loaders. Thanks, Mark. Signed-off-by: Matt Fleming matt.flem...@intel.com --- drivers/firmware/efi/libstub/efi-stub-helper.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index 9bd9fbb5bea8..c927bccd92bd 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -170,12 +170,12 @@ again: start = desc-phys_addr; end = start + desc-num_pages * (1UL EFI_PAGE_SHIFT); - if ((start + size) end || (start + size) max) - continue; - - if (end - size max) + if (end max) end = max; + if ((start + size) end) + continue; + if (round_down(end - size, align) start) continue; -- 1.9.3 -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi: get_memory_map: add sufficient slack for memory descriptors
On Fri, Feb 13, 2015 at 04:04:48PM +, Matt Fleming wrote: On Thu, 12 Feb, at 11:31:02PM, Ard Biesheuvel wrote: Actually, looking again at the original patch, it appears that my analysis was incorrect regarding the possibility that the loop would never terminate. The only thing that could happen if desc_size sizeof(efi_memory_desc_t) is that you need two iterations instead of one to get a pool allocation that is of sufficient size. So perhaps it is better to just revert the patch. My apologies for the hassle. This is what I've got queued up, Looks sane to me: Acked-by: Mark Rutland mark.rutl...@arm.com Thanks, Mark. --- From 3f281b98ffc99e604a3988aa93304a3a591eeeb5 Mon Sep 17 00:00:00 2001 From: Matt Fleming matt.flem...@intel.com Date: Fri, 13 Feb 2015 15:46:56 + Subject: [PATCH] Revert efi/libstub: Call get_memory_map() to obtain map and desc sizes This reverts commit d1a8d66b9177105e898e73716f97eb61842c457a. Ard reported a boot failure when running UEFI under Qemu and Xen and experimenting with various Tianocore build options, As it turns out, when allocating room for the UEFI memory map using UEFI's AllocatePool (), it may result in two new memory map entries being created, for instance, when using Tianocore's preallocated region feature. For example, the following region 0x5ead5000-0x5ebf [Conventional Memory| | | | | |WB|WT|WC|UC] may be split like this 0x5ead5000-0x5eae2fff [Conventional Memory| | | | | |WB|WT|WC|UC] 0x5eae3000-0x5eae4fff [Loader Data| | | | | |WB|WT|WC|UC] 0x5eae5000-0x5ebf [Conventional Memory| | | | | |WB|WT|WC|UC] if the preallocated Loader Data region was chosen to be right in the middle of the original free space. After patch d1a8d66b9177 (efi/libstub: Call get_memory_map() to obtain map and desc sizes), this is not being dealt with correctly anymore, as the existing logic to allocate room for a single additional entry has become insufficient. Mark requested to reinstate the old loop we had before commit d1a8d66b9177, which grows the memory map buffer until it's big enough to hold the EFI memory map. Cc: Ard Biesheuvel ard.biesheu...@linaro.org Cc: Mark Rutland mark.rutl...@arm.com Signed-off-by: Matt Fleming matt.flem...@intel.com --- drivers/firmware/efi/libstub/efi-stub-helper.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index d073e3946383..9bd9fbb5bea8 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -66,29 +66,25 @@ efi_status_t efi_get_memory_map(efi_system_table_t *sys_table_arg, unsigned long key; u32 desc_version; - *map_size = 0; - *desc_size = 0; - key = 0; - status = efi_call_early(get_memory_map, map_size, NULL, - key, desc_size, desc_version); - if (status != EFI_BUFFER_TOO_SMALL) - return EFI_LOAD_ERROR; - + *map_size = sizeof(*m) * 32; +again: /* * Add an additional efi_memory_desc_t because we're doing an * allocation which may be in a new descriptor region. */ - *map_size += *desc_size; + *map_size += sizeof(*m); status = efi_call_early(allocate_pool, EFI_LOADER_DATA, *map_size, (void **)m); if (status != EFI_SUCCESS) goto fail; + *desc_size = 0; + key = 0; status = efi_call_early(get_memory_map, map_size, m, key, desc_size, desc_version); if (status == EFI_BUFFER_TOO_SMALL) { efi_call_early(free_pool, m); - return EFI_LOAD_ERROR; + goto again; } if (status != EFI_SUCCESS) -- 1.9.3 -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi: get_memory_map: add sufficient slack for memory descriptors
On Thu, Feb 12, 2015 at 05:24:19AM +, Ard Biesheuvel wrote: As it turns out, when allocating room for the UEFI memory map using UEFI's AllocatePool (), it may result in two new memory map entries being created, for instance, when using Tianocore's preallocated region feature. For example, the following region 0x5ead5000-0x5ebf [Conventional Memory| | | | | |WB|WT|WC|UC] may be split like this 0x5ead5000-0x5eae2fff [Conventional Memory| | | | | |WB|WT|WC|UC] 0x5eae3000-0x5eae4fff [Loader Data| | | | | |WB|WT|WC|UC] 0x5eae5000-0x5ebf [Conventional Memory| | | | | |WB|WT|WC|UC] if the preallocated Loader Data region was chosen to be right in the middle of the original free space. After patch d1a8d66b9177 (efi/libstub: Call get_memory_map() to obtain map and desc sizes), this is not being dealt with correctly anymore, as the existing logic to allocate room for a single additional entry has become insufficient. So instead, add room for two additional entries instead. Fixes: d1a8d66b9177 (efi/libstub: Call get_memory_map() to obtain map and desc sizes) Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- drivers/firmware/efi/libstub/efi-stub-helper.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index af5d63c7cc53..ca0b07ed3b14 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -84,10 +84,10 @@ efi_status_t efi_get_memory_map(efi_system_table_t *sys_table_arg, return EFI_LOAD_ERROR; /* - * Add an additional efi_memory_desc_t because we're doing an - * allocation which may be in a new descriptor region. + * Add room for two additional efi_memory_desc_t entries because we're + * doing an allocation which may be in a new descriptor region. It might be worth noting that a existing regions can be split/reorganised here, otherwise it's a little difficult to deduce from the comment why to regions are needed. */ - *map_size += *desc_size; + *map_size += *desc_size * 2; Can we forsee any cases where we might need more than two additional descs? Is it perhaps adding a little more slack now? Otherwise this looks fine to me. Thanks, Mark. status = efi_call_early(allocate_pool, EFI_LOADER_DATA, *map_size, (void **)m); if (status != EFI_SUCCESS) -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi: get_memory_map: add sufficient slack for memory descriptors
On Thu, Feb 12, 2015 at 10:39:46AM +, Ard Biesheuvel wrote: On 12 February 2015 at 18:22, Mark Rutland mark.rutl...@arm.com wrote: On Thu, Feb 12, 2015 at 05:24:19AM +, Ard Biesheuvel wrote: As it turns out, when allocating room for the UEFI memory map using UEFI's AllocatePool (), it may result in two new memory map entries being created, for instance, when using Tianocore's preallocated region feature. For example, the following region 0x5ead5000-0x5ebf [Conventional Memory| | | | | |WB|WT|WC|UC] may be split like this 0x5ead5000-0x5eae2fff [Conventional Memory| | | | | |WB|WT|WC|UC] 0x5eae3000-0x5eae4fff [Loader Data| | | | | |WB|WT|WC|UC] 0x5eae5000-0x5ebf [Conventional Memory| | | | | |WB|WT|WC|UC] if the preallocated Loader Data region was chosen to be right in the middle of the original free space. After patch d1a8d66b9177 (efi/libstub: Call get_memory_map() to obtain map and desc sizes), this is not being dealt with correctly anymore, as the existing logic to allocate room for a single additional entry has become insufficient. So instead, add room for two additional entries instead. Fixes: d1a8d66b9177 (efi/libstub: Call get_memory_map() to obtain map and desc sizes) Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- drivers/firmware/efi/libstub/efi-stub-helper.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index af5d63c7cc53..ca0b07ed3b14 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -84,10 +84,10 @@ efi_status_t efi_get_memory_map(efi_system_table_t *sys_table_arg, return EFI_LOAD_ERROR; /* - * Add an additional efi_memory_desc_t because we're doing an - * allocation which may be in a new descriptor region. + * Add room for two additional efi_memory_desc_t entries because we're + * doing an allocation which may be in a new descriptor region. It might be worth noting that a existing regions can be split/reorganised here, otherwise it's a little difficult to deduce from the comment why to regions are needed. OK */ - *map_size += *desc_size; + *map_size += *desc_size * 2; Can we forsee any cases where we might need more than two additional descs? Is it perhaps adding a little more slack now? I don't see how doing a single allocation could result in a single free region to be split into more than 1 occupied region + 2 free regions. So no, I don't think it is ... Fair enough. If we're surprised later we can always fix things up. With the comment fix up: Acked-by: Mark Rutland mark.rutl...@arm.com Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi: get_memory_map: add sufficient slack for memory descriptors
On Thu, Feb 12, 2015 at 02:56:51PM +, Ard Biesheuvel wrote: On 12 February 2015 at 22:47, Matt Fleming m...@codeblueprint.co.uk wrote: On Thu, 12 Feb, at 06:39:46PM, Ard Biesheuvel wrote: I don't see how doing a single allocation could result in a single free region to be split into more than 1 occupied region + 2 free regions. So no, I don't think it is ... I don't think that's a guarantee we can make, nor is it something we should rely upon. Please explain the user-visible failure that this patch fixes. Does your machine refuse to boot? I am running UEFI under QEMU and Xen primarily at the moment, and experimenting with various build options in Tianocore, One of the options is preallocating and freeing blocks of various memory types, in a way that should result in the final number of distinct regions to be much lower. It could result however in a free memory region to be carved up in three instead of two, and that is a failure I have seen occur. The simple answer is that the machine will fail to boot, beause the efi_get_memory_map helper will give up after one go, and propagate the error. The arm-stub will give up when the error is encountered. Why is the 'goto again' loop insufficient in handling this scenario? Yes, that should solve it as well, so if you prefer I reinstate that, I can respin the patch. There is a theoretical possibility that it would take more than just one more iteration, but that is highly unlikely and it should still always complete. Please reinstate the loop. It will make this far less fragile. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64/efi: handle potential failure to remap memory map
On Thu, Jan 15, 2015 at 12:01:06PM +, Ard Biesheuvel wrote: When remapping the UEFI memory map using ioremap_cache(), we have to deal with potential failure. Note that, even if the common case is for ioremap_cache() to return the existing linear mapping of the memory map, we cannot rely on that to be always the case, e.g., in the presence of a mem= kernel parameter. At the same time, remove a stale comment and move the memmap code together. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org Acked-by: Mark Rutland mark.rutl...@arm.com This should probably be picked up before my mem= patch, which I will be resending shortly. I'll mention that when posting. I guess Catalin should pick this up given the other portions are in the arm64 tree, and this falls under arch/arm64. Thanks, Mark. --- arch/arm64/kernel/efi.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c index a98415b5979c..c9cb0fbe7aa4 100644 --- a/arch/arm64/kernel/efi.c +++ b/arch/arm64/kernel/efi.c @@ -229,19 +229,21 @@ static int __init arm64_enable_runtime_services(void) return -1; } - mapsize = memmap.map_end - memmap.map; - if (efi_runtime_disabled()) { pr_info(EFI runtime services will be disabled.\n); return -1; } pr_info(Remapping and enabling EFI services.\n); - /* replace early memmap mapping with permanent mapping */ + + mapsize = memmap.map_end - memmap.map; memmap.map = (__force void *)ioremap_cache((phys_addr_t)memmap.phys_map, mapsize); + if (!memmap.map) { + pr_err(Failed to remap EFI memory map\n); + return -1; + } memmap.map_end = memmap.map + mapsize; - efi.memmap = memmap; efi.systab = (__force void *)ioremap_cache(efi_system_table, -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi: stub: call get_memory_map() to obtain map and desc sizes
On Thu, Jan 08, 2015 at 07:09:22PM +, Ard Biesheuvel wrote: On 8 January 2015 at 19:04, Mark Rutland mark.rutl...@arm.com wrote: Hi Ard, On Thu, Jan 08, 2015 at 05:51:47PM +, Ard Biesheuvel wrote: This fixes two minor issues in the implementation of get_memory_map(): - Currently, it assumes that sizeof(efi_memory_desc_t) == desc_size, which is usually true, but not mandated by the spec. (This was added intentionally to allow future additions to the definition of efi_memory_desc_t). The way the loop is implemented currently, the added slack space may be insufficient if desc_size is larger, which in some corner cases could result in the loop never terminating. - It allocates 32 efi_memory_desc_t entries first (again, using the size of the struct instead of desc_size), and frees and reallocates if it turns out to be insufficient. Few implementations of UEFI have such small memory maps, which results in a unnecessary allocate/free pair on each invocation. Fix this by calling the get_memory_map() boot service first with a '0' input value for map size to retrieve the map size and desc size from the firmware and only then perform the allocation, using desc_size rather than sizeof(efi_memory_desc_t). Is the desc_size guaranteed to be set up correctly if the size is too small? As far as I can see, for that case the spec only mandates that MemoryMapSize is updated nd EFI_BUFFER_TOO_SMALL is returned. It's not clear to me whether DescriptorSize or DescriptorVersion are initialised in cases other than success. The way I read it, descriptor size and descriptor version are always returned, e.g., as opposed to MapKey, which is only returned on success, and the spec mentions that specifically. I agree that that would be the sensible reading of the spec. I just fear that there's room for an implementor to read it slightly differently, and cause pain for us. We could ask for clarification just to be sure. I think we should. Given it could take a while for any conclusion to be reached and published, I'm happy to go with the patch below in the meantime, so long as the issue gets raised. Cheers, Mark. -- Ard. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- drivers/firmware/efi/libstub/efi-stub-helper.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index e766df60fbfb..caf91eab0bfc 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -75,25 +75,29 @@ efi_status_t efi_get_memory_map(efi_system_table_t *sys_table_arg, unsigned long key; u32 desc_version; - *map_size = sizeof(*m) * 32; -again: + *map_size = 0; + *desc_size = 0; + key = 0; + status = efi_call_early(get_memory_map, map_size, NULL, + key, desc_size, desc_version); + if (status != EFI_BUFFER_TOO_SMALL) + return EFI_LOAD_ERROR; + /* * Add an additional efi_memory_desc_t because we're doing an * allocation which may be in a new descriptor region. */ - *map_size += sizeof(*m); + *map_size += *desc_size; status = efi_call_early(allocate_pool, EFI_LOADER_DATA, *map_size, (void **)m); if (status != EFI_SUCCESS) goto fail; - *desc_size = 0; - key = 0; status = efi_call_early(get_memory_map, map_size, m, key, desc_size, desc_version); if (status == EFI_BUFFER_TOO_SMALL) { efi_call_early(free_pool, m); - goto again; + return EFI_LOAD_ERROR; } if (status != EFI_SUCCESS) -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 6/8] arm64/efi: move SetVirtualAddressMap() to UEFI stub
Hi Ard, I have a few (minor) comments below. On Mon, Dec 22, 2014 at 10:59:02AM +, Ard Biesheuvel wrote: In order to support kexec, the kernel needs to be able to deal with the state of the UEFI firmware after SetVirtualAddressMap() has been called. To avoid having separate code paths for non-kexec and kexec, let's move the call to SetVirtualAddressMap() to the stub: this will guarantee us that it will only be called once (since the stub is not executed during kexec), and ensures that the UEFI state is identical between kexec and normal boot. This implies that the layout of the virtual mapping needs to be created by the stub as well. All regions are rounded up to a naturally aligned multiple of 64 KB (for compatibility with 64k pages kernels) and recorded in the UEFI memory map. The kernel proper reads those values and installs the mappings in a dedicated set of page tables that are swapped in during UEFI Runtime Services calls. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/include/asm/efi.h | 20 +++- arch/arm64/kernel/efi.c| 223 - arch/arm64/kernel/setup.c | 1 + drivers/firmware/efi/libstub/fdt.c | 137 ++- 4 files changed, 270 insertions(+), 111 deletions(-) [...] +static void efi_set_pgd(struct mm_struct *mm) +{ + cpu_switch_mm(mm-pgd, mm); + flush_tlb_all(); + if (icache_is_aivivt()) + __flush_icache_all(); +} Do we have any idea how often we call runtime services? I assume not all that often (read the RTC at boot, set/get variables). If we're nuking the TLBs and I-cache a lot we'll probably need to reserve an asid for the EFI virtmap. [...] +void __init efi_virtmap_init(void) +{ + efi_memory_desc_t *md; + + if (!efi_enabled(EFI_BOOT)) + return; + + for_each_efi_memory_desc(memmap, md) { + u64 paddr, npages, size; + pgprot_t prot; + + if (!(md-attribute EFI_MEMORY_RUNTIME)) + continue; + if (WARN(md-virt_addr == 0, +UEFI virtual mapping incomplete or missing -- no entry found for 0x%llx\n, +md-phys_addr)) + return; I wonder if we might want to use a different address to signal a bad mapping (e.g. 0UL-1 as it's not even EFI_PAGE_SIZE aligned), just in case we have to handle a valid use of zero in future for some reason -- perhaps coming from another OS. + + paddr = md-phys_addr; + npages = md-num_pages; + memrange_efi_to_native(paddr, npages); + size = npages PAGE_SHIFT; + + pr_info( EFI remap 0x%012llx = %p\n, Why not use %016llx? We might only have 48-bit PAs currently, but we may as well keep the VA and PA equally long when printing out -- that'll also help to identify issues with garbage in the upper 16 bits of the PA field. [...] +/* + * This is the base address at which to start allocating virtual memory ranges + * for UEFI Runtime Services. This is a userland range so that we can use any + * allocation we choose, and eliminate the risk of a conflict after kexec. + */ +#define EFI_RT_VIRTUAL_BASE0x4000 Nit: I'm not keen on the term userland here. You can map stuff to EL0 in the high half of the address space if you wanted, so TTBR0/TTBR1 aren't architecturally user/kernel. s/userland/low half/, perhaps? It might also be worth pointing out that the arbitrary base address isn't zero so as to be less likely to be an idmap. + +static void update_memory_map(efi_memory_desc_t *memory_map, + unsigned long map_size, unsigned long desc_size, + int *count) +{ + u64 efi_virt_base = EFI_RT_VIRTUAL_BASE; + efi_memory_desc_t *out = (void *)memory_map + map_size; + int l; + + for (l = 0; l map_size; l += desc_size) { + efi_memory_desc_t *in = (void *)memory_map + l; + u64 paddr, size; + + if (!(in-attribute EFI_MEMORY_RUNTIME)) + continue; + + /* +* Make the mapping compatible with 64k pages: this allows +* a 4k page size kernel to kexec a 64k page size kernel and +* vice versa. +*/ + paddr = round_down(in-phys_addr, SZ_64K); + size = round_up(in-num_pages * EFI_PAGE_SIZE + + in-phys_addr - paddr, SZ_64K); + + /* +* Avoid wasting memory on PTEs by choosing a virtual base that +* is compatible with section mappings if this region has the +* appropriate size and physical alignment. (Sections are 2 MB +* on 4k granule kernels) +
Re: [PATCH v2 09/10] arm64/efi: ignore unusable regions instead of reserving them
On Tue, Nov 11, 2014 at 05:12:09PM +, Mark Salter wrote: On Tue, 2014-11-11 at 10:42 -0500, Mark Salter wrote: On Mon, 2014-11-10 at 08:31 +0100, Ard Biesheuvel wrote: On 10 November 2014 05:11, Mark Salter msal...@redhat.com wrote: On Thu, 2014-11-06 at 15:13 +0100, Ard Biesheuvel wrote: This changes the way memblocks are installed based on the contents of the UEFI memory map. Formerly, all regions would be memblock_add()'ed, after which unusable regions would be memblock_reserve()'d as well. To simplify things, but also to allow access to the unusable regions through mmap(/dev/mem), even with CONFIG_STRICT_DEVMEM set, change this so that only usable regions are memblock_add()'ed in the first place. This patch is crashing 64K pagesize kernels during boot. I'm not exactly sure why, though. Here is what I get on an APM Mustang box: Ah, yes, I meant to mention this patch https://git.kernel.org/cgit/linux/kernel/git/glikely/linux.git/commit/?id=8cccffc52694938fc88f3d90bc7fed8460e27191 in the cover letter, which addresses this issue at least for the DT case. That isn't the problem. In general, with 64K kernel pages, you can't be sure if you leave something you need out of the kernel linear mapping. Regardless of 64k pages you can never assume that anything will be in the linear mapping due to the current relationship between the start of the linear map and the address the kernel was loaded at. If you have Loader Code/Data regions begin and/or end at something other than a 64K boundary and that region is adjacent to a region not being added, then you end up leaving out the unaligned bits from the linear mapping. This could be bits of the initramfs or devicetree. What I don't get with this failure is that it is an alignment fault which should be masked at EL1 for the kernel. The same unaligned access happens without this patch and it doesn't generate a fault. Ah, but unaligned accesses are not ignored for device memory. I have this in include/acpi/acpi_io.h: static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys, acpi_size size) { #ifdef CONFIG_ARM64 if (!page_is_ram(phys PAGE_SHIFT)) return ioremap(phys, size); #endif return ioremap_cache(phys, size); } Because the table isn't in the linear mapping, it fails the page_is_ram() test and it gits mapped with ioremap() leading to the alignment fault. If I take out the code inside the #ifdef, I get a different fault: [0.350057] Unhandled fault: synchronous external abort (0x9610) at 0xfefae6f4 [0.358704] pgd = fe000116 [0.362276] [fefae6f4] *pgd=004001370003, *pud=004001370003, *pmd=004001370003, *pte=02c00040011a0713 [0.373746] Internal error: : 9610 [#1] SMP [0.378484] Modules linked in: [0.381601] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc4+ #15 [0.388248] Hardware name: APM X-Gene Mustang board (DT) [0.393738] task: fe03dbe1 ti: fe03dbf0 task.ti: fe03dbf0 [0.401503] PC is at acpi_ex_system_memory_space_handler+0x238/0x2e0 [0.408160] LR is at acpi_ex_system_memory_space_handler+0x130/0x2e0 That happens because AML is trying to access a hardware register which has been mapped as normal memory. Which is why removing the check in the ifdef is completely nonsensical. We already knew we can't map everything cacheable -- regardless of what AML does the CPU can prefetch from anything mapped cacheable (or simply executable) at any time. So, we need a way to tell a table in ram from an io address in AML. And page_is_ram() no longer cuts it if the tables are not in the linear mapping. If the kernel were loaded at an address above the tables, it would fail similarly. So the page_is_ram was never sufficient to ensure tables would be mapped cacheable. As we haven't decoupled the kernel text mapping from the linear mapping, and that doesn't look to be happening any time soon, we can't fix up page_is_ram -- we need something else entirely. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 09/10] arm64/efi: ignore unusable regions instead of reserving them
On Tue, Nov 11, 2014 at 05:55:24PM +, Ard Biesheuvel wrote: On 11 November 2014 18:44, Mark Rutland mark.rutl...@arm.com wrote: On Tue, Nov 11, 2014 at 05:12:09PM +, Mark Salter wrote: On Tue, 2014-11-11 at 10:42 -0500, Mark Salter wrote: On Mon, 2014-11-10 at 08:31 +0100, Ard Biesheuvel wrote: On 10 November 2014 05:11, Mark Salter msal...@redhat.com wrote: On Thu, 2014-11-06 at 15:13 +0100, Ard Biesheuvel wrote: This changes the way memblocks are installed based on the contents of the UEFI memory map. Formerly, all regions would be memblock_add()'ed, after which unusable regions would be memblock_reserve()'d as well. To simplify things, but also to allow access to the unusable regions through mmap(/dev/mem), even with CONFIG_STRICT_DEVMEM set, change this so that only usable regions are memblock_add()'ed in the first place. This patch is crashing 64K pagesize kernels during boot. I'm not exactly sure why, though. Here is what I get on an APM Mustang box: Ah, yes, I meant to mention this patch https://git.kernel.org/cgit/linux/kernel/git/glikely/linux.git/commit/?id=8cccffc52694938fc88f3d90bc7fed8460e27191 in the cover letter, which addresses this issue at least for the DT case. That isn't the problem. In general, with 64K kernel pages, you can't be sure if you leave something you need out of the kernel linear mapping. Regardless of 64k pages you can never assume that anything will be in the linear mapping due to the current relationship between the start of the linear map and the address the kernel was loaded at. If you have Loader Code/Data regions begin and/or end at something other than a 64K boundary and that region is adjacent to a region not being added, then you end up leaving out the unaligned bits from the linear mapping. This could be bits of the initramfs or devicetree. What I don't get with this failure is that it is an alignment fault which should be masked at EL1 for the kernel. The same unaligned access happens without this patch and it doesn't generate a fault. Ah, but unaligned accesses are not ignored for device memory. I have this in include/acpi/acpi_io.h: static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys, acpi_size size) { #ifdef CONFIG_ARM64 if (!page_is_ram(phys PAGE_SHIFT)) return ioremap(phys, size); #endif return ioremap_cache(phys, size); } Because the table isn't in the linear mapping, it fails the page_is_ram() test and it gits mapped with ioremap() leading to the alignment fault. If I take out the code inside the #ifdef, I get a different fault: [ 0.350057] Unhandled fault: synchronous external abort (0x9610) at 0xfefae6f4 [0.358704] pgd = fe000116 [0.362276] [fefae6f4] *pgd=004001370003, *pud=004001370003, *pmd=004001370003, *pte=02c00040011a0713 [0.373746] Internal error: : 9610 [#1] SMP [0.378484] Modules linked in: [0.381601] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc4+ #15 [0.388248] Hardware name: APM X-Gene Mustang board (DT) [0.393738] task: fe03dbe1 ti: fe03dbf0 task.ti: fe03dbf0 [0.401503] PC is at acpi_ex_system_memory_space_handler+0x238/0x2e0 [0.408160] LR is at acpi_ex_system_memory_space_handler+0x130/0x2e0 That happens because AML is trying to access a hardware register which has been mapped as normal memory. Which is why removing the check in the ifdef is completely nonsensical. I am sure we are all in agreement on this part. We already knew we can't map everything cacheable -- regardless of what AML does the CPU can prefetch from anything mapped cacheable (or simply executable) at any time. So, we need a way to tell a table in ram from an io address in AML. And page_is_ram() no longer cuts it if the tables are not in the linear mapping. If the kernel were loaded at an address above the tables, it would fail similarly. So the page_is_ram was never sufficient to ensure tables would be mapped cacheable. As we haven't decoupled the kernel text mapping from the linear mapping, and that doesn't look to be happening any time soon, we can't fix up page_is_ram -- we need something else entirely. Well, if you look at the series, and in particular at the /dev/mem handling, there is already some code that classifies physical addresses based on whether they appear in the UEFI memory map, and with which attributes. I suppose reimplementing page_is_ram() [whose default implementation is conveniently __weak] to return true for EFI_MEMORY_WB ranges and false for everything else would do the trick here, and would arguably be more elegant than matching the string System RAM
Re: [PATCH v2 04/10] arm64/efi: invert UEFI memory region reservation logic
On Tue, Oct 28, 2014 at 04:18:37PM +, Ard Biesheuvel wrote: Instead of reserving the memory regions based on which types we know need to be reserved, consider only regions of the following types as free for general use by the OS: EFI_LOADER_CODE EFI_LOADER_DATA EFI_BOOT_SERVICES_CODE EFI_BOOT_SERVICES_DATA EFI_CONVENTIONAL_MEMORY Note that this also fixes a problem with the original code, which would misidentify a EFI_RUNTIME_SERVICES_DATA region as not reserved if it does not have the EFI_MEMORY_RUNTIME attribute set. However, it is perfectly legal for the firmware not to request a virtual mapping for EFI_RUNTIME_SERVICES_DATA regions that contain configuration tables, in which case the EFI_MEMORY_RUNTIME attribute would not be set. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/efi.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c index 95c49ebc660d..2e829148fb36 100644 --- a/arch/arm64/kernel/efi.c +++ b/arch/arm64/kernel/efi.c @@ -125,17 +125,17 @@ out: */ static __init int is_reserve_region(efi_memory_desc_t *md) { - if (!is_normal_ram(md)) + switch (md-type) { + case EFI_LOADER_CODE: + case EFI_LOADER_DATA: + case EFI_BOOT_SERVICES_CODE: + case EFI_BOOT_SERVICES_DATA: + case EFI_CONVENTIONAL_MEMORY: return 0; - - if (md-attribute EFI_MEMORY_RUNTIME) - return 1; - - if (md-type == EFI_ACPI_RECLAIM_MEMORY || - md-type == EFI_RESERVED_TYPE) - return 1; - - return 0; + default: + break; + } + return is_normal_ram(md); Just to check: did we figure out if UnusableMemory was allowed to have EFI_MEMORY_WB attributes? If it isn't, this looks fine to me. If it is, then we will need to remove that memory (rather than reserving it) to prevent speculative accesses. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 04/10] arm64/efi: invert UEFI memory region reservation logic
On Tue, Oct 28, 2014 at 05:08:29PM +, Ard Biesheuvel wrote: On 28 October 2014 17:47, Mark Rutland mark.rutl...@arm.com wrote: On Tue, Oct 28, 2014 at 04:18:37PM +, Ard Biesheuvel wrote: Instead of reserving the memory regions based on which types we know need to be reserved, consider only regions of the following types as free for general use by the OS: EFI_LOADER_CODE EFI_LOADER_DATA EFI_BOOT_SERVICES_CODE EFI_BOOT_SERVICES_DATA EFI_CONVENTIONAL_MEMORY Note that this also fixes a problem with the original code, which would misidentify a EFI_RUNTIME_SERVICES_DATA region as not reserved if it does not have the EFI_MEMORY_RUNTIME attribute set. However, it is perfectly legal for the firmware not to request a virtual mapping for EFI_RUNTIME_SERVICES_DATA regions that contain configuration tables, in which case the EFI_MEMORY_RUNTIME attribute would not be set. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/efi.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c index 95c49ebc660d..2e829148fb36 100644 --- a/arch/arm64/kernel/efi.c +++ b/arch/arm64/kernel/efi.c @@ -125,17 +125,17 @@ out: */ static __init int is_reserve_region(efi_memory_desc_t *md) { - if (!is_normal_ram(md)) + switch (md-type) { + case EFI_LOADER_CODE: + case EFI_LOADER_DATA: + case EFI_BOOT_SERVICES_CODE: + case EFI_BOOT_SERVICES_DATA: + case EFI_CONVENTIONAL_MEMORY: return 0; - - if (md-attribute EFI_MEMORY_RUNTIME) - return 1; - - if (md-type == EFI_ACPI_RECLAIM_MEMORY || - md-type == EFI_RESERVED_TYPE) - return 1; - - return 0; + default: + break; + } + return is_normal_ram(md); Just to check: did we figure out if UnusableMemory was allowed to have EFI_MEMORY_WB attributes? If it isn't, this looks fine to me. If it is, then we will need to remove that memory (rather than reserving it) to prevent speculative accesses. The spec does not mention at all how EfiUnusableMemory should be used, and I would assume any such regions to have the EFI_MEMORY_WB attribute set, as it is carved out of the normal system RAM, and the way Tianocore/EDK2 implements it at least, all those attributes (including the write-protect/execute-protect ones) are copied straight from the underlying regions and never set to reflect the nature of the actual contents. Ok. That's precisely the case I was concerned about. However, for 3.20 I intend to propose another change to this code, so that only non-reserved, usable memory gets memblock_add()'ed in the first place, and I suppose this should cover your concern as well. The reason for doing that is to allow tools like dmidecode and lshw access to the SMBIOS and other tables through /dev/mem, which is currently disallowed when STRICT_DEVMEM is set. So long as said memory is not later passed to memblock_reserve, that should be ok for the EfiUnusableMemory case. I guess we haven't actually seen such memory yet anyhow? I not all that keen on the usage of /dev/mem for those given the availability of other interfaces. Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/10] arm64/efi: use UEFI memory map unconditionally if available
On Wed, Oct 22, 2014 at 06:06:56PM +0100, Mark Salter wrote: On Wed, 2014-10-22 at 16:21 +0200, Ard Biesheuvel wrote: On systems that boot via UEFI, all memory nodes are deleted from the device tree, and instead, the size and location of system RAM is derived from the UEFI memory map. This is handled by reserve_regions, which not only reserves parts of memory that UEFI declares as reserved, but also installs the memblocks that cover the remaining usable memory. Currently, reserve_regions() is only called if uefi_init() succeeds. However, it does not actually depend on anything that uefi_init() does, and not calling reserve_regions() results in a broken boot, so it is better to just call it unconditionally. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/efi.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c index 51522ab0c6da..4cec21b1ecdd 100644 --- a/arch/arm64/kernel/efi.c +++ b/arch/arm64/kernel/efi.c @@ -313,8 +313,7 @@ void __init efi_init(void) memmap.desc_size = params.desc_size; memmap.desc_version = params.desc_ver; - if (uefi_init() 0) - return; + WARN_ON(uefi_init() 0); reserve_regions(); } It also looks like EFI_BOOT flag will be set even if uefi_init fails. If uefi_init fails, we only need reserve_regions() for the purpose of adding memblocks. Otherwise, we end up wasting a lot of memory. What memory are we wasting in that case? Surely the only items that we could choose to throw away if we failed uefi_init are EFI_ACPI_RECLAIM_MEMORY and EFI_MEMORY_RUNTIME? We might want to keep those around so we can kexec into a kernel where we can make use of them. Surely they shouldn't take up a significant proportion of the available memory? Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/10] arm64/efi: use UEFI memory map unconditionally if available
[...] It also looks like EFI_BOOT flag will be set even if uefi_init fails. If uefi_init fails, we only need reserve_regions() for the purpose of adding memblocks. Otherwise, we end up wasting a lot of memory. What memory are we wasting in that case? Surely the only items that we could choose to throw away if we failed uefi_init are EFI_ACPI_RECLAIM_MEMORY and EFI_MEMORY_RUNTIME? We might want to keep those around so we can kexec into a kernel where we can make use of them. Surely they shouldn't take up a significant proportion of the available memory? In addition, reserve_regions() also reserves BOOT_SERVICES (which gets freed up later after set_virtual_address_map(). That won't happen if uefi_init() fails. In any case, if uefi_init() fails, we won't be able to find the ACPI tables kexec kernel won't be able to either. If you need the ACPI tables, the first kernel won't boot. However, if we fail to map the runtime services data and code, that doesn't mean the next kernel can't. We might have failed to map the runtime services due to an early_ioremap bug or similar, and that could be fixed in the kernel we're about to kexec to. If we're going to nuke runtime regions, we should nuke them in the memory map descriptors as a courtesy to the next kernel. Otherwise we could be more courteous and simply leave them be (which is my preference). The BOOT_SERVICES stuff is the big chunk. There was some discussion of not reserving that but no one has gotten around to writing the patch to clean out the code that does it. I've just spent some time doing so; patch below. It boots happily on my Juno (with 4KiB pages at least), but I haven't given it much of a stress test. I wasn't sure if unusable memory could show up with EFI_MEMORY_WB attributes. If so, this patch still won't prevent that from being mapped as cacheable, but that should be easy to fix. Thanks, Mark. 8 From 581d5bac5b4a6f93e23737012b71c58d809af6bb Mon Sep 17 00:00:00 2001 From: Mark Rutland mark.rutl...@arm.com Date: Thu, 23 Oct 2014 19:41:55 +0100 Subject: [PATCH] arm64: efi: Simplify memory descriptor handling Currently discovering the memory map from UEFI is a multi-stage affair, where the boot services code and data are kept around for much longer than necessary. Additionally, potential overlap between memory and IO mappings at kernel page granularity is not handled. This patch attempts to solve both of these issues, with the addition and filtering of memory regions being handled in a single function. First, all normal memory (i.e. anything which can be mapped writeback cacheable) is added to memblock. Second, IO and reserved regions are filtered out of memblock at kernel page granularity, to prevent potential conflicting mappings. As some reserved regions must be present in the idmap these are reserved. Everything else that's not normal memory is removed, preventing the possibility of a cacheable mapping covering something it shouldn't. Signed-off-by: Mark Rutland mark.rutl...@arm.com Cc: Ard Biesheuvel ard.biesheu...@linaro.org Cc: Catalin Marinas catalin.mari...@arm.com Cc: Leif Lindholm leif.lindh...@linaro.org Cc: Mark Salter msal...@redhat.com --- arch/arm64/kernel/efi.c | 181 +++- 1 file changed, 42 insertions(+), 139 deletions(-) diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c index 03aaa99..8873c28 100644 --- a/arch/arm64/kernel/efi.c +++ b/arch/arm64/kernel/efi.c @@ -154,161 +154,66 @@ static __init int is_reserve_region(efi_memory_desc_t *md) return 0; } -static __init void reserve_regions(void) +static __init void efi_add_memory(void) { efi_memory_desc_t *md; - u64 paddr, npages, size; + u64 paddr, size; if (uefi_debug) - pr_info(Processing EFI memory map:\n); + pr_info(EFI: adding normal memory:\n); for_each_efi_memory_desc(memmap, md) { + if (!is_normal_ram(md)) + continue; + paddr = md-phys_addr; - npages = md-num_pages; + size = md-num_pages EFI_PAGE_SHIFT; if (uefi_debug) - pr_info( 0x%012llx-0x%012llx [%s], - paddr, paddr + (npages EFI_PAGE_SHIFT) - 1, + pr_info( 0x%012llx-0x%012llx [%s]\n, + paddr, size - 1, memory_type_name[md-type]); - memrange_efi_to_native(paddr, npages); - size = npages PAGE_SHIFT; + early_init_dt_add_memory_arch(paddr, size); + } + + /* +* Regions describe in an EFI_MEMORY_DESCRIPTOR are only guaranteed to +* be 4KiB aligned but the kernel page size could be larger, and +* regions with different attributes could fall into the same kernel +* page. Thus we must remove any memory in the same
Re: [PATCH 01/10] arm64/efi: efistub: jump to 'stext' directly, not through the header
Hi Ard, On Wed, Oct 22, 2014 at 03:21:44PM +0100, Ard Biesheuvel wrote: After the EFI stub has done its business, it jumps into the kernel by branching to offset #0 of the loaded Image, which is where it expects to find the header containing a 'branch to stext' instruction. However, the UEFI spec 2.1.1 states the following regarding PE/COFF image loading: A UEFI image is loaded into memory through the LoadImage() Boot Service. This service loads an image with a PE32+ format into memory. This PE32+ loader is required to load all sections of the PE32+ image into memory. In other words, it is /not/ required to load parts of the image that are not covered by a PE/COFF section, so it may not have loaded the header at the expected offset, as it is not covered by any PE/COFF section. So instead, jump to 'stext' directly, which is at the base of the PE/COFF .text section, by supplying a symbol 'stext_offset' to efi-entry.o which contains the relative offset of stext into the Image. Also replace other open coded calculations of the same value with a reference to 'stext_offset' Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org Given the constraints you describe above, and prior discussions, this looks sane to me: Acked-by: Mark Rutland mark.rutl...@arm.com Mark. --- v3: - rebased onto 3.17+ - added spec reference to commit message v2: - drop :lo12: relocation against stext_offset in favor of using a literal '=stext_offset' which is safer --- arch/arm64/kernel/efi-entry.S | 3 ++- arch/arm64/kernel/head.S | 10 ++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S index 619b1dd7bcde..a0016d3a17da 100644 --- a/arch/arm64/kernel/efi-entry.S +++ b/arch/arm64/kernel/efi-entry.S @@ -61,7 +61,8 @@ ENTRY(efi_stub_entry) */ mov x20, x0 // DTB address ldr x0, [sp, #16] // relocated _text address - mov x21, x0 + ldr x21, =stext_offset + add x21, x0, x21 /* * Flush dcache covering current runtime addresses diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 0a6e4f924df8..8c06c9d269d2 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -132,6 +132,8 @@ efi_head: #endif #ifdef CONFIG_EFI + .globl stext_offset + .setstext_offset, stext - efi_head .align 3 pe_header: .ascii PE @@ -155,7 +157,7 @@ optional_header: .long 0 // SizeOfInitializedData .long 0 // SizeOfUninitializedData .long efi_stub_entry - efi_head // AddressOfEntryPoint - .long stext - efi_head// BaseOfCode + .long stext_offset// BaseOfCode extra_header_fields: .quad 0 // ImageBase @@ -172,7 +174,7 @@ extra_header_fields: .long _end - efi_head // SizeOfImage // Everything before the kernel image is considered part of the header - .long stext - efi_head// SizeOfHeaders + .long stext_offset// SizeOfHeaders .long 0 // CheckSum .short 0xa // Subsystem (EFI application) .short 0 // DllCharacteristics @@ -217,9 +219,9 @@ section_table: .byte 0 .byte 0 // end of 0 padding of section name .long _end - stext// VirtualSize - .long stext - efi_head// VirtualAddress + .long stext_offset// VirtualAddress .long _edata - stext // SizeOfRawData - .long stext - efi_head// PointerToRawData + .long stext_offset// PointerToRawData .long 0 // PointerToRelocations (0 for executables) .long 0 // PointerToLineNumbers (0 for executables) -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/10] arm64/efi: set PE/COFF section alignment to 4 KB
On Wed, Oct 22, 2014 at 03:21:45PM +0100, Ard Biesheuvel wrote: Position independent AArch64 code needs to be linked and loaded at the same relative offset from a 4 KB boundary, or adrp/add and adrp/ldr pairs will not work correctly. (This is how PC relative symbol references with a 4 GB reach are emitted) We need to declare this in the PE/COFF header, otherwise the PE/COFF loader may load the Image and invoke the stub at an offset which violates this rule. Reviewed-by: Roy Franz roy.fr...@linaro.org Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org Acked-by: Mark Rutland mark.rutl...@arm.com Mark. --- v2: added comment explaining '.align 12' in head.S --- arch/arm64/kernel/head.S | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 8c06c9d269d2..8ae84d8c2a8c 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -161,7 +161,7 @@ optional_header: extra_header_fields: .quad 0 // ImageBase - .long 0x20// SectionAlignment + .long 0x1000 // SectionAlignment .long 0x8 // FileAlignment .short 0 // MajorOperatingSystemVersion .short 0 // MinorOperatingSystemVersion @@ -228,7 +228,15 @@ section_table: .short 0 // NumberOfRelocations (0 for executables) .short 0 // NumberOfLineNumbers (0 for executables) .long 0xe0500020 // Characteristics (section flags) - .align 5 + + /* + * EFI will load stext onwards at the 4k section alignment + * described in the PE/COFF header. To ensure that instruction + * sequences using an adrp and a :lo12: immediate will function + * correctly at this alignment, we must ensure that stext is + * placed at a 4k boundary in the Image to begin with. + */ + .align 12 #endif ENTRY(stext) -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64/efi: set PE/COFF section alignment to 4 KB
Hi Ard, On Fri, Oct 10, 2014 at 10:25:24AM +0100, Ard Biesheuvel wrote: Position independent AArch64 code needs to be linked and loaded at the same relative offset from a 4 KB boundary, or adrp/add and adrp/ldr pairs will not work correctly. (This is how PC relative symbol references with a 4 GB reach are emitted) We need to declare this in the PE/COFF header, otherwise the PE/COFF loader may load the Image and invoke the stub at an offset which violates this rule. Has this been observed happening, or was this just found by inspection? Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/head.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 0a6e4f924df8..5e83e5b8a9de 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -159,7 +159,7 @@ optional_header: extra_header_fields: .quad 0 // ImageBase - .long 0x20// SectionAlignment + .long 0x1000 // SectionAlignment .long 0x8 // FileAlignment .short 0 // MajorOperatingSystemVersion .short 0 // MinorOperatingSystemVersion @@ -226,7 +226,7 @@ section_table: .short 0 // NumberOfRelocations (0 for executables) .short 0 // NumberOfLineNumbers (0 for executables) .long 0xe0500020 // Characteristics (section flags) - .align 5 + .align 12 Can we get a comment explaining why stext needs the additional alignment? Something like: /* * EFI will load stext onwards at the 4k section alignment * described in the PE/COFF header. To ensure that instruction * sequences using an adrp and a :lo12: immediate will function * correctly at this alignment, we must ensure that stext is * placed at a 4k boundary in the Image to begin with. */ .align 12 Otherwise this looks sane to me. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] arm64/efi: efistub: jump to 'stext' directly, not through the header
On Thu, Oct 09, 2014 at 08:03:52PM +0100, Ard Biesheuvel wrote: On 9 October 2014 19:23, Mark Rutland mark.rutl...@arm.com wrote: Hi Ard, On Wed, Oct 08, 2014 at 03:11:27PM +0100, Ard Biesheuvel wrote: After the EFI stub has done its business, it jumps into the kernel by branching to offset #0 of the loaded Image, which is where it expects to find the header containing a 'branch to stext' instruction. However, the UEFI spec 2.1.1 states the following regarding PE/COFF image loading: A UEFI image is loaded into memory through the LoadImage() Boot Service. This service loads an image with a PE32+ format into memory. This PE32+ loader is required to load all sections of the PE32+ image into memory. In other words, it is /not/ required to load parts of the image that are not covered by a PE/COFF section, so it may not have loaded the header at the expected offset, as it is not covered by any PE/COFF section. What does this mean for handle_kernel_image? Given we might not have _text through to _stext mapped, do we not need to take that into account? Actually, handle_kernel_image() does not interpret the header, it just copies it along with the rest of the image if it needs to be relocated, so I don't see an issue there. Sorry, I wasn't clear enough with my concern. My concern was whether we had any guarantee _something_ was mapped for the address range covering efi_head to stext. So long as _something_ is mapped there, we're ok -- handle_kernel_image will just copy some garbage along with the usable portion of the kernel. But if the EFI loader is allowed to load stext at the precise start of RAM (or anywhere not in the idmap), in attempting the copy we'd try to access unmapped addresses. So if that's a possibility, we need to shrink the copy to cover stext to _edata rather than _text to edata. Does that make sense? However, I do remember Mark Salter mentioning that there is at least one other location that needs to be fixed up if this concern is valid. Mark? Also, have we seen problems on any systems yet? No, I am not aware of any occurrences of this exact issue, this is just one of the things I spotted while working on this code. Ok. I was just curious as to how urgent this was. But I think we mostly agree that branching through the header relies on behavior of the PE/COFF loader that is not covered by the spec. Yes. We should not rely on unspecified behaviour. Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] arm64/efi: efistub: jump to 'stext' directly, not through the header
On Fri, Oct 10, 2014 at 12:52:32PM +0100, Ard Biesheuvel wrote: On 10 October 2014 12:49, Mark Rutland mark.rutl...@arm.com wrote: On Thu, Oct 09, 2014 at 08:03:52PM +0100, Ard Biesheuvel wrote: On 9 October 2014 19:23, Mark Rutland mark.rutl...@arm.com wrote: Hi Ard, On Wed, Oct 08, 2014 at 03:11:27PM +0100, Ard Biesheuvel wrote: After the EFI stub has done its business, it jumps into the kernel by branching to offset #0 of the loaded Image, which is where it expects to find the header containing a 'branch to stext' instruction. However, the UEFI spec 2.1.1 states the following regarding PE/COFF image loading: A UEFI image is loaded into memory through the LoadImage() Boot Service. This service loads an image with a PE32+ format into memory. This PE32+ loader is required to load all sections of the PE32+ image into memory. In other words, it is /not/ required to load parts of the image that are not covered by a PE/COFF section, so it may not have loaded the header at the expected offset, as it is not covered by any PE/COFF section. What does this mean for handle_kernel_image? Given we might not have _text through to _stext mapped, do we not need to take that into account? Actually, handle_kernel_image() does not interpret the header, it just copies it along with the rest of the image if it needs to be relocated, so I don't see an issue there. Sorry, I wasn't clear enough with my concern. My concern was whether we had any guarantee _something_ was mapped for the address range covering efi_head to stext. So long as _something_ is mapped there, we're ok -- handle_kernel_image will just copy some garbage along with the usable portion of the kernel. Indeed. But if the EFI loader is allowed to load stext at the precise start of RAM (or anywhere not in the idmap), in attempting the copy we'd try to access unmapped addresses. So if that's a possibility, we need to shrink the copy to cover stext to _edata rather than _text to edata. Does that make sense? That cannot happen. The PE/COFF .text section's positive relative virtual offset ensures that the memory image has room for the header, it's just not guaranteed that anything gets copied there. Ok. If we're guaranteed to have some space there, we're fine. I'm probably being a bit thick here, but where is the positive relative virtual offset in the header? Which field defines that? Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] arm64/efi: efistub: jump to 'stext' directly, not through the header
On Fri, Oct 10, 2014 at 02:27:46PM +0100, Ard Biesheuvel wrote: On 10 October 2014 15:03, Mark Rutland mark.rutl...@arm.com wrote: [...] But if the EFI loader is allowed to load stext at the precise start of RAM (or anywhere not in the idmap), in attempting the copy we'd try to access unmapped addresses. So if that's a possibility, we need to shrink the copy to cover stext to _edata rather than _text to edata. Does that make sense? That cannot happen. The PE/COFF .text section's positive relative virtual offset ensures that the memory image has room for the header, it's just not guaranteed that anything gets copied there. Ok. If we're guaranteed to have some space there, we're fine. I'm probably being a bit thick here, but where is the positive relative virtual offset in the header? Which field defines that? The fields VirtualSize, VirtualAddress (the field I was referring to), SizeOfRawData and PointerToRawData define the relation between the file layout and the memory layout of the .text section (line 219 and up in head.S) I guess my confusion is over the semantics of the VirtualAddress field. If it's treated as an offset, what is that offset relative to in memory? And what defines that the space covered by that offset is accessible? The PE/COFF spec 8.3 describes VirtualAddress as For executable images, the address of the first byte of the section relative to the image base when the section is loaded into memory. ImageBase is a field itself in the PE/COFF header, described as The preferred address of the first byte of image when loaded into memory; must be a multiple of 64 K. Thanks for the info, this now makes a lot more sense to me. So that means the .text section is not the start of the image, but is offset by (stext - efi_head) bytes from the start, covering the header (regardless of whether it is actually present in the loaded image). The SizeOfImage field is described as The size (in bytes) of the image, including all headers, as the image is loaded in memory. My interpretation is that memory needs to be allocated for the header as well as all sections that have a VirtualSize (including sections like BSS which don't have a payload in the file) That would match what I understand from reading the above, though it strikes me as odd that space needs to be allocated for the headers if they aren't guaranteed to be copied -- it's not defined where they would live in the image. In our current definition, the memory offset and the file offset are identical (which this patch redefines as 'stext_offset'). The virtual size covers the entire static memory footprint of Image (minus the header). whereas the SizeOfRawData contains the size of the payload in the file (again, minus the header). The balance is zero initialized by the loader. I can see why this guarantees there is space for stext to _end, but I don't understand how this guarantees there is a valid mapping for the region that would otherwise be _head to stext. The allocation itself is defined in terms of ImageBase/SizeOfImage (although ImageBase is only a preferred offset) How this allocation is populated with data (and where the holes are) is described by the sections. Ok. Thanks for putting this information together (it's remarkably useful), and thanks for putting up with my ignorance of PE/COFF. Cheers, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64/efi: set PE/COFF section alignment to 4 KB
On Fri, Oct 10, 2014 at 11:37:03AM +0100, Ard Biesheuvel wrote: On 10 October 2014 12:33, Mark Rutland mark.rutl...@arm.com wrote: Hi Ard, On Fri, Oct 10, 2014 at 10:25:24AM +0100, Ard Biesheuvel wrote: Position independent AArch64 code needs to be linked and loaded at the same relative offset from a 4 KB boundary, or adrp/add and adrp/ldr pairs will not work correctly. (This is how PC relative symbol references with a 4 GB reach are emitted) We need to declare this in the PE/COFF header, otherwise the PE/COFF loader may load the Image and invoke the stub at an offset which violates this rule. Has this been observed happening, or was this just found by inspection? This is also something found by inspection, or rather, by the discussion going on in the other thread. I am not aware of any PE/COFF loaders that may choose an offset that is not 4 KB aligned, even if the header we give it appears to allow it. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/head.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 0a6e4f924df8..5e83e5b8a9de 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -159,7 +159,7 @@ optional_header: extra_header_fields: .quad 0 // ImageBase - .long 0x20// SectionAlignment + .long 0x1000 // SectionAlignment Looking at this again, I'm more confused than I was to begin with. Surely we know exactly where the .text section will be loaded because of its VirtualAddress? If that were the case, we can drop the .align 12 as we already load it at the offset any arp or :lo12: immediate will have been built for. I don't follow how SectionAlignment interacts with a given section's VirtualAddress. Thanks, Mark. .long 0x8 // FileAlignment .short 0 // MajorOperatingSystemVersion .short 0 // MinorOperatingSystemVersion @@ -226,7 +226,7 @@ section_table: .short 0 // NumberOfRelocations (0 for executables) .short 0 // NumberOfLineNumbers (0 for executables) .long 0xe0500020 // Characteristics (section flags) - .align 5 + .align 12 Can we get a comment explaining why stext needs the additional alignment? Something like: /* * EFI will load stext onwards at the 4k section alignment * described in the PE/COFF header. To ensure that instruction * sequences using an adrp and a :lo12: immediate will function * correctly at this alignment, we must ensure that stext is * placed at a 4k boundary in the Image to begin with. */ .align 12 OK -- Ard. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64/efi: set PE/COFF section alignment to 4 KB
On Fri, Oct 10, 2014 at 03:50:49PM +0100, Ard Biesheuvel wrote: On 10 October 2014 16:09, Mark Rutland mark.rutl...@arm.com wrote: On Fri, Oct 10, 2014 at 11:37:03AM +0100, Ard Biesheuvel wrote: On 10 October 2014 12:33, Mark Rutland mark.rutl...@arm.com wrote: Hi Ard, On Fri, Oct 10, 2014 at 10:25:24AM +0100, Ard Biesheuvel wrote: Position independent AArch64 code needs to be linked and loaded at the same relative offset from a 4 KB boundary, or adrp/add and adrp/ldr pairs will not work correctly. (This is how PC relative symbol references with a 4 GB reach are emitted) We need to declare this in the PE/COFF header, otherwise the PE/COFF loader may load the Image and invoke the stub at an offset which violates this rule. Has this been observed happening, or was this just found by inspection? This is also something found by inspection, or rather, by the discussion going on in the other thread. I am not aware of any PE/COFF loaders that may choose an offset that is not 4 KB aligned, even if the header we give it appears to allow it. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/head.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 0a6e4f924df8..5e83e5b8a9de 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -159,7 +159,7 @@ optional_header: extra_header_fields: .quad 0 // ImageBase - .long 0x20// SectionAlignment + .long 0x1000 // SectionAlignment Looking at this again, I'm more confused than I was to begin with. :-) Surely we know exactly where the .text section will be loaded because of its VirtualAddress? If that were the case, we can drop the .align 12 as we already load it at the offset any arp or :lo12: immediate will have been built for. No, not quite. It only tells us what the /offset/ should be of the section from the ImageBase chosen by the loader, not what the alignment of ImageBase itself should be. For instance, with a section VirtualAddress of 0x1000 and a SectionAlignment of 0x400, the section could legally be loaded @ 0x1400 or 0x1800 (for ImageBase == 0x400 or 0x800, respectively) I see. I had myself confused between the image and sections, and (mistakenly) thought we had control over the alignment of the image as opposed to sections. I thought the loader chose an ImageBase that was sufficiently aligned, then just loaded each segment at the requested offset from that. It sounds like the loader actually has to try to reconcile the offset of each section against each other to determine an ImageBase to use. Given that the only thing we can control the alignment of is the .text section (with an offset applied below that), aligning the .text section to 4k sounds right. Thanks for bearing with me! Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] arm64/efi: efistub: jump to 'stext' directly, not through the header
Hi Ard, On Wed, Oct 08, 2014 at 03:11:27PM +0100, Ard Biesheuvel wrote: After the EFI stub has done its business, it jumps into the kernel by branching to offset #0 of the loaded Image, which is where it expects to find the header containing a 'branch to stext' instruction. However, the UEFI spec 2.1.1 states the following regarding PE/COFF image loading: A UEFI image is loaded into memory through the LoadImage() Boot Service. This service loads an image with a PE32+ format into memory. This PE32+ loader is required to load all sections of the PE32+ image into memory. In other words, it is /not/ required to load parts of the image that are not covered by a PE/COFF section, so it may not have loaded the header at the expected offset, as it is not covered by any PE/COFF section. What does this mean for handle_kernel_image? Given we might not have _text through to _stext mapped, do we not need to take that into account? Also, have we seen problems on any systems yet? Otherwise, this looks like a good fix for hte problem. Thanks, Mark. So instead, jump to 'stext' directly, which is at the base of the PE/COFF .text section, by supplying a symbol 'stext_offset' to efi-entry.o which contains the relative offset of stext into the Image. Also replace other open coded calculations of the same value with a reference to 'stext_offset' Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- Changes since v2: - rebased onto 3.17+ - added spec reference to commit message Changes since v1: - drop :lo12: relocation against stext_offset in favor of using a literal '=stext_offset' which is safer arch/arm64/kernel/efi-entry.S | 3 ++- arch/arm64/kernel/head.S | 10 ++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S index 619b1dd7bcde..a0016d3a17da 100644 --- a/arch/arm64/kernel/efi-entry.S +++ b/arch/arm64/kernel/efi-entry.S @@ -61,7 +61,8 @@ ENTRY(efi_stub_entry) */ mov x20, x0 // DTB address ldr x0, [sp, #16] // relocated _text address - mov x21, x0 + ldr x21, =stext_offset + add x21, x0, x21 /* * Flush dcache covering current runtime addresses diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 0a6e4f924df8..8c06c9d269d2 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -132,6 +132,8 @@ efi_head: #endif #ifdef CONFIG_EFI + .globl stext_offset + .setstext_offset, stext - efi_head .align 3 pe_header: .ascii PE @@ -155,7 +157,7 @@ optional_header: .long 0 // SizeOfInitializedData .long 0 // SizeOfUninitializedData .long efi_stub_entry - efi_head // AddressOfEntryPoint - .long stext - efi_head// BaseOfCode + .long stext_offset// BaseOfCode extra_header_fields: .quad 0 // ImageBase @@ -172,7 +174,7 @@ extra_header_fields: .long _end - efi_head // SizeOfImage // Everything before the kernel image is considered part of the header - .long stext - efi_head// SizeOfHeaders + .long stext_offset// SizeOfHeaders .long 0 // CheckSum .short 0xa // Subsystem (EFI application) .short 0 // DllCharacteristics @@ -217,9 +219,9 @@ section_table: .byte 0 .byte 0 // end of 0 padding of section name .long _end - stext// VirtualSize - .long stext - efi_head// VirtualAddress + .long stext_offset// VirtualAddress .long _edata - stext // SizeOfRawData - .long stext - efi_head// PointerToRawData + .long stext_offset// PointerToRawData .long 0 // PointerToRelocations (0 for executables) .long 0 // PointerToLineNumbers (0 for executables) -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 07/44] qnap-poweroff: Drop reference to pm_power_off from devicetree bindings
On Tue, Oct 07, 2014 at 06:28:09AM +0100, Guenter Roeck wrote: Replace reference to pm_power_off (which is an implementation detail) and replace it with a more generic description of the driver's functionality. Cc: Rob Herring robh...@kernel.org Cc: Pawel Moll pawel.m...@arm.com Cc: Mark Rutland mark.rutl...@arm.com Signed-off-by: Guenter Roeck li...@roeck-us.net --- Documentation/devicetree/bindings/power_supply/qnap-poweroff.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/power_supply/qnap-poweroff.txt b/Documentation/devicetree/bindings/power_supply/qnap-poweroff.txt index af25e77..1e2260a 100644 --- a/Documentation/devicetree/bindings/power_supply/qnap-poweroff.txt +++ b/Documentation/devicetree/bindings/power_supply/qnap-poweroff.txt @@ -3,8 +3,8 @@ QNAP NAS devices have a microcontroller controlling the main power supply. This microcontroller is connected to UART1 of the Kirkwood and Orion5x SoCs. Sending the character 'A', at 19200 baud, tells the -microcontroller to turn the power off. This driver adds a handler to -pm_power_off which is called to turn the power off. +microcontroller to turn the power off. This driver installs a handler +to power off the system. I'd remove the last sentence -- the driver is also independent of the HW, and the description of how the power off works at the HW level is sufficient. With that: Acked-by: Mark Rutland mark.rutl...@arm.com Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/44] gpio-poweroff: Drop reference to pm_power_off from devicetree bindings
On Tue, Oct 07, 2014 at 06:28:08AM +0100, Guenter Roeck wrote: pm_power_off is an implementation detail. Replace it with a more generic description of the driver's functionality. Cc: Rob Herring robh...@kernel.org Cc: Pawel Moll pawel.m...@arm.com Cc: Mark Rutland mark.rutl...@arm.com Acked-by: Mark Rutland mark.rutl...@arm.com Signed-off-by: Guenter Roeck li...@roeck-us.net --- Documentation/devicetree/bindings/gpio/gpio-poweroff.txt | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/devicetree/bindings/gpio/gpio-poweroff.txt b/Documentation/devicetree/bindings/gpio/gpio-poweroff.txt index d4eab92..c95a1a6 100644 --- a/Documentation/devicetree/bindings/gpio/gpio-poweroff.txt +++ b/Documentation/devicetree/bindings/gpio/gpio-poweroff.txt @@ -2,12 +2,12 @@ Driver a GPIO line that can be used to turn the power off. The driver supports both level triggered and edge triggered power off. At driver load time, the driver will request the given gpio line and -install a pm_power_off handler. If the optional properties 'input' is -not found, the GPIO line will be driven in the inactive +install a handler to power off the system. If the optional properties +'input' is not found, the GPIO line will be driven in the inactive state. Otherwise its configured as an input. -When the pm_power_off is called, the gpio is configured as an output, -and drive active, so triggering a level triggered power off +When the the poweroff handler is called, the gpio is configured as an +output, and drive active, so triggering a level triggered power off condition. This will also cause an inactive-active edge condition, so triggering positive edge triggered power off. After a delay of 100ms, the GPIO is set to inactive, thus causing an active-inactive edge, @@ -24,7 +24,7 @@ Required properties: Optional properties: - input : Initially configure the GPIO line as an input. Only reconfigure - it to an output when the pm_power_off function is called. If this optional + it to an output when the poweroff handler is called. If this optional property is not specified, the GPIO is initialized as an output in its inactive state. -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/44] mfd: as3722: Drop reference to pm_power_off from devicetree bindings
On Tue, Oct 07, 2014 at 05:21:11PM +0100, Rob Landley wrote: On 10/07/14 00:28, Guenter Roeck wrote: Devicetree bindings are supposed to be operating system independent and should thus not describe how a specific functionality is implemented in Linux. So your argument is that linux/Documentation/devicetree/bindings should not be specific to Linux. Merely hosted in the Linux kernel source repository. Precisely. If nothing else as a general guideline this keeps us honest, and prevents us from embedding arbitrary implementation details into bidnings that cause pain later when we want to change things at either end. There are already otehr users of these bindings, so we can't really claim they're strictly Linux-specific anyhow. Well that's certainly a point of view. As far as I am aware, it's the point of view shared by the device tree maintainers, and it's been that way for a while. I don't really follow your concern. For one thing were this followed more strictly this file wouldn't need patching at all to correct for this Linux-internal rework... Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/arm64: fix fdt-related memory reservation
Hi Mark, On Mon, Sep 08, 2014 at 02:31:42PM +0100, Mark Salter wrote: Commit 86c8b27a01cf: arm64: ignore DT memreserve entries when booting in UEFI mode prevents early_init_fdt_scan_reserved_mem() from being called for arm64 kernels booting via UEFI. This was done because the kernel will use the UEFI memory map to determine reserved memory regions. That approach has problems in that early_init_fdt_scan_reserved_mem() also reserves the FDT itself and any node-specific reserved memory. By chance of some kernel configs, the FDT may be overwritten before it can be unflattened and the kernel will fail to boot. More subtle problems will result if the FDT has node specific reserved memory which is not really reserved. That doesn't sound like fun; apologies for allowing such brokenness through in the first place. [...] + /* + * Delete all memory reserve map entries. When booting via UEFI, + * kernel will use the UEFI memory map to find reserved regions. + */ + num_rsv = fdt_num_mem_rsv(fdt); + for (i = 0; i num_rsv; i++) + fdt_del_mem_rsv(fdt, i); I don't think that's right. Won't the memreserve entries shift down by one each time we call fdt_del_mem_rsv? Shouldn't this be something like: while (fdt_num_mem_rsv(fdt)) fdt_del_mem_rsv(fdt, 0); Or we could count downwards. Otherwise, the general approach sounds sane to me, so with that bug fixed or disproven: Acked-by: Mark Rutland mark.rutl...@arm.com Given this is mostly in the EFI stub I expect that this will go via the EFI tree? Mark. + node = fdt_subnode_offset(fdt, 0, chosen); if (node 0) { node = fdt_add_subnode(fdt, 0, chosen); -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/arm64: fix fdt-related memory reservation
On Mon, Sep 08, 2014 at 03:21:05PM +0100, Mark Salter wrote: On Mon, 2014-09-08 at 15:06 +0100, Mark Rutland wrote: Hi Mark, On Mon, Sep 08, 2014 at 02:31:42PM +0100, Mark Salter wrote: Commit 86c8b27a01cf: arm64: ignore DT memreserve entries when booting in UEFI mode prevents early_init_fdt_scan_reserved_mem() from being called for arm64 kernels booting via UEFI. This was done because the kernel will use the UEFI memory map to determine reserved memory regions. That approach has problems in that early_init_fdt_scan_reserved_mem() also reserves the FDT itself and any node-specific reserved memory. By chance of some kernel configs, the FDT may be overwritten before it can be unflattened and the kernel will fail to boot. More subtle problems will result if the FDT has node specific reserved memory which is not really reserved. That doesn't sound like fun; apologies for allowing such brokenness through in the first place. Heh. It was obvious that DT unflattening was broken, but bisecting didn't help much because I kept finding patches which when reverted made the problem go away even though they obviously weren't the cause. Yeah, those types of bugs are never fun. I recall a similar situation with the __INIT annotation in the versatile pen code. [...] + /* + * Delete all memory reserve map entries. When booting via UEFI, + * kernel will use the UEFI memory map to find reserved regions. + */ + num_rsv = fdt_num_mem_rsv(fdt); + for (i = 0; i num_rsv; i++) + fdt_del_mem_rsv(fdt, i); I don't think that's right. Won't the memreserve entries shift down by one each time we call fdt_del_mem_rsv? Shouldn't this be something like: while (fdt_num_mem_rsv(fdt)) fdt_del_mem_rsv(fdt, 0); Or we could count downwards. Sigh. Yes, you are right. I only tested with one reserved region. I think counting down would be the way to go. I'll send a fixed patch shortly. Ok, that sounds fine by me; make sure you add my ack :) Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/3] arm64/efi: efistub: don't abort if base of DRAM is occupied
On Wed, Aug 20, 2014 at 06:10:57PM +0100, Matt Fleming wrote: On Thu, 14 Aug, at 12:32:05PM, Mark Rutland wrote: On Wed, Jul 30, 2014 at 11:59:04AM +0100, Ard Biesheuvel wrote: If we cannot relocate the kernel Image to its preferred offset of base of DRAM plus TEXT_OFFSET, instead relocate it to the lowest available 2 MB boundary plus TEXT_OFFSET. We may lose a bit of memory at the low end, but we can still proceed normally otherwise. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org Acked-by: Mark Salter msal...@redhat.com Acked-by: Mark Rutland mark.rutl...@arm.com Ard, who's picking this up? Will's already taken this into arm64/devel [1,2] with the intention of waiting for v3.18 [3]. Per Leif's comment [4] that might have to be bumped. Will? Mark. [1] https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/commit/?h=devel [2] https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/commit/?h=develid=12f002aa8d96b90e6e37cc2b0d9a6a3cacdf8ef5 [3] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/279655.html [4] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/279771.html -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/3] arm64/efi: efistub: cover entire static mem footprint in PE/COFF .text
On Wed, Jul 30, 2014 at 11:59:03AM +0100, Ard Biesheuvel wrote: The static memory footprint of a kernel Image at boot is larger than the Image file itself. Things like .bss data and initial page tables are allocated statically but populated dynamically so their content is not contained in the Image file. However, if EFI (or GRUB) has loaded the Image at precisely the desired offset of base of DRAM + TEXT_OFFSET, the Image will be booted in place, and we have to make sure that the allocation done by the PE/COFF loader is large enough. Fix this by growing the PE/COFF .text section to cover the entire static memory footprint. The part of the section that is not covered by the payload will be zero initialised by the PE/COFF loader. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org Acked-by: Mark Salter msal...@redhat.com This looks sane to me and it seems we do the same for x86 as of c7fb93ec51d4 (x86/efi: Include a .bss section within the PE/COFF headers). So: Acked-by: Mark Rutland mark.rutl...@arm.com --- arch/arm64/kernel/head.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 144f10567f82..b6ca95aee348 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -151,7 +151,7 @@ optional_header: .short 0x20b // PE32+ format .byte 0x02// MajorLinkerVersion .byte 0x14// MinorLinkerVersion - .long _edata - stext // SizeOfCode + .long _end - stext// SizeOfCode .long 0 // SizeOfInitializedData .long 0 // SizeOfUninitializedData .long efi_stub_entry - efi_head // AddressOfEntryPoint @@ -169,7 +169,7 @@ extra_header_fields: .short 0 // MinorSubsystemVersion .long 0 // Win32VersionValue - .long _edata - efi_head // SizeOfImage + .long _end - efi_head // SizeOfImage // Everything before the kernel image is considered part of the header .long stext - efi_head// SizeOfHeaders @@ -216,7 +216,7 @@ section_table: .byte 0 .byte 0 .byte 0 // end of 0 padding of section name - .long _edata - stext // VirtualSize + .long _end - stext// VirtualSize .long stext - efi_head// VirtualAddress .long _edata - stext // SizeOfRawData .long stext - efi_head// PointerToRawData -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/3] arm64/efi: efistub: don't abort if base of DRAM is occupied
On Wed, Jul 30, 2014 at 11:59:04AM +0100, Ard Biesheuvel wrote: If we cannot relocate the kernel Image to its preferred offset of base of DRAM plus TEXT_OFFSET, instead relocate it to the lowest available 2 MB boundary plus TEXT_OFFSET. We may lose a bit of memory at the low end, but we can still proceed normally otherwise. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org Acked-by: Mark Salter msal...@redhat.com Acked-by: Mark Rutland mark.rutl...@arm.com --- arch/arm64/kernel/efi-stub.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c index 1317fef8dde9..d27dd982ff26 100644 --- a/arch/arm64/kernel/efi-stub.c +++ b/arch/arm64/kernel/efi-stub.c @@ -28,20 +28,16 @@ efi_status_t handle_kernel_image(efi_system_table_t *sys_table, kernel_size = _edata - _text; if (*image_addr != (dram_base + TEXT_OFFSET)) { kernel_memsize = kernel_size + (_end - _edata); - status = efi_relocate_kernel(sys_table, image_addr, - kernel_size, kernel_memsize, - dram_base + TEXT_OFFSET, - PAGE_SIZE); + status = efi_low_alloc(sys_table, kernel_memsize + TEXT_OFFSET, +SZ_2M, reserve_addr); if (status != EFI_SUCCESS) { pr_efi_err(sys_table, Failed to relocate kernel\n); return status; } - if (*image_addr != (dram_base + TEXT_OFFSET)) { - pr_efi_err(sys_table, Failed to alloc kernel memory\n); - efi_free(sys_table, kernel_memsize, *image_addr); - return EFI_LOAD_ERROR; - } - *image_size = kernel_memsize; + memcpy((void *)*reserve_addr + TEXT_OFFSET, (void *)*image_addr, +kernel_size); + *image_addr = *reserve_addr + TEXT_OFFSET; + *reserve_size = kernel_memsize + TEXT_OFFSET; } -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/3] arm64: spin-table: handle unmapped cpu-release-addrs
On Thu, Jul 31, 2014 at 10:45:15AM +0100, Will Deacon wrote: On Wed, Jul 30, 2014 at 08:17:02PM +0100, Ard Biesheuvel wrote: ]On 30 July 2014 13:30, Will Deacon will.dea...@arm.com wrote: On Wed, Jul 30, 2014 at 11:59:02AM +0100, Ard Biesheuvel wrote: From: Mark Rutland mark.rutl...@arm.com In certain cases the cpu-release-addr of a CPU may not fall in the linear mapping (e.g. when the kernel is loaded above this address due to the presence of other images in memory). This is problematic for the spin-table code as it assumes that it can trivially convert a cpu-release-addr to a valid VA in the linear map. This patch modifies the spin-table code to use a temporary cached mapping to write to a given cpu-release-addr, enabling us to support addresses regardless of whether they are covered by the linear mapping. Signed-off-by: Mark Rutland mark.rutl...@arm.com Tested-by: Mark Salter msal...@redhat.com [ardb: added (__force void *) cast] Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/smp_spin_table.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) I'm nervous about this. What if the spin table sits in the same physical 64k frame as a read-sensitive device and we're running with 64k pages? Actually, booting.txt requires cpu-release-addr to point to a /memreserve/d part of memory, which implies DRAM (or you wouldn't have to memreserve it) That means it should always be covered by the linear mapping, unless it is located before Image in DRAM, which is the case addressed by this patch. But if it's located before before the Image in DRAM and isn't covered by the linear mapping, then surely the /memreserve/ is pointless too? In which case, this looks like we're simply trying to cater for platforms that aren't following booting.txt (which may need updating if we need to handle this). No. The DT is describing the memory which is present, and the subset thereof which should not be used under normal circumstances. That's a static property of the system. Where the OS happens to get loaded and what it is able to address is a dynamic property of the OS (and possibly the bootloader). The DT cannot have knowledge of this. It's always true that the OS should not blindly use memreserve'd memory. The fact that it cannot address it in the linear mapping is orthogonal. Cheers, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/3] arm64: spin-table: handle unmapped cpu-release-addrs
On Thu, Jul 31, 2014 at 11:04:39AM +0100, Will Deacon wrote: On Thu, Jul 31, 2014 at 10:58:54AM +0100, Mark Rutland wrote: On Thu, Jul 31, 2014 at 10:45:15AM +0100, Will Deacon wrote: On Wed, Jul 30, 2014 at 08:17:02PM +0100, Ard Biesheuvel wrote: ]On 30 July 2014 13:30, Will Deacon will.dea...@arm.com wrote: On Wed, Jul 30, 2014 at 11:59:02AM +0100, Ard Biesheuvel wrote: From: Mark Rutland mark.rutl...@arm.com In certain cases the cpu-release-addr of a CPU may not fall in the linear mapping (e.g. when the kernel is loaded above this address due to the presence of other images in memory). This is problematic for the spin-table code as it assumes that it can trivially convert a cpu-release-addr to a valid VA in the linear map. This patch modifies the spin-table code to use a temporary cached mapping to write to a given cpu-release-addr, enabling us to support addresses regardless of whether they are covered by the linear mapping. Signed-off-by: Mark Rutland mark.rutl...@arm.com Tested-by: Mark Salter msal...@redhat.com [ardb: added (__force void *) cast] Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/smp_spin_table.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) I'm nervous about this. What if the spin table sits in the same physical 64k frame as a read-sensitive device and we're running with 64k pages? Actually, booting.txt requires cpu-release-addr to point to a /memreserve/d part of memory, which implies DRAM (or you wouldn't have to memreserve it) That means it should always be covered by the linear mapping, unless it is located before Image in DRAM, which is the case addressed by this patch. But if it's located before before the Image in DRAM and isn't covered by the linear mapping, then surely the /memreserve/ is pointless too? In which case, this looks like we're simply trying to cater for platforms that aren't following booting.txt (which may need updating if we need to handle this). No. The DT is describing the memory which is present, and the subset thereof which should not be used under normal circumstances. That's a static property of the system. Where the OS happens to get loaded and what it is able to address is a dynamic property of the OS (and possibly the bootloader). The DT cannot have knowledge of this. It's always true that the OS should not blindly use memreserve'd memory. The fact that it cannot address it in the linear mapping is orthogonal. In which case, I think asserting that /memreserve/ implies DRAM is pretty fragile and not actually enforced anywhere. Sure, we can say `don't do that', but I'd prefer to have the kernel detect this dynamically. I think the boot protocol needs an update to allow a cpu-release-addr not covered by linear mapping. There are reasons that the kernel might not be loaded at the start of RAM, and I think relying on the cpu-release-addr addresses lying in the linear mapping is a limitation we need to address. Given that I also think we should allow for cpu-release-addrs outside of the range desribed by memory nodes (and therefore not requiring any /mremreserve/). I do not think we should rely on being able to address the cpu-release-addr with a normal cacheable mapping. If the cpu-release-addr falls outside of the memory described by the memory node(s) then we have no idea where it lives. Currently this falls in normal memory, but mandating that feels odd. The sole purpose of /memreserve/ is to describe areas in physical memory that memory should not be used for general allocation. I don't think it makes any sense to derive any information from /memreserve/ other than the fact said addresses shouldn't be poked arbitarily. If we allow cpu-release-addrs outside of memory, then we won't have a /memreserve/ anyhow. So the question becomes can or can't we always detect when we already have a mapping that covers a cpu-release-addr? Does dtc check that the /memreserve/ region is actually a subset of the memory node? I don't beleive it does. It's probably a sensible warning, but as far as I am aware the only time the memory reservation table will be read in any OS is to poke holes in its memory allocation pool(s). Cheers, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/3] arm64: spin-table: handle unmapped cpu-release-addrs
On Wed, Jul 30, 2014 at 01:00:40PM +0100, Ard Biesheuvel wrote: On 30 July 2014 13:30, Will Deacon will.dea...@arm.com wrote: On Wed, Jul 30, 2014 at 11:59:02AM +0100, Ard Biesheuvel wrote: From: Mark Rutland mark.rutl...@arm.com In certain cases the cpu-release-addr of a CPU may not fall in the linear mapping (e.g. when the kernel is loaded above this address due to the presence of other images in memory). This is problematic for the spin-table code as it assumes that it can trivially convert a cpu-release-addr to a valid VA in the linear map. This patch modifies the spin-table code to use a temporary cached mapping to write to a given cpu-release-addr, enabling us to support addresses regardless of whether they are covered by the linear mapping. Signed-off-by: Mark Rutland mark.rutl...@arm.com Tested-by: Mark Salter msal...@redhat.com [ardb: added (__force void *) cast] Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/smp_spin_table.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) I'm nervous about this. What if the spin table sits in the same physical 64k frame as a read-sensitive device and we're running with 64k pages? I see what you mean. This is potentially hairy, as EFI already ioremap_cache()s everything known to it as normal DRAM, so using plain ioremap() here if pfn_valid() returns false for cpu-release-addr's PFN may still result in mappings with different attributes for the same region. So how should we decide whether to call ioremap() or ioremap_cache() in this case? If we're careful about handling mismatched attributes we might be able to get away with always using a device mapping. I'll need to have a think about that, I'm not sure on the architected cache behaviour in such a case. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/3] arm64: spin-table: handle unmapped cpu-release-addrs
On Wed, Jul 30, 2014 at 01:42:58PM +0100, Will Deacon wrote: On Wed, Jul 30, 2014 at 01:30:29PM +0100, Mark Rutland wrote: On Wed, Jul 30, 2014 at 01:00:40PM +0100, Ard Biesheuvel wrote: On 30 July 2014 13:30, Will Deacon will.dea...@arm.com wrote: On Wed, Jul 30, 2014 at 11:59:02AM +0100, Ard Biesheuvel wrote: From: Mark Rutland mark.rutl...@arm.com In certain cases the cpu-release-addr of a CPU may not fall in the linear mapping (e.g. when the kernel is loaded above this address due to the presence of other images in memory). This is problematic for the spin-table code as it assumes that it can trivially convert a cpu-release-addr to a valid VA in the linear map. This patch modifies the spin-table code to use a temporary cached mapping to write to a given cpu-release-addr, enabling us to support addresses regardless of whether they are covered by the linear mapping. Signed-off-by: Mark Rutland mark.rutl...@arm.com Tested-by: Mark Salter msal...@redhat.com [ardb: added (__force void *) cast] Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/smp_spin_table.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) I'm nervous about this. What if the spin table sits in the same physical 64k frame as a read-sensitive device and we're running with 64k pages? I see what you mean. This is potentially hairy, as EFI already ioremap_cache()s everything known to it as normal DRAM, so using plain ioremap() here if pfn_valid() returns false for cpu-release-addr's PFN may still result in mappings with different attributes for the same region. So how should we decide whether to call ioremap() or ioremap_cache() in this case? If we're careful about handling mismatched attributes we might be able to get away with always using a device mapping. Even then, I think ioremap hits a WARN_ON if pfn_valid. Ok, that's that idea dead then. I'll need to have a think about that, I'm not sure on the architected cache behaviour in such a case. Of we just skip the cache flush if !pfn_valid. I don't think that's always safe given Ard's comment that the EFI code will possibly have a mapping covering the region created by ioremap_cache. Ard, what exactly does the EFI code map with ioremap_cache, and why? Cheers, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64: ignore DT memreserve entries when booting in UEFI mode
On Mon, Jul 28, 2014 at 07:03:03PM +0100, Leif Lindholm wrote: UEFI provides its own method for marking regions to reserve, via the memory map which is also used to initialise memblock. So when using the UEFI memory map, ignore any memreserve entries present in the DT. It's worth noting that no-one is relying on this at present, and were they it would imply that their UEFI implementation is broken (providing a dodgy memory map). So before people start doing dodgy things, let's, close that hole. If we need a way of modifying the memory map we should come up with a more general plan. Reported-by: Mark Rutland mark.rutl...@arm.com Signed-off-by: Leif Lindholm leif.lindh...@linaro.org --- arch/arm64/kernel/efi.c |2 ++ arch/arm64/mm/init.c|4 +++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c index 14db1f6..7ad17b2 100644 --- a/arch/arm64/kernel/efi.c +++ b/arch/arm64/kernel/efi.c @@ -188,6 +188,8 @@ static __init void reserve_regions(void) if (uefi_debug) pr_cont(\n); } + + set_bit(EFI_MEMMAP, efi.flags); Nothing outside of arch/x86 seems to hang off this, so using it here looks fine. } diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index e90c542..58dbf2e 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -32,6 +32,7 @@ #include linux/of_fdt.h #include linux/dma-mapping.h #include linux/dma-contiguous.h +#include linux/efi.h #include asm/sections.h #include asm/setup.h @@ -151,7 +152,8 @@ void __init arm64_memblock_init(void) memblock_reserve(__pa(swapper_pg_dir), SWAPPER_DIR_SIZE); memblock_reserve(__pa(idmap_pg_dir), IDMAP_DIR_SIZE); - early_init_fdt_scan_reserved_mem(); + if (!efi_enabled(EFI_MEMMAP)) + early_init_fdt_scan_reserved_mem(); And this is a false stub for !EFI, so that looks fine. FWIW: Reviewed-by: Mark Rutland mark.rutl...@arm.com I wonder if we have any other flat tree uses we need to be careful of when booting via EFI. Thanks Mark. /* 4GB maximum for 32-bit only capable devices */ if (IS_ENABLED(CONFIG_ZONE_DMA)) -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 09/10] arm64/efi: enable minimal UEFI Runtime Services for big endian
Hi Ard, This is certainly a neat feature, and I definitely want to be able to boot BE kernels via UEFI. However, I'm wary of calling EFI in a physical (i.e. idmap with dcaches off) context. I'm not sure anyone else does that, and I'm not sure whether that's going to work (both because of the cache maintenance requirements and the expectations of a given UEFI implementation w.r.t. memory cacheability). I'd hoped we'd be able to use a LE EL0 context to call the runtime services in, but I'm not sure that's possible by the spec :( As I understand it, we shouldn't need these runtime services to simply boot a BE kernel. On Mon, Jul 21, 2014 at 04:16:24PM +0100, Ard Biesheuvel wrote: This enables the UEFI Runtime Services needed to manipulate the non-volatile variable store when running under a BE kernel. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/include/asm/efi.h | 2 + arch/arm64/kernel/efi-be-call.S| 55 arch/arm64/kernel/efi-be-runtime.c | 104 + arch/arm64/kernel/efi.c| 15 ++ 4 files changed, 176 insertions(+) create mode 100644 arch/arm64/kernel/efi-be-call.S create mode 100644 arch/arm64/kernel/efi-be-runtime.c diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h index a34fd3b12e2b..44e642b6ab61 100644 --- a/arch/arm64/include/asm/efi.h +++ b/arch/arm64/include/asm/efi.h @@ -44,4 +44,6 @@ extern void efi_idmap_init(void); #define efi_call_early(f, ...) sys_table_arg-boottime-f(__VA_ARGS__) +extern void efi_be_runtime_setup(void); + #endif /* _ASM_EFI_H */ diff --git a/arch/arm64/kernel/efi-be-call.S b/arch/arm64/kernel/efi-be-call.S new file mode 100644 index ..24c92a4c352b --- /dev/null +++ b/arch/arm64/kernel/efi-be-call.S @@ -0,0 +1,55 @@ + +#include linux/linkage.h + + .text + .align 3 +ENTRY(efi_be_phys_call) + /* + * Entered at physical address with 1:1 mapping enabled. + */ + stp x29, x30, [sp, #-96]! + mov x29, sp + str x27, [sp, #16] + + ldr x8, =efi_be_phys_call // virt address of this function + adr x9, efi_be_phys_call// phys address of this function + sub x9, x8, x9 // calculate virt to phys offset in x9 + + /* preserve all inputs */ + stp x0, x1, [sp, #32] + stp x2, x3, [sp, #48] + stp x4, x5, [sp, #64] + stp x6, x7, [sp, #80] + + /* get phys address of stack */ + sub sp, sp, x9 + + /* switch to LE, disable MMU and D-cache but leave I-cache enabled */ + mrs x27, sctlr_el1 + bic x8, x27, #1 2// clear SCTLR.C + msr sctlr_el1, x8 + + bl flush_cache_all What is the cache flush for? The only thing that flush_cache_all can do is empty the local architected caches, and it can only do that when said caches are disabled. Any other use is unsafe; we have no guarantee that the cache is empty (or even clean), and we have no guarantee that prior writes have made it to the PoC. Even when the caches are disabled, flush_cache_all can only guarantee that the local architected caches are empty. There is no guarantee that the dirty data made it to the PoC. + + /* restore inputs but rotated by 1 register */ + ldp x7, x0, [sp, #32] + ldp x1, x2, [sp, #48] + ldp x3, x4, [sp, #64] + ldp x5, x6, [sp, #80] + + bic x8, x27, #1 2// clear SCTLR.C + bic x8, x8, #1 0 // clear SCTLR.M + bic x8, x8, #1 25// clear SCTLR.EE + msr sctlr_el1, x8 + isb + + blr x7 Is it safe to call EFI functions with the D-cache disabled? Do the functions not care about the memory attributes for their own sue (e.g. for exclusives)? Do they not care about IO? If IO to/from storage for variables is cache-coherent EFI and the device won't have a coherent view of memory. + + /* switch back to BE and reenable MMU and D-cache */ + msr sctlr_el1, x27 + Missing ISB? + mov sp, x29 + ldr x27, [sp, #16] + ldp x29, x30, [sp], #96 + ret +ENDPROC(efi_be_phys_call) diff --git a/arch/arm64/kernel/efi-be-runtime.c b/arch/arm64/kernel/efi-be-runtime.c new file mode 100644 index ..62e6c441b77b --- /dev/null +++ b/arch/arm64/kernel/efi-be-runtime.c @@ -0,0 +1,104 @@ + +#include linux/efi.h +#include linux/spinlock.h +#include asm/efi.h +#include asm/neon.h +#include asm/tlbflush.h + +static efi_runtime_services_t *runtime; +static efi_status_t (*efi_be_call)(phys_addr_t func, ...); + +static DEFINE_SPINLOCK(efi_be_rt_lock); + +static unsigned long efi_be_call_pre(void) +{ + unsigned long flags; + + kernel_neon_begin(); + spin_lock_irqsave(efi_be_rt_lock, flags); At this point we might still have DA_F unmasked, and I don't
Re: [RFC PATCH] arm64/efi: efistub: reuse EFI mapping for Image if it is lower
Hi Ard, On Tue, Jul 15, 2014 at 05:50:30PM +0100, Ard Biesheuvel wrote: The EFI loader may load Image at any available offset. This means Image may reside very close to or at the base of DRAM, in which case the relocation done by efi_entry() results in Image being moved up in memory, which is undesirable (memory below the kernel is currently not usable). Rather than complicating this path I would prefer that we fixed not being able to address memory below stext - TEXT_OFFSET. That would simplify the bootloader's job regardless of UEFI, as any loader can then simply place the kernel at a 2MB-aligned address + TEXT_OFFSET (the kernel would have to not span a 512MB boundary to keep the initial page tables simple, but that's still a weaker requirement than what we currently need). The approach taken in this patch seems sane for the current way the VA space is laid out, but if we're not currently wasting an absurd amount of memory I'd rather we got the VA layout fixed such that we can address memory below the kernel image. How much memory are we seeing wasted as a result of the existing relocation code, and how often? [...] /* + * Check whether the original allocation done by the EFI loader is more + * favorable (i.e., closer to the base of DRAM) than the new one created + * by efi_entry(). If so, reuse the original one. + */ + adr x3, 0f + cmp x3, x0 + bgt 1f // this image is loaded higher, so just proceed normally + + /* + * Jump into the relocated image so we can move the original Image to + * a 2 MB boundary + TEXT_OFFSET inside the original mapping without + * overwriting ourselves. + */ + sub x3, x3, x22 // subtract current base offset + add x3, x3, x0 // add base offset of relocated image + br x3 // jump to next line, but in relocated image At this point I don't believe the necessary icache maintenance + isb have occurred to make the new image visible to the I-side and to ensure we don't have any stale prefetched instructions in the pipeline. That still seems to happen later. Have I missed something? + +0: mov x1, x0 // copy from relocated Image + sub x0, x22, #1 // copy to base of original Image + orr x0, x0, #(SZ_2M - 1)// .. rounded up to 2 MB + ldr x3, =(TEXT_OFFSET + 1) // .. plus TEXT_OFFSET + add x0, x0, x3 + mov x2, x24 // copy 'size of Image' bytes + bl memcpy + add x21, x0, x23// add stext_offset to entry point +1: + /* * Flush dcache covering current runtime addresses * of kernel text/data. Then flush all of icache. */ - adrpx1, _text - add x1, x1, #:lo12:_text - adrpx2, _edata - add x2, x2, #:lo12:_edata - sub x1, x2, x1 - + mov x1, x24 // size of Image bl __flush_dcache_area ic ialluis Cheers, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] arm64/efi: efistub: jump to 'stext' directly, not through the header
On Wed, Jul 16, 2014 at 03:51:37PM +0100, Mark Salter wrote: On Tue, 2014-07-15 at 12:58 +0200, Ard Biesheuvel wrote: After the EFI stub has done its business, it jumps into the kernel by branching to offset #0 of the loaded Image, which is where it expects to find the header containing a 'branch to stext' instruction. However, the header is not covered by any PE/COFF section, so the header may not actually be loaded at the expected offset. So instead, jump to 'stext' directly, which is at the base of the PE/COFF .text section, by supplying a symbol 'stext_offset' to efi-entry.o which contains the relative offset of stext into the Image. Also replace other open coded calculations of the same value with a reference to 'stext_offset' Have you actually seen a situation where the header isn't there? Isn't the kernel header actually part of the pe/coff file and firmware loads the whole file into RAM? From my understanding of Ard's earlier comments, this part isn't guaranteed per the UEFI spec. I would rather we weren't relying on implementation details. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64/efi: efistub: jump to 'stext' directly, not through the header
Hi Ard, On Tue, Jul 15, 2014 at 10:10:02AM +0100, Ard Biesheuvel wrote: After the EFI stub has done its business, it jumps into the kernel by branching to offset #0 of the loaded Image, which is where it expects to find the header containing a 'branch to stext' instruction. However, the header is not covered by any PE/COFF section, so the header may not actually be loaded at the expected offset. So instead, jump to 'stext' directly, which is at the base of the PE/COFF .text section. It would be nice to point out in the commit message that the other changes in the patch are just cleanup to use stext_offset rather than open-coding it. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/efi-entry.S | 2 +- arch/arm64/kernel/head.S | 10 ++ 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S index 619b1dd7bcde..6ef541731d9e 100644 --- a/arch/arm64/kernel/efi-entry.S +++ b/arch/arm64/kernel/efi-entry.S @@ -61,7 +61,7 @@ ENTRY(efi_stub_entry) */ mov x20, x0 // DTB address ldr x0, [sp, #16] // relocated _text address - mov x21, x0 + add x21, x0, #:lo12:stext_offset I think we can drop the :lo12: here, which will allow us to have a warning if stext_offset is unexpectedly large (I believe this will currently silently mask bits were that to happen?). Other than that, this looks like a sensible thing to do given that we cannot rely on the header being present. Cheers, Mark. /* * Flush dcache covering current runtime addresses diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index a2c1195abb7f..78ddae28b090 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -137,6 +137,8 @@ efi_head: #endif #ifdef CONFIG_EFI + .globl stext_offset + .setstext_offset, stext - efi_head .align 3 pe_header: .ascii PE @@ -160,7 +162,7 @@ optional_header: .long 0 // SizeOfInitializedData .long 0 // SizeOfUninitializedData .long efi_stub_entry - efi_head // AddressOfEntryPoint - .long stext - efi_head// BaseOfCode + .long stext_offset// BaseOfCode extra_header_fields: .quad 0 // ImageBase @@ -177,7 +179,7 @@ extra_header_fields: .long _edata - efi_head // SizeOfImage // Everything before the kernel image is considered part of the header - .long stext - efi_head// SizeOfHeaders + .long stext_offset// SizeOfHeaders .long 0 // CheckSum .short 0xa // Subsystem (EFI application) .short 0 // DllCharacteristics @@ -222,9 +224,9 @@ section_table: .byte 0 .byte 0 // end of 0 padding of section name .long _edata - stext // VirtualSize - .long stext - efi_head// VirtualAddress + .long stext_offset// VirtualAddress .long _edata - stext // SizeOfRawData - .long stext - efi_head// PointerToRawData + .long stext_offset// PointerToRawData .long 0 // PointerToRelocations (0 for executables) .long 0 // PointerToLineNumbers (0 for executables) -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/arm64: efistub: don't abort if base of DRAM is occupied
On Mon, Jul 14, 2014 at 07:40:48PM +0100, Mark Salter wrote: On Mon, 2014-07-14 at 17:25 +0200, Ard Biesheuvel wrote: If we fail to relocate the kernel Image to its preferred offset of TEXT_OFFSET bytes above the base of DRAM, accept the lowest alternative mapping available instead of aborting. We may lose a bit of memory at the low end, but we can still proceed normally otherwise. This breaks APM Mustang because the spin-table holding pen for secondary CPUs is marked as reserved memory in the TEXT_OFFSET area and the kernel placement using your patch makes it unreachable by kernel. Here is a patch I've been working with to solve the same problem: I'm not sure that this is strictly speaking an issue with UEFI or the relocation strategy (which sounds sane to me). I believe we could easily hit similar issues with spin-table elsewhere, and I think we can fix this more generally without complicating the EFI stub. As I see it, we have two issues here: 1) The linear mapping starts at VA:PAGE_OFFSET+TEXT_OFFSET / PA:PHYS_OFFSET+TEXT_OFFSET, and we cannot access memory below this start address. This seems like a general issue we need to address, as it forces bootloader code to go through a tricky/impossible dance to get the kernel as close to the start of RAM as possible. 2) We cannot access a given cpu-release-addr if it is not in the linear mapping. This is the problem we're encountering now. We can solve (2) now by using a temporary mapping to write to the cpu-release-addr. Does the below patch (untested) fix your issue with spin-table? For (1) we need to rework the arm64 VA layout to decouple the kernel text mapping from the linear map, but that's a lot more work. Cheers, Mark. 8 From 73812b654a07f497f71bd38dfb4a6753fb0ad23e Mon Sep 17 00:00:00 2001 From: Mark Rutland mark.rutl...@arm.com Date: Tue, 15 Jul 2014 11:32:53 +0100 Subject: [PATCH] arm64: spin-table: handle unmapped cpu-release-addrs In certain cases the cpu-release-addr of a CPU may not fall in the linear mapping (e.g. when the kernel is loaded above this address due to the presence of other images in memory). This is problematic for the spin-table code as it assumes that it can trivially convert a cpu-release-addr to a valid VA in the linear map. This patch modifies the spin-table code to use a temporary cached mapping to write to a given cpu-release-addr, enabling us to support addresses regardless of whether they are covered by the linear mapping. Signed-off-by: Mark Rutland mark.rutl...@arm.com --- arch/arm64/kernel/smp_spin_table.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kernel/smp_spin_table.c b/arch/arm64/kernel/smp_spin_table.c index 0347d38..70181c1 100644 --- a/arch/arm64/kernel/smp_spin_table.c +++ b/arch/arm64/kernel/smp_spin_table.c @@ -20,6 +20,7 @@ #include linux/init.h #include linux/of.h #include linux/smp.h +#include linux/types.h #include asm/cacheflush.h #include asm/cpu_ops.h @@ -65,12 +66,21 @@ static int smp_spin_table_cpu_init(struct device_node *dn, unsigned int cpu) static int smp_spin_table_cpu_prepare(unsigned int cpu) { - void **release_addr; + __le64 __iomem *release_addr; if (!cpu_release_addr[cpu]) return -ENODEV; - release_addr = __va(cpu_release_addr[cpu]); + /* +* The cpu-release-addr may or may not be inside the linear mapping. +* As ioremap_cache will either give us a new mapping or reuse the +* existing linear mapping, we can use it to cover both cases. In +* either case the memory will be MT_NORMAL. +*/ + release_addr = ioremap_cache(cpu_release_addr[cpu], +sizeof(*release_addr)); + if (!release_addr) + return -ENOMEM; /* * We write the release address as LE regardless of the native @@ -79,15 +89,16 @@ static int smp_spin_table_cpu_prepare(unsigned int cpu) * boot-loader's endianess before jumping. This is mandated by * the boot protocol. */ - release_addr[0] = (void *) cpu_to_le64(__pa(secondary_holding_pen)); - - __flush_dcache_area(release_addr, sizeof(release_addr[0])); + writeq_relaxed(__pa(secondary_holding_pen), release_addr); + __flush_dcache_area(release_addr, sizeof(*release_addr)); /* * Send an event to wake up the secondary CPU. */ sev(); + iounmap(release_addr); + return 0; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/arm64: efistub: don't abort if base of DRAM is occupied
On Tue, Jul 15, 2014 at 11:02:22AM +0100, Leif Lindholm wrote: On Mon, Jul 14, 2014 at 02:40:48PM -0400, Mark Salter wrote: On Mon, 2014-07-14 at 17:25 +0200, Ard Biesheuvel wrote: If we fail to relocate the kernel Image to its preferred offset of TEXT_OFFSET bytes above the base of DRAM, accept the lowest alternative mapping available instead of aborting. We may lose a bit of memory at the low end, but we can still proceed normally otherwise. This breaks APM Mustang because the spin-table holding pen for secondary CPUs is marked as reserved memory in the TEXT_OFFSET area and the kernel placement using your patch makes it unreachable by kernel. Here is a patch I've been working with to solve the same problem: Hmm. The thing I don't like about the below approach is that it hard wires in the memory below TEXT_OFFSET cannot be used aspect, beyond the current prectical limitation. Since we are likely to see platforms with UEFI memory in use around start of RAM, that is a limitation we should probably try to get rid of. This isn't just an issue for UEFI. There are other reasons one might want to load a kernel away from the start of RAM while still wanting to address said RAM(e.g. kdump). We should address that. [...] @@ -273,6 +282,10 @@ static void __init free_boot_services(void) continue; } + /* Don't free anything below kernel */ + if (md-phys_addr PHYS_OFFSET) + continue; + Is the spin table area really allocated as BOOT_SERVICES_*? If that is the case, this platform is _broken_. The spin-table memory (both the code and the mailboxes) needs to live around forever in case you don't boot all of the secondaries. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm64/efi: efistub: jump to 'stext' directly, not through the header
On Tue, Jul 15, 2014 at 12:49:07PM +0100, Ard Biesheuvel wrote: On 15 July 2014 13:31, Mark Rutland mark.rutl...@arm.com wrote: diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S index 619b1dd7bcde..6ef541731d9e 100644 --- a/arch/arm64/kernel/efi-entry.S +++ b/arch/arm64/kernel/efi-entry.S @@ -61,7 +61,7 @@ ENTRY(efi_stub_entry) */ mov x20, x0 // DTB address ldr x0, [sp, #16] // relocated _text address - mov x21, x0 + add x21, x0, #:lo12:stext_offset I think we can drop the :lo12: here, which will allow us to have a warning if stext_offset is unexpectedly large (I believe this will currently silently mask bits were that to happen?). There is no alternative lo12 relocation that errors out when the value does not fit, so it would have to use a literal instead. Ah, that's a shame. What happens when the value doesn't fit if the linker / assembler don't error out? Obviously, it will jump into the wrong place if stext ever gets pushed beyond a 4k boundary, which is not that likely (current value is 0x160, we may want to up it to 0x200 at some point if we need to increase the PE/COFF section alignment, which some versions of the (poor) PE/COFF documentation say should be 0x200 at the minimum) However, doing ldr x21, =stext_offset works fine as well That sounds like a toolchain bug if they're silently doing the wrong thing. Meh, don't think so, really. You get what you pay for, and :lo12: just isn't supposed to care. (It's mainly used with adrp to get a +/- 4gb PC relative reach) We appear to have a misunderstanding. I was suggesting we use: add x21, x0, #stext_offset ... and I thought you meant that wouldn't produce an error if the immediate was out of range. I understand that :lo12: cuts the top bits off, and for its intended use that makes sense. From testing I see see that gas isn't happy to accept an undefined symbol as an immediate without :lo12:, and I can see why that makes sense. My suggestion was bad, sorry. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] efi/arm64: efistub: don't abort if base of DRAM is occupied
On Tue, Jul 15, 2014 at 03:23:26PM +0100, Mark Salter wrote: On Tue, 2014-07-15 at 14:54 +0100, Leif Lindholm wrote: On Tue, Jul 15, 2014 at 09:11:00AM -0400, Mark Salter wrote: On Tue, 2014-07-15 at 11:02 +0100, Leif Lindholm wrote: @@ -273,6 +282,10 @@ static void __init free_boot_services(void) continue; } + /* Don't free anything below kernel */ + if (md-phys_addr PHYS_OFFSET) + continue; + Is the spin table area really allocated as BOOT_SERVICES_*? No. It is EFI_RESERVED_TYPE. But if UEFI is allowed below the kernel, then there could be BS code/data memory which we'd want to ignore. Well, if it is boot service code/data - then there is no need for us to keep it around after ExitBootServices(). / Leif One would think, but EFI has proven to be less than strictly compliant in that regard in the past. I'm inclined to keep the boot services around until after SetVirtualAddressMap just in case. Why should we add a work around for a potential bug that doesn't exist yet? That just provides fertile ground for such a bug to spring into existence and for people to ignore it when bringing up their SoC. The comment doesn't explain the rationale and the code doesn't make sense given a sane implementation. For the moment it's better to be strict, IMO. Otherwise there are plenty of other potential bugs we could attempt to work around to enable people to write firmware with even lower standards... If we have to work around something then we should have an actual issue to work around first. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] arm64/efi: efistub: get text offset and image size from the Image header
Hi Ard, On Mon, Jul 14, 2014 at 05:17:51PM +0100, Ard Biesheuvel wrote: The EFI stub for arm64 needs to behave like an ordinary bootloader in the sense that it needs to use the EFI environment and the Image header at runtime and not rely on the linker or preprocessor to produce values for text offset, image size and kernel size. Could you elaborate on _why_ we can't do that, given it's linked into the kernel Image? Are we splitting the stub from the kernel? What's going on? This patch also fixes the corner case where Image happens to be loaded at exactly the right offset, but the allocation is actually too small to satisfy the requirement imposed by image_size as set in the header. It's not really imposed by image_size. While it's described by image_size it's imposed by the existence of the BSS section and the initial page tables. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/Makefile | 2 -- arch/arm64/kernel/efi-stub.c | 29 - 2 files changed, 16 insertions(+), 15 deletions(-) diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index cdaedad3afe5..99b676eeeb0f 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -4,8 +4,6 @@ CPPFLAGS_vmlinux.lds := -DTEXT_OFFSET=$(TEXT_OFFSET) AFLAGS_head.o:= -DTEXT_OFFSET=$(TEXT_OFFSET) -CFLAGS_efi-stub.o:= -DTEXT_OFFSET=$(TEXT_OFFSET) \ --I$(src)/../../../scripts/dtc/libfdt CFLAGS_REMOVE_ftrace.o = -pg CFLAGS_REMOVE_insn.o = -pg diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c index 9b61d66e2d20..4ba90b2ef677 100644 --- a/arch/arm64/kernel/efi-stub.c +++ b/arch/arm64/kernel/efi-stub.c @@ -11,8 +11,7 @@ */ #include linux/efi.h #include asm/efi.h -#include asm/sections.h - +#include asm/image_hdr.h efi_status_t handle_kernel_image(efi_system_table_t *sys_table, unsigned long *image_addr, @@ -23,24 +22,28 @@ efi_status_t handle_kernel_image(efi_system_table_t *sys_table, efi_loaded_image_t *image) { efi_status_t status; - unsigned long kernel_size, kernel_memsize = 0; + struct arm64_image_hdr *hdr = (struct arm64_image_hdr *)*image_addr; + + /* make sure image_addr points to an arm64 kernel Image */ + if (!arm64_image_hdr_check(hdr)) { + pr_efi_err(sys_table, Kernel Image header check failed\n); + return EFI_LOAD_ERROR; + } Surely this should always be the case if the stub is linked into the kernel? It would be nice to know the rationale for this. /* Relocate the image, if required. */ - kernel_size = _edata - _text; - if (*image_addr != (dram_base + TEXT_OFFSET)) { - kernel_memsize = kernel_size + (_end - _edata) + TEXT_OFFSET; - status = efi_low_alloc(sys_table, kernel_memsize, SZ_2M, + if (*image_addr != (dram_base + hdr-text_offset) || + image-image_size hdr-image_size) { As far as I can tell the size of the Image (image-image_size) is always going to be less than the effective run time image size (hdr-image_size). The SizeOfImage field in head.S which I assume image-image_size is derived from (not having a UEFI spec in front of me) seems to cover everything up to _edata but skips the BSS, and initial page tables, but this is covered by the header's image_size field. Am I missing something? Surely we _always_ expect image-image_size to be smaller than hdr-image_size? Cheers, Mark. + *reserve_size = hdr-text_offset + hdr-image_size; + status = efi_low_alloc(sys_table, *reserve_size, SZ_2M, reserve_addr); if (status != EFI_SUCCESS) { pr_efi_err(sys_table, Failed to relocate kernel\n); + *reserve_size = 0; return status; } - memcpy((void *)*reserve_addr + TEXT_OFFSET, (void *)*image_addr, -kernel_size); - *image_addr = *reserve_addr + TEXT_OFFSET; - *reserve_size = kernel_memsize; + memcpy((void *)*reserve_addr + hdr-text_offset, +(void *)*image_addr, image-image_size); + *image_addr = *reserve_addr + hdr-text_offset; } - - return EFI_SUCCESS; } -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] arm64: add C struct definition for Image header
Hi Ard, On Mon, Jul 14, 2014 at 05:17:50PM +0100, Ard Biesheuvel wrote: In order to be able to interpret the Image header from C code, we need a struct definition that reflects the specification for Image headers as laid out in Documentation/arm64/booting.txt. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/include/asm/image_hdr.h | 75 ++ 1 file changed, 75 insertions(+) create mode 100644 arch/arm64/include/asm/image_hdr.h diff --git a/arch/arm64/include/asm/image_hdr.h b/arch/arm64/include/asm/image_hdr.h new file mode 100644 index ..9dc74b672124 --- /dev/null +++ b/arch/arm64/include/asm/image_hdr.h @@ -0,0 +1,75 @@ +/* + * image_hdr.h - C struct definition of the arm64 Image header format + * + * Copyright (C) 2014 Linaro Ltd + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#ifndef __ASM_IMAGE_HDR_H +#define __ASM_IMAGE_HDR_H + +#ifdef __KERNEL__ +#include linux/types.h +#else +#include stdint.h +#endif + +/** + * struct arm64_image_hdr - arm64 kernel Image binary header format + * @code0: entry point, first instruction to jump to when booting + * the Image + * @code1: second instruction of entry point + * @text_offset: mandatory offset (little endian) beyond a 2 megabyte + * aligned boundary that needs to be taken into account + * when deciding where to load the image + * @image_size: total size (little endian) of the memory area (counted + * from the offset where the image was loaded) that may be + * used by the booting kernel before memory reservations + * can be honoured + * @flags: little endian bit field + * Bit 0: Kernel endianness. 1 if BE, 0 if LE. + * Bits 1-63: Reserved. + * @res2:reserved, must be 0 + * @res3:reserved, must be 0 + * @res4:reserved, must be 0 + * @magic: Magic number (little endian): ARM\x64 + * @res5:reserved (used for PE COFF offset) + * + * This definition reflects the definition in Documentation/arm64/booting.txt in + * the Linux source tree. + */ Can we not just say See Documentation/arm64/booting.txt rather than duplicating the description? +struct arm64_image_hdr { + uint32_t code0; + uint32_t code1; + uint64_t text_offset; + uint64_t image_size; + uint64_t flags; + uint64_t res2; + uint64_t res3; + uint64_t res4; + uint32_t magic; + uint32_t res5; +}; + +static const union { + uint8_t le_val[4]; + uint32_tcpu_val; +} arm64_image_hdr_magic = { + .le_val = ARM\x64 +}; + +/** + * int arm64_image_hdr_check() - check whether hdr points to an arm64 Image + * @hdr: pointer to an arm64 Image + * + * Return: 1 if check is successful, 0 otherwise + */ +static inline int arm64_image_hdr_check(struct arm64_image_hdr const *hdr) +{ + return hdr-magic == arm64_image_hdr_magic.cpu_val; +} Rather than the arm64_image_hdr_magic union trick above, could we not just use the magic inline with a memcmp, e.g. static inline int arm64_image_hdr_check(struct arm64_image_hdr const *hdr) { return !memcmp(hdr-magic, ARM\x64, 4); } I'm a little hesitant to expose this to userspace in case the size of the structure grows and userspace starts relying on it having a particular length (though perhaps that's unfounded). It's also a little unfortunate that we lose the nice endianness annotations here, as it gives a wrong impression of the structure as a set of native-endian fields. Thanks, Mark. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] arm64/efi: efistub: get text offset and image size from the Image header
On Mon, Jul 14, 2014 at 06:35:32PM +0100, Ard Biesheuvel wrote: On 14 July 2014 18:54, Mark Rutland mark.rutl...@arm.com wrote: Hi Ard, On Mon, Jul 14, 2014 at 05:17:51PM +0100, Ard Biesheuvel wrote: The EFI stub for arm64 needs to behave like an ordinary bootloader in the sense that it needs to use the EFI environment and the Image header at runtime and not rely on the linker or preprocessor to produce values for text offset, image size and kernel size. Could you elaborate on _why_ we can't do that, given it's linked into the kernel Image? Are we splitting the stub from the kernel? What's going on? Perhaps Leif can chime in here? I was under the impression that you were aligned in this regard. Perhaps we were, and I can see reasons for doing this (e.g. as a step towards making it possible to boot a BE kernel using UEFI), but the commit message doesn't mention any such reason. For the sake of my poor recollection and that of others, some words in the commit message would be helpful. This patch also fixes the corner case where Image happens to be loaded at exactly the right offset, but the allocation is actually too small to satisfy the requirement imposed by image_size as set in the header. It's not really imposed by image_size. While it's described by image_size it's imposed by the existence of the BSS section and the initial page tables. Yes, that is true. But from stub POV, it doesn't matter /why/ image_size has a particular value. Sure, the stub is a piece of software with no particular knowledge of anything. However, those of us reading the commit message are not, and for our sake it would be nice to have a description in case we care as to /why/ image_size has a particular value and why that value happens to be useful here. As far as I can see, we needed to allocate memory for the BSS even before the image_size field was introduced, so this isn't a new requirement. All that's needed is a change to the wording of the commit message to explain that we're trying to allocate space for (runtime-initialised) data between _edata and _end rather than how we find out how much space we need for that (reading the image_size field). Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm64/kernel/Makefile | 2 -- arch/arm64/kernel/efi-stub.c | 29 - 2 files changed, 16 insertions(+), 15 deletions(-) diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index cdaedad3afe5..99b676eeeb0f 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -4,8 +4,6 @@ CPPFLAGS_vmlinux.lds := -DTEXT_OFFSET=$(TEXT_OFFSET) AFLAGS_head.o:= -DTEXT_OFFSET=$(TEXT_OFFSET) -CFLAGS_efi-stub.o:= -DTEXT_OFFSET=$(TEXT_OFFSET) \ --I$(src)/../../../scripts/dtc/libfdt CFLAGS_REMOVE_ftrace.o = -pg CFLAGS_REMOVE_insn.o = -pg diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c index 9b61d66e2d20..4ba90b2ef677 100644 --- a/arch/arm64/kernel/efi-stub.c +++ b/arch/arm64/kernel/efi-stub.c @@ -11,8 +11,7 @@ */ #include linux/efi.h #include asm/efi.h -#include asm/sections.h - +#include asm/image_hdr.h efi_status_t handle_kernel_image(efi_system_table_t *sys_table, unsigned long *image_addr, @@ -23,24 +22,28 @@ efi_status_t handle_kernel_image(efi_system_table_t *sys_table, efi_loaded_image_t *image) { efi_status_t status; - unsigned long kernel_size, kernel_memsize = 0; + struct arm64_image_hdr *hdr = (struct arm64_image_hdr *)*image_addr; + + /* make sure image_addr points to an arm64 kernel Image */ + if (!arm64_image_hdr_check(hdr)) { + pr_efi_err(sys_table, Kernel Image header check failed\n); + return EFI_LOAD_ERROR; + } Surely this should always be the case if the stub is linked into the kernel? It would be nice to know the rationale for this. Well, there is an actual issue addressed by this: the PE/COFF .text section covers everything from stext to _edata, so it does not cover the header itself. In fact, PE/COFF does not allow the headers themselves to be covered by a section (or at least, the Tianocore/EDK2 UEFI implementation does not). So the pointer points *outside* of the .text section, and we are assuming that whatever is at that [negative] offset in the file was also copied to the same offset in memory.(For instance, there is no reason to assume that all implementations of EFI will copy data that is not part of any section to RAM when booting an image located in NOR flash) If we are making assumptions which aren't always valid, then why the hell are we making them in the first place, and then trying to validate them? How does reading an address that we have absolutely no guarantee
Re: [PATCH v2 1/3] Documentation: arm: add UEFI support documentation
Hi Leif, On Thu, Oct 03, 2013 at 12:24:39PM +0100, Leif Lindholm wrote: This patch provides documentation of the [U]EFI runtime services and configuration features. Cc: linux-...@vger.kernel.org Signed-off-by: Leif Lindholm leif.lindh...@linaro.org Acked-by: Rob Landley r...@landley.net --- Documentation/arm/00-INDEX |3 +++ Documentation/arm/uefi.txt | 47 2 files changed, 50 insertions(+) create mode 100644 Documentation/arm/uefi.txt diff --git a/Documentation/arm/00-INDEX b/Documentation/arm/00-INDEX index 4978456..87e01d1 100644 --- a/Documentation/arm/00-INDEX +++ b/Documentation/arm/00-INDEX @@ -36,3 +36,6 @@ nwfpe/ - NWFPE floating point emulator documentation swp_emulation - SWP/SWPB emulation handler/logging description + +uefi.txt + - [U]EFI configuration and runtime services documentation diff --git a/Documentation/arm/uefi.txt b/Documentation/arm/uefi.txt new file mode 100644 index 000..e6e4d41 --- /dev/null +++ b/Documentation/arm/uefi.txt @@ -0,0 +1,47 @@ +UEFI, the Unified Extensible Firmware Interface is a speifcication +governing the behaviours of compatible firmware interfaces. It is +maintained by the UEFI Forum - http://www.uefi.org/. + +Since UEFI is an evolution of its predecessor 'EFI', the terms EFI and +UEFI are used somewhat interchangeably in this document and associated +source code. + +The implementation depends on receiving the UEFI runtime memory map and a +pointer to the System Table in a Flattened Device Tree - so is only available +with CONFIG_OF. + +It parses the FDT /chosen node for the following parameters: +- 'linux,efi-system-table': + Physical address of the system table. (required) + 64-bit value since an ARMv7 plattform may support LPAE, and to facilitate s/plattform/platform/ + code sharing with arm64. Top 32 bits will be ignored, since UEFI specification + mandates a 1:1 mapping of all RAM. You could use something like #size-cells to describe how big this is going to be. Is this Linux-specific -- it looks like something provided by EFI rather than the kernel itself. +- 'linux,efi-mmap': + The EFI memory map as an embedded property. (required) + An array of type EFI_MEMORY_DESCRIPTOR as described by the UEFI + specification, current version described in Linux by efi_memory_desc_t. + The memory map is represented in little-endian, not DT, byte order. + This map needs to contain at least the regions to be preserved for runtime + services, but would normally just be the map retreieved by calling UEFI + GetMemoryMap() immediately before ExitBootServices(). This is a little scary. If the format is so complicated, should it really be embedded? How big is this likely to be? Given that this is in a format defined externally, this isn't really Linux-specific. Maybe we need an efi pseudo-vendor prefix. +- 'linux,efi-mmap-desc-size': + Size of each descriptor in the memory map. (override default) What units is this in? How many u32 cells does this take up (one presumably)? +- 'linux,efi-mmap-desc-ver': + Memory descriptor format version. (override default) Type, format, valid values and their meaning? Thanks, Mark. + +It also depends on early_memremap() to parse the UEFI configuration tables. + +For actually enabling [U]EFI support, enable: +- CONFIG_EFI=y +- CONFIG_EFI_VARS=y or m + +After the kernel has mapped the required regions into its address space, +a SetVirtualAddressMap() call is made into UEFI in order to update +relocations. This call must be performed with all the code in a 1:1 +mapping. This implementation achieves this by temporarily disabling the +MMU for the duration of this call. This can only be done safely: +- before secondary CPUs are brought online. +- after early_initcalls have completed, since it uses setup_mm_for_reboot(). + +For verbose debug messages, specify 'uefi_debug' on the kernel command +line. -- 1.7.10.4 ` ___ linux-arm-kernel mailing list linux-arm-ker...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html