Re: [PATCH] x86: Add early quirk to reset Apple AirPort card
On Sun, 29 May, at 01:35:28AM, Lukas Wunner wrote: > The EFI firmware on Macs contains a full-fledged network stack for > downloading OS X images from osrecovery.apple.com. Unfortunately > on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331 > wireless card on every boot and leaves it enabled even after > ExitBootServices has been called. The card continues to assert its IRQ > line, causing spurious interrupts if the IRQ is shared. It also corrupts > memory by DMAing received packets, allowing for remote code execution > over the air. This only stops when a driver is loaded for the wireless > card, which may be never if the driver is not installed or blacklisted. [... Snip a very thorough changelog ...] This patch looks fine to me from an EFI perspective. Acked-by: Matt Fleming
[PATCH 1/2] efi: Fix for_each_efi_memory_desc_in_map() for empty memmaps
From: Vitaly Kuznetsov <vkuzn...@redhat.com> Commit 78ce248faa3c ("efi: Iterate over efi.memmap in for_each_efi_memory_desc()") introduced a regression for systems booted with 'noefi' kernel option. In particular, I observe early kernel hang in efi_find_mirror() on for_each_efi_memory_desc() call. As we don't have efi memmap we enter this iterator with the following parameters: efi.memmap.map = 0, efi.memmap.map_end = 0, efi.memmap.desc_size = 28 for_each_efi_memory_desc_in_map() does the following comparison: (md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size); where md = 0, (m)->map_end = 0 and (m)->desc_size = 28 but when we subtract something from a NULL pointer wrap around happens and we end up returning invalid pointer. Fixes: 78ce248faa3c ("efi: Iterate over efi.memmap in for_each_efi_memory_desc()") Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com> Cc: Mark Salter <msal...@redhat.com> Cc: "K. Y. Srinivasan" <k...@microsoft.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- include/linux/efi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/efi.h b/include/linux/efi.h index c2db3ca22217..f196dd0b0f2f 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1005,7 +1005,7 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm, /* Iterate through an efi_memory_map */ #define for_each_efi_memory_desc_in_map(m, md)\ for ((md) = (m)->map; \ -(md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size); \ +((void *)(md) + (m)->desc_size) <= (m)->map_end; \ (md) = (void *)(md) + (m)->desc_size) /** -- 2.7.3
[PATCH 1/2] efi: Fix for_each_efi_memory_desc_in_map() for empty memmaps
From: Vitaly Kuznetsov Commit 78ce248faa3c ("efi: Iterate over efi.memmap in for_each_efi_memory_desc()") introduced a regression for systems booted with 'noefi' kernel option. In particular, I observe early kernel hang in efi_find_mirror() on for_each_efi_memory_desc() call. As we don't have efi memmap we enter this iterator with the following parameters: efi.memmap.map = 0, efi.memmap.map_end = 0, efi.memmap.desc_size = 28 for_each_efi_memory_desc_in_map() does the following comparison: (md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size); where md = 0, (m)->map_end = 0 and (m)->desc_size = 28 but when we subtract something from a NULL pointer wrap around happens and we end up returning invalid pointer. Fixes: 78ce248faa3c ("efi: Iterate over efi.memmap in for_each_efi_memory_desc()") Signed-off-by: Vitaly Kuznetsov Cc: Mark Salter Cc: "K. Y. Srinivasan" Signed-off-by: Matt Fleming --- include/linux/efi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/efi.h b/include/linux/efi.h index c2db3ca22217..f196dd0b0f2f 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1005,7 +1005,7 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm, /* Iterate through an efi_memory_map */ #define for_each_efi_memory_desc_in_map(m, md)\ for ((md) = (m)->map; \ -(md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size); \ +((void *)(md) + (m)->desc_size) <= (m)->map_end; \ (md) = (void *)(md) + (m)->desc_size) /** -- 2.7.3
[PATCH 2/2] efi/arm: Fix the format of debug message from efi
From: Dennis Chen <dennis.c...@arm.com> When enable debug of efi and memblock with 'efi=debug memblock=debug' appended to the kernel command line, the debug message output for earyly_con looks like: [0.00] efi: 0xe105-0xe105 [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0xe130-0xe1300fff [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0xe820-0xe827 [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0x0080-0x008001e7 [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] [0.00] memblock_add: [0x80-0x8001e7] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c [0.00] * ... This patch is trying to fix the above output messed up by memblock_add(), so we can get below debug mesg looks more formally after applied: [0.00] efi: 0xe105-0xe105 [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0xe130-0xe1300fff [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0xe820-0xe827 [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0x0080-0x008001e7 [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC]* [0.00] memblock_add: [0x80-0x8001e7] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c ... Signed-off-by: Dennis Chen <dennis.c...@arm.com> Acked-by: Mark Rutland <mark.rutl...@arm.com> Cc: Catalin Marinas <catalin.mari...@arm.com> Cc: Steve Capper <steve.cap...@arm.com> Cc: Will Deacon <will.dea...@arm.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: linux-...@vger.kernel.org Cc: Steve McIntyre <st...@einval.com> Cc: Steven Rostedt <rost...@goodmis.org> Cc: Dan Williams <dan.j.willi...@intel.com> Cc: Mark Salter <msal...@redhat.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- drivers/firmware/efi/arm-init.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c index a850cbc48d8d..c49d50e68aee 100644 --- a/drivers/firmware/efi/arm-init.c +++ b/drivers/firmware/efi/arm-init.c @@ -174,6 +174,7 @@ static __init void reserve_regions(void) { efi_memory_desc_t *md; u64 paddr, npages, size; + int resv; if (efi_enabled(EFI_DBG)) pr_info("Processing EFI memory map:\n"); @@ -190,12 +191,14 @@ static __init void reserve_regions(void) paddr = md->phys_addr; npages = md->num_pages; + resv = is_reserve_region(md); if (efi_enabled(EFI_DBG)) { char buf[64]; - pr_info(" 0x%012llx-0x%012llx %s", + pr_info(" 0x%012llx-0x%012llx %s%s\n", paddr, paddr + (npages << EFI_PAGE_SHIFT) - 1, - efi_md_typeattr_format(buf, sizeof(buf), md)); + efi_md_typeattr_format(buf, sizeof(buf), md), + resv ? "*" : ""); } memrange_efi_to_native(, ); @@ -204,14 +207,9 @@ static __init void reserve_regions(void) if (is_normal_ram(md)) early_init_dt_add_memory_arch(paddr, size); - if (is_reserve_region(md)) { + if (resv) memblock_mark_nomap(paddr, size); - if (efi_enabled(EFI_DBG)) - pr_cont("*"); - } - if (efi_enabled(EFI_DBG)) - pr_cont("\n"); } set_bit(EFI_MEMMAP, ); -- 2.7.3
[GIT PULL 0/2] EFI urgent fixes
Folks, please pull the following urgent patches which fix a boot crash when using the "noefi" parameter and the debug output on arm. The following changes since commit 1a695a905c18548062509178b98bc91e67510864: Linux 4.7-rc1 (2016-05-29 09:29:24 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git tags/efi-urgent for you to fetch changes up to 1f0cf3892caeab20a99c19f5523499be77b533cd: efi/arm: Fix the format of debug message from efi (2016-05-30 22:51:53 +0100) * Fix crash when booting with the "noefi" kernel parameter, caused by recent changes to for_each_efi_memory_desc_in_map() - Vitaly Kuznetsov * Unscramble the debug output on arm when efi=debug and memblock=debug is passed on the kernel cmdline - Dennis Chen Dennis Chen (1): efi/arm: Fix the format of debug message from efi Vitaly Kuznetsov (1): efi: Fix for_each_efi_memory_desc_in_map() for empty memmaps drivers/firmware/efi/arm-init.c | 14 ++ include/linux/efi.h | 2 +- 2 files changed, 7 insertions(+), 9 deletions(-)
[PATCH 2/2] efi/arm: Fix the format of debug message from efi
From: Dennis Chen When enable debug of efi and memblock with 'efi=debug memblock=debug' appended to the kernel command line, the debug message output for earyly_con looks like: [0.00] efi: 0xe105-0xe105 [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0xe130-0xe1300fff [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0xe820-0xe827 [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0x0080-0x008001e7 [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] [0.00] memblock_add: [0x80-0x8001e7] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c [0.00] * ... This patch is trying to fix the above output messed up by memblock_add(), so we can get below debug mesg looks more formally after applied: [0.00] efi: 0xe105-0xe105 [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0xe130-0xe1300fff [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0xe820-0xe827 [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [0.00] efi: 0x0080-0x008001e7 [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC]* [0.00] memblock_add: [0x80-0x8001e7] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c ... Signed-off-by: Dennis Chen Acked-by: Mark Rutland Cc: Catalin Marinas Cc: Steve Capper Cc: Will Deacon Cc: Ard Biesheuvel Cc: linux-...@vger.kernel.org Cc: Steve McIntyre Cc: Steven Rostedt Cc: Dan Williams Cc: Mark Salter Signed-off-by: Matt Fleming --- drivers/firmware/efi/arm-init.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c index a850cbc48d8d..c49d50e68aee 100644 --- a/drivers/firmware/efi/arm-init.c +++ b/drivers/firmware/efi/arm-init.c @@ -174,6 +174,7 @@ static __init void reserve_regions(void) { efi_memory_desc_t *md; u64 paddr, npages, size; + int resv; if (efi_enabled(EFI_DBG)) pr_info("Processing EFI memory map:\n"); @@ -190,12 +191,14 @@ static __init void reserve_regions(void) paddr = md->phys_addr; npages = md->num_pages; + resv = is_reserve_region(md); if (efi_enabled(EFI_DBG)) { char buf[64]; - pr_info(" 0x%012llx-0x%012llx %s", + pr_info(" 0x%012llx-0x%012llx %s%s\n", paddr, paddr + (npages << EFI_PAGE_SHIFT) - 1, - efi_md_typeattr_format(buf, sizeof(buf), md)); + efi_md_typeattr_format(buf, sizeof(buf), md), + resv ? "*" : ""); } memrange_efi_to_native(, ); @@ -204,14 +207,9 @@ static __init void reserve_regions(void) if (is_normal_ram(md)) early_init_dt_add_memory_arch(paddr, size); - if (is_reserve_region(md)) { + if (resv) memblock_mark_nomap(paddr, size); - if (efi_enabled(EFI_DBG)) - pr_cont("*"); - } - if (efi_enabled(EFI_DBG)) - pr_cont("\n"); } set_bit(EFI_MEMMAP, ); -- 2.7.3
[GIT PULL 0/2] EFI urgent fixes
Folks, please pull the following urgent patches which fix a boot crash when using the "noefi" parameter and the debug output on arm. The following changes since commit 1a695a905c18548062509178b98bc91e67510864: Linux 4.7-rc1 (2016-05-29 09:29:24 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git tags/efi-urgent for you to fetch changes up to 1f0cf3892caeab20a99c19f5523499be77b533cd: efi/arm: Fix the format of debug message from efi (2016-05-30 22:51:53 +0100) * Fix crash when booting with the "noefi" kernel parameter, caused by recent changes to for_each_efi_memory_desc_in_map() - Vitaly Kuznetsov * Unscramble the debug output on arm when efi=debug and memblock=debug is passed on the kernel cmdline - Dennis Chen Dennis Chen (1): efi/arm: Fix the format of debug message from efi Vitaly Kuznetsov (1): efi: Fix for_each_efi_memory_desc_in_map() for empty memmaps drivers/firmware/efi/arm-init.c | 14 ++ include/linux/efi.h | 2 +- 2 files changed, 7 insertions(+), 9 deletions(-)
Re: [PATCH] efi: fix for_each_efi_memory_desc_in_map() for empty memmaps
(Cc'ing Mark, the original author) On Wed, 25 May, at 04:36:55PM, Vitaly Kuznetsov wrote: > Commit 78ce248faa3c ("efi: Iterate over efi.memmap in > for_each_efi_memory_desc()") introduced a regression for systems booted > with 'noefi' kernel option. In particular, I observe early kernel hang in > efi_find_mirror() on for_each_efi_memory_desc() call. As we don't have > efi memmap we enter this iterator with the following parameters: > > efi.memmap.map = 0, efi.memmap.map_end = 0, efi.memmap.desc_size = 28 > > for_each_efi_memory_desc_in_map() does the following comparison: > > (md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size); > > where md = 0, (m)->map_end = 0 and (m)->desc_size = 28 but when we subtract > something from a NULL pointer wrap around happens and we end up returning > invalid pointer. > > Fixes: 78ce248faa3c ("efi: Iterate over efi.memmap in > for_each_efi_memory_desc()") > Signed-off-by: Vitaly Kuznetsov> --- > include/linux/efi.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/linux/efi.h b/include/linux/efi.h > index c2db3ca..229ccb5 100644 > --- a/include/linux/efi.h > +++ b/include/linux/efi.h > @@ -1005,7 +1005,8 @@ extern int efi_memattr_apply_permissions(struct > mm_struct *mm, > /* Iterate through an efi_memory_map */ > #define for_each_efi_memory_desc_in_map(m, md) >\ > for ((md) = (m)->map; \ > - (md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size); \ > + (efi_memory_desc_t *)((md) + (m)->desc_size) <= \ > + (efi_memory_desc_t *)(m)->map_end;\ >(md) = (void *)(md) + (m)->desc_size) Curse C type casting! You're adding m->desc_size to a (efi_memory_desc_t *) which is 40 bytes. You need the below to avoid breaking regular EFI boot. But yeah, this looks like a fine fix. Mark, any concerns? --- diff --git a/include/linux/efi.h b/include/linux/efi.h index d8ab480b1089..3a3f8d8bd9be 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1026,7 +1026,7 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm, /* Iterate through an efi_memory_map */ #define for_each_efi_memory_desc_in_map(m, md)\ for ((md) = (m)->map; \ -(md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size); \ +((void *)(md) + (m)->desc_size) <= (m)->map_end; \ (md) = (void *)(md) + (m)->desc_size) /**
Re: [PATCH] efi: fix for_each_efi_memory_desc_in_map() for empty memmaps
(Cc'ing Mark, the original author) On Wed, 25 May, at 04:36:55PM, Vitaly Kuznetsov wrote: > Commit 78ce248faa3c ("efi: Iterate over efi.memmap in > for_each_efi_memory_desc()") introduced a regression for systems booted > with 'noefi' kernel option. In particular, I observe early kernel hang in > efi_find_mirror() on for_each_efi_memory_desc() call. As we don't have > efi memmap we enter this iterator with the following parameters: > > efi.memmap.map = 0, efi.memmap.map_end = 0, efi.memmap.desc_size = 28 > > for_each_efi_memory_desc_in_map() does the following comparison: > > (md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size); > > where md = 0, (m)->map_end = 0 and (m)->desc_size = 28 but when we subtract > something from a NULL pointer wrap around happens and we end up returning > invalid pointer. > > Fixes: 78ce248faa3c ("efi: Iterate over efi.memmap in > for_each_efi_memory_desc()") > Signed-off-by: Vitaly Kuznetsov > --- > include/linux/efi.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/linux/efi.h b/include/linux/efi.h > index c2db3ca..229ccb5 100644 > --- a/include/linux/efi.h > +++ b/include/linux/efi.h > @@ -1005,7 +1005,8 @@ extern int efi_memattr_apply_permissions(struct > mm_struct *mm, > /* Iterate through an efi_memory_map */ > #define for_each_efi_memory_desc_in_map(m, md) >\ > for ((md) = (m)->map; \ > - (md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size); \ > + (efi_memory_desc_t *)((md) + (m)->desc_size) <= \ > + (efi_memory_desc_t *)(m)->map_end;\ >(md) = (void *)(md) + (m)->desc_size) Curse C type casting! You're adding m->desc_size to a (efi_memory_desc_t *) which is 40 bytes. You need the below to avoid breaking regular EFI boot. But yeah, this looks like a fine fix. Mark, any concerns? --- diff --git a/include/linux/efi.h b/include/linux/efi.h index d8ab480b1089..3a3f8d8bd9be 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1026,7 +1026,7 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm, /* Iterate through an efi_memory_map */ #define for_each_efi_memory_desc_in_map(m, md)\ for ((md) = (m)->map; \ -(md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size); \ +((void *)(md) + (m)->desc_size) <= (m)->map_end; \ (md) = (void *)(md) + (m)->desc_size) /**
Re: [RFC PATCH v1 10/18] x86/efi: Access EFI related tables in the clear
On Tue, 24 May, at 09:54:31AM, Tom Lendacky wrote: > > I looked into this and this would be a large change also to parse tables > and build lists. It occurred to me that this could all be taken care of > if the early_memremap calls were changed to early_ioremap calls. Looking > in the git log I see that they were originally early_ioremap calls but > were changed to early_memremap calls with this commit: > > commit abc93f8eb6e4 ("efi: Use early_mem*() instead of early_io*()") > > Looking at the early_memremap code and the early_ioremap code they both > call __early_ioremap so I don't see how this change makes any > difference (especially since FIXMAP_PAGE_NORMAL and FIXMAP_PAGE_IO are > identical in this case). > > Is it safe to change these back to early_ioremap calls (at least on > x86)? I really don't want to begin mixing early_ioremap() calls and early_memremap() calls for any of the EFI code if it can be avoided. There is slow but steady progress to move more and more of the architecture specific EFI code out into generic code. Swapping early_memremap() for early_ioremap() would be a step backwards, because FIXMAP_PAGE_NORMAL and FIXMAP_PAGE_IO are not identical on ARM/arm64. Could you point me at the patch that in this series that fixes up early_ioremap() to work with mem encrypt/decrypt? I took another (quick) look through but couldn't find it.
Re: [RFC PATCH v1 10/18] x86/efi: Access EFI related tables in the clear
On Tue, 24 May, at 09:54:31AM, Tom Lendacky wrote: > > I looked into this and this would be a large change also to parse tables > and build lists. It occurred to me that this could all be taken care of > if the early_memremap calls were changed to early_ioremap calls. Looking > in the git log I see that they were originally early_ioremap calls but > were changed to early_memremap calls with this commit: > > commit abc93f8eb6e4 ("efi: Use early_mem*() instead of early_io*()") > > Looking at the early_memremap code and the early_ioremap code they both > call __early_ioremap so I don't see how this change makes any > difference (especially since FIXMAP_PAGE_NORMAL and FIXMAP_PAGE_IO are > identical in this case). > > Is it safe to change these back to early_ioremap calls (at least on > x86)? I really don't want to begin mixing early_ioremap() calls and early_memremap() calls for any of the EFI code if it can be avoided. There is slow but steady progress to move more and more of the architecture specific EFI code out into generic code. Swapping early_memremap() for early_ioremap() would be a step backwards, because FIXMAP_PAGE_NORMAL and FIXMAP_PAGE_IO are not identical on ARM/arm64. Could you point me at the patch that in this series that fixes up early_ioremap() to work with mem encrypt/decrypt? I took another (quick) look through but couldn't find it.
Re: [GIT PULL] EFI fix
On Mon, 16 May, at 01:05:45PM, Linus Torvalds wrote: > > I think the right fix is to just get rid of that silly conditional > frame pointer thing, and always use frame pointers in this stub > function. And then we don't need that (odd) load to get the old stack > pointer into %rax - we can just use the frame pointer. > > Something like the attached completely untested patch. Linus, are you going to apply your patch directly or would you prefer someone to send a pull request which also includes Josh's patch for arch/x86/entry/thunk_64.S?
Re: [GIT PULL] EFI fix
On Mon, 16 May, at 01:05:45PM, Linus Torvalds wrote: > > I think the right fix is to just get rid of that silly conditional > frame pointer thing, and always use frame pointers in this stub > function. And then we don't need that (odd) load to get the old stack > pointer into %rax - we can just use the frame pointer. > > Something like the attached completely untested patch. Linus, are you going to apply your patch directly or would you prefer someone to send a pull request which also includes Josh's patch for arch/x86/entry/thunk_64.S?
Re: [PATCH 2/3] sched,fair: Fix local starvation
On Tue, 10 May, at 07:43:16PM, Peter Zijlstra wrote: > Mike reported that the recent commit 3a47d5124a95 ("sched/fair: Fix > fairness issue on migration") broke interactivity and the signal > starve test. > > The problem is that I assumed ENQUEUE_WAKING was only set when we do a > cross-cpu wakeup (migration), which isn't true. This means we now > destroy the vruntime history of tasks and wakeup-preemption suffers. > > Cure this by making my assumption true, only call > sched_class::task_waking() when we do a cross-cpu wakeup. This avoids > the indirect call in the case we do a local wakeup. > > Cc: Pavan Kondeti <pkond...@codeaurora.org> > Cc: Ben Segall <bseg...@google.com> > Cc: Matt Fleming <m...@codeblueprint.co.uk> > Cc: Morten Rasmussen <morten.rasmus...@arm.com> > Cc: Paul Turner <p...@google.com> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: byungchul.p...@lge.com > Cc: Andrew Hunter <a...@google.com> > Fixes: 3a47d5124a95 ("sched/fair: Fix fairness issue on migration") > Reported-by: Mike Galbraith <mgalbra...@suse.de> > Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org> > --- > kernel/sched/core.c | 29 + > kernel/sched/fair.c | 41 - > 2 files changed, 61 insertions(+), 9 deletions(-) This patch appears to cause a regression for hackbench -pipe of between ~8% and ~10% with groups >= NR_CPU. I haven't probed much yet, but it looks like the vruntime of tasks has gone nuts.
Re: [PATCH 2/3] sched,fair: Fix local starvation
On Tue, 10 May, at 07:43:16PM, Peter Zijlstra wrote: > Mike reported that the recent commit 3a47d5124a95 ("sched/fair: Fix > fairness issue on migration") broke interactivity and the signal > starve test. > > The problem is that I assumed ENQUEUE_WAKING was only set when we do a > cross-cpu wakeup (migration), which isn't true. This means we now > destroy the vruntime history of tasks and wakeup-preemption suffers. > > Cure this by making my assumption true, only call > sched_class::task_waking() when we do a cross-cpu wakeup. This avoids > the indirect call in the case we do a local wakeup. > > Cc: Pavan Kondeti > Cc: Ben Segall > Cc: Matt Fleming > Cc: Morten Rasmussen > Cc: Paul Turner > Cc: Thomas Gleixner > Cc: byungchul.p...@lge.com > Cc: Andrew Hunter > Fixes: 3a47d5124a95 ("sched/fair: Fix fairness issue on migration") > Reported-by: Mike Galbraith > Signed-off-by: Peter Zijlstra (Intel) > --- > kernel/sched/core.c | 29 + > kernel/sched/fair.c | 41 - > 2 files changed, 61 insertions(+), 9 deletions(-) This patch appears to cause a regression for hackbench -pipe of between ~8% and ~10% with groups >= NR_CPU. I haven't probed much yet, but it looks like the vruntime of tasks has gone nuts.
Re: [Xen-devel] [GIT PULL] EFI ARM Xen support
On Wed, 18 May, at 12:46:38PM, Ingo Molnar wrote: > > I have no particular objections, since this seems to be Xen-next merged to > the EFI > tree that is already upstream, plus this single commit on top of t: > > 11ee5491e5ff Xen: EFI: Parse DT parameters for Xen specific UEFI > > Which, if Matt is fine with it, you could send to Linus too as part of this > cycle's Xen pull request. Sure, I'm fine with that.
Re: [Xen-devel] [GIT PULL] EFI ARM Xen support
On Wed, 18 May, at 12:46:38PM, Ingo Molnar wrote: > > I have no particular objections, since this seems to be Xen-next merged to > the EFI > tree that is already upstream, plus this single commit on top of t: > > 11ee5491e5ff Xen: EFI: Parse DT parameters for Xen specific UEFI > > Which, if Matt is fine with it, you could send to Linus too as part of this > cycle's Xen pull request. Sure, I'm fine with that.
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote: > On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote: > > So, if the code looks like the following, either now or in the future, > > > > static void __schedule(bool preempt) > > { > > ... > > /* Clear RQCF_ACT_SKIP */ > > rq->clock_update_flags = 0; > > ... > > delta = rq_clock(); > > } > > Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it, > you clear everything. That was sloppy on my part but intentional because that's what the code looks like in tip/sched/core today. It was purely meant to demonstrate that setting RQCF_UPDATE just because RQCF_ACT_SKIP is set does not make sense. You can replace the clearing line with the correct bit masking operation. But I get it, the pseudo-code was confusing. I'll send out a v2. > And if you clear the RQCF_UPDATE also (maybe you shouldn't, but > actually it does not matter), of course you will get a warning... Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch. > In addition, it looks like multiple skips are possible, so: I'm not sure what you mean, could you elaborate? > update_rq_clock() { > rq->clock_update_flags |= RQCF_UPDATE; > > ... > } > > instead of clearing the skip flag there. Huh? RQCF_*_SKIP are not cleared in update_rq_clock().
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote: > On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote: > > So, if the code looks like the following, either now or in the future, > > > > static void __schedule(bool preempt) > > { > > ... > > /* Clear RQCF_ACT_SKIP */ > > rq->clock_update_flags = 0; > > ... > > delta = rq_clock(); > > } > > Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it, > you clear everything. That was sloppy on my part but intentional because that's what the code looks like in tip/sched/core today. It was purely meant to demonstrate that setting RQCF_UPDATE just because RQCF_ACT_SKIP is set does not make sense. You can replace the clearing line with the correct bit masking operation. But I get it, the pseudo-code was confusing. I'll send out a v2. > And if you clear the RQCF_UPDATE also (maybe you shouldn't, but > actually it does not matter), of course you will get a warning... Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch. > In addition, it looks like multiple skips are possible, so: I'm not sure what you mean, could you elaborate? > update_rq_clock() { > rq->clock_update_flags |= RQCF_UPDATE; > > ... > } > > instead of clearing the skip flag there. Huh? RQCF_*_SKIP are not cleared in update_rq_clock().
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Tue, 17 May, at 04:11:09AM, Yuyang Du wrote: > On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote: > > > > No because if someone calls rq_clock() immediately after __schedule(), > > or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we > > should trigger a warning since the clock has not actually been > > updated. > > Well, I don't know how concurrent it can be, but aren't both update > and read synchronized by rq->lock? So I don't understand the latter > case, and the former should be addressed (missing its own update?). I'm not talking about concurrency; when I said "someone" above, I was referring to code. So, if the code looks like the following, either now or in the future, static void __schedule(bool preempt) { ... /* Clear RQCF_ACT_SKIP */ rq->clock_update_flags = 0; ... delta = rq_clock(); } I would expect to see a warning triggered, because we've read the rq clock outside of the code area where we know it's safe to do so without a clock update. The solution for that bug may be as simple as rearranging the code, delta = rq_clock(); ... rq->clock_update_flags = 0; but we definitely want to catch such bugs in the first instance.
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Tue, 17 May, at 04:11:09AM, Yuyang Du wrote: > On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote: > > > > No because if someone calls rq_clock() immediately after __schedule(), > > or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we > > should trigger a warning since the clock has not actually been > > updated. > > Well, I don't know how concurrent it can be, but aren't both update > and read synchronized by rq->lock? So I don't understand the latter > case, and the former should be addressed (missing its own update?). I'm not talking about concurrency; when I said "someone" above, I was referring to code. So, if the code looks like the following, either now or in the future, static void __schedule(bool preempt) { ... /* Clear RQCF_ACT_SKIP */ rq->clock_update_flags = 0; ... delta = rq_clock(); } I would expect to see a warning triggered, because we've read the rq clock outside of the code area where we know it's safe to do so without a clock update. The solution for that bug may be as simple as rearranging the code, delta = rq_clock(); ... rq->clock_update_flags = 0; but we definitely want to catch such bugs in the first instance.
Re: [PATCH 1/2] Create UV efi_call macros
On Mon, 16 May, at 05:58:40PM, Alex Thorlton wrote: > > I was simply re-using the efi_call implementation. Boris suggested that > I re-write this using the efi_call_virt macro, so I just went with that. > It all seems to work just fine, so I don't see much reason to stray away > from that implementation. That being said, I'm obviously not a huge fun > of the code duplication across the macros. I think there's probably a > way to minimize this, though I haven't quite worked out the best method > yet (ideas are welcome :) The reason I'm pressing for details is that we have a related issue with the EFI thunking code (CONFIG_EFI_MIXED), where the function pointer we want to call isn't accessible via the EFI System Table, see efi_thunk(). Well, technically it *is* accessible, you just can't dereference the services at runtime because the pointers in the tables are not 64-bit. But the same constraints exist for EFI thunk and UV code; given a function pointer to execute that isn't in efi.systab, setup the EFI runtime environment and call a custom ABI function. I haven't tested this (or thought through all the implications), but could you look at providing a table (or something) for mapping a function name to a ptr,func pair, e.g. thunk_get_time: runtime_services32(get_time), efi64_thunk thunk_set_time: runtime_services32(set_time), efi64_thunk ... uv_call_func: efi.uv_systab->function, uv_efi_call_virt which we could use in arch_efi_call_virt()? That should give us much less code duplication and hide all this inside arch/x86.
Re: [PATCH 1/2] Create UV efi_call macros
On Mon, 16 May, at 05:58:40PM, Alex Thorlton wrote: > > I was simply re-using the efi_call implementation. Boris suggested that > I re-write this using the efi_call_virt macro, so I just went with that. > It all seems to work just fine, so I don't see much reason to stray away > from that implementation. That being said, I'm obviously not a huge fun > of the code duplication across the macros. I think there's probably a > way to minimize this, though I haven't quite worked out the best method > yet (ideas are welcome :) The reason I'm pressing for details is that we have a related issue with the EFI thunking code (CONFIG_EFI_MIXED), where the function pointer we want to call isn't accessible via the EFI System Table, see efi_thunk(). Well, technically it *is* accessible, you just can't dereference the services at runtime because the pointers in the tables are not 64-bit. But the same constraints exist for EFI thunk and UV code; given a function pointer to execute that isn't in efi.systab, setup the EFI runtime environment and call a custom ABI function. I haven't tested this (or thought through all the implications), but could you look at providing a table (or something) for mapping a function name to a ptr,func pair, e.g. thunk_get_time: runtime_services32(get_time), efi64_thunk thunk_set_time: runtime_services32(set_time), efi64_thunk ... uv_call_func: efi.uv_systab->function, uv_efi_call_virt which we could use in arch_efi_call_virt()? That should give us much less code duplication and hide all this inside arch/x86.
Re: [GIT PULL] EFI fix
On Tue, 17 May, at 10:04:34AM, Matt Fleming wrote: > > Now I'm wondering whether other users of FRAME_BEGIN/FRAME_END make > this same mistake. Coccinelle might be able to detect it perhaps. A quick bit of sed turned up the code in arch/x86/entry/entry_64.S, which looks to suffer from the same bug, /* rdi: arg1 ... normal C conventions. rax is saved/restored. */ .macro THUNK name, func, put_ret_addr_in_rdi=0 .globl \name .type \name, @function \name: FRAME_BEGIN /* this one pushes 9 elems, the next one would be %rIP */ pushq %rdi pushq %rsi pushq %rdx pushq %rcx pushq %rax pushq %r8 pushq %r9 pushq %r10 pushq %r11 .if \put_ret_addr_in_rdi /* 9*8(%rsp) is return addr on stack */ movq 9*8(%rsp), %rdi .endif With CONFIG_FRAME_POINTER=y 9*8(%rsp) is actually the value of %rbp on entry, not the return address.
Re: [GIT PULL] EFI fix
On Tue, 17 May, at 10:04:34AM, Matt Fleming wrote: > > Now I'm wondering whether other users of FRAME_BEGIN/FRAME_END make > this same mistake. Coccinelle might be able to detect it perhaps. A quick bit of sed turned up the code in arch/x86/entry/entry_64.S, which looks to suffer from the same bug, /* rdi: arg1 ... normal C conventions. rax is saved/restored. */ .macro THUNK name, func, put_ret_addr_in_rdi=0 .globl \name .type \name, @function \name: FRAME_BEGIN /* this one pushes 9 elems, the next one would be %rIP */ pushq %rdi pushq %rsi pushq %rdx pushq %rcx pushq %rax pushq %r8 pushq %r9 pushq %r10 pushq %r11 .if \put_ret_addr_in_rdi /* 9*8(%rsp) is return addr on stack */ movq 9*8(%rsp), %rdi .endif With CONFIG_FRAME_POINTER=y 9*8(%rsp) is actually the value of %rbp on entry, not the return address.
Re: [GIT PULL] EFI fix
On Mon, 16 May, at 01:05:45PM, Linus Torvalds wrote: > > So that whole 8-vs-16 offset confusion depends on the frame pointer! > If frame pointers were enabled, it will be 16. If they weren't, it > will be 8. That patch that changes it from 8 to 16 will just move the > bug around. Before, it was correct when frame pointers were disabled > and buggy otherwise. Now, it's correct if frame pointers are enabled, > and buggy otherwise. Urgh, right. We didn't always use frame pointers in efi_call(), they were introduced in commit 779c433b8ea5c ("x86/asm/efi: Create a stack frame in efi_call()") earlier this year to stop objtool complaining. I admit I totally missed the part about pulling arguments off the stack when I reviewed that patch. > I may be missing something, but I think that commit is pure garbage. You're correct. > I think the right fix is to just get rid of that silly conditional > frame pointer thing, and always use frame pointers in this stub > function. And then we don't need that (odd) load to get the old stack > pointer into %rax - we can just use the frame pointer. > > Something like the attached completely untested patch. Looks good to me, but I haven't tested it. Now I'm wondering whether other users of FRAME_BEGIN/FRAME_END make this same mistake. Coccinelle might be able to detect it perhaps.
Re: [GIT PULL] EFI fix
On Mon, 16 May, at 01:05:45PM, Linus Torvalds wrote: > > So that whole 8-vs-16 offset confusion depends on the frame pointer! > If frame pointers were enabled, it will be 16. If they weren't, it > will be 8. That patch that changes it from 8 to 16 will just move the > bug around. Before, it was correct when frame pointers were disabled > and buggy otherwise. Now, it's correct if frame pointers are enabled, > and buggy otherwise. Urgh, right. We didn't always use frame pointers in efi_call(), they were introduced in commit 779c433b8ea5c ("x86/asm/efi: Create a stack frame in efi_call()") earlier this year to stop objtool complaining. I admit I totally missed the part about pulling arguments off the stack when I reviewed that patch. > I may be missing something, but I think that commit is pure garbage. You're correct. > I think the right fix is to just get rid of that silly conditional > frame pointer thing, and always use frame pointers in this stub > function. And then we don't need that (odd) load to get the old stack > pointer into %rax - we can just use the frame pointer. > > Something like the attached completely untested patch. Looks good to me, but I haven't tested it. Now I'm wondering whether other users of FRAME_BEGIN/FRAME_END make this same mistake. Coccinelle might be able to detect it perhaps.
Re: [PATCH] ftrace/x86: Fix function graph tracer reset path
On Mon, 16 May, at 09:58:33AM, Steven Rostedt wrote: > > Can we solve this by doing the same thing it did for the kernel? > > -- Steve > > diff --git a/arch/x86/kernel/mcount_64.S b/arch/x86/kernel/mcount_64.S > index ed48a9f465f8..e13a695c3084 100644 > --- a/arch/x86/kernel/mcount_64.S > +++ b/arch/x86/kernel/mcount_64.S > @@ -182,7 +182,8 @@ GLOBAL(ftrace_graph_call) > jmp ftrace_stub > #endif > > -GLOBAL(ftrace_stub) > +/* This is weak to keep gas from relaxing the jumps */ > +WEAK(ftrace_stub) > retq > END(ftrace_caller) Works for me. Tested-by: Matt Fleming <m...@codeblueprint.co.uk>
Re: [PATCH] ftrace/x86: Fix function graph tracer reset path
On Mon, 16 May, at 09:58:33AM, Steven Rostedt wrote: > > Can we solve this by doing the same thing it did for the kernel? > > -- Steve > > diff --git a/arch/x86/kernel/mcount_64.S b/arch/x86/kernel/mcount_64.S > index ed48a9f465f8..e13a695c3084 100644 > --- a/arch/x86/kernel/mcount_64.S > +++ b/arch/x86/kernel/mcount_64.S > @@ -182,7 +182,8 @@ GLOBAL(ftrace_graph_call) > jmp ftrace_stub > #endif > > -GLOBAL(ftrace_stub) > +/* This is weak to keep gas from relaxing the jumps */ > +WEAK(ftrace_stub) > retq > END(ftrace_caller) Works for me. Tested-by: Matt Fleming
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Sun, 15 May, at 10:14:39AM, Yuyang Du wrote: > Hi Matt, > > Thanks for Ccing me. > > I am indeed interested, because I recently encountered an rq clock > issue, which is that the clock jumps about 200ms when I was > experimenting the "flat util hierarchy" patches, which really annoyed > me, and I had to stop to figure out what is wrong (but haven't yet > figured out ;)) > > First, this patchset does not solve my problem, but never mind, by > reviewing your patches, I have some comments: Thanks for the review. One gap that this patch series doesn't address is that some callers of update_rq_clock() do not pin rq->lock, which makes the diagnostic checks useless in that case. I plan on handling that next, but I wanted to get this series out as soon as possible for review. > On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote: > > > > - rq->clock_skip_update = 0; > > + /* Clear ACT, preserve everything else */ > > + rq->clock_update_flags ^= RQCF_ACT_SKIP; > > The comment says "Clear ACT", but this is really xor, and I am not sure > this is even what you want. Urgh, you're right. I'm not sure what I was thinking when I wrote that. > In addition, would it be simpler to do this? > > update_rq_clock() > if (flags & RQCF_ACT_SKIP) > flags <<= 1; /* effective skip is an update */ > return; > > flags = RQCF_UPDATED; No because if someone calls rq_clock() immediately after __schedule(), or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we should trigger a warning since the clock has not actually been updated.
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Sun, 15 May, at 10:14:39AM, Yuyang Du wrote: > Hi Matt, > > Thanks for Ccing me. > > I am indeed interested, because I recently encountered an rq clock > issue, which is that the clock jumps about 200ms when I was > experimenting the "flat util hierarchy" patches, which really annoyed > me, and I had to stop to figure out what is wrong (but haven't yet > figured out ;)) > > First, this patchset does not solve my problem, but never mind, by > reviewing your patches, I have some comments: Thanks for the review. One gap that this patch series doesn't address is that some callers of update_rq_clock() do not pin rq->lock, which makes the diagnostic checks useless in that case. I plan on handling that next, but I wanted to get this series out as soon as possible for review. > On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote: > > > > - rq->clock_skip_update = 0; > > + /* Clear ACT, preserve everything else */ > > + rq->clock_update_flags ^= RQCF_ACT_SKIP; > > The comment says "Clear ACT", but this is really xor, and I am not sure > this is even what you want. Urgh, you're right. I'm not sure what I was thinking when I wrote that. > In addition, would it be simpler to do this? > > update_rq_clock() > if (flags & RQCF_ACT_SKIP) > flags <<= 1; /* effective skip is an update */ > return; > > flags = RQCF_UPDATED; No because if someone calls rq_clock() immediately after __schedule(), or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we should trigger a warning since the clock has not actually been updated.
Re: [PATCH] ftrace/x86: Fix function graph tracer reset path
On Fri, 13 May, at 10:06:10AM, Steven Rostedt wrote: > Matt, > > This bug looks very similar to what you were hitting with the function > profiler. Can you apply this patch and see if it fixes the issue for > you. Yep, this patch fixes it for me. For the record, this is what objdump tells me (with patch applied), 00b6 : b6: eb 03 jmpbb b8: 90 nop b9: 90 nop ba: 90 nop So my toolchain is definitely generating a short jump. This is binutils 2.26. But on one of my other test machines with binutils 2.24 I see this, 00aa : aa: e9 00 00 00 00 jmpq af ab: R_X86_64_PC32 ftrace_stub-0x4 i.e. a near jump.
Re: [PATCH] ftrace/x86: Fix function graph tracer reset path
On Fri, 13 May, at 10:06:10AM, Steven Rostedt wrote: > Matt, > > This bug looks very similar to what you were hitting with the function > profiler. Can you apply this patch and see if it fixes the issue for > you. Yep, this patch fixes it for me. For the record, this is what objdump tells me (with patch applied), 00b6 : b6: eb 03 jmpbb b8: 90 nop b9: 90 nop ba: 90 nop So my toolchain is definitely generating a short jump. This is binutils 2.26. But on one of my other test machines with binutils 2.24 I see this, 00aa : aa: e9 00 00 00 00 jmpq af ab: R_X86_64_PC32 ftrace_stub-0x4 i.e. a near jump.
Re: [PATCH 2/5] efibc: Fix excessive stack footprint warning
On Wed, 11 May, at 05:16:24PM, Jeremy Compostella wrote: > From 3a54e6872e220e1ac4db0eae126a20b5383dae3e Mon Sep 17 00:00:00 2001 > From: Jeremy Compostella> Date: Tue, 10 May 2016 10:34:21 +0200 > Subject: [PATCH] efibc: report more information in the error messages > > Report the name of the EFI variable if the value size is too large or > if efibc_set_variable() fails to allocate the struct efivar_entry > object. If efibc_set_variable() fails because the value size is too > large, it also reports the value size in the error message. > > Signed-off-by: Jeremy Compostella > --- > drivers/firmware/efi/efibc.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) Applied, thanks Jeremy.
Re: [PATCH 2/5] efibc: Fix excessive stack footprint warning
On Wed, 11 May, at 05:16:24PM, Jeremy Compostella wrote: > From 3a54e6872e220e1ac4db0eae126a20b5383dae3e Mon Sep 17 00:00:00 2001 > From: Jeremy Compostella > Date: Tue, 10 May 2016 10:34:21 +0200 > Subject: [PATCH] efibc: report more information in the error messages > > Report the name of the EFI variable if the value size is too large or > if efibc_set_variable() fails to allocate the struct efivar_entry > object. If efibc_set_variable() fails because the value size is too > large, it also reports the value size in the error message. > > Signed-off-by: Jeremy Compostella > --- > drivers/firmware/efi/efibc.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) Applied, thanks Jeremy.
[GIT PULL] EFI ARM Xen support
Folks, Please pull the following branch which contains support for Xen on ARM EFI platforms. This merge includes a merge of Stefano's xen/linux-next branch to pull in the prerequisites required for Shannon's commit: 11ee5491e5ff ("Xen: EFI: Parse DT parameters for Xen specific UEFI") as it needs both the latest changes in the EFI 'next' branch (now tip/efi/core) and xen/linux-next. I have intentionally not included the individual patches as I would normally do in a pull request, so that commit history created by my merging of Stefano's branch is preserved, and so that there's no way to accidentally apply the patches individually. The following changes since commit 6c5450ef66816216e574885cf8d3ddb31ef77428: efivarfs: Make efivarfs_file_ioctl() static (2016-05-07 07:06:13 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git efi/arm-xen for you to fetch changes up to 11ee5491e5ff519e0d3a7018eb21351cb6955a98: Xen: EFI: Parse DT parameters for Xen specific UEFI (2016-05-14 19:31:01 +0100) David Vrabel (1): xen/gntdev: reduce copy batch size to 16 Matt Fleming (1): Merge branch 'xen/linux-next' into efi/arm-xen Shannon Zhao (16): Xen: ACPI: Hide UART used by Xen xen/grant-table: Move xlated_setup_gnttab_pages to common place Xen: xlate: Use page_to_xen_pfn instead of page_to_pfn arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table xen: memory : Add new XENMAPSPACE type XENMAPSPACE_dev_mmio Xen: ARM: Add support for mapping platform device mmio Xen: ARM: Add support for mapping AMBA device mmio Xen: public/hvm: sync changes of HVM_PARAM_CALLBACK_VIA ABI from Xen xen/hvm/params: Add a new delivery type for event-channel in HVM_PARAM_CALLBACK_IRQ arm/xen: Get event-channel irq through HVM_PARAM when booting with ACPI ARM: XEN: Move xen_early_init() before efi_init() ARM: Xen: Document UEFI support on Xen ARM virtual platforms XEN: EFI: Move x86 specific codes to architecture directory ARM64: XEN: Add a function to initialize Xen specific UEFI runtime services FDT: Add a helper to get the subnode by given name Xen: EFI: Parse DT parameters for Xen specific UEFI Stefano Stabellini (1): xen/x86: don't lose event interrupts Documentation/devicetree/bindings/arm/xen.txt | 35 + arch/arm/include/asm/xen/xen-ops.h| 6 + arch/arm/kernel/setup.c | 2 +- arch/arm/xen/Makefile | 1 + arch/arm/xen/efi.c| 40 ++ arch/arm/xen/enlighten.c | 125 arch/arm64/include/asm/xen/xen-ops.h | 6 + arch/arm64/kernel/setup.c | 2 +- arch/arm64/xen/Makefile | 1 + arch/x86/xen/efi.c| 111 +++ arch/x86/xen/grant-table.c| 57 +--- arch/x86/xen/time.c | 6 +- drivers/acpi/scan.c | 74 ++ drivers/firmware/efi/arm-runtime.c| 5 + drivers/firmware/efi/efi.c| 81 --- drivers/of/fdt.c | 13 ++ drivers/xen/Kconfig | 2 +- drivers/xen/Makefile | 1 + drivers/xen/arm-device.c | 196 ++ drivers/xen/efi.c | 173 +-- drivers/xen/gntdev.c | 2 +- drivers/xen/xlate_mmu.c | 77 ++ include/linux/of_fdt.h| 2 + include/xen/interface/hvm/params.h| 40 +- include/xen/interface/memory.h| 1 + include/xen/xen-ops.h | 32 +++-- 26 files changed, 840 insertions(+), 251 deletions(-) create mode 100644 arch/arm/include/asm/xen/xen-ops.h create mode 100644 arch/arm/xen/efi.c create mode 100644 arch/arm64/include/asm/xen/xen-ops.h create mode 100644 drivers/xen/arm-device.c
[GIT PULL] EFI ARM Xen support
Folks, Please pull the following branch which contains support for Xen on ARM EFI platforms. This merge includes a merge of Stefano's xen/linux-next branch to pull in the prerequisites required for Shannon's commit: 11ee5491e5ff ("Xen: EFI: Parse DT parameters for Xen specific UEFI") as it needs both the latest changes in the EFI 'next' branch (now tip/efi/core) and xen/linux-next. I have intentionally not included the individual patches as I would normally do in a pull request, so that commit history created by my merging of Stefano's branch is preserved, and so that there's no way to accidentally apply the patches individually. The following changes since commit 6c5450ef66816216e574885cf8d3ddb31ef77428: efivarfs: Make efivarfs_file_ioctl() static (2016-05-07 07:06:13 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git efi/arm-xen for you to fetch changes up to 11ee5491e5ff519e0d3a7018eb21351cb6955a98: Xen: EFI: Parse DT parameters for Xen specific UEFI (2016-05-14 19:31:01 +0100) David Vrabel (1): xen/gntdev: reduce copy batch size to 16 Matt Fleming (1): Merge branch 'xen/linux-next' into efi/arm-xen Shannon Zhao (16): Xen: ACPI: Hide UART used by Xen xen/grant-table: Move xlated_setup_gnttab_pages to common place Xen: xlate: Use page_to_xen_pfn instead of page_to_pfn arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table xen: memory : Add new XENMAPSPACE type XENMAPSPACE_dev_mmio Xen: ARM: Add support for mapping platform device mmio Xen: ARM: Add support for mapping AMBA device mmio Xen: public/hvm: sync changes of HVM_PARAM_CALLBACK_VIA ABI from Xen xen/hvm/params: Add a new delivery type for event-channel in HVM_PARAM_CALLBACK_IRQ arm/xen: Get event-channel irq through HVM_PARAM when booting with ACPI ARM: XEN: Move xen_early_init() before efi_init() ARM: Xen: Document UEFI support on Xen ARM virtual platforms XEN: EFI: Move x86 specific codes to architecture directory ARM64: XEN: Add a function to initialize Xen specific UEFI runtime services FDT: Add a helper to get the subnode by given name Xen: EFI: Parse DT parameters for Xen specific UEFI Stefano Stabellini (1): xen/x86: don't lose event interrupts Documentation/devicetree/bindings/arm/xen.txt | 35 + arch/arm/include/asm/xen/xen-ops.h| 6 + arch/arm/kernel/setup.c | 2 +- arch/arm/xen/Makefile | 1 + arch/arm/xen/efi.c| 40 ++ arch/arm/xen/enlighten.c | 125 arch/arm64/include/asm/xen/xen-ops.h | 6 + arch/arm64/kernel/setup.c | 2 +- arch/arm64/xen/Makefile | 1 + arch/x86/xen/efi.c| 111 +++ arch/x86/xen/grant-table.c| 57 +--- arch/x86/xen/time.c | 6 +- drivers/acpi/scan.c | 74 ++ drivers/firmware/efi/arm-runtime.c| 5 + drivers/firmware/efi/efi.c| 81 --- drivers/of/fdt.c | 13 ++ drivers/xen/Kconfig | 2 +- drivers/xen/Makefile | 1 + drivers/xen/arm-device.c | 196 ++ drivers/xen/efi.c | 173 +-- drivers/xen/gntdev.c | 2 +- drivers/xen/xlate_mmu.c | 77 ++ include/linux/of_fdt.h| 2 + include/xen/interface/hvm/params.h| 40 +- include/xen/interface/memory.h| 1 + include/xen/xen-ops.h | 32 +++-- 26 files changed, 840 insertions(+), 251 deletions(-) create mode 100644 arch/arm/include/asm/xen/xen-ops.h create mode 100644 arch/arm/xen/efi.c create mode 100644 arch/arm64/include/asm/xen/xen-ops.h create mode 100644 drivers/xen/arm-device.c
Re: [PATCH v2] Xen: EFI: Parse DT parameters for Xen specific UEFI
(Including more folks, quoting entire patch) On Thu, 12 May, at 08:19:54PM, Shannon Zhao wrote: > From: Shannon Zhao <shannon.z...@linaro.org> > > The EFI DT parameters for bare metal are located under /chosen node, > while for Xen Dom0 they are located under /hyperviosr/uefi node. These > parameters under /chosen and /hyperviosr/uefi are not expected to appear > at the same time. > Parse these EFI parameters and initialize EFI like the way for bare > metal except the runtime services because the runtime services for Xen > Dom0 are available through hypercalls and they are always enabled. So it > sets the EFI_RUNTIME_SERVICES flag if it finds /hyperviosr/uefi node and > bails out in arm_enable_runtime_services() when EFI_RUNTIME_SERVICES > flag is set already. > > CC: Matt Fleming <m...@codeblueprint.co.uk> > Signed-off-by: Shannon Zhao <shannon.z...@linaro.org> > --- > v2: rebase it on top of EFI and Xen next branch, add some comments to > explain the codes. > --- > arch/arm/xen/enlighten.c | 22 +++ > drivers/firmware/efi/arm-runtime.c | 5 +++ > drivers/firmware/efi/efi.c | 81 > ++ > 3 files changed, 92 insertions(+), 16 deletions(-) This looks good to me. Thanks for all the work Shannon. Stefan, OK, let's figure out how to deal with this. What I've done is merge xen/linux-next into the 'efi/xen-arm' topic branch and applied this patch here, https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/log/?h=efi/arm-xen Yes, the merge includes some Xen patches that are not actually needed for any of the EFI patches, but Linus has complained about picking random commits of a tree to merge. At least this way this is all the Xen linux-next material. If you're OK with this, and assuming no one else has any preferred methods, I'll send out a pull request to tip over the weekend. > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c > index 13e3e9f..d4f36eb 100644 > --- a/arch/arm/xen/enlighten.c > +++ b/arch/arm/xen/enlighten.c > @@ -15,7 +15,9 @@ > #include > #include > #include > +#include > #include > +#include > #include > #include > #include > @@ -261,6 +263,19 @@ static int __init fdt_find_hyper_node(unsigned long > node, const char *uname, > !strncmp(hyper_node.prefix, s, strlen(hyper_node.prefix))) > hyper_node.version = s + strlen(hyper_node.prefix); > > + /* > + * Check if Xen supports EFI by checking whether there is the > + * "/hypervisor/uefi" node in DT. If so, runtime services are available > + * through proxy functions (e.g. in case of Xen dom0 EFI implementation > + * they call special hypercall which executes relevant EFI functions) > + * and that is why they are always enabled. > + */ > + if (IS_ENABLED(CONFIG_XEN_EFI)) { > + if ((of_get_flat_dt_subnode_by_name(node, "uefi") > 0) && > + !efi_runtime_disabled()) > + set_bit(EFI_RUNTIME_SERVICES, ); > + } > + > return 0; > } > > @@ -352,6 +367,13 @@ static int __init xen_guest_init(void) > return -ENODEV; > } > > + /* > + * The fdt parsing codes have set EFI_RUNTIME_SERVICES if Xen EFI > + * parameters are found. Force enable runtime services. > + */ > + if (efi_enabled(EFI_RUNTIME_SERVICES)) > + xen_efi_runtime_setup(); > + > shared_info_page = (struct shared_info *)get_zeroed_page(GFP_KERNEL); > > if (!shared_info_page) { > diff --git a/drivers/firmware/efi/arm-runtime.c > b/drivers/firmware/efi/arm-runtime.c > index 17ccf0a..c394b81 100644 > --- a/drivers/firmware/efi/arm-runtime.c > +++ b/drivers/firmware/efi/arm-runtime.c > @@ -107,6 +107,11 @@ static int __init arm_enable_runtime_services(void) > return 0; > } > > + if (efi_enabled(EFI_RUNTIME_SERVICES)) { > + pr_info("EFI runtime services access via paravirt.\n"); > + return 0; > + } > + > pr_info("Remapping and enabling EFI services.\n"); > > mapsize = efi.memmap.map_end - efi.memmap.map; > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > index 05509f3..1c6f9dd 100644 > --- a/drivers/firmware/efi/efi.c > +++ b/drivers/firmware/efi/efi.c > @@ -472,12 +472,14 @@ device_initcall(efi_load_efivars); > FIELD_SIZEOF(struct efi_fdt_params, field) \ > } > > -static __initdata struct { > +struct params { > const char name[32]; > const char propname[32]; > int offset; > int size;
Re: [PATCH v2] Xen: EFI: Parse DT parameters for Xen specific UEFI
(Including more folks, quoting entire patch) On Thu, 12 May, at 08:19:54PM, Shannon Zhao wrote: > From: Shannon Zhao > > The EFI DT parameters for bare metal are located under /chosen node, > while for Xen Dom0 they are located under /hyperviosr/uefi node. These > parameters under /chosen and /hyperviosr/uefi are not expected to appear > at the same time. > Parse these EFI parameters and initialize EFI like the way for bare > metal except the runtime services because the runtime services for Xen > Dom0 are available through hypercalls and they are always enabled. So it > sets the EFI_RUNTIME_SERVICES flag if it finds /hyperviosr/uefi node and > bails out in arm_enable_runtime_services() when EFI_RUNTIME_SERVICES > flag is set already. > > CC: Matt Fleming > Signed-off-by: Shannon Zhao > --- > v2: rebase it on top of EFI and Xen next branch, add some comments to > explain the codes. > --- > arch/arm/xen/enlighten.c | 22 +++ > drivers/firmware/efi/arm-runtime.c | 5 +++ > drivers/firmware/efi/efi.c | 81 > ++ > 3 files changed, 92 insertions(+), 16 deletions(-) This looks good to me. Thanks for all the work Shannon. Stefan, OK, let's figure out how to deal with this. What I've done is merge xen/linux-next into the 'efi/xen-arm' topic branch and applied this patch here, https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/log/?h=efi/arm-xen Yes, the merge includes some Xen patches that are not actually needed for any of the EFI patches, but Linus has complained about picking random commits of a tree to merge. At least this way this is all the Xen linux-next material. If you're OK with this, and assuming no one else has any preferred methods, I'll send out a pull request to tip over the weekend. > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c > index 13e3e9f..d4f36eb 100644 > --- a/arch/arm/xen/enlighten.c > +++ b/arch/arm/xen/enlighten.c > @@ -15,7 +15,9 @@ > #include > #include > #include > +#include > #include > +#include > #include > #include > #include > @@ -261,6 +263,19 @@ static int __init fdt_find_hyper_node(unsigned long > node, const char *uname, > !strncmp(hyper_node.prefix, s, strlen(hyper_node.prefix))) > hyper_node.version = s + strlen(hyper_node.prefix); > > + /* > + * Check if Xen supports EFI by checking whether there is the > + * "/hypervisor/uefi" node in DT. If so, runtime services are available > + * through proxy functions (e.g. in case of Xen dom0 EFI implementation > + * they call special hypercall which executes relevant EFI functions) > + * and that is why they are always enabled. > + */ > + if (IS_ENABLED(CONFIG_XEN_EFI)) { > + if ((of_get_flat_dt_subnode_by_name(node, "uefi") > 0) && > + !efi_runtime_disabled()) > + set_bit(EFI_RUNTIME_SERVICES, ); > + } > + > return 0; > } > > @@ -352,6 +367,13 @@ static int __init xen_guest_init(void) > return -ENODEV; > } > > + /* > + * The fdt parsing codes have set EFI_RUNTIME_SERVICES if Xen EFI > + * parameters are found. Force enable runtime services. > + */ > + if (efi_enabled(EFI_RUNTIME_SERVICES)) > + xen_efi_runtime_setup(); > + > shared_info_page = (struct shared_info *)get_zeroed_page(GFP_KERNEL); > > if (!shared_info_page) { > diff --git a/drivers/firmware/efi/arm-runtime.c > b/drivers/firmware/efi/arm-runtime.c > index 17ccf0a..c394b81 100644 > --- a/drivers/firmware/efi/arm-runtime.c > +++ b/drivers/firmware/efi/arm-runtime.c > @@ -107,6 +107,11 @@ static int __init arm_enable_runtime_services(void) > return 0; > } > > + if (efi_enabled(EFI_RUNTIME_SERVICES)) { > + pr_info("EFI runtime services access via paravirt.\n"); > + return 0; > + } > + > pr_info("Remapping and enabling EFI services.\n"); > > mapsize = efi.memmap.map_end - efi.memmap.map; > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > index 05509f3..1c6f9dd 100644 > --- a/drivers/firmware/efi/efi.c > +++ b/drivers/firmware/efi/efi.c > @@ -472,12 +472,14 @@ device_initcall(efi_load_efivars); > FIELD_SIZEOF(struct efi_fdt_params, field) \ > } > > -static __initdata struct { > +struct params { > const char name[32]; > const char propname[32]; > int offset; > int size; > -} dt_params[] = { > +}; > + > +static __initdata struct params fdt_param
[PATCH] x86/efi: Fix 7th argument to efi_call
From: Alex Thorlton <athorl...@sgi.com> The efi_call assembly code has a slight error that prevents us from using arguments 7 and higher, which will be passed in on the stack. mov (%rsp), %rax mov 8(%rax), %rax ... mov %rax, 40(%rsp) This code goes and grabs the return address for the current stack frame, and puts it on the stack, next to the 5th argument for the EFI runtime call. Considering the fact that having the return address in that position on the stack makes no sense, I'm guessing that the intent of this code was actually to grab an argument off the stack frame for this call and place it into the frame for the next one. The small change to that offset (i.e. 8(%rax) to 16(%rax)) ensures that we grab the 7th argument off the stack, and pass it as the 6th argument to the EFI runtime function that we're about to call. This change gets our EFI runtime calls that need to pass more than 6 arguments working again. SGI/UV is the only platform that passes more than 6 arguments. Signed-off-by: Alex Thorlton <athorl...@sgi.com> Cc: Dimitri Sivanich <sivan...@sgi.com> Cc: Russ Anderson <r...@sgi.com> Cc: Mike Travis <tra...@sgi.com> Cc: Borislav Petkov <b...@suse.de> Cc: Thomas Gleixner <t...@linutronix.de> Cc: Ingo Molnar <mi...@redhat.com> Cc: "H. Peter Anvin" <h...@zytor.com> Cc: x...@kernel.org Cc: linux-...@vger.kernel.org Cc: <sta...@vger.kernel.org> [ Updated changelog. ] Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- arch/x86/platform/efi/efi_stub_64.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S index 92723aeae0f9..62938ffbb9f9 100644 --- a/arch/x86/platform/efi/efi_stub_64.S +++ b/arch/x86/platform/efi/efi_stub_64.S @@ -43,7 +43,7 @@ ENTRY(efi_call) FRAME_BEGIN SAVE_XMM mov (%rsp), %rax - mov 8(%rax), %rax + mov 16(%rax), %rax subq $48, %rsp mov %r9, 32(%rsp) mov %rax, 40(%rsp) -- 2.7.3
[PATCH] x86/efi: Fix 7th argument to efi_call
From: Alex Thorlton The efi_call assembly code has a slight error that prevents us from using arguments 7 and higher, which will be passed in on the stack. mov (%rsp), %rax mov 8(%rax), %rax ... mov %rax, 40(%rsp) This code goes and grabs the return address for the current stack frame, and puts it on the stack, next to the 5th argument for the EFI runtime call. Considering the fact that having the return address in that position on the stack makes no sense, I'm guessing that the intent of this code was actually to grab an argument off the stack frame for this call and place it into the frame for the next one. The small change to that offset (i.e. 8(%rax) to 16(%rax)) ensures that we grab the 7th argument off the stack, and pass it as the 6th argument to the EFI runtime function that we're about to call. This change gets our EFI runtime calls that need to pass more than 6 arguments working again. SGI/UV is the only platform that passes more than 6 arguments. Signed-off-by: Alex Thorlton Cc: Dimitri Sivanich Cc: Russ Anderson Cc: Mike Travis Cc: Borislav Petkov Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: x...@kernel.org Cc: linux-...@vger.kernel.org Cc: [ Updated changelog. ] Signed-off-by: Matt Fleming --- arch/x86/platform/efi/efi_stub_64.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S index 92723aeae0f9..62938ffbb9f9 100644 --- a/arch/x86/platform/efi/efi_stub_64.S +++ b/arch/x86/platform/efi/efi_stub_64.S @@ -43,7 +43,7 @@ ENTRY(efi_call) FRAME_BEGIN SAVE_XMM mov (%rsp), %rax - mov 8(%rax), %rax + mov 16(%rax), %rax subq $48, %rsp mov %r9, 32(%rsp) mov %rax, 40(%rsp) -- 2.7.3
[GIT PULL] EFI urgent fix
The following changes since commit c10fcb14c7afd6688c7b197a814358fecf244222: x86/sysfb_efi: Fix valid BAR address range check (2016-05-05 16:01:00 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git tags/efi-urgent for you to fetch changes up to 6f4c184576aeb5594b8d21f9e7206b7b62e3d96e: x86/efi: Fix 7th argument to efi_call (2016-05-13 21:12:13 +0100) * Fix passing of 7 parameters or more to efi_call. This issue is only triggered on SGI/UV systems - Alex Thorlton Alex Thorlton (1): x86/efi: Fix 7th argument to efi_call arch/x86/platform/efi/efi_stub_64.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[GIT PULL] EFI urgent fix
The following changes since commit c10fcb14c7afd6688c7b197a814358fecf244222: x86/sysfb_efi: Fix valid BAR address range check (2016-05-05 16:01:00 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git tags/efi-urgent for you to fetch changes up to 6f4c184576aeb5594b8d21f9e7206b7b62e3d96e: x86/efi: Fix 7th argument to efi_call (2016-05-13 21:12:13 +0100) * Fix passing of 7 parameters or more to efi_call. This issue is only triggered on SGI/UV systems - Alex Thorlton Alex Thorlton (1): x86/efi: Fix 7th argument to efi_call arch/x86/platform/efi/efi_stub_64.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Re: [PATCH 1/3] MAINTAINERS: Remove asterisk from EFI directory names
On Fri, 13 May, at 08:23:47AM, Joe Perches wrote: > On Fri, 2016-05-13 at 17:11 +0200, Xose Vazquez Perez wrote: > > Matt Fleming wrote: > > > > > > > > Mark reported that having asterisks on the end of directory names > > > confuses get_maintainer.pl when it encounters subdirectories, and > > > that > > If this's a get_maintainer.pl bug, it should also happen in: > > Mark is confused. Sorry, "confuses" was my word, not Mark's. > The meaning of * is all the files in a particular > directory but not any subdirectories. Exactly. This is not what EFI maintainer entries should be using. I am responsible for everything under drivers/firmware/efi, including subdirectories.
Re: [PATCH 1/3] MAINTAINERS: Remove asterisk from EFI directory names
On Fri, 13 May, at 08:23:47AM, Joe Perches wrote: > On Fri, 2016-05-13 at 17:11 +0200, Xose Vazquez Perez wrote: > > Matt Fleming wrote: > > > > > > > > Mark reported that having asterisks on the end of directory names > > > confuses get_maintainer.pl when it encounters subdirectories, and > > > that > > If this's a get_maintainer.pl bug, it should also happen in: > > Mark is confused. Sorry, "confuses" was my word, not Mark's. > The meaning of * is all the files in a particular > directory but not any subdirectories. Exactly. This is not what EFI maintainer entries should be using. I am responsible for everything under drivers/firmware/efi, including subdirectories.
[RFC][PATCH 0/5] sched: Diagnostic checks for missing rq clock updates
There are currently no runtime diagnostic checks for detecting when we have inadvertently missed a call to update_rq_clock() before accessing rq_clock(). The idea in these patches, which came from Peter, is to piggyback on the rq->lock pin/unpin context to detect when we expected (and failed) to see an update to the rq clock. They've already caught a couple of bugs: see patch 1 and commit b52fad2db5d7 ("sched/fair: Update rq clock before updating nohz CPU load") in tip/sched/core. I'm not sure how palatable the s/pin_cookie/rq_flags/ changes will be in patch 2, so I've marked this entire series as RFC. All the diagnostic code is guarded by CONFIG_SCHED_DEBUG, but there are minimal changes to __schedule() in patch 5 for the !SCHED_DEBUG case. Matt Fleming (5): sched/fair: Update the rq clock before detaching tasks sched: Add wrappers for lockdep_(un)pin_lock() sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock sched/fair: Push rq lock pin/unpin into idle_balance() sched/core: Add debug code to catch missing update_rq_clock() kernel/sched/core.c | 94 +++ kernel/sched/deadline.c | 10 ++--- kernel/sched/fair.c | 31 +-- kernel/sched/idle_task.c | 2 +- kernel/sched/rt.c| 6 +-- kernel/sched/sched.h | 101 --- kernel/sched/stop_task.c | 2 +- 7 files changed, 166 insertions(+), 80 deletions(-) -- 2.7.3
[RFC][PATCH 0/5] sched: Diagnostic checks for missing rq clock updates
There are currently no runtime diagnostic checks for detecting when we have inadvertently missed a call to update_rq_clock() before accessing rq_clock(). The idea in these patches, which came from Peter, is to piggyback on the rq->lock pin/unpin context to detect when we expected (and failed) to see an update to the rq clock. They've already caught a couple of bugs: see patch 1 and commit b52fad2db5d7 ("sched/fair: Update rq clock before updating nohz CPU load") in tip/sched/core. I'm not sure how palatable the s/pin_cookie/rq_flags/ changes will be in patch 2, so I've marked this entire series as RFC. All the diagnostic code is guarded by CONFIG_SCHED_DEBUG, but there are minimal changes to __schedule() in patch 5 for the !SCHED_DEBUG case. Matt Fleming (5): sched/fair: Update the rq clock before detaching tasks sched: Add wrappers for lockdep_(un)pin_lock() sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock sched/fair: Push rq lock pin/unpin into idle_balance() sched/core: Add debug code to catch missing update_rq_clock() kernel/sched/core.c | 94 +++ kernel/sched/deadline.c | 10 ++--- kernel/sched/fair.c | 31 +-- kernel/sched/idle_task.c | 2 +- kernel/sched/rt.c| 6 +-- kernel/sched/sched.h | 101 --- kernel/sched/stop_task.c | 2 +- 7 files changed, 166 insertions(+), 80 deletions(-) -- 2.7.3
[RFC][PATCH 2/5] sched: Add wrappers for lockdep_(un)pin_lock()
In preparation for adding diagnostic checks to catch missing calls to update_rq_clock(), provide wrappers for (re)pinning and unpinning rq->lock. Because the pending diagnostic checks allow state to be maintained in rq_flags across pin contexts, swap the 'struct pin_cookie' arguments for 'struct rq_flags *'. Cc: Peter Zijlstra <pet...@infradead.org> Cc: Ingo Molnar <mi...@kernel.org> Cc: Mike Galbraith <umgwanakikb...@gmail.com> Cc: Mel Gorman <mgor...@techsingularity.net> Cc: Thomas Gleixner <t...@linutronix.de> Cc: Frederic Weisbecker <fweis...@gmail.com> Cc: Wanpeng Li <wanpeng...@hotmail.com> Cc: Luca Abeni <luca.ab...@unitn.it> Cc: Yuyang Du <yuyang...@intel.com> Cc: Byungchul Park <byungchul.p...@lge.com> Cc: Rik van Riel <r...@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wyso...@intel.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- kernel/sched/core.c | 80 kernel/sched/deadline.c | 10 +++--- kernel/sched/fair.c | 6 ++-- kernel/sched/idle_task.c | 2 +- kernel/sched/rt.c| 6 ++-- kernel/sched/sched.h | 31 ++- kernel/sched/stop_task.c | 2 +- 7 files changed, 76 insertions(+), 61 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 404c0784b1fc..dfdd8c3c083b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -184,7 +184,7 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf) rq = task_rq(p); raw_spin_lock(>lock); if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) { - rf->cookie = lockdep_pin_lock(>lock); + rq_pin_lock(rq, rf); return rq; } raw_spin_unlock(>lock); @@ -224,7 +224,7 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf) * pair with the WMB to ensure we must then also see migrating. */ if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) { - rf->cookie = lockdep_pin_lock(>lock); + rq_pin_lock(rq, rf); return rq; } raw_spin_unlock(>lock); @@ -1183,9 +1183,9 @@ static int __set_cpus_allowed_ptr(struct task_struct *p, * OK, since we're going to drop the lock immediately * afterwards anyway. */ - lockdep_unpin_lock(>lock, rf.cookie); + rq_unpin_lock(rq, ); rq = move_queued_task(rq, p, dest_cpu); - lockdep_repin_lock(>lock, rf.cookie); + rq_repin_lock(rq, ); } out: task_rq_unlock(rq, p, ); @@ -1677,7 +1677,7 @@ static inline void ttwu_activate(struct rq *rq, struct task_struct *p, int en_fl * Mark the task runnable and perform wakeup-preemption. */ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags, - struct pin_cookie cookie) + struct rq_flags *rf) { check_preempt_curr(rq, p, wake_flags); p->state = TASK_RUNNING; @@ -1689,9 +1689,9 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags, * Our task @p is fully woken up and running; so its safe to * drop the rq->lock, hereafter rq is only used for statistics. */ - lockdep_unpin_lock(>lock, cookie); + rq_unpin_lock(rq, rf); p->sched_class->task_woken(rq, p); - lockdep_repin_lock(>lock, cookie); + rq_repin_lock(rq, rf); } if (rq->idle_stamp) { @@ -1710,7 +1710,7 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags, static void ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, -struct pin_cookie cookie) +struct rq_flags *rf) { int en_flags = ENQUEUE_WAKEUP; @@ -1725,7 +1725,7 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, #endif ttwu_activate(rq, p, en_flags); - ttwu_do_wakeup(rq, p, wake_flags, cookie); + ttwu_do_wakeup(rq, p, wake_flags, rf); } /* @@ -1744,7 +1744,7 @@ static int ttwu_remote(struct task_struct *p, int wake_flags) if (task_on_rq_queued(p)) { /* check_preempt_curr() may use rq clock */ update_rq_clock(rq); - ttwu_do_wakeup(rq, p, wake_flags, rf.cookie); + ttwu_do_wakeup(rq, p, wake_flags, ); ret = 1; } __task_rq_unlock(rq, ); @@ -1757,15 +1757,15 @@ void sched_ttwu_pending(void) { struct rq *rq = this_rq(); struct
[RFC][PATCH 2/5] sched: Add wrappers for lockdep_(un)pin_lock()
In preparation for adding diagnostic checks to catch missing calls to update_rq_clock(), provide wrappers for (re)pinning and unpinning rq->lock. Because the pending diagnostic checks allow state to be maintained in rq_flags across pin contexts, swap the 'struct pin_cookie' arguments for 'struct rq_flags *'. Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Mike Galbraith Cc: Mel Gorman Cc: Thomas Gleixner Cc: Frederic Weisbecker Cc: Wanpeng Li Cc: Luca Abeni Cc: Yuyang Du Cc: Byungchul Park Cc: Rik van Riel Cc: "Rafael J. Wysocki" Signed-off-by: Matt Fleming --- kernel/sched/core.c | 80 kernel/sched/deadline.c | 10 +++--- kernel/sched/fair.c | 6 ++-- kernel/sched/idle_task.c | 2 +- kernel/sched/rt.c| 6 ++-- kernel/sched/sched.h | 31 ++- kernel/sched/stop_task.c | 2 +- 7 files changed, 76 insertions(+), 61 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 404c0784b1fc..dfdd8c3c083b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -184,7 +184,7 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf) rq = task_rq(p); raw_spin_lock(>lock); if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) { - rf->cookie = lockdep_pin_lock(>lock); + rq_pin_lock(rq, rf); return rq; } raw_spin_unlock(>lock); @@ -224,7 +224,7 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf) * pair with the WMB to ensure we must then also see migrating. */ if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) { - rf->cookie = lockdep_pin_lock(>lock); + rq_pin_lock(rq, rf); return rq; } raw_spin_unlock(>lock); @@ -1183,9 +1183,9 @@ static int __set_cpus_allowed_ptr(struct task_struct *p, * OK, since we're going to drop the lock immediately * afterwards anyway. */ - lockdep_unpin_lock(>lock, rf.cookie); + rq_unpin_lock(rq, ); rq = move_queued_task(rq, p, dest_cpu); - lockdep_repin_lock(>lock, rf.cookie); + rq_repin_lock(rq, ); } out: task_rq_unlock(rq, p, ); @@ -1677,7 +1677,7 @@ static inline void ttwu_activate(struct rq *rq, struct task_struct *p, int en_fl * Mark the task runnable and perform wakeup-preemption. */ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags, - struct pin_cookie cookie) + struct rq_flags *rf) { check_preempt_curr(rq, p, wake_flags); p->state = TASK_RUNNING; @@ -1689,9 +1689,9 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags, * Our task @p is fully woken up and running; so its safe to * drop the rq->lock, hereafter rq is only used for statistics. */ - lockdep_unpin_lock(>lock, cookie); + rq_unpin_lock(rq, rf); p->sched_class->task_woken(rq, p); - lockdep_repin_lock(>lock, cookie); + rq_repin_lock(rq, rf); } if (rq->idle_stamp) { @@ -1710,7 +1710,7 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags, static void ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, -struct pin_cookie cookie) +struct rq_flags *rf) { int en_flags = ENQUEUE_WAKEUP; @@ -1725,7 +1725,7 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, #endif ttwu_activate(rq, p, en_flags); - ttwu_do_wakeup(rq, p, wake_flags, cookie); + ttwu_do_wakeup(rq, p, wake_flags, rf); } /* @@ -1744,7 +1744,7 @@ static int ttwu_remote(struct task_struct *p, int wake_flags) if (task_on_rq_queued(p)) { /* check_preempt_curr() may use rq clock */ update_rq_clock(rq); - ttwu_do_wakeup(rq, p, wake_flags, rf.cookie); + ttwu_do_wakeup(rq, p, wake_flags, ); ret = 1; } __task_rq_unlock(rq, ); @@ -1757,15 +1757,15 @@ void sched_ttwu_pending(void) { struct rq *rq = this_rq(); struct llist_node *llist = llist_del_all(>wake_list); - struct pin_cookie cookie; struct task_struct *p; unsigned long flags; + struct rq_flags rf; if (!llist) return; raw_spin_lock_irqsave(>lock, flags); - cookie = lockdep_pin_lock(>lock); + rq_pin_lock(rq, ); while (llist) {
[RFC][PATCH 1/5] sched/fair: Update the rq clock before detaching tasks
detach_task_cfs_rq() may indirectly call rq_clock() to inform the cpufreq code that the rq utilisation has changed. In which case, we need to update the rq clock. Cc: Peter Zijlstra <pet...@infradead.org> Cc: Ingo Molnar <mi...@kernel.org> Cc: Mel Gorman <mgor...@techsingularity.net> Cc: Mike Galbraith <umgwanakikb...@gmail.com> Cc: Yuyang Du <yuyang...@intel.com> Cc: Byungchul Park <byungchul.p...@lge.com> Cc: Rik van Riel <r...@redhat.com> Cc: Frederic Weisbecker <fweis...@gmail.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- kernel/sched/fair.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 218f8e83db73..02856647339d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8378,6 +8378,8 @@ static void detach_task_cfs_rq(struct task_struct *p) struct sched_entity *se = >se; struct cfs_rq *cfs_rq = cfs_rq_of(se); + update_rq_clock(task_rq(p)); + if (!vruntime_normalized(p)) { /* * Fix up our vruntime so that the current sleep doesn't -- 2.7.3
[RFC][PATCH 1/5] sched/fair: Update the rq clock before detaching tasks
detach_task_cfs_rq() may indirectly call rq_clock() to inform the cpufreq code that the rq utilisation has changed. In which case, we need to update the rq clock. Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Mel Gorman Cc: Mike Galbraith Cc: Yuyang Du Cc: Byungchul Park Cc: Rik van Riel Cc: Frederic Weisbecker Signed-off-by: Matt Fleming --- kernel/sched/fair.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 218f8e83db73..02856647339d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8378,6 +8378,8 @@ static void detach_task_cfs_rq(struct task_struct *p) struct sched_entity *se = >se; struct cfs_rq *cfs_rq = cfs_rq_of(se); + update_rq_clock(task_rq(p)); + if (!vruntime_normalized(p)) { /* * Fix up our vruntime so that the current sleep doesn't -- 2.7.3
[RFC][PATCH 3/5] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock
rq_clock() is called from sched_info_{depart,arrive}() after resetting RQCF_ACT_SKIP but prior to a call to update_rq_clock(). In preparation for pending patches that check whether the rq clock has been updated inside of a pin context before rq_clock() is called, move the reset of rq->clock_skip_update immediately before unpinning the rq lock. This will avoid the new warnings which check if update_rq_clock() is being actively skipped. Cc: Peter Zijlstra <pet...@infradead.org> Cc: Ingo Molnar <mi...@kernel.org> Cc: Mike Galbraith <umgwanakikb...@gmail.com> Cc: Mel Gorman <mgor...@techsingularity.net> Cc: Thomas Gleixner <t...@linutronix.de> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- kernel/sched/core.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index dfdd8c3c083b..d2112c268fd1 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2824,6 +2824,9 @@ context_switch(struct rq *rq, struct task_struct *prev, prev->active_mm = NULL; rq->prev_mm = oldmm; } + + rq->clock_skip_update = 0; + /* * Since the runqueue lock will be released by the next * task (which is an invalid locking op but in the case @@ -3316,7 +3319,6 @@ static void __sched notrace __schedule(bool preempt) next = pick_next_task(rq, prev, ); clear_tsk_need_resched(prev); clear_preempt_need_resched(); - rq->clock_skip_update = 0; if (likely(prev != next)) { rq->nr_switches++; @@ -3326,6 +3328,7 @@ static void __sched notrace __schedule(bool preempt) trace_sched_switch(preempt, prev, next); rq = context_switch(rq, prev, next, ); /* unlocks the rq */ } else { + rq->clock_skip_update = 0; rq_unpin_lock(rq, ); raw_spin_unlock_irq(>lock); } -- 2.7.3
[RFC][PATCH 3/5] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock
rq_clock() is called from sched_info_{depart,arrive}() after resetting RQCF_ACT_SKIP but prior to a call to update_rq_clock(). In preparation for pending patches that check whether the rq clock has been updated inside of a pin context before rq_clock() is called, move the reset of rq->clock_skip_update immediately before unpinning the rq lock. This will avoid the new warnings which check if update_rq_clock() is being actively skipped. Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Mike Galbraith Cc: Mel Gorman Cc: Thomas Gleixner Signed-off-by: Matt Fleming --- kernel/sched/core.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index dfdd8c3c083b..d2112c268fd1 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2824,6 +2824,9 @@ context_switch(struct rq *rq, struct task_struct *prev, prev->active_mm = NULL; rq->prev_mm = oldmm; } + + rq->clock_skip_update = 0; + /* * Since the runqueue lock will be released by the next * task (which is an invalid locking op but in the case @@ -3316,7 +3319,6 @@ static void __sched notrace __schedule(bool preempt) next = pick_next_task(rq, prev, ); clear_tsk_need_resched(prev); clear_preempt_need_resched(); - rq->clock_skip_update = 0; if (likely(prev != next)) { rq->nr_switches++; @@ -3326,6 +3328,7 @@ static void __sched notrace __schedule(bool preempt) trace_sched_switch(preempt, prev, next); rq = context_switch(rq, prev, next, ); /* unlocks the rq */ } else { + rq->clock_skip_update = 0; rq_unpin_lock(rq, ); raw_spin_unlock_irq(>lock); } -- 2.7.3
[RFC][PATCH 4/5] sched/fair: Push rq lock pin/unpin into idle_balance()
Future patches will emit warnings if rq_clock() is called before update_rq_clock() inside a rq_pin_lock()/rq_unpin_lock() pair. Since there is only one caller of idle_balance() we can push the unpin/repin there. Cc: Peter Zijlstra <pet...@infradead.org> Cc: Ingo Molnar <mi...@kernel.org> Cc: Mike Galbraith <umgwanakikb...@gmail.com> Cc: Mel Gorman <mgor...@techsingularity.net> Cc: Yuyang Du <yuyang...@intel.com> Cc: Byungchul Park <byungchul.p...@lge.com> Cc: Rik van Riel <r...@redhat.com> Cc: Frederic Weisbecker <fweis...@gmail.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- kernel/sched/fair.c | 27 +++ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7f776a99bde0..217e3a9d78db 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3095,7 +3095,7 @@ static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq) return cfs_rq->avg.load_avg; } -static int idle_balance(struct rq *this_rq); +static int idle_balance(struct rq *this_rq, struct rq_flags *rf); #else /* CONFIG_SMP */ @@ -3118,7 +3118,7 @@ attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} static inline void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} -static inline int idle_balance(struct rq *rq) +static inline int idle_balance(struct rq *rq, struct rq_flags *rf) { return 0; } @@ -5701,15 +5701,8 @@ simple: return p; idle: - /* -* This is OK, because current is on_cpu, which avoids it being picked -* for load-balance and preemption/IRQs are still disabled avoiding -* further scheduler activity on it and we're being very careful to -* re-start the picking loop. -*/ - rq_unpin_lock(rq, rf); - new_tasks = idle_balance(rq); - rq_repin_lock(rq, rf); + new_tasks = idle_balance(rq, rf); + /* * Because idle_balance() releases (and re-acquires) rq->lock, it is * possible for any higher priority task to appear. In that case we @@ -7641,7 +7634,7 @@ update_next_balance(struct sched_domain *sd, int cpu_busy, unsigned long *next_b * idle_balance is called by schedule() if this_cpu is about to become * idle. Attempts to pull tasks from other CPUs. */ -static int idle_balance(struct rq *this_rq) +static int idle_balance(struct rq *this_rq, struct rq_flags *rf) { unsigned long next_balance = jiffies + HZ; int this_cpu = this_rq->cpu; @@ -7655,6 +7648,14 @@ static int idle_balance(struct rq *this_rq) */ this_rq->idle_stamp = rq_clock(this_rq); + /* +* This is OK, because current is on_cpu, which avoids it being picked +* for load-balance and preemption/IRQs are still disabled avoiding +* further scheduler activity on it and we're being very careful to +* re-start the picking loop. +*/ + rq_unpin_lock(this_rq, rf); + if (this_rq->avg_idle < sysctl_sched_migration_cost || !this_rq->rd->overload) { rcu_read_lock(); @@ -7732,6 +7733,8 @@ out: if (pulled_task) this_rq->idle_stamp = 0; + rq_repin_lock(this_rq, rf); + return pulled_task; } -- 2.7.3
[RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
There's no diagnostic checks for figuring out when we've accidentally missed update_rq_clock() calls. Let's add some by piggybacking on the rq_*pin_lock() wrappers. The idea behind the diagnostic checks is that upon pining rq lock the rq clock should be updated, via update_rq_clock(), before anybody reads the clock with rq_clock(). The exception to this rule is when updates have explicitly been disabled with the rq_clock_skip_update() optimisation. There are some functions that only unpin the rq lock in order to grab some other lock and avoid deadlock. In that case we don't need to update the clock again and the previous diagnostic state can be carried over in rq_repin_lock() by saving the state in the rq_flags context. Since this patch adds a new clock update flag and some already exist in rq::clock_skip_update, that field has now been renamed. An attempt has been made to keep the flag manipulation code small and fast since it's used in the heart of the __schedule() fast path. For the !CONFIG_SCHED_DEBUG case the only object code change (other than addresses) is the following change to the two lines to reset RQCF_ACT_SKIP inside of __schedule(), - 41 c7 84 24 f0 08 00movl$0x0,0x8f0(%r12) - 00 00 00 00 00 + 41 83 b4 24 f0 08 00xorl$0x2,0x8f0(%r12) + 00 02 Suggested-by: Peter Zijlstra <pet...@infradead.org> Cc: Ingo Molnar <mi...@kernel.org> Cc: Mike Galbraith <umgwanakikb...@gmail.com> Cc: Mel Gorman <mgor...@techsingularity.net> Cc: Thomas Gleixner <t...@linutronix.de> Cc: Frederic Weisbecker <fweis...@gmail.com> Cc: Yuyang Du <yuyang...@intel.com> Cc: "Rafael J. Wysocki" <rafael.j.wyso...@intel.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- kernel/sched/core.c | 13 +++--- kernel/sched/sched.h | 70 +++- 2 files changed, 73 insertions(+), 10 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index d2112c268fd1..0999b3f23dde 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -101,9 +101,12 @@ void update_rq_clock(struct rq *rq) lockdep_assert_held(>lock); - if (rq->clock_skip_update & RQCF_ACT_SKIP) + if (rq->clock_update_flags & RQCF_ACT_SKIP) return; +#ifdef CONFIG_SCHED_DEBUG + rq->clock_update_flags |= RQCF_UPDATED; +#endif delta = sched_clock_cpu(cpu_of(rq)) - rq->clock; if (delta < 0) return; @@ -2825,7 +2828,8 @@ context_switch(struct rq *rq, struct task_struct *prev, rq->prev_mm = oldmm; } - rq->clock_skip_update = 0; + /* Clear ACT, preserve everything else */ + rq->clock_update_flags ^= RQCF_ACT_SKIP; /* * Since the runqueue lock will be released by the next @@ -3287,7 +3291,7 @@ static void __sched notrace __schedule(bool preempt) raw_spin_lock(>lock); rq_pin_lock(rq, ); - rq->clock_skip_update <<= 1; /* promote REQ to ACT */ + rq->clock_update_flags <<= 1; /* promote REQ to ACT */ switch_count = >nivcsw; if (!preempt && prev->state) { @@ -3328,7 +3332,8 @@ static void __sched notrace __schedule(bool preempt) trace_sched_switch(preempt, prev, next); rq = context_switch(rq, prev, next, ); /* unlocks the rq */ } else { - rq->clock_skip_update = 0; + /* Clear ACT, preserve everything else */ + rq->clock_update_flags ^= RQCF_ACT_SKIP; rq_unpin_lock(rq, ); raw_spin_unlock_irq(>lock); } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index cbeb830c7ac6..072c020cd8e3 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -628,7 +628,7 @@ struct rq { unsigned long next_balance; struct mm_struct *prev_mm; - unsigned int clock_skip_update; + unsigned int clock_update_flags; u64 clock; u64 clock_task; @@ -735,9 +735,45 @@ static inline u64 __rq_clock_broken(struct rq *rq) return READ_ONCE(rq->clock); } +/* + * rq::clock_update_flags bits + * + * %RQCF_REQ_SKIP - will request skipping of clock update on the next + * call to __schedule(). This is an optimisation to avoid + * neighbouring rq clock updates. + * + * %RQCF_ACT_SKIP - is set from inside of __schedule() when skipping is + * in effect and calls to update_rq_clock() are being ignored. + * + * %RQCF_UPDATED - is a debug flag that indicates whether a call has been + * made to update_rq_clock() since the last time rq::lock was pinned. + * + * If inside of __schedule(), clock_update_flags will have been + * shifted left (a left shift is a cheap operation for the fast path + * to promote %RQCF_REQ_SKIP to %RQCF_ACT_SKIP), so you must use, + * + * if (rq-clock_
[RFC][PATCH 4/5] sched/fair: Push rq lock pin/unpin into idle_balance()
Future patches will emit warnings if rq_clock() is called before update_rq_clock() inside a rq_pin_lock()/rq_unpin_lock() pair. Since there is only one caller of idle_balance() we can push the unpin/repin there. Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Mike Galbraith Cc: Mel Gorman Cc: Yuyang Du Cc: Byungchul Park Cc: Rik van Riel Cc: Frederic Weisbecker Signed-off-by: Matt Fleming --- kernel/sched/fair.c | 27 +++ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7f776a99bde0..217e3a9d78db 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3095,7 +3095,7 @@ static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq) return cfs_rq->avg.load_avg; } -static int idle_balance(struct rq *this_rq); +static int idle_balance(struct rq *this_rq, struct rq_flags *rf); #else /* CONFIG_SMP */ @@ -3118,7 +3118,7 @@ attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} static inline void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} -static inline int idle_balance(struct rq *rq) +static inline int idle_balance(struct rq *rq, struct rq_flags *rf) { return 0; } @@ -5701,15 +5701,8 @@ simple: return p; idle: - /* -* This is OK, because current is on_cpu, which avoids it being picked -* for load-balance and preemption/IRQs are still disabled avoiding -* further scheduler activity on it and we're being very careful to -* re-start the picking loop. -*/ - rq_unpin_lock(rq, rf); - new_tasks = idle_balance(rq); - rq_repin_lock(rq, rf); + new_tasks = idle_balance(rq, rf); + /* * Because idle_balance() releases (and re-acquires) rq->lock, it is * possible for any higher priority task to appear. In that case we @@ -7641,7 +7634,7 @@ update_next_balance(struct sched_domain *sd, int cpu_busy, unsigned long *next_b * idle_balance is called by schedule() if this_cpu is about to become * idle. Attempts to pull tasks from other CPUs. */ -static int idle_balance(struct rq *this_rq) +static int idle_balance(struct rq *this_rq, struct rq_flags *rf) { unsigned long next_balance = jiffies + HZ; int this_cpu = this_rq->cpu; @@ -7655,6 +7648,14 @@ static int idle_balance(struct rq *this_rq) */ this_rq->idle_stamp = rq_clock(this_rq); + /* +* This is OK, because current is on_cpu, which avoids it being picked +* for load-balance and preemption/IRQs are still disabled avoiding +* further scheduler activity on it and we're being very careful to +* re-start the picking loop. +*/ + rq_unpin_lock(this_rq, rf); + if (this_rq->avg_idle < sysctl_sched_migration_cost || !this_rq->rd->overload) { rcu_read_lock(); @@ -7732,6 +7733,8 @@ out: if (pulled_task) this_rq->idle_stamp = 0; + rq_repin_lock(this_rq, rf); + return pulled_task; } -- 2.7.3
[RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
There's no diagnostic checks for figuring out when we've accidentally missed update_rq_clock() calls. Let's add some by piggybacking on the rq_*pin_lock() wrappers. The idea behind the diagnostic checks is that upon pining rq lock the rq clock should be updated, via update_rq_clock(), before anybody reads the clock with rq_clock(). The exception to this rule is when updates have explicitly been disabled with the rq_clock_skip_update() optimisation. There are some functions that only unpin the rq lock in order to grab some other lock and avoid deadlock. In that case we don't need to update the clock again and the previous diagnostic state can be carried over in rq_repin_lock() by saving the state in the rq_flags context. Since this patch adds a new clock update flag and some already exist in rq::clock_skip_update, that field has now been renamed. An attempt has been made to keep the flag manipulation code small and fast since it's used in the heart of the __schedule() fast path. For the !CONFIG_SCHED_DEBUG case the only object code change (other than addresses) is the following change to the two lines to reset RQCF_ACT_SKIP inside of __schedule(), - 41 c7 84 24 f0 08 00movl$0x0,0x8f0(%r12) - 00 00 00 00 00 + 41 83 b4 24 f0 08 00xorl$0x2,0x8f0(%r12) + 00 02 Suggested-by: Peter Zijlstra Cc: Ingo Molnar Cc: Mike Galbraith Cc: Mel Gorman Cc: Thomas Gleixner Cc: Frederic Weisbecker Cc: Yuyang Du Cc: "Rafael J. Wysocki" Signed-off-by: Matt Fleming --- kernel/sched/core.c | 13 +++--- kernel/sched/sched.h | 70 +++- 2 files changed, 73 insertions(+), 10 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index d2112c268fd1..0999b3f23dde 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -101,9 +101,12 @@ void update_rq_clock(struct rq *rq) lockdep_assert_held(>lock); - if (rq->clock_skip_update & RQCF_ACT_SKIP) + if (rq->clock_update_flags & RQCF_ACT_SKIP) return; +#ifdef CONFIG_SCHED_DEBUG + rq->clock_update_flags |= RQCF_UPDATED; +#endif delta = sched_clock_cpu(cpu_of(rq)) - rq->clock; if (delta < 0) return; @@ -2825,7 +2828,8 @@ context_switch(struct rq *rq, struct task_struct *prev, rq->prev_mm = oldmm; } - rq->clock_skip_update = 0; + /* Clear ACT, preserve everything else */ + rq->clock_update_flags ^= RQCF_ACT_SKIP; /* * Since the runqueue lock will be released by the next @@ -3287,7 +3291,7 @@ static void __sched notrace __schedule(bool preempt) raw_spin_lock(>lock); rq_pin_lock(rq, ); - rq->clock_skip_update <<= 1; /* promote REQ to ACT */ + rq->clock_update_flags <<= 1; /* promote REQ to ACT */ switch_count = >nivcsw; if (!preempt && prev->state) { @@ -3328,7 +3332,8 @@ static void __sched notrace __schedule(bool preempt) trace_sched_switch(preempt, prev, next); rq = context_switch(rq, prev, next, ); /* unlocks the rq */ } else { - rq->clock_skip_update = 0; + /* Clear ACT, preserve everything else */ + rq->clock_update_flags ^= RQCF_ACT_SKIP; rq_unpin_lock(rq, ); raw_spin_unlock_irq(>lock); } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index cbeb830c7ac6..072c020cd8e3 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -628,7 +628,7 @@ struct rq { unsigned long next_balance; struct mm_struct *prev_mm; - unsigned int clock_skip_update; + unsigned int clock_update_flags; u64 clock; u64 clock_task; @@ -735,9 +735,45 @@ static inline u64 __rq_clock_broken(struct rq *rq) return READ_ONCE(rq->clock); } +/* + * rq::clock_update_flags bits + * + * %RQCF_REQ_SKIP - will request skipping of clock update on the next + * call to __schedule(). This is an optimisation to avoid + * neighbouring rq clock updates. + * + * %RQCF_ACT_SKIP - is set from inside of __schedule() when skipping is + * in effect and calls to update_rq_clock() are being ignored. + * + * %RQCF_UPDATED - is a debug flag that indicates whether a call has been + * made to update_rq_clock() since the last time rq::lock was pinned. + * + * If inside of __schedule(), clock_update_flags will have been + * shifted left (a left shift is a cheap operation for the fast path + * to promote %RQCF_REQ_SKIP to %RQCF_ACT_SKIP), so you must use, + * + * if (rq-clock_update_flags >= RQCF_UPDATED) + * + * to check if %RQCF_UPADTED is set. It'll never be shifted more than + * one position though, because the next rq_unpin_lock() will shift it + * back. + */ +#define RQCF_REQ_SKIP 0x01 +#define RQCF_ACT_SKIP 0x02 +#define RQCF_U
Re: [PATCH 1/2] Create UV efi_call macros
(Adding author of arch_efi_call code) On Wed, 11 May, at 02:55:44PM, Alex Thorlton wrote: > We need a slightly different macro than the standard efi_call_virt, > since those macros all assume that the function pointer, f, that gets > passed in will live somewhere in efi.systab->runtime. Our EFI function > pointer lives in efi.uv_systab, so we can't use the standard macros out > of the box. Is that true of all EFI services pointers? From reading the SGI/UV code I got the impression that it's possible to call the standard services via efi.systab->runtime but you also need the ability to call these UV bios functions, which are not accessible via efi.systab. Do you actually want the same environment setup and teardown to happen when calling efi.uv_systab ? Or are you simply trying to reuse the efi_call() implementation? > This commit creates some new uv_* macros that are functionally > equivalent to the standard ones, with the exception of allowing us to > use a different function pointer. I figure that we won't want to call > these uv_* in the end (we'll probably want something more generic), but > I thought I would get everyone's thoughts on how we might best reach > that goal instead of just trying to come up with a new implementation on > my own. > > By itself, this commit does get our machines booting, but it needs the > small fix to the efi_call assembly code for our modules to work. Could you provide some more details in the changelog on why your machine doesn't boot without this change? > Signed-off-by: Alex Thorlton <athorl...@sgi.com> > Cc: Dimitri Sivanich <sivan...@sgi.com> > Cc: Russ Anderson <r...@sgi.com> > Cc: Mike Travis <tra...@sgi.com> > Cc: Matt Fleming <m...@codeblueprint.co.uk> > Cc: Borislav Petkov <b...@suse.de> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: Ingo Molnar <mi...@redhat.com> > Cc: "H. Peter Anvin" <h...@zytor.com> > Cc: x...@kernel.org > Cc: linux-...@vger.kernel.org > --- > arch/x86/include/asm/efi.h | 3 ++ > arch/x86/platform/uv/bios_uv.c | 3 +- > drivers/firmware/efi/runtime-wrappers.c | 44 +- > include/linux/efi.h | 55 > + > 4 files changed, 60 insertions(+), 45 deletions(-) > > diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h > index 78d1e74..f384047 100644 > --- a/arch/x86/include/asm/efi.h > +++ b/arch/x86/include/asm/efi.h > @@ -84,6 +84,9 @@ struct efi_scratch { > #define arch_efi_call_virt(f, args...) > \ > efi_call((void *)efi.systab->runtime->f, args) \ > > +#define uv_efi_call_virt(f, args...) \ > + efi_call((void *)f, args) \ > + > #define arch_efi_call_virt_teardown() > \ > ({ \ > if (efi_scratch.use_pgd) { \ > diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c > index 815fec6..62a46cb 100644 > --- a/arch/x86/platform/uv/bios_uv.c > +++ b/arch/x86/platform/uv/bios_uv.c > @@ -40,8 +40,7 @@ s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, > u64 a3, u64 a4, u64 a5) >*/ > return BIOS_STATUS_UNIMPLEMENTED; > > - ret = efi_call((void *)__va(tab->function), (u64)which, > - a1, a2, a3, a4, a5); > + ret = uv_call_virt(tab->function, (u64)which, a1, a2, a3, a4, a5); > return ret; > } > EXPORT_SYMBOL_GPL(uv_bios_call); > diff --git a/drivers/firmware/efi/runtime-wrappers.c > b/drivers/firmware/efi/runtime-wrappers.c > index 23bef6b..008a1a3 100644 > --- a/drivers/firmware/efi/runtime-wrappers.c > +++ b/drivers/firmware/efi/runtime-wrappers.c > @@ -22,7 +22,7 @@ > #include > #include > > -static void efi_call_virt_check_flags(unsigned long flags, const char *call) > +void efi_call_virt_check_flags(unsigned long flags, const char *call) > { > unsigned long cur_flags, mismatch; > > @@ -39,48 +39,6 @@ static void efi_call_virt_check_flags(unsigned long flags, > const char *call) > } > > /* > - * Arch code can implement the following three template macros, avoiding > - * reptition for the void/non-void return cases of {__,}efi_call_virt: > - * > - * * arch_efi_call_virt_setup > - * > - *Sets up the environment for the call (e.g. switching page tables, > - *allowing kernel-mode use of floating point, if required). > - * > - * *
Re: [PATCH 1/2] Create UV efi_call macros
(Adding author of arch_efi_call code) On Wed, 11 May, at 02:55:44PM, Alex Thorlton wrote: > We need a slightly different macro than the standard efi_call_virt, > since those macros all assume that the function pointer, f, that gets > passed in will live somewhere in efi.systab->runtime. Our EFI function > pointer lives in efi.uv_systab, so we can't use the standard macros out > of the box. Is that true of all EFI services pointers? From reading the SGI/UV code I got the impression that it's possible to call the standard services via efi.systab->runtime but you also need the ability to call these UV bios functions, which are not accessible via efi.systab. Do you actually want the same environment setup and teardown to happen when calling efi.uv_systab ? Or are you simply trying to reuse the efi_call() implementation? > This commit creates some new uv_* macros that are functionally > equivalent to the standard ones, with the exception of allowing us to > use a different function pointer. I figure that we won't want to call > these uv_* in the end (we'll probably want something more generic), but > I thought I would get everyone's thoughts on how we might best reach > that goal instead of just trying to come up with a new implementation on > my own. > > By itself, this commit does get our machines booting, but it needs the > small fix to the efi_call assembly code for our modules to work. Could you provide some more details in the changelog on why your machine doesn't boot without this change? > Signed-off-by: Alex Thorlton > Cc: Dimitri Sivanich > Cc: Russ Anderson > Cc: Mike Travis > Cc: Matt Fleming > Cc: Borislav Petkov > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: "H. Peter Anvin" > Cc: x...@kernel.org > Cc: linux-...@vger.kernel.org > --- > arch/x86/include/asm/efi.h | 3 ++ > arch/x86/platform/uv/bios_uv.c | 3 +- > drivers/firmware/efi/runtime-wrappers.c | 44 +- > include/linux/efi.h | 55 > + > 4 files changed, 60 insertions(+), 45 deletions(-) > > diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h > index 78d1e74..f384047 100644 > --- a/arch/x86/include/asm/efi.h > +++ b/arch/x86/include/asm/efi.h > @@ -84,6 +84,9 @@ struct efi_scratch { > #define arch_efi_call_virt(f, args...) > \ > efi_call((void *)efi.systab->runtime->f, args) \ > > +#define uv_efi_call_virt(f, args...) \ > + efi_call((void *)f, args) \ > + > #define arch_efi_call_virt_teardown() > \ > ({ \ > if (efi_scratch.use_pgd) { \ > diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c > index 815fec6..62a46cb 100644 > --- a/arch/x86/platform/uv/bios_uv.c > +++ b/arch/x86/platform/uv/bios_uv.c > @@ -40,8 +40,7 @@ s64 uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, > u64 a3, u64 a4, u64 a5) >*/ > return BIOS_STATUS_UNIMPLEMENTED; > > - ret = efi_call((void *)__va(tab->function), (u64)which, > - a1, a2, a3, a4, a5); > + ret = uv_call_virt(tab->function, (u64)which, a1, a2, a3, a4, a5); > return ret; > } > EXPORT_SYMBOL_GPL(uv_bios_call); > diff --git a/drivers/firmware/efi/runtime-wrappers.c > b/drivers/firmware/efi/runtime-wrappers.c > index 23bef6b..008a1a3 100644 > --- a/drivers/firmware/efi/runtime-wrappers.c > +++ b/drivers/firmware/efi/runtime-wrappers.c > @@ -22,7 +22,7 @@ > #include > #include > > -static void efi_call_virt_check_flags(unsigned long flags, const char *call) > +void efi_call_virt_check_flags(unsigned long flags, const char *call) > { > unsigned long cur_flags, mismatch; > > @@ -39,48 +39,6 @@ static void efi_call_virt_check_flags(unsigned long flags, > const char *call) > } > > /* > - * Arch code can implement the following three template macros, avoiding > - * reptition for the void/non-void return cases of {__,}efi_call_virt: > - * > - * * arch_efi_call_virt_setup > - * > - *Sets up the environment for the call (e.g. switching page tables, > - *allowing kernel-mode use of floating point, if required). > - * > - * * arch_efi_call_virt > - * > - *Performs the call. The last expression in the macro must be the call > - *itself, allowing the logic to be shared by the void and non-void > - *cases. > - * > - * * arch_efi_call_virt_teardown > - * > -
Re: [PATCH 2/2] Fix efi_call
On Thu, 12 May, at 08:48:35AM, Ingo Molnar wrote: > > * Alex Thorltonwrote: > > > The efi_call assembly code has a slight error that prevents us from > > using arguments 7 and higher, which will be passed in on the stack. > > > > mov (%rsp), %rax > > mov 8(%rax), %rax > > ... > > mov %rax, 40(%rsp) > > > > This code goes and grabs the return address for the current stack frame, > > and puts it on the stack, next the 5th argument for the EFI runtime > > call. Considering the fact that having the return address in that > > position on the stack makes no sense, I'm guessing that the intent of > > this code was actually to grab an argument off the stack frame for this > > call and place it into the frame for the next one. > > > > The small change to that offset (i.e. 8(%rax) to 16(%rax)) ensures that > > we grab the 7th argument off the stack, and pass it as the 6th argument > > to the EFI runtime function that we're about to call. This change gets > > our EFI runtime calls that need to pass more than 6 arguments working > > again. > > I suppose the SGI/UV code is the only one using 7 arguments or more? Might > make > sense to point that out in the changelog. Yeah, I included that info when I applied this patch.
Re: [PATCH 2/2] Fix efi_call
On Thu, 12 May, at 08:48:35AM, Ingo Molnar wrote: > > * Alex Thorlton wrote: > > > The efi_call assembly code has a slight error that prevents us from > > using arguments 7 and higher, which will be passed in on the stack. > > > > mov (%rsp), %rax > > mov 8(%rax), %rax > > ... > > mov %rax, 40(%rsp) > > > > This code goes and grabs the return address for the current stack frame, > > and puts it on the stack, next the 5th argument for the EFI runtime > > call. Considering the fact that having the return address in that > > position on the stack makes no sense, I'm guessing that the intent of > > this code was actually to grab an argument off the stack frame for this > > call and place it into the frame for the next one. > > > > The small change to that offset (i.e. 8(%rax) to 16(%rax)) ensures that > > we grab the 7th argument off the stack, and pass it as the 6th argument > > to the EFI runtime function that we're about to call. This change gets > > our EFI runtime calls that need to pass more than 6 arguments working > > again. > > I suppose the SGI/UV code is the only one using 7 arguments or more? Might > make > sense to point that out in the changelog. Yeah, I included that info when I applied this patch.
Re: [PATCH 2/2] Fix efi_call
On Wed, 11 May, at 02:55:45PM, Alex Thorlton wrote: > The efi_call assembly code has a slight error that prevents us from > using arguments 7 and higher, which will be passed in on the stack. > > mov (%rsp), %rax > mov 8(%rax), %rax > ... > mov %rax, 40(%rsp) > > This code goes and grabs the return address for the current stack frame, > and puts it on the stack, next the 5th argument for the EFI runtime > call. Considering the fact that having the return address in that > position on the stack makes no sense, I'm guessing that the intent of > this code was actually to grab an argument off the stack frame for this > call and place it into the frame for the next one. > > The small change to that offset (i.e. 8(%rax) to 16(%rax)) ensures that > we grab the 7th argument off the stack, and pass it as the 6th argument > to the EFI runtime function that we're about to call. This change gets > our EFI runtime calls that need to pass more than 6 arguments working > again. > > Signed-off-by: Alex Thorlton <athorl...@sgi.com> > Cc: Dimitri Sivanich <sivan...@sgi.com> > Cc: Russ Anderson <r...@sgi.com> > Cc: Mike Travis <tra...@sgi.com> > Cc: Matt Fleming <m...@codeblueprint.co.uk> > Cc: Borislav Petkov <b...@suse.de> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: Ingo Molnar <mi...@redhat.com> > Cc: "H. Peter Anvin" <h...@zytor.com> > Cc: x...@kernel.org > Cc: linux-...@vger.kernel.org > --- > arch/x86/platform/efi/efi_stub_64.S | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/platform/efi/efi_stub_64.S > b/arch/x86/platform/efi/efi_stub_64.S > index 92723ae..62938ff 100644 > --- a/arch/x86/platform/efi/efi_stub_64.S > +++ b/arch/x86/platform/efi/efi_stub_64.S > @@ -43,7 +43,7 @@ ENTRY(efi_call) > FRAME_BEGIN > SAVE_XMM > mov (%rsp), %rax > - mov 8(%rax), %rax > + mov 16(%rax), %rax > subq $48, %rsp > mov %r9, 32(%rsp) > mov %rax, 40(%rsp) Nice. Your fix looks good, so I've put it in the urgent queue and tagged it for stable.
Re: [PATCH 2/2] Fix efi_call
On Wed, 11 May, at 02:55:45PM, Alex Thorlton wrote: > The efi_call assembly code has a slight error that prevents us from > using arguments 7 and higher, which will be passed in on the stack. > > mov (%rsp), %rax > mov 8(%rax), %rax > ... > mov %rax, 40(%rsp) > > This code goes and grabs the return address for the current stack frame, > and puts it on the stack, next the 5th argument for the EFI runtime > call. Considering the fact that having the return address in that > position on the stack makes no sense, I'm guessing that the intent of > this code was actually to grab an argument off the stack frame for this > call and place it into the frame for the next one. > > The small change to that offset (i.e. 8(%rax) to 16(%rax)) ensures that > we grab the 7th argument off the stack, and pass it as the 6th argument > to the EFI runtime function that we're about to call. This change gets > our EFI runtime calls that need to pass more than 6 arguments working > again. > > Signed-off-by: Alex Thorlton > Cc: Dimitri Sivanich > Cc: Russ Anderson > Cc: Mike Travis > Cc: Matt Fleming > Cc: Borislav Petkov > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: "H. Peter Anvin" > Cc: x...@kernel.org > Cc: linux-...@vger.kernel.org > --- > arch/x86/platform/efi/efi_stub_64.S | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/platform/efi/efi_stub_64.S > b/arch/x86/platform/efi/efi_stub_64.S > index 92723ae..62938ff 100644 > --- a/arch/x86/platform/efi/efi_stub_64.S > +++ b/arch/x86/platform/efi/efi_stub_64.S > @@ -43,7 +43,7 @@ ENTRY(efi_call) > FRAME_BEGIN > SAVE_XMM > mov (%rsp), %rax > - mov 8(%rax), %rax > + mov 16(%rax), %rax > subq $48, %rsp > mov %r9, 32(%rsp) > mov %rax, 40(%rsp) Nice. Your fix looks good, so I've put it in the urgent queue and tagged it for stable.
Re: [PATCH 0/3] sched: Fix wakeup preemption regression
On Tue, 10 May, at 07:43:14PM, Peter Zijlstra wrote: > A recent commit caused an interactivity/starvation issue because we wrecked rq > local wakeup preemption. > > These patches rectify this while also (hopefully) keeping the problem that led > to the fault patch fixed. > > Mike, Pavan, could you guys please confirm? FWIW, I took a quick look over these patches and they made sense to me. (I appreciate the comment block above enqueue_entity())
Re: [PATCH 0/3] sched: Fix wakeup preemption regression
On Tue, 10 May, at 07:43:14PM, Peter Zijlstra wrote: > A recent commit caused an interactivity/starvation issue because we wrecked rq > local wakeup preemption. > > These patches rectify this while also (hopefully) keeping the problem that led > to the fault patch fixed. > > Mike, Pavan, could you guys please confirm? FWIW, I took a quick look over these patches and they made sense to me. (I appreciate the comment block above enqueue_entity())
Re: [PATCH 1/3] sched,fair: Move record_wakee()
On Tue, 10 May, at 07:43:15PM, Peter Zijlstra wrote: > Since I want to make ->task_woken() conditional on the task getting > migrated, we cannot use it to call record_wakee(). You mean ->task_waking(), right?
Re: [PATCH 1/3] sched,fair: Move record_wakee()
On Tue, 10 May, at 07:43:15PM, Peter Zijlstra wrote: > Since I want to make ->task_woken() conditional on the task getting > migrated, we cannot use it to call record_wakee(). You mean ->task_waking(), right?
Re: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI
On Thu, 12 May, at 10:22:07AM, Shannon Zhao wrote: > > As said above, I will rebase this patch on top of the EFI next branch. OK thanks. Note that it is not possible to escape merge conflicts, since there are changes in the xen tip tree that are not in the EFI next branch and vice versa. For example these commits from xen/linux-next look relevant, 8e147fcc3ffa ("FDT: Add a helper to get the subnode by given name") 37060935dc04 ("ARM64: XEN: Add a function to initialize Xen specific UEFI runtime services") acb2c923a860 ("XEN: EFI: Move x86 specific codes to architecture directory") 055be2d42408 ("ARM: Xen: Document UEFI support on Xen ARM virtual platforms") 3915fea959b6 ("ARM: XEN: Move xen_early_init() before efi_init()") Linus, Stefano, tip-folks: I'm soliciting opinions on the correct way to handle this inter-tree dependency where we've got changes to EFI code in two separate trees and Shannon wants to write patches on top of both. I'm guessing the best solution is to merge xen/linux-next and efi/next into a topic branch either in the EFI tree or Xen tree, but I want to be cautious of the branch history that will create. (In hindsight all of these change should have probably gone via the EFI tree.)
Re: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI
On Thu, 12 May, at 10:22:07AM, Shannon Zhao wrote: > > As said above, I will rebase this patch on top of the EFI next branch. OK thanks. Note that it is not possible to escape merge conflicts, since there are changes in the xen tip tree that are not in the EFI next branch and vice versa. For example these commits from xen/linux-next look relevant, 8e147fcc3ffa ("FDT: Add a helper to get the subnode by given name") 37060935dc04 ("ARM64: XEN: Add a function to initialize Xen specific UEFI runtime services") acb2c923a860 ("XEN: EFI: Move x86 specific codes to architecture directory") 055be2d42408 ("ARM: Xen: Document UEFI support on Xen ARM virtual platforms") 3915fea959b6 ("ARM: XEN: Move xen_early_init() before efi_init()") Linus, Stefano, tip-folks: I'm soliciting opinions on the correct way to handle this inter-tree dependency where we've got changes to EFI code in two separate trees and Shannon wants to write patches on top of both. I'm guessing the best solution is to merge xen/linux-next and efi/next into a topic branch either in the EFI tree or Xen tree, but I want to be cautious of the branch history that will create. (In hindsight all of these change should have probably gone via the EFI tree.)
Re: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI
On Wed, 11 May, at 09:35:47PM, Shannon Zhao wrote: > > > > Also, why do you need to setup efi.runtime_version here? Isn't that > > done inside uefi_init()? > > > I don't see any codes which setup efi.runtime_version in uefi_init(). Look in the EFI tree, https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/tree/drivers/firmware/efi/arm-init.c?h=next#n120 > > Here is how I would have expected this patch to look: > > > > - Add FDT code to find hypervisor EFI params > > > > - Force enable EFI_RUNTIME_SERVICES for Xen and call > > xen_efi_runtime_setup() inside xen_guest_init() > > > > - Change arm_enable_runtime_services() to handle the case where > > EFI_RUNTIME_SERVICES is already set > > > > Am I misunderstanding some ordering issues? > > Since xen_guest_init() and arm_enable_runtime_services() are both > early_initcall and the calling order is random I think. Urgh, right, I missed that. > And when we call xen_efi_runtime_setup() and setup > efi.runtime_version in xen_guest_init(), the efi.systab might be > NULL. So it needs to map the systanle explicitly before we use > efi.systab. Could you explain why you need to copy efi.runtime_version instead of letting the existing code in uefi_init() set it up? Also, efi.systab isn't used by xen_efi_runtime_setup() as far I can see, not sure why you need that either?
Re: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI
On Wed, 11 May, at 09:35:47PM, Shannon Zhao wrote: > > > > Also, why do you need to setup efi.runtime_version here? Isn't that > > done inside uefi_init()? > > > I don't see any codes which setup efi.runtime_version in uefi_init(). Look in the EFI tree, https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/tree/drivers/firmware/efi/arm-init.c?h=next#n120 > > Here is how I would have expected this patch to look: > > > > - Add FDT code to find hypervisor EFI params > > > > - Force enable EFI_RUNTIME_SERVICES for Xen and call > > xen_efi_runtime_setup() inside xen_guest_init() > > > > - Change arm_enable_runtime_services() to handle the case where > > EFI_RUNTIME_SERVICES is already set > > > > Am I misunderstanding some ordering issues? > > Since xen_guest_init() and arm_enable_runtime_services() are both > early_initcall and the calling order is random I think. Urgh, right, I missed that. > And when we call xen_efi_runtime_setup() and setup > efi.runtime_version in xen_guest_init(), the efi.systab might be > NULL. So it needs to map the systanle explicitly before we use > efi.systab. Could you explain why you need to copy efi.runtime_version instead of letting the existing code in uefi_init() set it up? Also, efi.systab isn't used by xen_efi_runtime_setup() as far I can see, not sure why you need that either?
Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared
On Wed, 11 May, at 07:37:52PM, Peter Zijlstra wrote: > On Wed, May 11, 2016 at 12:55:56PM +0100, Matt Fleming wrote: > > > This breaks my POWER7 box which presumably doesn't have > > SD_SHARE_PKG_RESOURCES, > > > index 978b3ef2d87e..d27153adee4d 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -7920,7 +7920,8 @@ static inline void set_cpu_sd_state_busy(void) > > goto unlock; > > sd->nohz_idle = 0; > > > > - atomic_inc(>shared->nr_busy_cpus); > > + if (sd->shared) > > + atomic_inc(>shared->nr_busy_cpus); > > unlock: > > rcu_read_unlock(); > > } > > > Ah, no, the problem is that while it does have SHARE_PKG_RESOURCES (in > its SMT domain -- SMT threads share all cache after all), I failed to > connect the sched_domain_shared structure for it. > > Does something like this also work? Yep, it does.
Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared
On Wed, 11 May, at 07:37:52PM, Peter Zijlstra wrote: > On Wed, May 11, 2016 at 12:55:56PM +0100, Matt Fleming wrote: > > > This breaks my POWER7 box which presumably doesn't have > > SD_SHARE_PKG_RESOURCES, > > > index 978b3ef2d87e..d27153adee4d 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -7920,7 +7920,8 @@ static inline void set_cpu_sd_state_busy(void) > > goto unlock; > > sd->nohz_idle = 0; > > > > - atomic_inc(>shared->nr_busy_cpus); > > + if (sd->shared) > > + atomic_inc(>shared->nr_busy_cpus); > > unlock: > > rcu_read_unlock(); > > } > > > Ah, no, the problem is that while it does have SHARE_PKG_RESOURCES (in > its SMT domain -- SMT threads share all cache after all), I failed to > connect the sched_domain_shared structure for it. > > Does something like this also work? Yep, it does.
Re: [PATCH 2/5] efibc: Fix excessive stack footprint warning
On Tue, 10 May, at 10:40:22AM, Jeremy Compostella wrote: > Why not. See patch as attachment. > > Thanks, > > Jérémy > > From 8a9b07e2d7242fa8a36157f1025202a96c3c7c9a Mon Sep 17 00:00:00 2001 > From: Jeremy Compostella> Date: Tue, 10 May 2016 10:34:21 +0200 > Subject: [PATCH] efibc: report the EFI variable name in the error messages > > Report the name of the EFI variable if the value is incorrect or if > efibc_set_variable() fails to allocate the struct efivar_entry object. > > Signed-off-by: Jeremy Compostella > --- > drivers/firmware/efi/efibc.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/firmware/efi/efibc.c b/drivers/firmware/efi/efibc.c > index cb4f573..93d34a1 100644 > --- a/drivers/firmware/efi/efibc.c > +++ b/drivers/firmware/efi/efibc.c > @@ -37,13 +37,14 @@ static int efibc_set_variable(const char *name, const > char *value) > size_t size = (strlen(value) + 1) * sizeof(efi_char16_t); > > if (size > sizeof(entry->var.Data)) { > - pr_err("value is too large"); > + pr_err("value is too large for %s EFI variable", name); > return -EINVAL; > } It'd be a good idea to print 'size' too. > > entry = kmalloc(sizeof(*entry), GFP_KERNEL); > if (!entry) { > - pr_err("failed to allocate efivar entry"); > + pr_err("failed to allocate efivar entry for %s EFI variable", > +name); > return -ENOMEM; > } Aren't these pr_err() calls missing newline characters?
Re: [PATCH 2/5] efibc: Fix excessive stack footprint warning
On Tue, 10 May, at 10:40:22AM, Jeremy Compostella wrote: > Why not. See patch as attachment. > > Thanks, > > Jérémy > > From 8a9b07e2d7242fa8a36157f1025202a96c3c7c9a Mon Sep 17 00:00:00 2001 > From: Jeremy Compostella > Date: Tue, 10 May 2016 10:34:21 +0200 > Subject: [PATCH] efibc: report the EFI variable name in the error messages > > Report the name of the EFI variable if the value is incorrect or if > efibc_set_variable() fails to allocate the struct efivar_entry object. > > Signed-off-by: Jeremy Compostella > --- > drivers/firmware/efi/efibc.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/firmware/efi/efibc.c b/drivers/firmware/efi/efibc.c > index cb4f573..93d34a1 100644 > --- a/drivers/firmware/efi/efibc.c > +++ b/drivers/firmware/efi/efibc.c > @@ -37,13 +37,14 @@ static int efibc_set_variable(const char *name, const > char *value) > size_t size = (strlen(value) + 1) * sizeof(efi_char16_t); > > if (size > sizeof(entry->var.Data)) { > - pr_err("value is too large"); > + pr_err("value is too large for %s EFI variable", name); > return -EINVAL; > } It'd be a good idea to print 'size' too. > > entry = kmalloc(sizeof(*entry), GFP_KERNEL); > if (!entry) { > - pr_err("failed to allocate efivar entry"); > + pr_err("failed to allocate efivar entry for %s EFI variable", > +name); > return -ENOMEM; > } Aren't these pr_err() calls missing newline characters?
Re: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI
On Fri, 06 May, at 09:52:42AM, Mathieu Poirier wrote: > > +static int __init efi_remap_init(void) > > +{ > > + u64 mapsize; > > + > > + pr_info("Remapping and enabling EFI services.\n"); > > + > > + mapsize = memmap.map_end - memmap.map; > > + memmap.map = (__force void *)ioremap_cache(memmap.phys_map, > > + mapsize); > > + if (!memmap.map) { > > + pr_err("Failed to remap EFI memory map\n"); > > + return -ENOMEM; > > + } > > + memmap.map_end = memmap.map + mapsize; > > + efi.memmap = > > + > > + efi.systab = (__force void *)ioremap_cache(efi_system_table, > > + > > sizeof(efi_system_table_t)); > > + if (!efi.systab) { > > Is there a reason why memmap.map isn't iounmap() in the error path? > The original code didn't have it either, hence the question. I suspect that is a bug. In reality, if you can't access the EFI System Table because you fail to map it I would guess you are going to crash very, very quickly anyhow.
Re: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI
On Fri, 06 May, at 09:52:42AM, Mathieu Poirier wrote: > > +static int __init efi_remap_init(void) > > +{ > > + u64 mapsize; > > + > > + pr_info("Remapping and enabling EFI services.\n"); > > + > > + mapsize = memmap.map_end - memmap.map; > > + memmap.map = (__force void *)ioremap_cache(memmap.phys_map, > > + mapsize); > > + if (!memmap.map) { > > + pr_err("Failed to remap EFI memory map\n"); > > + return -ENOMEM; > > + } > > + memmap.map_end = memmap.map + mapsize; > > + efi.memmap = > > + > > + efi.systab = (__force void *)ioremap_cache(efi_system_table, > > + > > sizeof(efi_system_table_t)); > > + if (!efi.systab) { > > Is there a reason why memmap.map isn't iounmap() in the error path? > The original code didn't have it either, hence the question. I suspect that is a bug. In reality, if you can't access the EFI System Table because you fail to map it I would guess you are going to crash very, very quickly anyhow.
Re: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI
On Fri, 06 May, at 04:32:06PM, Shannon Zhao wrote: > From: Shannon Zhao <shannon.z...@linaro.org> > > Add a new function to parse DT parameters for Xen specific UEFI just > like the way for normal UEFI. Then it could reuse the existing codes. > > If Xen supports EFI, initialize runtime services. This commit log would benefit from a little expansion. I'd like to see information that answers the following questions, - How do the Xen DT paramters differ from bare metal? - What existing code is reused with your patch? - How are Xen runtime services different from bare metal? - Why is it OK to force enable EFI runtime services for Xen? I think it would also be good to explicitly state that we do not expect to find both Xen EFI DT parameters and bare metal EFI DT parameters when performing the search; the two should be mutually exclusive. > CC: Matt Fleming <m...@codeblueprint.co.uk> > Signed-off-by: Shannon Zhao <shannon.z...@linaro.org> > --- > Drop using of EFI_PARAVIRT. Discussion can be found from [1]. > [1] https://lkml.org/lkml/2016/4/29/8 > --- > arch/arm/include/asm/efi.h | 2 + > arch/arm/xen/enlighten.c | 16 > arch/arm64/include/asm/efi.h | 2 + > drivers/firmware/efi/arm-runtime.c | 70 +++- > drivers/firmware/efi/efi.c | 81 > ++ > 5 files changed, 137 insertions(+), 34 deletions(-) > > diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h > index e0eea72..5ba4343 100644 > --- a/arch/arm/include/asm/efi.h > +++ b/arch/arm/include/asm/efi.h > @@ -52,9 +52,11 @@ static inline void efi_set_pgd(struct mm_struct *mm) > > void efi_virtmap_load(void); > void efi_virtmap_unload(void); > +int xen_arm_enable_runtime_services(void); > > #else > #define efi_init() > +#define xen_arm_enable_runtime_services() (0) > #endif /* CONFIG_EFI */ > > /* arch specific definitions used by the stub code */ > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c > index 13e3e9f..3362870 100644 > --- a/arch/arm/xen/enlighten.c > +++ b/arch/arm/xen/enlighten.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -261,6 +262,13 @@ static int __init fdt_find_hyper_node(unsigned long > node, const char *uname, > !strncmp(hyper_node.prefix, s, strlen(hyper_node.prefix))) > hyper_node.version = s + strlen(hyper_node.prefix); > > + if (IS_ENABLED(CONFIG_XEN_EFI)) { > + /* Check if Xen supports EFI */ > + if ((of_get_flat_dt_subnode_by_name(node, "uefi") > 0) && > + !efi_runtime_disabled()) > + set_bit(EFI_RUNTIME_SERVICES, ); > + } > + > return 0; > } > The above comment could do with including similar information to the commit log as to why we want to force enable runtime services. For x86 we have this, * * When EFI_PARAVIRT is in force then we could not map runtime * service memory region because we do not have direct access to it. * However, runtime services are available through proxy functions * (e.g. in case of Xen dom0 EFI implementation they call special * hypercall which executes relevant EFI functions) and that is why * they are always enabled. */ We need something similar here. > @@ -338,6 +346,7 @@ static int __init xen_guest_init(void) > { > struct xen_add_to_physmap xatp; > struct shared_info *shared_info_page = NULL; > + int ret; > > if (!xen_domain()) > return 0; > @@ -352,6 +361,13 @@ static int __init xen_guest_init(void) > return -ENODEV; > } > > + if (IS_ENABLED(CONFIG_XEN_EFI) && efi_enabled(EFI_BOOT) && > + efi_enabled(EFI_RUNTIME_SERVICES)) { > + ret = xen_arm_enable_runtime_services(); > + if (ret) > + return ret; > + } > + Is it ever possible to have EFI_RUNTIME_SERVICES set but efi_enabled(EFI_BOOT) and IS_ENABLED(CONFIG_XEN_EFI) be false inside this function? If the answer is "no", I'd suggest just reducing this down to, /* * The fdt parsing code will have set EFI_RUNTIME_SERVICES if * it found Xen EFI parameters. Force enable runtime services. */ if (efi_enabled(EFI_RUNTIME_SERVICES)) { ret = xen_arm_enable_runtime_services(); if (ret) return ret; } But first, see my comments below. > @@ -39,6 +40,33 @@ static struct mm_struct efi_mm = { >
Re: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI
On Fri, 06 May, at 04:32:06PM, Shannon Zhao wrote: > From: Shannon Zhao > > Add a new function to parse DT parameters for Xen specific UEFI just > like the way for normal UEFI. Then it could reuse the existing codes. > > If Xen supports EFI, initialize runtime services. This commit log would benefit from a little expansion. I'd like to see information that answers the following questions, - How do the Xen DT paramters differ from bare metal? - What existing code is reused with your patch? - How are Xen runtime services different from bare metal? - Why is it OK to force enable EFI runtime services for Xen? I think it would also be good to explicitly state that we do not expect to find both Xen EFI DT parameters and bare metal EFI DT parameters when performing the search; the two should be mutually exclusive. > CC: Matt Fleming > Signed-off-by: Shannon Zhao > --- > Drop using of EFI_PARAVIRT. Discussion can be found from [1]. > [1] https://lkml.org/lkml/2016/4/29/8 > --- > arch/arm/include/asm/efi.h | 2 + > arch/arm/xen/enlighten.c | 16 > arch/arm64/include/asm/efi.h | 2 + > drivers/firmware/efi/arm-runtime.c | 70 +++- > drivers/firmware/efi/efi.c | 81 > ++ > 5 files changed, 137 insertions(+), 34 deletions(-) > > diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h > index e0eea72..5ba4343 100644 > --- a/arch/arm/include/asm/efi.h > +++ b/arch/arm/include/asm/efi.h > @@ -52,9 +52,11 @@ static inline void efi_set_pgd(struct mm_struct *mm) > > void efi_virtmap_load(void); > void efi_virtmap_unload(void); > +int xen_arm_enable_runtime_services(void); > > #else > #define efi_init() > +#define xen_arm_enable_runtime_services() (0) > #endif /* CONFIG_EFI */ > > /* arch specific definitions used by the stub code */ > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c > index 13e3e9f..3362870 100644 > --- a/arch/arm/xen/enlighten.c > +++ b/arch/arm/xen/enlighten.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -261,6 +262,13 @@ static int __init fdt_find_hyper_node(unsigned long > node, const char *uname, > !strncmp(hyper_node.prefix, s, strlen(hyper_node.prefix))) > hyper_node.version = s + strlen(hyper_node.prefix); > > + if (IS_ENABLED(CONFIG_XEN_EFI)) { > + /* Check if Xen supports EFI */ > + if ((of_get_flat_dt_subnode_by_name(node, "uefi") > 0) && > + !efi_runtime_disabled()) > + set_bit(EFI_RUNTIME_SERVICES, ); > + } > + > return 0; > } > The above comment could do with including similar information to the commit log as to why we want to force enable runtime services. For x86 we have this, * * When EFI_PARAVIRT is in force then we could not map runtime * service memory region because we do not have direct access to it. * However, runtime services are available through proxy functions * (e.g. in case of Xen dom0 EFI implementation they call special * hypercall which executes relevant EFI functions) and that is why * they are always enabled. */ We need something similar here. > @@ -338,6 +346,7 @@ static int __init xen_guest_init(void) > { > struct xen_add_to_physmap xatp; > struct shared_info *shared_info_page = NULL; > + int ret; > > if (!xen_domain()) > return 0; > @@ -352,6 +361,13 @@ static int __init xen_guest_init(void) > return -ENODEV; > } > > + if (IS_ENABLED(CONFIG_XEN_EFI) && efi_enabled(EFI_BOOT) && > + efi_enabled(EFI_RUNTIME_SERVICES)) { > + ret = xen_arm_enable_runtime_services(); > + if (ret) > + return ret; > + } > + Is it ever possible to have EFI_RUNTIME_SERVICES set but efi_enabled(EFI_BOOT) and IS_ENABLED(CONFIG_XEN_EFI) be false inside this function? If the answer is "no", I'd suggest just reducing this down to, /* * The fdt parsing code will have set EFI_RUNTIME_SERVICES if * it found Xen EFI parameters. Force enable runtime services. */ if (efi_enabled(EFI_RUNTIME_SERVICES)) { ret = xen_arm_enable_runtime_services(); if (ret) return ret; } But first, see my comments below. > @@ -39,6 +40,33 @@ static struct mm_struct efi_mm = { > .mmlist = LIST_HEAD_INIT(efi_mm.mmlist), > }; > > +static int __init efi_r
Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared
On Mon, 09 May, at 12:48:11PM, Peter Zijlstra wrote: > Move the nr_busy_cpus thing from its hacky sd->parent->groups->sgc > location into the much more natural sched_domain_shared location. > > Signed-off-by: Peter Zijlstra (Intel)> --- > include/linux/sched.h|1 + > kernel/sched/core.c | 10 +- > kernel/sched/fair.c | 22 -- > kernel/sched/sched.h |6 +- > kernel/time/tick-sched.c | 10 +- > 5 files changed, 24 insertions(+), 25 deletions(-) > > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1059,6 +1059,7 @@ struct sched_group; > > struct sched_domain_shared { > atomic_tref; > + atomic_tnr_busy_cpus; > }; > > struct sched_domain { [...] > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -7842,13 +7842,13 @@ static inline void set_cpu_sd_state_busy > int cpu = smp_processor_id(); > > rcu_read_lock(); > - sd = rcu_dereference(per_cpu(sd_busy, cpu)); > + sd = rcu_dereference(per_cpu(sd_llc, cpu)); > > if (!sd || !sd->nohz_idle) > goto unlock; > sd->nohz_idle = 0; > > - atomic_inc(>groups->sgc->nr_busy_cpus); > + atomic_inc(>shared->nr_busy_cpus); > unlock: > rcu_read_unlock(); > } This breaks my POWER7 box which presumably doesn't have SD_SHARE_PKG_RESOURCES, NIP [c012de68] .set_cpu_sd_state_idle+0x58/0x80 LR [c017ded4] .tick_nohz_idle_enter+0x24/0x90 Call Trace: [c007774b7cf0] [c007774b4080] 0xc007774b4080 (unreliable) [c007774b7d60] [c017ded4] .tick_nohz_idle_enter+0x24/0x90 [c007774b7dd0] [c0137200] .cpu_startup_entry+0xe0/0x440 [c007774b7ee0] [c004739c] .start_secondary+0x35c/0x3a0 [c007774b7f90] [c0008bfc] start_secondary_prolog+0x10/0x14 The following fixes it, diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 978b3ef2d87e..d27153adee4d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7920,7 +7920,8 @@ static inline void set_cpu_sd_state_busy(void) goto unlock; sd->nohz_idle = 0; - atomic_inc(>shared->nr_busy_cpus); + if (sd->shared) + atomic_inc(>shared->nr_busy_cpus); unlock: rcu_read_unlock(); } @@ -7937,7 +7938,8 @@ void set_cpu_sd_state_idle(void) goto unlock; sd->nohz_idle = 1; - atomic_dec(>shared->nr_busy_cpus); + if (sd->shared) + atomic_dec(>shared->nr_busy_cpus); unlock: rcu_read_unlock(); }
Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared
On Mon, 09 May, at 12:48:11PM, Peter Zijlstra wrote: > Move the nr_busy_cpus thing from its hacky sd->parent->groups->sgc > location into the much more natural sched_domain_shared location. > > Signed-off-by: Peter Zijlstra (Intel) > --- > include/linux/sched.h|1 + > kernel/sched/core.c | 10 +- > kernel/sched/fair.c | 22 -- > kernel/sched/sched.h |6 +- > kernel/time/tick-sched.c | 10 +- > 5 files changed, 24 insertions(+), 25 deletions(-) > > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1059,6 +1059,7 @@ struct sched_group; > > struct sched_domain_shared { > atomic_tref; > + atomic_tnr_busy_cpus; > }; > > struct sched_domain { [...] > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -7842,13 +7842,13 @@ static inline void set_cpu_sd_state_busy > int cpu = smp_processor_id(); > > rcu_read_lock(); > - sd = rcu_dereference(per_cpu(sd_busy, cpu)); > + sd = rcu_dereference(per_cpu(sd_llc, cpu)); > > if (!sd || !sd->nohz_idle) > goto unlock; > sd->nohz_idle = 0; > > - atomic_inc(>groups->sgc->nr_busy_cpus); > + atomic_inc(>shared->nr_busy_cpus); > unlock: > rcu_read_unlock(); > } This breaks my POWER7 box which presumably doesn't have SD_SHARE_PKG_RESOURCES, NIP [c012de68] .set_cpu_sd_state_idle+0x58/0x80 LR [c017ded4] .tick_nohz_idle_enter+0x24/0x90 Call Trace: [c007774b7cf0] [c007774b4080] 0xc007774b4080 (unreliable) [c007774b7d60] [c017ded4] .tick_nohz_idle_enter+0x24/0x90 [c007774b7dd0] [c0137200] .cpu_startup_entry+0xe0/0x440 [c007774b7ee0] [c004739c] .start_secondary+0x35c/0x3a0 [c007774b7f90] [c0008bfc] start_secondary_prolog+0x10/0x14 The following fixes it, diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 978b3ef2d87e..d27153adee4d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7920,7 +7920,8 @@ static inline void set_cpu_sd_state_busy(void) goto unlock; sd->nohz_idle = 0; - atomic_inc(>shared->nr_busy_cpus); + if (sd->shared) + atomic_inc(>shared->nr_busy_cpus); unlock: rcu_read_unlock(); } @@ -7937,7 +7938,8 @@ void set_cpu_sd_state_idle(void) goto unlock; sd->nohz_idle = 1; - atomic_dec(>shared->nr_busy_cpus); + if (sd->shared) + atomic_dec(>shared->nr_busy_cpus); unlock: rcu_read_unlock(); }
Re: [RFC PATCH v1 10/18] x86/efi: Access EFI related tables in the clear
On Tue, 26 Apr, at 05:57:40PM, Tom Lendacky wrote: > The EFI tables are not encrypted and need to be accessed as such. Be sure > to memmap them without the encryption attribute set. For EFI support that > lives outside of the arch/x86 tree, create a routine that uses the __weak > attribute so that it can be overridden by an architecture specific routine. > > When freeing boot services related memory, since it has been mapped as > un-encrypted, be sure to change the mapping to encrypted for future use. > > Signed-off-by: Tom Lendacky> --- > arch/x86/include/asm/cacheflush.h |3 + > arch/x86/include/asm/mem_encrypt.h | 22 +++ > arch/x86/kernel/setup.c|6 +-- > arch/x86/mm/mem_encrypt.c | 56 +++ > arch/x86/mm/pageattr.c | 75 > > arch/x86/platform/efi/efi.c| 26 +++- > arch/x86/platform/efi/efi_64.c |9 +++- > arch/x86/platform/efi/quirks.c | 12 +- > drivers/firmware/efi/efi.c | 18 +++-- > drivers/firmware/efi/esrt.c| 12 +++--- > include/linux/efi.h|3 + > 11 files changed, 212 insertions(+), 30 deletions(-) The size of this change is completely unexpected. Why is there so much churn to workaround this new feature? Is it not possible to maintain some kind of kernel virtual address mapping so memremap*() and friends can figure out when to twiddle the mapping attributes and map with/without encryption? These API changes place an undue burden on developers that don't even care about SME.
Re: [RFC PATCH v1 10/18] x86/efi: Access EFI related tables in the clear
On Tue, 26 Apr, at 05:57:40PM, Tom Lendacky wrote: > The EFI tables are not encrypted and need to be accessed as such. Be sure > to memmap them without the encryption attribute set. For EFI support that > lives outside of the arch/x86 tree, create a routine that uses the __weak > attribute so that it can be overridden by an architecture specific routine. > > When freeing boot services related memory, since it has been mapped as > un-encrypted, be sure to change the mapping to encrypted for future use. > > Signed-off-by: Tom Lendacky > --- > arch/x86/include/asm/cacheflush.h |3 + > arch/x86/include/asm/mem_encrypt.h | 22 +++ > arch/x86/kernel/setup.c|6 +-- > arch/x86/mm/mem_encrypt.c | 56 +++ > arch/x86/mm/pageattr.c | 75 > > arch/x86/platform/efi/efi.c| 26 +++- > arch/x86/platform/efi/efi_64.c |9 +++- > arch/x86/platform/efi/quirks.c | 12 +- > drivers/firmware/efi/efi.c | 18 +++-- > drivers/firmware/efi/esrt.c| 12 +++--- > include/linux/efi.h|3 + > 11 files changed, 212 insertions(+), 30 deletions(-) The size of this change is completely unexpected. Why is there so much churn to workaround this new feature? Is it not possible to maintain some kind of kernel virtual address mapping so memremap*() and friends can figure out when to twiddle the mapping attributes and map with/without encryption? These API changes place an undue burden on developers that don't even care about SME.
Re: [BUG] x86/efi: MMRs no longer properly mapped after switch to isolated page table
On Mon, 02 May, at 04:39:31PM, Alex Thorlton wrote: > > If you think we're violating EFI rules by accessing these registers from > both sides of the fence, please let me know. I'd like to make sure that > we get everything behaving the way it should be! Oh no, I don't think this is violating the UEFI spec at all, but I do think it goes against the spirit of the way other implementations are designed; with maximum separation between firmware and kernel. In a perfect world, I'd suggest mapping the MMR range in both the kernel and firmware, at different virtual address ranges, but have the firmware's version opaque to the kernel and only described by an EfiMemoryMappedIO region, or something. That is ignoring any region type conflicts that may arise. Of course, we don't operate in a perfect world, so a good solution may be to just insert the kernels' mapping into the EFI page tables.
Re: [BUG] x86/efi: MMRs no longer properly mapped after switch to isolated page table
On Mon, 02 May, at 04:39:31PM, Alex Thorlton wrote: > > If you think we're violating EFI rules by accessing these registers from > both sides of the fence, please let me know. I'd like to make sure that > we get everything behaving the way it should be! Oh no, I don't think this is violating the UEFI spec at all, but I do think it goes against the spirit of the way other implementations are designed; with maximum separation between firmware and kernel. In a perfect world, I'd suggest mapping the MMR range in both the kernel and firmware, at different virtual address ranges, but have the firmware's version opaque to the kernel and only described by an EfiMemoryMappedIO region, or something. That is ignoring any region type conflicts that may arise. Of course, we don't operate in a perfect world, so a good solution may be to just insert the kernels' mapping into the EFI page tables.
[tip:efi/core] efi/capsule: Move 'capsule' to the stack in efi_capsule_supported()
Commit-ID: fb7a84cac03541f4da18dfa25b3f4767d4efc6fc Gitweb: http://git.kernel.org/tip/fb7a84cac03541f4da18dfa25b3f4767d4efc6fc Author: Matt Fleming <m...@codeblueprint.co.uk> AuthorDate: Fri, 6 May 2016 22:39:29 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Sat, 7 May 2016 07:06:13 +0200 efi/capsule: Move 'capsule' to the stack in efi_capsule_supported() Dan Carpenter reports that passing the address of the pointer to the kmalloc()'d memory for 'capsule' is dangerous: "drivers/firmware/efi/capsule.c:109 efi_capsule_supported() warn: did you mean to pass the address of 'capsule' 108 109 status = efi.query_capsule_caps(, 1, _size, reset); If we modify capsule inside this function call then at the end of the function we aren't freeing the original pointer that we allocated." Ard Biesheuvel noted that we don't even need to call kmalloc() since the object we allocate isn't very big and doesn't need to persist after the function returns. Place 'capsule' on the stack instead. Suggested-by: Ard Biesheuvel <ard.biesheu...@linaro.org> Reported-by: Dan Carpenter <dan.carpen...@oracle.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Borislav Petkov <b...@alien8.de> Cc: Brian Gerst <brge...@gmail.com> Cc: Bryan O'Donoghue <pure.lo...@nexus-software.ie> Cc: Denys Vlasenko <dvlas...@redhat.com> Cc: H. Peter Anvin <h...@zytor.com> Cc: Kweh Hock Leong <hock.leong.k...@intel.com> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: joeyli <j...@suse.com> Cc: linux-...@vger.kernel.org Link: http://lkml.kernel.org/r/1462570771-13324-4-git-send-email-m...@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mi...@kernel.org> --- drivers/firmware/efi/capsule.c | 29 +++-- 1 file changed, 11 insertions(+), 18 deletions(-) diff --git a/drivers/firmware/efi/capsule.c b/drivers/firmware/efi/capsule.c index e530540..53b9fd2 100644 --- a/drivers/firmware/efi/capsule.c +++ b/drivers/firmware/efi/capsule.c @@ -86,33 +86,26 @@ bool efi_capsule_pending(int *reset_type) */ int efi_capsule_supported(efi_guid_t guid, u32 flags, size_t size, int *reset) { - efi_capsule_header_t *capsule; + efi_capsule_header_t capsule; + efi_capsule_header_t *cap_list[] = { }; efi_status_t status; u64 max_size; - int rv = 0; if (flags & ~EFI_CAPSULE_SUPPORTED_FLAG_MASK) return -EINVAL; - capsule = kmalloc(sizeof(*capsule), GFP_KERNEL); - if (!capsule) - return -ENOMEM; - - capsule->headersize = capsule->imagesize = sizeof(*capsule); - memcpy(>guid, , sizeof(efi_guid_t)); - capsule->flags = flags; + capsule.headersize = capsule.imagesize = sizeof(capsule); + memcpy(, , sizeof(efi_guid_t)); + capsule.flags = flags; - status = efi.query_capsule_caps(, 1, _size, reset); - if (status != EFI_SUCCESS) { - rv = efi_status_to_err(status); - goto out; - } + status = efi.query_capsule_caps(cap_list, 1, _size, reset); + if (status != EFI_SUCCESS) + return efi_status_to_err(status); if (size > max_size) - rv = -ENOSPC; -out: - kfree(capsule); - return rv; + return -ENOSPC; + + return 0; } EXPORT_SYMBOL_GPL(efi_capsule_supported);
[tip:efi/core] efi/capsule: Move 'capsule' to the stack in efi_capsule_supported()
Commit-ID: fb7a84cac03541f4da18dfa25b3f4767d4efc6fc Gitweb: http://git.kernel.org/tip/fb7a84cac03541f4da18dfa25b3f4767d4efc6fc Author: Matt Fleming AuthorDate: Fri, 6 May 2016 22:39:29 +0100 Committer: Ingo Molnar CommitDate: Sat, 7 May 2016 07:06:13 +0200 efi/capsule: Move 'capsule' to the stack in efi_capsule_supported() Dan Carpenter reports that passing the address of the pointer to the kmalloc()'d memory for 'capsule' is dangerous: "drivers/firmware/efi/capsule.c:109 efi_capsule_supported() warn: did you mean to pass the address of 'capsule' 108 109 status = efi.query_capsule_caps(, 1, _size, reset); If we modify capsule inside this function call then at the end of the function we aren't freeing the original pointer that we allocated." Ard Biesheuvel noted that we don't even need to call kmalloc() since the object we allocate isn't very big and doesn't need to persist after the function returns. Place 'capsule' on the stack instead. Suggested-by: Ard Biesheuvel Reported-by: Dan Carpenter Signed-off-by: Matt Fleming Acked-by: Ard Biesheuvel Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Bryan O'Donoghue Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Kweh Hock Leong Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: joeyli Cc: linux-...@vger.kernel.org Link: http://lkml.kernel.org/r/1462570771-13324-4-git-send-email-m...@codeblueprint.co.uk Signed-off-by: Ingo Molnar --- drivers/firmware/efi/capsule.c | 29 +++-- 1 file changed, 11 insertions(+), 18 deletions(-) diff --git a/drivers/firmware/efi/capsule.c b/drivers/firmware/efi/capsule.c index e530540..53b9fd2 100644 --- a/drivers/firmware/efi/capsule.c +++ b/drivers/firmware/efi/capsule.c @@ -86,33 +86,26 @@ bool efi_capsule_pending(int *reset_type) */ int efi_capsule_supported(efi_guid_t guid, u32 flags, size_t size, int *reset) { - efi_capsule_header_t *capsule; + efi_capsule_header_t capsule; + efi_capsule_header_t *cap_list[] = { }; efi_status_t status; u64 max_size; - int rv = 0; if (flags & ~EFI_CAPSULE_SUPPORTED_FLAG_MASK) return -EINVAL; - capsule = kmalloc(sizeof(*capsule), GFP_KERNEL); - if (!capsule) - return -ENOMEM; - - capsule->headersize = capsule->imagesize = sizeof(*capsule); - memcpy(>guid, , sizeof(efi_guid_t)); - capsule->flags = flags; + capsule.headersize = capsule.imagesize = sizeof(capsule); + memcpy(, , sizeof(efi_guid_t)); + capsule.flags = flags; - status = efi.query_capsule_caps(, 1, _size, reset); - if (status != EFI_SUCCESS) { - rv = efi_status_to_err(status); - goto out; - } + status = efi.query_capsule_caps(cap_list, 1, _size, reset); + if (status != EFI_SUCCESS) + return efi_status_to_err(status); if (size > max_size) - rv = -ENOSPC; -out: - kfree(capsule); - return rv; + return -ENOSPC; + + return 0; } EXPORT_SYMBOL_GPL(efi_capsule_supported);
[tip:efi/core] efi/capsule: Make efi_capsule_pending() lockless
Commit-ID: 62075e581802ea1842d5d3c490a7e46330bdb9e1 Gitweb: http://git.kernel.org/tip/62075e581802ea1842d5d3c490a7e46330bdb9e1 Author: Matt Fleming <m...@codeblueprint.co.uk> AuthorDate: Fri, 6 May 2016 22:39:27 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Sat, 7 May 2016 07:06:13 +0200 efi/capsule: Make efi_capsule_pending() lockless Taking a mutex in the reboot path is bogus because we cannot sleep with interrupts disabled, such as when rebooting due to panic(), BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97 in_atomic(): 0, irqs_disabled(): 1, pid: 7, name: rcu_sched Call Trace: dump_stack+0x63/0x89 ___might_sleep+0xd8/0x120 __might_sleep+0x49/0x80 mutex_lock+0x20/0x50 efi_capsule_pending+0x1d/0x60 native_machine_emergency_restart+0x59/0x280 machine_emergency_restart+0x19/0x20 emergency_restart+0x18/0x20 panic+0x1ba/0x217 In this case all other CPUs will have been stopped by the time we execute the platform reboot code, so 'capsule_pending' cannot change under our feet. We wouldn't care even if it could since we cannot wait for it complete. Also, instead of relying on the external 'system_state' variable just use a reboot notifier, so we can set 'stop_capsules' while holding 'capsule_mutex', thereby avoiding a race where system_state is updated while we're in the middle of efi_capsule_update_locked() (since CPUs won't have been stopped at that point). Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Borislav Petkov <b...@alien8.de> Cc: Brian Gerst <brge...@gmail.com> Cc: Bryan O'Donoghue <pure.lo...@nexus-software.ie> Cc: Denys Vlasenko <dvlas...@redhat.com> Cc: H. Peter Anvin <h...@zytor.com> Cc: Kweh Hock Leong <hock.leong.k...@intel.com> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: joeyli <j...@suse.com> Cc: linux-...@vger.kernel.org Link: http://lkml.kernel.org/r/1462570771-13324-2-git-send-email-m...@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mi...@kernel.org> --- drivers/firmware/efi/capsule.c | 35 +-- 1 file changed, 25 insertions(+), 10 deletions(-) diff --git a/drivers/firmware/efi/capsule.c b/drivers/firmware/efi/capsule.c index 0de5594..e530540 100644 --- a/drivers/firmware/efi/capsule.c +++ b/drivers/firmware/efi/capsule.c @@ -22,11 +22,12 @@ typedef struct { } efi_capsule_block_desc_t; static bool capsule_pending; +static bool stop_capsules; static int efi_reset_type = -1; /* * capsule_mutex serialises access to both capsule_pending and - * efi_reset_type. + * efi_reset_type and stop_capsules. */ static DEFINE_MUTEX(capsule_mutex); @@ -50,18 +51,13 @@ static DEFINE_MUTEX(capsule_mutex); */ bool efi_capsule_pending(int *reset_type) { - bool rv = false; - - mutex_lock(_mutex); if (!capsule_pending) - goto out; + return false; if (reset_type) *reset_type = efi_reset_type; - rv = true; -out: - mutex_unlock(_mutex); - return rv; + + return true; } /* @@ -176,7 +172,7 @@ efi_capsule_update_locked(efi_capsule_header_t *capsule, * whether to force an EFI reboot), and we're racing against * that call. Abort in that case. */ - if (unlikely(system_state == SYSTEM_RESTART)) { + if (unlikely(stop_capsules)) { pr_warn("Capsule update raced with reboot, aborting.\n"); return -EINVAL; } @@ -298,3 +294,22 @@ out: return rv; } EXPORT_SYMBOL_GPL(efi_capsule_update); + +static int capsule_reboot_notify(struct notifier_block *nb, unsigned long event, void *cmd) +{ + mutex_lock(_mutex); + stop_capsules = true; + mutex_unlock(_mutex); + + return NOTIFY_DONE; +} + +static struct notifier_block capsule_reboot_nb = { + .notifier_call = capsule_reboot_notify, +}; + +static int __init capsule_reboot_register(void) +{ + return register_reboot_notifier(_reboot_nb); +} +core_initcall(capsule_reboot_register);
[tip:efi/core] efi/capsule: Make efi_capsule_pending() lockless
Commit-ID: 62075e581802ea1842d5d3c490a7e46330bdb9e1 Gitweb: http://git.kernel.org/tip/62075e581802ea1842d5d3c490a7e46330bdb9e1 Author: Matt Fleming AuthorDate: Fri, 6 May 2016 22:39:27 +0100 Committer: Ingo Molnar CommitDate: Sat, 7 May 2016 07:06:13 +0200 efi/capsule: Make efi_capsule_pending() lockless Taking a mutex in the reboot path is bogus because we cannot sleep with interrupts disabled, such as when rebooting due to panic(), BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97 in_atomic(): 0, irqs_disabled(): 1, pid: 7, name: rcu_sched Call Trace: dump_stack+0x63/0x89 ___might_sleep+0xd8/0x120 __might_sleep+0x49/0x80 mutex_lock+0x20/0x50 efi_capsule_pending+0x1d/0x60 native_machine_emergency_restart+0x59/0x280 machine_emergency_restart+0x19/0x20 emergency_restart+0x18/0x20 panic+0x1ba/0x217 In this case all other CPUs will have been stopped by the time we execute the platform reboot code, so 'capsule_pending' cannot change under our feet. We wouldn't care even if it could since we cannot wait for it complete. Also, instead of relying on the external 'system_state' variable just use a reboot notifier, so we can set 'stop_capsules' while holding 'capsule_mutex', thereby avoiding a race where system_state is updated while we're in the middle of efi_capsule_update_locked() (since CPUs won't have been stopped at that point). Signed-off-by: Matt Fleming Cc: Andy Lutomirski Cc: Ard Biesheuvel Cc: Borislav Petkov Cc: Brian Gerst Cc: Bryan O'Donoghue Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Kweh Hock Leong Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: joeyli Cc: linux-...@vger.kernel.org Link: http://lkml.kernel.org/r/1462570771-13324-2-git-send-email-m...@codeblueprint.co.uk Signed-off-by: Ingo Molnar --- drivers/firmware/efi/capsule.c | 35 +-- 1 file changed, 25 insertions(+), 10 deletions(-) diff --git a/drivers/firmware/efi/capsule.c b/drivers/firmware/efi/capsule.c index 0de5594..e530540 100644 --- a/drivers/firmware/efi/capsule.c +++ b/drivers/firmware/efi/capsule.c @@ -22,11 +22,12 @@ typedef struct { } efi_capsule_block_desc_t; static bool capsule_pending; +static bool stop_capsules; static int efi_reset_type = -1; /* * capsule_mutex serialises access to both capsule_pending and - * efi_reset_type. + * efi_reset_type and stop_capsules. */ static DEFINE_MUTEX(capsule_mutex); @@ -50,18 +51,13 @@ static DEFINE_MUTEX(capsule_mutex); */ bool efi_capsule_pending(int *reset_type) { - bool rv = false; - - mutex_lock(_mutex); if (!capsule_pending) - goto out; + return false; if (reset_type) *reset_type = efi_reset_type; - rv = true; -out: - mutex_unlock(_mutex); - return rv; + + return true; } /* @@ -176,7 +172,7 @@ efi_capsule_update_locked(efi_capsule_header_t *capsule, * whether to force an EFI reboot), and we're racing against * that call. Abort in that case. */ - if (unlikely(system_state == SYSTEM_RESTART)) { + if (unlikely(stop_capsules)) { pr_warn("Capsule update raced with reboot, aborting.\n"); return -EINVAL; } @@ -298,3 +294,22 @@ out: return rv; } EXPORT_SYMBOL_GPL(efi_capsule_update); + +static int capsule_reboot_notify(struct notifier_block *nb, unsigned long event, void *cmd) +{ + mutex_lock(_mutex); + stop_capsules = true; + mutex_unlock(_mutex); + + return NOTIFY_DONE; +} + +static struct notifier_block capsule_reboot_nb = { + .notifier_call = capsule_reboot_notify, +}; + +static int __init capsule_reboot_register(void) +{ + return register_reboot_notifier(_reboot_nb); +} +core_initcall(capsule_reboot_register);
[PATCH 5/5] efivarfs: Make efivarfs_file_ioctl static
From: Peter Jones <pjo...@redhat.com> There are no callers except through the file_operations struct below this, so it should be static like everything else here. Signed-off-by: Peter Jones <pjo...@redhat.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- fs/efivarfs/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/efivarfs/file.c b/fs/efivarfs/file.c index d48e0d261d78..5f22e74bbade 100644 --- a/fs/efivarfs/file.c +++ b/fs/efivarfs/file.c @@ -157,7 +157,7 @@ efivarfs_ioc_setxflags(struct file *file, void __user *arg) return 0; } -long +static long efivarfs_file_ioctl(struct file *file, unsigned int cmd, unsigned long p) { void __user *arg = (void __user *)p; -- 2.7.3
[PATCH 4/5] efi: Merge boolean flag arguments
From: Julia Lawall <julia.law...@lip6.fr> The parameters atomic and duplicates of efivar_init always have opposite values. Drop the parameter atomic, replace the uses of !atomic with duplicates, and update the call sites accordingly. The code using duplicates is slightly reorganized with an else, to avoid duplicating the lock code. Signed-off-by: Julia Lawall <julia.law...@lip6.fr> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Matthew Garrett <mj...@srcf.ucam.org> Cc: Jeremy Kerr <j...@ozlabs.org> Cc: Vaishali Thakkar <vaishali.thak...@oracle.com> Cc: Saurabh Sengar <saurabh.tr...@gmail.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- drivers/firmware/efi/efivars.c | 5 ++--- drivers/firmware/efi/vars.c| 23 ++- fs/efivarfs/super.c| 3 +-- include/linux/efi.h| 3 +-- 4 files changed, 14 insertions(+), 20 deletions(-) diff --git a/drivers/firmware/efi/efivars.c b/drivers/firmware/efi/efivars.c index 096adcbcb5a9..116b244dee68 100644 --- a/drivers/firmware/efi/efivars.c +++ b/drivers/firmware/efi/efivars.c @@ -661,7 +661,7 @@ static void efivar_update_sysfs_entries(struct work_struct *work) return; err = efivar_init(efivar_update_sysfs_entry, entry, - true, false, _sysfs_list); + false, _sysfs_list); if (!err) break; @@ -730,8 +730,7 @@ int efivars_sysfs_init(void) return -ENOMEM; } - efivar_init(efivars_sysfs_callback, NULL, false, - true, _sysfs_list); + efivar_init(efivars_sysfs_callback, NULL, true, _sysfs_list); error = create_efivars_bin_attributes(); if (error) { diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c index 0012331d5a3d..d3b751383286 100644 --- a/drivers/firmware/efi/vars.c +++ b/drivers/firmware/efi/vars.c @@ -419,8 +419,7 @@ static void dup_variable_bug(efi_char16_t *str16, efi_guid_t *vendor_guid, * Returns 0 on success, or a kernel error code on failure. */ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *), - void *data, bool atomic, bool duplicates, - struct list_head *head) + void *data, bool duplicates, struct list_head *head) { const struct efivar_operations *ops = __efivars->ops; unsigned long variable_name_size = 1024; @@ -450,7 +449,7 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *), _guid); switch (status) { case EFI_SUCCESS: - if (!atomic) + if (duplicates) spin_unlock_irq(&__efivars->lock); variable_name_size = var_name_strnsize(variable_name, @@ -465,21 +464,19 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *), * and may end up looping here forever. */ if (duplicates && - variable_is_present(variable_name, _guid, head)) { + variable_is_present(variable_name, _guid, + head)) { dup_variable_bug(variable_name, _guid, variable_name_size); - if (!atomic) - spin_lock_irq(&__efivars->lock); - status = EFI_NOT_FOUND; - break; + } else { + err = func(variable_name, vendor_guid, + variable_name_size, data); + if (err) + status = EFI_NOT_FOUND; } - err = func(variable_name, vendor_guid, variable_name_size, data); - if (err) - status = EFI_NOT_FOUND; - - if (!atomic) + if (duplicates) spin_lock_irq(&__efivars->lock); break; diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c index 553c5d2db4a4..9cb54a38832d 100644 --- a/fs/efivarfs/super.c +++ b/fs/efivarfs/super.c @@ -216,8 +216,7 @@ static int efivarfs_fill_super(struct super_block *sb, void *data, int silent) INIT_LIST_HEAD(_list); - err = efivar_init(efivarfs_callback, (void *)sb, false, - true, _list); + err = efivar_init(efivarfs_callback, (void *)sb, true, _list); if (err) __efivar_entry_
[PATCH 5/5] efivarfs: Make efivarfs_file_ioctl static
From: Peter Jones There are no callers except through the file_operations struct below this, so it should be static like everything else here. Signed-off-by: Peter Jones Signed-off-by: Matt Fleming --- fs/efivarfs/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/efivarfs/file.c b/fs/efivarfs/file.c index d48e0d261d78..5f22e74bbade 100644 --- a/fs/efivarfs/file.c +++ b/fs/efivarfs/file.c @@ -157,7 +157,7 @@ efivarfs_ioc_setxflags(struct file *file, void __user *arg) return 0; } -long +static long efivarfs_file_ioctl(struct file *file, unsigned int cmd, unsigned long p) { void __user *arg = (void __user *)p; -- 2.7.3
[PATCH 4/5] efi: Merge boolean flag arguments
From: Julia Lawall The parameters atomic and duplicates of efivar_init always have opposite values. Drop the parameter atomic, replace the uses of !atomic with duplicates, and update the call sites accordingly. The code using duplicates is slightly reorganized with an else, to avoid duplicating the lock code. Signed-off-by: Julia Lawall Cc: Ard Biesheuvel Cc: Matthew Garrett Cc: Jeremy Kerr Cc: Vaishali Thakkar Cc: Saurabh Sengar Signed-off-by: Matt Fleming --- drivers/firmware/efi/efivars.c | 5 ++--- drivers/firmware/efi/vars.c| 23 ++- fs/efivarfs/super.c| 3 +-- include/linux/efi.h| 3 +-- 4 files changed, 14 insertions(+), 20 deletions(-) diff --git a/drivers/firmware/efi/efivars.c b/drivers/firmware/efi/efivars.c index 096adcbcb5a9..116b244dee68 100644 --- a/drivers/firmware/efi/efivars.c +++ b/drivers/firmware/efi/efivars.c @@ -661,7 +661,7 @@ static void efivar_update_sysfs_entries(struct work_struct *work) return; err = efivar_init(efivar_update_sysfs_entry, entry, - true, false, _sysfs_list); + false, _sysfs_list); if (!err) break; @@ -730,8 +730,7 @@ int efivars_sysfs_init(void) return -ENOMEM; } - efivar_init(efivars_sysfs_callback, NULL, false, - true, _sysfs_list); + efivar_init(efivars_sysfs_callback, NULL, true, _sysfs_list); error = create_efivars_bin_attributes(); if (error) { diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c index 0012331d5a3d..d3b751383286 100644 --- a/drivers/firmware/efi/vars.c +++ b/drivers/firmware/efi/vars.c @@ -419,8 +419,7 @@ static void dup_variable_bug(efi_char16_t *str16, efi_guid_t *vendor_guid, * Returns 0 on success, or a kernel error code on failure. */ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *), - void *data, bool atomic, bool duplicates, - struct list_head *head) + void *data, bool duplicates, struct list_head *head) { const struct efivar_operations *ops = __efivars->ops; unsigned long variable_name_size = 1024; @@ -450,7 +449,7 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *), _guid); switch (status) { case EFI_SUCCESS: - if (!atomic) + if (duplicates) spin_unlock_irq(&__efivars->lock); variable_name_size = var_name_strnsize(variable_name, @@ -465,21 +464,19 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *), * and may end up looping here forever. */ if (duplicates && - variable_is_present(variable_name, _guid, head)) { + variable_is_present(variable_name, _guid, + head)) { dup_variable_bug(variable_name, _guid, variable_name_size); - if (!atomic) - spin_lock_irq(&__efivars->lock); - status = EFI_NOT_FOUND; - break; + } else { + err = func(variable_name, vendor_guid, + variable_name_size, data); + if (err) + status = EFI_NOT_FOUND; } - err = func(variable_name, vendor_guid, variable_name_size, data); - if (err) - status = EFI_NOT_FOUND; - - if (!atomic) + if (duplicates) spin_lock_irq(&__efivars->lock); break; diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c index 553c5d2db4a4..9cb54a38832d 100644 --- a/fs/efivarfs/super.c +++ b/fs/efivarfs/super.c @@ -216,8 +216,7 @@ static int efivarfs_fill_super(struct super_block *sb, void *data, int silent) INIT_LIST_HEAD(_list); - err = efivar_init(efivarfs_callback, (void *)sb, false, - true, _list); + err = efivar_init(efivarfs_callback, (void *)sb, true, _list); if (err) __efivar_entry_iter(efivarfs_destroy, _list, NULL, NULL); diff --git a/include/linux/efi.h b/include/linux/efi.h index aa36fb8bea4b..df7acb51f3cc 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1336,8 +1336,7 @@ int efivars_unregister(s
[PATCH 3/5] efi/capsule: Move 'capsule' to the stack in efi_capsule_supported()
Dan reports that passing the address of the pointer to the kmalloc()'d memory for 'capsule' is dangerous, "drivers/firmware/efi/capsule.c:109 efi_capsule_supported() warn: did you mean to pass the address of 'capsule' 108 109 status = efi.query_capsule_caps(, 1, _size, reset); If we modify capsule inside this function call then at the end of the function we aren't freeing the original pointer that we allocated." Ard noted that we don't even need to call kmalloc() since the object we allocate isn't very big and doesn't need to persist after the function returns. Place 'capsule' on the stack instead. Suggested-by: Ard Biesheuvel <ard.biesheu...@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org> Reported-by: Dan Carpenter <dan.carpen...@oracle.com> Cc: Borislav Petkov <b...@alien8.de> Cc: Kweh Hock Leong <hock.leong.k...@intel.com> Cc: Bryan O'Donoghue <pure.lo...@nexus-software.ie> Cc: joeyli <j...@suse.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- drivers/firmware/efi/capsule.c | 29 +++-- 1 file changed, 11 insertions(+), 18 deletions(-) diff --git a/drivers/firmware/efi/capsule.c b/drivers/firmware/efi/capsule.c index 4703dc9b8fbd..7593108f5402 100644 --- a/drivers/firmware/efi/capsule.c +++ b/drivers/firmware/efi/capsule.c @@ -86,33 +86,26 @@ bool efi_capsule_pending(int *reset_type) */ int efi_capsule_supported(efi_guid_t guid, u32 flags, size_t size, int *reset) { - efi_capsule_header_t *capsule; + efi_capsule_header_t capsule; + efi_capsule_header_t *cap_list[] = { }; efi_status_t status; u64 max_size; - int rv = 0; if (flags & ~EFI_CAPSULE_SUPPORTED_FLAG_MASK) return -EINVAL; - capsule = kmalloc(sizeof(*capsule), GFP_KERNEL); - if (!capsule) - return -ENOMEM; - - capsule->headersize = capsule->imagesize = sizeof(*capsule); - memcpy(>guid, , sizeof(efi_guid_t)); - capsule->flags = flags; + capsule.headersize = capsule.imagesize = sizeof(capsule); + memcpy(, , sizeof(efi_guid_t)); + capsule.flags = flags; - status = efi.query_capsule_caps(, 1, _size, reset); - if (status != EFI_SUCCESS) { - rv = efi_status_to_err(status); - goto out; - } + status = efi.query_capsule_caps(cap_list, 1, _size, reset); + if (status != EFI_SUCCESS) + return efi_status_to_err(status); if (size > max_size) - rv = -ENOSPC; -out: - kfree(capsule); - return rv; + return -ENOSPC; + + return 0; } EXPORT_SYMBOL_GPL(efi_capsule_supported); -- 2.7.3
[PATCH 3/5] efi/capsule: Move 'capsule' to the stack in efi_capsule_supported()
Dan reports that passing the address of the pointer to the kmalloc()'d memory for 'capsule' is dangerous, "drivers/firmware/efi/capsule.c:109 efi_capsule_supported() warn: did you mean to pass the address of 'capsule' 108 109 status = efi.query_capsule_caps(, 1, _size, reset); If we modify capsule inside this function call then at the end of the function we aren't freeing the original pointer that we allocated." Ard noted that we don't even need to call kmalloc() since the object we allocate isn't very big and doesn't need to persist after the function returns. Place 'capsule' on the stack instead. Suggested-by: Ard Biesheuvel Acked-by: Ard Biesheuvel Reported-by: Dan Carpenter Cc: Borislav Petkov Cc: Kweh Hock Leong Cc: Bryan O'Donoghue Cc: joeyli Signed-off-by: Matt Fleming --- drivers/firmware/efi/capsule.c | 29 +++-- 1 file changed, 11 insertions(+), 18 deletions(-) diff --git a/drivers/firmware/efi/capsule.c b/drivers/firmware/efi/capsule.c index 4703dc9b8fbd..7593108f5402 100644 --- a/drivers/firmware/efi/capsule.c +++ b/drivers/firmware/efi/capsule.c @@ -86,33 +86,26 @@ bool efi_capsule_pending(int *reset_type) */ int efi_capsule_supported(efi_guid_t guid, u32 flags, size_t size, int *reset) { - efi_capsule_header_t *capsule; + efi_capsule_header_t capsule; + efi_capsule_header_t *cap_list[] = { }; efi_status_t status; u64 max_size; - int rv = 0; if (flags & ~EFI_CAPSULE_SUPPORTED_FLAG_MASK) return -EINVAL; - capsule = kmalloc(sizeof(*capsule), GFP_KERNEL); - if (!capsule) - return -ENOMEM; - - capsule->headersize = capsule->imagesize = sizeof(*capsule); - memcpy(>guid, , sizeof(efi_guid_t)); - capsule->flags = flags; + capsule.headersize = capsule.imagesize = sizeof(capsule); + memcpy(, , sizeof(efi_guid_t)); + capsule.flags = flags; - status = efi.query_capsule_caps(, 1, _size, reset); - if (status != EFI_SUCCESS) { - rv = efi_status_to_err(status); - goto out; - } + status = efi.query_capsule_caps(cap_list, 1, _size, reset); + if (status != EFI_SUCCESS) + return efi_status_to_err(status); if (size > max_size) - rv = -ENOSPC; -out: - kfree(capsule); - return rv; + return -ENOSPC; + + return 0; } EXPORT_SYMBOL_GPL(efi_capsule_supported); -- 2.7.3
[GIT PULL 0/5] EFI changes for v4.7
Folks, this is the second pull request containing v4.7 material. The commits are listed in priority order, with the first patch fixing an oops in the EFI capsule code sitting in tip/efi/core, and the rest being a compiler warning fix, static checker fix, and a couple of cleanups. The following changes since commit 0ec7ae928a9c19c2b7b8054507d5694a2597065e: efi: Remove unnecessary (and buggy) .memmap initialization from the Xen EFI driver (2016-04-29 11:06:15 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git tags/efi-next for you to fetch changes up to 20948c1d9fdefa7acfaa84046f59adce9ef00f2e: efivarfs: Make efivarfs_file_ioctl static (2016-05-05 16:52:19 +0100) * Fix an oops in the EFI capsule code reported by the 0day bot because efi_capsule_pending() was grabbing a mutex in the emergency reboot path - Matt Fleming * Fix a compiler warning about excessive stack usage in the new efibc driver by kmalloc'ing the efivar_entry object - Jeremy Compostella * Dan Carpenter reported that it's potentially unsafe to pass the address of a pointer to the firmware in efi_capsule_supported(). Instead we can skip the dynamic allocation entirely and put the capsule object on the stack - Matt Fleming * Simplify the locking in the efivars code by merging two of efivar_init()'s parameters into one - Julia Lawall * Cleanup efivarfs_file_ioctl by marking it as static since it has no external users - Peter Jones Jeremy Compostella (1): efibc: Fix excessive stack footprint warning Julia Lawall (1): efi: Merge boolean flag arguments Matt Fleming (2): efi/capsule: Make efi_capsule_pending() lockless efi/capsule: Move 'capsule' to the stack in efi_capsule_supported() Peter Jones (1): efivarfs: Make efivarfs_file_ioctl static drivers/firmware/efi/capsule.c | 65 -- drivers/firmware/efi/efibc.c | 34 +++--- drivers/firmware/efi/efivars.c | 5 ++-- drivers/firmware/efi/vars.c| 23 +++ fs/efivarfs/file.c | 2 +- fs/efivarfs/super.c| 3 +- include/linux/efi.h| 3 +- 7 files changed, 75 insertions(+), 60 deletions(-)
[PATCH 1/5] efi/capsule: Make efi_capsule_pending() lockless
Taking a mutex in the reboot path is bogus because we cannot sleep with interrupts disabled, such as when rebooting due to panic(), BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97 in_atomic(): 0, irqs_disabled(): 1, pid: 7, name: rcu_sched Call Trace: dump_stack+0x63/0x89 ___might_sleep+0xd8/0x120 __might_sleep+0x49/0x80 mutex_lock+0x20/0x50 efi_capsule_pending+0x1d/0x60 native_machine_emergency_restart+0x59/0x280 machine_emergency_restart+0x19/0x20 emergency_restart+0x18/0x20 panic+0x1ba/0x217 In this case all other CPUs will have been stopped by the time we execute the platform reboot code, so 'capsule_pending' cannot change under our feet. We wouldn't care even if it could since we cannot wait for it complete. Also, instead of relying on the external 'system_state' variable just use a reboot notifier, so we can set 'stop_capsules' while holding 'capsule_mutex', thereby avoiding a race where system_state is updated while we're in the middle of efi_capsule_update_locked() (since CPUs won't have been stopped at that point). Cc: Borislav Petkov <b...@alien8.de> Cc: Kweh Hock Leong <hock.leong.k...@intel.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Bryan O'Donoghue <pure.lo...@nexus-software.ie> Cc: joeyli <j...@suse.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- drivers/firmware/efi/capsule.c | 36 ++-- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/drivers/firmware/efi/capsule.c b/drivers/firmware/efi/capsule.c index 0de55944ac0b..4703dc9b8fbd 100644 --- a/drivers/firmware/efi/capsule.c +++ b/drivers/firmware/efi/capsule.c @@ -22,11 +22,12 @@ typedef struct { } efi_capsule_block_desc_t; static bool capsule_pending; +static bool stop_capsules; static int efi_reset_type = -1; /* * capsule_mutex serialises access to both capsule_pending and - * efi_reset_type. + * efi_reset_type and stop_capsules. */ static DEFINE_MUTEX(capsule_mutex); @@ -50,18 +51,13 @@ static DEFINE_MUTEX(capsule_mutex); */ bool efi_capsule_pending(int *reset_type) { - bool rv = false; - - mutex_lock(_mutex); if (!capsule_pending) - goto out; + return false; if (reset_type) *reset_type = efi_reset_type; - rv = true; -out: - mutex_unlock(_mutex); - return rv; + + return true; } /* @@ -176,7 +172,7 @@ efi_capsule_update_locked(efi_capsule_header_t *capsule, * whether to force an EFI reboot), and we're racing against * that call. Abort in that case. */ - if (unlikely(system_state == SYSTEM_RESTART)) { + if (unlikely(stop_capsules)) { pr_warn("Capsule update raced with reboot, aborting.\n"); return -EINVAL; } @@ -298,3 +294,23 @@ out: return rv; } EXPORT_SYMBOL_GPL(efi_capsule_update); + +static int capsule_reboot_notify(struct notifier_block *nb, +unsigned long event, void *cmd) +{ + mutex_lock(_mutex); + stop_capsules = true; + mutex_unlock(_mutex); + + return NOTIFY_DONE; +} + +static struct notifier_block capsule_reboot_nb = { + .notifier_call = capsule_reboot_notify, +}; + +static int __init capsule_reboot_register(void) +{ + return register_reboot_notifier(_reboot_nb); +} +core_initcall(capsule_reboot_register); -- 2.7.3
[PATCH 2/5] efibc: Fix excessive stack footprint warning
From: Jeremy Compostella <jeremy.composte...@intel.com> gcc complains about a newly added file for the EFI Bootloader Control: drivers/firmware/efi/efibc.c: In function 'efibc_set_variable': drivers/firmware/efi/efibc.c:53:1: error: the frame size of 2272 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] The problem is the declaration of a local variable of type struct efivar_entry, which is by itself larger than the warning limit of 1024 bytes. Use dynamic memory allocation instead of stack memory for the entry object. This patch also fixes a potential buffer overflow. Reported-by: Ingo Molnar <mi...@kernel.org> Reported-by: Arnd Bergmann <a...@arndb.de> Signed-off-by: Jeremy Compostella <jeremy.composte...@intel.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> [ Updated changelog to include gcc error ] Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- drivers/firmware/efi/efibc.c | 34 +++--- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/drivers/firmware/efi/efibc.c b/drivers/firmware/efi/efibc.c index 2e0c7ccd9d9e..8dd0c7085e59 100644 --- a/drivers/firmware/efi/efibc.c +++ b/drivers/firmware/efi/efibc.c @@ -17,6 +17,7 @@ #include #include #include +#include static void efibc_str_to_str16(const char *str, efi_char16_t *str16) { @@ -28,41 +29,52 @@ static void efibc_str_to_str16(const char *str, efi_char16_t *str16) str16[i] = '\0'; } -static void efibc_set_variable(const char *name, const char *value) +static int efibc_set_variable(const char *name, const char *value) { int ret; efi_guid_t guid = LINUX_EFI_LOADER_ENTRY_GUID; - struct efivar_entry entry; + struct efivar_entry *entry; size_t size = (strlen(value) + 1) * sizeof(efi_char16_t); - if (size > sizeof(entry.var.Data)) + if (size > sizeof(entry->var.Data)) { pr_err("value is too large"); + return -EINVAL; + } - efibc_str_to_str16(name, entry.var.VariableName); - efibc_str_to_str16(value, (efi_char16_t *)entry.var.Data); - memcpy(, , sizeof(guid)); + entry = kmalloc(sizeof(*entry), GFP_KERNEL); + if (!entry) { + pr_err("failed to allocate efivar entry"); + return -ENOMEM; + } - ret = efivar_entry_set(, + efibc_str_to_str16(name, entry->var.VariableName); + efibc_str_to_str16(value, (efi_char16_t *)entry->var.Data); + memcpy(>var.VendorGuid, , sizeof(guid)); + + ret = efivar_entry_set(entry, EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS | EFI_VARIABLE_RUNTIME_ACCESS, - size, entry.var.Data, NULL); + size, entry->var.Data, NULL); if (ret) pr_err("failed to set %s EFI variable: 0x%x\n", name, ret); + + kfree(entry); + return ret; } static int efibc_reboot_notifier_call(struct notifier_block *notifier, unsigned long event, void *data) { const char *reason = "shutdown"; + int ret; if (event == SYS_RESTART) reason = "reboot"; - efibc_set_variable("LoaderEntryRebootReason", reason); - - if (!data) + ret = efibc_set_variable("LoaderEntryRebootReason", reason); + if (ret || !data) return NOTIFY_DONE; efibc_set_variable("LoaderEntryOneShot", (char *)data); -- 2.7.3
[GIT PULL 0/5] EFI changes for v4.7
Folks, this is the second pull request containing v4.7 material. The commits are listed in priority order, with the first patch fixing an oops in the EFI capsule code sitting in tip/efi/core, and the rest being a compiler warning fix, static checker fix, and a couple of cleanups. The following changes since commit 0ec7ae928a9c19c2b7b8054507d5694a2597065e: efi: Remove unnecessary (and buggy) .memmap initialization from the Xen EFI driver (2016-04-29 11:06:15 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git tags/efi-next for you to fetch changes up to 20948c1d9fdefa7acfaa84046f59adce9ef00f2e: efivarfs: Make efivarfs_file_ioctl static (2016-05-05 16:52:19 +0100) * Fix an oops in the EFI capsule code reported by the 0day bot because efi_capsule_pending() was grabbing a mutex in the emergency reboot path - Matt Fleming * Fix a compiler warning about excessive stack usage in the new efibc driver by kmalloc'ing the efivar_entry object - Jeremy Compostella * Dan Carpenter reported that it's potentially unsafe to pass the address of a pointer to the firmware in efi_capsule_supported(). Instead we can skip the dynamic allocation entirely and put the capsule object on the stack - Matt Fleming * Simplify the locking in the efivars code by merging two of efivar_init()'s parameters into one - Julia Lawall * Cleanup efivarfs_file_ioctl by marking it as static since it has no external users - Peter Jones Jeremy Compostella (1): efibc: Fix excessive stack footprint warning Julia Lawall (1): efi: Merge boolean flag arguments Matt Fleming (2): efi/capsule: Make efi_capsule_pending() lockless efi/capsule: Move 'capsule' to the stack in efi_capsule_supported() Peter Jones (1): efivarfs: Make efivarfs_file_ioctl static drivers/firmware/efi/capsule.c | 65 -- drivers/firmware/efi/efibc.c | 34 +++--- drivers/firmware/efi/efivars.c | 5 ++-- drivers/firmware/efi/vars.c| 23 +++ fs/efivarfs/file.c | 2 +- fs/efivarfs/super.c| 3 +- include/linux/efi.h| 3 +- 7 files changed, 75 insertions(+), 60 deletions(-)
[PATCH 1/5] efi/capsule: Make efi_capsule_pending() lockless
Taking a mutex in the reboot path is bogus because we cannot sleep with interrupts disabled, such as when rebooting due to panic(), BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97 in_atomic(): 0, irqs_disabled(): 1, pid: 7, name: rcu_sched Call Trace: dump_stack+0x63/0x89 ___might_sleep+0xd8/0x120 __might_sleep+0x49/0x80 mutex_lock+0x20/0x50 efi_capsule_pending+0x1d/0x60 native_machine_emergency_restart+0x59/0x280 machine_emergency_restart+0x19/0x20 emergency_restart+0x18/0x20 panic+0x1ba/0x217 In this case all other CPUs will have been stopped by the time we execute the platform reboot code, so 'capsule_pending' cannot change under our feet. We wouldn't care even if it could since we cannot wait for it complete. Also, instead of relying on the external 'system_state' variable just use a reboot notifier, so we can set 'stop_capsules' while holding 'capsule_mutex', thereby avoiding a race where system_state is updated while we're in the middle of efi_capsule_update_locked() (since CPUs won't have been stopped at that point). Cc: Borislav Petkov Cc: Kweh Hock Leong Cc: Ard Biesheuvel Cc: Bryan O'Donoghue Cc: joeyli Signed-off-by: Matt Fleming --- drivers/firmware/efi/capsule.c | 36 ++-- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/drivers/firmware/efi/capsule.c b/drivers/firmware/efi/capsule.c index 0de55944ac0b..4703dc9b8fbd 100644 --- a/drivers/firmware/efi/capsule.c +++ b/drivers/firmware/efi/capsule.c @@ -22,11 +22,12 @@ typedef struct { } efi_capsule_block_desc_t; static bool capsule_pending; +static bool stop_capsules; static int efi_reset_type = -1; /* * capsule_mutex serialises access to both capsule_pending and - * efi_reset_type. + * efi_reset_type and stop_capsules. */ static DEFINE_MUTEX(capsule_mutex); @@ -50,18 +51,13 @@ static DEFINE_MUTEX(capsule_mutex); */ bool efi_capsule_pending(int *reset_type) { - bool rv = false; - - mutex_lock(_mutex); if (!capsule_pending) - goto out; + return false; if (reset_type) *reset_type = efi_reset_type; - rv = true; -out: - mutex_unlock(_mutex); - return rv; + + return true; } /* @@ -176,7 +172,7 @@ efi_capsule_update_locked(efi_capsule_header_t *capsule, * whether to force an EFI reboot), and we're racing against * that call. Abort in that case. */ - if (unlikely(system_state == SYSTEM_RESTART)) { + if (unlikely(stop_capsules)) { pr_warn("Capsule update raced with reboot, aborting.\n"); return -EINVAL; } @@ -298,3 +294,23 @@ out: return rv; } EXPORT_SYMBOL_GPL(efi_capsule_update); + +static int capsule_reboot_notify(struct notifier_block *nb, +unsigned long event, void *cmd) +{ + mutex_lock(_mutex); + stop_capsules = true; + mutex_unlock(_mutex); + + return NOTIFY_DONE; +} + +static struct notifier_block capsule_reboot_nb = { + .notifier_call = capsule_reboot_notify, +}; + +static int __init capsule_reboot_register(void) +{ + return register_reboot_notifier(_reboot_nb); +} +core_initcall(capsule_reboot_register); -- 2.7.3