Re: [PATCH] x86/efi Fix regression in efi_arch_mem_reserve
On Wed, 21 Dec, at 02:11:38PM, Petr Oros wrote: > Booting on EFI with ESRT table has been stop since commit: > 8e80632 efi/esrt: Use efi_mem_reserve() and avoid a kmalloc() > > This is caused by this commit: > 816e761 efi: Allow drivers to reserve boot services forever > > Problem is, that efi_memmap_insert need memory aligned > on EFI_PAGE_SIZE. If memory not aligned, efi_memmap_insert > just return and let efi.memmap in inconsistent state. > This breaking boot. > > Tested in my machine, which stop booting > after upgrade to 4.9 > > Signed-off-by: Petr Oros > --- > arch/x86/platform/efi/quirks.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Could you provide some more information? Why does efi_memmap_insert() require this alignment? How does booting "break"? If you see an Oops, please post it here.
Re: [PATCH 2/2 v3] sched: use load_avg for selecting idlest group
On Thu, 08 Dec, at 05:56:54PM, Vincent Guittot wrote: > find_idlest_group() only compares the runnable_load_avg when looking for > the least loaded group. But on fork intensive use case like hackbench > where tasks blocked quickly after the fork, this can lead to selecting the > same CPU instead of other CPUs, which have similar runnable load but a > lower load_avg. > > When the runnable_load_avg of 2 CPUs are close, we now take into account > the amount of blocked load as a 2nd selection factor. There is now 3 zones > for the runnable_load of the rq: > -[0 .. (runnable_load - imbalance)] : Select the new rq which has > significantly less runnable_load > -](runnable_load - imbalance) .. (runnable_load + imbalance)[ : The > runnable loads are close so we use load_avg to chose between the 2 rq > -[(runnable_load + imbalance) .. ULONG_MAX] : Keep the current rq which > has significantly less runnable_load > > The scale factor that is currently used for comparing runnable_load, > doesn't work well with small value. As an example, the use of a scaling > factor fails as soon as this_runnable_load == 0 because we always select > local rq even if min_runnable_load is only 1, which doesn't really make > sense because they are just the same. So instead of scaling factor, we use > an absolute margin for runnable_load to detect CPUs with similar > runnable_load and we keep using scaling factor for blocked load. > > For use case like hackbench, this enable the scheduler to select different > CPUs during the fork sequence and to spread tasks across the system. > > Tests have been done on a Hikey board (ARM based octo cores) for several > kernel. The result below gives min, max, avg and stdev values of 18 runs > with each configuration. > > The v4.8+patches configuration also includes the changes below which is > part of the proposal made by Peter to ensure that the clock will be up to > date when the fork task will be attached to the rq. > > @@ -2568,6 +2568,7 @@ void wake_up_new_task(struct task_struct *p) > __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0)); > #endif > rq = __task_rq_lock(p, ); > + update_rq_clock(rq); > post_init_entity_util_avg(>se); > > activate_task(rq, p, 0); > > hackbench -P -g 1 > >ea86cb4b7621 7dc603c9028e v4.8v4.8+patches > min0.049 0.050 0.051 0,048 > avg0.057 0.057(0%) 0.057(0%) 0,055(+5%) > max0.066 0.068 0.070 0,063 > stdev +/-9% +/-9% +/-8% +/-9% > > Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org> > --- > kernel/sched/fair.c | 48 ++-- > 1 file changed, 38 insertions(+), 10 deletions(-) Tested-by: Matt Fleming <m...@codeblueprint.co.uk> Reviewed-by: Matt Fleming <m...@codeblueprint.co.uk> Peter, Ingo, when you pick this up would you also consider adding the following tag which links to an email describing the problem this patch solves and the performance test results when it's applied? Link: https://lkml.kernel.org/r/20161203214707.gi20...@codeblueprint.co.uk
Re: [PATCH 2/2 v3] sched: use load_avg for selecting idlest group
On Thu, 08 Dec, at 05:56:54PM, Vincent Guittot wrote: > find_idlest_group() only compares the runnable_load_avg when looking for > the least loaded group. But on fork intensive use case like hackbench > where tasks blocked quickly after the fork, this can lead to selecting the > same CPU instead of other CPUs, which have similar runnable load but a > lower load_avg. > > When the runnable_load_avg of 2 CPUs are close, we now take into account > the amount of blocked load as a 2nd selection factor. There is now 3 zones > for the runnable_load of the rq: > -[0 .. (runnable_load - imbalance)] : Select the new rq which has > significantly less runnable_load > -](runnable_load - imbalance) .. (runnable_load + imbalance)[ : The > runnable loads are close so we use load_avg to chose between the 2 rq > -[(runnable_load + imbalance) .. ULONG_MAX] : Keep the current rq which > has significantly less runnable_load > > The scale factor that is currently used for comparing runnable_load, > doesn't work well with small value. As an example, the use of a scaling > factor fails as soon as this_runnable_load == 0 because we always select > local rq even if min_runnable_load is only 1, which doesn't really make > sense because they are just the same. So instead of scaling factor, we use > an absolute margin for runnable_load to detect CPUs with similar > runnable_load and we keep using scaling factor for blocked load. > > For use case like hackbench, this enable the scheduler to select different > CPUs during the fork sequence and to spread tasks across the system. > > Tests have been done on a Hikey board (ARM based octo cores) for several > kernel. The result below gives min, max, avg and stdev values of 18 runs > with each configuration. > > The v4.8+patches configuration also includes the changes below which is > part of the proposal made by Peter to ensure that the clock will be up to > date when the fork task will be attached to the rq. > > @@ -2568,6 +2568,7 @@ void wake_up_new_task(struct task_struct *p) > __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0)); > #endif > rq = __task_rq_lock(p, ); > + update_rq_clock(rq); > post_init_entity_util_avg(>se); > > activate_task(rq, p, 0); > > hackbench -P -g 1 > >ea86cb4b7621 7dc603c9028e v4.8v4.8+patches > min0.049 0.050 0.051 0,048 > avg0.057 0.057(0%) 0.057(0%) 0,055(+5%) > max0.066 0.068 0.070 0,063 > stdev +/-9% +/-9% +/-8% +/-9% > > Signed-off-by: Vincent Guittot > --- > kernel/sched/fair.c | 48 ++-- > 1 file changed, 38 insertions(+), 10 deletions(-) Tested-by: Matt Fleming Reviewed-by: Matt Fleming Peter, Ingo, when you pick this up would you also consider adding the following tag which links to an email describing the problem this patch solves and the performance test results when it's applied? Link: https://lkml.kernel.org/r/20161203214707.gi20...@codeblueprint.co.uk
Re: [PATCH 1/2 v3] sched: fix find_idlest_group for fork
On Thu, 08 Dec, at 05:56:53PM, Vincent Guittot wrote: > During fork, the utilization of a task is init once the rq has been > selected because the current utilization level of the rq is used to set > the utilization of the fork task. As the task's utilization is still > null at this step of the fork sequence, it doesn't make sense to look for > some spare capacity that can fit the task's utilization. > Furthermore, I can see perf regressions for the test "hackbench -P -g 1" > because the least loaded policy is always bypassed and tasks are not > spread during fork. > > With this patch and the fix below, we are back to same performances as > for v4.8. The fix below is only a temporary one used for the test until a > smarter solution is found because we can't simply remove the test which is > useful for others benchmarks > > @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, > struct sched_domain *sd, int t > > avg_cost = this_sd->avg_scan_cost; > > - /* > - * Due to large variance we need a large fuzz factor; hackbench in > - * particularly is sensitive here. > - */ > - if ((avg_idle / 512) < avg_cost) > - return -1; > - > time = local_clock(); > > for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) { > > Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org> > Acked-by: Morten Rasmussen <morten.rasmus...@arm.com> > --- > kernel/sched/fair.c | 6 ++ > 1 file changed, 6 insertions(+) Tested-by: Matt Fleming <m...@codeblueprint.co.uk> Reviewed-by: Matt Fleming <m...@codeblueprint.co.uk>
Re: [PATCH 1/2 v3] sched: fix find_idlest_group for fork
On Thu, 08 Dec, at 05:56:53PM, Vincent Guittot wrote: > During fork, the utilization of a task is init once the rq has been > selected because the current utilization level of the rq is used to set > the utilization of the fork task. As the task's utilization is still > null at this step of the fork sequence, it doesn't make sense to look for > some spare capacity that can fit the task's utilization. > Furthermore, I can see perf regressions for the test "hackbench -P -g 1" > because the least loaded policy is always bypassed and tasks are not > spread during fork. > > With this patch and the fix below, we are back to same performances as > for v4.8. The fix below is only a temporary one used for the test until a > smarter solution is found because we can't simply remove the test which is > useful for others benchmarks > > @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, > struct sched_domain *sd, int t > > avg_cost = this_sd->avg_scan_cost; > > - /* > - * Due to large variance we need a large fuzz factor; hackbench in > - * particularly is sensitive here. > - */ > - if ((avg_idle / 512) < avg_cost) > - return -1; > - > time = local_clock(); > > for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) { > > Signed-off-by: Vincent Guittot > Acked-by: Morten Rasmussen > --- > kernel/sched/fair.c | 6 ++ > 1 file changed, 6 insertions(+) Tested-by: Matt Fleming Reviewed-by: Matt Fleming
Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
On Mon, 05 Dec, at 01:35:46PM, Matt Fleming wrote: > On Mon, 05 Dec, at 10:27:36AM, Vincent Guittot wrote: > > > > Hi Matt, > > > > Thanks for the results. > > > > During the review, it has been pointed out by Morten that the test condition > > (100*this_avg_load < imbalance_scale*min_avg_load) makes more sense than > > (100*min_avg_load > imbalance_scale*this_avg_load). But i see lower > > performances with this change. Coud you run tests with the change below on > > top of the patchset ? > > > > --- > > kernel/sched/fair.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index e8d1ae7..0129fbb 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -5514,7 +5514,7 @@ find_idlest_group(struct sched_domain *sd, struct > > task_struct *p, > > if (!idlest || > > (min_runnable_load > (this_runnable_load + imbalance)) || > > ((this_runnable_load < (min_runnable_load + imbalance)) && > > - (100*min_avg_load > imbalance_scale*this_avg_load))) > > + (100*this_avg_load < imbalance_scale*min_avg_load))) > > return NULL; > > return idlest; > > } > > Queued for testing. Most of the machines didn't notice the difference with this new patch. However, I did come across one test that showed a negative change, hackbench-thread-pipes 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-sched fix-fig-for-fork fix-sig fix-fig-for-fork-v2 Amean1 0.1266 ( 0.00%) 0.1269 ( -0.23%) 0.1287 ( -1.69%) 0.1357 ( -7.22%) Amean4 0.4989 ( 0.00%) 0.5174 ( -3.72%) 0.5251 ( -5.27%) 0.5117 ( -2.58%) Amean7 0.8510 ( 0.00%) 0.8517 ( -0.08%) 0.8964 ( -5.34%) 0.8801 ( -3.42%) Amean12 1.0699 ( 0.00%) 1.0484 ( 2.00%) 1.0147 ( 5.15%) 1.0759 ( -0.56%) Amean21 1.2816 ( 0.00%) 1.2140 ( 5.27%) 1.1879 ( 7.31%) 1.2414 ( 3.13%) Amean30 1.4440 ( 0.00%) 1.4003 ( 3.03%) 1.3969 ( 3.26%) 1.4057 ( 2.65%) Amean48 1.5773 ( 0.00%) 1.5983 ( -1.33%) 1.3984 ( 11.34%) 1.5624 ( 0.94%) Amean79 2.2343 ( 0.00%) 2.3066 ( -3.24%) 2.0053 ( 10.25%) 2.2630 ( -1.29%) Amean96 2.6736 ( 0.00%) 2.4313 ( 9.06%) 2.4181 ( 9.55%) 2.4717 ( 7.55%) 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-schedfix-fig-for-fork fix-sigfix-fig-for-fork-v2 User 129.53 128.64 127.70 131.00 System 1784.54 1744.21 1654.08 1744.00 Elapsed92.07 90.44 86.95 91.00 Looking at the 48 and 79 groups rows for mean there's a noticeable drop off in performance of ~10%, which should be outside of the noise for this test. This is a 2 socket, 4 NUMA node (yes, really), 24 cpus AMD opteron circa 2010. Given the age of this machine, I don't think it's worth holding up the patch.
Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
On Mon, 05 Dec, at 01:35:46PM, Matt Fleming wrote: > On Mon, 05 Dec, at 10:27:36AM, Vincent Guittot wrote: > > > > Hi Matt, > > > > Thanks for the results. > > > > During the review, it has been pointed out by Morten that the test condition > > (100*this_avg_load < imbalance_scale*min_avg_load) makes more sense than > > (100*min_avg_load > imbalance_scale*this_avg_load). But i see lower > > performances with this change. Coud you run tests with the change below on > > top of the patchset ? > > > > --- > > kernel/sched/fair.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index e8d1ae7..0129fbb 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -5514,7 +5514,7 @@ find_idlest_group(struct sched_domain *sd, struct > > task_struct *p, > > if (!idlest || > > (min_runnable_load > (this_runnable_load + imbalance)) || > > ((this_runnable_load < (min_runnable_load + imbalance)) && > > - (100*min_avg_load > imbalance_scale*this_avg_load))) > > + (100*this_avg_load < imbalance_scale*min_avg_load))) > > return NULL; > > return idlest; > > } > > Queued for testing. Most of the machines didn't notice the difference with this new patch. However, I did come across one test that showed a negative change, hackbench-thread-pipes 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-sched fix-fig-for-fork fix-sig fix-fig-for-fork-v2 Amean1 0.1266 ( 0.00%) 0.1269 ( -0.23%) 0.1287 ( -1.69%) 0.1357 ( -7.22%) Amean4 0.4989 ( 0.00%) 0.5174 ( -3.72%) 0.5251 ( -5.27%) 0.5117 ( -2.58%) Amean7 0.8510 ( 0.00%) 0.8517 ( -0.08%) 0.8964 ( -5.34%) 0.8801 ( -3.42%) Amean12 1.0699 ( 0.00%) 1.0484 ( 2.00%) 1.0147 ( 5.15%) 1.0759 ( -0.56%) Amean21 1.2816 ( 0.00%) 1.2140 ( 5.27%) 1.1879 ( 7.31%) 1.2414 ( 3.13%) Amean30 1.4440 ( 0.00%) 1.4003 ( 3.03%) 1.3969 ( 3.26%) 1.4057 ( 2.65%) Amean48 1.5773 ( 0.00%) 1.5983 ( -1.33%) 1.3984 ( 11.34%) 1.5624 ( 0.94%) Amean79 2.2343 ( 0.00%) 2.3066 ( -3.24%) 2.0053 ( 10.25%) 2.2630 ( -1.29%) Amean96 2.6736 ( 0.00%) 2.4313 ( 9.06%) 2.4181 ( 9.55%) 2.4717 ( 7.55%) 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-schedfix-fig-for-fork fix-sigfix-fig-for-fork-v2 User 129.53 128.64 127.70 131.00 System 1784.54 1744.21 1654.08 1744.00 Elapsed92.07 90.44 86.95 91.00 Looking at the 48 and 79 groups rows for mean there's a noticeable drop off in performance of ~10%, which should be outside of the noise for this test. This is a 2 socket, 4 NUMA node (yes, really), 24 cpus AMD opteron circa 2010. Given the age of this machine, I don't think it's worth holding up the patch.
Re: [GIT PULL] efi: Pass secure boot mode to kernel
On Thu, 08 Dec, at 11:45:17AM, David Howells wrote: > Hi Matt, Ard, > > Is it too late to request this for the upcoming merge window? For something as non-trivial as this, yes, it's too late. We generally close the EFI tree window for new features around -rc5 time. > Also, I've made > Lukas's requested changes and reposted just that patch in my reply to him. Do > you want me to repost the lot? Please do, yeah.
Re: [GIT PULL] efi: Pass secure boot mode to kernel
On Thu, 08 Dec, at 11:45:17AM, David Howells wrote: > Hi Matt, Ard, > > Is it too late to request this for the upcoming merge window? For something as non-trivial as this, yes, it's too late. We generally close the EFI tree window for new features around -rc5 time. > Also, I've made > Lukas's requested changes and reposted just that patch in my reply to him. Do > you want me to repost the lot? Please do, yeah.
Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86
On Wed, 07 Dec, at 11:01:06AM, Sai Praneeth Prakhya wrote: > > Thanks Matt! > > Would you like to see a new version of these patch series addressing > your comments? Like > 1. Dropping of patch #4 > 2. Adding Reviewed-by tag of Joey (Sorry for that) > 3. This time with correct version number No need, they're already in the 'next' branch of the EFI tree.
Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86
On Wed, 07 Dec, at 11:01:06AM, Sai Praneeth Prakhya wrote: > > Thanks Matt! > > Would you like to see a new version of these patch series addressing > your comments? Like > 1. Dropping of patch #4 > 2. Adding Reviewed-by tag of Joey (Sorry for that) > 3. This time with correct version number No need, they're already in the 'next' branch of the EFI tree.
Re: Does anything ever enter through startup_64 in head_64.S?
On Wed, 07 Dec, at 03:11:17PM, David Howells wrote: > Hi Matt, > > Does anything ever enter the kernel through startup_64 in head_64.S?[*] Do > all 64-bit mode entries always enter through one of the EFI entry points? Which head_64.S? There are two ;-) Assuming you mean startup_64 in boot/compressed/head_64.S, then the answer is "yes". 64-bit BIOS boot loaders will jump there. I'm fairly sure it's possible to boot that way on EFI too, you just lose some of the newer functionality that is dependent on the EFI boot stub. But I don't remember the last time I tried it. In general, we've always tried to maintain backwards compat, but you don't get the new features without switching to the EFI entry points. Whether or not any distros are still using the old 0x200 entry point for EFI is a good question, though.
Re: Does anything ever enter through startup_64 in head_64.S?
On Wed, 07 Dec, at 03:11:17PM, David Howells wrote: > Hi Matt, > > Does anything ever enter the kernel through startup_64 in head_64.S?[*] Do > all 64-bit mode entries always enter through one of the EFI entry points? Which head_64.S? There are two ;-) Assuming you mean startup_64 in boot/compressed/head_64.S, then the answer is "yes". 64-bit BIOS boot loaders will jump there. I'm fairly sure it's possible to boot that way on EFI too, you just lose some of the newer functionality that is dependent on the EFI boot stub. But I don't remember the last time I tried it. In general, we've always tried to maintain backwards compat, but you don't get the new features without switching to the EFI entry points. Whether or not any distros are still using the old 0x200 entry point for EFI is a good question, though.
Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86
On Tue, 06 Dec, at 11:15:59AM, Sai Praneeth Prakhya wrote: > From: Sai Praneeth> > UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE which describes memory > protections that may be applied to EFI Runtime code and data regions by > kernel. This helps kernel to map efi runtime regions more strictly and > hence allowing only appropriate accesses to these regions. Please refer > to "EFI_MEMORY_ATTRIBUTES_TABLE" in section 4.6 of UEFI specification > v2.6 for more information on this table. > > This patch set relies on commit a604af075a32 ("efi: Add support for the > EFI_MEMORY_ATTRIBUTES_TABLE config table"), commit 10f0d2f57705 ("efi: > Implement generic support for the Memory Attributes table") and hence > implements support for only x86. > > Since the above commits have already implemented early discovery and > validation of table, the following patches implement a call back > function for x86 which is called only when EFI_MEMORY_ATTRIBUTES_TABLE > is detected. > > Patch #1 makes the efi_memory_attributes table detection code generic > across all architectures > > Patch #2 adds EFI_MEM_ATTR bit to keep track of this feature > > Patch #3 Implements call back function that does stricter mappings based > on this table > > Patch #4 Skip parsing of EFI_PROPERTIES_TABLE if EFI_MEMORY_ATTRIBUTES_TABLE > is detected > > Sai Praneeth (4): > efi: Make EFI_MEMORY_ATTRIBUTES_TABLE initialization common across all > architectures > efi: Introduce EFI_MEM_ATTR bit and set it from memory attributes > table > x86/efi: Add support for EFI_MEMORY_ATTRIBUTES_TABLE > efi: Skip parsing of EFI_PROPERTIES_TABLE if > EFI_MEMORY_ATTRIBUTES_TABLE is detected > > arch/x86/platform/efi/efi_64.c | 64 > ++--- > drivers/firmware/efi/arm-init.c | 1 - > drivers/firmware/efi/efi.c | 13 + > drivers/firmware/efi/memattr.c | 6 +++- > include/linux/efi.h | 1 + > 5 files changed, 73 insertions(+), 12 deletions(-) Thanks Sai, I've queued this up for v4.11.
Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86
On Tue, 06 Dec, at 11:15:59AM, Sai Praneeth Prakhya wrote: > From: Sai Praneeth > > UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE which describes memory > protections that may be applied to EFI Runtime code and data regions by > kernel. This helps kernel to map efi runtime regions more strictly and > hence allowing only appropriate accesses to these regions. Please refer > to "EFI_MEMORY_ATTRIBUTES_TABLE" in section 4.6 of UEFI specification > v2.6 for more information on this table. > > This patch set relies on commit a604af075a32 ("efi: Add support for the > EFI_MEMORY_ATTRIBUTES_TABLE config table"), commit 10f0d2f57705 ("efi: > Implement generic support for the Memory Attributes table") and hence > implements support for only x86. > > Since the above commits have already implemented early discovery and > validation of table, the following patches implement a call back > function for x86 which is called only when EFI_MEMORY_ATTRIBUTES_TABLE > is detected. > > Patch #1 makes the efi_memory_attributes table detection code generic > across all architectures > > Patch #2 adds EFI_MEM_ATTR bit to keep track of this feature > > Patch #3 Implements call back function that does stricter mappings based > on this table > > Patch #4 Skip parsing of EFI_PROPERTIES_TABLE if EFI_MEMORY_ATTRIBUTES_TABLE > is detected > > Sai Praneeth (4): > efi: Make EFI_MEMORY_ATTRIBUTES_TABLE initialization common across all > architectures > efi: Introduce EFI_MEM_ATTR bit and set it from memory attributes > table > x86/efi: Add support for EFI_MEMORY_ATTRIBUTES_TABLE > efi: Skip parsing of EFI_PROPERTIES_TABLE if > EFI_MEMORY_ATTRIBUTES_TABLE is detected > > arch/x86/platform/efi/efi_64.c | 64 > ++--- > drivers/firmware/efi/arm-init.c | 1 - > drivers/firmware/efi/efi.c | 13 + > drivers/firmware/efi/memattr.c | 6 +++- > include/linux/efi.h | 1 + > 5 files changed, 73 insertions(+), 12 deletions(-) Thanks Sai, I've queued this up for v4.11.
Re: [PATCH 4/4] efi: Skip parsing of EFI_PROPERTIES_TABLE if EFI_MEMORY_ATTRIBUTES_TABLE is detected
On Tue, 06 Dec, at 11:16:03AM, Sai Praneeth Prakhya wrote: > From: Sai Praneeth <sai.praneeth.prak...@intel.com> > > UEFI specification v2.6 recommends not to use > "EFI_PROPERTIES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA" > attribute of EFI_PROPERTIES_TABLE. Presently, this is the *only* bit > defined in EFI_PROPERTIES_TABLE. This bit implies that EFI Runtime code > and data regions of an executable image are separate and are aligned as > specified in spec. Please refer to "EFI_PROPERTIES_TABLE" in section 4.6 > of UEFI specification v2.6 for more information on this table. > > UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE and is intended to > replace EFI_PROPERTIES_TABLE. If EFI_MEMORY_ATTRIBUTES_TABLE is found we > skip updating of efi runtime region mappings based on > EFI_PROPERTIES_TABLE, so let's also skip parsing of EFI_PROPERTIES_TABLE > if we find EFI_MEMORY_ATTRIBUTES_TABLE because we are not using this > table anyways. The only caveat here is, if further versions of UEFI spec > adds some more bits (hence some more attributes) to EFI_PROPERTIES_TABLE > then we might need to parse it again, otherwise there is no good in > doing that. We can also expect that the same attributes might be reflected in > EFI_MEMORY_ATTRIBUTES_TABLE and hence saving us from parsing > EFI_PROPERTIES_TABLE again. > > Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> > Cc: Lee, Chun-Yi <j...@suse.com> > Cc: Borislav Petkov <b...@alien8.de> > Cc: Ricardo Neri <ricardo.n...@intel.com> > Cc: Matt Fleming <m...@codeblueprint.co.uk> > Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> > Cc: Ravi Shankar <ravi.v.shan...@intel.com> > Cc: Fenghua Yu <fenghua...@intel.com> > --- > drivers/firmware/efi/efi.c | 11 +++ > 1 file changed, 11 insertions(+) I see where you're coming from with this patch, but I think it's unnecessary. Turning on/off parsing of Table A based on existence of Table B just seems like extra work.
Re: [PATCH 4/4] efi: Skip parsing of EFI_PROPERTIES_TABLE if EFI_MEMORY_ATTRIBUTES_TABLE is detected
On Tue, 06 Dec, at 11:16:03AM, Sai Praneeth Prakhya wrote: > From: Sai Praneeth > > UEFI specification v2.6 recommends not to use > "EFI_PROPERTIES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA" > attribute of EFI_PROPERTIES_TABLE. Presently, this is the *only* bit > defined in EFI_PROPERTIES_TABLE. This bit implies that EFI Runtime code > and data regions of an executable image are separate and are aligned as > specified in spec. Please refer to "EFI_PROPERTIES_TABLE" in section 4.6 > of UEFI specification v2.6 for more information on this table. > > UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE and is intended to > replace EFI_PROPERTIES_TABLE. If EFI_MEMORY_ATTRIBUTES_TABLE is found we > skip updating of efi runtime region mappings based on > EFI_PROPERTIES_TABLE, so let's also skip parsing of EFI_PROPERTIES_TABLE > if we find EFI_MEMORY_ATTRIBUTES_TABLE because we are not using this > table anyways. The only caveat here is, if further versions of UEFI spec > adds some more bits (hence some more attributes) to EFI_PROPERTIES_TABLE > then we might need to parse it again, otherwise there is no good in > doing that. We can also expect that the same attributes might be reflected in > EFI_MEMORY_ATTRIBUTES_TABLE and hence saving us from parsing > EFI_PROPERTIES_TABLE again. > > Signed-off-by: Sai Praneeth Prakhya > Cc: Lee, Chun-Yi > Cc: Borislav Petkov > Cc: Ricardo Neri > Cc: Matt Fleming > Cc: Ard Biesheuvel > Cc: Ravi Shankar > Cc: Fenghua Yu > --- > drivers/firmware/efi/efi.c | 11 +++ > 1 file changed, 11 insertions(+) I see where you're coming from with this patch, but I think it's unnecessary. Turning on/off parsing of Table A based on existence of Table B just seems like extra work.
Re: [PATCH 1/4] efi: Make EFI_MEMORY_ATTRIBUTES_TABLE initialization common across all architectures
On Tue, 06 Dec, at 11:16:00AM, Sai Praneeth Prakhya wrote: > From: Sai Praneeth <sai.praneeth.prak...@intel.com> > > Since EFI_PROPERTIES_TABLE and EFI_MEMORY_ATTRIBUTES_TABLE deal with > updating memory region attributes, it makes sense to call > EFI_MEMORY_ATTRIBUTES_TABLE initialization function from the same place > as EFI_PROPERTIES_TABLE. This also moves the EFI_MEMORY_ATTRIBUTES_TABLE > initialization code to a more generic efi initialization path rather > than ARM specific efi initialization. This is important because > EFI_MEMORY_ATTRIBUTES_TABLE will be supported by x86 as well. > > Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> > Cc: Lee, Chun-Yi <j...@suse.com> > Cc: Borislav Petkov <b...@alien8.de> > Cc: Ricardo Neri <ricardo.n...@intel.com> > Cc: Matt Fleming <m...@codeblueprint.co.uk> > Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> > Cc: Ravi Shankar <ravi.v.shan...@intel.com> > Cc: Fenghua Yu <fenghua...@intel.com> > --- > drivers/firmware/efi/arm-init.c | 1 - > drivers/firmware/efi/efi.c | 2 ++ > 2 files changed, 2 insertions(+), 1 deletion(-) You should have applied Joey's Reviewed-by tag when sending out this version so that it doesn't get lost. It's not a big deal, just something to watch out for with future submissions.
Re: [PATCH 1/4] efi: Make EFI_MEMORY_ATTRIBUTES_TABLE initialization common across all architectures
On Tue, 06 Dec, at 11:16:00AM, Sai Praneeth Prakhya wrote: > From: Sai Praneeth > > Since EFI_PROPERTIES_TABLE and EFI_MEMORY_ATTRIBUTES_TABLE deal with > updating memory region attributes, it makes sense to call > EFI_MEMORY_ATTRIBUTES_TABLE initialization function from the same place > as EFI_PROPERTIES_TABLE. This also moves the EFI_MEMORY_ATTRIBUTES_TABLE > initialization code to a more generic efi initialization path rather > than ARM specific efi initialization. This is important because > EFI_MEMORY_ATTRIBUTES_TABLE will be supported by x86 as well. > > Signed-off-by: Sai Praneeth Prakhya > Cc: Lee, Chun-Yi > Cc: Borislav Petkov > Cc: Ricardo Neri > Cc: Matt Fleming > Cc: Ard Biesheuvel > Cc: Ravi Shankar > Cc: Fenghua Yu > --- > drivers/firmware/efi/arm-init.c | 1 - > drivers/firmware/efi/efi.c | 2 ++ > 2 files changed, 2 insertions(+), 1 deletion(-) You should have applied Joey's Reviewed-by tag when sending out this version so that it doesn't get lost. It's not a big deal, just something to watch out for with future submissions.
Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86
On Tue, 06 Dec, at 11:15:59AM, Sai Praneeth Prakhya wrote: > From: Sai Praneeth> > UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE which describes memory > protections that may be applied to EFI Runtime code and data regions by > kernel. This helps kernel to map efi runtime regions more strictly and > hence allowing only appropriate accesses to these regions. Please refer > to "EFI_MEMORY_ATTRIBUTES_TABLE" in section 4.6 of UEFI specification > v2.6 for more information on this table. I guess this is v3 of your patch series right? Please be sure to include a version tag in future like you did for you v2.
Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86
On Tue, 06 Dec, at 11:15:59AM, Sai Praneeth Prakhya wrote: > From: Sai Praneeth > > UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE which describes memory > protections that may be applied to EFI Runtime code and data regions by > kernel. This helps kernel to map efi runtime regions more strictly and > hence allowing only appropriate accesses to these regions. Please refer > to "EFI_MEMORY_ATTRIBUTES_TABLE" in section 4.6 of UEFI specification > v2.6 for more information on this table. I guess this is v3 of your patch series right? Please be sure to include a version tag in future like you did for you v2.
Re: [RFC PATCH v3 10/20] Add support to access boot related data in the clear
On Wed, 09 Nov, at 06:36:31PM, Tom Lendacky wrote: > Boot data (such as EFI related data) is not encrypted when the system is > booted and needs to be accessed unencrypted. Add support to apply the > proper attributes to the EFI page tables and to the early_memremap and > memremap APIs to identify the type of data being accessed so that the > proper encryption attribute can be applied. > > Signed-off-by: Tom Lendacky> --- > arch/x86/include/asm/e820.h|1 > arch/x86/kernel/e820.c | 16 +++ > arch/x86/mm/ioremap.c | 89 > > arch/x86/platform/efi/efi_64.c | 12 - > drivers/firmware/efi/efi.c | 33 +++ > include/linux/efi.h|2 + > kernel/memremap.c |8 +++- > mm/early_ioremap.c | 18 +++- > 8 files changed, 172 insertions(+), 7 deletions(-) FWIW, I think this version is an improvement over all the previous ones. [...] > diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c > index ff542cd..ee347c2 100644 > --- a/arch/x86/mm/ioremap.c > +++ b/arch/x86/mm/ioremap.c > @@ -20,6 +20,9 @@ > #include > #include > #include > +#include > +#include > +#include > > #include "physaddr.h" > > @@ -418,6 +421,92 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr) > iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK)); > } > > +static bool memremap_setup_data(resource_size_t phys_addr, > + unsigned long size) > +{ > + u64 paddr; > + > + if (phys_addr == boot_params.hdr.setup_data) > + return true; > + Why is the setup_data linked list not traversed when checking for matching addresses? Am I reading this incorrectly? I don't see how this can work. > + paddr = boot_params.efi_info.efi_memmap_hi; > + paddr <<= 32; > + paddr |= boot_params.efi_info.efi_memmap; > + if (phys_addr == paddr) > + return true; > + > + paddr = boot_params.efi_info.efi_systab_hi; > + paddr <<= 32; > + paddr |= boot_params.efi_info.efi_systab; > + if (phys_addr == paddr) > + return true; > + > + if (efi_table_address_match(phys_addr)) > + return true; > + > + return false; > +} > + > +static bool memremap_apply_encryption(resource_size_t phys_addr, > + unsigned long size) > +{ > + /* SME is not active, just return true */ > + if (!sme_me_mask) > + return true; > + > + /* Check if the address is part of the setup data */ > + if (memremap_setup_data(phys_addr, size)) > + return false; > + > + /* Check if the address is part of EFI boot/runtime data */ > + switch (efi_mem_type(phys_addr)) { > + case EFI_BOOT_SERVICES_DATA: > + case EFI_RUNTIME_SERVICES_DATA: > + return false; > + } EFI_LOADER_DATA is notable by its absence. We use that memory type for allocations inside of the EFI boot stub that are than used while the kernel is running. One use that comes to mind is for initrd files, see handle_cmdline_files(). Oh I see you handle that in PATCH 9, never mind. > diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c > index 58b0f80..3f89179 100644 > --- a/arch/x86/platform/efi/efi_64.c > +++ b/arch/x86/platform/efi/efi_64.c > @@ -221,7 +221,13 @@ int __init efi_setup_page_tables(unsigned long > pa_memmap, unsigned num_pages) > if (efi_enabled(EFI_OLD_MEMMAP)) > return 0; > > - efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd); > + /* > + * Since the PGD is encrypted, set the encryption mask so that when > + * this value is loaded into cr3 the PGD will be decrypted during > + * the pagetable walk. > + */ > + efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd); > + > pgd = efi_pgd; > > /* Do all callers of __pa() in arch/x86 need fixing up like this?
Re: [RFC PATCH v3 10/20] Add support to access boot related data in the clear
On Wed, 09 Nov, at 06:36:31PM, Tom Lendacky wrote: > Boot data (such as EFI related data) is not encrypted when the system is > booted and needs to be accessed unencrypted. Add support to apply the > proper attributes to the EFI page tables and to the early_memremap and > memremap APIs to identify the type of data being accessed so that the > proper encryption attribute can be applied. > > Signed-off-by: Tom Lendacky > --- > arch/x86/include/asm/e820.h|1 > arch/x86/kernel/e820.c | 16 +++ > arch/x86/mm/ioremap.c | 89 > > arch/x86/platform/efi/efi_64.c | 12 - > drivers/firmware/efi/efi.c | 33 +++ > include/linux/efi.h|2 + > kernel/memremap.c |8 +++- > mm/early_ioremap.c | 18 +++- > 8 files changed, 172 insertions(+), 7 deletions(-) FWIW, I think this version is an improvement over all the previous ones. [...] > diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c > index ff542cd..ee347c2 100644 > --- a/arch/x86/mm/ioremap.c > +++ b/arch/x86/mm/ioremap.c > @@ -20,6 +20,9 @@ > #include > #include > #include > +#include > +#include > +#include > > #include "physaddr.h" > > @@ -418,6 +421,92 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr) > iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK)); > } > > +static bool memremap_setup_data(resource_size_t phys_addr, > + unsigned long size) > +{ > + u64 paddr; > + > + if (phys_addr == boot_params.hdr.setup_data) > + return true; > + Why is the setup_data linked list not traversed when checking for matching addresses? Am I reading this incorrectly? I don't see how this can work. > + paddr = boot_params.efi_info.efi_memmap_hi; > + paddr <<= 32; > + paddr |= boot_params.efi_info.efi_memmap; > + if (phys_addr == paddr) > + return true; > + > + paddr = boot_params.efi_info.efi_systab_hi; > + paddr <<= 32; > + paddr |= boot_params.efi_info.efi_systab; > + if (phys_addr == paddr) > + return true; > + > + if (efi_table_address_match(phys_addr)) > + return true; > + > + return false; > +} > + > +static bool memremap_apply_encryption(resource_size_t phys_addr, > + unsigned long size) > +{ > + /* SME is not active, just return true */ > + if (!sme_me_mask) > + return true; > + > + /* Check if the address is part of the setup data */ > + if (memremap_setup_data(phys_addr, size)) > + return false; > + > + /* Check if the address is part of EFI boot/runtime data */ > + switch (efi_mem_type(phys_addr)) { > + case EFI_BOOT_SERVICES_DATA: > + case EFI_RUNTIME_SERVICES_DATA: > + return false; > + } EFI_LOADER_DATA is notable by its absence. We use that memory type for allocations inside of the EFI boot stub that are than used while the kernel is running. One use that comes to mind is for initrd files, see handle_cmdline_files(). Oh I see you handle that in PATCH 9, never mind. > diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c > index 58b0f80..3f89179 100644 > --- a/arch/x86/platform/efi/efi_64.c > +++ b/arch/x86/platform/efi/efi_64.c > @@ -221,7 +221,13 @@ int __init efi_setup_page_tables(unsigned long > pa_memmap, unsigned num_pages) > if (efi_enabled(EFI_OLD_MEMMAP)) > return 0; > > - efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd); > + /* > + * Since the PGD is encrypted, set the encryption mask so that when > + * this value is loaded into cr3 the PGD will be decrypted during > + * the pagetable walk. > + */ > + efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd); > + > pgd = efi_pgd; > > /* Do all callers of __pa() in arch/x86 need fixing up like this?
Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
On Mon, 05 Dec, at 10:27:36AM, Vincent Guittot wrote: > > Hi Matt, > > Thanks for the results. > > During the review, it has been pointed out by Morten that the test condition > (100*this_avg_load < imbalance_scale*min_avg_load) makes more sense than > (100*min_avg_load > imbalance_scale*this_avg_load). But i see lower > performances with this change. Coud you run tests with the change below on > top of the patchset ? > > --- > kernel/sched/fair.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index e8d1ae7..0129fbb 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5514,7 +5514,7 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, > if (!idlest || > (min_runnable_load > (this_runnable_load + imbalance)) || > ((this_runnable_load < (min_runnable_load + imbalance)) && > - (100*min_avg_load > imbalance_scale*this_avg_load))) > + (100*this_avg_load < imbalance_scale*min_avg_load))) > return NULL; > return idlest; > } Queued for testing.
Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
On Mon, 05 Dec, at 10:27:36AM, Vincent Guittot wrote: > > Hi Matt, > > Thanks for the results. > > During the review, it has been pointed out by Morten that the test condition > (100*this_avg_load < imbalance_scale*min_avg_load) makes more sense than > (100*min_avg_load > imbalance_scale*this_avg_load). But i see lower > performances with this change. Coud you run tests with the change below on > top of the patchset ? > > --- > kernel/sched/fair.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index e8d1ae7..0129fbb 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5514,7 +5514,7 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, > if (!idlest || > (min_runnable_load > (this_runnable_load + imbalance)) || > ((this_runnable_load < (min_runnable_load + imbalance)) && > - (100*min_avg_load > imbalance_scale*this_avg_load))) > + (100*this_avg_load < imbalance_scale*min_avg_load))) > return NULL; > return idlest; > } Queued for testing.
Re: [tip:efi/core] efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit
On Sun, 04 Dec, at 02:31:23PM, Ard Biesheuvel wrote: > On 4 December 2016 at 14:17, Matt Fleming <m...@codeblueprint.co.uk> wrote: > > > > Ard, was this picked up for the correct tip branch? If it fixes a > > build failure it should have gone into tip/efi/urgent, right? > > The failure was in -next, with a patch queued up for v4.10, so that is > where the fix went as well. Oops, sorry I missed that. I was looking at the v4.11 queue and my brain suffered an off-by-one bug - I thought we'd already had the v4.10 release.
Re: [tip:efi/core] efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit
On Sun, 04 Dec, at 02:31:23PM, Ard Biesheuvel wrote: > On 4 December 2016 at 14:17, Matt Fleming wrote: > > > > Ard, was this picked up for the correct tip branch? If it fixes a > > build failure it should have gone into tip/efi/urgent, right? > > The failure was in -next, with a patch queued up for v4.10, so that is > where the fix went as well. Oops, sorry I missed that. I was looking at the v4.11 queue and my brain suffered an off-by-one bug - I thought we'd already had the v4.10 release.
Re: [tip:efi/core] efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit
On Thu, 24 Nov, at 10:47:15PM, tip-bot for Ard Biesheuvel wrote: > Commit-ID: 018edcfac4c3b140366ad51b0907f3becb5bb624 > Gitweb: http://git.kernel.org/tip/018edcfac4c3b140366ad51b0907f3becb5bb624 > Author: Ard Biesheuvel <ard.biesheu...@linaro.org> > AuthorDate: Thu, 24 Nov 2016 18:02:23 + > Committer: Ingo Molnar <mi...@kernel.org> > CommitDate: Fri, 25 Nov 2016 07:15:23 +0100 > > efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit > > The UEFI stub executes in the context of the firmware, which identity > maps the available system RAM, which implies that only memory below > 4 GB can be used for allocations on 32-bit architectures, even on [L]PAE > capable hardware. > > So ignore any reported memory above 4 GB in efi_random_alloc(). This > also fixes a reported build problem on ARM under -Os, where the 64-bit > logical shift relies on a software routine that the ARM decompressor does > not provide. > > A second [minor] issue is also fixed, where the '+ 1' is moved out of > the shift, where it belongs: the reason for its presence is that a > memory region where start == end should count as a single slot, given > that 'end' takes the desired size and alignment of the allocation into > account. > > To clarify the code in this regard, rename start/end to 'first_slot' and > 'last_slot', respectively, and introduce 'region_end' to describe the > last usable address of the current region. > > Reported-by: Arnd Bergmann <a...@arndb.de> > Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> > Cc: Linus Torvalds <torva...@linux-foundation.org> > Cc: Matt Fleming <m...@codeblueprint.co.uk> > Cc: Peter Zijlstra <pet...@infradead.org> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: linux-...@vger.kernel.org > Link: > http://lkml.kernel.org/r/1480010543-25709-2-git-send-email-ard.biesheu...@linaro.org > Signed-off-by: Ingo Molnar <mi...@kernel.org> > --- > drivers/firmware/efi/libstub/random.c | 13 +++-- > 1 file changed, 7 insertions(+), 6 deletions(-) Ard, was this picked up for the correct tip branch? If it fixes a build failure it should have gone into tip/efi/urgent, right?
Re: [tip:efi/core] efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit
On Thu, 24 Nov, at 10:47:15PM, tip-bot for Ard Biesheuvel wrote: > Commit-ID: 018edcfac4c3b140366ad51b0907f3becb5bb624 > Gitweb: http://git.kernel.org/tip/018edcfac4c3b140366ad51b0907f3becb5bb624 > Author: Ard Biesheuvel > AuthorDate: Thu, 24 Nov 2016 18:02:23 + > Committer: Ingo Molnar > CommitDate: Fri, 25 Nov 2016 07:15:23 +0100 > > efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit > > The UEFI stub executes in the context of the firmware, which identity > maps the available system RAM, which implies that only memory below > 4 GB can be used for allocations on 32-bit architectures, even on [L]PAE > capable hardware. > > So ignore any reported memory above 4 GB in efi_random_alloc(). This > also fixes a reported build problem on ARM under -Os, where the 64-bit > logical shift relies on a software routine that the ARM decompressor does > not provide. > > A second [minor] issue is also fixed, where the '+ 1' is moved out of > the shift, where it belongs: the reason for its presence is that a > memory region where start == end should count as a single slot, given > that 'end' takes the desired size and alignment of the allocation into > account. > > To clarify the code in this regard, rename start/end to 'first_slot' and > 'last_slot', respectively, and introduce 'region_end' to describe the > last usable address of the current region. > > Reported-by: Arnd Bergmann > Signed-off-by: Ard Biesheuvel > Cc: Linus Torvalds > Cc: Matt Fleming > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Cc: linux-...@vger.kernel.org > Link: > http://lkml.kernel.org/r/1480010543-25709-2-git-send-email-ard.biesheu...@linaro.org > Signed-off-by: Ingo Molnar > --- > drivers/firmware/efi/libstub/random.c | 13 +++-- > 1 file changed, 7 insertions(+), 6 deletions(-) Ard, was this picked up for the correct tip branch? If it fixes a build failure it should have gone into tip/efi/urgent, right?
Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork
On Fri, 25 Nov, at 04:34:32PM, Vincent Guittot wrote: > During fork, the utilization of a task is init once the rq has been > selected because the current utilization level of the rq is used to set > the utilization of the fork task. As the task's utilization is still > null at this step of the fork sequence, it doesn't make sense to look for > some spare capacity that can fit the task's utilization. > Furthermore, I can see perf regressions for the test "hackbench -P -g 1" > because the least loaded policy is always bypassed and tasks are not > spread during fork. > > With this patch and the fix below, we are back to same performances as > for v4.8. The fix below is only a temporary one used for the test until a > smarter solution is found because we can't simply remove the test which is > useful for others benchmarks > > @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, > struct sched_domain *sd, int t > > avg_cost = this_sd->avg_scan_cost; > > - /* > - * Due to large variance we need a large fuzz factor; hackbench in > - * particularly is sensitive here. > - */ > - if ((avg_idle / 512) < avg_cost) > - return -1; > - > time = local_clock(); > > for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) { > OK, I need to point out that I didn't apply the above hunk when testing this patch series. But I wouldn't have expected that to impact our fork-intensive workloads so much. Let me know if you'd like me to re-run with it applied. I don't see much of a difference, positive or negative, for the majority of the test machines, it's mainly a wash. However, the following 4-cpu Xeon E5504 machine does show a nice win, with thread counts in the mid-range (note, the second column is number of hackbench groups, where each group has 40 tasks), hackbench-process-pipes 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-sched fix-fig-for-fork fix-sig Amean1 0.2193 ( 0.00%) 0.2014 ( 8.14%) 0.1746 ( 20.39%) Amean3 0.4489 ( 0.00%) 0.3544 ( 21.04%) 0.3284 ( 26.83%) Amean5 0.6173 ( 0.00%) 0.4690 ( 24.02%) 0.4977 ( 19.37%) Amean7 0.7323 ( 0.00%) 0.6367 ( 13.05%) 0.6267 ( 14.42%) Amean12 0.9716 ( 0.00%) 1.0187 ( -4.85%) 0.9351 ( 3.75%) Amean16 1.2866 ( 0.00%) 1.2664 ( 1.57%) 1.2131 ( 5.71%)
Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork
On Fri, 25 Nov, at 04:34:32PM, Vincent Guittot wrote: > During fork, the utilization of a task is init once the rq has been > selected because the current utilization level of the rq is used to set > the utilization of the fork task. As the task's utilization is still > null at this step of the fork sequence, it doesn't make sense to look for > some spare capacity that can fit the task's utilization. > Furthermore, I can see perf regressions for the test "hackbench -P -g 1" > because the least loaded policy is always bypassed and tasks are not > spread during fork. > > With this patch and the fix below, we are back to same performances as > for v4.8. The fix below is only a temporary one used for the test until a > smarter solution is found because we can't simply remove the test which is > useful for others benchmarks > > @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, > struct sched_domain *sd, int t > > avg_cost = this_sd->avg_scan_cost; > > - /* > - * Due to large variance we need a large fuzz factor; hackbench in > - * particularly is sensitive here. > - */ > - if ((avg_idle / 512) < avg_cost) > - return -1; > - > time = local_clock(); > > for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) { > OK, I need to point out that I didn't apply the above hunk when testing this patch series. But I wouldn't have expected that to impact our fork-intensive workloads so much. Let me know if you'd like me to re-run with it applied. I don't see much of a difference, positive or negative, for the majority of the test machines, it's mainly a wash. However, the following 4-cpu Xeon E5504 machine does show a nice win, with thread counts in the mid-range (note, the second column is number of hackbench groups, where each group has 40 tasks), hackbench-process-pipes 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-sched fix-fig-for-fork fix-sig Amean1 0.2193 ( 0.00%) 0.2014 ( 8.14%) 0.1746 ( 20.39%) Amean3 0.4489 ( 0.00%) 0.3544 ( 21.04%) 0.3284 ( 26.83%) Amean5 0.6173 ( 0.00%) 0.4690 ( 24.02%) 0.4977 ( 19.37%) Amean7 0.7323 ( 0.00%) 0.6367 ( 13.05%) 0.6267 ( 14.42%) Amean12 0.9716 ( 0.00%) 1.0187 ( -4.85%) 0.9351 ( 3.75%) Amean16 1.2866 ( 0.00%) 1.2664 ( 1.57%) 1.2131 ( 5.71%)
Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
On Fri, 02 Dec, at 07:31:04PM, Brendan Gregg wrote: > > For background, is this from the "A decade of wasted cores" paper's > patches? No, this patch fixes an issue I originally reported here, https://lkml.kernel.org/r/20160923115808.2330-1-m...@codeblueprint.co.uk Essentially, if you have an idle or partially-idle system and a workload that consists of fork()'ing a bunch of tasks, where each of those tasks immediately sleeps waiting for some wakeup, then those tasks aren't spread across all idle CPUs very well. We saw this issue when running hackbench with a small loop count, such that the actual benchmark setup (fork()'ing) is where the majority of the runtime is spent. In that scenario, there's a large potential/blocked load, but essentially no runnable load, and the balance on fork scheduler code only cares about runnable load without Vincent's patch applied. The closest thing I can find in the "A decade of wasted cores" paper is "The Overload-on-Wakeup bug", but I don't think that's the issue here since, a) We're balancing on fork, not wakeup b) The fork on balance code balances across nodes OK > What's the expected typical gain? Thanks, The results are still coming back from the SUSE performance test grid but they do show that this patch is mainly a win for multi-socket machines with more than 8 cores or thereabouts. [ Vincent, I'll follow up to your PATCH 1/2 with the results that are specifically for that patch ] Assuming a fork-intensive or fork-dominated workload, and a multi-socket machine, such as this 2 socket, NUMA, with 12 cores and HT enabled (48 cpus), we saw a very clear win between +10% and +15% for processes communicating via pipes, (1) tip-sched = tip/sched/core branch (2) fix-fig-for-fork = (1) + PATCH 1/2 (3) fix-sig = (1) + (2) + PATCH 2/2 hackbench-process-pipes 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-sched fix-fig-for-fork fix-sig Amean10.0717 ( 0.00%) 0.0696 ( 2.99%) 0.0730 ( -1.79%) Amean40.1244 ( 0.00%) 0.1200 ( 3.56%) 0.1190 ( 4.36%) Amean70.1891 ( 0.00%) 0.1937 ( -2.42%) 0.1831 ( 3.17%) Amean12 0.2964 ( 0.00%) 0.3116 ( -5.11%) 0.2784 ( 6.07%) Amean21 0.4011 ( 0.00%) 0.4090 ( -1.96%) 0.3574 ( 10.90%) Amean30 0.4944 ( 0.00%) 0.4654 ( 5.87%) 0.4171 ( 15.63%) Amean48 0.6113 ( 0.00%) 0.6309 ( -3.20%) 0.5331 ( 12.78%) Amean79 0.8616 ( 0.00%) 0.8706 ( -1.04%) 0.7710 ( 10.51%) Amean110 1.1304 ( 0.00%) 1.2211 ( -8.02%) 1.0163 ( 10.10%) Amean141 1.3754 ( 0.00%) 1.4279 ( -3.81%) 1.2803 ( 6.92%) Amean172 1.6217 ( 0.00%) 1.7367 ( -7.09%) 1.5363 ( 5.27%) Amean192 1.7809 ( 0.00%) 2.0199 (-13.42%) 1.7129 ( 3.82%) Things look even better when using threads and pipes, with wins between 11% and 29% when looking at results outside of the noise, hackbench-thread-pipes 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-sched fix-fig-for-fork fix-sig Amean10.0736 ( 0.00%) 0.0794 ( -7.96%) 0.0779 ( -5.83%) Amean40.1709 ( 0.00%) 0.1690 ( 1.09%) 0.1663 ( 2.68%) Amean70.2836 ( 0.00%) 0.3080 ( -8.61%) 0.2640 ( 6.90%) Amean12 0.4393 ( 0.00%) 0.4843 (-10.24%) 0.4090 ( 6.89%) Amean21 0.5821 ( 0.00%) 0.6369 ( -9.40%) 0.5126 ( 11.95%) Amean30 0.6557 ( 0.00%) 0.6459 ( 1.50%) 0.5711 ( 12.90%) Amean48 0.7924 ( 0.00%) 0.7760 ( 2.07%) 0.6286 ( 20.68%) Amean79 1.0534 ( 0.00%) 1.0551 ( -0.16%) 0.8481 ( 19.49%) Amean110 1.5286 ( 0.00%) 1.4504 ( 5.11%) 1.1121 ( 27.24%) Amean141 1.9507 ( 0.00%) 1.7790 ( 8.80%) 1.3804 ( 29.23%) Amean172 2.2261 ( 0.00%) 2.3330 ( -4.80%) 1.6336 ( 26.62%) Amean192 2.3753 ( 0.00%) 2.3307 ( 1.88%) 1.8246 ( 23.19%) Somewhat surprisingly, I can see improvements for UMA machines with fewer cores when the workload heavily saturates the machine and the workload isn't dominated by fork. Such heavy saturation isn't super realistic, but still interesting. I haven't dug into why these results occurred, but I am happy things didn't instead fall off a cliff. Here's a 4-cpu UMA box showing some improvement at the higher end, hackbench-process-pipes 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-sched fix-fig-for-fork fix-sig Amean1 3.5060 ( 0.00%) 3.5747 ( -1.96%) 3.5117 ( -0.16%) Amean3 7.7113 ( 0.00%) 7.8160 ( -1.36%) 7.7747 ( -0.82%) Amean5 11.4453 ( 0.00%)
Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
On Fri, 02 Dec, at 07:31:04PM, Brendan Gregg wrote: > > For background, is this from the "A decade of wasted cores" paper's > patches? No, this patch fixes an issue I originally reported here, https://lkml.kernel.org/r/20160923115808.2330-1-m...@codeblueprint.co.uk Essentially, if you have an idle or partially-idle system and a workload that consists of fork()'ing a bunch of tasks, where each of those tasks immediately sleeps waiting for some wakeup, then those tasks aren't spread across all idle CPUs very well. We saw this issue when running hackbench with a small loop count, such that the actual benchmark setup (fork()'ing) is where the majority of the runtime is spent. In that scenario, there's a large potential/blocked load, but essentially no runnable load, and the balance on fork scheduler code only cares about runnable load without Vincent's patch applied. The closest thing I can find in the "A decade of wasted cores" paper is "The Overload-on-Wakeup bug", but I don't think that's the issue here since, a) We're balancing on fork, not wakeup b) The fork on balance code balances across nodes OK > What's the expected typical gain? Thanks, The results are still coming back from the SUSE performance test grid but they do show that this patch is mainly a win for multi-socket machines with more than 8 cores or thereabouts. [ Vincent, I'll follow up to your PATCH 1/2 with the results that are specifically for that patch ] Assuming a fork-intensive or fork-dominated workload, and a multi-socket machine, such as this 2 socket, NUMA, with 12 cores and HT enabled (48 cpus), we saw a very clear win between +10% and +15% for processes communicating via pipes, (1) tip-sched = tip/sched/core branch (2) fix-fig-for-fork = (1) + PATCH 1/2 (3) fix-sig = (1) + (2) + PATCH 2/2 hackbench-process-pipes 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-sched fix-fig-for-fork fix-sig Amean10.0717 ( 0.00%) 0.0696 ( 2.99%) 0.0730 ( -1.79%) Amean40.1244 ( 0.00%) 0.1200 ( 3.56%) 0.1190 ( 4.36%) Amean70.1891 ( 0.00%) 0.1937 ( -2.42%) 0.1831 ( 3.17%) Amean12 0.2964 ( 0.00%) 0.3116 ( -5.11%) 0.2784 ( 6.07%) Amean21 0.4011 ( 0.00%) 0.4090 ( -1.96%) 0.3574 ( 10.90%) Amean30 0.4944 ( 0.00%) 0.4654 ( 5.87%) 0.4171 ( 15.63%) Amean48 0.6113 ( 0.00%) 0.6309 ( -3.20%) 0.5331 ( 12.78%) Amean79 0.8616 ( 0.00%) 0.8706 ( -1.04%) 0.7710 ( 10.51%) Amean110 1.1304 ( 0.00%) 1.2211 ( -8.02%) 1.0163 ( 10.10%) Amean141 1.3754 ( 0.00%) 1.4279 ( -3.81%) 1.2803 ( 6.92%) Amean172 1.6217 ( 0.00%) 1.7367 ( -7.09%) 1.5363 ( 5.27%) Amean192 1.7809 ( 0.00%) 2.0199 (-13.42%) 1.7129 ( 3.82%) Things look even better when using threads and pipes, with wins between 11% and 29% when looking at results outside of the noise, hackbench-thread-pipes 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-sched fix-fig-for-fork fix-sig Amean10.0736 ( 0.00%) 0.0794 ( -7.96%) 0.0779 ( -5.83%) Amean40.1709 ( 0.00%) 0.1690 ( 1.09%) 0.1663 ( 2.68%) Amean70.2836 ( 0.00%) 0.3080 ( -8.61%) 0.2640 ( 6.90%) Amean12 0.4393 ( 0.00%) 0.4843 (-10.24%) 0.4090 ( 6.89%) Amean21 0.5821 ( 0.00%) 0.6369 ( -9.40%) 0.5126 ( 11.95%) Amean30 0.6557 ( 0.00%) 0.6459 ( 1.50%) 0.5711 ( 12.90%) Amean48 0.7924 ( 0.00%) 0.7760 ( 2.07%) 0.6286 ( 20.68%) Amean79 1.0534 ( 0.00%) 1.0551 ( -0.16%) 0.8481 ( 19.49%) Amean110 1.5286 ( 0.00%) 1.4504 ( 5.11%) 1.1121 ( 27.24%) Amean141 1.9507 ( 0.00%) 1.7790 ( 8.80%) 1.3804 ( 29.23%) Amean172 2.2261 ( 0.00%) 2.3330 ( -4.80%) 1.6336 ( 26.62%) Amean192 2.3753 ( 0.00%) 2.3307 ( 1.88%) 1.8246 ( 23.19%) Somewhat surprisingly, I can see improvements for UMA machines with fewer cores when the workload heavily saturates the machine and the workload isn't dominated by fork. Such heavy saturation isn't super realistic, but still interesting. I haven't dug into why these results occurred, but I am happy things didn't instead fall off a cliff. Here's a 4-cpu UMA box showing some improvement at the higher end, hackbench-process-pipes 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6 tip-sched fix-fig-for-fork fix-sig Amean1 3.5060 ( 0.00%) 3.5747 ( -1.96%) 3.5117 ( -0.16%) Amean3 7.7113 ( 0.00%) 7.8160 ( -1.36%) 7.7747 ( -0.82%) Amean5 11.4453 ( 0.00%)
Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
On Fri, 02 Dec, at 04:20:54PM, Vincent Guittot wrote: > > Matt, > > Have you been able to get some results for the patchset ? Sorry, I messed up the test config and the tests never ran :( I've redone everything and fast-tracked the results on the SUSE grid now. Should have results by morning.
Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
On Fri, 02 Dec, at 04:20:54PM, Vincent Guittot wrote: > > Matt, > > Have you been able to get some results for the patchset ? Sorry, I messed up the test config and the tests never ran :( I've redone everything and fast-tracked the results on the SUSE grid now. Should have results by morning.
Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork
On Tue, 29 Nov, at 12:42:43PM, Peter Zijlstra wrote: > > IIRC, and my pounding head really doesn't remember much, the comment > reads like we need the large fudge factor because hackbench. That is, > hackbench would like this test to go away, but others benchmarks will > tank. > > Now, if only I would've written down which benchmarks that were.. awell. Going out on a limb: Chris' schbench?
Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork
On Tue, 29 Nov, at 12:42:43PM, Peter Zijlstra wrote: > > IIRC, and my pounding head really doesn't remember much, the comment > reads like we need the large fudge factor because hackbench. That is, > hackbench would like this test to go away, but others benchmarks will > tank. > > Now, if only I would've written down which benchmarks that were.. awell. Going out on a limb: Chris' schbench?
Re: [PATCH 0/2 v2] sched: improve spread of tasks during fork
On Fri, 25 Nov, at 04:34:31PM, Vincent Guittot wrote: > This patchset was originally 1 patch but a perf regression happened during > the rebase. > The patch 01 fixes the perf regression discovered in tip/sched/core during the > rebase. > The patch 02 is the rebase of the patch that fixes the regression raised by > Matt Fleming [1]. > > [1] https://lkml.org/lkml/2016/10/18/206 > > Vincent Guittot (2): > sched: fix find_idlest_group for fork > sched: use load_avg for selecting idlest group > > kernel/sched/fair.c | 54 > +++-- > 1 file changed, 44 insertions(+), 10 deletions(-) FYI, I'm running both of these patches through the SUSE performance testing grid right now. I'll let you know as soon as I have results (should be in the next few days).
Re: [PATCH 0/2 v2] sched: improve spread of tasks during fork
On Fri, 25 Nov, at 04:34:31PM, Vincent Guittot wrote: > This patchset was originally 1 patch but a perf regression happened during > the rebase. > The patch 01 fixes the perf regression discovered in tip/sched/core during the > rebase. > The patch 02 is the rebase of the patch that fixes the regression raised by > Matt Fleming [1]. > > [1] https://lkml.org/lkml/2016/10/18/206 > > Vincent Guittot (2): > sched: fix find_idlest_group for fork > sched: use load_avg for selecting idlest group > > kernel/sched/fair.c | 54 > +++-- > 1 file changed, 44 insertions(+), 10 deletions(-) FYI, I'm running both of these patches through the SUSE performance testing grid right now. I'll let you know as soon as I have results (should be in the next few days).
Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork
On Fri, 25 Nov, at 04:34:32PM, Vincent Guittot wrote: > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index aa47589..820a787 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5463,13 +5463,19 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, >* utilized systems if we require spare_capacity > task_util(p), >* so we allow for some task stuffing by using >* spare_capacity > task_util(p)/2. > + * spare capacity can't be used for fork because the utilization has > + * not been set yet as it need to get a rq to init the utilization >*/ > + if (sd_flag & SD_BALANCE_FORK) > + goto no_spare; > + > if (this_spare > task_util(p) / 2 && > imbalance*this_spare > 100*most_spare) > return NULL; > else if (most_spare > task_util(p) / 2) > return most_spare_sg; > > +no_spare: > if (!idlest || 100*this_load < imbalance*min_load) > return NULL; > return idlest; It's only a minor comment, but would you be opposed to calling this label 'skip_spare' to indicate that spare capacity may exist, but we're not going to make use of it?
Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork
On Fri, 25 Nov, at 04:34:32PM, Vincent Guittot wrote: > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index aa47589..820a787 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5463,13 +5463,19 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, >* utilized systems if we require spare_capacity > task_util(p), >* so we allow for some task stuffing by using >* spare_capacity > task_util(p)/2. > + * spare capacity can't be used for fork because the utilization has > + * not been set yet as it need to get a rq to init the utilization >*/ > + if (sd_flag & SD_BALANCE_FORK) > + goto no_spare; > + > if (this_spare > task_util(p) / 2 && > imbalance*this_spare > 100*most_spare) > return NULL; > else if (most_spare > task_util(p) / 2) > return most_spare_sg; > > +no_spare: > if (!idlest || 100*this_load < imbalance*min_load) > return NULL; > return idlest; It's only a minor comment, but would you be opposed to calling this label 'skip_spare' to indicate that spare capacity may exist, but we're not going to make use of it?
Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
On Mon, 14 Nov, at 02:55:22PM, Ingo Molnar wrote: > > The problem is not that Ard applied the patches, but that you subsequently > rebased > the tree. For example: > > commit bf5d1f98c1d8be04a40eabb9dd6913347b1b3fc4 > Author: Ard Biesheuvel <ard.biesheu...@linaro.org> > AuthorDate: Thu Oct 20 12:21:26 2016 +0100 > Commit: Matt Fleming <m...@codeblueprint.co.uk> > CommitDate: Sat Nov 12 21:14:41 2016 + > > efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table > > Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and > install the Linux-specific RNG seed UEFI config table. This will be > picked up by the EFI routines in the core kernel to seed the kernel > entropy pool. > > Cc: Matt Fleming <m...@codeblueprint.co.uk> > Reviewed-by: Kees Cook <keesc...@chromium.org> > Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> Ah yes, this is exactly what happened. > if you rebase it (with your co-maintainer's permission) then you need to add > your > SoB tag. OK, will do so in future. Thanks Ingo.
Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
On Mon, 14 Nov, at 02:55:22PM, Ingo Molnar wrote: > > The problem is not that Ard applied the patches, but that you subsequently > rebased > the tree. For example: > > commit bf5d1f98c1d8be04a40eabb9dd6913347b1b3fc4 > Author: Ard Biesheuvel > AuthorDate: Thu Oct 20 12:21:26 2016 +0100 > Commit: Matt Fleming > CommitDate: Sat Nov 12 21:14:41 2016 + > > efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table > > Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and > install the Linux-specific RNG seed UEFI config table. This will be > picked up by the EFI routines in the core kernel to seed the kernel > entropy pool. > > Cc: Matt Fleming > Reviewed-by: Kees Cook > Signed-off-by: Ard Biesheuvel Ah yes, this is exactly what happened. > if you rebase it (with your co-maintainer's permission) then you need to add > your > SoB tag. OK, will do so in future. Thanks Ingo.
Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
On Sun, 13 Nov, at 09:59:32AM, Ingo Molnar wrote: > > * Ingo Molnarwrote: > > > I'll apply the patches from email and add your SOB. > > Note that the attached config produces this build error: > > drivers/firmware/efi/apple-properties.c:149:9: error: implicit declaration of > function ‘efi_get_device_by_path’ [-Werror=implicit-function-declaration] > > also: > > warning: (THUNDERBOLT) selects APPLE_PROPERTIES which has unmet direct > dependencies (EFI && EFI_STUB && X86) Oops. Lukas, could you take a look at this (config attached). # # Automatically generated file; DO NOT EDIT. # Linux/i386 4.9.0-rc4 Kernel Configuration # # CONFIG_64BIT is not set CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_BITS_MAX=16 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_SMP=y CONFIG_X86_32_LAZY_GS=y CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_DEBUG_RODATA=y CONFIG_PGTABLE_LEVELS=3 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_CONSTRUCTORS=y CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" CONFIG_COMPILE_TEST=y CONFIG_LOCALVERSION="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_FHANDLE=y CONFIG_USELIB=y CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_GENERIC_MSI_IRQ_DOMAIN=y # CONFIG_IRQ_DOMAIN_DEBUG is not set CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_IRQ_TIME_ACCOUNTING is not set CONFIG_BSD_PROCESS_ACCT=y CONFIG_BSD_PROCESS_ACCT_V3=y CONFIG_TASKSTATS=y # CONFIG_TASK_DELAY_ACCT is not set CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # # RCU Subsystem # CONFIG_TREE_RCU=y CONFIG_RCU_EXPERT=y CONFIG_SRCU=y # CONFIG_TASKS_RCU is not set CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_FANOUT=32 CONFIG_RCU_FANOUT_LEAF=16 # CONFIG_RCU_FAST_NO_HZ is not set # CONFIG_TREE_RCU_TRACE is not set CONFIG_RCU_KTHREAD_PRIO=0 CONFIG_RCU_NOCB_CPU=y # CONFIG_RCU_NOCB_CPU_NONE is not set CONFIG_RCU_NOCB_CPU_ZERO=y # CONFIG_RCU_NOCB_CPU_ALL is not set # CONFIG_RCU_EXPEDITE_BOOT is not set CONFIG_BUILD_BIN2C=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=20 CONFIG_LOG_CPU_MAX_BUF_SHIFT=12 CONFIG_NMI_LOG_BUF_SHIFT=13 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y CONFIG_CGROUPS=y # CONFIG_MEMCG is not set # CONFIG_BLK_CGROUP is not set CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_CFS_BANDWIDTH=y # CONFIG_RT_GROUP_SCHED is not set # CONFIG_CGROUP_PIDS is not set # CONFIG_CGROUP_FREEZER is not set # CONFIG_CPUSETS is not set # CONFIG_CGROUP_DEVICE is not set # CONFIG_CGROUP_CPUACCT is not set CONFIG_CGROUP_PERF=y CONFIG_CGROUP_DEBUG=y # CONFIG_CHECKPOINT_RESTORE is not set CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_IPC_NS is not set # CONFIG_USER_NS is not set CONFIG_PID_NS=y # CONFIG_NET_NS is not set
Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
On Sun, 13 Nov, at 09:59:32AM, Ingo Molnar wrote: > > * Ingo Molnar wrote: > > > I'll apply the patches from email and add your SOB. > > Note that the attached config produces this build error: > > drivers/firmware/efi/apple-properties.c:149:9: error: implicit declaration of > function ‘efi_get_device_by_path’ [-Werror=implicit-function-declaration] > > also: > > warning: (THUNDERBOLT) selects APPLE_PROPERTIES which has unmet direct > dependencies (EFI && EFI_STUB && X86) Oops. Lukas, could you take a look at this (config attached). # # Automatically generated file; DO NOT EDIT. # Linux/i386 4.9.0-rc4 Kernel Configuration # # CONFIG_64BIT is not set CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_BITS_MAX=16 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_SMP=y CONFIG_X86_32_LAZY_GS=y CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_DEBUG_RODATA=y CONFIG_PGTABLE_LEVELS=3 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_CONSTRUCTORS=y CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" CONFIG_COMPILE_TEST=y CONFIG_LOCALVERSION="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_FHANDLE=y CONFIG_USELIB=y CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_GENERIC_MSI_IRQ_DOMAIN=y # CONFIG_IRQ_DOMAIN_DEBUG is not set CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_IRQ_TIME_ACCOUNTING is not set CONFIG_BSD_PROCESS_ACCT=y CONFIG_BSD_PROCESS_ACCT_V3=y CONFIG_TASKSTATS=y # CONFIG_TASK_DELAY_ACCT is not set CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # # RCU Subsystem # CONFIG_TREE_RCU=y CONFIG_RCU_EXPERT=y CONFIG_SRCU=y # CONFIG_TASKS_RCU is not set CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_FANOUT=32 CONFIG_RCU_FANOUT_LEAF=16 # CONFIG_RCU_FAST_NO_HZ is not set # CONFIG_TREE_RCU_TRACE is not set CONFIG_RCU_KTHREAD_PRIO=0 CONFIG_RCU_NOCB_CPU=y # CONFIG_RCU_NOCB_CPU_NONE is not set CONFIG_RCU_NOCB_CPU_ZERO=y # CONFIG_RCU_NOCB_CPU_ALL is not set # CONFIG_RCU_EXPEDITE_BOOT is not set CONFIG_BUILD_BIN2C=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=20 CONFIG_LOG_CPU_MAX_BUF_SHIFT=12 CONFIG_NMI_LOG_BUF_SHIFT=13 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y CONFIG_CGROUPS=y # CONFIG_MEMCG is not set # CONFIG_BLK_CGROUP is not set CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_CFS_BANDWIDTH=y # CONFIG_RT_GROUP_SCHED is not set # CONFIG_CGROUP_PIDS is not set # CONFIG_CGROUP_FREEZER is not set # CONFIG_CPUSETS is not set # CONFIG_CGROUP_DEVICE is not set # CONFIG_CGROUP_CPUACCT is not set CONFIG_CGROUP_PERF=y CONFIG_CGROUP_DEBUG=y # CONFIG_CHECKPOINT_RESTORE is not set CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_IPC_NS is not set # CONFIG_USER_NS is not set CONFIG_PID_NS=y # CONFIG_NET_NS is not set CONFIG_SCHED_AUTOGROUP=y
Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
On Sun, 13 Nov, at 08:19:39AM, Ingo Molnar wrote: > > * Matt Fleming <m...@codeblueprint.co.uk> wrote: > > > From: Ard Biesheuvel <ard.biesheu...@linaro.org> > > > > Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and > > install the Linux-specific RNG seed UEFI config table. This will be > > picked up by the EFI routines in the core kernel to seed the kernel > > entropy pool. > > > > Cc: Matt Fleming <m...@codeblueprint.co.uk> > > Reviewed-by: Kees Cook <keesc...@chromium.org> > > Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> > > This commit (and the commits after this one) doesn't have a proper signoff > chain, > probably due to rebasing? Argh, my bad. This is fallout from moving to the co-maintainer model. My scripts assume they don't need to append a SoB because that was handled when applying the patch to the git tree. But that obviously doesn't hold if Ard applies the patch to git, but I mail out the patches as part of the pull request (or vice versa). I guess in future you'd wanna see the SoB of the person mailing the patches, right? > I'll apply the patches from email and add your SOB. Thanks.
Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
On Sun, 13 Nov, at 08:19:39AM, Ingo Molnar wrote: > > * Matt Fleming wrote: > > > From: Ard Biesheuvel > > > > Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and > > install the Linux-specific RNG seed UEFI config table. This will be > > picked up by the EFI routines in the core kernel to seed the kernel > > entropy pool. > > > > Cc: Matt Fleming > > Reviewed-by: Kees Cook > > Signed-off-by: Ard Biesheuvel > > This commit (and the commits after this one) doesn't have a proper signoff > chain, > probably due to rebasing? Argh, my bad. This is fallout from moving to the co-maintainer model. My scripts assume they don't need to append a SoB because that was handled when applying the patch to the git tree. But that obviously doesn't hold if Ard applies the patch to git, but I mail out the patches as part of the pull request (or vice versa). I guess in future you'd wanna see the SoB of the person mailing the patches, right? > I'll apply the patches from email and add your SOB. Thanks.
[tip:efi/urgent] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y
Commit-ID: f6697df36bdf0bf7fce984605c2918d4a7b4269f Gitweb: http://git.kernel.org/tip/f6697df36bdf0bf7fce984605c2918d4a7b4269f Author: Matt Fleming <m...@codeblueprint.co.uk> AuthorDate: Sat, 12 Nov 2016 21:04:24 + Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Sun, 13 Nov 2016 08:26:40 +0100 x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y Booting an EFI mixed mode kernel has been crashing since commit: e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)") The user-visible effect in my test setup was the kernel being unable to find the root file system ramdisk. This was likely caused by silent memory or page table corruption. Enabling CONFIG_DEBUG_VIRTUAL=y immediately flagged the thunking code as abusing virt_to_phys() because it was passing addresses that were not part of the kernel direct mapping. Use the slow version instead, which correctly handles all memory regions by performing a page table walk. Suggested-by: Andy Lutomirski <l...@amacapital.net> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> Cc: Andy Lutomirski <l...@kernel.org> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Borislav Petkov <b...@alien8.de> Cc: Brian Gerst <brge...@gmail.com> Cc: Denys Vlasenko <dvlas...@redhat.com> Cc: H. Peter Anvin <h...@zytor.com> Cc: Josh Poimboeuf <jpoim...@redhat.com> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: linux-...@vger.kernel.org Link: http://lkml.kernel.org/r/20161112210424.5157-3-m...@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mi...@kernel.org> --- arch/x86/platform/efi/efi_64.c | 80 ++ 1 file changed, 57 insertions(+), 23 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 58b0f80..319148b 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void) memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries); } +/* + * Wrapper for slow_virt_to_phys() that handles NULL addresses. + */ +static inline phys_addr_t +virt_to_phys_or_null_size(void *va, unsigned long size) +{ + bool bad_size; + + if (!va) + return 0; + + if (virt_addr_valid(va)) + return virt_to_phys(va); + + /* +* A fully aligned variable on the stack is guaranteed not to +* cross a page bounary. Try to catch strings on the stack by +* checking that 'size' is a power of two. +*/ + bad_size = size > PAGE_SIZE || !is_power_of_2(size); + + WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size); + + return slow_virt_to_phys(va); +} + +#define virt_to_phys_or_null(addr) \ + virt_to_phys_or_null_size((addr), sizeof(*(addr))) + int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) { unsigned long pfn, text; @@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, efi_time_cap_t *tc) spin_lock(_lock); - phys_tm = virt_to_phys(tm); - phys_tc = virt_to_phys(tc); + phys_tm = virt_to_phys_or_null(tm); + phys_tc = virt_to_phys_or_null(tc); status = efi_thunk(get_time, phys_tm, phys_tc); @@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_time, phys_tm); @@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending, spin_lock(_lock); - phys_enabled = virt_to_phys(enabled); - phys_pending = virt_to_phys(pending); - phys_tm = virt_to_phys(tm); + phys_enabled = virt_to_phys_or_null(enabled); + phys_pending = virt_to_phys_or_null(pending); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(get_wakeup_time, phys_enabled, phys_pending, phys_tm); @@ -549,7 +579,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_wakeup_time, enabled, phys_tm); @@ -558,6 +588,10 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) return status; } +static unsigned long efi_name_size(efi_char16_t *name) +{ + return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1; +} static efi_status_t efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor, @@ -567,11 +601,11 @@ efi_thunk_get_variable(efi_char16_t *name, ef
[tip:efi/urgent] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y
Commit-ID: f6697df36bdf0bf7fce984605c2918d4a7b4269f Gitweb: http://git.kernel.org/tip/f6697df36bdf0bf7fce984605c2918d4a7b4269f Author: Matt Fleming AuthorDate: Sat, 12 Nov 2016 21:04:24 + Committer: Ingo Molnar CommitDate: Sun, 13 Nov 2016 08:26:40 +0100 x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y Booting an EFI mixed mode kernel has been crashing since commit: e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)") The user-visible effect in my test setup was the kernel being unable to find the root file system ramdisk. This was likely caused by silent memory or page table corruption. Enabling CONFIG_DEBUG_VIRTUAL=y immediately flagged the thunking code as abusing virt_to_phys() because it was passing addresses that were not part of the kernel direct mapping. Use the slow version instead, which correctly handles all memory regions by performing a page table walk. Suggested-by: Andy Lutomirski Signed-off-by: Matt Fleming Cc: Andy Lutomirski Cc: Ard Biesheuvel Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-...@vger.kernel.org Link: http://lkml.kernel.org/r/20161112210424.5157-3-m...@codeblueprint.co.uk Signed-off-by: Ingo Molnar --- arch/x86/platform/efi/efi_64.c | 80 ++ 1 file changed, 57 insertions(+), 23 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 58b0f80..319148b 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void) memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries); } +/* + * Wrapper for slow_virt_to_phys() that handles NULL addresses. + */ +static inline phys_addr_t +virt_to_phys_or_null_size(void *va, unsigned long size) +{ + bool bad_size; + + if (!va) + return 0; + + if (virt_addr_valid(va)) + return virt_to_phys(va); + + /* +* A fully aligned variable on the stack is guaranteed not to +* cross a page bounary. Try to catch strings on the stack by +* checking that 'size' is a power of two. +*/ + bad_size = size > PAGE_SIZE || !is_power_of_2(size); + + WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size); + + return slow_virt_to_phys(va); +} + +#define virt_to_phys_or_null(addr) \ + virt_to_phys_or_null_size((addr), sizeof(*(addr))) + int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) { unsigned long pfn, text; @@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, efi_time_cap_t *tc) spin_lock(_lock); - phys_tm = virt_to_phys(tm); - phys_tc = virt_to_phys(tc); + phys_tm = virt_to_phys_or_null(tm); + phys_tc = virt_to_phys_or_null(tc); status = efi_thunk(get_time, phys_tm, phys_tc); @@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_time, phys_tm); @@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending, spin_lock(_lock); - phys_enabled = virt_to_phys(enabled); - phys_pending = virt_to_phys(pending); - phys_tm = virt_to_phys(tm); + phys_enabled = virt_to_phys_or_null(enabled); + phys_pending = virt_to_phys_or_null(pending); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(get_wakeup_time, phys_enabled, phys_pending, phys_tm); @@ -549,7 +579,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_wakeup_time, enabled, phys_tm); @@ -558,6 +588,10 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) return status; } +static unsigned long efi_name_size(efi_char16_t *name) +{ + return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1; +} static efi_status_t efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor, @@ -567,11 +601,11 @@ efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor, u32 phys_name, phys_vendor, phys_attr; u32 phys_data_size, phys_data; - phys_data_size = virt_to_phys(data_size); - phys_vendor = virt_to_phys(vendor); - phys_name = virt_to_phys(name); - phys_attr = virt_to_phys(attr); - phys_data = virt_to_phys(data); + phys_data_size = virt_to_phys_or_null(data_size); + phys_vendor = virt_to_phys_o
[PATCH 1/9] efi/libstub: Fix allocation size calculations
From: Roy Franz <roy.fr...@hpe.com> Adjust the size used in calculations to match the actual size of allocation that will be performed based on EFI size/alignment constraints. efi_high_alloc() and efi_low_alloc() use the passed size in bytes directly to find space in the memory map for the allocation, rather than the actual allocation size that has been adjusted for size and alignment constraints. This results in failed allocations and retries in efi_high_alloc(). The same error is present in efi_low_alloc(), although failure will only happen if the lowest memory block is small. Also use EFI_PAGE_SIZE consistently and remove use of EFI_PAGE_SHIFT to calculate page size. Signed-off-by: Roy Franz <roy.fr...@hpe.com> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Matt Fleming <m...@codeblueprint.co.uk> --- drivers/firmware/efi/libstub/efi-stub-helper.c | 24 ++-- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index aded10662020..4b74bf86c74d 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -186,14 +186,16 @@ efi_status_t efi_high_alloc(efi_system_table_t *sys_table_arg, goto fail; /* -* Enforce minimum alignment that EFI requires when requesting -* a specific address. We are doing page-based allocations, -* so we must be aligned to a page. +* Enforce minimum alignment that EFI or Linux requires when +* requesting a specific address. We are doing page-based (or +* larger) allocations, and both the address and size must meet +* alignment constraints. */ if (align < EFI_ALLOC_ALIGN) align = EFI_ALLOC_ALIGN; - nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE; + size = round_up(size, EFI_ALLOC_ALIGN); + nr_pages = size / EFI_PAGE_SIZE; again: for (i = 0; i < map_size / desc_size; i++) { efi_memory_desc_t *desc; @@ -208,7 +210,7 @@ efi_status_t efi_high_alloc(efi_system_table_t *sys_table_arg, continue; start = desc->phys_addr; - end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT); + end = start + desc->num_pages * EFI_PAGE_SIZE; if (end > max) end = max; @@ -278,14 +280,16 @@ efi_status_t efi_low_alloc(efi_system_table_t *sys_table_arg, goto fail; /* -* Enforce minimum alignment that EFI requires when requesting -* a specific address. We are doing page-based allocations, -* so we must be aligned to a page. +* Enforce minimum alignment that EFI or Linux requires when +* requesting a specific address. We are doing page-based (or +* larger) allocations, and both the address and size must meet +* alignment constraints. */ if (align < EFI_ALLOC_ALIGN) align = EFI_ALLOC_ALIGN; - nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE; + size = round_up(size, EFI_ALLOC_ALIGN); + nr_pages = size / EFI_PAGE_SIZE; for (i = 0; i < map_size / desc_size; i++) { efi_memory_desc_t *desc; unsigned long m = (unsigned long)map; @@ -300,7 +304,7 @@ efi_status_t efi_low_alloc(efi_system_table_t *sys_table_arg, continue; start = desc->phys_addr; - end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT); + end = start + desc->num_pages * EFI_PAGE_SIZE; /* * Don't allocate at 0x0. It will confuse code that -- 2.10.0
[PATCH 1/9] efi/libstub: Fix allocation size calculations
From: Roy Franz Adjust the size used in calculations to match the actual size of allocation that will be performed based on EFI size/alignment constraints. efi_high_alloc() and efi_low_alloc() use the passed size in bytes directly to find space in the memory map for the allocation, rather than the actual allocation size that has been adjusted for size and alignment constraints. This results in failed allocations and retries in efi_high_alloc(). The same error is present in efi_low_alloc(), although failure will only happen if the lowest memory block is small. Also use EFI_PAGE_SIZE consistently and remove use of EFI_PAGE_SHIFT to calculate page size. Signed-off-by: Roy Franz Signed-off-by: Ard Biesheuvel Cc: Matt Fleming --- drivers/firmware/efi/libstub/efi-stub-helper.c | 24 ++-- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index aded10662020..4b74bf86c74d 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -186,14 +186,16 @@ efi_status_t efi_high_alloc(efi_system_table_t *sys_table_arg, goto fail; /* -* Enforce minimum alignment that EFI requires when requesting -* a specific address. We are doing page-based allocations, -* so we must be aligned to a page. +* Enforce minimum alignment that EFI or Linux requires when +* requesting a specific address. We are doing page-based (or +* larger) allocations, and both the address and size must meet +* alignment constraints. */ if (align < EFI_ALLOC_ALIGN) align = EFI_ALLOC_ALIGN; - nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE; + size = round_up(size, EFI_ALLOC_ALIGN); + nr_pages = size / EFI_PAGE_SIZE; again: for (i = 0; i < map_size / desc_size; i++) { efi_memory_desc_t *desc; @@ -208,7 +210,7 @@ efi_status_t efi_high_alloc(efi_system_table_t *sys_table_arg, continue; start = desc->phys_addr; - end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT); + end = start + desc->num_pages * EFI_PAGE_SIZE; if (end > max) end = max; @@ -278,14 +280,16 @@ efi_status_t efi_low_alloc(efi_system_table_t *sys_table_arg, goto fail; /* -* Enforce minimum alignment that EFI requires when requesting -* a specific address. We are doing page-based allocations, -* so we must be aligned to a page. +* Enforce minimum alignment that EFI or Linux requires when +* requesting a specific address. We are doing page-based (or +* larger) allocations, and both the address and size must meet +* alignment constraints. */ if (align < EFI_ALLOC_ALIGN) align = EFI_ALLOC_ALIGN; - nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE; + size = round_up(size, EFI_ALLOC_ALIGN); + nr_pages = size / EFI_PAGE_SIZE; for (i = 0; i < map_size / desc_size; i++) { efi_memory_desc_t *desc; unsigned long m = (unsigned long)map; @@ -300,7 +304,7 @@ efi_status_t efi_low_alloc(efi_system_table_t *sys_table_arg, continue; start = desc->phys_addr; - end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT); + end = start + desc->num_pages * EFI_PAGE_SIZE; /* * Don't allocate at 0x0. It will confuse code that -- 2.10.0
[PATCH 6/9] efi: Add device path parser
From: Lukas Wunner <lu...@wunner.de> We're about to extended the efistub to retrieve device properties from EFI on Apple Macs. The properties use EFI Device Paths to indicate the device they belong to. This commit adds a parser which, given an EFI Device Path, locates the corresponding struct device and returns a reference to it. Initially only ACPI and PCI Device Path nodes are supported, these are the only types needed for Apple device properties (the corresponding macOS function AppleACPIPlatformExpert::matchEFIDevicePath() does not support any others). Further node types can be added with little to moderate effort. Apple device properties is currently the only use case of this parser, but Peter Jones intends to use it to match up devices with the ConInDev/ConOutDev/ErrOutDev variables and add sysfs attributes to these devices to say the hardware supports using them as console. Thus, make this parser a separate component which can be selected with config option EFI_DEV_PATH_PARSER. It can in principle be compiled as a module if acpi_get_first_physical_node() and acpi_bus_type are exported (and efi_get_device_by_path() itself is exported). The dependency on CONFIG_ACPI is needed for acpi_match_device_ids(). It can be removed if an empty inline stub is added for that function. Signed-off-by: Lukas Wunner <lu...@wunner.de> Cc: Peter Jones <pjo...@redhat.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Andreas Noever <andreas.noe...@gmail.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- drivers/firmware/efi/Kconfig | 5 + drivers/firmware/efi/Makefile | 1 + drivers/firmware/efi/dev-path-parser.c | 203 + include/linux/efi.h| 20 4 files changed, 229 insertions(+) create mode 100644 drivers/firmware/efi/dev-path-parser.c diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig index c981be17d3c0..893fda48fcdd 100644 --- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -133,3 +133,8 @@ endmenu config UEFI_CPER bool + +config EFI_DEV_PATH_PARSER + bool + depends on ACPI + default n diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile index c8a439f6d715..3e91ae31f9d1 100644 --- a/drivers/firmware/efi/Makefile +++ b/drivers/firmware/efi/Makefile @@ -21,6 +21,7 @@ obj-$(CONFIG_EFI_STUB)+= libstub/ obj-$(CONFIG_EFI_FAKE_MEMMAP) += fake_mem.o obj-$(CONFIG_EFI_BOOTLOADER_CONTROL) += efibc.o obj-$(CONFIG_EFI_TEST) += test/ +obj-$(CONFIG_EFI_DEV_PATH_PARSER) += dev-path-parser.o arm-obj-$(CONFIG_EFI) := arm-init.o arm-runtime.o obj-$(CONFIG_ARM) += $(arm-obj-y) diff --git a/drivers/firmware/efi/dev-path-parser.c b/drivers/firmware/efi/dev-path-parser.c new file mode 100644 index ..85d1834ee9b7 --- /dev/null +++ b/drivers/firmware/efi/dev-path-parser.c @@ -0,0 +1,203 @@ +/* + * dev-path-parser.c - EFI Device Path parser + * Copyright (C) 2016 Lukas Wunner <lu...@wunner.de> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License (version 2) as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#include +#include +#include + +struct acpi_hid_uid { + struct acpi_device_id hid[2]; + char uid[11]; /* UINT_MAX + null byte */ +}; + +static int __init match_acpi_dev(struct device *dev, void *data) +{ + struct acpi_hid_uid hid_uid = *(struct acpi_hid_uid *)data; + struct acpi_device *adev = to_acpi_device(dev); + + if (acpi_match_device_ids(adev, hid_uid.hid)) + return 0; + + if (adev->pnp.unique_id) + return !strcmp(adev->pnp.unique_id, hid_uid.uid); + else + return !strcmp("0", hid_uid.uid); +} + +static long __init parse_acpi_path(struct efi_dev_path *node, + struct device *parent, struct device **child) +{ + struct acpi_hid_uid hid_uid = {}; + struct device *phys_dev; + + if (node->length != 12) + return -EINVAL; + + sprintf(hid_uid.hid[0].id, "%c%c%c%04X", + 'A' + ((node->acpi.hid >> 10) & 0x1f) - 1, + 'A' + ((node->acpi.hid >> 5) & 0x1f) - 1, + 'A' + ((node->acpi.hid >> 0) & 0x1f) - 1, + node->acpi.h
[PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
From: Ard Biesheuvel <ard.biesheu...@linaro.org> Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and install the Linux-specific RNG seed UEFI config table. This will be picked up by the EFI routines in the core kernel to seed the kernel entropy pool. Cc: Matt Fleming <m...@codeblueprint.co.uk> Reviewed-by: Kees Cook <keesc...@chromium.org> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- drivers/firmware/efi/libstub/arm-stub.c | 2 ++ drivers/firmware/efi/libstub/efistub.h | 2 ++ drivers/firmware/efi/libstub/random.c | 48 + include/linux/efi.h | 1 + 4 files changed, 53 insertions(+) diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c index 993aa56755f6..b4f7d78f9e8b 100644 --- a/drivers/firmware/efi/libstub/arm-stub.c +++ b/drivers/firmware/efi/libstub/arm-stub.c @@ -340,6 +340,8 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table, if (status != EFI_SUCCESS) pr_efi_err(sys_table, "Failed initrd from command line!\n"); + efi_random_get_seed(sys_table); + new_fdt_addr = fdt_addr; status = allocate_new_fdt_and_exit_boot(sys_table, handle, _fdt_addr, dram_base + MAX_FDT_OFFSET, diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index fe1f22584c69..b98824e3800a 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -71,4 +71,6 @@ efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg, efi_status_t check_platform_features(efi_system_table_t *sys_table_arg); +efi_status_t efi_random_get_seed(efi_system_table_t *sys_table_arg); + #endif diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c index f8e2e5ae6872..3a3feacc329f 100644 --- a/drivers/firmware/efi/libstub/random.c +++ b/drivers/firmware/efi/libstub/random.c @@ -143,3 +143,51 @@ efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg, return status; } + +#define RANDOM_SEED_SIZE 32 + +efi_status_t efi_random_get_seed(efi_system_table_t *sys_table_arg) +{ + efi_guid_t rng_proto = EFI_RNG_PROTOCOL_GUID; + efi_guid_t rng_algo_raw = EFI_RNG_ALGORITHM_RAW; + efi_guid_t rng_table_guid = LINUX_EFI_RANDOM_SEED_TABLE_GUID; + struct efi_rng_protocol *rng; + struct linux_efi_random_seed *seed; + efi_status_t status; + + status = efi_call_early(locate_protocol, _proto, NULL, + (void **)); + if (status != EFI_SUCCESS) + return status; + + status = efi_call_early(allocate_pool, EFI_RUNTIME_SERVICES_DATA, + sizeof(*seed) + RANDOM_SEED_SIZE, + (void **)); + if (status != EFI_SUCCESS) + return status; + + status = rng->get_rng(rng, _algo_raw, RANDOM_SEED_SIZE, + seed->bits); + if (status == EFI_UNSUPPORTED) + /* +* Use whatever algorithm we have available if the raw algorithm +* is not implemented. +*/ + status = rng->get_rng(rng, NULL, RANDOM_SEED_SIZE, + seed->bits); + + if (status != EFI_SUCCESS) + goto err_freepool; + + seed->size = RANDOM_SEED_SIZE; + status = efi_call_early(install_configuration_table, _table_guid, + seed); + if (status != EFI_SUCCESS) + goto err_freepool; + + return EFI_SUCCESS; + +err_freepool: + efi_call_early(free_pool, seed); + return status; +} diff --git a/include/linux/efi.h b/include/linux/efi.h index 85e28b138cdd..f5a821d9b90c 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -589,6 +589,7 @@ void efi_native_runtime_setup(void); #define DEVICE_TREE_GUID EFI_GUID(0xb1b621d5, 0xf19c, 0x41a5, 0x83, 0x0b, 0xd9, 0x15, 0x2c, 0x69, 0xaa, 0xe0) #define EFI_PROPERTIES_TABLE_GUID EFI_GUID(0x880aaca3, 0x4adc, 0x4a04, 0x90, 0x79, 0xb7, 0x47, 0x34, 0x08, 0x25, 0xe5) #define EFI_RNG_PROTOCOL_GUID EFI_GUID(0x3152bca5, 0xeade, 0x433d, 0x86, 0x2e, 0xc0, 0x1c, 0xdc, 0x29, 0x1f, 0x44) +#define EFI_RNG_ALGORITHM_RAW EFI_GUID(0xe43176d7, 0xb6e8, 0x4827, 0xb7, 0x84, 0x7f, 0xfd, 0xc4, 0xb6, 0x85, 0x61) #define EFI_MEMORY_ATTRIBUTES_TABLE_GUID EFI_GUID(0xdcfa911d, 0x26eb, 0x469f, 0xa2, 0x20, 0x38, 0xb7, 0xdc, 0x46, 0x12, 0x20) #define EFI_CONSOLE_OUT_DEVICE_GUIDEFI_GUID(0xd3b36f2c, 0xd551, 0x11d4, 0x9a, 0x46, 0x00, 0x90, 0x27, 0x3f, 0xc1, 0x4d) -- 2.10.0
[PATCH 2/9] MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem
From: Ard Biesheuvel <ard.biesheu...@linaro.org> Since I will be co-maintaining the EFI subsystem, it makes sense to mention the ARM and arm64 EFI bits in the EFI section in MAINTAINERS so that Matt, the list and I get cc'ed on proposed changes. Acked-by: Will Deacon <will.dea...@arm.com> Acked-by: Russell King <rmk+ker...@armlinux.org.uk> Cc: Matt Fleming <m...@codeblueprint.co.uk> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- MAINTAINERS | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 6847ba844ef9..1f38999d8ce7 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4625,12 +4625,14 @@ L: linux-...@vger.kernel.org T: git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git S: Maintained F: Documentation/efi-stub.txt -F: arch/ia64/kernel/efi.c +F: arch/*/kernel/efi.c F: arch/x86/boot/compressed/eboot.[ch] -F: arch/x86/include/asm/efi.h +F: arch/*/include/asm/efi.h F: arch/x86/platform/efi/ F: drivers/firmware/efi/ F: include/linux/efi*.h +F: arch/arm/boot/compressed/efi-header.S +F: arch/arm64/kernel/efi-entry.S EFI VARIABLE FILESYSTEM M: Matthew Garrett <matthew.garr...@nebula.com> -- 2.10.0
[PATCH 4/9] efi/libstub: Add random.c to ARM build
From: Ard Biesheuvel <ard.biesheu...@linaro.org> Make random.c build for ARM by moving the fallback definition of EFI_ALLOC_ALIGN to efistub.h, and replacing a division by a value we know to be a power of 2 with a right shift (this is required since ARM does not have any integer division helper routines in its decompressor) Cc: Matt Fleming <m...@codeblueprint.co.uk> Reviewed-by: Kees Cook <keesc...@chromium.org> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- drivers/firmware/efi/libstub/Makefile | 4 ++-- drivers/firmware/efi/libstub/efi-stub-helper.c | 9 - drivers/firmware/efi/libstub/efistub.h | 9 + drivers/firmware/efi/libstub/random.c | 8 +--- 4 files changed, 16 insertions(+), 14 deletions(-) diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile index c06945160a41..40ddf8f763a8 100644 --- a/drivers/firmware/efi/libstub/Makefile +++ b/drivers/firmware/efi/libstub/Makefile @@ -36,11 +36,11 @@ arm-deps := fdt_rw.c fdt_ro.c fdt_wip.c fdt.c fdt_empty_tree.c fdt_sw.c sort.c $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE $(call if_changed_rule,cc_o_c) -lib-$(CONFIG_EFI_ARMSTUB) += arm-stub.o fdt.o string.o \ +lib-$(CONFIG_EFI_ARMSTUB) += arm-stub.o fdt.o string.o random.o \ $(patsubst %.c,lib-%.o,$(arm-deps)) lib-$(CONFIG_ARM) += arm32-stub.o -lib-$(CONFIG_ARM64)+= arm64-stub.o random.o +lib-$(CONFIG_ARM64)+= arm64-stub.o CFLAGS_arm64-stub.o:= -DTEXT_OFFSET=$(TEXT_OFFSET) # diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index 4b74bf86c74d..757badc1debb 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -32,15 +32,6 @@ static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE; -/* - * Allow the platform to override the allocation granularity: this allows - * systems that have the capability to run with a larger page size to deal - * with the allocations for initrd and fdt more efficiently. - */ -#ifndef EFI_ALLOC_ALIGN -#define EFI_ALLOC_ALIGNEFI_PAGE_SIZE -#endif - #define EFI_MMAP_NR_SLACK_SLOTS8 struct file_info { diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index ee49cd23ee63..fe1f22584c69 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -15,6 +15,15 @@ */ #undef __init +/* + * Allow the platform to override the allocation granularity: this allows + * systems that have the capability to run with a larger page size to deal + * with the allocations for initrd and fdt more efficiently. + */ +#ifndef EFI_ALLOC_ALIGN +#define EFI_ALLOC_ALIGNEFI_PAGE_SIZE +#endif + void efi_char16_printk(efi_system_table_t *, efi_char16_t *); efi_status_t efi_open_volume(efi_system_table_t *sys_table_arg, void *__image, diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c index 0c9f58c5ba50..f8e2e5ae6872 100644 --- a/drivers/firmware/efi/libstub/random.c +++ b/drivers/firmware/efi/libstub/random.c @@ -8,6 +8,7 @@ */ #include +#include #include #include "efistub.h" @@ -41,8 +42,9 @@ efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table_arg, */ static unsigned long get_entry_num_slots(efi_memory_desc_t *md, unsigned long size, -unsigned long align) +unsigned long align_shift) { + unsigned long align = 1UL << align_shift; u64 start, end; if (md->type != EFI_CONVENTIONAL_MEMORY) @@ -55,7 +57,7 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md, if (start > end) return 0; - return (end - start + 1) / align; + return (end - start + 1) >> align_shift; } /* @@ -98,7 +100,7 @@ efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg, efi_memory_desc_t *md = (void *)memory_map + map_offset; unsigned long slots; - slots = get_entry_num_slots(md, size, align); + slots = get_entry_num_slots(md, size, ilog2(align)); MD_NUM_SLOTS(md) = slots; total_slots += slots; } -- 2.10.0
[PATCH 9/9] thunderbolt: Use Device ROM retrieved from EFI
From: Lukas Wunner <lu...@wunner.de> Macs with Thunderbolt 1 do not have a unit-specific DROM: The DROM is empty with uid 0x1. (Apple started factory-burning a unit- specific DROM with Thunderbolt 2.) Instead, the NHI EFI driver supplies a DROM in a device property. Use it if available. It's only available when booting with the efistub. If it's not available, silently fall back to our hardcoded DROM. The size of the DROM is always 256 bytes. The number is hardcoded into the NHI EFI driver. This commit can deal with an arbitrary size however, just in case they ever change that. Background information: The EFI firmware volume contains ROM files for the NHI, GMUX and several other chips as well as key material. This strategy allows Apple to deploy ROM or key updates by simply publishing an EFI firmware update on their website. Drivers do not access those files directly but rather through a file server via EFI protocol AC5E4829-A8FD-440B-AF33-9FFE013B12D8. Files are identified by GUID, the NHI DROM has 339370BD-CFC6-4454-8EF7-704653120818. The NHI EFI driver amends that file with a unit-specific uid. The uid has 64 bit but its entropy is much lower: 24 bit represent the model, 24 bit are taken from a serial number, 16 bit are fixed. The NHI EFI driver obtains the serial number via the DataHub protocol, copies it into the DROM, calculates the CRC and submits the result as a device property. A modification is needed in the resume code where we currently read the uid of all switches in the hierarchy to detect plug events that occurred during sleep. On Thunderbolt 1 root switches this will now lead to a mismatch between the uid of the empty DROM and the EFI DROM. Exempt the root switch from this check: It's built in, so the uid should never change. However we continue to *read* the uid of the root switch, this seems like a good way to test its reachability after resume. Signed-off-by: Lukas Wunner <lu...@wunner.de> Tested-by: Lukas Wunner <lu...@wunner.de> [MacBookPro9,1] Tested-by: Pierre Moreau <pierre.mor...@free.fr> [MacBookPro11,3] Acked-by: Andreas Noever <andreas.noe...@gmail.com> Cc: Pedro Vilaça <rever...@put.as> Cc: Peter Jones <pjo...@redhat.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- drivers/thunderbolt/Kconfig | 1 + drivers/thunderbolt/eeprom.c | 43 +++ drivers/thunderbolt/switch.c | 2 +- 3 files changed, 45 insertions(+), 1 deletion(-) diff --git a/drivers/thunderbolt/Kconfig b/drivers/thunderbolt/Kconfig index c121acc15bfe..0056df7f3c09 100644 --- a/drivers/thunderbolt/Kconfig +++ b/drivers/thunderbolt/Kconfig @@ -1,6 +1,7 @@ menuconfig THUNDERBOLT tristate "Thunderbolt support for Apple devices" depends on PCI + select APPLE_PROPERTIES select CRC32 help Cactus Ridge Thunderbolt Controller driver diff --git a/drivers/thunderbolt/eeprom.c b/drivers/thunderbolt/eeprom.c index 2b9602c2c355..6392990c984d 100644 --- a/drivers/thunderbolt/eeprom.c +++ b/drivers/thunderbolt/eeprom.c @@ -5,6 +5,7 @@ */ #include +#include #include #include "tb.h" @@ -360,6 +361,40 @@ static int tb_drom_parse_entries(struct tb_switch *sw) } /** + * tb_drom_copy_efi - copy drom supplied by EFI to sw->drom if present + */ +static int tb_drom_copy_efi(struct tb_switch *sw, u16 *size) +{ + struct device *dev = >tb->nhi->pdev->dev; + int len, res; + + len = device_property_read_u8_array(dev, "ThunderboltDROM", NULL, 0); + if (len < 0 || len < sizeof(struct tb_drom_header)) + return -EINVAL; + + sw->drom = kmalloc(len, GFP_KERNEL); + if (!sw->drom) + return -ENOMEM; + + res = device_property_read_u8_array(dev, "ThunderboltDROM", sw->drom, + len); + if (res) + goto err; + + *size = ((struct tb_drom_header *)sw->drom)->data_len + + TB_DROM_DATA_START; + if (*size > len) + goto err; + + return 0; + +err: + kfree(sw->drom); + sw->drom = NULL; + return -EINVAL; +} + +/** * tb_drom_read - copy drom to sw->drom and parse it */ int tb_drom_read(struct tb_switch *sw) @@ -374,6 +409,13 @@ int tb_drom_read(struct tb_switch *sw) if (tb_route(sw) == 0) { /* +* Apple's NHI EFI driver supplies a DROM for the root switch +* in a device property. Use it if available. +*/ + if (tb_drom_copy_efi(sw, ) == 0) + goto parse; + + /* * The root switch contains only a dummy drom (header only,
[PATCH 6/9] efi: Add device path parser
From: Lukas Wunner We're about to extended the efistub to retrieve device properties from EFI on Apple Macs. The properties use EFI Device Paths to indicate the device they belong to. This commit adds a parser which, given an EFI Device Path, locates the corresponding struct device and returns a reference to it. Initially only ACPI and PCI Device Path nodes are supported, these are the only types needed for Apple device properties (the corresponding macOS function AppleACPIPlatformExpert::matchEFIDevicePath() does not support any others). Further node types can be added with little to moderate effort. Apple device properties is currently the only use case of this parser, but Peter Jones intends to use it to match up devices with the ConInDev/ConOutDev/ErrOutDev variables and add sysfs attributes to these devices to say the hardware supports using them as console. Thus, make this parser a separate component which can be selected with config option EFI_DEV_PATH_PARSER. It can in principle be compiled as a module if acpi_get_first_physical_node() and acpi_bus_type are exported (and efi_get_device_by_path() itself is exported). The dependency on CONFIG_ACPI is needed for acpi_match_device_ids(). It can be removed if an empty inline stub is added for that function. Signed-off-by: Lukas Wunner Cc: Peter Jones Cc: Ard Biesheuvel Cc: Andreas Noever Signed-off-by: Matt Fleming --- drivers/firmware/efi/Kconfig | 5 + drivers/firmware/efi/Makefile | 1 + drivers/firmware/efi/dev-path-parser.c | 203 + include/linux/efi.h| 20 4 files changed, 229 insertions(+) create mode 100644 drivers/firmware/efi/dev-path-parser.c diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig index c981be17d3c0..893fda48fcdd 100644 --- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -133,3 +133,8 @@ endmenu config UEFI_CPER bool + +config EFI_DEV_PATH_PARSER + bool + depends on ACPI + default n diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile index c8a439f6d715..3e91ae31f9d1 100644 --- a/drivers/firmware/efi/Makefile +++ b/drivers/firmware/efi/Makefile @@ -21,6 +21,7 @@ obj-$(CONFIG_EFI_STUB)+= libstub/ obj-$(CONFIG_EFI_FAKE_MEMMAP) += fake_mem.o obj-$(CONFIG_EFI_BOOTLOADER_CONTROL) += efibc.o obj-$(CONFIG_EFI_TEST) += test/ +obj-$(CONFIG_EFI_DEV_PATH_PARSER) += dev-path-parser.o arm-obj-$(CONFIG_EFI) := arm-init.o arm-runtime.o obj-$(CONFIG_ARM) += $(arm-obj-y) diff --git a/drivers/firmware/efi/dev-path-parser.c b/drivers/firmware/efi/dev-path-parser.c new file mode 100644 index ..85d1834ee9b7 --- /dev/null +++ b/drivers/firmware/efi/dev-path-parser.c @@ -0,0 +1,203 @@ +/* + * dev-path-parser.c - EFI Device Path parser + * Copyright (C) 2016 Lukas Wunner + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License (version 2) as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#include +#include +#include + +struct acpi_hid_uid { + struct acpi_device_id hid[2]; + char uid[11]; /* UINT_MAX + null byte */ +}; + +static int __init match_acpi_dev(struct device *dev, void *data) +{ + struct acpi_hid_uid hid_uid = *(struct acpi_hid_uid *)data; + struct acpi_device *adev = to_acpi_device(dev); + + if (acpi_match_device_ids(adev, hid_uid.hid)) + return 0; + + if (adev->pnp.unique_id) + return !strcmp(adev->pnp.unique_id, hid_uid.uid); + else + return !strcmp("0", hid_uid.uid); +} + +static long __init parse_acpi_path(struct efi_dev_path *node, + struct device *parent, struct device **child) +{ + struct acpi_hid_uid hid_uid = {}; + struct device *phys_dev; + + if (node->length != 12) + return -EINVAL; + + sprintf(hid_uid.hid[0].id, "%c%c%c%04X", + 'A' + ((node->acpi.hid >> 10) & 0x1f) - 1, + 'A' + ((node->acpi.hid >> 5) & 0x1f) - 1, + 'A' + ((node->acpi.hid >> 0) & 0x1f) - 1, + node->acpi.hid >> 16); + sprintf(hid_uid.uid, "%u", node->acpi.uid); + + *child = bus_find_device(_bus_type, NULL, _uid, +
[PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
From: Ard Biesheuvel Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and install the Linux-specific RNG seed UEFI config table. This will be picked up by the EFI routines in the core kernel to seed the kernel entropy pool. Cc: Matt Fleming Reviewed-by: Kees Cook Signed-off-by: Ard Biesheuvel --- drivers/firmware/efi/libstub/arm-stub.c | 2 ++ drivers/firmware/efi/libstub/efistub.h | 2 ++ drivers/firmware/efi/libstub/random.c | 48 + include/linux/efi.h | 1 + 4 files changed, 53 insertions(+) diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c index 993aa56755f6..b4f7d78f9e8b 100644 --- a/drivers/firmware/efi/libstub/arm-stub.c +++ b/drivers/firmware/efi/libstub/arm-stub.c @@ -340,6 +340,8 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table, if (status != EFI_SUCCESS) pr_efi_err(sys_table, "Failed initrd from command line!\n"); + efi_random_get_seed(sys_table); + new_fdt_addr = fdt_addr; status = allocate_new_fdt_and_exit_boot(sys_table, handle, _fdt_addr, dram_base + MAX_FDT_OFFSET, diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index fe1f22584c69..b98824e3800a 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -71,4 +71,6 @@ efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg, efi_status_t check_platform_features(efi_system_table_t *sys_table_arg); +efi_status_t efi_random_get_seed(efi_system_table_t *sys_table_arg); + #endif diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c index f8e2e5ae6872..3a3feacc329f 100644 --- a/drivers/firmware/efi/libstub/random.c +++ b/drivers/firmware/efi/libstub/random.c @@ -143,3 +143,51 @@ efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg, return status; } + +#define RANDOM_SEED_SIZE 32 + +efi_status_t efi_random_get_seed(efi_system_table_t *sys_table_arg) +{ + efi_guid_t rng_proto = EFI_RNG_PROTOCOL_GUID; + efi_guid_t rng_algo_raw = EFI_RNG_ALGORITHM_RAW; + efi_guid_t rng_table_guid = LINUX_EFI_RANDOM_SEED_TABLE_GUID; + struct efi_rng_protocol *rng; + struct linux_efi_random_seed *seed; + efi_status_t status; + + status = efi_call_early(locate_protocol, _proto, NULL, + (void **)); + if (status != EFI_SUCCESS) + return status; + + status = efi_call_early(allocate_pool, EFI_RUNTIME_SERVICES_DATA, + sizeof(*seed) + RANDOM_SEED_SIZE, + (void **)); + if (status != EFI_SUCCESS) + return status; + + status = rng->get_rng(rng, _algo_raw, RANDOM_SEED_SIZE, + seed->bits); + if (status == EFI_UNSUPPORTED) + /* +* Use whatever algorithm we have available if the raw algorithm +* is not implemented. +*/ + status = rng->get_rng(rng, NULL, RANDOM_SEED_SIZE, + seed->bits); + + if (status != EFI_SUCCESS) + goto err_freepool; + + seed->size = RANDOM_SEED_SIZE; + status = efi_call_early(install_configuration_table, _table_guid, + seed); + if (status != EFI_SUCCESS) + goto err_freepool; + + return EFI_SUCCESS; + +err_freepool: + efi_call_early(free_pool, seed); + return status; +} diff --git a/include/linux/efi.h b/include/linux/efi.h index 85e28b138cdd..f5a821d9b90c 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -589,6 +589,7 @@ void efi_native_runtime_setup(void); #define DEVICE_TREE_GUID EFI_GUID(0xb1b621d5, 0xf19c, 0x41a5, 0x83, 0x0b, 0xd9, 0x15, 0x2c, 0x69, 0xaa, 0xe0) #define EFI_PROPERTIES_TABLE_GUID EFI_GUID(0x880aaca3, 0x4adc, 0x4a04, 0x90, 0x79, 0xb7, 0x47, 0x34, 0x08, 0x25, 0xe5) #define EFI_RNG_PROTOCOL_GUID EFI_GUID(0x3152bca5, 0xeade, 0x433d, 0x86, 0x2e, 0xc0, 0x1c, 0xdc, 0x29, 0x1f, 0x44) +#define EFI_RNG_ALGORITHM_RAW EFI_GUID(0xe43176d7, 0xb6e8, 0x4827, 0xb7, 0x84, 0x7f, 0xfd, 0xc4, 0xb6, 0x85, 0x61) #define EFI_MEMORY_ATTRIBUTES_TABLE_GUID EFI_GUID(0xdcfa911d, 0x26eb, 0x469f, 0xa2, 0x20, 0x38, 0xb7, 0xdc, 0x46, 0x12, 0x20) #define EFI_CONSOLE_OUT_DEVICE_GUIDEFI_GUID(0xd3b36f2c, 0xd551, 0x11d4, 0x9a, 0x46, 0x00, 0x90, 0x27, 0x3f, 0xc1, 0x4d) -- 2.10.0
[PATCH 2/9] MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem
From: Ard Biesheuvel Since I will be co-maintaining the EFI subsystem, it makes sense to mention the ARM and arm64 EFI bits in the EFI section in MAINTAINERS so that Matt, the list and I get cc'ed on proposed changes. Acked-by: Will Deacon Acked-by: Russell King Cc: Matt Fleming Signed-off-by: Ard Biesheuvel --- MAINTAINERS | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 6847ba844ef9..1f38999d8ce7 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4625,12 +4625,14 @@ L: linux-...@vger.kernel.org T: git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git S: Maintained F: Documentation/efi-stub.txt -F: arch/ia64/kernel/efi.c +F: arch/*/kernel/efi.c F: arch/x86/boot/compressed/eboot.[ch] -F: arch/x86/include/asm/efi.h +F: arch/*/include/asm/efi.h F: arch/x86/platform/efi/ F: drivers/firmware/efi/ F: include/linux/efi*.h +F: arch/arm/boot/compressed/efi-header.S +F: arch/arm64/kernel/efi-entry.S EFI VARIABLE FILESYSTEM M: Matthew Garrett -- 2.10.0
[PATCH 4/9] efi/libstub: Add random.c to ARM build
From: Ard Biesheuvel Make random.c build for ARM by moving the fallback definition of EFI_ALLOC_ALIGN to efistub.h, and replacing a division by a value we know to be a power of 2 with a right shift (this is required since ARM does not have any integer division helper routines in its decompressor) Cc: Matt Fleming Reviewed-by: Kees Cook Signed-off-by: Ard Biesheuvel --- drivers/firmware/efi/libstub/Makefile | 4 ++-- drivers/firmware/efi/libstub/efi-stub-helper.c | 9 - drivers/firmware/efi/libstub/efistub.h | 9 + drivers/firmware/efi/libstub/random.c | 8 +--- 4 files changed, 16 insertions(+), 14 deletions(-) diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile index c06945160a41..40ddf8f763a8 100644 --- a/drivers/firmware/efi/libstub/Makefile +++ b/drivers/firmware/efi/libstub/Makefile @@ -36,11 +36,11 @@ arm-deps := fdt_rw.c fdt_ro.c fdt_wip.c fdt.c fdt_empty_tree.c fdt_sw.c sort.c $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE $(call if_changed_rule,cc_o_c) -lib-$(CONFIG_EFI_ARMSTUB) += arm-stub.o fdt.o string.o \ +lib-$(CONFIG_EFI_ARMSTUB) += arm-stub.o fdt.o string.o random.o \ $(patsubst %.c,lib-%.o,$(arm-deps)) lib-$(CONFIG_ARM) += arm32-stub.o -lib-$(CONFIG_ARM64)+= arm64-stub.o random.o +lib-$(CONFIG_ARM64)+= arm64-stub.o CFLAGS_arm64-stub.o:= -DTEXT_OFFSET=$(TEXT_OFFSET) # diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index 4b74bf86c74d..757badc1debb 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -32,15 +32,6 @@ static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE; -/* - * Allow the platform to override the allocation granularity: this allows - * systems that have the capability to run with a larger page size to deal - * with the allocations for initrd and fdt more efficiently. - */ -#ifndef EFI_ALLOC_ALIGN -#define EFI_ALLOC_ALIGNEFI_PAGE_SIZE -#endif - #define EFI_MMAP_NR_SLACK_SLOTS8 struct file_info { diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index ee49cd23ee63..fe1f22584c69 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -15,6 +15,15 @@ */ #undef __init +/* + * Allow the platform to override the allocation granularity: this allows + * systems that have the capability to run with a larger page size to deal + * with the allocations for initrd and fdt more efficiently. + */ +#ifndef EFI_ALLOC_ALIGN +#define EFI_ALLOC_ALIGNEFI_PAGE_SIZE +#endif + void efi_char16_printk(efi_system_table_t *, efi_char16_t *); efi_status_t efi_open_volume(efi_system_table_t *sys_table_arg, void *__image, diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c index 0c9f58c5ba50..f8e2e5ae6872 100644 --- a/drivers/firmware/efi/libstub/random.c +++ b/drivers/firmware/efi/libstub/random.c @@ -8,6 +8,7 @@ */ #include +#include #include #include "efistub.h" @@ -41,8 +42,9 @@ efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table_arg, */ static unsigned long get_entry_num_slots(efi_memory_desc_t *md, unsigned long size, -unsigned long align) +unsigned long align_shift) { + unsigned long align = 1UL << align_shift; u64 start, end; if (md->type != EFI_CONVENTIONAL_MEMORY) @@ -55,7 +57,7 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md, if (start > end) return 0; - return (end - start + 1) / align; + return (end - start + 1) >> align_shift; } /* @@ -98,7 +100,7 @@ efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg, efi_memory_desc_t *md = (void *)memory_map + map_offset; unsigned long slots; - slots = get_entry_num_slots(md, size, align); + slots = get_entry_num_slots(md, size, ilog2(align)); MD_NUM_SLOTS(md) = slots; total_slots += slots; } -- 2.10.0
[PATCH 9/9] thunderbolt: Use Device ROM retrieved from EFI
From: Lukas Wunner Macs with Thunderbolt 1 do not have a unit-specific DROM: The DROM is empty with uid 0x1. (Apple started factory-burning a unit- specific DROM with Thunderbolt 2.) Instead, the NHI EFI driver supplies a DROM in a device property. Use it if available. It's only available when booting with the efistub. If it's not available, silently fall back to our hardcoded DROM. The size of the DROM is always 256 bytes. The number is hardcoded into the NHI EFI driver. This commit can deal with an arbitrary size however, just in case they ever change that. Background information: The EFI firmware volume contains ROM files for the NHI, GMUX and several other chips as well as key material. This strategy allows Apple to deploy ROM or key updates by simply publishing an EFI firmware update on their website. Drivers do not access those files directly but rather through a file server via EFI protocol AC5E4829-A8FD-440B-AF33-9FFE013B12D8. Files are identified by GUID, the NHI DROM has 339370BD-CFC6-4454-8EF7-704653120818. The NHI EFI driver amends that file with a unit-specific uid. The uid has 64 bit but its entropy is much lower: 24 bit represent the model, 24 bit are taken from a serial number, 16 bit are fixed. The NHI EFI driver obtains the serial number via the DataHub protocol, copies it into the DROM, calculates the CRC and submits the result as a device property. A modification is needed in the resume code where we currently read the uid of all switches in the hierarchy to detect plug events that occurred during sleep. On Thunderbolt 1 root switches this will now lead to a mismatch between the uid of the empty DROM and the EFI DROM. Exempt the root switch from this check: It's built in, so the uid should never change. However we continue to *read* the uid of the root switch, this seems like a good way to test its reachability after resume. Signed-off-by: Lukas Wunner Tested-by: Lukas Wunner [MacBookPro9,1] Tested-by: Pierre Moreau [MacBookPro11,3] Acked-by: Andreas Noever Cc: Pedro Vilaça Cc: Peter Jones Cc: Ard Biesheuvel Signed-off-by: Matt Fleming --- drivers/thunderbolt/Kconfig | 1 + drivers/thunderbolt/eeprom.c | 43 +++ drivers/thunderbolt/switch.c | 2 +- 3 files changed, 45 insertions(+), 1 deletion(-) diff --git a/drivers/thunderbolt/Kconfig b/drivers/thunderbolt/Kconfig index c121acc15bfe..0056df7f3c09 100644 --- a/drivers/thunderbolt/Kconfig +++ b/drivers/thunderbolt/Kconfig @@ -1,6 +1,7 @@ menuconfig THUNDERBOLT tristate "Thunderbolt support for Apple devices" depends on PCI + select APPLE_PROPERTIES select CRC32 help Cactus Ridge Thunderbolt Controller driver diff --git a/drivers/thunderbolt/eeprom.c b/drivers/thunderbolt/eeprom.c index 2b9602c2c355..6392990c984d 100644 --- a/drivers/thunderbolt/eeprom.c +++ b/drivers/thunderbolt/eeprom.c @@ -5,6 +5,7 @@ */ #include +#include #include #include "tb.h" @@ -360,6 +361,40 @@ static int tb_drom_parse_entries(struct tb_switch *sw) } /** + * tb_drom_copy_efi - copy drom supplied by EFI to sw->drom if present + */ +static int tb_drom_copy_efi(struct tb_switch *sw, u16 *size) +{ + struct device *dev = >tb->nhi->pdev->dev; + int len, res; + + len = device_property_read_u8_array(dev, "ThunderboltDROM", NULL, 0); + if (len < 0 || len < sizeof(struct tb_drom_header)) + return -EINVAL; + + sw->drom = kmalloc(len, GFP_KERNEL); + if (!sw->drom) + return -ENOMEM; + + res = device_property_read_u8_array(dev, "ThunderboltDROM", sw->drom, + len); + if (res) + goto err; + + *size = ((struct tb_drom_header *)sw->drom)->data_len + + TB_DROM_DATA_START; + if (*size > len) + goto err; + + return 0; + +err: + kfree(sw->drom); + sw->drom = NULL; + return -EINVAL; +} + +/** * tb_drom_read - copy drom to sw->drom and parse it */ int tb_drom_read(struct tb_switch *sw) @@ -374,6 +409,13 @@ int tb_drom_read(struct tb_switch *sw) if (tb_route(sw) == 0) { /* +* Apple's NHI EFI driver supplies a DROM for the root switch +* in a device property. Use it if available. +*/ + if (tb_drom_copy_efi(sw, ) == 0) + goto parse; + + /* * The root switch contains only a dummy drom (header only, * no entries). Hardcode the configuration here. */ @@ -418,6 +460,7 @@ int tb_drom_read(struct tb_switch *sw) if (res) goto err; +parse: header = (void *) sw->drom; if (header-&g
[PATCH 8/9] x86/efi: Retrieve and assign Apple device properties
ould also work for devices instantiated in a deferred fashion. It seems like this approach would be more complicated and require more code. That doesn't seem justified without a specific use case. For comparison, the strategy on macOS is to assign properties to objects in the ACPI namespace (AppleACPIPlatformExpert::mergeEFIProperties()). That approach is definitely wrong as it fails for devices not present in the namespace: The NHI EFI driver supplies properties for attached Thunderbolt devices, yet on Macs with Thunderbolt 1 only one device level behind the host controller is described in the namespace. Consequently macOS cannot assign properties for chained devices. With Thunderbolt 2 they started to describe three device levels behind host controllers in the namespace but this grossly inflates the SSDT and still fails if the user daisy-chained more than three devices. We copy the property names and values from the setup_data payload to swappable virtual memory and afterwards make the payload available to the page allocator. This is just for the sake of good housekeeping, it wouldn't occupy a meaningful amount of physical memory ( bytes on my machine). Only the payload is freed, not the setup_data header since otherwise we'd break the list linkage and we cannot safely update the predecessor's ->next link because there's no locking for the list. The payload is currently not passed on to kexec'ed kernels, same for PCI ROMs retrieved by setup_efi_pci(). This can be added later if there is demand by amending setup_efi_state(). The payload can then no longer be made available to the page allocator of course. Tested-by: Lukas Wunner <lu...@wunner.de> [MacBookPro9,1] Tested-by: Pierre Moreau <pierre.mor...@free.fr> [MacBookPro11,3] Signed-off-by: Lukas Wunner <lu...@wunner.de> Cc: grub-de...@gnu.org Cc: Pedro Vilaça <rever...@put.as> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Andreas Noever <andreas.noe...@gmail.com> Cc: Peter Jones <pjo...@redhat.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- Documentation/kernel-parameters.txt | 5 + arch/x86/boot/compressed/eboot.c| 65 + arch/x86/include/uapi/asm/bootparam.h | 1 + drivers/firmware/efi/Kconfig| 13 ++ drivers/firmware/efi/Makefile | 1 + drivers/firmware/efi/apple-properties.c | 248 include/linux/efi.h | 17 +++ 7 files changed, 350 insertions(+) create mode 100644 drivers/firmware/efi/apple-properties.c diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 37babf91f2cb..86a31dfc036e 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1062,6 +1062,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. dscc4.setup=[NET] + dump_apple_properties [X86] + Dump name and content of EFI device properties on + x86 Macs. Useful for driver authors to determine + what data is available or for reverse-engineering. + dyndbg[="val"] [KNL,DYNAMIC_DEBUG] module.dyndbg[="val"] Enable debug messages at boot time. See diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c index cc69e37548db..ff01c8fc76f7 100644 --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -537,6 +537,69 @@ static void setup_efi_pci(struct boot_params *params) efi_call_early(free_pool, pci_handle); } +static void retrieve_apple_device_properties(struct boot_params *boot_params) +{ + efi_guid_t guid = APPLE_PROPERTIES_PROTOCOL_GUID; + struct setup_data *data, *new; + efi_status_t status; + u32 size = 0; + void *p; + + status = efi_call_early(locate_protocol, , NULL, ); + if (status != EFI_SUCCESS) + return; + + if (efi_table_attr(apple_properties_protocol, version, p) != 0x1) { + efi_printk(sys_table, "Unsupported properties proto version\n"); + return; + } + + efi_call_proto(apple_properties_protocol, get_all, p, NULL, ); + if (!size) + return; + + do { + status = efi_call_early(allocate_pool, EFI_LOADER_DATA, + size + sizeof(struct setup_data), ); + if (status != EFI_SUCCESS) { + efi_printk(sys_table, + "Failed to alloc mem for properties\n"); + return; + } + + status = efi_call_proto(apple_properties_protocol, get_all, p, + new->data, ); + + if (status == EFI_BUFFER_TOO_SMALL) + e
[PATCH 3/9] efi: Add support for seeding the RNG from a UEFI config table
From: Ard Biesheuvel <ard.biesheu...@linaro.org> Specify a Linux specific UEFI configuration table that carries some random bits, and use the contents during early boot to seed the kernel's random number generator. This allows much strong random numbers to be generated early on. The entropy is fed to the kernel using add_device_randomness(), which is documented as being appropriate for being called very early. Since UEFI configuration tables may also be consumed by kexec'd kernels, register a reboot notifier that updates the seed in the table. Note that the config table could be generated by the EFI stub or by any other UEFI driver or application (e.g., GRUB), but the random seed table GUID and the associated functionality should be considered an internal kernel interface (unless it is promoted to ABI later on) Cc: Matt Fleming <m...@codeblueprint.co.uk> Reviewed-by: Kees Cook <keesc...@chromium.org> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> --- drivers/firmware/efi/efi.c | 72 ++ include/linux/efi.h| 8 ++ 2 files changed, 80 insertions(+) diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index a4944e22f294..92914801e388 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -23,7 +23,10 @@ #include #include #include +#include #include +#include +#include #include #include #include @@ -48,6 +51,7 @@ struct efi __read_mostly efi = { .esrt = EFI_INVALID_TABLE_ADDR, .properties_table = EFI_INVALID_TABLE_ADDR, .mem_attr_table = EFI_INVALID_TABLE_ADDR, + .rng_seed = EFI_INVALID_TABLE_ADDR, }; EXPORT_SYMBOL(efi); @@ -440,6 +444,7 @@ static __initdata efi_config_table_type_t common_tables[] = { {EFI_SYSTEM_RESOURCE_TABLE_GUID, "ESRT", }, {EFI_PROPERTIES_TABLE_GUID, "PROP", _table}, {EFI_MEMORY_ATTRIBUTES_TABLE_GUID, "MEMATTR", _attr_table}, + {LINUX_EFI_RANDOM_SEED_TABLE_GUID, "RNG", _seed}, {NULL_GUID, NULL, NULL}, }; @@ -501,6 +506,29 @@ int __init efi_config_parse_tables(void *config_tables, int count, int sz, pr_cont("\n"); set_bit(EFI_CONFIG_TABLES, ); + if (efi.rng_seed != EFI_INVALID_TABLE_ADDR) { + struct linux_efi_random_seed *seed; + u32 size = 0; + + seed = early_memremap(efi.rng_seed, sizeof(*seed)); + if (seed != NULL) { + size = seed->size; + early_memunmap(seed, sizeof(*seed)); + } else { + pr_err("Could not map UEFI random seed!\n"); + } + if (size > 0) { + seed = early_memremap(efi.rng_seed, + sizeof(*seed) + size); + if (seed != NULL) { + add_device_randomness(seed->bits, seed->size); + early_memunmap(seed, sizeof(*seed) + size); + } else { + pr_err("Could not map UEFI random seed!\n"); + } + } + } + /* Parse the EFI Properties table if it exists */ if (efi.properties_table != EFI_INVALID_TABLE_ADDR) { efi_properties_table_t *tbl; @@ -824,3 +852,47 @@ int efi_status_to_err(efi_status_t status) return err; } + +#ifdef CONFIG_KEXEC +static int update_efi_random_seed(struct notifier_block *nb, + unsigned long code, void *unused) +{ + struct linux_efi_random_seed *seed; + u32 size = 0; + + if (!kexec_in_progress) + return NOTIFY_DONE; + + seed = memremap(efi.rng_seed, sizeof(*seed), MEMREMAP_WB); + if (seed != NULL) { + size = min(seed->size, 32U); + memunmap(seed); + } else { + pr_err("Could not map UEFI random seed!\n"); + } + if (size > 0) { + seed = memremap(efi.rng_seed, sizeof(*seed) + size, + MEMREMAP_WB); + if (seed != NULL) { + seed->size = size; + get_random_bytes(seed->bits, seed->size); + memunmap(seed); + } else { + pr_err("Could not map UEFI random seed!\n"); + } + } + return NOTIFY_DONE; +} + +static struct notifier_block efi_random_seed_nb = { + .notifier_call = update_efi_random_seed, +}; + +static int register_update_efi_random_seed(void) +{ + if (efi.rng_seed == EFI_INVALID_TABLE_ADDR) + return 0; + return register_reboot_notifier(_random_seed_nb); +} +late_init
[PATCH 7/9] efi: Allow bitness-agnostic protocol calls
From: Lukas Wunner <lu...@wunner.de> We already have a macro to invoke boot services which on x86 adapts automatically to the bitness of the EFI firmware: efi_call_early(). The macro allows sharing of functions across arches and bitness variants as long as those functions only call boot services. However in practice functions in the EFI stub contain a mix of boot services calls and protocol calls. Add an efi_call_proto() macro for bitness-agnostic protocol calls to allow sharing more code across arches as well as deduplicating 32 bit and 64 bit code paths. On x86, implement it using a new efi_table_attr() macro for bitness- agnostic table lookups. Refactor efi_call_early() to make use of the same macro. (The resulting object code remains identical.) Signed-off-by: Lukas Wunner <lu...@wunner.de> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Andreas Noever <andreas.noe...@gmail.com> Cc: Peter Jones <pjo...@redhat.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- arch/arm/include/asm/efi.h | 3 +++ arch/arm64/include/asm/efi.h | 3 +++ arch/x86/include/asm/efi.h | 16 +++- 3 files changed, 17 insertions(+), 5 deletions(-) diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h index 766bf9b78160..0b06f5341b45 100644 --- a/arch/arm/include/asm/efi.h +++ b/arch/arm/include/asm/efi.h @@ -57,6 +57,9 @@ void efi_virtmap_unload(void); #define __efi_call_early(f, ...) f(__VA_ARGS__) #define efi_is_64bit() (false) +#define efi_call_proto(protocol, f, instance, ...) \ + ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__) + struct screen_info *alloc_screen_info(efi_system_table_t *sys_table_arg); void free_screen_info(efi_system_table_t *sys_table, struct screen_info *si); diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h index a9e54aad15ef..771b3f0bc757 100644 --- a/arch/arm64/include/asm/efi.h +++ b/arch/arm64/include/asm/efi.h @@ -51,6 +51,9 @@ int efi_set_mapping_permissions(struct mm_struct *mm, efi_memory_desc_t *md); #define __efi_call_early(f, ...) f(__VA_ARGS__) #define efi_is_64bit() (true) +#define efi_call_proto(protocol, f, instance, ...) \ + ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__) + #define alloc_screen_info(x...)_info #define free_screen_info(x...) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 389d700b961e..e99675b9c861 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -210,12 +210,18 @@ static inline bool efi_is_64bit(void) return __efi_early()->is64; } +#define efi_table_attr(table, attr, instance) \ + (efi_is_64bit() ? \ + ((table##_64_t *)(unsigned long)instance)->attr : \ + ((table##_32_t *)(unsigned long)instance)->attr) + +#define efi_call_proto(protocol, f, instance, ...) \ + __efi_early()->call(efi_table_attr(protocol, f, instance), \ + instance, ##__VA_ARGS__) + #define efi_call_early(f, ...) \ - __efi_early()->call(efi_is_64bit() ?\ - ((efi_boot_services_64_t *)(unsigned long) \ - __efi_early()->boot_services)->f : \ - ((efi_boot_services_32_t *)(unsigned long) \ - __efi_early()->boot_services)->f, __VA_ARGS__) + __efi_early()->call(efi_table_attr(efi_boot_services, f,\ + __efi_early()->boot_services), __VA_ARGS__) #define __efi_call_early(f, ...) \ __efi_early()->call((unsigned long)f, __VA_ARGS__); -- 2.10.0
[GIT PULL 0/9] EFI changes for v4.10
Folks, please pull the following v4.10 material. There isn't a huge amount of stuff here. The biggest change is the EFI dev path parser code from Lukas to get thunderbolt working on his macbook. [ The thunderbolt patch has been ACK'd by Andreas and given the OK to take it through the EFI tree ] The following changes since commit a75dcb5848359f488c32c0aef8711d9bd37a77b8: efi/efivar_ssdt_load: Don't return success on allocation failure (2016-10-18 17:11:20 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-next for you to fetch changes up to 9110bc036062fcd31994a35540d63f8deed22efa: thunderbolt: Use Device ROM retrieved from EFI (2016-11-12 21:14:43 +) * Fix an allocation bug in the generic EFI libstub where alignment and adjusted size isn't taken into account - Roy Franz * Update the EFI MAINTAINERS entry to include ARM and arm64 files and directories - Ard Biesheuvel * Add new feature to seed the RNG from the stashed value returned by EFI_RNG_PROTOCOL in EFI stub and wire up for ARM/arm64 - Ard Biesheuvel * Retrieve Apple device properties from within the EFI stub to fully support thunderbolt devices on Apple Macbooks - Lukas Wunner Ard Biesheuvel (4): MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem efi: Add support for seeding the RNG from a UEFI config table efi/libstub: Add random.c to ARM build efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table Lukas Wunner (4): efi: Add device path parser efi: Allow bitness-agnostic protocol calls x86/efi: Retrieve and assign Apple device properties thunderbolt: Use Device ROM retrieved from EFI Roy Franz (1): efi/libstub: Fix allocation size calculations Documentation/kernel-parameters.txt| 5 + MAINTAINERS| 6 +- arch/arm/include/asm/efi.h | 3 + arch/arm64/include/asm/efi.h | 3 + arch/x86/boot/compressed/eboot.c | 65 +++ arch/x86/include/asm/efi.h | 16 +- arch/x86/include/uapi/asm/bootparam.h | 1 + drivers/firmware/efi/Kconfig | 18 ++ drivers/firmware/efi/Makefile | 2 + drivers/firmware/efi/apple-properties.c| 248 + drivers/firmware/efi/dev-path-parser.c | 203 drivers/firmware/efi/efi.c | 72 +++ drivers/firmware/efi/libstub/Makefile | 4 +- drivers/firmware/efi/libstub/arm-stub.c| 2 + drivers/firmware/efi/libstub/efi-stub-helper.c | 33 ++-- drivers/firmware/efi/libstub/efistub.h | 11 ++ drivers/firmware/efi/libstub/random.c | 56 +- drivers/thunderbolt/Kconfig| 1 + drivers/thunderbolt/eeprom.c | 43 + drivers/thunderbolt/switch.c | 2 +- include/linux/efi.h| 46 + 21 files changed, 808 insertions(+), 32 deletions(-) create mode 100644 drivers/firmware/efi/apple-properties.c create mode 100644 drivers/firmware/efi/dev-path-parser.c
[PATCH 8/9] x86/efi: Retrieve and assign Apple device properties
for devices instantiated in a deferred fashion. It seems like this approach would be more complicated and require more code. That doesn't seem justified without a specific use case. For comparison, the strategy on macOS is to assign properties to objects in the ACPI namespace (AppleACPIPlatformExpert::mergeEFIProperties()). That approach is definitely wrong as it fails for devices not present in the namespace: The NHI EFI driver supplies properties for attached Thunderbolt devices, yet on Macs with Thunderbolt 1 only one device level behind the host controller is described in the namespace. Consequently macOS cannot assign properties for chained devices. With Thunderbolt 2 they started to describe three device levels behind host controllers in the namespace but this grossly inflates the SSDT and still fails if the user daisy-chained more than three devices. We copy the property names and values from the setup_data payload to swappable virtual memory and afterwards make the payload available to the page allocator. This is just for the sake of good housekeeping, it wouldn't occupy a meaningful amount of physical memory ( bytes on my machine). Only the payload is freed, not the setup_data header since otherwise we'd break the list linkage and we cannot safely update the predecessor's ->next link because there's no locking for the list. The payload is currently not passed on to kexec'ed kernels, same for PCI ROMs retrieved by setup_efi_pci(). This can be added later if there is demand by amending setup_efi_state(). The payload can then no longer be made available to the page allocator of course. Tested-by: Lukas Wunner [MacBookPro9,1] Tested-by: Pierre Moreau [MacBookPro11,3] Signed-off-by: Lukas Wunner Cc: grub-de...@gnu.org Cc: Pedro Vilaça Cc: Ard Biesheuvel Cc: Andreas Noever Cc: Peter Jones Signed-off-by: Matt Fleming --- Documentation/kernel-parameters.txt | 5 + arch/x86/boot/compressed/eboot.c| 65 + arch/x86/include/uapi/asm/bootparam.h | 1 + drivers/firmware/efi/Kconfig| 13 ++ drivers/firmware/efi/Makefile | 1 + drivers/firmware/efi/apple-properties.c | 248 include/linux/efi.h | 17 +++ 7 files changed, 350 insertions(+) create mode 100644 drivers/firmware/efi/apple-properties.c diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 37babf91f2cb..86a31dfc036e 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1062,6 +1062,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. dscc4.setup=[NET] + dump_apple_properties [X86] + Dump name and content of EFI device properties on + x86 Macs. Useful for driver authors to determine + what data is available or for reverse-engineering. + dyndbg[="val"] [KNL,DYNAMIC_DEBUG] module.dyndbg[="val"] Enable debug messages at boot time. See diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c index cc69e37548db..ff01c8fc76f7 100644 --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -537,6 +537,69 @@ static void setup_efi_pci(struct boot_params *params) efi_call_early(free_pool, pci_handle); } +static void retrieve_apple_device_properties(struct boot_params *boot_params) +{ + efi_guid_t guid = APPLE_PROPERTIES_PROTOCOL_GUID; + struct setup_data *data, *new; + efi_status_t status; + u32 size = 0; + void *p; + + status = efi_call_early(locate_protocol, , NULL, ); + if (status != EFI_SUCCESS) + return; + + if (efi_table_attr(apple_properties_protocol, version, p) != 0x1) { + efi_printk(sys_table, "Unsupported properties proto version\n"); + return; + } + + efi_call_proto(apple_properties_protocol, get_all, p, NULL, ); + if (!size) + return; + + do { + status = efi_call_early(allocate_pool, EFI_LOADER_DATA, + size + sizeof(struct setup_data), ); + if (status != EFI_SUCCESS) { + efi_printk(sys_table, + "Failed to alloc mem for properties\n"); + return; + } + + status = efi_call_proto(apple_properties_protocol, get_all, p, + new->data, ); + + if (status == EFI_BUFFER_TOO_SMALL) + efi_call_early(free_pool, new); + } while (status == EFI_BUFFER_TOO_SMALL); + + new->type = SETUP_APPLE_PROPERTIES; + new->len = size; + new->next = 0; + + data = (struct setup_data *)(unsigned long)boot_params
[PATCH 3/9] efi: Add support for seeding the RNG from a UEFI config table
From: Ard Biesheuvel Specify a Linux specific UEFI configuration table that carries some random bits, and use the contents during early boot to seed the kernel's random number generator. This allows much strong random numbers to be generated early on. The entropy is fed to the kernel using add_device_randomness(), which is documented as being appropriate for being called very early. Since UEFI configuration tables may also be consumed by kexec'd kernels, register a reboot notifier that updates the seed in the table. Note that the config table could be generated by the EFI stub or by any other UEFI driver or application (e.g., GRUB), but the random seed table GUID and the associated functionality should be considered an internal kernel interface (unless it is promoted to ABI later on) Cc: Matt Fleming Reviewed-by: Kees Cook Signed-off-by: Ard Biesheuvel --- drivers/firmware/efi/efi.c | 72 ++ include/linux/efi.h| 8 ++ 2 files changed, 80 insertions(+) diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index a4944e22f294..92914801e388 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -23,7 +23,10 @@ #include #include #include +#include #include +#include +#include #include #include #include @@ -48,6 +51,7 @@ struct efi __read_mostly efi = { .esrt = EFI_INVALID_TABLE_ADDR, .properties_table = EFI_INVALID_TABLE_ADDR, .mem_attr_table = EFI_INVALID_TABLE_ADDR, + .rng_seed = EFI_INVALID_TABLE_ADDR, }; EXPORT_SYMBOL(efi); @@ -440,6 +444,7 @@ static __initdata efi_config_table_type_t common_tables[] = { {EFI_SYSTEM_RESOURCE_TABLE_GUID, "ESRT", }, {EFI_PROPERTIES_TABLE_GUID, "PROP", _table}, {EFI_MEMORY_ATTRIBUTES_TABLE_GUID, "MEMATTR", _attr_table}, + {LINUX_EFI_RANDOM_SEED_TABLE_GUID, "RNG", _seed}, {NULL_GUID, NULL, NULL}, }; @@ -501,6 +506,29 @@ int __init efi_config_parse_tables(void *config_tables, int count, int sz, pr_cont("\n"); set_bit(EFI_CONFIG_TABLES, ); + if (efi.rng_seed != EFI_INVALID_TABLE_ADDR) { + struct linux_efi_random_seed *seed; + u32 size = 0; + + seed = early_memremap(efi.rng_seed, sizeof(*seed)); + if (seed != NULL) { + size = seed->size; + early_memunmap(seed, sizeof(*seed)); + } else { + pr_err("Could not map UEFI random seed!\n"); + } + if (size > 0) { + seed = early_memremap(efi.rng_seed, + sizeof(*seed) + size); + if (seed != NULL) { + add_device_randomness(seed->bits, seed->size); + early_memunmap(seed, sizeof(*seed) + size); + } else { + pr_err("Could not map UEFI random seed!\n"); + } + } + } + /* Parse the EFI Properties table if it exists */ if (efi.properties_table != EFI_INVALID_TABLE_ADDR) { efi_properties_table_t *tbl; @@ -824,3 +852,47 @@ int efi_status_to_err(efi_status_t status) return err; } + +#ifdef CONFIG_KEXEC +static int update_efi_random_seed(struct notifier_block *nb, + unsigned long code, void *unused) +{ + struct linux_efi_random_seed *seed; + u32 size = 0; + + if (!kexec_in_progress) + return NOTIFY_DONE; + + seed = memremap(efi.rng_seed, sizeof(*seed), MEMREMAP_WB); + if (seed != NULL) { + size = min(seed->size, 32U); + memunmap(seed); + } else { + pr_err("Could not map UEFI random seed!\n"); + } + if (size > 0) { + seed = memremap(efi.rng_seed, sizeof(*seed) + size, + MEMREMAP_WB); + if (seed != NULL) { + seed->size = size; + get_random_bytes(seed->bits, seed->size); + memunmap(seed); + } else { + pr_err("Could not map UEFI random seed!\n"); + } + } + return NOTIFY_DONE; +} + +static struct notifier_block efi_random_seed_nb = { + .notifier_call = update_efi_random_seed, +}; + +static int register_update_efi_random_seed(void) +{ + if (efi.rng_seed == EFI_INVALID_TABLE_ADDR) + return 0; + return register_reboot_notifier(_random_seed_nb); +} +late_initcall(register_update_efi_random_seed); +#endif diff --git a/include/linux/efi.h b/include/linux/efi.h index 2d089487d2da..85e28b138
[PATCH 7/9] efi: Allow bitness-agnostic protocol calls
From: Lukas Wunner We already have a macro to invoke boot services which on x86 adapts automatically to the bitness of the EFI firmware: efi_call_early(). The macro allows sharing of functions across arches and bitness variants as long as those functions only call boot services. However in practice functions in the EFI stub contain a mix of boot services calls and protocol calls. Add an efi_call_proto() macro for bitness-agnostic protocol calls to allow sharing more code across arches as well as deduplicating 32 bit and 64 bit code paths. On x86, implement it using a new efi_table_attr() macro for bitness- agnostic table lookups. Refactor efi_call_early() to make use of the same macro. (The resulting object code remains identical.) Signed-off-by: Lukas Wunner Cc: Ard Biesheuvel Cc: Andreas Noever Cc: Peter Jones Signed-off-by: Matt Fleming --- arch/arm/include/asm/efi.h | 3 +++ arch/arm64/include/asm/efi.h | 3 +++ arch/x86/include/asm/efi.h | 16 +++- 3 files changed, 17 insertions(+), 5 deletions(-) diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h index 766bf9b78160..0b06f5341b45 100644 --- a/arch/arm/include/asm/efi.h +++ b/arch/arm/include/asm/efi.h @@ -57,6 +57,9 @@ void efi_virtmap_unload(void); #define __efi_call_early(f, ...) f(__VA_ARGS__) #define efi_is_64bit() (false) +#define efi_call_proto(protocol, f, instance, ...) \ + ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__) + struct screen_info *alloc_screen_info(efi_system_table_t *sys_table_arg); void free_screen_info(efi_system_table_t *sys_table, struct screen_info *si); diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h index a9e54aad15ef..771b3f0bc757 100644 --- a/arch/arm64/include/asm/efi.h +++ b/arch/arm64/include/asm/efi.h @@ -51,6 +51,9 @@ int efi_set_mapping_permissions(struct mm_struct *mm, efi_memory_desc_t *md); #define __efi_call_early(f, ...) f(__VA_ARGS__) #define efi_is_64bit() (true) +#define efi_call_proto(protocol, f, instance, ...) \ + ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__) + #define alloc_screen_info(x...)_info #define free_screen_info(x...) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 389d700b961e..e99675b9c861 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -210,12 +210,18 @@ static inline bool efi_is_64bit(void) return __efi_early()->is64; } +#define efi_table_attr(table, attr, instance) \ + (efi_is_64bit() ? \ + ((table##_64_t *)(unsigned long)instance)->attr : \ + ((table##_32_t *)(unsigned long)instance)->attr) + +#define efi_call_proto(protocol, f, instance, ...) \ + __efi_early()->call(efi_table_attr(protocol, f, instance), \ + instance, ##__VA_ARGS__) + #define efi_call_early(f, ...) \ - __efi_early()->call(efi_is_64bit() ?\ - ((efi_boot_services_64_t *)(unsigned long) \ - __efi_early()->boot_services)->f : \ - ((efi_boot_services_32_t *)(unsigned long) \ - __efi_early()->boot_services)->f, __VA_ARGS__) + __efi_early()->call(efi_table_attr(efi_boot_services, f,\ + __efi_early()->boot_services), __VA_ARGS__) #define __efi_call_early(f, ...) \ __efi_early()->call((unsigned long)f, __VA_ARGS__); -- 2.10.0
[GIT PULL 0/9] EFI changes for v4.10
Folks, please pull the following v4.10 material. There isn't a huge amount of stuff here. The biggest change is the EFI dev path parser code from Lukas to get thunderbolt working on his macbook. [ The thunderbolt patch has been ACK'd by Andreas and given the OK to take it through the EFI tree ] The following changes since commit a75dcb5848359f488c32c0aef8711d9bd37a77b8: efi/efivar_ssdt_load: Don't return success on allocation failure (2016-10-18 17:11:20 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-next for you to fetch changes up to 9110bc036062fcd31994a35540d63f8deed22efa: thunderbolt: Use Device ROM retrieved from EFI (2016-11-12 21:14:43 +) * Fix an allocation bug in the generic EFI libstub where alignment and adjusted size isn't taken into account - Roy Franz * Update the EFI MAINTAINERS entry to include ARM and arm64 files and directories - Ard Biesheuvel * Add new feature to seed the RNG from the stashed value returned by EFI_RNG_PROTOCOL in EFI stub and wire up for ARM/arm64 - Ard Biesheuvel * Retrieve Apple device properties from within the EFI stub to fully support thunderbolt devices on Apple Macbooks - Lukas Wunner Ard Biesheuvel (4): MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem efi: Add support for seeding the RNG from a UEFI config table efi/libstub: Add random.c to ARM build efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table Lukas Wunner (4): efi: Add device path parser efi: Allow bitness-agnostic protocol calls x86/efi: Retrieve and assign Apple device properties thunderbolt: Use Device ROM retrieved from EFI Roy Franz (1): efi/libstub: Fix allocation size calculations Documentation/kernel-parameters.txt| 5 + MAINTAINERS| 6 +- arch/arm/include/asm/efi.h | 3 + arch/arm64/include/asm/efi.h | 3 + arch/x86/boot/compressed/eboot.c | 65 +++ arch/x86/include/asm/efi.h | 16 +- arch/x86/include/uapi/asm/bootparam.h | 1 + drivers/firmware/efi/Kconfig | 18 ++ drivers/firmware/efi/Makefile | 2 + drivers/firmware/efi/apple-properties.c| 248 + drivers/firmware/efi/dev-path-parser.c | 203 drivers/firmware/efi/efi.c | 72 +++ drivers/firmware/efi/libstub/Makefile | 4 +- drivers/firmware/efi/libstub/arm-stub.c| 2 + drivers/firmware/efi/libstub/efi-stub-helper.c | 33 ++-- drivers/firmware/efi/libstub/efistub.h | 11 ++ drivers/firmware/efi/libstub/random.c | 56 +- drivers/thunderbolt/Kconfig| 1 + drivers/thunderbolt/eeprom.c | 43 + drivers/thunderbolt/switch.c | 2 +- include/linux/efi.h| 46 + 21 files changed, 808 insertions(+), 32 deletions(-) create mode 100644 drivers/firmware/efi/apple-properties.c create mode 100644 drivers/firmware/efi/dev-path-parser.c
[PATCH 1/2] x86/efi: Fix EFI memmap pointer size warning
From: Borislav Petkov <b...@suse.de> Fix this when building on 32-bit: arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’: arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] (efi_memory_desc_t *)pa); ^ arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] (efi_memory_desc_t *)pa); ^ The @pa local variable is declared as phys_addr_t and that is a u64 when CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE build.) However, its value comes from __pa() which is basically doing pointer arithmetic and checking, and returns unsigned long as it is the native pointer width. So let's use an unsigned long too. It should be fine to do so because the later users cast it to a pointer too. Signed-off-by: Borislav Petkov <b...@suse.de> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- arch/x86/platform/efi/efi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index bf99aa7005eb..936a488d6cf6 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -861,7 +861,7 @@ static void __init __efi_enter_virtual_mode(void) int count = 0, pg_shift = 0; void *new_memmap = NULL; efi_status_t status; - phys_addr_t pa; + unsigned long pa; efi.systab = NULL; -- 2.10.0
[PATCH 2/2] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK
Booting an EFI mixed mode kernel has been crashing since commit: e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)") The user-visible effect in my test setup was the kernel being unable to find the root file system ramdisk. This was likely caused by silent memory or page table corruption. Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as abusing virt_to_phys() because it was passing addresses that were not part of the kernel direct mapping. Use the slow version instead, which correctly handles all memory regions by performing a page table walk. Suggested-by: Andy Lutomirski <l...@amacapital.net> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Ingo Molnar <mi...@kernel.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: "H. Peter Anvin" <h...@zytor.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- arch/x86/platform/efi/efi_64.c | 80 ++ 1 file changed, 57 insertions(+), 23 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 58b0f801f66f..319148bd4b05 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void) memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries); } +/* + * Wrapper for slow_virt_to_phys() that handles NULL addresses. + */ +static inline phys_addr_t +virt_to_phys_or_null_size(void *va, unsigned long size) +{ + bool bad_size; + + if (!va) + return 0; + + if (virt_addr_valid(va)) + return virt_to_phys(va); + + /* +* A fully aligned variable on the stack is guaranteed not to +* cross a page bounary. Try to catch strings on the stack by +* checking that 'size' is a power of two. +*/ + bad_size = size > PAGE_SIZE || !is_power_of_2(size); + + WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size); + + return slow_virt_to_phys(va); +} + +#define virt_to_phys_or_null(addr) \ + virt_to_phys_or_null_size((addr), sizeof(*(addr))) + int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) { unsigned long pfn, text; @@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, efi_time_cap_t *tc) spin_lock(_lock); - phys_tm = virt_to_phys(tm); - phys_tc = virt_to_phys(tc); + phys_tm = virt_to_phys_or_null(tm); + phys_tc = virt_to_phys_or_null(tc); status = efi_thunk(get_time, phys_tm, phys_tc); @@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_time, phys_tm); @@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending, spin_lock(_lock); - phys_enabled = virt_to_phys(enabled); - phys_pending = virt_to_phys(pending); - phys_tm = virt_to_phys(tm); + phys_enabled = virt_to_phys_or_null(enabled); + phys_pending = virt_to_phys_or_null(pending); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(get_wakeup_time, phys_enabled, phys_pending, phys_tm); @@ -549,7 +579,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_wakeup_time, enabled, phys_tm); @@ -558,6 +588,10 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) return status; } +static unsigned long efi_name_size(efi_char16_t *name) +{ + return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1; +} static efi_status_t efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor, @@ -567,11 +601,11 @@ efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor, u32 phys_name, phys_vendor, phys_attr; u32 phys_data_size, phys_data; - phys_data_size = virt_to_phys(data_size); - phys_vendor = virt_to_phys(vendor); - phys_name = virt_to_phys(name); - phys_attr = virt_to_phys(attr); - phys_data = virt_to_phys(data); + phys_data_size = virt_to_phys_or_null(data_size); + phys_vendor = virt_to_phys_or_null(vendor); + phys_name = virt_to_phys_or_null_size(name, efi_name_size(name)); + phys_attr = virt_to_phys_or_null(attr); + phys_data = virt_to_phys_or_null_size(data, *data_size); status = efi_thunk(get_variable, phys_name, phys_vendor, phys_attr, phys_data_size, phys_data); @@ -586,9 +620,9 @@ efi_thunk_set_variable(efi_char16_t *name, efi_guid_t *vendor,
[GIT PULL 0/2] EFI urgent fixes
Folks, please pull the following two EFI patches. The first fixes a build warning for PAE that Boris hit. The second makes mixed-mode EFI boot again after the vmap'd stack changes introduced during the merge window. The following changes since commit bc33b0ca11e3df46a4fa7639ba488c9d4911: Linux 4.9-rc4 (2016-11-05 16:23:36 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-urgent for you to fetch changes up to 044ddf3d3e3cb62671f22fa837a2164d4786d867: x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK (2016-11-12 21:00:18 +) * Fix memory corruption when booting EFI mixed mode due to the recent vmap'd stack changes - Matt Fleming * Build warning fix in the EFI memmap code when CONFIG_X86_PAE and CONFIG_PHYS_ADDR_T_64BIT are enabled - Borislav Petkov Borislav Petkov (1): x86/efi: Fix EFI memmap pointer size warning Matt Fleming (1): x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK arch/x86/platform/efi/efi.c| 2 +- arch/x86/platform/efi/efi_64.c | 80 ++ 2 files changed, 58 insertions(+), 24 deletions(-)
[PATCH 1/2] x86/efi: Fix EFI memmap pointer size warning
From: Borislav Petkov Fix this when building on 32-bit: arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’: arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] (efi_memory_desc_t *)pa); ^ arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] (efi_memory_desc_t *)pa); ^ The @pa local variable is declared as phys_addr_t and that is a u64 when CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE build.) However, its value comes from __pa() which is basically doing pointer arithmetic and checking, and returns unsigned long as it is the native pointer width. So let's use an unsigned long too. It should be fine to do so because the later users cast it to a pointer too. Signed-off-by: Borislav Petkov Cc: Ard Biesheuvel Signed-off-by: Matt Fleming --- arch/x86/platform/efi/efi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index bf99aa7005eb..936a488d6cf6 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -861,7 +861,7 @@ static void __init __efi_enter_virtual_mode(void) int count = 0, pg_shift = 0; void *new_memmap = NULL; efi_status_t status; - phys_addr_t pa; + unsigned long pa; efi.systab = NULL; -- 2.10.0
[PATCH 2/2] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK
Booting an EFI mixed mode kernel has been crashing since commit: e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)") The user-visible effect in my test setup was the kernel being unable to find the root file system ramdisk. This was likely caused by silent memory or page table corruption. Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as abusing virt_to_phys() because it was passing addresses that were not part of the kernel direct mapping. Use the slow version instead, which correctly handles all memory regions by performing a page table walk. Suggested-by: Andy Lutomirski Cc: Ard Biesheuvel Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "H. Peter Anvin" Signed-off-by: Matt Fleming --- arch/x86/platform/efi/efi_64.c | 80 ++ 1 file changed, 57 insertions(+), 23 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 58b0f801f66f..319148bd4b05 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void) memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries); } +/* + * Wrapper for slow_virt_to_phys() that handles NULL addresses. + */ +static inline phys_addr_t +virt_to_phys_or_null_size(void *va, unsigned long size) +{ + bool bad_size; + + if (!va) + return 0; + + if (virt_addr_valid(va)) + return virt_to_phys(va); + + /* +* A fully aligned variable on the stack is guaranteed not to +* cross a page bounary. Try to catch strings on the stack by +* checking that 'size' is a power of two. +*/ + bad_size = size > PAGE_SIZE || !is_power_of_2(size); + + WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size); + + return slow_virt_to_phys(va); +} + +#define virt_to_phys_or_null(addr) \ + virt_to_phys_or_null_size((addr), sizeof(*(addr))) + int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) { unsigned long pfn, text; @@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, efi_time_cap_t *tc) spin_lock(_lock); - phys_tm = virt_to_phys(tm); - phys_tc = virt_to_phys(tc); + phys_tm = virt_to_phys_or_null(tm); + phys_tc = virt_to_phys_or_null(tc); status = efi_thunk(get_time, phys_tm, phys_tc); @@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_time, phys_tm); @@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending, spin_lock(_lock); - phys_enabled = virt_to_phys(enabled); - phys_pending = virt_to_phys(pending); - phys_tm = virt_to_phys(tm); + phys_enabled = virt_to_phys_or_null(enabled); + phys_pending = virt_to_phys_or_null(pending); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(get_wakeup_time, phys_enabled, phys_pending, phys_tm); @@ -549,7 +579,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_wakeup_time, enabled, phys_tm); @@ -558,6 +588,10 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) return status; } +static unsigned long efi_name_size(efi_char16_t *name) +{ + return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1; +} static efi_status_t efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor, @@ -567,11 +601,11 @@ efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor, u32 phys_name, phys_vendor, phys_attr; u32 phys_data_size, phys_data; - phys_data_size = virt_to_phys(data_size); - phys_vendor = virt_to_phys(vendor); - phys_name = virt_to_phys(name); - phys_attr = virt_to_phys(attr); - phys_data = virt_to_phys(data); + phys_data_size = virt_to_phys_or_null(data_size); + phys_vendor = virt_to_phys_or_null(vendor); + phys_name = virt_to_phys_or_null_size(name, efi_name_size(name)); + phys_attr = virt_to_phys_or_null(attr); + phys_data = virt_to_phys_or_null_size(data, *data_size); status = efi_thunk(get_variable, phys_name, phys_vendor, phys_attr, phys_data_size, phys_data); @@ -586,9 +620,9 @@ efi_thunk_set_variable(efi_char16_t *name, efi_guid_t *vendor, u32 phys_name, phys_vendor, phys_data; efi_status_t status; - phys_name = virt_to_phys(name); - phys_vendor = virt_to_phys(vendor); - p
[GIT PULL 0/2] EFI urgent fixes
Folks, please pull the following two EFI patches. The first fixes a build warning for PAE that Boris hit. The second makes mixed-mode EFI boot again after the vmap'd stack changes introduced during the merge window. The following changes since commit bc33b0ca11e3df46a4fa7639ba488c9d4911: Linux 4.9-rc4 (2016-11-05 16:23:36 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-urgent for you to fetch changes up to 044ddf3d3e3cb62671f22fa837a2164d4786d867: x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK (2016-11-12 21:00:18 +) * Fix memory corruption when booting EFI mixed mode due to the recent vmap'd stack changes - Matt Fleming * Build warning fix in the EFI memmap code when CONFIG_X86_PAE and CONFIG_PHYS_ADDR_T_64BIT are enabled - Borislav Petkov Borislav Petkov (1): x86/efi: Fix EFI memmap pointer size warning Matt Fleming (1): x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK arch/x86/platform/efi/efi.c| 2 +- arch/x86/platform/efi/efi_64.c | 80 ++ 2 files changed, 58 insertions(+), 24 deletions(-)
Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)
On Sun, 30 Oct, at 09:21:29AM, Andy Lutomirski wrote: > > @@ -251,7 +277,8 @@ int __init efi_setup_page_tables(unsigned long > > pa_memmap, unsigned num_pages) > > if (!page) > > panic("Unable to allocate EFI runtime stack < 4GB\n"); > > > > - efi_scratch.phys_stack = virt_to_phys(page_address(page)); > > + addr = page_address(page); > > + efi_scratch.phys_stack = virt_to_phys_or_null_size(addr, PAGE_SIZE); > > This can't be on the stack -- you just allocated it with alloc_page(). Oh good point. I was too eager with search and replace. > > +static unsigned long efi_name_size(efi_char16_t *name) > > +{ > > + return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1; > > +} > > If this is really dynamic, I'm surprised that anything ends up > aligned. Can't this be a number like 6? It might pay to extend that > warning to also check that, if the variable is on the stack, its size > is a power of two. But maybe none of the users of this are on the > stack, in which case it might pay to try to prevent a new user on the > stack from showing up. I can't find any existing users that place strings on the stack, no. OK, let's try this one. --- >From 3f8a1ec209a458a9e4b82313186fbde86082696b Mon Sep 17 00:00:00 2001 From: Matt Fleming <m...@codeblueprint.co.uk> Date: Thu, 20 Oct 2016 22:17:21 +0100 Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK Booting an EFI mixed mode kernel has been crashing since commit: e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)") The user-visible effect in my test setup was the kernel being unable to find the root file system ramdisk. This was likely caused by silent memory or page table corruption. Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as abusing virt_to_phys() because it was passing addresses that were not part of the kernel direct mapping. Use the slow version instead, which correctly handles all memory regions by performing a page table walk. Suggested-by: Andy Lutomirski <l...@amacapital.net> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Ingo Molnar <mi...@kernel.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: "H. Peter Anvin" <h...@zytor.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- arch/x86/platform/efi/efi_64.c | 80 ++ 1 file changed, 57 insertions(+), 23 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 58b0f801f66f..319148bd4b05 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void) memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries); } +/* + * Wrapper for slow_virt_to_phys() that handles NULL addresses. + */ +static inline phys_addr_t +virt_to_phys_or_null_size(void *va, unsigned long size) +{ + bool bad_size; + + if (!va) + return 0; + + if (virt_addr_valid(va)) + return virt_to_phys(va); + + /* +* A fully aligned variable on the stack is guaranteed not to +* cross a page bounary. Try to catch strings on the stack by +* checking that 'size' is a power of two. +*/ + bad_size = size > PAGE_SIZE || !is_power_of_2(size); + + WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size); + + return slow_virt_to_phys(va); +} + +#define virt_to_phys_or_null(addr) \ + virt_to_phys_or_null_size((addr), sizeof(*(addr))) + int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) { unsigned long pfn, text; @@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, efi_time_cap_t *tc) spin_lock(_lock); - phys_tm = virt_to_phys(tm); - phys_tc = virt_to_phys(tc); + phys_tm = virt_to_phys_or_null(tm); + phys_tc = virt_to_phys_or_null(tc); status = efi_thunk(get_time, phys_tm, phys_tc); @@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_time, phys_tm); @@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending, spin_lock(_lock); - phys_enabled = virt_to_phys(enabled); - phys_pending = virt_to_phys(pending); - phys_tm = virt_to_phys(tm); + phys_enabled = virt_to_phys_or_null(enabled); + phys_pending = virt_to_phys_or_null(pending); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(get_wa
Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)
On Sun, 30 Oct, at 09:21:29AM, Andy Lutomirski wrote: > > @@ -251,7 +277,8 @@ int __init efi_setup_page_tables(unsigned long > > pa_memmap, unsigned num_pages) > > if (!page) > > panic("Unable to allocate EFI runtime stack < 4GB\n"); > > > > - efi_scratch.phys_stack = virt_to_phys(page_address(page)); > > + addr = page_address(page); > > + efi_scratch.phys_stack = virt_to_phys_or_null_size(addr, PAGE_SIZE); > > This can't be on the stack -- you just allocated it with alloc_page(). Oh good point. I was too eager with search and replace. > > +static unsigned long efi_name_size(efi_char16_t *name) > > +{ > > + return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1; > > +} > > If this is really dynamic, I'm surprised that anything ends up > aligned. Can't this be a number like 6? It might pay to extend that > warning to also check that, if the variable is on the stack, its size > is a power of two. But maybe none of the users of this are on the > stack, in which case it might pay to try to prevent a new user on the > stack from showing up. I can't find any existing users that place strings on the stack, no. OK, let's try this one. --- >From 3f8a1ec209a458a9e4b82313186fbde86082696b Mon Sep 17 00:00:00 2001 From: Matt Fleming Date: Thu, 20 Oct 2016 22:17:21 +0100 Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK Booting an EFI mixed mode kernel has been crashing since commit: e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)") The user-visible effect in my test setup was the kernel being unable to find the root file system ramdisk. This was likely caused by silent memory or page table corruption. Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as abusing virt_to_phys() because it was passing addresses that were not part of the kernel direct mapping. Use the slow version instead, which correctly handles all memory regions by performing a page table walk. Suggested-by: Andy Lutomirski Cc: Ard Biesheuvel Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "H. Peter Anvin" Signed-off-by: Matt Fleming --- arch/x86/platform/efi/efi_64.c | 80 ++ 1 file changed, 57 insertions(+), 23 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 58b0f801f66f..319148bd4b05 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void) memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries); } +/* + * Wrapper for slow_virt_to_phys() that handles NULL addresses. + */ +static inline phys_addr_t +virt_to_phys_or_null_size(void *va, unsigned long size) +{ + bool bad_size; + + if (!va) + return 0; + + if (virt_addr_valid(va)) + return virt_to_phys(va); + + /* +* A fully aligned variable on the stack is guaranteed not to +* cross a page bounary. Try to catch strings on the stack by +* checking that 'size' is a power of two. +*/ + bad_size = size > PAGE_SIZE || !is_power_of_2(size); + + WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size); + + return slow_virt_to_phys(va); +} + +#define virt_to_phys_or_null(addr) \ + virt_to_phys_or_null_size((addr), sizeof(*(addr))) + int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) { unsigned long pfn, text; @@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, efi_time_cap_t *tc) spin_lock(_lock); - phys_tm = virt_to_phys(tm); - phys_tc = virt_to_phys(tc); + phys_tm = virt_to_phys_or_null(tm); + phys_tc = virt_to_phys_or_null(tc); status = efi_thunk(get_time, phys_tm, phys_tc); @@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_time, phys_tm); @@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending, spin_lock(_lock); - phys_enabled = virt_to_phys(enabled); - phys_pending = virt_to_phys(pending); - phys_tm = virt_to_phys(tm); + phys_enabled = virt_to_phys_or_null(enabled); + phys_pending = virt_to_phys_or_null(pending); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(get_wakeup_time, phys_enabled, phys_pending, phys_tm); @@ -549,7 +579,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) spin_lock(_lock); -
Re: [PATCH] x86/efi: Fix EFI memmap pointer size warning
On Sun, 06 Nov, at 02:02:54PM, Borislav Petkov wrote: > Hi Matt, > > please doublecheck me on this but I think we're fine using an unsigned > long. > > Thanks. > > --- > From: Borislav Petkov <b...@suse.de> > Date: Sun, 6 Nov 2016 13:49:10 +0100 > Subject: [PATCH] x86/efi: Fix EFI memmap pointer size warning > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > Fix this when building on 32-bit: > > arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’: > arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of > different size [-Wint-to-pointer-cast] >(efi_memory_desc_t *)pa); >^ > arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of > different size [-Wint-to-pointer-cast] >(efi_memory_desc_t *)pa); >^ > > The @pa local variable is declared as phys_addr_t and that is a u64 when > CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE > build.) > > However, its value comes from __pa() which is basically doing pointer > arithmetic and checking, and returns unsigned long as it is the native > pointer width. > > So let's use an unsigned long too. It should be fine to do so because > the later users cast it to a pointer too. > > Signed-off-by: Borislav Petkov <b...@suse.de> > Cc: Matt Fleming <m...@codeblueprint.co.uk> > --- > arch/x86/platform/efi/efi.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c > index bf99aa7005eb..936a488d6cf6 100644 > --- a/arch/x86/platform/efi/efi.c > +++ b/arch/x86/platform/efi/efi.c > @@ -861,7 +861,7 @@ static void __init __efi_enter_virtual_mode(void) > int count = 0, pg_shift = 0; > void *new_memmap = NULL; > efi_status_t status; > - phys_addr_t pa; > + unsigned long pa; > > efi.systab = NULL; > Right, this does look correct given that __pa() uses 'unsigned long'. I think the reason this is safe on PAE is that __get_free_pages() is guaranteed to return a 32-bit address, and will not return HIGHMEM pages. Which makes __pa() legal and 'unsigned long' the correct type. Want me to take this through the EFI tree?
Re: [PATCH] x86/efi: Fix EFI memmap pointer size warning
On Sun, 06 Nov, at 02:02:54PM, Borislav Petkov wrote: > Hi Matt, > > please doublecheck me on this but I think we're fine using an unsigned > long. > > Thanks. > > --- > From: Borislav Petkov > Date: Sun, 6 Nov 2016 13:49:10 +0100 > Subject: [PATCH] x86/efi: Fix EFI memmap pointer size warning > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > Fix this when building on 32-bit: > > arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’: > arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of > different size [-Wint-to-pointer-cast] >(efi_memory_desc_t *)pa); >^ > arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of > different size [-Wint-to-pointer-cast] >(efi_memory_desc_t *)pa); >^ > > The @pa local variable is declared as phys_addr_t and that is a u64 when > CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE > build.) > > However, its value comes from __pa() which is basically doing pointer > arithmetic and checking, and returns unsigned long as it is the native > pointer width. > > So let's use an unsigned long too. It should be fine to do so because > the later users cast it to a pointer too. > > Signed-off-by: Borislav Petkov > Cc: Matt Fleming > --- > arch/x86/platform/efi/efi.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c > index bf99aa7005eb..936a488d6cf6 100644 > --- a/arch/x86/platform/efi/efi.c > +++ b/arch/x86/platform/efi/efi.c > @@ -861,7 +861,7 @@ static void __init __efi_enter_virtual_mode(void) > int count = 0, pg_shift = 0; > void *new_memmap = NULL; > efi_status_t status; > - phys_addr_t pa; > + unsigned long pa; > > efi.systab = NULL; > Right, this does look correct given that __pa() uses 'unsigned long'. I think the reason this is safe on PAE is that __get_free_pages() is guaranteed to return a 32-bit address, and will not return HIGHMEM pages. Which makes __pa() legal and 'unsigned long' the correct type. Want me to take this through the EFI tree?
Re: [PATCH] x86/boot: Remove always empty $(USERINCLUDE)
On Thu, 03 Nov, at 10:51:38AM, Paul Bolle wrote: > Apparently Matt left Intel. Let's forward this to a recently used > address. > > On Thu, 2016-11-03 at 10:47 +0100, Paul Bolle wrote: > > Commmit b6eea87fc685 ("x86, boot: Explicitly include autoconf.h for > > hostprogs") correctly noted > > [...] that because $(USERINCLUDE) isn't exported by > > the top-level Makefile it's actually empty in arch/x86/boot/Makefile. > > > > So let's do the sane thing and remove the reference to that make variable. > > > > Signed-off-by: Paul Bolle <pebo...@tiscali.nl> > > --- > > arch/x86/boot/Makefile | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile > > index 12ea8f8384f4..0d810fb15eac 100644 > > --- a/arch/x86/boot/Makefile > > +++ b/arch/x86/boot/Makefile > > @@ -65,7 +65,7 @@ clean-files += cpustr.h > > > > # > > --- > > > > -KBUILD_CFLAGS := $(USERINCLUDE) $(REALMODE_CFLAGS) -D_SETUP > > +KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP > > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ > > GCOV_PROFILE := n > > UBSAN_SANITIZE := n Looks OK to me. Reviewed-by: Matt Fleming <m...@codeblueprint.co.uk>
Re: [PATCH] x86/boot: Remove always empty $(USERINCLUDE)
On Thu, 03 Nov, at 10:51:38AM, Paul Bolle wrote: > Apparently Matt left Intel. Let's forward this to a recently used > address. > > On Thu, 2016-11-03 at 10:47 +0100, Paul Bolle wrote: > > Commmit b6eea87fc685 ("x86, boot: Explicitly include autoconf.h for > > hostprogs") correctly noted > > [...] that because $(USERINCLUDE) isn't exported by > > the top-level Makefile it's actually empty in arch/x86/boot/Makefile. > > > > So let's do the sane thing and remove the reference to that make variable. > > > > Signed-off-by: Paul Bolle > > --- > > arch/x86/boot/Makefile | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile > > index 12ea8f8384f4..0d810fb15eac 100644 > > --- a/arch/x86/boot/Makefile > > +++ b/arch/x86/boot/Makefile > > @@ -65,7 +65,7 @@ clean-files += cpustr.h > > > > # > > --- > > > > -KBUILD_CFLAGS := $(USERINCLUDE) $(REALMODE_CFLAGS) -D_SETUP > > +KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP > > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ > > GCOV_PROFILE := n > > UBSAN_SANITIZE := n Looks OK to me. Reviewed-by: Matt Fleming
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Sun, 30 Oct, at 08:59:58AM, Dan Williams wrote: > On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis >wrote: > > JFYI: I added this report to the list of regressions for Linux 4.9. I'll > > watch this thread for further updates on this issue to document progress > > in my weekly reports. Please let me know via regressi...@leemhuis.info > > in case the discussion moves to a different place (bugzilla or another > > mail thread for example). tia! > > > > Current status (afaics) in my report: This looks stuck. Or was is > > discussed (or even fixed) somewhere else? > > Thanks, and no, not fixed yet. I've not found the time to run the > experiments Matt needs, but a colleague has offered to look into it. Of course, if you are willing to help with debugging, Thorsten, it would be much appreciated and this bug might get fixed sooner.
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Sun, 30 Oct, at 08:59:58AM, Dan Williams wrote: > On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis > wrote: > > JFYI: I added this report to the list of regressions for Linux 4.9. I'll > > watch this thread for further updates on this issue to document progress > > in my weekly reports. Please let me know via regressi...@leemhuis.info > > in case the discussion moves to a different place (bugzilla or another > > mail thread for example). tia! > > > > Current status (afaics) in my report: This looks stuck. Or was is > > discussed (or even fixed) somewhere else? > > Thanks, and no, not fixed yet. I've not found the time to run the > experiments Matt needs, but a colleague has offered to look into it. Of course, if you are willing to help with debugging, Thorsten, it would be much appreciated and this bug might get fixed sooner.
Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)
On Fri, 21 Oct, at 05:18:30PM, Andy Lutomirski wrote: > > This is asking for trouble if any of the variable length parameters > are on the stack. How about adding a "size_t size" parameter and > doing: > > if (!va) { > return 0; > } else if (virt_addr_valid(va)) { > return virt_to_phys(va); > } else { > /* A fully aligned variable on the stack is guaranteed not to cross > a page boundary. */ > WARN_ON(!IS_ALIGNED((uintptr_t)va, size) || size > PAGE_SIZE); > return slow_virt_to_phys(va); > } Ah, good catch. Something like this? --- >From d2c17f46686076677da3bf04caa2f69d654f8d8a Mon Sep 17 00:00:00 2001 From: Matt Fleming <m...@codeblueprint.co.uk> Date: Thu, 20 Oct 2016 22:17:21 +0100 Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK Booting an EFI mixed mode kernel has been crashing since commit: e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)") The user-visible effect in my test setup was the kernel being unable to find the root file system ramdisk. This was likely caused by silent memory or page table corruption. Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as abusing virt_to_phys() because it was passing addresses that were not part of the kernel direct mapping. Use the slow version instead, which correctly handles all memory regions by performing a page table walk. Suggested-by: Andy Lutomirski <l...@amacapital.net> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Ingo Molnar <mi...@kernel.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: "H. Peter Anvin" <h...@zytor.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- arch/x86/platform/efi/efi_64.c | 79 +- 1 file changed, 55 insertions(+), 24 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 58b0f801f66f..010544293dda 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -211,11 +212,36 @@ void efi_sync_low_kernel_mappings(void) memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries); } +/* + * Wrapper for slow_virt_to_phys() that handles NULL addresses. + */ +static inline phys_addr_t +virt_to_phys_or_null_size(void *va, unsigned long size) +{ + if (!va) + return 0; + + if (virt_addr_valid(va)) + return virt_to_phys(va); + + /* +* A fully aligned variable on the stack is guaranteed not to +* cross a page bounary. +*/ + WARN_ON(!IS_ALIGNED((unsigned long)va, size) || size > PAGE_SIZE); + + return slow_virt_to_phys(va); +} + +#define virt_to_phys_or_null(addr) \ + virt_to_phys_or_null_size((addr), sizeof(*(addr))) + int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) { unsigned long pfn, text; struct page *page; unsigned npages; + void *addr; pgd_t *pgd; if (efi_enabled(EFI_OLD_MEMMAP)) @@ -251,7 +277,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) if (!page) panic("Unable to allocate EFI runtime stack < 4GB\n"); - efi_scratch.phys_stack = virt_to_phys(page_address(page)); + addr = page_address(page); + efi_scratch.phys_stack = virt_to_phys_or_null_size(addr, PAGE_SIZE); efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */ npages = (_etext - _text) >> PAGE_SHIFT; @@ -494,8 +521,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, efi_time_cap_t *tc) spin_lock(_lock); - phys_tm = virt_to_phys(tm); - phys_tc = virt_to_phys(tc); + phys_tm = virt_to_phys_or_null(tm); + phys_tc = virt_to_phys_or_null(tc); status = efi_thunk(get_time, phys_tm, phys_tc); @@ -511,7 +538,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_time, phys_tm); @@ -529,9 +556,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending, spin_lock(_lock); - phys_enabled = virt_to_phys(enabled); - phys_pending = virt_to_phys(pending); - phys_tm = virt_to_phys(tm); + phys_enabled = virt_to_phys_or_null(enabled); + phys_pending = virt_to_phys_or_null(pending); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(get_wakeup_time, phys_enabled, phys_pending, phys_tm); @@ -549,7 +576,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) spin_lock(_lock); - phys_tm = v
Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)
On Fri, 21 Oct, at 05:18:30PM, Andy Lutomirski wrote: > > This is asking for trouble if any of the variable length parameters > are on the stack. How about adding a "size_t size" parameter and > doing: > > if (!va) { > return 0; > } else if (virt_addr_valid(va)) { > return virt_to_phys(va); > } else { > /* A fully aligned variable on the stack is guaranteed not to cross > a page boundary. */ > WARN_ON(!IS_ALIGNED((uintptr_t)va, size) || size > PAGE_SIZE); > return slow_virt_to_phys(va); > } Ah, good catch. Something like this? --- >From d2c17f46686076677da3bf04caa2f69d654f8d8a Mon Sep 17 00:00:00 2001 From: Matt Fleming Date: Thu, 20 Oct 2016 22:17:21 +0100 Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK Booting an EFI mixed mode kernel has been crashing since commit: e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)") The user-visible effect in my test setup was the kernel being unable to find the root file system ramdisk. This was likely caused by silent memory or page table corruption. Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as abusing virt_to_phys() because it was passing addresses that were not part of the kernel direct mapping. Use the slow version instead, which correctly handles all memory regions by performing a page table walk. Suggested-by: Andy Lutomirski Cc: Ard Biesheuvel Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "H. Peter Anvin" Signed-off-by: Matt Fleming --- arch/x86/platform/efi/efi_64.c | 79 +- 1 file changed, 55 insertions(+), 24 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 58b0f801f66f..010544293dda 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -211,11 +212,36 @@ void efi_sync_low_kernel_mappings(void) memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries); } +/* + * Wrapper for slow_virt_to_phys() that handles NULL addresses. + */ +static inline phys_addr_t +virt_to_phys_or_null_size(void *va, unsigned long size) +{ + if (!va) + return 0; + + if (virt_addr_valid(va)) + return virt_to_phys(va); + + /* +* A fully aligned variable on the stack is guaranteed not to +* cross a page bounary. +*/ + WARN_ON(!IS_ALIGNED((unsigned long)va, size) || size > PAGE_SIZE); + + return slow_virt_to_phys(va); +} + +#define virt_to_phys_or_null(addr) \ + virt_to_phys_or_null_size((addr), sizeof(*(addr))) + int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) { unsigned long pfn, text; struct page *page; unsigned npages; + void *addr; pgd_t *pgd; if (efi_enabled(EFI_OLD_MEMMAP)) @@ -251,7 +277,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) if (!page) panic("Unable to allocate EFI runtime stack < 4GB\n"); - efi_scratch.phys_stack = virt_to_phys(page_address(page)); + addr = page_address(page); + efi_scratch.phys_stack = virt_to_phys_or_null_size(addr, PAGE_SIZE); efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */ npages = (_etext - _text) >> PAGE_SHIFT; @@ -494,8 +521,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, efi_time_cap_t *tc) spin_lock(_lock); - phys_tm = virt_to_phys(tm); - phys_tc = virt_to_phys(tc); + phys_tm = virt_to_phys_or_null(tm); + phys_tc = virt_to_phys_or_null(tc); status = efi_thunk(get_time, phys_tm, phys_tc); @@ -511,7 +538,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_time, phys_tm); @@ -529,9 +556,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending, spin_lock(_lock); - phys_enabled = virt_to_phys(enabled); - phys_pending = virt_to_phys(pending); - phys_tm = virt_to_phys(tm); + phys_enabled = virt_to_phys_or_null(enabled); + phys_pending = virt_to_phys_or_null(pending); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(get_wakeup_time, phys_enabled, phys_pending, phys_tm); @@ -549,7 +576,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_wakeup_time, enabled, phys_tm); @@ -558,6 +585,10 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote: > > FYI, I've been able to reproduce some crash when using your EFI memory > map layout under Qemu and forcing the ESRT driver to reserve the space. Nope, that was a bug in my hack. I can't get Qemu to crash while using your memory map layout. Any chance you can insert "while(1)" loops into the EFI boot paths for a kernel that is known to reboot or trigger a triple fault in kernels that hang, so that we can narrow in on the issue. See, http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote: > > FYI, I've been able to reproduce some crash when using your EFI memory > map layout under Qemu and forcing the ESRT driver to reserve the space. Nope, that was a bug in my hack. I can't get Qemu to crash while using your memory map layout. Any chance you can insert "while(1)" loops into the EFI boot paths for a kernel that is known to reboot or trigger a triple fault in kernels that hang, so that we can narrow in on the issue. See, http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Thu, 20 Oct, at 12:37:16PM, Dan Williams wrote: > > I am able to build a kernel and boot the platform with the following > set of reverts: > > Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE" > Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data" > Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()" > Revert "efi: Allow drivers to reserve boot services forever" FYI, I've been able to reproduce some crash when using your EFI memory map layout under Qemu and forcing the ESRT driver to reserve the space. It looks like the new EFI memmap we allocate as part of the reservation is smaller than the old one - which is backwards. Still debugging...
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Thu, 20 Oct, at 12:37:16PM, Dan Williams wrote: > > I am able to build a kernel and boot the platform with the following > set of reverts: > > Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE" > Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data" > Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()" > Revert "efi: Allow drivers to reserve boot services forever" FYI, I've been able to reproduce some crash when using your EFI memory map layout under Qemu and forcing the ESRT driver to reserve the space. It looks like the new EFI memmap we allocate as part of the reservation is smaller than the old one - which is backwards. Still debugging...
Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)
On Wed, 24 Aug, at 06:03:04AM, tip-bot for Andy Lutomirski wrote: > Commit-ID: e37e43a497d5a8b7c0cc1736d56986f432c394c9 > Gitweb: http://git.kernel.org/tip/e37e43a497d5a8b7c0cc1736d56986f432c394c9 > Author: Andy Lutomirski <l...@kernel.org> > AuthorDate: Thu, 11 Aug 2016 02:35:23 -0700 > Committer: Ingo Molnar <mi...@kernel.org> > CommitDate: Wed, 24 Aug 2016 12:11:42 +0200 > > x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y) > > This allows x86_64 kernels to enable vmapped stacks by setting > HAVE_ARCH_VMAP_STACK=y - which enables the CONFIG_VMAP_STACK=y > high level Kconfig option. > > There are a couple of interesting bits: This commit broke booting EFI mixed mode kernels. Here's what I've got queued up to fix it. --- >From acf11e55bfcef7a1dca7d1735f4a780e0cdb1c89 Mon Sep 17 00:00:00 2001 From: Matt Fleming <m...@codeblueprint.co.uk> Date: Thu, 20 Oct 2016 22:17:21 +0100 Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK Booting an EFI mixed mode kernel has been crashing since commit: e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)") The user-visible effect in my test setup was the kernel being unable to find the root file system ramdisk. This was likely caused by silent memory or page table corruption. Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as abusing virt_to_phys() because it was passing addresses that were not part of the kernel direct mapping. Use the slow version instead, which correctly handles all memory regions by performing a page table walk. Suggested-by: Andy Lutomirski <l...@amacapital.net> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Ingo Molnar <mi...@kernel.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: "H. Peter Anvin" <h...@zytor.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- arch/x86/platform/efi/efi_64.c | 59 +- 1 file changed, 35 insertions(+), 24 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 58b0f801f66f..e3569a00885b 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -211,6 +211,17 @@ void efi_sync_low_kernel_mappings(void) memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries); } +/* + * Wrapper for slow_virt_to_phys() that handles NULL addresses. + */ +static inline phys_addr_t virt_to_phys_or_null(void *va) +{ + if (!va) + return 0; + + return slow_virt_to_phys(va); +} + int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) { unsigned long pfn, text; @@ -251,7 +262,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) if (!page) panic("Unable to allocate EFI runtime stack < 4GB\n"); - efi_scratch.phys_stack = virt_to_phys(page_address(page)); + efi_scratch.phys_stack = virt_to_phys_or_null(page_address(page)); efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */ npages = (_etext - _text) >> PAGE_SHIFT; @@ -494,8 +505,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, efi_time_cap_t *tc) spin_lock(_lock); - phys_tm = virt_to_phys(tm); - phys_tc = virt_to_phys(tc); + phys_tm = virt_to_phys_or_null(tm); + phys_tc = virt_to_phys_or_null(tc); status = efi_thunk(get_time, phys_tm, phys_tc); @@ -511,7 +522,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_time, phys_tm); @@ -529,9 +540,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending, spin_lock(_lock); - phys_enabled = virt_to_phys(enabled); - phys_pending = virt_to_phys(pending); - phys_tm = virt_to_phys(tm); + phys_enabled = virt_to_phys_or_null(enabled); + phys_pending = virt_to_phys_or_null(pending); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(get_wakeup_time, phys_enabled, phys_pending, phys_tm); @@ -549,7 +560,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_wakeup_time, enabled, phys_tm); @@ -567,11 +578,11 @@ efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor, u32 phys_name, phys_vendor, phys_attr; u32 phys_data_size, phys_data; - phys_data_size = virt_to_phys(data_size); - phys_vendor = virt_to_phys(vendor); - phys_name = virt_to_phys(name); - phys_attr = virt_to_phys(attr); - phys_data = virt_to_p
Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)
On Wed, 24 Aug, at 06:03:04AM, tip-bot for Andy Lutomirski wrote: > Commit-ID: e37e43a497d5a8b7c0cc1736d56986f432c394c9 > Gitweb: http://git.kernel.org/tip/e37e43a497d5a8b7c0cc1736d56986f432c394c9 > Author: Andy Lutomirski > AuthorDate: Thu, 11 Aug 2016 02:35:23 -0700 > Committer: Ingo Molnar > CommitDate: Wed, 24 Aug 2016 12:11:42 +0200 > > x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y) > > This allows x86_64 kernels to enable vmapped stacks by setting > HAVE_ARCH_VMAP_STACK=y - which enables the CONFIG_VMAP_STACK=y > high level Kconfig option. > > There are a couple of interesting bits: This commit broke booting EFI mixed mode kernels. Here's what I've got queued up to fix it. --- >From acf11e55bfcef7a1dca7d1735f4a780e0cdb1c89 Mon Sep 17 00:00:00 2001 From: Matt Fleming Date: Thu, 20 Oct 2016 22:17:21 +0100 Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK Booting an EFI mixed mode kernel has been crashing since commit: e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)") The user-visible effect in my test setup was the kernel being unable to find the root file system ramdisk. This was likely caused by silent memory or page table corruption. Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as abusing virt_to_phys() because it was passing addresses that were not part of the kernel direct mapping. Use the slow version instead, which correctly handles all memory regions by performing a page table walk. Suggested-by: Andy Lutomirski Cc: Ard Biesheuvel Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "H. Peter Anvin" Signed-off-by: Matt Fleming --- arch/x86/platform/efi/efi_64.c | 59 +- 1 file changed, 35 insertions(+), 24 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 58b0f801f66f..e3569a00885b 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -211,6 +211,17 @@ void efi_sync_low_kernel_mappings(void) memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries); } +/* + * Wrapper for slow_virt_to_phys() that handles NULL addresses. + */ +static inline phys_addr_t virt_to_phys_or_null(void *va) +{ + if (!va) + return 0; + + return slow_virt_to_phys(va); +} + int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) { unsigned long pfn, text; @@ -251,7 +262,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) if (!page) panic("Unable to allocate EFI runtime stack < 4GB\n"); - efi_scratch.phys_stack = virt_to_phys(page_address(page)); + efi_scratch.phys_stack = virt_to_phys_or_null(page_address(page)); efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */ npages = (_etext - _text) >> PAGE_SHIFT; @@ -494,8 +505,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, efi_time_cap_t *tc) spin_lock(_lock); - phys_tm = virt_to_phys(tm); - phys_tc = virt_to_phys(tc); + phys_tm = virt_to_phys_or_null(tm); + phys_tc = virt_to_phys_or_null(tc); status = efi_thunk(get_time, phys_tm, phys_tc); @@ -511,7 +522,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_time, phys_tm); @@ -529,9 +540,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending, spin_lock(_lock); - phys_enabled = virt_to_phys(enabled); - phys_pending = virt_to_phys(pending); - phys_tm = virt_to_phys(tm); + phys_enabled = virt_to_phys_or_null(enabled); + phys_pending = virt_to_phys_or_null(pending); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(get_wakeup_time, phys_enabled, phys_pending, phys_tm); @@ -549,7 +560,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm) spin_lock(_lock); - phys_tm = virt_to_phys(tm); + phys_tm = virt_to_phys_or_null(tm); status = efi_thunk(set_wakeup_time, enabled, phys_tm); @@ -567,11 +578,11 @@ efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor, u32 phys_name, phys_vendor, phys_attr; u32 phys_data_size, phys_data; - phys_data_size = virt_to_phys(data_size); - phys_vendor = virt_to_phys(vendor); - phys_name = virt_to_phys(name); - phys_attr = virt_to_phys(attr); - phys_data = virt_to_phys(data); + phys_data_size = virt_to_phys_or_null(data_size); + phys_vendor = virt_to_phys_or_null(vendor); + phys_name = virt_to_phys_or_null(name); + phys_attr = virt_to_phys_or_null(attr); + phys_data = virt_to_p
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote: > Hi, > > I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots. > > The symptom is a reboot before the video console is available. > > I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot > services forever". However, that commit is known to be broken. The > proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap > reservations to EFI_PAGE_SIZE", also exhibits the reboot problem. > > During the bisect some of the stopping points landed on commits that > caused the boot process to hang rather than cause a reboot. The > commits that resulted in a hang are marked "git bisect skip" in this > log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e > > I'll try treating those hangs as bad bisect results and re-run the > full bisect tomorrow. In the meantime I wonder if the bisect log > implicates a better regression candidate? Could you mail the dmesg output when booting a known working kernel with efi=debug ?
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote: > Hi, > > I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots. > > The symptom is a reboot before the video console is available. > > I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot > services forever". However, that commit is known to be broken. The > proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap > reservations to EFI_PAGE_SIZE", also exhibits the reboot problem. > > During the bisect some of the stopping points landed on commits that > caused the boot process to hang rather than cause a reboot. The > commits that resulted in a hang are marked "git bisect skip" in this > log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e > > I'll try treating those hangs as bad bisect results and re-run the > full bisect tomorrow. In the meantime I wonder if the bisect log > implicates a better regression candidate? Could you mail the dmesg output when booting a known working kernel with efi=debug ?
Re: [PATCH v2] x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates
On Wed, 19 Oct, at 08:48:51PM, Alex Thorlton wrote: > Some time ago, we brought our UV BIOS callback code up to speed with the > new EFI memory mapping scheme, in commit: > > d1be84a232e3 ("x86/uv: Update uv_bios_call() to use > efi_call_virt_pointer()") > > By leveraging some changes that I made to a few of the EFI runtime > callback mechanisms, in commit: > > 80e75596079f ("efi: Convert efi_call_virt() to efi_call_virt_pointer()") > > This got everything running smoothly on UV, with the new EFI mapping > code. However, this left one, small loose end, in that EFI_OLD_MEMMAP > (a.k.a. efi=old_map) will no longer work on UV, on kernels that include > the aforementioned changes. > > At the time this was not a major issue (in fact, it still really isn't), > but there's no reason that EFI_OLD_MEMMAP *shouldn't* work on our > systems. This commit adds a check into uv_bios_call, to see if we have > the EFI_OLD_MEMMAP bit set in efi.flags. If it is set, we fall back to > using our old callback method, which uses efi_call directly on the __va > of our function pointer. > > v2: Invert if-statement and add unlikely() hint. > > Signed-off-by: Alex Thorlton <athorl...@sgi.com> > Cc: Mike Travis <tra...@sgi.com> > Cc: Russ Anderson <r...@sgi.com> > Cc: Dimitri Sivanich <sivan...@sgi.com> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: Ingo Molnar <mi...@redhat.com> > Cc: "H. Peter Anvin" <h...@zytor.com> > Cc: Matt Fleming <m...@codeblueprint.co.uk> > Cc: Masahiro Yamada <yamada.masah...@socionext.com> > Cc: x...@kernel.org > --- > arch/x86/platform/uv/bios_uv.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) Looks fine to me. Applied to 'next'. Thanks!
Re: [PATCH v2] x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates
On Wed, 19 Oct, at 08:48:51PM, Alex Thorlton wrote: > Some time ago, we brought our UV BIOS callback code up to speed with the > new EFI memory mapping scheme, in commit: > > d1be84a232e3 ("x86/uv: Update uv_bios_call() to use > efi_call_virt_pointer()") > > By leveraging some changes that I made to a few of the EFI runtime > callback mechanisms, in commit: > > 80e75596079f ("efi: Convert efi_call_virt() to efi_call_virt_pointer()") > > This got everything running smoothly on UV, with the new EFI mapping > code. However, this left one, small loose end, in that EFI_OLD_MEMMAP > (a.k.a. efi=old_map) will no longer work on UV, on kernels that include > the aforementioned changes. > > At the time this was not a major issue (in fact, it still really isn't), > but there's no reason that EFI_OLD_MEMMAP *shouldn't* work on our > systems. This commit adds a check into uv_bios_call, to see if we have > the EFI_OLD_MEMMAP bit set in efi.flags. If it is set, we fall back to > using our old callback method, which uses efi_call directly on the __va > of our function pointer. > > v2: Invert if-statement and add unlikely() hint. > > Signed-off-by: Alex Thorlton > Cc: Mike Travis > Cc: Russ Anderson > Cc: Dimitri Sivanich > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: "H. Peter Anvin" > Cc: Matt Fleming > Cc: Masahiro Yamada > Cc: x...@kernel.org > --- > arch/x86/platform/uv/bios_uv.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) Looks fine to me. Applied to 'next'. Thanks!
Re: [REGRESSION] EFI mixed mode patch triggers boot failure
On Wed, 19 Oct, at 02:13:00PM, Laura Abbott wrote: > On 10/19/2016 01:04 PM, Laura Abbott wrote: > >Hi, > > > >Fedora received a bug report > >https://bugzilla.redhat.com/show_bug.cgi?id=1384238 > >of a bootup failure with stable 4.7.6. efi=noruntime fixed the bootup > >problem. > >1297667083d5442aafe3e337b9413bf02b114edb was linked as the cause > >of the problem. > > > > > >x86/efi: Only map RAM into EFI page tables if in mixed-mode > > > >Waiman reported that booting with CONFIG_EFI_MIXED enabled on his > >multi-terabyte HP machine results in boot crashes, because the EFI > >region mapping functions loop forever while trying to map those > >regions describing RAM. > > > >While this patch doesn't fix the underlying hang, there's really no > >reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables > >when mixed-mode is not in use at runtime. > > > >Reported-by: Waiman Long <waiman.l...@hpe.com> > >Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> > >Cc: Borislav Petkov <b...@alien8.de> > >Cc: Linus Torvalds <torva...@linux-foundation.org> > >CC: Theodore Ts'o <ty...@mit.edu> > >Cc: Arnd Bergmann <a...@arndb.de> > >Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org> > >Cc: Scott J Norton <scott.nor...@hpe.com> > >Cc: Douglas Hatch <doug.ha...@hpe.com> > >Cc: <sta...@vger.kernel.org> # v4.6+ > >Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> > > > >I made a request in the bugzilla for the reporter to > >give a bootlog with efi=debug which I'm still waiting on. > > > >Any ideas? > > > >Thanks, > >Laura > > dmesg with efi=debug from the reporter is attached > [0.00] DMI: Hewlett-Packard Compaq CQ58 Notebook PC/188B, BIOS F.36 > 06/07/2013 Hmm.. this is a fairly old machine, what kernel versions has the reporter successfully run on it? The code that was nop'd out in the above commit for the native case only came into existence in v4.6. The fact that booting with efi=old_map does *not* seem to result in a booting kernel is very suspicious. Could you ask them to double-check? Maybe try this patch too on v4.7.6. --- diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 964c7022d31d..b07183c6b470 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -244,9 +244,12 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) * text and allocate a new stack because we can't rely on the * stack pointer being < 4GB. */ - if (!IS_ENABLED(CONFIG_EFI_MIXED) || efi_is_native()) + if (!IS_ENABLED(CONFIG_EFI_MIXED)) return 0; + if (efi_is_native()) + goto map_text; + /* * Map all of RAM so that we can access arguments in the 1:1 * mapping when making EFI runtime calls. @@ -273,6 +276,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) efi_scratch.phys_stack = virt_to_phys(page_address(page)); efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */ +map_text: npages = (_etext - _text) >> PAGE_SHIFT; text = __pa(_text); pfn = text >> PAGE_SHIFT;
Re: [REGRESSION] EFI mixed mode patch triggers boot failure
On Wed, 19 Oct, at 02:13:00PM, Laura Abbott wrote: > On 10/19/2016 01:04 PM, Laura Abbott wrote: > >Hi, > > > >Fedora received a bug report > >https://bugzilla.redhat.com/show_bug.cgi?id=1384238 > >of a bootup failure with stable 4.7.6. efi=noruntime fixed the bootup > >problem. > >1297667083d5442aafe3e337b9413bf02b114edb was linked as the cause > >of the problem. > > > > > >x86/efi: Only map RAM into EFI page tables if in mixed-mode > > > >Waiman reported that booting with CONFIG_EFI_MIXED enabled on his > >multi-terabyte HP machine results in boot crashes, because the EFI > >region mapping functions loop forever while trying to map those > >regions describing RAM. > > > >While this patch doesn't fix the underlying hang, there's really no > >reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables > >when mixed-mode is not in use at runtime. > > > >Reported-by: Waiman Long > >Cc: Ard Biesheuvel > >Cc: Borislav Petkov > >Cc: Linus Torvalds > >CC: Theodore Ts'o > >Cc: Arnd Bergmann > >Cc: Greg Kroah-Hartman > >Cc: Scott J Norton > >Cc: Douglas Hatch > >Cc: # v4.6+ > >Signed-off-by: Matt Fleming > > > >I made a request in the bugzilla for the reporter to > >give a bootlog with efi=debug which I'm still waiting on. > > > >Any ideas? > > > >Thanks, > >Laura > > dmesg with efi=debug from the reporter is attached > [0.00] DMI: Hewlett-Packard Compaq CQ58 Notebook PC/188B, BIOS F.36 > 06/07/2013 Hmm.. this is a fairly old machine, what kernel versions has the reporter successfully run on it? The code that was nop'd out in the above commit for the native case only came into existence in v4.6. The fact that booting with efi=old_map does *not* seem to result in a booting kernel is very suspicious. Could you ask them to double-check? Maybe try this patch too on v4.7.6. --- diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 964c7022d31d..b07183c6b470 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -244,9 +244,12 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) * text and allocate a new stack because we can't rely on the * stack pointer being < 4GB. */ - if (!IS_ENABLED(CONFIG_EFI_MIXED) || efi_is_native()) + if (!IS_ENABLED(CONFIG_EFI_MIXED)) return 0; + if (efi_is_native()) + goto map_text; + /* * Map all of RAM so that we can access arguments in the 1:1 * mapping when making EFI runtime calls. @@ -273,6 +276,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) efi_scratch.phys_stack = virt_to_phys(page_address(page)); efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */ +map_text: npages = (_etext - _text) >> PAGE_SHIFT; text = __pa(_text); pfn = text >> PAGE_SHIFT;
[tip:sched/core] sched/fair: Kill the unused 'sched_shares_window_ns' tunable
Commit-ID: 3c3fcb45d524feb5d14a14f332e3eec7f2aff8f3 Gitweb: http://git.kernel.org/tip/3c3fcb45d524feb5d14a14f332e3eec7f2aff8f3 Author: Matt Fleming <m...@codeblueprint.co.uk> AuthorDate: Wed, 19 Oct 2016 15:10:59 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Thu, 20 Oct 2016 08:44:57 +0200 sched/fair: Kill the unused 'sched_shares_window_ns' tunable The last user of this tunable was removed in 2012 in commit: 82958366cfea ("sched: Replace update_shares weight distribution with per-entity computation") Delete it since its very existence confuses people. Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> Cc: Dietmar Eggemann <dietmar.eggem...@arm.com> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Mike Galbraith <umgwanakikb...@gmail.com> Cc: Paul Turner <p...@google.com> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Link: http://lkml.kernel.org/r/20161019141059.26408-1-m...@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mi...@kernel.org> --- include/linux/sched/sysctl.h | 1 - kernel/sched/fair.c | 7 --- kernel/sysctl.c | 7 --- 3 files changed, 15 deletions(-) diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 22db1e6..4411453 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -36,7 +36,6 @@ extern unsigned int sysctl_numa_balancing_scan_size; extern unsigned int sysctl_sched_migration_cost; extern unsigned int sysctl_sched_nr_migrate; extern unsigned int sysctl_sched_time_avg; -extern unsigned int sysctl_sched_shares_window; int sched_proc_update_handler(struct ctl_table *table, int write, void __user *buffer, size_t *length, diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d941c97..79d464a 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -93,13 +93,6 @@ unsigned int normalized_sysctl_sched_wakeup_granularity = 100UL; const_debug unsigned int sysctl_sched_migration_cost = 50UL; -/* - * The exponential sliding window over which load is averaged for shares - * distribution. - * (default: 10msec) - */ -unsigned int __read_mostly sysctl_sched_shares_window = 1000UL; - #ifdef CONFIG_CFS_BANDWIDTH /* * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 706309f..739fb17 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -347,13 +347,6 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, - { - .procname = "sched_shares_window_ns", - .data = _sched_shares_window, - .maxlen = sizeof(unsigned int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, #ifdef CONFIG_SCHEDSTATS { .procname = "sched_schedstats",
[tip:sched/core] sched/fair: Kill the unused 'sched_shares_window_ns' tunable
Commit-ID: 3c3fcb45d524feb5d14a14f332e3eec7f2aff8f3 Gitweb: http://git.kernel.org/tip/3c3fcb45d524feb5d14a14f332e3eec7f2aff8f3 Author: Matt Fleming AuthorDate: Wed, 19 Oct 2016 15:10:59 +0100 Committer: Ingo Molnar CommitDate: Thu, 20 Oct 2016 08:44:57 +0200 sched/fair: Kill the unused 'sched_shares_window_ns' tunable The last user of this tunable was removed in 2012 in commit: 82958366cfea ("sched: Replace update_shares weight distribution with per-entity computation") Delete it since its very existence confuses people. Signed-off-by: Matt Fleming Cc: Dietmar Eggemann Cc: Linus Torvalds Cc: Mike Galbraith Cc: Paul Turner Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20161019141059.26408-1-m...@codeblueprint.co.uk Signed-off-by: Ingo Molnar --- include/linux/sched/sysctl.h | 1 - kernel/sched/fair.c | 7 --- kernel/sysctl.c | 7 --- 3 files changed, 15 deletions(-) diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 22db1e6..4411453 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -36,7 +36,6 @@ extern unsigned int sysctl_numa_balancing_scan_size; extern unsigned int sysctl_sched_migration_cost; extern unsigned int sysctl_sched_nr_migrate; extern unsigned int sysctl_sched_time_avg; -extern unsigned int sysctl_sched_shares_window; int sched_proc_update_handler(struct ctl_table *table, int write, void __user *buffer, size_t *length, diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d941c97..79d464a 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -93,13 +93,6 @@ unsigned int normalized_sysctl_sched_wakeup_granularity = 100UL; const_debug unsigned int sysctl_sched_migration_cost = 50UL; -/* - * The exponential sliding window over which load is averaged for shares - * distribution. - * (default: 10msec) - */ -unsigned int __read_mostly sysctl_sched_shares_window = 1000UL; - #ifdef CONFIG_CFS_BANDWIDTH /* * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 706309f..739fb17 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -347,13 +347,6 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, - { - .procname = "sched_shares_window_ns", - .data = _sched_shares_window, - .maxlen = sizeof(unsigned int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, #ifdef CONFIG_SCHEDSTATS { .procname = "sched_schedstats",
[PATCH] sched/fair: Kill the unused sched_shares_window_ns tunable
The last user of this tunable was removed in 2012 in commit 8295836 ("sched: Replace update_shares weight distribution with per-entity computation"). Delete it since its very existence confuses people. Cc: Peter Zijlstra <pet...@infradead.org> Cc: Ingo Molnar <mi...@kernel.org> Cc: Dietmar Eggemann <dietmar.eggem...@arm.com> Cc: Paul Turner <p...@google.com> Cc: Mike Galbraith <umgwanakikb...@gmail.com> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> --- include/linux/sched/sysctl.h | 1 - kernel/sched/fair.c | 7 --- kernel/sysctl.c | 7 --- 3 files changed, 15 deletions(-) diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 22db1e63707e..441145351301 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -36,7 +36,6 @@ extern unsigned int sysctl_numa_balancing_scan_size; extern unsigned int sysctl_sched_migration_cost; extern unsigned int sysctl_sched_nr_migrate; extern unsigned int sysctl_sched_time_avg; -extern unsigned int sysctl_sched_shares_window; int sched_proc_update_handler(struct ctl_table *table, int write, void __user *buffer, size_t *length, diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 543b2f291152..f026050c15cc 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -93,13 +93,6 @@ unsigned int normalized_sysctl_sched_wakeup_granularity = 100UL; const_debug unsigned int sysctl_sched_migration_cost = 50UL; -/* - * The exponential sliding window over which load is averaged for shares - * distribution. - * (default: 10msec) - */ -unsigned int __read_mostly sysctl_sched_shares_window = 1000UL; - #ifdef CONFIG_CFS_BANDWIDTH /* * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool diff --git a/kernel/sysctl.c b/kernel/sysctl.c index a13bbdaab47d..2a361dc6e1eb 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -347,13 +347,6 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, - { - .procname = "sched_shares_window_ns", - .data = _sched_shares_window, - .maxlen = sizeof(unsigned int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, #ifdef CONFIG_SCHEDSTATS { .procname = "sched_schedstats", -- 2.10.0