Re: [PATCH] x86/efi Fix regression in efi_arch_mem_reserve

2016-12-21 Thread Matt Fleming
On Wed, 21 Dec, at 02:11:38PM, Petr Oros wrote:
>   Booting on EFI with ESRT table has been stop since commit:
> 8e80632 efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()
> 
>   This is caused by this commit:
> 816e761 efi: Allow drivers to reserve boot services forever
> 
>   Problem is, that efi_memmap_insert need memory aligned
>   on EFI_PAGE_SIZE. If memory not aligned, efi_memmap_insert
>   just return and let efi.memmap in inconsistent state.
>   This breaking boot.
> 
>   Tested in my machine, which stop booting
>   after upgrade to 4.9
> 
> Signed-off-by: Petr Oros 
> ---
>  arch/x86/platform/efi/quirks.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Could you provide some more information? Why does efi_memmap_insert()
require this alignment? How does booting "break"? If you see an Oops,
please post it here.


Re: [PATCH 2/2 v3] sched: use load_avg for selecting idlest group

2016-12-09 Thread Matt Fleming
On Thu, 08 Dec, at 05:56:54PM, Vincent Guittot wrote:
> find_idlest_group() only compares the runnable_load_avg when looking for
> the least loaded group. But on fork intensive use case like hackbench
> where tasks blocked quickly after the fork, this can lead to selecting the
> same CPU instead of other CPUs, which have similar runnable load but a
> lower load_avg.
> 
> When the runnable_load_avg of 2 CPUs are close, we now take into account
> the amount of blocked load as a 2nd selection factor. There is now 3 zones
> for the runnable_load of the rq:
> -[0 .. (runnable_load - imbalance)] : Select the new rq which has
> significantly less runnable_load
> -](runnable_load - imbalance) .. (runnable_load + imbalance)[ : The
> runnable loads are close so we use load_avg to chose between the 2 rq
> -[(runnable_load + imbalance) .. ULONG_MAX] : Keep the current rq which
> has significantly less runnable_load
> 
> The scale factor that is currently used for comparing runnable_load,
> doesn't work well with small value. As an example, the use of a scaling
> factor fails as soon as this_runnable_load == 0 because we always select
> local rq even if min_runnable_load is only 1, which doesn't really make
> sense because they are just the same. So instead of scaling factor, we use
> an absolute margin for runnable_load to detect CPUs with similar
> runnable_load and we keep using scaling factor for blocked load.
> 
> For use case like hackbench, this enable the scheduler to select different
> CPUs during the fork sequence and to spread tasks across the system.
> 
> Tests have been done on a Hikey board (ARM based octo cores) for several
> kernel. The result below gives min, max, avg and stdev values of 18 runs
> with each configuration.
> 
> The v4.8+patches configuration also includes the changes below which is
> part of the proposal made by Peter to ensure that the clock will be up to
> date when the fork task will be attached to the rq.
> 
> @@ -2568,6 +2568,7 @@ void wake_up_new_task(struct task_struct *p)
>   __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
>  #endif
>   rq = __task_rq_lock(p, );
> + update_rq_clock(rq);
>   post_init_entity_util_avg(>se);
> 
>   activate_task(rq, p, 0);
> 
> hackbench -P -g 1
> 
>ea86cb4b7621  7dc603c9028e  v4.8v4.8+patches
> min0.049 0.050 0.051   0,048
> avg0.057 0.057(0%) 0.057(0%)   0,055(+5%)
> max0.066 0.068 0.070   0,063
> stdev  +/-9% +/-9% +/-8%   +/-9%
> 
> Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org>
> ---
>  kernel/sched/fair.c | 48 ++--
>  1 file changed, 38 insertions(+), 10 deletions(-)

Tested-by: Matt Fleming <m...@codeblueprint.co.uk>
Reviewed-by: Matt Fleming <m...@codeblueprint.co.uk>

Peter, Ingo, when you pick this up would you also consider adding the
following tag which links to an email describing the problem this
patch solves and the performance test results when it's applied?

Link: https://lkml.kernel.org/r/20161203214707.gi20...@codeblueprint.co.uk


Re: [PATCH 2/2 v3] sched: use load_avg for selecting idlest group

2016-12-09 Thread Matt Fleming
On Thu, 08 Dec, at 05:56:54PM, Vincent Guittot wrote:
> find_idlest_group() only compares the runnable_load_avg when looking for
> the least loaded group. But on fork intensive use case like hackbench
> where tasks blocked quickly after the fork, this can lead to selecting the
> same CPU instead of other CPUs, which have similar runnable load but a
> lower load_avg.
> 
> When the runnable_load_avg of 2 CPUs are close, we now take into account
> the amount of blocked load as a 2nd selection factor. There is now 3 zones
> for the runnable_load of the rq:
> -[0 .. (runnable_load - imbalance)] : Select the new rq which has
> significantly less runnable_load
> -](runnable_load - imbalance) .. (runnable_load + imbalance)[ : The
> runnable loads are close so we use load_avg to chose between the 2 rq
> -[(runnable_load + imbalance) .. ULONG_MAX] : Keep the current rq which
> has significantly less runnable_load
> 
> The scale factor that is currently used for comparing runnable_load,
> doesn't work well with small value. As an example, the use of a scaling
> factor fails as soon as this_runnable_load == 0 because we always select
> local rq even if min_runnable_load is only 1, which doesn't really make
> sense because they are just the same. So instead of scaling factor, we use
> an absolute margin for runnable_load to detect CPUs with similar
> runnable_load and we keep using scaling factor for blocked load.
> 
> For use case like hackbench, this enable the scheduler to select different
> CPUs during the fork sequence and to spread tasks across the system.
> 
> Tests have been done on a Hikey board (ARM based octo cores) for several
> kernel. The result below gives min, max, avg and stdev values of 18 runs
> with each configuration.
> 
> The v4.8+patches configuration also includes the changes below which is
> part of the proposal made by Peter to ensure that the clock will be up to
> date when the fork task will be attached to the rq.
> 
> @@ -2568,6 +2568,7 @@ void wake_up_new_task(struct task_struct *p)
>   __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
>  #endif
>   rq = __task_rq_lock(p, );
> + update_rq_clock(rq);
>   post_init_entity_util_avg(>se);
> 
>   activate_task(rq, p, 0);
> 
> hackbench -P -g 1
> 
>ea86cb4b7621  7dc603c9028e  v4.8v4.8+patches
> min0.049 0.050 0.051   0,048
> avg0.057 0.057(0%) 0.057(0%)   0,055(+5%)
> max0.066 0.068 0.070   0,063
> stdev  +/-9% +/-9% +/-8%   +/-9%
> 
> Signed-off-by: Vincent Guittot 
> ---
>  kernel/sched/fair.c | 48 ++--
>  1 file changed, 38 insertions(+), 10 deletions(-)

Tested-by: Matt Fleming 
Reviewed-by: Matt Fleming 

Peter, Ingo, when you pick this up would you also consider adding the
following tag which links to an email describing the problem this
patch solves and the performance test results when it's applied?

Link: https://lkml.kernel.org/r/20161203214707.gi20...@codeblueprint.co.uk


Re: [PATCH 1/2 v3] sched: fix find_idlest_group for fork

2016-12-09 Thread Matt Fleming
On Thu, 08 Dec, at 05:56:53PM, Vincent Guittot wrote:
> During fork, the utilization of a task is init once the rq has been
> selected because the current utilization level of the rq is used to set
> the utilization of the fork task. As the task's utilization is still
> null at this step of the fork sequence, it doesn't make sense to look for
> some spare capacity that can fit the task's utilization.
> Furthermore, I can see perf regressions for the test "hackbench -P -g 1"
> because the least loaded policy is always bypassed and tasks are not
> spread during fork.
> 
> With this patch and the fix below, we are back to same performances as
> for v4.8. The fix below is only a temporary one used for the test until a
> smarter solution is found because we can't simply remove the test which is
> useful for others benchmarks
> 
> @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, 
> struct sched_domain *sd, int t
> 
>   avg_cost = this_sd->avg_scan_cost;
> 
> - /*
> -  * Due to large variance we need a large fuzz factor; hackbench in
> -  * particularly is sensitive here.
> -  */
> - if ((avg_idle / 512) < avg_cost)
> - return -1;
> -
>   time = local_clock();
> 
>   for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {
> 
> Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org>
> Acked-by: Morten Rasmussen <morten.rasmus...@arm.com>
> ---
>  kernel/sched/fair.c | 6 ++
>  1 file changed, 6 insertions(+)

Tested-by: Matt Fleming <m...@codeblueprint.co.uk>
Reviewed-by: Matt Fleming <m...@codeblueprint.co.uk>


Re: [PATCH 1/2 v3] sched: fix find_idlest_group for fork

2016-12-09 Thread Matt Fleming
On Thu, 08 Dec, at 05:56:53PM, Vincent Guittot wrote:
> During fork, the utilization of a task is init once the rq has been
> selected because the current utilization level of the rq is used to set
> the utilization of the fork task. As the task's utilization is still
> null at this step of the fork sequence, it doesn't make sense to look for
> some spare capacity that can fit the task's utilization.
> Furthermore, I can see perf regressions for the test "hackbench -P -g 1"
> because the least loaded policy is always bypassed and tasks are not
> spread during fork.
> 
> With this patch and the fix below, we are back to same performances as
> for v4.8. The fix below is only a temporary one used for the test until a
> smarter solution is found because we can't simply remove the test which is
> useful for others benchmarks
> 
> @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, 
> struct sched_domain *sd, int t
> 
>   avg_cost = this_sd->avg_scan_cost;
> 
> - /*
> -  * Due to large variance we need a large fuzz factor; hackbench in
> -  * particularly is sensitive here.
> -  */
> - if ((avg_idle / 512) < avg_cost)
> - return -1;
> -
>   time = local_clock();
> 
>   for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {
> 
> Signed-off-by: Vincent Guittot 
> Acked-by: Morten Rasmussen 
> ---
>  kernel/sched/fair.c | 6 ++
>  1 file changed, 6 insertions(+)

Tested-by: Matt Fleming 
Reviewed-by: Matt Fleming 


Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group

2016-12-08 Thread Matt Fleming
On Mon, 05 Dec, at 01:35:46PM, Matt Fleming wrote:
> On Mon, 05 Dec, at 10:27:36AM, Vincent Guittot wrote:
> > 
> > Hi Matt,
> > 
> > Thanks for the results.
> > 
> > During the review, it has been pointed out by Morten that the test condition
> > (100*this_avg_load < imbalance_scale*min_avg_load) makes more sense than
> > (100*min_avg_load > imbalance_scale*this_avg_load). But i see lower
> > performances with this change. Coud you run tests with the change below on
> > top of the patchset ?
> > 
> > ---
> >  kernel/sched/fair.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index e8d1ae7..0129fbb 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5514,7 +5514,7 @@ find_idlest_group(struct sched_domain *sd, struct 
> > task_struct *p,
> > if (!idlest ||
> > (min_runnable_load > (this_runnable_load + imbalance)) ||
> > ((this_runnable_load < (min_runnable_load + imbalance)) &&
> > -   (100*min_avg_load > imbalance_scale*this_avg_load)))
> > +   (100*this_avg_load < imbalance_scale*min_avg_load)))
> > return NULL;
> > return idlest;
> >  }
> 
> Queued for testing.

Most of the machines didn't notice the difference with this new patch.
However, I did come across one test that showed a negative change,


hackbench-thread-pipes
4.9.0-rc6 4.9.0-rc6 4.9.0-rc6   
  4.9.0-rc6
tip-sched  fix-fig-for-fork   fix-sig   
fix-fig-for-fork-v2
Amean1   0.1266 (  0.00%)  0.1269 ( -0.23%)  0.1287 ( -1.69%)   
   0.1357 ( -7.22%)
Amean4   0.4989 (  0.00%)  0.5174 ( -3.72%)  0.5251 ( -5.27%)   
   0.5117 ( -2.58%)
Amean7   0.8510 (  0.00%)  0.8517 ( -0.08%)  0.8964 ( -5.34%)   
   0.8801 ( -3.42%)
Amean12  1.0699 (  0.00%)  1.0484 (  2.00%)  1.0147 (  5.15%)   
   1.0759 ( -0.56%)
Amean21  1.2816 (  0.00%)  1.2140 (  5.27%)  1.1879 (  7.31%)   
   1.2414 (  3.13%)
Amean30  1.4440 (  0.00%)  1.4003 (  3.03%)  1.3969 (  3.26%)   
   1.4057 (  2.65%)
Amean48  1.5773 (  0.00%)  1.5983 ( -1.33%)  1.3984 ( 11.34%)   
   1.5624 (  0.94%)
Amean79  2.2343 (  0.00%)  2.3066 ( -3.24%)  2.0053 ( 10.25%)   
   2.2630 ( -1.29%)
Amean96  2.6736 (  0.00%)  2.4313 (  9.06%)  2.4181 (  9.55%)   
   2.4717 (  7.55%)

   4.9.0-rc6   4.9.0-rc6   4.9.0-rc6   4.9.0-rc6
   tip-schedfix-fig-for-fork fix-sigfix-fig-for-fork-v2
User  129.53  128.64  127.70  131.00
System   1784.54 1744.21 1654.08 1744.00
Elapsed92.07   90.44   86.95   91.00

Looking at the 48 and 79 groups rows for mean there's a noticeable
drop off in performance of ~10%, which should be outside of the noise
for this test. This is a 2 socket, 4 NUMA node (yes, really), 24 cpus
AMD opteron circa 2010.

Given the age of this machine, I don't think it's worth holding up the
patch.


Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group

2016-12-08 Thread Matt Fleming
On Mon, 05 Dec, at 01:35:46PM, Matt Fleming wrote:
> On Mon, 05 Dec, at 10:27:36AM, Vincent Guittot wrote:
> > 
> > Hi Matt,
> > 
> > Thanks for the results.
> > 
> > During the review, it has been pointed out by Morten that the test condition
> > (100*this_avg_load < imbalance_scale*min_avg_load) makes more sense than
> > (100*min_avg_load > imbalance_scale*this_avg_load). But i see lower
> > performances with this change. Coud you run tests with the change below on
> > top of the patchset ?
> > 
> > ---
> >  kernel/sched/fair.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index e8d1ae7..0129fbb 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5514,7 +5514,7 @@ find_idlest_group(struct sched_domain *sd, struct 
> > task_struct *p,
> > if (!idlest ||
> > (min_runnable_load > (this_runnable_load + imbalance)) ||
> > ((this_runnable_load < (min_runnable_load + imbalance)) &&
> > -   (100*min_avg_load > imbalance_scale*this_avg_load)))
> > +   (100*this_avg_load < imbalance_scale*min_avg_load)))
> > return NULL;
> > return idlest;
> >  }
> 
> Queued for testing.

Most of the machines didn't notice the difference with this new patch.
However, I did come across one test that showed a negative change,


hackbench-thread-pipes
4.9.0-rc6 4.9.0-rc6 4.9.0-rc6   
  4.9.0-rc6
tip-sched  fix-fig-for-fork   fix-sig   
fix-fig-for-fork-v2
Amean1   0.1266 (  0.00%)  0.1269 ( -0.23%)  0.1287 ( -1.69%)   
   0.1357 ( -7.22%)
Amean4   0.4989 (  0.00%)  0.5174 ( -3.72%)  0.5251 ( -5.27%)   
   0.5117 ( -2.58%)
Amean7   0.8510 (  0.00%)  0.8517 ( -0.08%)  0.8964 ( -5.34%)   
   0.8801 ( -3.42%)
Amean12  1.0699 (  0.00%)  1.0484 (  2.00%)  1.0147 (  5.15%)   
   1.0759 ( -0.56%)
Amean21  1.2816 (  0.00%)  1.2140 (  5.27%)  1.1879 (  7.31%)   
   1.2414 (  3.13%)
Amean30  1.4440 (  0.00%)  1.4003 (  3.03%)  1.3969 (  3.26%)   
   1.4057 (  2.65%)
Amean48  1.5773 (  0.00%)  1.5983 ( -1.33%)  1.3984 ( 11.34%)   
   1.5624 (  0.94%)
Amean79  2.2343 (  0.00%)  2.3066 ( -3.24%)  2.0053 ( 10.25%)   
   2.2630 ( -1.29%)
Amean96  2.6736 (  0.00%)  2.4313 (  9.06%)  2.4181 (  9.55%)   
   2.4717 (  7.55%)

   4.9.0-rc6   4.9.0-rc6   4.9.0-rc6   4.9.0-rc6
   tip-schedfix-fig-for-fork fix-sigfix-fig-for-fork-v2
User  129.53  128.64  127.70  131.00
System   1784.54 1744.21 1654.08 1744.00
Elapsed92.07   90.44   86.95   91.00

Looking at the 48 and 79 groups rows for mean there's a noticeable
drop off in performance of ~10%, which should be outside of the noise
for this test. This is a 2 socket, 4 NUMA node (yes, really), 24 cpus
AMD opteron circa 2010.

Given the age of this machine, I don't think it's worth holding up the
patch.


Re: [GIT PULL] efi: Pass secure boot mode to kernel

2016-12-08 Thread Matt Fleming
On Thu, 08 Dec, at 11:45:17AM, David Howells wrote:
> Hi Matt, Ard,
> 
> Is it too late to request this for the upcoming merge window?

For something as non-trivial as this, yes, it's too late. We generally
close the EFI tree window for new features around -rc5 time.

> Also, I've made
> Lukas's requested changes and reposted just that patch in my reply to him.  Do
> you want me to repost the lot?
 
Please do, yeah.


Re: [GIT PULL] efi: Pass secure boot mode to kernel

2016-12-08 Thread Matt Fleming
On Thu, 08 Dec, at 11:45:17AM, David Howells wrote:
> Hi Matt, Ard,
> 
> Is it too late to request this for the upcoming merge window?

For something as non-trivial as this, yes, it's too late. We generally
close the EFI tree window for new features around -rc5 time.

> Also, I've made
> Lukas's requested changes and reposted just that patch in my reply to him.  Do
> you want me to repost the lot?
 
Please do, yeah.


Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86

2016-12-07 Thread Matt Fleming
On Wed, 07 Dec, at 11:01:06AM, Sai Praneeth Prakhya wrote:
> 
> Thanks Matt!
> 
> Would you like to see a new version of these patch series addressing
> your comments? Like
> 1. Dropping of patch #4
> 2. Adding Reviewed-by tag of Joey (Sorry for that)
> 3. This time with correct version number

No need, they're already in the 'next' branch of the EFI tree.


Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86

2016-12-07 Thread Matt Fleming
On Wed, 07 Dec, at 11:01:06AM, Sai Praneeth Prakhya wrote:
> 
> Thanks Matt!
> 
> Would you like to see a new version of these patch series addressing
> your comments? Like
> 1. Dropping of patch #4
> 2. Adding Reviewed-by tag of Joey (Sorry for that)
> 3. This time with correct version number

No need, they're already in the 'next' branch of the EFI tree.


Re: Does anything ever enter through startup_64 in head_64.S?

2016-12-07 Thread Matt Fleming
On Wed, 07 Dec, at 03:11:17PM, David Howells wrote:
> Hi Matt,
> 
> Does anything ever enter the kernel through startup_64 in head_64.S?[*]  Do
> all 64-bit mode entries always enter through one of the EFI entry points?
 
Which head_64.S? There are two ;-)

Assuming you mean startup_64 in boot/compressed/head_64.S, then the
answer is "yes". 64-bit BIOS boot loaders will jump there.

I'm fairly sure it's possible to boot that way on EFI too, you just
lose some of the newer functionality that is dependent on the EFI boot
stub. But I don't remember the last time I tried it.

In general, we've always tried to maintain backwards compat, but you
don't get the new features without switching to the EFI entry points.

Whether or not any distros are still using the old 0x200 entry point
for EFI is a good question, though.


Re: Does anything ever enter through startup_64 in head_64.S?

2016-12-07 Thread Matt Fleming
On Wed, 07 Dec, at 03:11:17PM, David Howells wrote:
> Hi Matt,
> 
> Does anything ever enter the kernel through startup_64 in head_64.S?[*]  Do
> all 64-bit mode entries always enter through one of the EFI entry points?
 
Which head_64.S? There are two ;-)

Assuming you mean startup_64 in boot/compressed/head_64.S, then the
answer is "yes". 64-bit BIOS boot loaders will jump there.

I'm fairly sure it's possible to boot that way on EFI too, you just
lose some of the newer functionality that is dependent on the EFI boot
stub. But I don't remember the last time I tried it.

In general, we've always tried to maintain backwards compat, but you
don't get the new features without switching to the EFI entry points.

Whether or not any distros are still using the old 0x200 entry point
for EFI is a good question, though.


Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86

2016-12-07 Thread Matt Fleming
On Tue, 06 Dec, at 11:15:59AM, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth 
> 
> UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE which describes memory 
> protections that may be applied to EFI Runtime code and data regions by 
> kernel. This helps kernel to map efi runtime regions more strictly and 
> hence allowing only appropriate accesses to these regions. Please refer 
> to "EFI_MEMORY_ATTRIBUTES_TABLE" in section 4.6 of UEFI specification 
> v2.6 for more information on this table.
> 
> This patch set relies on commit a604af075a32 ("efi: Add support for the 
> EFI_MEMORY_ATTRIBUTES_TABLE config table"), commit 10f0d2f57705 ("efi: 
> Implement generic support for the Memory Attributes table") and hence 
> implements support for only x86.
> 
> Since the above commits have already implemented early discovery and 
> validation of table, the following patches implement a call back 
> function for x86 which is called only when EFI_MEMORY_ATTRIBUTES_TABLE 
> is detected.
> 
> Patch #1 makes the efi_memory_attributes table detection code generic 
> across all architectures
> 
> Patch #2 adds EFI_MEM_ATTR bit to keep track of this feature
> 
> Patch #3 Implements call back function that does stricter mappings based 
> on this table
> 
> Patch #4 Skip parsing of EFI_PROPERTIES_TABLE if EFI_MEMORY_ATTRIBUTES_TABLE 
> is detected
> 
> Sai Praneeth (4):
>   efi: Make EFI_MEMORY_ATTRIBUTES_TABLE initialization common across all
> architectures
>   efi: Introduce EFI_MEM_ATTR bit and set it from memory attributes
> table
>   x86/efi: Add support for EFI_MEMORY_ATTRIBUTES_TABLE
>   efi: Skip parsing of EFI_PROPERTIES_TABLE if
> EFI_MEMORY_ATTRIBUTES_TABLE is detected
> 
>  arch/x86/platform/efi/efi_64.c  | 64 
> ++---
>  drivers/firmware/efi/arm-init.c |  1 -
>  drivers/firmware/efi/efi.c  | 13 +
>  drivers/firmware/efi/memattr.c  |  6 +++-
>  include/linux/efi.h |  1 +
>  5 files changed, 73 insertions(+), 12 deletions(-)

Thanks Sai, I've queued this up for v4.11.


Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86

2016-12-07 Thread Matt Fleming
On Tue, 06 Dec, at 11:15:59AM, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth 
> 
> UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE which describes memory 
> protections that may be applied to EFI Runtime code and data regions by 
> kernel. This helps kernel to map efi runtime regions more strictly and 
> hence allowing only appropriate accesses to these regions. Please refer 
> to "EFI_MEMORY_ATTRIBUTES_TABLE" in section 4.6 of UEFI specification 
> v2.6 for more information on this table.
> 
> This patch set relies on commit a604af075a32 ("efi: Add support for the 
> EFI_MEMORY_ATTRIBUTES_TABLE config table"), commit 10f0d2f57705 ("efi: 
> Implement generic support for the Memory Attributes table") and hence 
> implements support for only x86.
> 
> Since the above commits have already implemented early discovery and 
> validation of table, the following patches implement a call back 
> function for x86 which is called only when EFI_MEMORY_ATTRIBUTES_TABLE 
> is detected.
> 
> Patch #1 makes the efi_memory_attributes table detection code generic 
> across all architectures
> 
> Patch #2 adds EFI_MEM_ATTR bit to keep track of this feature
> 
> Patch #3 Implements call back function that does stricter mappings based 
> on this table
> 
> Patch #4 Skip parsing of EFI_PROPERTIES_TABLE if EFI_MEMORY_ATTRIBUTES_TABLE 
> is detected
> 
> Sai Praneeth (4):
>   efi: Make EFI_MEMORY_ATTRIBUTES_TABLE initialization common across all
> architectures
>   efi: Introduce EFI_MEM_ATTR bit and set it from memory attributes
> table
>   x86/efi: Add support for EFI_MEMORY_ATTRIBUTES_TABLE
>   efi: Skip parsing of EFI_PROPERTIES_TABLE if
> EFI_MEMORY_ATTRIBUTES_TABLE is detected
> 
>  arch/x86/platform/efi/efi_64.c  | 64 
> ++---
>  drivers/firmware/efi/arm-init.c |  1 -
>  drivers/firmware/efi/efi.c  | 13 +
>  drivers/firmware/efi/memattr.c  |  6 +++-
>  include/linux/efi.h |  1 +
>  5 files changed, 73 insertions(+), 12 deletions(-)

Thanks Sai, I've queued this up for v4.11.


Re: [PATCH 4/4] efi: Skip parsing of EFI_PROPERTIES_TABLE if EFI_MEMORY_ATTRIBUTES_TABLE is detected

2016-12-07 Thread Matt Fleming
On Tue, 06 Dec, at 11:16:03AM, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth <sai.praneeth.prak...@intel.com>
> 
> UEFI specification v2.6 recommends not to use
> "EFI_PROPERTIES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA"
> attribute of EFI_PROPERTIES_TABLE. Presently, this is the *only* bit
> defined in EFI_PROPERTIES_TABLE. This bit implies that EFI Runtime code
> and data regions of an executable image are separate and are aligned as
> specified in spec. Please refer to "EFI_PROPERTIES_TABLE" in section 4.6
> of UEFI specification v2.6 for more information on this table.
> 
> UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE and is intended to
> replace EFI_PROPERTIES_TABLE. If EFI_MEMORY_ATTRIBUTES_TABLE is found we
> skip updating of efi runtime region mappings based on
> EFI_PROPERTIES_TABLE, so let's also skip parsing of EFI_PROPERTIES_TABLE
> if we find EFI_MEMORY_ATTRIBUTES_TABLE because we are not using this
> table anyways. The only caveat here is, if further versions of UEFI spec
> adds some more bits (hence some more attributes) to EFI_PROPERTIES_TABLE
> then we might need to parse it again, otherwise there is no good in
> doing that. We can also expect that the same attributes might be reflected in
> EFI_MEMORY_ATTRIBUTES_TABLE and hence saving us from parsing
> EFI_PROPERTIES_TABLE again.
> 
> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
> Cc: Lee, Chun-Yi <j...@suse.com>
> Cc: Borislav Petkov <b...@alien8.de>
> Cc: Ricardo Neri <ricardo.n...@intel.com>
> Cc: Matt Fleming <m...@codeblueprint.co.uk>
> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
> Cc: Ravi Shankar <ravi.v.shan...@intel.com>
> Cc: Fenghua Yu <fenghua...@intel.com>
> ---
>  drivers/firmware/efi/efi.c | 11 +++
>  1 file changed, 11 insertions(+)

I see where you're coming from with this patch, but I think it's
unnecessary. Turning on/off parsing of Table A based on existence of
Table B just seems like extra work.


Re: [PATCH 4/4] efi: Skip parsing of EFI_PROPERTIES_TABLE if EFI_MEMORY_ATTRIBUTES_TABLE is detected

2016-12-07 Thread Matt Fleming
On Tue, 06 Dec, at 11:16:03AM, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth 
> 
> UEFI specification v2.6 recommends not to use
> "EFI_PROPERTIES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA"
> attribute of EFI_PROPERTIES_TABLE. Presently, this is the *only* bit
> defined in EFI_PROPERTIES_TABLE. This bit implies that EFI Runtime code
> and data regions of an executable image are separate and are aligned as
> specified in spec. Please refer to "EFI_PROPERTIES_TABLE" in section 4.6
> of UEFI specification v2.6 for more information on this table.
> 
> UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE and is intended to
> replace EFI_PROPERTIES_TABLE. If EFI_MEMORY_ATTRIBUTES_TABLE is found we
> skip updating of efi runtime region mappings based on
> EFI_PROPERTIES_TABLE, so let's also skip parsing of EFI_PROPERTIES_TABLE
> if we find EFI_MEMORY_ATTRIBUTES_TABLE because we are not using this
> table anyways. The only caveat here is, if further versions of UEFI spec
> adds some more bits (hence some more attributes) to EFI_PROPERTIES_TABLE
> then we might need to parse it again, otherwise there is no good in
> doing that. We can also expect that the same attributes might be reflected in
> EFI_MEMORY_ATTRIBUTES_TABLE and hence saving us from parsing
> EFI_PROPERTIES_TABLE again.
> 
> Signed-off-by: Sai Praneeth Prakhya 
> Cc: Lee, Chun-Yi 
> Cc: Borislav Petkov 
> Cc: Ricardo Neri 
> Cc: Matt Fleming 
> Cc: Ard Biesheuvel 
> Cc: Ravi Shankar 
> Cc: Fenghua Yu 
> ---
>  drivers/firmware/efi/efi.c | 11 +++
>  1 file changed, 11 insertions(+)

I see where you're coming from with this patch, but I think it's
unnecessary. Turning on/off parsing of Table A based on existence of
Table B just seems like extra work.


Re: [PATCH 1/4] efi: Make EFI_MEMORY_ATTRIBUTES_TABLE initialization common across all architectures

2016-12-07 Thread Matt Fleming
On Tue, 06 Dec, at 11:16:00AM, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth <sai.praneeth.prak...@intel.com>
> 
> Since EFI_PROPERTIES_TABLE and EFI_MEMORY_ATTRIBUTES_TABLE deal with
> updating memory region attributes, it makes sense to call
> EFI_MEMORY_ATTRIBUTES_TABLE initialization function from the same place
> as EFI_PROPERTIES_TABLE. This also moves the EFI_MEMORY_ATTRIBUTES_TABLE
> initialization code to a more generic efi initialization path rather
> than ARM specific efi initialization. This is important because
> EFI_MEMORY_ATTRIBUTES_TABLE will be supported by x86 as well.
> 
> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
> Cc: Lee, Chun-Yi <j...@suse.com>
> Cc: Borislav Petkov <b...@alien8.de>
> Cc: Ricardo Neri <ricardo.n...@intel.com>
> Cc: Matt Fleming <m...@codeblueprint.co.uk>
> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
> Cc: Ravi Shankar <ravi.v.shan...@intel.com>
> Cc: Fenghua Yu <fenghua...@intel.com>
> ---
>  drivers/firmware/efi/arm-init.c | 1 -
>  drivers/firmware/efi/efi.c  | 2 ++
>  2 files changed, 2 insertions(+), 1 deletion(-)

You should have applied Joey's Reviewed-by tag when sending out this
version so that it doesn't get lost.

It's not a big deal, just something to watch out for with future
submissions.


Re: [PATCH 1/4] efi: Make EFI_MEMORY_ATTRIBUTES_TABLE initialization common across all architectures

2016-12-07 Thread Matt Fleming
On Tue, 06 Dec, at 11:16:00AM, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth 
> 
> Since EFI_PROPERTIES_TABLE and EFI_MEMORY_ATTRIBUTES_TABLE deal with
> updating memory region attributes, it makes sense to call
> EFI_MEMORY_ATTRIBUTES_TABLE initialization function from the same place
> as EFI_PROPERTIES_TABLE. This also moves the EFI_MEMORY_ATTRIBUTES_TABLE
> initialization code to a more generic efi initialization path rather
> than ARM specific efi initialization. This is important because
> EFI_MEMORY_ATTRIBUTES_TABLE will be supported by x86 as well.
> 
> Signed-off-by: Sai Praneeth Prakhya 
> Cc: Lee, Chun-Yi 
> Cc: Borislav Petkov 
> Cc: Ricardo Neri 
> Cc: Matt Fleming 
> Cc: Ard Biesheuvel 
> Cc: Ravi Shankar 
> Cc: Fenghua Yu 
> ---
>  drivers/firmware/efi/arm-init.c | 1 -
>  drivers/firmware/efi/efi.c  | 2 ++
>  2 files changed, 2 insertions(+), 1 deletion(-)

You should have applied Joey's Reviewed-by tag when sending out this
version so that it doesn't get lost.

It's not a big deal, just something to watch out for with future
submissions.


Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86

2016-12-07 Thread Matt Fleming
On Tue, 06 Dec, at 11:15:59AM, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth 
> 
> UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE which describes memory 
> protections that may be applied to EFI Runtime code and data regions by 
> kernel. This helps kernel to map efi runtime regions more strictly and 
> hence allowing only appropriate accesses to these regions. Please refer 
> to "EFI_MEMORY_ATTRIBUTES_TABLE" in section 4.6 of UEFI specification 
> v2.6 for more information on this table.
 
I guess this is v3 of your patch series right? Please be sure to
include a version tag in future like you did for you v2. 


Re: [PATCH 0/4] UEFI: EFI_MEMORY_ATTRIBUTES_TABLE support for x86

2016-12-07 Thread Matt Fleming
On Tue, 06 Dec, at 11:15:59AM, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth 
> 
> UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE which describes memory 
> protections that may be applied to EFI Runtime code and data regions by 
> kernel. This helps kernel to map efi runtime regions more strictly and 
> hence allowing only appropriate accesses to these regions. Please refer 
> to "EFI_MEMORY_ATTRIBUTES_TABLE" in section 4.6 of UEFI specification 
> v2.6 for more information on this table.
 
I guess this is v3 of your patch series right? Please be sure to
include a version tag in future like you did for you v2. 


Re: [RFC PATCH v3 10/20] Add support to access boot related data in the clear

2016-12-07 Thread Matt Fleming
On Wed, 09 Nov, at 06:36:31PM, Tom Lendacky wrote:
> Boot data (such as EFI related data) is not encrypted when the system is
> booted and needs to be accessed unencrypted.  Add support to apply the
> proper attributes to the EFI page tables and to the early_memremap and
> memremap APIs to identify the type of data being accessed so that the
> proper encryption attribute can be applied.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/e820.h|1 
>  arch/x86/kernel/e820.c |   16 +++
>  arch/x86/mm/ioremap.c  |   89 
> 
>  arch/x86/platform/efi/efi_64.c |   12 -
>  drivers/firmware/efi/efi.c |   33 +++
>  include/linux/efi.h|2 +
>  kernel/memremap.c  |8 +++-
>  mm/early_ioremap.c |   18 +++-
>  8 files changed, 172 insertions(+), 7 deletions(-)
 
FWIW, I think this version is an improvement over all the previous
ones.

[...]

> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index ff542cd..ee347c2 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -20,6 +20,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  
>  #include "physaddr.h"
>  
> @@ -418,6 +421,92 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
>   iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
>  }
>  
> +static bool memremap_setup_data(resource_size_t phys_addr,
> + unsigned long size)
> +{
> + u64 paddr;
> +
> + if (phys_addr == boot_params.hdr.setup_data)
> + return true;
> +

Why is the setup_data linked list not traversed when checking for
matching addresses? Am I reading this incorrectly? I don't see how
this can work.

> + paddr = boot_params.efi_info.efi_memmap_hi;
> + paddr <<= 32;
> + paddr |= boot_params.efi_info.efi_memmap;
> + if (phys_addr == paddr)
> + return true;
> +
> + paddr = boot_params.efi_info.efi_systab_hi;
> + paddr <<= 32;
> + paddr |= boot_params.efi_info.efi_systab;
> + if (phys_addr == paddr)
> + return true;
> +
> + if (efi_table_address_match(phys_addr))
> + return true;
> +
> + return false;
> +}
> +
> +static bool memremap_apply_encryption(resource_size_t phys_addr,
> +   unsigned long size)
> +{
> + /* SME is not active, just return true */
> + if (!sme_me_mask)
> + return true;
> +
> + /* Check if the address is part of the setup data */
> + if (memremap_setup_data(phys_addr, size))
> + return false;
> +
> + /* Check if the address is part of EFI boot/runtime data */
> + switch (efi_mem_type(phys_addr)) {
> + case EFI_BOOT_SERVICES_DATA:
> + case EFI_RUNTIME_SERVICES_DATA:
> + return false;
> + }

EFI_LOADER_DATA is notable by its absence.

We use that memory type for allocations inside of the EFI boot stub
that are than used while the kernel is running. One use that comes to
mind is for initrd files, see handle_cmdline_files().

Oh I see you handle that in PATCH 9, never mind.

> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index 58b0f80..3f89179 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -221,7 +221,13 @@ int __init efi_setup_page_tables(unsigned long 
> pa_memmap, unsigned num_pages)
>   if (efi_enabled(EFI_OLD_MEMMAP))
>   return 0;
>  
> - efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
> + /*
> +  * Since the PGD is encrypted, set the encryption mask so that when
> +  * this value is loaded into cr3 the PGD will be decrypted during
> +  * the pagetable walk.
> +  */
> + efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
> +
>   pgd = efi_pgd;
>  
>   /*

Do all callers of __pa() in arch/x86 need fixing up like this?


Re: [RFC PATCH v3 10/20] Add support to access boot related data in the clear

2016-12-07 Thread Matt Fleming
On Wed, 09 Nov, at 06:36:31PM, Tom Lendacky wrote:
> Boot data (such as EFI related data) is not encrypted when the system is
> booted and needs to be accessed unencrypted.  Add support to apply the
> proper attributes to the EFI page tables and to the early_memremap and
> memremap APIs to identify the type of data being accessed so that the
> proper encryption attribute can be applied.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/e820.h|1 
>  arch/x86/kernel/e820.c |   16 +++
>  arch/x86/mm/ioremap.c  |   89 
> 
>  arch/x86/platform/efi/efi_64.c |   12 -
>  drivers/firmware/efi/efi.c |   33 +++
>  include/linux/efi.h|2 +
>  kernel/memremap.c  |8 +++-
>  mm/early_ioremap.c |   18 +++-
>  8 files changed, 172 insertions(+), 7 deletions(-)
 
FWIW, I think this version is an improvement over all the previous
ones.

[...]

> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index ff542cd..ee347c2 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -20,6 +20,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  
>  #include "physaddr.h"
>  
> @@ -418,6 +421,92 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
>   iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
>  }
>  
> +static bool memremap_setup_data(resource_size_t phys_addr,
> + unsigned long size)
> +{
> + u64 paddr;
> +
> + if (phys_addr == boot_params.hdr.setup_data)
> + return true;
> +

Why is the setup_data linked list not traversed when checking for
matching addresses? Am I reading this incorrectly? I don't see how
this can work.

> + paddr = boot_params.efi_info.efi_memmap_hi;
> + paddr <<= 32;
> + paddr |= boot_params.efi_info.efi_memmap;
> + if (phys_addr == paddr)
> + return true;
> +
> + paddr = boot_params.efi_info.efi_systab_hi;
> + paddr <<= 32;
> + paddr |= boot_params.efi_info.efi_systab;
> + if (phys_addr == paddr)
> + return true;
> +
> + if (efi_table_address_match(phys_addr))
> + return true;
> +
> + return false;
> +}
> +
> +static bool memremap_apply_encryption(resource_size_t phys_addr,
> +   unsigned long size)
> +{
> + /* SME is not active, just return true */
> + if (!sme_me_mask)
> + return true;
> +
> + /* Check if the address is part of the setup data */
> + if (memremap_setup_data(phys_addr, size))
> + return false;
> +
> + /* Check if the address is part of EFI boot/runtime data */
> + switch (efi_mem_type(phys_addr)) {
> + case EFI_BOOT_SERVICES_DATA:
> + case EFI_RUNTIME_SERVICES_DATA:
> + return false;
> + }

EFI_LOADER_DATA is notable by its absence.

We use that memory type for allocations inside of the EFI boot stub
that are than used while the kernel is running. One use that comes to
mind is for initrd files, see handle_cmdline_files().

Oh I see you handle that in PATCH 9, never mind.

> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index 58b0f80..3f89179 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -221,7 +221,13 @@ int __init efi_setup_page_tables(unsigned long 
> pa_memmap, unsigned num_pages)
>   if (efi_enabled(EFI_OLD_MEMMAP))
>   return 0;
>  
> - efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
> + /*
> +  * Since the PGD is encrypted, set the encryption mask so that when
> +  * this value is loaded into cr3 the PGD will be decrypted during
> +  * the pagetable walk.
> +  */
> + efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
> +
>   pgd = efi_pgd;
>  
>   /*

Do all callers of __pa() in arch/x86 need fixing up like this?


Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group

2016-12-05 Thread Matt Fleming
On Mon, 05 Dec, at 10:27:36AM, Vincent Guittot wrote:
> 
> Hi Matt,
> 
> Thanks for the results.
> 
> During the review, it has been pointed out by Morten that the test condition
> (100*this_avg_load < imbalance_scale*min_avg_load) makes more sense than
> (100*min_avg_load > imbalance_scale*this_avg_load). But i see lower
> performances with this change. Coud you run tests with the change below on
> top of the patchset ?
> 
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e8d1ae7..0129fbb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5514,7 +5514,7 @@ find_idlest_group(struct sched_domain *sd, struct 
> task_struct *p,
>   if (!idlest ||
>   (min_runnable_load > (this_runnable_load + imbalance)) ||
>   ((this_runnable_load < (min_runnable_load + imbalance)) &&
> - (100*min_avg_load > imbalance_scale*this_avg_load)))
> + (100*this_avg_load < imbalance_scale*min_avg_load)))
>   return NULL;
>   return idlest;
>  }

Queued for testing.


Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group

2016-12-05 Thread Matt Fleming
On Mon, 05 Dec, at 10:27:36AM, Vincent Guittot wrote:
> 
> Hi Matt,
> 
> Thanks for the results.
> 
> During the review, it has been pointed out by Morten that the test condition
> (100*this_avg_load < imbalance_scale*min_avg_load) makes more sense than
> (100*min_avg_load > imbalance_scale*this_avg_load). But i see lower
> performances with this change. Coud you run tests with the change below on
> top of the patchset ?
> 
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e8d1ae7..0129fbb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5514,7 +5514,7 @@ find_idlest_group(struct sched_domain *sd, struct 
> task_struct *p,
>   if (!idlest ||
>   (min_runnable_load > (this_runnable_load + imbalance)) ||
>   ((this_runnable_load < (min_runnable_load + imbalance)) &&
> - (100*min_avg_load > imbalance_scale*this_avg_load)))
> + (100*this_avg_load < imbalance_scale*min_avg_load)))
>   return NULL;
>   return idlest;
>  }

Queued for testing.


Re: [tip:efi/core] efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit

2016-12-04 Thread Matt Fleming
On Sun, 04 Dec, at 02:31:23PM, Ard Biesheuvel wrote:
> On 4 December 2016 at 14:17, Matt Fleming <m...@codeblueprint.co.uk> wrote:
> >
> > Ard, was this picked up for the correct tip branch? If it fixes a
> > build failure it should have gone into tip/efi/urgent, right?
> 
> The failure was in -next, with a patch queued up for v4.10, so that is
> where the fix went as well.

Oops, sorry I missed that. I was looking at the v4.11 queue and my
brain suffered an off-by-one bug - I thought we'd already had the
v4.10 release.


Re: [tip:efi/core] efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit

2016-12-04 Thread Matt Fleming
On Sun, 04 Dec, at 02:31:23PM, Ard Biesheuvel wrote:
> On 4 December 2016 at 14:17, Matt Fleming  wrote:
> >
> > Ard, was this picked up for the correct tip branch? If it fixes a
> > build failure it should have gone into tip/efi/urgent, right?
> 
> The failure was in -next, with a patch queued up for v4.10, so that is
> where the fix went as well.

Oops, sorry I missed that. I was looking at the v4.11 queue and my
brain suffered an off-by-one bug - I thought we'd already had the
v4.10 release.


Re: [tip:efi/core] efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit

2016-12-04 Thread Matt Fleming
On Thu, 24 Nov, at 10:47:15PM, tip-bot for Ard Biesheuvel wrote:
> Commit-ID:  018edcfac4c3b140366ad51b0907f3becb5bb624
> Gitweb: http://git.kernel.org/tip/018edcfac4c3b140366ad51b0907f3becb5bb624
> Author: Ard Biesheuvel <ard.biesheu...@linaro.org>
> AuthorDate: Thu, 24 Nov 2016 18:02:23 +
> Committer:  Ingo Molnar <mi...@kernel.org>
> CommitDate: Fri, 25 Nov 2016 07:15:23 +0100
> 
> efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit
> 
> The UEFI stub executes in the context of the firmware, which identity
> maps the available system RAM, which implies that only memory below
> 4 GB can be used for allocations on 32-bit architectures, even on [L]PAE
> capable hardware.
> 
> So ignore any reported memory above 4 GB in efi_random_alloc(). This
> also fixes a reported build problem on ARM under -Os, where the 64-bit
> logical shift relies on a software routine that the ARM decompressor does
> not provide.
> 
> A second [minor] issue is also fixed, where the '+ 1' is moved out of
> the shift, where it belongs: the reason for its presence is that a
> memory region where start == end should count as a single slot, given
> that 'end' takes the desired size and alignment of the allocation into
> account.
> 
> To clarify the code in this regard, rename start/end to 'first_slot' and
> 'last_slot', respectively, and introduce 'region_end' to describe the
> last usable address of the current region.
> 
> Reported-by: Arnd Bergmann <a...@arndb.de>
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> Cc: Linus Torvalds <torva...@linux-foundation.org>
> Cc: Matt Fleming <m...@codeblueprint.co.uk>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: linux-...@vger.kernel.org
> Link: 
> http://lkml.kernel.org/r/1480010543-25709-2-git-send-email-ard.biesheu...@linaro.org
> Signed-off-by: Ingo Molnar <mi...@kernel.org>
> ---
>  drivers/firmware/efi/libstub/random.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)

Ard, was this picked up for the correct tip branch? If it fixes a
build failure it should have gone into tip/efi/urgent, right?


Re: [tip:efi/core] efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit

2016-12-04 Thread Matt Fleming
On Thu, 24 Nov, at 10:47:15PM, tip-bot for Ard Biesheuvel wrote:
> Commit-ID:  018edcfac4c3b140366ad51b0907f3becb5bb624
> Gitweb: http://git.kernel.org/tip/018edcfac4c3b140366ad51b0907f3becb5bb624
> Author: Ard Biesheuvel 
> AuthorDate: Thu, 24 Nov 2016 18:02:23 +
> Committer:  Ingo Molnar 
> CommitDate: Fri, 25 Nov 2016 07:15:23 +0100
> 
> efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit
> 
> The UEFI stub executes in the context of the firmware, which identity
> maps the available system RAM, which implies that only memory below
> 4 GB can be used for allocations on 32-bit architectures, even on [L]PAE
> capable hardware.
> 
> So ignore any reported memory above 4 GB in efi_random_alloc(). This
> also fixes a reported build problem on ARM under -Os, where the 64-bit
> logical shift relies on a software routine that the ARM decompressor does
> not provide.
> 
> A second [minor] issue is also fixed, where the '+ 1' is moved out of
> the shift, where it belongs: the reason for its presence is that a
> memory region where start == end should count as a single slot, given
> that 'end' takes the desired size and alignment of the allocation into
> account.
> 
> To clarify the code in this regard, rename start/end to 'first_slot' and
> 'last_slot', respectively, and introduce 'region_end' to describe the
> last usable address of the current region.
> 
> Reported-by: Arnd Bergmann 
> Signed-off-by: Ard Biesheuvel 
> Cc: Linus Torvalds 
> Cc: Matt Fleming 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: linux-...@vger.kernel.org
> Link: 
> http://lkml.kernel.org/r/1480010543-25709-2-git-send-email-ard.biesheu...@linaro.org
> Signed-off-by: Ingo Molnar 
> ---
>  drivers/firmware/efi/libstub/random.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)

Ard, was this picked up for the correct tip branch? If it fixes a
build failure it should have gone into tip/efi/urgent, right?


Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork

2016-12-03 Thread Matt Fleming
On Fri, 25 Nov, at 04:34:32PM, Vincent Guittot wrote:
> During fork, the utilization of a task is init once the rq has been
> selected because the current utilization level of the rq is used to set
> the utilization of the fork task. As the task's utilization is still
> null at this step of the fork sequence, it doesn't make sense to look for
> some spare capacity that can fit the task's utilization.
> Furthermore, I can see perf regressions for the test "hackbench -P -g 1"
> because the least loaded policy is always bypassed and tasks are not
> spread during fork.
> 
> With this patch and the fix below, we are back to same performances as
> for v4.8. The fix below is only a temporary one used for the test until a
> smarter solution is found because we can't simply remove the test which is
> useful for others benchmarks
> 
> @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, 
> struct sched_domain *sd, int t
>  
>   avg_cost = this_sd->avg_scan_cost;
>  
> - /*
> -  * Due to large variance we need a large fuzz factor; hackbench in
> -  * particularly is sensitive here.
> -  */
> - if ((avg_idle / 512) < avg_cost)
> - return -1;
> -
>   time = local_clock();
>  
>   for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {
> 

OK, I need to point out that I didn't apply the above hunk when
testing this patch series. But I wouldn't have expected that to impact
our fork-intensive workloads so much. Let me know if you'd like me to
re-run with it applied.

I don't see much of a difference, positive or negative, for the
majority of the test machines, it's mainly a wash.

However, the following 4-cpu Xeon E5504 machine does show a nice win,
with thread counts in the mid-range (note, the second column is number
of hackbench groups, where each group has 40 tasks),

hackbench-process-pipes
4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
tip-sched  fix-fig-for-fork   fix-sig
Amean1   0.2193 (  0.00%)  0.2014 (  8.14%)  0.1746 ( 20.39%)
Amean3   0.4489 (  0.00%)  0.3544 ( 21.04%)  0.3284 ( 26.83%)
Amean5   0.6173 (  0.00%)  0.4690 ( 24.02%)  0.4977 ( 19.37%)
Amean7   0.7323 (  0.00%)  0.6367 ( 13.05%)  0.6267 ( 14.42%)
Amean12  0.9716 (  0.00%)  1.0187 ( -4.85%)  0.9351 (  3.75%)
Amean16  1.2866 (  0.00%)  1.2664 (  1.57%)  1.2131 (  5.71%)


Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork

2016-12-03 Thread Matt Fleming
On Fri, 25 Nov, at 04:34:32PM, Vincent Guittot wrote:
> During fork, the utilization of a task is init once the rq has been
> selected because the current utilization level of the rq is used to set
> the utilization of the fork task. As the task's utilization is still
> null at this step of the fork sequence, it doesn't make sense to look for
> some spare capacity that can fit the task's utilization.
> Furthermore, I can see perf regressions for the test "hackbench -P -g 1"
> because the least loaded policy is always bypassed and tasks are not
> spread during fork.
> 
> With this patch and the fix below, we are back to same performances as
> for v4.8. The fix below is only a temporary one used for the test until a
> smarter solution is found because we can't simply remove the test which is
> useful for others benchmarks
> 
> @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, 
> struct sched_domain *sd, int t
>  
>   avg_cost = this_sd->avg_scan_cost;
>  
> - /*
> -  * Due to large variance we need a large fuzz factor; hackbench in
> -  * particularly is sensitive here.
> -  */
> - if ((avg_idle / 512) < avg_cost)
> - return -1;
> -
>   time = local_clock();
>  
>   for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {
> 

OK, I need to point out that I didn't apply the above hunk when
testing this patch series. But I wouldn't have expected that to impact
our fork-intensive workloads so much. Let me know if you'd like me to
re-run with it applied.

I don't see much of a difference, positive or negative, for the
majority of the test machines, it's mainly a wash.

However, the following 4-cpu Xeon E5504 machine does show a nice win,
with thread counts in the mid-range (note, the second column is number
of hackbench groups, where each group has 40 tasks),

hackbench-process-pipes
4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
tip-sched  fix-fig-for-fork   fix-sig
Amean1   0.2193 (  0.00%)  0.2014 (  8.14%)  0.1746 ( 20.39%)
Amean3   0.4489 (  0.00%)  0.3544 ( 21.04%)  0.3284 ( 26.83%)
Amean5   0.6173 (  0.00%)  0.4690 ( 24.02%)  0.4977 ( 19.37%)
Amean7   0.7323 (  0.00%)  0.6367 ( 13.05%)  0.6267 ( 14.42%)
Amean12  0.9716 (  0.00%)  1.0187 ( -4.85%)  0.9351 (  3.75%)
Amean16  1.2866 (  0.00%)  1.2664 (  1.57%)  1.2131 (  5.71%)


Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group

2016-12-03 Thread Matt Fleming
On Fri, 02 Dec, at 07:31:04PM, Brendan Gregg wrote:
> 
> For background, is this from the "A decade of wasted cores" paper's
> patches?

No, this patch fixes an issue I originally reported here,

  https://lkml.kernel.org/r/20160923115808.2330-1-m...@codeblueprint.co.uk

Essentially, if you have an idle or partially-idle system and a
workload that consists of fork()'ing a bunch of tasks, where each of
those tasks immediately sleeps waiting for some wakeup, then those
tasks aren't spread across all idle CPUs very well.

We saw this issue when running hackbench with a small loop count, such
that the actual benchmark setup (fork()'ing) is where the majority of
the runtime is spent.

In that scenario, there's a large potential/blocked load, but
essentially no runnable load, and the balance on fork scheduler code
only cares about runnable load without Vincent's patch applied.

The closest thing I can find in the "A decade of wasted cores" paper
is "The Overload-on-Wakeup bug", but I don't think that's the issue
here since,

  a) We're balancing on fork, not wakeup
  b) The fork on balance code balances across nodes OK

> What's the expected typical gain? Thanks,

The results are still coming back from the SUSE performance test grid
but they do show that this patch is mainly a win for multi-socket
machines with more than 8 cores or thereabouts.

 [ Vincent, I'll follow up to your PATCH 1/2 with the results that are
   specifically for that patch ]

Assuming a fork-intensive or fork-dominated workload, and a
multi-socket machine, such as this 2 socket, NUMA, with 12 cores and
HT enabled (48 cpus), we saw a very clear win between +10% and +15%
for processes communicating via pipes,

  (1) tip-sched = tip/sched/core branch
  (2) fix-fig-for-fork = (1) + PATCH 1/2
  (3) fix-sig = (1) + (2) + PATCH 2/2

hackbench-process-pipes
 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
 tip-sched  fix-fig-for-fork   fix-sig
Amean10.0717 (  0.00%)  0.0696 (  2.99%)  0.0730 ( -1.79%)
Amean40.1244 (  0.00%)  0.1200 (  3.56%)  0.1190 (  4.36%)
Amean70.1891 (  0.00%)  0.1937 ( -2.42%)  0.1831 (  3.17%)
Amean12   0.2964 (  0.00%)  0.3116 ( -5.11%)  0.2784 (  6.07%)
Amean21   0.4011 (  0.00%)  0.4090 ( -1.96%)  0.3574 ( 10.90%)
Amean30   0.4944 (  0.00%)  0.4654 (  5.87%)  0.4171 ( 15.63%)
Amean48   0.6113 (  0.00%)  0.6309 ( -3.20%)  0.5331 ( 12.78%)
Amean79   0.8616 (  0.00%)  0.8706 ( -1.04%)  0.7710 ( 10.51%)
Amean110  1.1304 (  0.00%)  1.2211 ( -8.02%)  1.0163 ( 10.10%)
Amean141  1.3754 (  0.00%)  1.4279 ( -3.81%)  1.2803 (  6.92%)
Amean172  1.6217 (  0.00%)  1.7367 ( -7.09%)  1.5363 (  5.27%)
Amean192  1.7809 (  0.00%)  2.0199 (-13.42%)  1.7129 (  3.82%)

Things look even better when using threads and pipes, with wins
between 11% and 29% when looking at results outside of the noise,

hackbench-thread-pipes
 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
 tip-sched  fix-fig-for-fork   fix-sig
Amean10.0736 (  0.00%)  0.0794 ( -7.96%)  0.0779 ( -5.83%)
Amean40.1709 (  0.00%)  0.1690 (  1.09%)  0.1663 (  2.68%)
Amean70.2836 (  0.00%)  0.3080 ( -8.61%)  0.2640 (  6.90%)
Amean12   0.4393 (  0.00%)  0.4843 (-10.24%)  0.4090 (  6.89%)
Amean21   0.5821 (  0.00%)  0.6369 ( -9.40%)  0.5126 ( 11.95%)
Amean30   0.6557 (  0.00%)  0.6459 (  1.50%)  0.5711 ( 12.90%)
Amean48   0.7924 (  0.00%)  0.7760 (  2.07%)  0.6286 ( 20.68%)
Amean79   1.0534 (  0.00%)  1.0551 ( -0.16%)  0.8481 ( 19.49%)
Amean110  1.5286 (  0.00%)  1.4504 (  5.11%)  1.1121 ( 27.24%)
Amean141  1.9507 (  0.00%)  1.7790 (  8.80%)  1.3804 ( 29.23%)
Amean172  2.2261 (  0.00%)  2.3330 ( -4.80%)  1.6336 ( 26.62%)
Amean192  2.3753 (  0.00%)  2.3307 (  1.88%)  1.8246 ( 23.19%)

Somewhat surprisingly, I can see improvements for UMA machines with
fewer cores when the workload heavily saturates the machine and the
workload isn't dominated by fork. Such heavy saturation isn't super
realistic, but still interesting. I haven't dug into why these results
occurred, but I am happy things didn't instead fall off a cliff.

Here's a 4-cpu UMA box showing some improvement at the higher end,

hackbench-process-pipes
4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
tip-sched  fix-fig-for-fork   fix-sig
Amean1   3.5060 (  0.00%)  3.5747 ( -1.96%)  3.5117 ( -0.16%)
Amean3   7.7113 (  0.00%)  7.8160 ( -1.36%)  7.7747 ( -0.82%)
Amean5  11.4453 (  0.00%) 

Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group

2016-12-03 Thread Matt Fleming
On Fri, 02 Dec, at 07:31:04PM, Brendan Gregg wrote:
> 
> For background, is this from the "A decade of wasted cores" paper's
> patches?

No, this patch fixes an issue I originally reported here,

  https://lkml.kernel.org/r/20160923115808.2330-1-m...@codeblueprint.co.uk

Essentially, if you have an idle or partially-idle system and a
workload that consists of fork()'ing a bunch of tasks, where each of
those tasks immediately sleeps waiting for some wakeup, then those
tasks aren't spread across all idle CPUs very well.

We saw this issue when running hackbench with a small loop count, such
that the actual benchmark setup (fork()'ing) is where the majority of
the runtime is spent.

In that scenario, there's a large potential/blocked load, but
essentially no runnable load, and the balance on fork scheduler code
only cares about runnable load without Vincent's patch applied.

The closest thing I can find in the "A decade of wasted cores" paper
is "The Overload-on-Wakeup bug", but I don't think that's the issue
here since,

  a) We're balancing on fork, not wakeup
  b) The fork on balance code balances across nodes OK

> What's the expected typical gain? Thanks,

The results are still coming back from the SUSE performance test grid
but they do show that this patch is mainly a win for multi-socket
machines with more than 8 cores or thereabouts.

 [ Vincent, I'll follow up to your PATCH 1/2 with the results that are
   specifically for that patch ]

Assuming a fork-intensive or fork-dominated workload, and a
multi-socket machine, such as this 2 socket, NUMA, with 12 cores and
HT enabled (48 cpus), we saw a very clear win between +10% and +15%
for processes communicating via pipes,

  (1) tip-sched = tip/sched/core branch
  (2) fix-fig-for-fork = (1) + PATCH 1/2
  (3) fix-sig = (1) + (2) + PATCH 2/2

hackbench-process-pipes
 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
 tip-sched  fix-fig-for-fork   fix-sig
Amean10.0717 (  0.00%)  0.0696 (  2.99%)  0.0730 ( -1.79%)
Amean40.1244 (  0.00%)  0.1200 (  3.56%)  0.1190 (  4.36%)
Amean70.1891 (  0.00%)  0.1937 ( -2.42%)  0.1831 (  3.17%)
Amean12   0.2964 (  0.00%)  0.3116 ( -5.11%)  0.2784 (  6.07%)
Amean21   0.4011 (  0.00%)  0.4090 ( -1.96%)  0.3574 ( 10.90%)
Amean30   0.4944 (  0.00%)  0.4654 (  5.87%)  0.4171 ( 15.63%)
Amean48   0.6113 (  0.00%)  0.6309 ( -3.20%)  0.5331 ( 12.78%)
Amean79   0.8616 (  0.00%)  0.8706 ( -1.04%)  0.7710 ( 10.51%)
Amean110  1.1304 (  0.00%)  1.2211 ( -8.02%)  1.0163 ( 10.10%)
Amean141  1.3754 (  0.00%)  1.4279 ( -3.81%)  1.2803 (  6.92%)
Amean172  1.6217 (  0.00%)  1.7367 ( -7.09%)  1.5363 (  5.27%)
Amean192  1.7809 (  0.00%)  2.0199 (-13.42%)  1.7129 (  3.82%)

Things look even better when using threads and pipes, with wins
between 11% and 29% when looking at results outside of the noise,

hackbench-thread-pipes
 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
 tip-sched  fix-fig-for-fork   fix-sig
Amean10.0736 (  0.00%)  0.0794 ( -7.96%)  0.0779 ( -5.83%)
Amean40.1709 (  0.00%)  0.1690 (  1.09%)  0.1663 (  2.68%)
Amean70.2836 (  0.00%)  0.3080 ( -8.61%)  0.2640 (  6.90%)
Amean12   0.4393 (  0.00%)  0.4843 (-10.24%)  0.4090 (  6.89%)
Amean21   0.5821 (  0.00%)  0.6369 ( -9.40%)  0.5126 ( 11.95%)
Amean30   0.6557 (  0.00%)  0.6459 (  1.50%)  0.5711 ( 12.90%)
Amean48   0.7924 (  0.00%)  0.7760 (  2.07%)  0.6286 ( 20.68%)
Amean79   1.0534 (  0.00%)  1.0551 ( -0.16%)  0.8481 ( 19.49%)
Amean110  1.5286 (  0.00%)  1.4504 (  5.11%)  1.1121 ( 27.24%)
Amean141  1.9507 (  0.00%)  1.7790 (  8.80%)  1.3804 ( 29.23%)
Amean172  2.2261 (  0.00%)  2.3330 ( -4.80%)  1.6336 ( 26.62%)
Amean192  2.3753 (  0.00%)  2.3307 (  1.88%)  1.8246 ( 23.19%)

Somewhat surprisingly, I can see improvements for UMA machines with
fewer cores when the workload heavily saturates the machine and the
workload isn't dominated by fork. Such heavy saturation isn't super
realistic, but still interesting. I haven't dug into why these results
occurred, but I am happy things didn't instead fall off a cliff.

Here's a 4-cpu UMA box showing some improvement at the higher end,

hackbench-process-pipes
4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
tip-sched  fix-fig-for-fork   fix-sig
Amean1   3.5060 (  0.00%)  3.5747 ( -1.96%)  3.5117 ( -0.16%)
Amean3   7.7113 (  0.00%)  7.8160 ( -1.36%)  7.7747 ( -0.82%)
Amean5  11.4453 (  0.00%) 

Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group

2016-12-02 Thread Matt Fleming
On Fri, 02 Dec, at 04:20:54PM, Vincent Guittot wrote:
> 
> Matt,
> 
> Have you been able to get some results for the patchset ?

Sorry, I messed up the test config and the tests never ran :(

I've redone everything and fast-tracked the results on the SUSE grid
now. Should have results by morning.


Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group

2016-12-02 Thread Matt Fleming
On Fri, 02 Dec, at 04:20:54PM, Vincent Guittot wrote:
> 
> Matt,
> 
> Have you been able to get some results for the patchset ?

Sorry, I messed up the test config and the tests never ran :(

I've redone everything and fast-tracked the results on the SUSE grid
now. Should have results by morning.


Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork

2016-11-29 Thread Matt Fleming
On Tue, 29 Nov, at 12:42:43PM, Peter Zijlstra wrote:
> 
> IIRC, and my pounding head really doesn't remember much, the comment
> reads like we need the large fudge factor because hackbench. That is,
> hackbench would like this test to go away, but others benchmarks will
> tank.
> 
> Now, if only I would've written down which benchmarks that were.. awell.
 
Going out on a limb: Chris' schbench?


Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork

2016-11-29 Thread Matt Fleming
On Tue, 29 Nov, at 12:42:43PM, Peter Zijlstra wrote:
> 
> IIRC, and my pounding head really doesn't remember much, the comment
> reads like we need the large fudge factor because hackbench. That is,
> hackbench would like this test to go away, but others benchmarks will
> tank.
> 
> Now, if only I would've written down which benchmarks that were.. awell.
 
Going out on a limb: Chris' schbench?


Re: [PATCH 0/2 v2] sched: improve spread of tasks during fork

2016-11-28 Thread Matt Fleming
On Fri, 25 Nov, at 04:34:31PM, Vincent Guittot wrote:
> This patchset was originally 1 patch but a perf regression happened during 
> the rebase.
> The patch 01 fixes the perf regression discovered in tip/sched/core during the
> rebase.
> The patch 02 is the rebase of the patch that fixes the regression raised by
> Matt Fleming [1].
> 
> [1] https://lkml.org/lkml/2016/10/18/206
> 
> Vincent Guittot (2):
>   sched: fix find_idlest_group for fork
>   sched: use load_avg for selecting idlest group
> 
>  kernel/sched/fair.c | 54 
> +++--
>  1 file changed, 44 insertions(+), 10 deletions(-)

FYI, I'm running both of these patches through the SUSE performance
testing grid right now. I'll let you know as soon as I have results
(should be in the next few days).


Re: [PATCH 0/2 v2] sched: improve spread of tasks during fork

2016-11-28 Thread Matt Fleming
On Fri, 25 Nov, at 04:34:31PM, Vincent Guittot wrote:
> This patchset was originally 1 patch but a perf regression happened during 
> the rebase.
> The patch 01 fixes the perf regression discovered in tip/sched/core during the
> rebase.
> The patch 02 is the rebase of the patch that fixes the regression raised by
> Matt Fleming [1].
> 
> [1] https://lkml.org/lkml/2016/10/18/206
> 
> Vincent Guittot (2):
>   sched: fix find_idlest_group for fork
>   sched: use load_avg for selecting idlest group
> 
>  kernel/sched/fair.c | 54 
> +++--
>  1 file changed, 44 insertions(+), 10 deletions(-)

FYI, I'm running both of these patches through the SUSE performance
testing grid right now. I'll let you know as soon as I have results
(should be in the next few days).


Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork

2016-11-28 Thread Matt Fleming
On Fri, 25 Nov, at 04:34:32PM, Vincent Guittot wrote:
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index aa47589..820a787 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5463,13 +5463,19 @@ find_idlest_group(struct sched_domain *sd, struct 
> task_struct *p,
>* utilized systems if we require spare_capacity > task_util(p),
>* so we allow for some task stuffing by using
>* spare_capacity > task_util(p)/2.
> +  * spare capacity can't be used for fork because the utilization has
> +  * not been set yet as it need to get a rq to init the utilization
>*/
> + if (sd_flag & SD_BALANCE_FORK)
> + goto no_spare;
> +
>   if (this_spare > task_util(p) / 2 &&
>   imbalance*this_spare > 100*most_spare)
>   return NULL;
>   else if (most_spare > task_util(p) / 2)
>   return most_spare_sg;
>  
> +no_spare:
>   if (!idlest || 100*this_load < imbalance*min_load)
>   return NULL;
>   return idlest;

It's only a minor comment, but would you be opposed to calling this
label 'skip_spare' to indicate that spare capacity may exist, but
we're not going to make use of it?


Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork

2016-11-28 Thread Matt Fleming
On Fri, 25 Nov, at 04:34:32PM, Vincent Guittot wrote:
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index aa47589..820a787 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5463,13 +5463,19 @@ find_idlest_group(struct sched_domain *sd, struct 
> task_struct *p,
>* utilized systems if we require spare_capacity > task_util(p),
>* so we allow for some task stuffing by using
>* spare_capacity > task_util(p)/2.
> +  * spare capacity can't be used for fork because the utilization has
> +  * not been set yet as it need to get a rq to init the utilization
>*/
> + if (sd_flag & SD_BALANCE_FORK)
> + goto no_spare;
> +
>   if (this_spare > task_util(p) / 2 &&
>   imbalance*this_spare > 100*most_spare)
>   return NULL;
>   else if (most_spare > task_util(p) / 2)
>   return most_spare_sg;
>  
> +no_spare:
>   if (!idlest || 100*this_load < imbalance*min_load)
>   return NULL;
>   return idlest;

It's only a minor comment, but would you be opposed to calling this
label 'skip_spare' to indicate that spare capacity may exist, but
we're not going to make use of it?


Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table

2016-11-14 Thread Matt Fleming
On Mon, 14 Nov, at 02:55:22PM, Ingo Molnar wrote:
> 
> The problem is not that Ard applied the patches, but that you subsequently 
> rebased 
> the tree. For example:
> 
>  commit bf5d1f98c1d8be04a40eabb9dd6913347b1b3fc4
>  Author: Ard Biesheuvel <ard.biesheu...@linaro.org>
>  AuthorDate: Thu Oct 20 12:21:26 2016 +0100
>  Commit: Matt Fleming <m...@codeblueprint.co.uk>
>  CommitDate: Sat Nov 12 21:14:41 2016 +
> 
> efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
> 
> Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and
> install the Linux-specific RNG seed UEFI config table. This will be
> picked up by the EFI routines in the core kernel to seed the kernel
> entropy pool.
> 
> Cc: Matt Fleming <m...@codeblueprint.co.uk>
> Reviewed-by: Kees Cook <keesc...@chromium.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
 
Ah yes, this is exactly what happened.

> if you rebase it (with your co-maintainer's permission) then you need to add 
> your 
> SoB tag.

OK, will do so in future. Thanks Ingo.


Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table

2016-11-14 Thread Matt Fleming
On Mon, 14 Nov, at 02:55:22PM, Ingo Molnar wrote:
> 
> The problem is not that Ard applied the patches, but that you subsequently 
> rebased 
> the tree. For example:
> 
>  commit bf5d1f98c1d8be04a40eabb9dd6913347b1b3fc4
>  Author: Ard Biesheuvel 
>  AuthorDate: Thu Oct 20 12:21:26 2016 +0100
>  Commit: Matt Fleming 
>  CommitDate: Sat Nov 12 21:14:41 2016 +
> 
> efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
> 
> Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and
> install the Linux-specific RNG seed UEFI config table. This will be
> picked up by the EFI routines in the core kernel to seed the kernel
> entropy pool.
> 
> Cc: Matt Fleming 
> Reviewed-by: Kees Cook 
> Signed-off-by: Ard Biesheuvel 
 
Ah yes, this is exactly what happened.

> if you rebase it (with your co-maintainer's permission) then you need to add 
> your 
> SoB tag.

OK, will do so in future. Thanks Ingo.


Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table

2016-11-14 Thread Matt Fleming
On Sun, 13 Nov, at 09:59:32AM, Ingo Molnar wrote:
> 
> * Ingo Molnar  wrote:
> 
> > I'll apply the patches from email and add your SOB.
> 
> Note that the attached config produces this build error:
> 
> drivers/firmware/efi/apple-properties.c:149:9: error: implicit declaration of 
> function ‘efi_get_device_by_path’ [-Werror=implicit-function-declaration]
> 
> also:
> 
>   warning: (THUNDERBOLT) selects APPLE_PROPERTIES which has unmet direct 
> dependencies (EFI && EFI_STUB && X86)

Oops. Lukas, could you take a look at this (config attached).
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.9.0-rc4 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_SMP=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=3
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_COMPILE_TEST=y
CONFIG_LOCALVERSION=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
# CONFIG_TASK_DELAY_ACCT is not set
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_SRCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_KTHREAD_PRIO=0
CONFIG_RCU_NOCB_CPU=y
# CONFIG_RCU_NOCB_CPU_NONE is not set
CONFIG_RCU_NOCB_CPU_ZERO=y
# CONFIG_RCU_NOCB_CPU_ALL is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=20
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_NMI_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_DEBUG=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
# CONFIG_NET_NS is not set

Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table

2016-11-14 Thread Matt Fleming
On Sun, 13 Nov, at 09:59:32AM, Ingo Molnar wrote:
> 
> * Ingo Molnar  wrote:
> 
> > I'll apply the patches from email and add your SOB.
> 
> Note that the attached config produces this build error:
> 
> drivers/firmware/efi/apple-properties.c:149:9: error: implicit declaration of 
> function ‘efi_get_device_by_path’ [-Werror=implicit-function-declaration]
> 
> also:
> 
>   warning: (THUNDERBOLT) selects APPLE_PROPERTIES which has unmet direct 
> dependencies (EFI && EFI_STUB && X86)

Oops. Lukas, could you take a look at this (config attached).
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.9.0-rc4 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_SMP=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=3
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_COMPILE_TEST=y
CONFIG_LOCALVERSION=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
# CONFIG_TASK_DELAY_ACCT is not set
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_SRCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_KTHREAD_PRIO=0
CONFIG_RCU_NOCB_CPU=y
# CONFIG_RCU_NOCB_CPU_NONE is not set
CONFIG_RCU_NOCB_CPU_ZERO=y
# CONFIG_RCU_NOCB_CPU_ALL is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=20
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_NMI_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_DEBUG=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
# CONFIG_NET_NS is not set
CONFIG_SCHED_AUTOGROUP=y

Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table

2016-11-14 Thread Matt Fleming
On Sun, 13 Nov, at 08:19:39AM, Ingo Molnar wrote:
> 
> * Matt Fleming <m...@codeblueprint.co.uk> wrote:
> 
> > From: Ard Biesheuvel <ard.biesheu...@linaro.org>
> > 
> > Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and
> > install the Linux-specific RNG seed UEFI config table. This will be
> > picked up by the EFI routines in the core kernel to seed the kernel
> > entropy pool.
> > 
> > Cc: Matt Fleming <m...@codeblueprint.co.uk>
> > Reviewed-by: Kees Cook <keesc...@chromium.org>
> > Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> 
> This commit (and the commits after this one) doesn't have a proper signoff 
> chain, 
> probably due to rebasing?
 
Argh, my bad. This is fallout from moving to the co-maintainer model.
My scripts assume they don't need to append a SoB because that was
handled when applying the patch to the git tree.

But that obviously doesn't hold if Ard applies the patch to git, but I
mail out the patches as part of the pull request (or vice versa).

I guess in future you'd wanna see the SoB of the person mailing the
patches, right?

> I'll apply the patches from email and add your SOB.

Thanks.


Re: [PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table

2016-11-14 Thread Matt Fleming
On Sun, 13 Nov, at 08:19:39AM, Ingo Molnar wrote:
> 
> * Matt Fleming  wrote:
> 
> > From: Ard Biesheuvel 
> > 
> > Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and
> > install the Linux-specific RNG seed UEFI config table. This will be
> > picked up by the EFI routines in the core kernel to seed the kernel
> > entropy pool.
> > 
> > Cc: Matt Fleming 
> > Reviewed-by: Kees Cook 
> > Signed-off-by: Ard Biesheuvel 
> 
> This commit (and the commits after this one) doesn't have a proper signoff 
> chain, 
> probably due to rebasing?
 
Argh, my bad. This is fallout from moving to the co-maintainer model.
My scripts assume they don't need to append a SoB because that was
handled when applying the patch to the git tree.

But that obviously doesn't hold if Ard applies the patch to git, but I
mail out the patches as part of the pull request (or vice versa).

I guess in future you'd wanna see the SoB of the person mailing the
patches, right?

> I'll apply the patches from email and add your SOB.

Thanks.


[tip:efi/urgent] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y

2016-11-13 Thread tip-bot for Matt Fleming
Commit-ID:  f6697df36bdf0bf7fce984605c2918d4a7b4269f
Gitweb: http://git.kernel.org/tip/f6697df36bdf0bf7fce984605c2918d4a7b4269f
Author: Matt Fleming <m...@codeblueprint.co.uk>
AuthorDate: Sat, 12 Nov 2016 21:04:24 +
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Sun, 13 Nov 2016 08:26:40 +0100

x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y

Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497d5 ("x86/mm/64: Enable vmapped stacks 
(CONFIG_HAVE_ARCH_VMAP_STACK=y)")

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL=y immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski <l...@amacapital.net>
Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Josh Poimboeuf <jpoim...@redhat.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: linux-...@vger.kernel.org
Link: http://lkml.kernel.org/r/20161112210424.5157-3-m...@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/platform/efi/efi_64.c | 80 ++
 1 file changed, 57 insertions(+), 23 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f80..319148b 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void)
memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
 }
 
+/*
+ * Wrapper for slow_virt_to_phys() that handles NULL addresses.
+ */
+static inline phys_addr_t
+virt_to_phys_or_null_size(void *va, unsigned long size)
+{
+   bool bad_size;
+
+   if (!va)
+   return 0;
+
+   if (virt_addr_valid(va))
+   return virt_to_phys(va);
+
+   /*
+* A fully aligned variable on the stack is guaranteed not to
+* cross a page bounary. Try to catch strings on the stack by
+* checking that 'size' is a power of two.
+*/
+   bad_size = size > PAGE_SIZE || !is_power_of_2(size);
+
+   WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size);
+
+   return slow_virt_to_phys(va);
+}
+
+#define virt_to_phys_or_null(addr) \
+   virt_to_phys_or_null_size((addr), sizeof(*(addr)))
+
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
unsigned long pfn, text;
@@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, 
efi_time_cap_t *tc)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
-   phys_tc = virt_to_phys(tc);
+   phys_tm = virt_to_phys_or_null(tm);
+   phys_tc = virt_to_phys_or_null(tc);
 
status = efi_thunk(get_time, phys_tm, phys_tc);
 
@@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_time, phys_tm);
 
@@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t 
*pending,
 
spin_lock(_lock);
 
-   phys_enabled = virt_to_phys(enabled);
-   phys_pending = virt_to_phys(pending);
-   phys_tm = virt_to_phys(tm);
+   phys_enabled = virt_to_phys_or_null(enabled);
+   phys_pending = virt_to_phys_or_null(pending);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(get_wakeup_time, phys_enabled,
 phys_pending, phys_tm);
@@ -549,7 +579,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_wakeup_time, enabled, phys_tm);
 
@@ -558,6 +588,10 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
return status;
 }
 
+static unsigned long efi_name_size(efi_char16_t *name)
+{
+   return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1;
+}
 
 static efi_status_t
 efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor,
@@ -567,11 +601,11 @@ efi_thunk_get_variable(efi_char16_t *name, ef

[tip:efi/urgent] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y

2016-11-13 Thread tip-bot for Matt Fleming
Commit-ID:  f6697df36bdf0bf7fce984605c2918d4a7b4269f
Gitweb: http://git.kernel.org/tip/f6697df36bdf0bf7fce984605c2918d4a7b4269f
Author: Matt Fleming 
AuthorDate: Sat, 12 Nov 2016 21:04:24 +
Committer:  Ingo Molnar 
CommitDate: Sun, 13 Nov 2016 08:26:40 +0100

x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y

Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497d5 ("x86/mm/64: Enable vmapped stacks 
(CONFIG_HAVE_ARCH_VMAP_STACK=y)")

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL=y immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski 
Signed-off-by: Matt Fleming 
Cc: Andy Lutomirski 
Cc: Ard Biesheuvel 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Link: http://lkml.kernel.org/r/20161112210424.5157-3-m...@codeblueprint.co.uk
Signed-off-by: Ingo Molnar 
---
 arch/x86/platform/efi/efi_64.c | 80 ++
 1 file changed, 57 insertions(+), 23 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f80..319148b 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void)
memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
 }
 
+/*
+ * Wrapper for slow_virt_to_phys() that handles NULL addresses.
+ */
+static inline phys_addr_t
+virt_to_phys_or_null_size(void *va, unsigned long size)
+{
+   bool bad_size;
+
+   if (!va)
+   return 0;
+
+   if (virt_addr_valid(va))
+   return virt_to_phys(va);
+
+   /*
+* A fully aligned variable on the stack is guaranteed not to
+* cross a page bounary. Try to catch strings on the stack by
+* checking that 'size' is a power of two.
+*/
+   bad_size = size > PAGE_SIZE || !is_power_of_2(size);
+
+   WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size);
+
+   return slow_virt_to_phys(va);
+}
+
+#define virt_to_phys_or_null(addr) \
+   virt_to_phys_or_null_size((addr), sizeof(*(addr)))
+
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
unsigned long pfn, text;
@@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, 
efi_time_cap_t *tc)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
-   phys_tc = virt_to_phys(tc);
+   phys_tm = virt_to_phys_or_null(tm);
+   phys_tc = virt_to_phys_or_null(tc);
 
status = efi_thunk(get_time, phys_tm, phys_tc);
 
@@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_time, phys_tm);
 
@@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t 
*pending,
 
spin_lock(_lock);
 
-   phys_enabled = virt_to_phys(enabled);
-   phys_pending = virt_to_phys(pending);
-   phys_tm = virt_to_phys(tm);
+   phys_enabled = virt_to_phys_or_null(enabled);
+   phys_pending = virt_to_phys_or_null(pending);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(get_wakeup_time, phys_enabled,
 phys_pending, phys_tm);
@@ -549,7 +579,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_wakeup_time, enabled, phys_tm);
 
@@ -558,6 +588,10 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
return status;
 }
 
+static unsigned long efi_name_size(efi_char16_t *name)
+{
+   return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1;
+}
 
 static efi_status_t
 efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor,
@@ -567,11 +601,11 @@ efi_thunk_get_variable(efi_char16_t *name, efi_guid_t 
*vendor,
u32 phys_name, phys_vendor, phys_attr;
u32 phys_data_size, phys_data;
 
-   phys_data_size = virt_to_phys(data_size);
-   phys_vendor = virt_to_phys(vendor);
-   phys_name = virt_to_phys(name);
-   phys_attr = virt_to_phys(attr);
-   phys_data = virt_to_phys(data);
+   phys_data_size = virt_to_phys_or_null(data_size);
+   phys_vendor = virt_to_phys_o

[PATCH 1/9] efi/libstub: Fix allocation size calculations

2016-11-12 Thread Matt Fleming
From: Roy Franz <roy.fr...@hpe.com>

Adjust the size used in calculations to match the actual size of allocation
that will be performed based on EFI size/alignment constraints.
efi_high_alloc() and efi_low_alloc() use the passed size in bytes directly
to find space in the memory map for the allocation, rather than the actual
allocation size that has been adjusted for size and alignment constraints.
This results in failed allocations and retries in efi_high_alloc().  The
same error is present in efi_low_alloc(), although failure will only happen
if the lowest memory block is small.
Also use EFI_PAGE_SIZE consistently and remove use of EFI_PAGE_SHIFT to
calculate page size.

Signed-off-by: Roy Franz <roy.fr...@hpe.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
---
 drivers/firmware/efi/libstub/efi-stub-helper.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c 
b/drivers/firmware/efi/libstub/efi-stub-helper.c
index aded10662020..4b74bf86c74d 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -186,14 +186,16 @@ efi_status_t efi_high_alloc(efi_system_table_t 
*sys_table_arg,
goto fail;
 
/*
-* Enforce minimum alignment that EFI requires when requesting
-* a specific address.  We are doing page-based allocations,
-* so we must be aligned to a page.
+* Enforce minimum alignment that EFI or Linux requires when
+* requesting a specific address.  We are doing page-based (or
+* larger) allocations, and both the address and size must meet
+* alignment constraints.
 */
if (align < EFI_ALLOC_ALIGN)
align = EFI_ALLOC_ALIGN;
 
-   nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE;
+   size = round_up(size, EFI_ALLOC_ALIGN);
+   nr_pages = size / EFI_PAGE_SIZE;
 again:
for (i = 0; i < map_size / desc_size; i++) {
efi_memory_desc_t *desc;
@@ -208,7 +210,7 @@ efi_status_t efi_high_alloc(efi_system_table_t 
*sys_table_arg,
continue;
 
start = desc->phys_addr;
-   end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT);
+   end = start + desc->num_pages * EFI_PAGE_SIZE;
 
if (end > max)
end = max;
@@ -278,14 +280,16 @@ efi_status_t efi_low_alloc(efi_system_table_t 
*sys_table_arg,
goto fail;
 
/*
-* Enforce minimum alignment that EFI requires when requesting
-* a specific address.  We are doing page-based allocations,
-* so we must be aligned to a page.
+* Enforce minimum alignment that EFI or Linux requires when
+* requesting a specific address.  We are doing page-based (or
+* larger) allocations, and both the address and size must meet
+* alignment constraints.
 */
if (align < EFI_ALLOC_ALIGN)
align = EFI_ALLOC_ALIGN;
 
-   nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE;
+   size = round_up(size, EFI_ALLOC_ALIGN);
+   nr_pages = size / EFI_PAGE_SIZE;
for (i = 0; i < map_size / desc_size; i++) {
efi_memory_desc_t *desc;
unsigned long m = (unsigned long)map;
@@ -300,7 +304,7 @@ efi_status_t efi_low_alloc(efi_system_table_t 
*sys_table_arg,
continue;
 
start = desc->phys_addr;
-   end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT);
+   end = start + desc->num_pages * EFI_PAGE_SIZE;
 
/*
 * Don't allocate at 0x0. It will confuse code that
-- 
2.10.0



[PATCH 1/9] efi/libstub: Fix allocation size calculations

2016-11-12 Thread Matt Fleming
From: Roy Franz 

Adjust the size used in calculations to match the actual size of allocation
that will be performed based on EFI size/alignment constraints.
efi_high_alloc() and efi_low_alloc() use the passed size in bytes directly
to find space in the memory map for the allocation, rather than the actual
allocation size that has been adjusted for size and alignment constraints.
This results in failed allocations and retries in efi_high_alloc().  The
same error is present in efi_low_alloc(), although failure will only happen
if the lowest memory block is small.
Also use EFI_PAGE_SIZE consistently and remove use of EFI_PAGE_SHIFT to
calculate page size.

Signed-off-by: Roy Franz 
Signed-off-by: Ard Biesheuvel 
Cc: Matt Fleming 
---
 drivers/firmware/efi/libstub/efi-stub-helper.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c 
b/drivers/firmware/efi/libstub/efi-stub-helper.c
index aded10662020..4b74bf86c74d 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -186,14 +186,16 @@ efi_status_t efi_high_alloc(efi_system_table_t 
*sys_table_arg,
goto fail;
 
/*
-* Enforce minimum alignment that EFI requires when requesting
-* a specific address.  We are doing page-based allocations,
-* so we must be aligned to a page.
+* Enforce minimum alignment that EFI or Linux requires when
+* requesting a specific address.  We are doing page-based (or
+* larger) allocations, and both the address and size must meet
+* alignment constraints.
 */
if (align < EFI_ALLOC_ALIGN)
align = EFI_ALLOC_ALIGN;
 
-   nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE;
+   size = round_up(size, EFI_ALLOC_ALIGN);
+   nr_pages = size / EFI_PAGE_SIZE;
 again:
for (i = 0; i < map_size / desc_size; i++) {
efi_memory_desc_t *desc;
@@ -208,7 +210,7 @@ efi_status_t efi_high_alloc(efi_system_table_t 
*sys_table_arg,
continue;
 
start = desc->phys_addr;
-   end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT);
+   end = start + desc->num_pages * EFI_PAGE_SIZE;
 
if (end > max)
end = max;
@@ -278,14 +280,16 @@ efi_status_t efi_low_alloc(efi_system_table_t 
*sys_table_arg,
goto fail;
 
/*
-* Enforce minimum alignment that EFI requires when requesting
-* a specific address.  We are doing page-based allocations,
-* so we must be aligned to a page.
+* Enforce minimum alignment that EFI or Linux requires when
+* requesting a specific address.  We are doing page-based (or
+* larger) allocations, and both the address and size must meet
+* alignment constraints.
 */
if (align < EFI_ALLOC_ALIGN)
align = EFI_ALLOC_ALIGN;
 
-   nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE;
+   size = round_up(size, EFI_ALLOC_ALIGN);
+   nr_pages = size / EFI_PAGE_SIZE;
for (i = 0; i < map_size / desc_size; i++) {
efi_memory_desc_t *desc;
unsigned long m = (unsigned long)map;
@@ -300,7 +304,7 @@ efi_status_t efi_low_alloc(efi_system_table_t 
*sys_table_arg,
continue;
 
start = desc->phys_addr;
-   end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT);
+   end = start + desc->num_pages * EFI_PAGE_SIZE;
 
/*
 * Don't allocate at 0x0. It will confuse code that
-- 
2.10.0



[PATCH 6/9] efi: Add device path parser

2016-11-12 Thread Matt Fleming
From: Lukas Wunner <lu...@wunner.de>

We're about to extended the efistub to retrieve device properties from
EFI on Apple Macs. The properties use EFI Device Paths to indicate the
device they belong to. This commit adds a parser which, given an EFI
Device Path, locates the corresponding struct device and returns a
reference to it.

Initially only ACPI and PCI Device Path nodes are supported, these are
the only types needed for Apple device properties (the corresponding
macOS function AppleACPIPlatformExpert::matchEFIDevicePath() does not
support any others). Further node types can be added with little to
moderate effort.

Apple device properties is currently the only use case of this parser,
but Peter Jones intends to use it to match up devices with the
ConInDev/ConOutDev/ErrOutDev variables and add sysfs attributes to these
devices to say the hardware supports using them as console. Thus,
make this parser a separate component which can be selected with config
option EFI_DEV_PATH_PARSER. It can in principle be compiled as a module
if acpi_get_first_physical_node() and acpi_bus_type are exported (and
efi_get_device_by_path() itself is exported).

The dependency on CONFIG_ACPI is needed for acpi_match_device_ids().
It can be removed if an empty inline stub is added for that function.

Signed-off-by: Lukas Wunner <lu...@wunner.de>
Cc: Peter Jones <pjo...@redhat.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Andreas Noever <andreas.noe...@gmail.com>
Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
---
 drivers/firmware/efi/Kconfig   |   5 +
 drivers/firmware/efi/Makefile  |   1 +
 drivers/firmware/efi/dev-path-parser.c | 203 +
 include/linux/efi.h|  20 
 4 files changed, 229 insertions(+)
 create mode 100644 drivers/firmware/efi/dev-path-parser.c

diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index c981be17d3c0..893fda48fcdd 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -133,3 +133,8 @@ endmenu
 
 config UEFI_CPER
bool
+
+config EFI_DEV_PATH_PARSER
+   bool
+   depends on ACPI
+   default n
diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
index c8a439f6d715..3e91ae31f9d1 100644
--- a/drivers/firmware/efi/Makefile
+++ b/drivers/firmware/efi/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_EFI_STUB)+= libstub/
 obj-$(CONFIG_EFI_FAKE_MEMMAP)  += fake_mem.o
 obj-$(CONFIG_EFI_BOOTLOADER_CONTROL)   += efibc.o
 obj-$(CONFIG_EFI_TEST) += test/
+obj-$(CONFIG_EFI_DEV_PATH_PARSER)  += dev-path-parser.o
 
 arm-obj-$(CONFIG_EFI)  := arm-init.o arm-runtime.o
 obj-$(CONFIG_ARM)  += $(arm-obj-y)
diff --git a/drivers/firmware/efi/dev-path-parser.c 
b/drivers/firmware/efi/dev-path-parser.c
new file mode 100644
index ..85d1834ee9b7
--- /dev/null
+++ b/drivers/firmware/efi/dev-path-parser.c
@@ -0,0 +1,203 @@
+/*
+ * dev-path-parser.c - EFI Device Path parser
+ * Copyright (C) 2016 Lukas Wunner <lu...@wunner.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2) as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+
+struct acpi_hid_uid {
+   struct acpi_device_id hid[2];
+   char uid[11]; /* UINT_MAX + null byte */
+};
+
+static int __init match_acpi_dev(struct device *dev, void *data)
+{
+   struct acpi_hid_uid hid_uid = *(struct acpi_hid_uid *)data;
+   struct acpi_device *adev = to_acpi_device(dev);
+
+   if (acpi_match_device_ids(adev, hid_uid.hid))
+   return 0;
+
+   if (adev->pnp.unique_id)
+   return !strcmp(adev->pnp.unique_id, hid_uid.uid);
+   else
+   return !strcmp("0", hid_uid.uid);
+}
+
+static long __init parse_acpi_path(struct efi_dev_path *node,
+  struct device *parent, struct device **child)
+{
+   struct acpi_hid_uid hid_uid = {};
+   struct device *phys_dev;
+
+   if (node->length != 12)
+   return -EINVAL;
+
+   sprintf(hid_uid.hid[0].id, "%c%c%c%04X",
+   'A' + ((node->acpi.hid >> 10) & 0x1f) - 1,
+   'A' + ((node->acpi.hid >>  5) & 0x1f) - 1,
+   'A' + ((node->acpi.hid >>  0) & 0x1f) - 1,
+   node->acpi.h

[PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table

2016-11-12 Thread Matt Fleming
From: Ard Biesheuvel <ard.biesheu...@linaro.org>

Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and
install the Linux-specific RNG seed UEFI config table. This will be
picked up by the EFI routines in the core kernel to seed the kernel
entropy pool.

Cc: Matt Fleming <m...@codeblueprint.co.uk>
Reviewed-by: Kees Cook <keesc...@chromium.org>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 drivers/firmware/efi/libstub/arm-stub.c |  2 ++
 drivers/firmware/efi/libstub/efistub.h  |  2 ++
 drivers/firmware/efi/libstub/random.c   | 48 +
 include/linux/efi.h |  1 +
 4 files changed, 53 insertions(+)

diff --git a/drivers/firmware/efi/libstub/arm-stub.c 
b/drivers/firmware/efi/libstub/arm-stub.c
index 993aa56755f6..b4f7d78f9e8b 100644
--- a/drivers/firmware/efi/libstub/arm-stub.c
+++ b/drivers/firmware/efi/libstub/arm-stub.c
@@ -340,6 +340,8 @@ unsigned long efi_entry(void *handle, efi_system_table_t 
*sys_table,
if (status != EFI_SUCCESS)
pr_efi_err(sys_table, "Failed initrd from command line!\n");
 
+   efi_random_get_seed(sys_table);
+
new_fdt_addr = fdt_addr;
status = allocate_new_fdt_and_exit_boot(sys_table, handle,
_fdt_addr, dram_base + MAX_FDT_OFFSET,
diff --git a/drivers/firmware/efi/libstub/efistub.h 
b/drivers/firmware/efi/libstub/efistub.h
index fe1f22584c69..b98824e3800a 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -71,4 +71,6 @@ efi_status_t efi_random_alloc(efi_system_table_t 
*sys_table_arg,
 
 efi_status_t check_platform_features(efi_system_table_t *sys_table_arg);
 
+efi_status_t efi_random_get_seed(efi_system_table_t *sys_table_arg);
+
 #endif
diff --git a/drivers/firmware/efi/libstub/random.c 
b/drivers/firmware/efi/libstub/random.c
index f8e2e5ae6872..3a3feacc329f 100644
--- a/drivers/firmware/efi/libstub/random.c
+++ b/drivers/firmware/efi/libstub/random.c
@@ -143,3 +143,51 @@ efi_status_t efi_random_alloc(efi_system_table_t 
*sys_table_arg,
 
return status;
 }
+
+#define RANDOM_SEED_SIZE   32
+
+efi_status_t efi_random_get_seed(efi_system_table_t *sys_table_arg)
+{
+   efi_guid_t rng_proto = EFI_RNG_PROTOCOL_GUID;
+   efi_guid_t rng_algo_raw = EFI_RNG_ALGORITHM_RAW;
+   efi_guid_t rng_table_guid = LINUX_EFI_RANDOM_SEED_TABLE_GUID;
+   struct efi_rng_protocol *rng;
+   struct linux_efi_random_seed *seed;
+   efi_status_t status;
+
+   status = efi_call_early(locate_protocol, _proto, NULL,
+   (void **));
+   if (status != EFI_SUCCESS)
+   return status;
+
+   status = efi_call_early(allocate_pool, EFI_RUNTIME_SERVICES_DATA,
+   sizeof(*seed) + RANDOM_SEED_SIZE,
+   (void **));
+   if (status != EFI_SUCCESS)
+   return status;
+
+   status = rng->get_rng(rng, _algo_raw, RANDOM_SEED_SIZE,
+ seed->bits);
+   if (status == EFI_UNSUPPORTED)
+   /*
+* Use whatever algorithm we have available if the raw algorithm
+* is not implemented.
+*/
+   status = rng->get_rng(rng, NULL, RANDOM_SEED_SIZE,
+ seed->bits);
+
+   if (status != EFI_SUCCESS)
+   goto err_freepool;
+
+   seed->size = RANDOM_SEED_SIZE;
+   status = efi_call_early(install_configuration_table, _table_guid,
+   seed);
+   if (status != EFI_SUCCESS)
+   goto err_freepool;
+
+   return EFI_SUCCESS;
+
+err_freepool:
+   efi_call_early(free_pool, seed);
+   return status;
+}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 85e28b138cdd..f5a821d9b90c 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -589,6 +589,7 @@ void efi_native_runtime_setup(void);
 #define DEVICE_TREE_GUID   EFI_GUID(0xb1b621d5, 0xf19c, 
0x41a5,  0x83, 0x0b, 0xd9, 0x15, 0x2c, 0x69, 0xaa, 0xe0)
 #define EFI_PROPERTIES_TABLE_GUID  EFI_GUID(0x880aaca3, 0x4adc, 
0x4a04,  0x90, 0x79, 0xb7, 0x47, 0x34, 0x08, 0x25, 0xe5)
 #define EFI_RNG_PROTOCOL_GUID  EFI_GUID(0x3152bca5, 0xeade, 
0x433d,  0x86, 0x2e, 0xc0, 0x1c, 0xdc, 0x29, 0x1f, 0x44)
+#define EFI_RNG_ALGORITHM_RAW  EFI_GUID(0xe43176d7, 0xb6e8, 
0x4827,  0xb7, 0x84, 0x7f, 0xfd, 0xc4, 0xb6, 0x85, 0x61)
 #define EFI_MEMORY_ATTRIBUTES_TABLE_GUID   EFI_GUID(0xdcfa911d, 0x26eb, 
0x469f,  0xa2, 0x20, 0x38, 0xb7, 0xdc, 0x46, 0x12, 0x20)
 #define EFI_CONSOLE_OUT_DEVICE_GUIDEFI_GUID(0xd3b36f2c, 0xd551, 
0x11d4,  0x9a, 0x46, 0x00, 0x90, 0x27, 0x3f, 0xc1, 0x4d)
 
-- 
2.10.0



[PATCH 2/9] MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem

2016-11-12 Thread Matt Fleming
From: Ard Biesheuvel <ard.biesheu...@linaro.org>

Since I will be co-maintaining the EFI subsystem, it makes sense to
mention the ARM and arm64 EFI bits in the EFI section in MAINTAINERS
so that Matt, the list and I get cc'ed on proposed changes.

Acked-by: Will Deacon <will.dea...@arm.com>
Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 MAINTAINERS | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6847ba844ef9..1f38999d8ce7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4625,12 +4625,14 @@ L:  linux-...@vger.kernel.org
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
 S: Maintained
 F: Documentation/efi-stub.txt
-F: arch/ia64/kernel/efi.c
+F: arch/*/kernel/efi.c
 F: arch/x86/boot/compressed/eboot.[ch]
-F: arch/x86/include/asm/efi.h
+F: arch/*/include/asm/efi.h
 F: arch/x86/platform/efi/
 F: drivers/firmware/efi/
 F: include/linux/efi*.h
+F: arch/arm/boot/compressed/efi-header.S
+F: arch/arm64/kernel/efi-entry.S
 
 EFI VARIABLE FILESYSTEM
 M: Matthew Garrett <matthew.garr...@nebula.com>
-- 
2.10.0



[PATCH 4/9] efi/libstub: Add random.c to ARM build

2016-11-12 Thread Matt Fleming
From: Ard Biesheuvel <ard.biesheu...@linaro.org>

Make random.c build for ARM by moving the fallback definition of
EFI_ALLOC_ALIGN to efistub.h, and replacing a division by a value
we know to be a power of 2 with a right shift (this is required since
ARM does not have any integer division helper routines in its decompressor)

Cc: Matt Fleming <m...@codeblueprint.co.uk>
Reviewed-by: Kees Cook <keesc...@chromium.org>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 drivers/firmware/efi/libstub/Makefile  | 4 ++--
 drivers/firmware/efi/libstub/efi-stub-helper.c | 9 -
 drivers/firmware/efi/libstub/efistub.h | 9 +
 drivers/firmware/efi/libstub/random.c  | 8 +---
 4 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/firmware/efi/libstub/Makefile 
b/drivers/firmware/efi/libstub/Makefile
index c06945160a41..40ddf8f763a8 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -36,11 +36,11 @@ arm-deps := fdt_rw.c fdt_ro.c fdt_wip.c fdt.c 
fdt_empty_tree.c fdt_sw.c sort.c
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
$(call if_changed_rule,cc_o_c)
 
-lib-$(CONFIG_EFI_ARMSTUB)  += arm-stub.o fdt.o string.o \
+lib-$(CONFIG_EFI_ARMSTUB)  += arm-stub.o fdt.o string.o random.o \
   $(patsubst %.c,lib-%.o,$(arm-deps))
 
 lib-$(CONFIG_ARM)  += arm32-stub.o
-lib-$(CONFIG_ARM64)+= arm64-stub.o random.o
+lib-$(CONFIG_ARM64)+= arm64-stub.o
 CFLAGS_arm64-stub.o:= -DTEXT_OFFSET=$(TEXT_OFFSET)
 
 #
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c 
b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 4b74bf86c74d..757badc1debb 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -32,15 +32,6 @@
 
 static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE;
 
-/*
- * Allow the platform to override the allocation granularity: this allows
- * systems that have the capability to run with a larger page size to deal
- * with the allocations for initrd and fdt more efficiently.
- */
-#ifndef EFI_ALLOC_ALIGN
-#define EFI_ALLOC_ALIGNEFI_PAGE_SIZE
-#endif
-
 #define EFI_MMAP_NR_SLACK_SLOTS8
 
 struct file_info {
diff --git a/drivers/firmware/efi/libstub/efistub.h 
b/drivers/firmware/efi/libstub/efistub.h
index ee49cd23ee63..fe1f22584c69 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -15,6 +15,15 @@
  */
 #undef __init
 
+/*
+ * Allow the platform to override the allocation granularity: this allows
+ * systems that have the capability to run with a larger page size to deal
+ * with the allocations for initrd and fdt more efficiently.
+ */
+#ifndef EFI_ALLOC_ALIGN
+#define EFI_ALLOC_ALIGNEFI_PAGE_SIZE
+#endif
+
 void efi_char16_printk(efi_system_table_t *, efi_char16_t *);
 
 efi_status_t efi_open_volume(efi_system_table_t *sys_table_arg, void *__image,
diff --git a/drivers/firmware/efi/libstub/random.c 
b/drivers/firmware/efi/libstub/random.c
index 0c9f58c5ba50..f8e2e5ae6872 100644
--- a/drivers/firmware/efi/libstub/random.c
+++ b/drivers/firmware/efi/libstub/random.c
@@ -8,6 +8,7 @@
  */
 
 #include 
+#include 
 #include 
 
 #include "efistub.h"
@@ -41,8 +42,9 @@ efi_status_t efi_get_random_bytes(efi_system_table_t 
*sys_table_arg,
  */
 static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
 unsigned long size,
-unsigned long align)
+unsigned long align_shift)
 {
+   unsigned long align = 1UL << align_shift;
u64 start, end;
 
if (md->type != EFI_CONVENTIONAL_MEMORY)
@@ -55,7 +57,7 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t 
*md,
if (start > end)
return 0;
 
-   return (end - start + 1) / align;
+   return (end - start + 1) >> align_shift;
 }
 
 /*
@@ -98,7 +100,7 @@ efi_status_t efi_random_alloc(efi_system_table_t 
*sys_table_arg,
efi_memory_desc_t *md = (void *)memory_map + map_offset;
unsigned long slots;
 
-   slots = get_entry_num_slots(md, size, align);
+   slots = get_entry_num_slots(md, size, ilog2(align));
MD_NUM_SLOTS(md) = slots;
total_slots += slots;
}
-- 
2.10.0



[PATCH 9/9] thunderbolt: Use Device ROM retrieved from EFI

2016-11-12 Thread Matt Fleming
From: Lukas Wunner <lu...@wunner.de>

Macs with Thunderbolt 1 do not have a unit-specific DROM: The DROM is
empty with uid 0x1. (Apple started factory-burning a unit-
specific DROM with Thunderbolt 2.)

Instead, the NHI EFI driver supplies a DROM in a device property. Use
it if available. It's only available when booting with the efistub.
If it's not available, silently fall back to our hardcoded DROM.

The size of the DROM is always 256 bytes. The number is hardcoded into
the NHI EFI driver. This commit can deal with an arbitrary size however,
just in case they ever change that.

Background information: The EFI firmware volume contains ROM files for
the NHI, GMUX and several other chips as well as key material. This
strategy allows Apple to deploy ROM or key updates by simply publishing
an EFI firmware update on their website. Drivers do not access those
files directly but rather through a file server via EFI protocol
AC5E4829-A8FD-440B-AF33-9FFE013B12D8. Files are identified by GUID, the
NHI DROM has 339370BD-CFC6-4454-8EF7-704653120818.

The NHI EFI driver amends that file with a unit-specific uid. The uid
has 64 bit but its entropy is much lower: 24 bit represent the model,
24 bit are taken from a serial number, 16 bit are fixed. The NHI EFI
driver obtains the serial number via the DataHub protocol, copies it
into the DROM, calculates the CRC and submits the result as a device
property.

A modification is needed in the resume code where we currently read the
uid of all switches in the hierarchy to detect plug events that occurred
during sleep. On Thunderbolt 1 root switches this will now lead to a
mismatch between the uid of the empty DROM and the EFI DROM. Exempt the
root switch from this check: It's built in, so the uid should never
change. However we continue to *read* the uid of the root switch, this
seems like a good way to test its reachability after resume.

Signed-off-by: Lukas Wunner <lu...@wunner.de>
Tested-by: Lukas Wunner <lu...@wunner.de> [MacBookPro9,1]
Tested-by: Pierre Moreau <pierre.mor...@free.fr> [MacBookPro11,3]
Acked-by: Andreas Noever <andreas.noe...@gmail.com>
Cc: Pedro Vilaça <rever...@put.as>
Cc: Peter Jones <pjo...@redhat.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
---
 drivers/thunderbolt/Kconfig  |  1 +
 drivers/thunderbolt/eeprom.c | 43 +++
 drivers/thunderbolt/switch.c |  2 +-
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/thunderbolt/Kconfig b/drivers/thunderbolt/Kconfig
index c121acc15bfe..0056df7f3c09 100644
--- a/drivers/thunderbolt/Kconfig
+++ b/drivers/thunderbolt/Kconfig
@@ -1,6 +1,7 @@
 menuconfig THUNDERBOLT
tristate "Thunderbolt support for Apple devices"
depends on PCI
+   select APPLE_PROPERTIES
select CRC32
help
  Cactus Ridge Thunderbolt Controller driver
diff --git a/drivers/thunderbolt/eeprom.c b/drivers/thunderbolt/eeprom.c
index 2b9602c2c355..6392990c984d 100644
--- a/drivers/thunderbolt/eeprom.c
+++ b/drivers/thunderbolt/eeprom.c
@@ -5,6 +5,7 @@
  */
 
 #include 
+#include 
 #include 
 #include "tb.h"
 
@@ -360,6 +361,40 @@ static int tb_drom_parse_entries(struct tb_switch *sw)
 }
 
 /**
+ * tb_drom_copy_efi - copy drom supplied by EFI to sw->drom if present
+ */
+static int tb_drom_copy_efi(struct tb_switch *sw, u16 *size)
+{
+   struct device *dev = >tb->nhi->pdev->dev;
+   int len, res;
+
+   len = device_property_read_u8_array(dev, "ThunderboltDROM", NULL, 0);
+   if (len < 0 || len < sizeof(struct tb_drom_header))
+   return -EINVAL;
+
+   sw->drom = kmalloc(len, GFP_KERNEL);
+   if (!sw->drom)
+   return -ENOMEM;
+
+   res = device_property_read_u8_array(dev, "ThunderboltDROM", sw->drom,
+   len);
+   if (res)
+   goto err;
+
+   *size = ((struct tb_drom_header *)sw->drom)->data_len +
+ TB_DROM_DATA_START;
+   if (*size > len)
+   goto err;
+
+   return 0;
+
+err:
+   kfree(sw->drom);
+   sw->drom = NULL;
+   return -EINVAL;
+}
+
+/**
  * tb_drom_read - copy drom to sw->drom and parse it
  */
 int tb_drom_read(struct tb_switch *sw)
@@ -374,6 +409,13 @@ int tb_drom_read(struct tb_switch *sw)
 
if (tb_route(sw) == 0) {
/*
+* Apple's NHI EFI driver supplies a DROM for the root switch
+* in a device property. Use it if available.
+*/
+   if (tb_drom_copy_efi(sw, ) == 0)
+   goto parse;
+
+   /*
 * The root switch contains only a dummy drom (header only,
 

[PATCH 6/9] efi: Add device path parser

2016-11-12 Thread Matt Fleming
From: Lukas Wunner 

We're about to extended the efistub to retrieve device properties from
EFI on Apple Macs. The properties use EFI Device Paths to indicate the
device they belong to. This commit adds a parser which, given an EFI
Device Path, locates the corresponding struct device and returns a
reference to it.

Initially only ACPI and PCI Device Path nodes are supported, these are
the only types needed for Apple device properties (the corresponding
macOS function AppleACPIPlatformExpert::matchEFIDevicePath() does not
support any others). Further node types can be added with little to
moderate effort.

Apple device properties is currently the only use case of this parser,
but Peter Jones intends to use it to match up devices with the
ConInDev/ConOutDev/ErrOutDev variables and add sysfs attributes to these
devices to say the hardware supports using them as console. Thus,
make this parser a separate component which can be selected with config
option EFI_DEV_PATH_PARSER. It can in principle be compiled as a module
if acpi_get_first_physical_node() and acpi_bus_type are exported (and
efi_get_device_by_path() itself is exported).

The dependency on CONFIG_ACPI is needed for acpi_match_device_ids().
It can be removed if an empty inline stub is added for that function.

Signed-off-by: Lukas Wunner 
Cc: Peter Jones 
Cc: Ard Biesheuvel 
Cc: Andreas Noever 
Signed-off-by: Matt Fleming 
---
 drivers/firmware/efi/Kconfig   |   5 +
 drivers/firmware/efi/Makefile  |   1 +
 drivers/firmware/efi/dev-path-parser.c | 203 +
 include/linux/efi.h|  20 
 4 files changed, 229 insertions(+)
 create mode 100644 drivers/firmware/efi/dev-path-parser.c

diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index c981be17d3c0..893fda48fcdd 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -133,3 +133,8 @@ endmenu
 
 config UEFI_CPER
bool
+
+config EFI_DEV_PATH_PARSER
+   bool
+   depends on ACPI
+   default n
diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
index c8a439f6d715..3e91ae31f9d1 100644
--- a/drivers/firmware/efi/Makefile
+++ b/drivers/firmware/efi/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_EFI_STUB)+= libstub/
 obj-$(CONFIG_EFI_FAKE_MEMMAP)  += fake_mem.o
 obj-$(CONFIG_EFI_BOOTLOADER_CONTROL)   += efibc.o
 obj-$(CONFIG_EFI_TEST) += test/
+obj-$(CONFIG_EFI_DEV_PATH_PARSER)  += dev-path-parser.o
 
 arm-obj-$(CONFIG_EFI)  := arm-init.o arm-runtime.o
 obj-$(CONFIG_ARM)  += $(arm-obj-y)
diff --git a/drivers/firmware/efi/dev-path-parser.c 
b/drivers/firmware/efi/dev-path-parser.c
new file mode 100644
index ..85d1834ee9b7
--- /dev/null
+++ b/drivers/firmware/efi/dev-path-parser.c
@@ -0,0 +1,203 @@
+/*
+ * dev-path-parser.c - EFI Device Path parser
+ * Copyright (C) 2016 Lukas Wunner 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2) as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+
+struct acpi_hid_uid {
+   struct acpi_device_id hid[2];
+   char uid[11]; /* UINT_MAX + null byte */
+};
+
+static int __init match_acpi_dev(struct device *dev, void *data)
+{
+   struct acpi_hid_uid hid_uid = *(struct acpi_hid_uid *)data;
+   struct acpi_device *adev = to_acpi_device(dev);
+
+   if (acpi_match_device_ids(adev, hid_uid.hid))
+   return 0;
+
+   if (adev->pnp.unique_id)
+   return !strcmp(adev->pnp.unique_id, hid_uid.uid);
+   else
+   return !strcmp("0", hid_uid.uid);
+}
+
+static long __init parse_acpi_path(struct efi_dev_path *node,
+  struct device *parent, struct device **child)
+{
+   struct acpi_hid_uid hid_uid = {};
+   struct device *phys_dev;
+
+   if (node->length != 12)
+   return -EINVAL;
+
+   sprintf(hid_uid.hid[0].id, "%c%c%c%04X",
+   'A' + ((node->acpi.hid >> 10) & 0x1f) - 1,
+   'A' + ((node->acpi.hid >>  5) & 0x1f) - 1,
+   'A' + ((node->acpi.hid >>  0) & 0x1f) - 1,
+   node->acpi.hid >> 16);
+   sprintf(hid_uid.uid, "%u", node->acpi.uid);
+
+   *child = bus_find_device(_bus_type, NULL, _uid,
+

[PATCH 5/9] efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table

2016-11-12 Thread Matt Fleming
From: Ard Biesheuvel 

Invoke the EFI_RNG_PROTOCOL protocol in the context of the stub and
install the Linux-specific RNG seed UEFI config table. This will be
picked up by the EFI routines in the core kernel to seed the kernel
entropy pool.

Cc: Matt Fleming 
Reviewed-by: Kees Cook 
Signed-off-by: Ard Biesheuvel 
---
 drivers/firmware/efi/libstub/arm-stub.c |  2 ++
 drivers/firmware/efi/libstub/efistub.h  |  2 ++
 drivers/firmware/efi/libstub/random.c   | 48 +
 include/linux/efi.h |  1 +
 4 files changed, 53 insertions(+)

diff --git a/drivers/firmware/efi/libstub/arm-stub.c 
b/drivers/firmware/efi/libstub/arm-stub.c
index 993aa56755f6..b4f7d78f9e8b 100644
--- a/drivers/firmware/efi/libstub/arm-stub.c
+++ b/drivers/firmware/efi/libstub/arm-stub.c
@@ -340,6 +340,8 @@ unsigned long efi_entry(void *handle, efi_system_table_t 
*sys_table,
if (status != EFI_SUCCESS)
pr_efi_err(sys_table, "Failed initrd from command line!\n");
 
+   efi_random_get_seed(sys_table);
+
new_fdt_addr = fdt_addr;
status = allocate_new_fdt_and_exit_boot(sys_table, handle,
_fdt_addr, dram_base + MAX_FDT_OFFSET,
diff --git a/drivers/firmware/efi/libstub/efistub.h 
b/drivers/firmware/efi/libstub/efistub.h
index fe1f22584c69..b98824e3800a 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -71,4 +71,6 @@ efi_status_t efi_random_alloc(efi_system_table_t 
*sys_table_arg,
 
 efi_status_t check_platform_features(efi_system_table_t *sys_table_arg);
 
+efi_status_t efi_random_get_seed(efi_system_table_t *sys_table_arg);
+
 #endif
diff --git a/drivers/firmware/efi/libstub/random.c 
b/drivers/firmware/efi/libstub/random.c
index f8e2e5ae6872..3a3feacc329f 100644
--- a/drivers/firmware/efi/libstub/random.c
+++ b/drivers/firmware/efi/libstub/random.c
@@ -143,3 +143,51 @@ efi_status_t efi_random_alloc(efi_system_table_t 
*sys_table_arg,
 
return status;
 }
+
+#define RANDOM_SEED_SIZE   32
+
+efi_status_t efi_random_get_seed(efi_system_table_t *sys_table_arg)
+{
+   efi_guid_t rng_proto = EFI_RNG_PROTOCOL_GUID;
+   efi_guid_t rng_algo_raw = EFI_RNG_ALGORITHM_RAW;
+   efi_guid_t rng_table_guid = LINUX_EFI_RANDOM_SEED_TABLE_GUID;
+   struct efi_rng_protocol *rng;
+   struct linux_efi_random_seed *seed;
+   efi_status_t status;
+
+   status = efi_call_early(locate_protocol, _proto, NULL,
+   (void **));
+   if (status != EFI_SUCCESS)
+   return status;
+
+   status = efi_call_early(allocate_pool, EFI_RUNTIME_SERVICES_DATA,
+   sizeof(*seed) + RANDOM_SEED_SIZE,
+   (void **));
+   if (status != EFI_SUCCESS)
+   return status;
+
+   status = rng->get_rng(rng, _algo_raw, RANDOM_SEED_SIZE,
+ seed->bits);
+   if (status == EFI_UNSUPPORTED)
+   /*
+* Use whatever algorithm we have available if the raw algorithm
+* is not implemented.
+*/
+   status = rng->get_rng(rng, NULL, RANDOM_SEED_SIZE,
+ seed->bits);
+
+   if (status != EFI_SUCCESS)
+   goto err_freepool;
+
+   seed->size = RANDOM_SEED_SIZE;
+   status = efi_call_early(install_configuration_table, _table_guid,
+   seed);
+   if (status != EFI_SUCCESS)
+   goto err_freepool;
+
+   return EFI_SUCCESS;
+
+err_freepool:
+   efi_call_early(free_pool, seed);
+   return status;
+}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 85e28b138cdd..f5a821d9b90c 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -589,6 +589,7 @@ void efi_native_runtime_setup(void);
 #define DEVICE_TREE_GUID   EFI_GUID(0xb1b621d5, 0xf19c, 
0x41a5,  0x83, 0x0b, 0xd9, 0x15, 0x2c, 0x69, 0xaa, 0xe0)
 #define EFI_PROPERTIES_TABLE_GUID  EFI_GUID(0x880aaca3, 0x4adc, 
0x4a04,  0x90, 0x79, 0xb7, 0x47, 0x34, 0x08, 0x25, 0xe5)
 #define EFI_RNG_PROTOCOL_GUID  EFI_GUID(0x3152bca5, 0xeade, 
0x433d,  0x86, 0x2e, 0xc0, 0x1c, 0xdc, 0x29, 0x1f, 0x44)
+#define EFI_RNG_ALGORITHM_RAW  EFI_GUID(0xe43176d7, 0xb6e8, 
0x4827,  0xb7, 0x84, 0x7f, 0xfd, 0xc4, 0xb6, 0x85, 0x61)
 #define EFI_MEMORY_ATTRIBUTES_TABLE_GUID   EFI_GUID(0xdcfa911d, 0x26eb, 
0x469f,  0xa2, 0x20, 0x38, 0xb7, 0xdc, 0x46, 0x12, 0x20)
 #define EFI_CONSOLE_OUT_DEVICE_GUIDEFI_GUID(0xd3b36f2c, 0xd551, 
0x11d4,  0x9a, 0x46, 0x00, 0x90, 0x27, 0x3f, 0xc1, 0x4d)
 
-- 
2.10.0



[PATCH 2/9] MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem

2016-11-12 Thread Matt Fleming
From: Ard Biesheuvel 

Since I will be co-maintaining the EFI subsystem, it makes sense to
mention the ARM and arm64 EFI bits in the EFI section in MAINTAINERS
so that Matt, the list and I get cc'ed on proposed changes.

Acked-by: Will Deacon 
Acked-by: Russell King 
Cc: Matt Fleming 
Signed-off-by: Ard Biesheuvel 
---
 MAINTAINERS | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6847ba844ef9..1f38999d8ce7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4625,12 +4625,14 @@ L:  linux-...@vger.kernel.org
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
 S: Maintained
 F: Documentation/efi-stub.txt
-F: arch/ia64/kernel/efi.c
+F: arch/*/kernel/efi.c
 F: arch/x86/boot/compressed/eboot.[ch]
-F: arch/x86/include/asm/efi.h
+F: arch/*/include/asm/efi.h
 F: arch/x86/platform/efi/
 F: drivers/firmware/efi/
 F: include/linux/efi*.h
+F: arch/arm/boot/compressed/efi-header.S
+F: arch/arm64/kernel/efi-entry.S
 
 EFI VARIABLE FILESYSTEM
 M: Matthew Garrett 
-- 
2.10.0



[PATCH 4/9] efi/libstub: Add random.c to ARM build

2016-11-12 Thread Matt Fleming
From: Ard Biesheuvel 

Make random.c build for ARM by moving the fallback definition of
EFI_ALLOC_ALIGN to efistub.h, and replacing a division by a value
we know to be a power of 2 with a right shift (this is required since
ARM does not have any integer division helper routines in its decompressor)

Cc: Matt Fleming 
Reviewed-by: Kees Cook 
Signed-off-by: Ard Biesheuvel 
---
 drivers/firmware/efi/libstub/Makefile  | 4 ++--
 drivers/firmware/efi/libstub/efi-stub-helper.c | 9 -
 drivers/firmware/efi/libstub/efistub.h | 9 +
 drivers/firmware/efi/libstub/random.c  | 8 +---
 4 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/firmware/efi/libstub/Makefile 
b/drivers/firmware/efi/libstub/Makefile
index c06945160a41..40ddf8f763a8 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -36,11 +36,11 @@ arm-deps := fdt_rw.c fdt_ro.c fdt_wip.c fdt.c 
fdt_empty_tree.c fdt_sw.c sort.c
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
$(call if_changed_rule,cc_o_c)
 
-lib-$(CONFIG_EFI_ARMSTUB)  += arm-stub.o fdt.o string.o \
+lib-$(CONFIG_EFI_ARMSTUB)  += arm-stub.o fdt.o string.o random.o \
   $(patsubst %.c,lib-%.o,$(arm-deps))
 
 lib-$(CONFIG_ARM)  += arm32-stub.o
-lib-$(CONFIG_ARM64)+= arm64-stub.o random.o
+lib-$(CONFIG_ARM64)+= arm64-stub.o
 CFLAGS_arm64-stub.o:= -DTEXT_OFFSET=$(TEXT_OFFSET)
 
 #
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c 
b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 4b74bf86c74d..757badc1debb 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -32,15 +32,6 @@
 
 static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE;
 
-/*
- * Allow the platform to override the allocation granularity: this allows
- * systems that have the capability to run with a larger page size to deal
- * with the allocations for initrd and fdt more efficiently.
- */
-#ifndef EFI_ALLOC_ALIGN
-#define EFI_ALLOC_ALIGNEFI_PAGE_SIZE
-#endif
-
 #define EFI_MMAP_NR_SLACK_SLOTS8
 
 struct file_info {
diff --git a/drivers/firmware/efi/libstub/efistub.h 
b/drivers/firmware/efi/libstub/efistub.h
index ee49cd23ee63..fe1f22584c69 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -15,6 +15,15 @@
  */
 #undef __init
 
+/*
+ * Allow the platform to override the allocation granularity: this allows
+ * systems that have the capability to run with a larger page size to deal
+ * with the allocations for initrd and fdt more efficiently.
+ */
+#ifndef EFI_ALLOC_ALIGN
+#define EFI_ALLOC_ALIGNEFI_PAGE_SIZE
+#endif
+
 void efi_char16_printk(efi_system_table_t *, efi_char16_t *);
 
 efi_status_t efi_open_volume(efi_system_table_t *sys_table_arg, void *__image,
diff --git a/drivers/firmware/efi/libstub/random.c 
b/drivers/firmware/efi/libstub/random.c
index 0c9f58c5ba50..f8e2e5ae6872 100644
--- a/drivers/firmware/efi/libstub/random.c
+++ b/drivers/firmware/efi/libstub/random.c
@@ -8,6 +8,7 @@
  */
 
 #include 
+#include 
 #include 
 
 #include "efistub.h"
@@ -41,8 +42,9 @@ efi_status_t efi_get_random_bytes(efi_system_table_t 
*sys_table_arg,
  */
 static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
 unsigned long size,
-unsigned long align)
+unsigned long align_shift)
 {
+   unsigned long align = 1UL << align_shift;
u64 start, end;
 
if (md->type != EFI_CONVENTIONAL_MEMORY)
@@ -55,7 +57,7 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t 
*md,
if (start > end)
return 0;
 
-   return (end - start + 1) / align;
+   return (end - start + 1) >> align_shift;
 }
 
 /*
@@ -98,7 +100,7 @@ efi_status_t efi_random_alloc(efi_system_table_t 
*sys_table_arg,
efi_memory_desc_t *md = (void *)memory_map + map_offset;
unsigned long slots;
 
-   slots = get_entry_num_slots(md, size, align);
+   slots = get_entry_num_slots(md, size, ilog2(align));
MD_NUM_SLOTS(md) = slots;
total_slots += slots;
}
-- 
2.10.0



[PATCH 9/9] thunderbolt: Use Device ROM retrieved from EFI

2016-11-12 Thread Matt Fleming
From: Lukas Wunner 

Macs with Thunderbolt 1 do not have a unit-specific DROM: The DROM is
empty with uid 0x1. (Apple started factory-burning a unit-
specific DROM with Thunderbolt 2.)

Instead, the NHI EFI driver supplies a DROM in a device property. Use
it if available. It's only available when booting with the efistub.
If it's not available, silently fall back to our hardcoded DROM.

The size of the DROM is always 256 bytes. The number is hardcoded into
the NHI EFI driver. This commit can deal with an arbitrary size however,
just in case they ever change that.

Background information: The EFI firmware volume contains ROM files for
the NHI, GMUX and several other chips as well as key material. This
strategy allows Apple to deploy ROM or key updates by simply publishing
an EFI firmware update on their website. Drivers do not access those
files directly but rather through a file server via EFI protocol
AC5E4829-A8FD-440B-AF33-9FFE013B12D8. Files are identified by GUID, the
NHI DROM has 339370BD-CFC6-4454-8EF7-704653120818.

The NHI EFI driver amends that file with a unit-specific uid. The uid
has 64 bit but its entropy is much lower: 24 bit represent the model,
24 bit are taken from a serial number, 16 bit are fixed. The NHI EFI
driver obtains the serial number via the DataHub protocol, copies it
into the DROM, calculates the CRC and submits the result as a device
property.

A modification is needed in the resume code where we currently read the
uid of all switches in the hierarchy to detect plug events that occurred
during sleep. On Thunderbolt 1 root switches this will now lead to a
mismatch between the uid of the empty DROM and the EFI DROM. Exempt the
root switch from this check: It's built in, so the uid should never
change. However we continue to *read* the uid of the root switch, this
seems like a good way to test its reachability after resume.

Signed-off-by: Lukas Wunner 
Tested-by: Lukas Wunner  [MacBookPro9,1]
Tested-by: Pierre Moreau  [MacBookPro11,3]
Acked-by: Andreas Noever 
Cc: Pedro Vilaça 
Cc: Peter Jones 
Cc: Ard Biesheuvel 
Signed-off-by: Matt Fleming 
---
 drivers/thunderbolt/Kconfig  |  1 +
 drivers/thunderbolt/eeprom.c | 43 +++
 drivers/thunderbolt/switch.c |  2 +-
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/thunderbolt/Kconfig b/drivers/thunderbolt/Kconfig
index c121acc15bfe..0056df7f3c09 100644
--- a/drivers/thunderbolt/Kconfig
+++ b/drivers/thunderbolt/Kconfig
@@ -1,6 +1,7 @@
 menuconfig THUNDERBOLT
tristate "Thunderbolt support for Apple devices"
depends on PCI
+   select APPLE_PROPERTIES
select CRC32
help
  Cactus Ridge Thunderbolt Controller driver
diff --git a/drivers/thunderbolt/eeprom.c b/drivers/thunderbolt/eeprom.c
index 2b9602c2c355..6392990c984d 100644
--- a/drivers/thunderbolt/eeprom.c
+++ b/drivers/thunderbolt/eeprom.c
@@ -5,6 +5,7 @@
  */
 
 #include 
+#include 
 #include 
 #include "tb.h"
 
@@ -360,6 +361,40 @@ static int tb_drom_parse_entries(struct tb_switch *sw)
 }
 
 /**
+ * tb_drom_copy_efi - copy drom supplied by EFI to sw->drom if present
+ */
+static int tb_drom_copy_efi(struct tb_switch *sw, u16 *size)
+{
+   struct device *dev = >tb->nhi->pdev->dev;
+   int len, res;
+
+   len = device_property_read_u8_array(dev, "ThunderboltDROM", NULL, 0);
+   if (len < 0 || len < sizeof(struct tb_drom_header))
+   return -EINVAL;
+
+   sw->drom = kmalloc(len, GFP_KERNEL);
+   if (!sw->drom)
+   return -ENOMEM;
+
+   res = device_property_read_u8_array(dev, "ThunderboltDROM", sw->drom,
+   len);
+   if (res)
+   goto err;
+
+   *size = ((struct tb_drom_header *)sw->drom)->data_len +
+ TB_DROM_DATA_START;
+   if (*size > len)
+   goto err;
+
+   return 0;
+
+err:
+   kfree(sw->drom);
+   sw->drom = NULL;
+   return -EINVAL;
+}
+
+/**
  * tb_drom_read - copy drom to sw->drom and parse it
  */
 int tb_drom_read(struct tb_switch *sw)
@@ -374,6 +409,13 @@ int tb_drom_read(struct tb_switch *sw)
 
if (tb_route(sw) == 0) {
/*
+* Apple's NHI EFI driver supplies a DROM for the root switch
+* in a device property. Use it if available.
+*/
+   if (tb_drom_copy_efi(sw, ) == 0)
+   goto parse;
+
+   /*
 * The root switch contains only a dummy drom (header only,
 * no entries). Hardcode the configuration here.
 */
@@ -418,6 +460,7 @@ int tb_drom_read(struct tb_switch *sw)
if (res)
goto err;
 
+parse:
header = (void *) sw->drom;
 
if (header-&g

[PATCH 8/9] x86/efi: Retrieve and assign Apple device properties

2016-11-12 Thread Matt Fleming
ould also work for
devices instantiated in a deferred fashion. It seems like this approach
would be more complicated and require more code. That doesn't seem
justified without a specific use case.

For comparison, the strategy on macOS is to assign properties to objects
in the ACPI namespace (AppleACPIPlatformExpert::mergeEFIProperties()).
That approach is definitely wrong as it fails for devices not present in
the namespace: The NHI EFI driver supplies properties for attached
Thunderbolt devices, yet on Macs with Thunderbolt 1 only one device
level behind the host controller is described in the namespace.
Consequently macOS cannot assign properties for chained devices. With
Thunderbolt 2 they started to describe three device levels behind host
controllers in the namespace but this grossly inflates the SSDT and
still fails if the user daisy-chained more than three devices.

We copy the property names and values from the setup_data payload to
swappable virtual memory and afterwards make the payload available to
the page allocator. This is just for the sake of good housekeeping, it
wouldn't occupy a meaningful amount of physical memory ( bytes on my
machine). Only the payload is freed, not the setup_data header since
otherwise we'd break the list linkage and we cannot safely update the
predecessor's ->next link because there's no locking for the list.

The payload is currently not passed on to kexec'ed kernels, same for PCI
ROMs retrieved by setup_efi_pci(). This can be added later if there is
demand by amending setup_efi_state(). The payload can then no longer be
made available to the page allocator of course.

Tested-by: Lukas Wunner <lu...@wunner.de> [MacBookPro9,1]
Tested-by: Pierre Moreau <pierre.mor...@free.fr> [MacBookPro11,3]
Signed-off-by: Lukas Wunner <lu...@wunner.de>
Cc: grub-de...@gnu.org
Cc: Pedro Vilaça <rever...@put.as>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Andreas Noever <andreas.noe...@gmail.com>
Cc: Peter Jones <pjo...@redhat.com>
Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
---
 Documentation/kernel-parameters.txt |   5 +
 arch/x86/boot/compressed/eboot.c|  65 +
 arch/x86/include/uapi/asm/bootparam.h   |   1 +
 drivers/firmware/efi/Kconfig|  13 ++
 drivers/firmware/efi/Makefile   |   1 +
 drivers/firmware/efi/apple-properties.c | 248 
 include/linux/efi.h |  17 +++
 7 files changed, 350 insertions(+)
 create mode 100644 drivers/firmware/efi/apple-properties.c

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 37babf91f2cb..86a31dfc036e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1062,6 +1062,11 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
 
dscc4.setup=[NET]
 
+   dump_apple_properties   [X86]
+   Dump name and content of EFI device properties on
+   x86 Macs.  Useful for driver authors to determine
+   what data is available or for reverse-engineering.
+
dyndbg[="val"]  [KNL,DYNAMIC_DEBUG]
module.dyndbg[="val"]
Enable debug messages at boot time.  See
diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index cc69e37548db..ff01c8fc76f7 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -537,6 +537,69 @@ static void setup_efi_pci(struct boot_params *params)
efi_call_early(free_pool, pci_handle);
 }
 
+static void retrieve_apple_device_properties(struct boot_params *boot_params)
+{
+   efi_guid_t guid = APPLE_PROPERTIES_PROTOCOL_GUID;
+   struct setup_data *data, *new;
+   efi_status_t status;
+   u32 size = 0;
+   void *p;
+
+   status = efi_call_early(locate_protocol, , NULL, );
+   if (status != EFI_SUCCESS)
+   return;
+
+   if (efi_table_attr(apple_properties_protocol, version, p) != 0x1) {
+   efi_printk(sys_table, "Unsupported properties proto version\n");
+   return;
+   }
+
+   efi_call_proto(apple_properties_protocol, get_all, p, NULL, );
+   if (!size)
+   return;
+
+   do {
+   status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
+   size + sizeof(struct setup_data), );
+   if (status != EFI_SUCCESS) {
+   efi_printk(sys_table,
+   "Failed to alloc mem for properties\n");
+   return;
+   }
+
+   status = efi_call_proto(apple_properties_protocol, get_all, p,
+   new->data, );
+
+   if (status == EFI_BUFFER_TOO_SMALL)
+   e

[PATCH 3/9] efi: Add support for seeding the RNG from a UEFI config table

2016-11-12 Thread Matt Fleming
From: Ard Biesheuvel <ard.biesheu...@linaro.org>

Specify a Linux specific UEFI configuration table that carries some
random bits, and use the contents during early boot to seed the kernel's
random number generator. This allows much strong random numbers to be
generated early on.

The entropy is fed to the kernel using add_device_randomness(), which is
documented as being appropriate for being called very early.

Since UEFI configuration tables may also be consumed by kexec'd kernels,
register a reboot notifier that updates the seed in the table.

Note that the config table could be generated by the EFI stub or by any
other UEFI driver or application (e.g., GRUB), but the random seed table
GUID and the associated functionality should be considered an internal
kernel interface (unless it is promoted to ABI later on)

Cc: Matt Fleming <m...@codeblueprint.co.uk>
Reviewed-by: Kees Cook <keesc...@chromium.org>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 drivers/firmware/efi/efi.c | 72 ++
 include/linux/efi.h|  8 ++
 2 files changed, 80 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index a4944e22f294..92914801e388 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -23,7 +23,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -48,6 +51,7 @@ struct efi __read_mostly efi = {
.esrt   = EFI_INVALID_TABLE_ADDR,
.properties_table   = EFI_INVALID_TABLE_ADDR,
.mem_attr_table = EFI_INVALID_TABLE_ADDR,
+   .rng_seed   = EFI_INVALID_TABLE_ADDR,
 };
 EXPORT_SYMBOL(efi);
 
@@ -440,6 +444,7 @@ static __initdata efi_config_table_type_t common_tables[] = 
{
{EFI_SYSTEM_RESOURCE_TABLE_GUID, "ESRT", },
{EFI_PROPERTIES_TABLE_GUID, "PROP", _table},
{EFI_MEMORY_ATTRIBUTES_TABLE_GUID, "MEMATTR", _attr_table},
+   {LINUX_EFI_RANDOM_SEED_TABLE_GUID, "RNG", _seed},
{NULL_GUID, NULL, NULL},
 };
 
@@ -501,6 +506,29 @@ int __init efi_config_parse_tables(void *config_tables, 
int count, int sz,
pr_cont("\n");
set_bit(EFI_CONFIG_TABLES, );
 
+   if (efi.rng_seed != EFI_INVALID_TABLE_ADDR) {
+   struct linux_efi_random_seed *seed;
+   u32 size = 0;
+
+   seed = early_memremap(efi.rng_seed, sizeof(*seed));
+   if (seed != NULL) {
+   size = seed->size;
+   early_memunmap(seed, sizeof(*seed));
+   } else {
+   pr_err("Could not map UEFI random seed!\n");
+   }
+   if (size > 0) {
+   seed = early_memremap(efi.rng_seed,
+ sizeof(*seed) + size);
+   if (seed != NULL) {
+   add_device_randomness(seed->bits, seed->size);
+   early_memunmap(seed, sizeof(*seed) + size);
+   } else {
+   pr_err("Could not map UEFI random seed!\n");
+   }
+   }
+   }
+
/* Parse the EFI Properties table if it exists */
if (efi.properties_table != EFI_INVALID_TABLE_ADDR) {
efi_properties_table_t *tbl;
@@ -824,3 +852,47 @@ int efi_status_to_err(efi_status_t status)
 
return err;
 }
+
+#ifdef CONFIG_KEXEC
+static int update_efi_random_seed(struct notifier_block *nb,
+ unsigned long code, void *unused)
+{
+   struct linux_efi_random_seed *seed;
+   u32 size = 0;
+
+   if (!kexec_in_progress)
+   return NOTIFY_DONE;
+
+   seed = memremap(efi.rng_seed, sizeof(*seed), MEMREMAP_WB);
+   if (seed != NULL) {
+   size = min(seed->size, 32U);
+   memunmap(seed);
+   } else {
+   pr_err("Could not map UEFI random seed!\n");
+   }
+   if (size > 0) {
+   seed = memremap(efi.rng_seed, sizeof(*seed) + size,
+   MEMREMAP_WB);
+   if (seed != NULL) {
+   seed->size = size;
+   get_random_bytes(seed->bits, seed->size);
+   memunmap(seed);
+   } else {
+   pr_err("Could not map UEFI random seed!\n");
+   }
+   }
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block efi_random_seed_nb = {
+   .notifier_call = update_efi_random_seed,
+};
+
+static int register_update_efi_random_seed(void)
+{
+   if (efi.rng_seed == EFI_INVALID_TABLE_ADDR)
+   return 0;
+   return register_reboot_notifier(_random_seed_nb);
+}
+late_init

[PATCH 7/9] efi: Allow bitness-agnostic protocol calls

2016-11-12 Thread Matt Fleming
From: Lukas Wunner <lu...@wunner.de>

We already have a macro to invoke boot services which on x86 adapts
automatically to the bitness of the EFI firmware:  efi_call_early().

The macro allows sharing of functions across arches and bitness variants
as long as those functions only call boot services.  However in practice
functions in the EFI stub contain a mix of boot services calls and
protocol calls.

Add an efi_call_proto() macro for bitness-agnostic protocol calls to
allow sharing more code across arches as well as deduplicating 32 bit
and 64 bit code paths.

On x86, implement it using a new efi_table_attr() macro for bitness-
agnostic table lookups.  Refactor efi_call_early() to make use of the
same macro.  (The resulting object code remains identical.)

Signed-off-by: Lukas Wunner <lu...@wunner.de>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Andreas Noever <andreas.noe...@gmail.com>
Cc: Peter Jones <pjo...@redhat.com>
Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
---
 arch/arm/include/asm/efi.h   |  3 +++
 arch/arm64/include/asm/efi.h |  3 +++
 arch/x86/include/asm/efi.h   | 16 +++-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h
index 766bf9b78160..0b06f5341b45 100644
--- a/arch/arm/include/asm/efi.h
+++ b/arch/arm/include/asm/efi.h
@@ -57,6 +57,9 @@ void efi_virtmap_unload(void);
 #define __efi_call_early(f, ...)   f(__VA_ARGS__)
 #define efi_is_64bit() (false)
 
+#define efi_call_proto(protocol, f, instance, ...) \
+   ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
+
 struct screen_info *alloc_screen_info(efi_system_table_t *sys_table_arg);
 void free_screen_info(efi_system_table_t *sys_table, struct screen_info *si);
 
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index a9e54aad15ef..771b3f0bc757 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -51,6 +51,9 @@ int efi_set_mapping_permissions(struct mm_struct *mm, 
efi_memory_desc_t *md);
 #define __efi_call_early(f, ...)   f(__VA_ARGS__)
 #define efi_is_64bit() (true)
 
+#define efi_call_proto(protocol, f, instance, ...) \
+   ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
+
 #define alloc_screen_info(x...)_info
 #define free_screen_info(x...)
 
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 389d700b961e..e99675b9c861 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -210,12 +210,18 @@ static inline bool efi_is_64bit(void)
return __efi_early()->is64;
 }
 
+#define efi_table_attr(table, attr, instance)  \
+   (efi_is_64bit() ?   \
+   ((table##_64_t *)(unsigned long)instance)->attr :   \
+   ((table##_32_t *)(unsigned long)instance)->attr)
+
+#define efi_call_proto(protocol, f, instance, ...) \
+   __efi_early()->call(efi_table_attr(protocol, f, instance),  \
+   instance, ##__VA_ARGS__)
+
 #define efi_call_early(f, ...) \
-   __efi_early()->call(efi_is_64bit() ?\
-   ((efi_boot_services_64_t *)(unsigned long)  \
-   __efi_early()->boot_services)->f :  \
-   ((efi_boot_services_32_t *)(unsigned long)  \
-   __efi_early()->boot_services)->f, __VA_ARGS__)
+   __efi_early()->call(efi_table_attr(efi_boot_services, f,\
+   __efi_early()->boot_services), __VA_ARGS__)
 
 #define __efi_call_early(f, ...)   \
__efi_early()->call((unsigned long)f, __VA_ARGS__);
-- 
2.10.0



[GIT PULL 0/9] EFI changes for v4.10

2016-11-12 Thread Matt Fleming
Folks, please pull the following v4.10 material. There isn't a huge
amount of stuff here. The biggest change is the EFI dev path parser
code from Lukas to get thunderbolt working on his macbook.

 [ The thunderbolt patch has been ACK'd by Andreas and given the OK to
   take it through the EFI tree ]

The following changes since commit a75dcb5848359f488c32c0aef8711d9bd37a77b8:

  efi/efivar_ssdt_load: Don't return success on allocation failure (2016-10-18 
17:11:20 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-next

for you to fetch changes up to 9110bc036062fcd31994a35540d63f8deed22efa:

  thunderbolt: Use Device ROM retrieved from EFI (2016-11-12 21:14:43 +)


 * Fix an allocation bug in the generic EFI libstub where alignment
   and adjusted size isn't taken into account - Roy Franz

 * Update the EFI MAINTAINERS entry to include ARM and arm64 files and
   directories - Ard Biesheuvel

 * Add new feature to seed the RNG from the stashed value returned by
   EFI_RNG_PROTOCOL in EFI stub and wire up for ARM/arm64 - Ard Biesheuvel

 * Retrieve Apple device properties from within the EFI stub to fully
   support thunderbolt devices on Apple Macbooks - Lukas Wunner


Ard Biesheuvel (4):
  MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem
  efi: Add support for seeding the RNG from a UEFI config table
  efi/libstub: Add random.c to ARM build
  efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table

Lukas Wunner (4):
  efi: Add device path parser
  efi: Allow bitness-agnostic protocol calls
  x86/efi: Retrieve and assign Apple device properties
  thunderbolt: Use Device ROM retrieved from EFI

Roy Franz (1):
  efi/libstub: Fix allocation size calculations

 Documentation/kernel-parameters.txt|   5 +
 MAINTAINERS|   6 +-
 arch/arm/include/asm/efi.h |   3 +
 arch/arm64/include/asm/efi.h   |   3 +
 arch/x86/boot/compressed/eboot.c   |  65 +++
 arch/x86/include/asm/efi.h |  16 +-
 arch/x86/include/uapi/asm/bootparam.h  |   1 +
 drivers/firmware/efi/Kconfig   |  18 ++
 drivers/firmware/efi/Makefile  |   2 +
 drivers/firmware/efi/apple-properties.c| 248 +
 drivers/firmware/efi/dev-path-parser.c | 203 
 drivers/firmware/efi/efi.c |  72 +++
 drivers/firmware/efi/libstub/Makefile  |   4 +-
 drivers/firmware/efi/libstub/arm-stub.c|   2 +
 drivers/firmware/efi/libstub/efi-stub-helper.c |  33 ++--
 drivers/firmware/efi/libstub/efistub.h |  11 ++
 drivers/firmware/efi/libstub/random.c  |  56 +-
 drivers/thunderbolt/Kconfig|   1 +
 drivers/thunderbolt/eeprom.c   |  43 +
 drivers/thunderbolt/switch.c   |   2 +-
 include/linux/efi.h|  46 +
 21 files changed, 808 insertions(+), 32 deletions(-)
 create mode 100644 drivers/firmware/efi/apple-properties.c
 create mode 100644 drivers/firmware/efi/dev-path-parser.c


[PATCH 8/9] x86/efi: Retrieve and assign Apple device properties

2016-11-12 Thread Matt Fleming
 for
devices instantiated in a deferred fashion. It seems like this approach
would be more complicated and require more code. That doesn't seem
justified without a specific use case.

For comparison, the strategy on macOS is to assign properties to objects
in the ACPI namespace (AppleACPIPlatformExpert::mergeEFIProperties()).
That approach is definitely wrong as it fails for devices not present in
the namespace: The NHI EFI driver supplies properties for attached
Thunderbolt devices, yet on Macs with Thunderbolt 1 only one device
level behind the host controller is described in the namespace.
Consequently macOS cannot assign properties for chained devices. With
Thunderbolt 2 they started to describe three device levels behind host
controllers in the namespace but this grossly inflates the SSDT and
still fails if the user daisy-chained more than three devices.

We copy the property names and values from the setup_data payload to
swappable virtual memory and afterwards make the payload available to
the page allocator. This is just for the sake of good housekeeping, it
wouldn't occupy a meaningful amount of physical memory ( bytes on my
machine). Only the payload is freed, not the setup_data header since
otherwise we'd break the list linkage and we cannot safely update the
predecessor's ->next link because there's no locking for the list.

The payload is currently not passed on to kexec'ed kernels, same for PCI
ROMs retrieved by setup_efi_pci(). This can be added later if there is
demand by amending setup_efi_state(). The payload can then no longer be
made available to the page allocator of course.

Tested-by: Lukas Wunner  [MacBookPro9,1]
Tested-by: Pierre Moreau  [MacBookPro11,3]
Signed-off-by: Lukas Wunner 
Cc: grub-de...@gnu.org
Cc: Pedro Vilaça 
Cc: Ard Biesheuvel 
Cc: Andreas Noever 
Cc: Peter Jones 
Signed-off-by: Matt Fleming 
---
 Documentation/kernel-parameters.txt |   5 +
 arch/x86/boot/compressed/eboot.c|  65 +
 arch/x86/include/uapi/asm/bootparam.h   |   1 +
 drivers/firmware/efi/Kconfig|  13 ++
 drivers/firmware/efi/Makefile   |   1 +
 drivers/firmware/efi/apple-properties.c | 248 
 include/linux/efi.h |  17 +++
 7 files changed, 350 insertions(+)
 create mode 100644 drivers/firmware/efi/apple-properties.c

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 37babf91f2cb..86a31dfc036e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1062,6 +1062,11 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
 
dscc4.setup=[NET]
 
+   dump_apple_properties   [X86]
+   Dump name and content of EFI device properties on
+   x86 Macs.  Useful for driver authors to determine
+   what data is available or for reverse-engineering.
+
dyndbg[="val"]  [KNL,DYNAMIC_DEBUG]
module.dyndbg[="val"]
Enable debug messages at boot time.  See
diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index cc69e37548db..ff01c8fc76f7 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -537,6 +537,69 @@ static void setup_efi_pci(struct boot_params *params)
efi_call_early(free_pool, pci_handle);
 }
 
+static void retrieve_apple_device_properties(struct boot_params *boot_params)
+{
+   efi_guid_t guid = APPLE_PROPERTIES_PROTOCOL_GUID;
+   struct setup_data *data, *new;
+   efi_status_t status;
+   u32 size = 0;
+   void *p;
+
+   status = efi_call_early(locate_protocol, , NULL, );
+   if (status != EFI_SUCCESS)
+   return;
+
+   if (efi_table_attr(apple_properties_protocol, version, p) != 0x1) {
+   efi_printk(sys_table, "Unsupported properties proto version\n");
+   return;
+   }
+
+   efi_call_proto(apple_properties_protocol, get_all, p, NULL, );
+   if (!size)
+   return;
+
+   do {
+   status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
+   size + sizeof(struct setup_data), );
+   if (status != EFI_SUCCESS) {
+   efi_printk(sys_table,
+   "Failed to alloc mem for properties\n");
+   return;
+   }
+
+   status = efi_call_proto(apple_properties_protocol, get_all, p,
+   new->data, );
+
+   if (status == EFI_BUFFER_TOO_SMALL)
+   efi_call_early(free_pool, new);
+   } while (status == EFI_BUFFER_TOO_SMALL);
+
+   new->type = SETUP_APPLE_PROPERTIES;
+   new->len  = size;
+   new->next = 0;
+
+   data = (struct setup_data *)(unsigned long)boot_params

[PATCH 3/9] efi: Add support for seeding the RNG from a UEFI config table

2016-11-12 Thread Matt Fleming
From: Ard Biesheuvel 

Specify a Linux specific UEFI configuration table that carries some
random bits, and use the contents during early boot to seed the kernel's
random number generator. This allows much strong random numbers to be
generated early on.

The entropy is fed to the kernel using add_device_randomness(), which is
documented as being appropriate for being called very early.

Since UEFI configuration tables may also be consumed by kexec'd kernels,
register a reboot notifier that updates the seed in the table.

Note that the config table could be generated by the EFI stub or by any
other UEFI driver or application (e.g., GRUB), but the random seed table
GUID and the associated functionality should be considered an internal
kernel interface (unless it is promoted to ABI later on)

Cc: Matt Fleming 
Reviewed-by: Kees Cook 
Signed-off-by: Ard Biesheuvel 
---
 drivers/firmware/efi/efi.c | 72 ++
 include/linux/efi.h|  8 ++
 2 files changed, 80 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index a4944e22f294..92914801e388 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -23,7 +23,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -48,6 +51,7 @@ struct efi __read_mostly efi = {
.esrt   = EFI_INVALID_TABLE_ADDR,
.properties_table   = EFI_INVALID_TABLE_ADDR,
.mem_attr_table = EFI_INVALID_TABLE_ADDR,
+   .rng_seed   = EFI_INVALID_TABLE_ADDR,
 };
 EXPORT_SYMBOL(efi);
 
@@ -440,6 +444,7 @@ static __initdata efi_config_table_type_t common_tables[] = 
{
{EFI_SYSTEM_RESOURCE_TABLE_GUID, "ESRT", },
{EFI_PROPERTIES_TABLE_GUID, "PROP", _table},
{EFI_MEMORY_ATTRIBUTES_TABLE_GUID, "MEMATTR", _attr_table},
+   {LINUX_EFI_RANDOM_SEED_TABLE_GUID, "RNG", _seed},
{NULL_GUID, NULL, NULL},
 };
 
@@ -501,6 +506,29 @@ int __init efi_config_parse_tables(void *config_tables, 
int count, int sz,
pr_cont("\n");
set_bit(EFI_CONFIG_TABLES, );
 
+   if (efi.rng_seed != EFI_INVALID_TABLE_ADDR) {
+   struct linux_efi_random_seed *seed;
+   u32 size = 0;
+
+   seed = early_memremap(efi.rng_seed, sizeof(*seed));
+   if (seed != NULL) {
+   size = seed->size;
+   early_memunmap(seed, sizeof(*seed));
+   } else {
+   pr_err("Could not map UEFI random seed!\n");
+   }
+   if (size > 0) {
+   seed = early_memremap(efi.rng_seed,
+ sizeof(*seed) + size);
+   if (seed != NULL) {
+   add_device_randomness(seed->bits, seed->size);
+   early_memunmap(seed, sizeof(*seed) + size);
+   } else {
+   pr_err("Could not map UEFI random seed!\n");
+   }
+   }
+   }
+
/* Parse the EFI Properties table if it exists */
if (efi.properties_table != EFI_INVALID_TABLE_ADDR) {
efi_properties_table_t *tbl;
@@ -824,3 +852,47 @@ int efi_status_to_err(efi_status_t status)
 
return err;
 }
+
+#ifdef CONFIG_KEXEC
+static int update_efi_random_seed(struct notifier_block *nb,
+ unsigned long code, void *unused)
+{
+   struct linux_efi_random_seed *seed;
+   u32 size = 0;
+
+   if (!kexec_in_progress)
+   return NOTIFY_DONE;
+
+   seed = memremap(efi.rng_seed, sizeof(*seed), MEMREMAP_WB);
+   if (seed != NULL) {
+   size = min(seed->size, 32U);
+   memunmap(seed);
+   } else {
+   pr_err("Could not map UEFI random seed!\n");
+   }
+   if (size > 0) {
+   seed = memremap(efi.rng_seed, sizeof(*seed) + size,
+   MEMREMAP_WB);
+   if (seed != NULL) {
+   seed->size = size;
+   get_random_bytes(seed->bits, seed->size);
+   memunmap(seed);
+   } else {
+   pr_err("Could not map UEFI random seed!\n");
+   }
+   }
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block efi_random_seed_nb = {
+   .notifier_call = update_efi_random_seed,
+};
+
+static int register_update_efi_random_seed(void)
+{
+   if (efi.rng_seed == EFI_INVALID_TABLE_ADDR)
+   return 0;
+   return register_reboot_notifier(_random_seed_nb);
+}
+late_initcall(register_update_efi_random_seed);
+#endif
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 2d089487d2da..85e28b138

[PATCH 7/9] efi: Allow bitness-agnostic protocol calls

2016-11-12 Thread Matt Fleming
From: Lukas Wunner 

We already have a macro to invoke boot services which on x86 adapts
automatically to the bitness of the EFI firmware:  efi_call_early().

The macro allows sharing of functions across arches and bitness variants
as long as those functions only call boot services.  However in practice
functions in the EFI stub contain a mix of boot services calls and
protocol calls.

Add an efi_call_proto() macro for bitness-agnostic protocol calls to
allow sharing more code across arches as well as deduplicating 32 bit
and 64 bit code paths.

On x86, implement it using a new efi_table_attr() macro for bitness-
agnostic table lookups.  Refactor efi_call_early() to make use of the
same macro.  (The resulting object code remains identical.)

Signed-off-by: Lukas Wunner 
Cc: Ard Biesheuvel 
Cc: Andreas Noever 
Cc: Peter Jones 
Signed-off-by: Matt Fleming 
---
 arch/arm/include/asm/efi.h   |  3 +++
 arch/arm64/include/asm/efi.h |  3 +++
 arch/x86/include/asm/efi.h   | 16 +++-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h
index 766bf9b78160..0b06f5341b45 100644
--- a/arch/arm/include/asm/efi.h
+++ b/arch/arm/include/asm/efi.h
@@ -57,6 +57,9 @@ void efi_virtmap_unload(void);
 #define __efi_call_early(f, ...)   f(__VA_ARGS__)
 #define efi_is_64bit() (false)
 
+#define efi_call_proto(protocol, f, instance, ...) \
+   ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
+
 struct screen_info *alloc_screen_info(efi_system_table_t *sys_table_arg);
 void free_screen_info(efi_system_table_t *sys_table, struct screen_info *si);
 
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index a9e54aad15ef..771b3f0bc757 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -51,6 +51,9 @@ int efi_set_mapping_permissions(struct mm_struct *mm, 
efi_memory_desc_t *md);
 #define __efi_call_early(f, ...)   f(__VA_ARGS__)
 #define efi_is_64bit() (true)
 
+#define efi_call_proto(protocol, f, instance, ...) \
+   ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
+
 #define alloc_screen_info(x...)_info
 #define free_screen_info(x...)
 
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 389d700b961e..e99675b9c861 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -210,12 +210,18 @@ static inline bool efi_is_64bit(void)
return __efi_early()->is64;
 }
 
+#define efi_table_attr(table, attr, instance)  \
+   (efi_is_64bit() ?   \
+   ((table##_64_t *)(unsigned long)instance)->attr :   \
+   ((table##_32_t *)(unsigned long)instance)->attr)
+
+#define efi_call_proto(protocol, f, instance, ...) \
+   __efi_early()->call(efi_table_attr(protocol, f, instance),  \
+   instance, ##__VA_ARGS__)
+
 #define efi_call_early(f, ...) \
-   __efi_early()->call(efi_is_64bit() ?\
-   ((efi_boot_services_64_t *)(unsigned long)  \
-   __efi_early()->boot_services)->f :  \
-   ((efi_boot_services_32_t *)(unsigned long)  \
-   __efi_early()->boot_services)->f, __VA_ARGS__)
+   __efi_early()->call(efi_table_attr(efi_boot_services, f,\
+   __efi_early()->boot_services), __VA_ARGS__)
 
 #define __efi_call_early(f, ...)   \
__efi_early()->call((unsigned long)f, __VA_ARGS__);
-- 
2.10.0



[GIT PULL 0/9] EFI changes for v4.10

2016-11-12 Thread Matt Fleming
Folks, please pull the following v4.10 material. There isn't a huge
amount of stuff here. The biggest change is the EFI dev path parser
code from Lukas to get thunderbolt working on his macbook.

 [ The thunderbolt patch has been ACK'd by Andreas and given the OK to
   take it through the EFI tree ]

The following changes since commit a75dcb5848359f488c32c0aef8711d9bd37a77b8:

  efi/efivar_ssdt_load: Don't return success on allocation failure (2016-10-18 
17:11:20 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-next

for you to fetch changes up to 9110bc036062fcd31994a35540d63f8deed22efa:

  thunderbolt: Use Device ROM retrieved from EFI (2016-11-12 21:14:43 +)


 * Fix an allocation bug in the generic EFI libstub where alignment
   and adjusted size isn't taken into account - Roy Franz

 * Update the EFI MAINTAINERS entry to include ARM and arm64 files and
   directories - Ard Biesheuvel

 * Add new feature to seed the RNG from the stashed value returned by
   EFI_RNG_PROTOCOL in EFI stub and wire up for ARM/arm64 - Ard Biesheuvel

 * Retrieve Apple device properties from within the EFI stub to fully
   support thunderbolt devices on Apple Macbooks - Lukas Wunner


Ard Biesheuvel (4):
  MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem
  efi: Add support for seeding the RNG from a UEFI config table
  efi/libstub: Add random.c to ARM build
  efi/arm*: libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table

Lukas Wunner (4):
  efi: Add device path parser
  efi: Allow bitness-agnostic protocol calls
  x86/efi: Retrieve and assign Apple device properties
  thunderbolt: Use Device ROM retrieved from EFI

Roy Franz (1):
  efi/libstub: Fix allocation size calculations

 Documentation/kernel-parameters.txt|   5 +
 MAINTAINERS|   6 +-
 arch/arm/include/asm/efi.h |   3 +
 arch/arm64/include/asm/efi.h   |   3 +
 arch/x86/boot/compressed/eboot.c   |  65 +++
 arch/x86/include/asm/efi.h |  16 +-
 arch/x86/include/uapi/asm/bootparam.h  |   1 +
 drivers/firmware/efi/Kconfig   |  18 ++
 drivers/firmware/efi/Makefile  |   2 +
 drivers/firmware/efi/apple-properties.c| 248 +
 drivers/firmware/efi/dev-path-parser.c | 203 
 drivers/firmware/efi/efi.c |  72 +++
 drivers/firmware/efi/libstub/Makefile  |   4 +-
 drivers/firmware/efi/libstub/arm-stub.c|   2 +
 drivers/firmware/efi/libstub/efi-stub-helper.c |  33 ++--
 drivers/firmware/efi/libstub/efistub.h |  11 ++
 drivers/firmware/efi/libstub/random.c  |  56 +-
 drivers/thunderbolt/Kconfig|   1 +
 drivers/thunderbolt/eeprom.c   |  43 +
 drivers/thunderbolt/switch.c   |   2 +-
 include/linux/efi.h|  46 +
 21 files changed, 808 insertions(+), 32 deletions(-)
 create mode 100644 drivers/firmware/efi/apple-properties.c
 create mode 100644 drivers/firmware/efi/dev-path-parser.c


[PATCH 1/2] x86/efi: Fix EFI memmap pointer size warning

2016-11-12 Thread Matt Fleming
From: Borislav Petkov <b...@suse.de>

Fix this when building on 32-bit:

  arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’:
  arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
   (efi_memory_desc_t *)pa);
   ^
  arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
   (efi_memory_desc_t *)pa);
   ^

The @pa local variable is declared as phys_addr_t and that is a u64 when
CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE
build.)

However, its value comes from __pa() which is basically doing pointer
arithmetic and checking, and returns unsigned long as it is the native
pointer width.

So let's use an unsigned long too. It should be fine to do so because
the later users cast it to a pointer too.

Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
---
 arch/x86/platform/efi/efi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index bf99aa7005eb..936a488d6cf6 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -861,7 +861,7 @@ static void __init __efi_enter_virtual_mode(void)
int count = 0, pg_shift = 0;
void *new_memmap = NULL;
efi_status_t status;
-   phys_addr_t pa;
+   unsigned long pa;
 
efi.systab = NULL;
 
-- 
2.10.0



[PATCH 2/2] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK

2016-11-12 Thread Matt Fleming
Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497d5 ("x86/mm/64: Enable vmapped stacks 
(CONFIG_HAVE_ARCH_VMAP_STACK=y)")

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski <l...@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: "H. Peter Anvin" <h...@zytor.com>
Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
---
 arch/x86/platform/efi/efi_64.c | 80 ++
 1 file changed, 57 insertions(+), 23 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f801f66f..319148bd4b05 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void)
memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
 }
 
+/*
+ * Wrapper for slow_virt_to_phys() that handles NULL addresses.
+ */
+static inline phys_addr_t
+virt_to_phys_or_null_size(void *va, unsigned long size)
+{
+   bool bad_size;
+
+   if (!va)
+   return 0;
+
+   if (virt_addr_valid(va))
+   return virt_to_phys(va);
+
+   /*
+* A fully aligned variable on the stack is guaranteed not to
+* cross a page bounary. Try to catch strings on the stack by
+* checking that 'size' is a power of two.
+*/
+   bad_size = size > PAGE_SIZE || !is_power_of_2(size);
+
+   WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size);
+
+   return slow_virt_to_phys(va);
+}
+
+#define virt_to_phys_or_null(addr) \
+   virt_to_phys_or_null_size((addr), sizeof(*(addr)))
+
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
unsigned long pfn, text;
@@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, 
efi_time_cap_t *tc)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
-   phys_tc = virt_to_phys(tc);
+   phys_tm = virt_to_phys_or_null(tm);
+   phys_tc = virt_to_phys_or_null(tc);
 
status = efi_thunk(get_time, phys_tm, phys_tc);
 
@@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_time, phys_tm);
 
@@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t 
*pending,
 
spin_lock(_lock);
 
-   phys_enabled = virt_to_phys(enabled);
-   phys_pending = virt_to_phys(pending);
-   phys_tm = virt_to_phys(tm);
+   phys_enabled = virt_to_phys_or_null(enabled);
+   phys_pending = virt_to_phys_or_null(pending);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(get_wakeup_time, phys_enabled,
 phys_pending, phys_tm);
@@ -549,7 +579,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_wakeup_time, enabled, phys_tm);
 
@@ -558,6 +588,10 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
return status;
 }
 
+static unsigned long efi_name_size(efi_char16_t *name)
+{
+   return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1;
+}
 
 static efi_status_t
 efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor,
@@ -567,11 +601,11 @@ efi_thunk_get_variable(efi_char16_t *name, efi_guid_t 
*vendor,
u32 phys_name, phys_vendor, phys_attr;
u32 phys_data_size, phys_data;
 
-   phys_data_size = virt_to_phys(data_size);
-   phys_vendor = virt_to_phys(vendor);
-   phys_name = virt_to_phys(name);
-   phys_attr = virt_to_phys(attr);
-   phys_data = virt_to_phys(data);
+   phys_data_size = virt_to_phys_or_null(data_size);
+   phys_vendor = virt_to_phys_or_null(vendor);
+   phys_name = virt_to_phys_or_null_size(name, efi_name_size(name));
+   phys_attr = virt_to_phys_or_null(attr);
+   phys_data = virt_to_phys_or_null_size(data, *data_size);
 
status = efi_thunk(get_variable, phys_name, phys_vendor,
   phys_attr, phys_data_size, phys_data);
@@ -586,9 +620,9 @@ efi_thunk_set_variable(efi_char16_t *name, efi_guid_t 
*vendor,

[GIT PULL 0/2] EFI urgent fixes

2016-11-12 Thread Matt Fleming
Folks, please pull the following two EFI patches. The first fixes a
build warning for PAE that Boris hit. The second makes mixed-mode EFI
boot again after the vmap'd stack changes introduced during the merge
window.

The following changes since commit bc33b0ca11e3df46a4fa7639ba488c9d4911:

  Linux 4.9-rc4 (2016-11-05 16:23:36 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-urgent

for you to fetch changes up to 044ddf3d3e3cb62671f22fa837a2164d4786d867:

  x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK 
(2016-11-12 21:00:18 +)


 * Fix memory corruption when booting EFI mixed mode due to the recent
   vmap'd stack changes - Matt Fleming

 * Build warning fix in the EFI memmap code when CONFIG_X86_PAE and
   CONFIG_PHYS_ADDR_T_64BIT are enabled - Borislav Petkov


Borislav Petkov (1):
  x86/efi: Fix EFI memmap pointer size warning

Matt Fleming (1):
  x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK

 arch/x86/platform/efi/efi.c|  2 +-
 arch/x86/platform/efi/efi_64.c | 80 ++
 2 files changed, 58 insertions(+), 24 deletions(-)


[PATCH 1/2] x86/efi: Fix EFI memmap pointer size warning

2016-11-12 Thread Matt Fleming
From: Borislav Petkov 

Fix this when building on 32-bit:

  arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’:
  arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
   (efi_memory_desc_t *)pa);
   ^
  arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
   (efi_memory_desc_t *)pa);
   ^

The @pa local variable is declared as phys_addr_t and that is a u64 when
CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE
build.)

However, its value comes from __pa() which is basically doing pointer
arithmetic and checking, and returns unsigned long as it is the native
pointer width.

So let's use an unsigned long too. It should be fine to do so because
the later users cast it to a pointer too.

Signed-off-by: Borislav Petkov 
Cc: Ard Biesheuvel 
Signed-off-by: Matt Fleming 
---
 arch/x86/platform/efi/efi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index bf99aa7005eb..936a488d6cf6 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -861,7 +861,7 @@ static void __init __efi_enter_virtual_mode(void)
int count = 0, pg_shift = 0;
void *new_memmap = NULL;
efi_status_t status;
-   phys_addr_t pa;
+   unsigned long pa;
 
efi.systab = NULL;
 
-- 
2.10.0



[PATCH 2/2] x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK

2016-11-12 Thread Matt Fleming
Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497d5 ("x86/mm/64: Enable vmapped stacks 
(CONFIG_HAVE_ARCH_VMAP_STACK=y)")

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski 
Cc: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Signed-off-by: Matt Fleming 
---
 arch/x86/platform/efi/efi_64.c | 80 ++
 1 file changed, 57 insertions(+), 23 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f801f66f..319148bd4b05 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void)
memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
 }
 
+/*
+ * Wrapper for slow_virt_to_phys() that handles NULL addresses.
+ */
+static inline phys_addr_t
+virt_to_phys_or_null_size(void *va, unsigned long size)
+{
+   bool bad_size;
+
+   if (!va)
+   return 0;
+
+   if (virt_addr_valid(va))
+   return virt_to_phys(va);
+
+   /*
+* A fully aligned variable on the stack is guaranteed not to
+* cross a page bounary. Try to catch strings on the stack by
+* checking that 'size' is a power of two.
+*/
+   bad_size = size > PAGE_SIZE || !is_power_of_2(size);
+
+   WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size);
+
+   return slow_virt_to_phys(va);
+}
+
+#define virt_to_phys_or_null(addr) \
+   virt_to_phys_or_null_size((addr), sizeof(*(addr)))
+
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
unsigned long pfn, text;
@@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, 
efi_time_cap_t *tc)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
-   phys_tc = virt_to_phys(tc);
+   phys_tm = virt_to_phys_or_null(tm);
+   phys_tc = virt_to_phys_or_null(tc);
 
status = efi_thunk(get_time, phys_tm, phys_tc);
 
@@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_time, phys_tm);
 
@@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t 
*pending,
 
spin_lock(_lock);
 
-   phys_enabled = virt_to_phys(enabled);
-   phys_pending = virt_to_phys(pending);
-   phys_tm = virt_to_phys(tm);
+   phys_enabled = virt_to_phys_or_null(enabled);
+   phys_pending = virt_to_phys_or_null(pending);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(get_wakeup_time, phys_enabled,
 phys_pending, phys_tm);
@@ -549,7 +579,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_wakeup_time, enabled, phys_tm);
 
@@ -558,6 +588,10 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
return status;
 }
 
+static unsigned long efi_name_size(efi_char16_t *name)
+{
+   return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1;
+}
 
 static efi_status_t
 efi_thunk_get_variable(efi_char16_t *name, efi_guid_t *vendor,
@@ -567,11 +601,11 @@ efi_thunk_get_variable(efi_char16_t *name, efi_guid_t 
*vendor,
u32 phys_name, phys_vendor, phys_attr;
u32 phys_data_size, phys_data;
 
-   phys_data_size = virt_to_phys(data_size);
-   phys_vendor = virt_to_phys(vendor);
-   phys_name = virt_to_phys(name);
-   phys_attr = virt_to_phys(attr);
-   phys_data = virt_to_phys(data);
+   phys_data_size = virt_to_phys_or_null(data_size);
+   phys_vendor = virt_to_phys_or_null(vendor);
+   phys_name = virt_to_phys_or_null_size(name, efi_name_size(name));
+   phys_attr = virt_to_phys_or_null(attr);
+   phys_data = virt_to_phys_or_null_size(data, *data_size);
 
status = efi_thunk(get_variable, phys_name, phys_vendor,
   phys_attr, phys_data_size, phys_data);
@@ -586,9 +620,9 @@ efi_thunk_set_variable(efi_char16_t *name, efi_guid_t 
*vendor,
u32 phys_name, phys_vendor, phys_data;
efi_status_t status;
 
-   phys_name = virt_to_phys(name);
-   phys_vendor = virt_to_phys(vendor);
-   p

[GIT PULL 0/2] EFI urgent fixes

2016-11-12 Thread Matt Fleming
Folks, please pull the following two EFI patches. The first fixes a
build warning for PAE that Boris hit. The second makes mixed-mode EFI
boot again after the vmap'd stack changes introduced during the merge
window.

The following changes since commit bc33b0ca11e3df46a4fa7639ba488c9d4911:

  Linux 4.9-rc4 (2016-11-05 16:23:36 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-urgent

for you to fetch changes up to 044ddf3d3e3cb62671f22fa837a2164d4786d867:

  x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK 
(2016-11-12 21:00:18 +)


 * Fix memory corruption when booting EFI mixed mode due to the recent
   vmap'd stack changes - Matt Fleming

 * Build warning fix in the EFI memmap code when CONFIG_X86_PAE and
   CONFIG_PHYS_ADDR_T_64BIT are enabled - Borislav Petkov


Borislav Petkov (1):
  x86/efi: Fix EFI memmap pointer size warning

Matt Fleming (1):
  x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK

 arch/x86/platform/efi/efi.c|  2 +-
 arch/x86/platform/efi/efi_64.c | 80 ++
 2 files changed, 58 insertions(+), 24 deletions(-)


Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)

2016-11-07 Thread Matt Fleming
On Sun, 30 Oct, at 09:21:29AM, Andy Lutomirski wrote:
> > @@ -251,7 +277,8 @@ int __init efi_setup_page_tables(unsigned long 
> > pa_memmap, unsigned num_pages)
> > if (!page)
> > panic("Unable to allocate EFI runtime stack < 4GB\n");
> >
> > -   efi_scratch.phys_stack = virt_to_phys(page_address(page));
> > +   addr = page_address(page);
> > +   efi_scratch.phys_stack = virt_to_phys_or_null_size(addr, PAGE_SIZE);
> 
> This can't be on the stack -- you just allocated it with alloc_page().
 
Oh good point. I was too eager with search and replace.

> > +static unsigned long efi_name_size(efi_char16_t *name)
> > +{
> > +   return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1;
> > +}
> 
> If this is really dynamic, I'm surprised that anything ends up
> aligned.  Can't this be a number like 6?  It might pay to extend that
> warning to also check that, if the variable is on the stack, its size
> is a power of two.  But maybe none of the users of this are on the
> stack, in which case it might pay to try to prevent a new user on the
> stack from showing up.

I can't find any existing users that place strings on the stack, no.

OK, let's try this one.

---

>From 3f8a1ec209a458a9e4b82313186fbde86082696b Mon Sep 17 00:00:00 2001
From: Matt Fleming <m...@codeblueprint.co.uk>
Date: Thu, 20 Oct 2016 22:17:21 +0100
Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with
 CONFIG_VMAP_STACK

Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497d5 ("x86/mm/64: Enable vmapped stacks 
(CONFIG_HAVE_ARCH_VMAP_STACK=y)")

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski <l...@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: "H. Peter Anvin" <h...@zytor.com>
Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
---
 arch/x86/platform/efi/efi_64.c | 80 ++
 1 file changed, 57 insertions(+), 23 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f801f66f..319148bd4b05 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void)
memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
 }
 
+/*
+ * Wrapper for slow_virt_to_phys() that handles NULL addresses.
+ */
+static inline phys_addr_t
+virt_to_phys_or_null_size(void *va, unsigned long size)
+{
+   bool bad_size;
+
+   if (!va)
+   return 0;
+
+   if (virt_addr_valid(va))
+   return virt_to_phys(va);
+
+   /*
+* A fully aligned variable on the stack is guaranteed not to
+* cross a page bounary. Try to catch strings on the stack by
+* checking that 'size' is a power of two.
+*/
+   bad_size = size > PAGE_SIZE || !is_power_of_2(size);
+
+   WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size);
+
+   return slow_virt_to_phys(va);
+}
+
+#define virt_to_phys_or_null(addr) \
+   virt_to_phys_or_null_size((addr), sizeof(*(addr)))
+
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
unsigned long pfn, text;
@@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, 
efi_time_cap_t *tc)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
-   phys_tc = virt_to_phys(tc);
+   phys_tm = virt_to_phys_or_null(tm);
+   phys_tc = virt_to_phys_or_null(tc);
 
status = efi_thunk(get_time, phys_tm, phys_tc);
 
@@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_time, phys_tm);
 
@@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t 
*pending,
 
spin_lock(_lock);
 
-   phys_enabled = virt_to_phys(enabled);
-   phys_pending = virt_to_phys(pending);
-   phys_tm = virt_to_phys(tm);
+   phys_enabled = virt_to_phys_or_null(enabled);
+   phys_pending = virt_to_phys_or_null(pending);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(get_wa

Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)

2016-11-07 Thread Matt Fleming
On Sun, 30 Oct, at 09:21:29AM, Andy Lutomirski wrote:
> > @@ -251,7 +277,8 @@ int __init efi_setup_page_tables(unsigned long 
> > pa_memmap, unsigned num_pages)
> > if (!page)
> > panic("Unable to allocate EFI runtime stack < 4GB\n");
> >
> > -   efi_scratch.phys_stack = virt_to_phys(page_address(page));
> > +   addr = page_address(page);
> > +   efi_scratch.phys_stack = virt_to_phys_or_null_size(addr, PAGE_SIZE);
> 
> This can't be on the stack -- you just allocated it with alloc_page().
 
Oh good point. I was too eager with search and replace.

> > +static unsigned long efi_name_size(efi_char16_t *name)
> > +{
> > +   return ucs2_strsize(name, EFI_VAR_NAME_LEN) + 1;
> > +}
> 
> If this is really dynamic, I'm surprised that anything ends up
> aligned.  Can't this be a number like 6?  It might pay to extend that
> warning to also check that, if the variable is on the stack, its size
> is a power of two.  But maybe none of the users of this are on the
> stack, in which case it might pay to try to prevent a new user on the
> stack from showing up.

I can't find any existing users that place strings on the stack, no.

OK, let's try this one.

---

>From 3f8a1ec209a458a9e4b82313186fbde86082696b Mon Sep 17 00:00:00 2001
From: Matt Fleming 
Date: Thu, 20 Oct 2016 22:17:21 +0100
Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with
 CONFIG_VMAP_STACK

Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497d5 ("x86/mm/64: Enable vmapped stacks 
(CONFIG_HAVE_ARCH_VMAP_STACK=y)")

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski 
Cc: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Signed-off-by: Matt Fleming 
---
 arch/x86/platform/efi/efi_64.c | 80 ++
 1 file changed, 57 insertions(+), 23 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f801f66f..319148bd4b05 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -211,6 +212,35 @@ void efi_sync_low_kernel_mappings(void)
memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
 }
 
+/*
+ * Wrapper for slow_virt_to_phys() that handles NULL addresses.
+ */
+static inline phys_addr_t
+virt_to_phys_or_null_size(void *va, unsigned long size)
+{
+   bool bad_size;
+
+   if (!va)
+   return 0;
+
+   if (virt_addr_valid(va))
+   return virt_to_phys(va);
+
+   /*
+* A fully aligned variable on the stack is guaranteed not to
+* cross a page bounary. Try to catch strings on the stack by
+* checking that 'size' is a power of two.
+*/
+   bad_size = size > PAGE_SIZE || !is_power_of_2(size);
+
+   WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size);
+
+   return slow_virt_to_phys(va);
+}
+
+#define virt_to_phys_or_null(addr) \
+   virt_to_phys_or_null_size((addr), sizeof(*(addr)))
+
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
unsigned long pfn, text;
@@ -494,8 +524,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, 
efi_time_cap_t *tc)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
-   phys_tc = virt_to_phys(tc);
+   phys_tm = virt_to_phys_or_null(tm);
+   phys_tc = virt_to_phys_or_null(tc);
 
status = efi_thunk(get_time, phys_tm, phys_tc);
 
@@ -511,7 +541,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_time, phys_tm);
 
@@ -529,9 +559,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t 
*pending,
 
spin_lock(_lock);
 
-   phys_enabled = virt_to_phys(enabled);
-   phys_pending = virt_to_phys(pending);
-   phys_tm = virt_to_phys(tm);
+   phys_enabled = virt_to_phys_or_null(enabled);
+   phys_pending = virt_to_phys_or_null(pending);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(get_wakeup_time, phys_enabled,
 phys_pending, phys_tm);
@@ -549,7 +579,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
 
spin_lock(_lock);
 
-   

Re: [PATCH] x86/efi: Fix EFI memmap pointer size warning

2016-11-07 Thread Matt Fleming
On Sun, 06 Nov, at 02:02:54PM, Borislav Petkov wrote:
> Hi Matt,
> 
> please doublecheck me on this but I think we're fine using an unsigned
> long.
> 
> Thanks.
> 
> ---
> From: Borislav Petkov <b...@suse.de>
> Date: Sun, 6 Nov 2016 13:49:10 +0100
> Subject: [PATCH] x86/efi: Fix EFI memmap pointer size warning
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Fix this when building on 32-bit:
> 
>   arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’:
>   arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of 
> different size [-Wint-to-pointer-cast]
>(efi_memory_desc_t *)pa);
>^
>   arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of 
> different size [-Wint-to-pointer-cast]
>(efi_memory_desc_t *)pa);
>^
> 
> The @pa local variable is declared as phys_addr_t and that is a u64 when
> CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE
> build.)
> 
> However, its value comes from __pa() which is basically doing pointer
> arithmetic and checking, and returns unsigned long as it is the native
> pointer width.
> 
> So let's use an unsigned long too. It should be fine to do so because
> the later users cast it to a pointer too.
> 
> Signed-off-by: Borislav Petkov <b...@suse.de>
> Cc: Matt Fleming <m...@codeblueprint.co.uk>
> ---
>  arch/x86/platform/efi/efi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index bf99aa7005eb..936a488d6cf6 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -861,7 +861,7 @@ static void __init __efi_enter_virtual_mode(void)
>   int count = 0, pg_shift = 0;
>   void *new_memmap = NULL;
>   efi_status_t status;
> - phys_addr_t pa;
> + unsigned long pa;
>  
>   efi.systab = NULL;
>  

Right, this does look correct given that __pa() uses 'unsigned long'.

I think the reason this is safe on PAE is that __get_free_pages() is
guaranteed to return a 32-bit address, and will not return HIGHMEM
pages. Which makes __pa() legal and 'unsigned long' the correct type.

Want me to take this through the EFI tree?


Re: [PATCH] x86/efi: Fix EFI memmap pointer size warning

2016-11-07 Thread Matt Fleming
On Sun, 06 Nov, at 02:02:54PM, Borislav Petkov wrote:
> Hi Matt,
> 
> please doublecheck me on this but I think we're fine using an unsigned
> long.
> 
> Thanks.
> 
> ---
> From: Borislav Petkov 
> Date: Sun, 6 Nov 2016 13:49:10 +0100
> Subject: [PATCH] x86/efi: Fix EFI memmap pointer size warning
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Fix this when building on 32-bit:
> 
>   arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’:
>   arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of 
> different size [-Wint-to-pointer-cast]
>(efi_memory_desc_t *)pa);
>^
>   arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of 
> different size [-Wint-to-pointer-cast]
>(efi_memory_desc_t *)pa);
>^
> 
> The @pa local variable is declared as phys_addr_t and that is a u64 when
> CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE
> build.)
> 
> However, its value comes from __pa() which is basically doing pointer
> arithmetic and checking, and returns unsigned long as it is the native
> pointer width.
> 
> So let's use an unsigned long too. It should be fine to do so because
> the later users cast it to a pointer too.
> 
> Signed-off-by: Borislav Petkov 
> Cc: Matt Fleming 
> ---
>  arch/x86/platform/efi/efi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index bf99aa7005eb..936a488d6cf6 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -861,7 +861,7 @@ static void __init __efi_enter_virtual_mode(void)
>   int count = 0, pg_shift = 0;
>   void *new_memmap = NULL;
>   efi_status_t status;
> - phys_addr_t pa;
> + unsigned long pa;
>  
>   efi.systab = NULL;
>  

Right, this does look correct given that __pa() uses 'unsigned long'.

I think the reason this is safe on PAE is that __get_free_pages() is
guaranteed to return a 32-bit address, and will not return HIGHMEM
pages. Which makes __pa() legal and 'unsigned long' the correct type.

Want me to take this through the EFI tree?


Re: [PATCH] x86/boot: Remove always empty $(USERINCLUDE)

2016-11-03 Thread Matt Fleming
On Thu, 03 Nov, at 10:51:38AM, Paul Bolle wrote:
> Apparently Matt left Intel. Let's forward this to a recently used
> address.
> 
> On Thu, 2016-11-03 at 10:47 +0100, Paul Bolle wrote:
> > Commmit b6eea87fc685 ("x86, boot: Explicitly include autoconf.h for
> > hostprogs") correctly noted
> > [...] that because $(USERINCLUDE) isn't exported by
> > the top-level Makefile it's actually empty in arch/x86/boot/Makefile.
> > 
> > So let's do the sane thing and remove the reference to that make variable.
> > 
> > Signed-off-by: Paul Bolle <pebo...@tiscali.nl>
> > ---
> >  arch/x86/boot/Makefile | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index 12ea8f8384f4..0d810fb15eac 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -65,7 +65,7 @@ clean-files += cpustr.h
> >  
> >  # 
> > ---
> >  
> > -KBUILD_CFLAGS  := $(USERINCLUDE) $(REALMODE_CFLAGS) -D_SETUP
> > +KBUILD_CFLAGS  := $(REALMODE_CFLAGS) -D_SETUP
> >  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> >  GCOV_PROFILE := n
> >  UBSAN_SANITIZE := n

Looks OK to me.

Reviewed-by: Matt Fleming <m...@codeblueprint.co.uk>


Re: [PATCH] x86/boot: Remove always empty $(USERINCLUDE)

2016-11-03 Thread Matt Fleming
On Thu, 03 Nov, at 10:51:38AM, Paul Bolle wrote:
> Apparently Matt left Intel. Let's forward this to a recently used
> address.
> 
> On Thu, 2016-11-03 at 10:47 +0100, Paul Bolle wrote:
> > Commmit b6eea87fc685 ("x86, boot: Explicitly include autoconf.h for
> > hostprogs") correctly noted
> > [...] that because $(USERINCLUDE) isn't exported by
> > the top-level Makefile it's actually empty in arch/x86/boot/Makefile.
> > 
> > So let's do the sane thing and remove the reference to that make variable.
> > 
> > Signed-off-by: Paul Bolle 
> > ---
> >  arch/x86/boot/Makefile | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index 12ea8f8384f4..0d810fb15eac 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -65,7 +65,7 @@ clean-files += cpustr.h
> >  
> >  # 
> > ---
> >  
> > -KBUILD_CFLAGS  := $(USERINCLUDE) $(REALMODE_CFLAGS) -D_SETUP
> > +KBUILD_CFLAGS  := $(REALMODE_CFLAGS) -D_SETUP
> >  KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> >  GCOV_PROFILE := n
> >  UBSAN_SANITIZE := n

Looks OK to me.

Reviewed-by: Matt Fleming 


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-31 Thread Matt Fleming
On Sun, 30 Oct, at 08:59:58AM, Dan Williams wrote:
> On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis
>  wrote:
> > JFYI: I added this report to the list of regressions for Linux 4.9. I'll
> > watch this thread for further updates on this issue to document progress
> > in my weekly reports. Please let me know via regressi...@leemhuis.info
> > in case the discussion moves to a different place (bugzilla or another
> > mail thread for example). tia!
> >
> > Current status (afaics) in my report: This looks stuck. Or was is
> > discussed (or even fixed) somewhere else?
> 
> Thanks, and no, not fixed yet. I've not found the time to run the
> experiments Matt needs, but a colleague has offered to look into it.

Of course, if you are willing to help with debugging, Thorsten, it
would be much appreciated and this bug might get fixed sooner.


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-31 Thread Matt Fleming
On Sun, 30 Oct, at 08:59:58AM, Dan Williams wrote:
> On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis
>  wrote:
> > JFYI: I added this report to the list of regressions for Linux 4.9. I'll
> > watch this thread for further updates on this issue to document progress
> > in my weekly reports. Please let me know via regressi...@leemhuis.info
> > in case the discussion moves to a different place (bugzilla or another
> > mail thread for example). tia!
> >
> > Current status (afaics) in my report: This looks stuck. Or was is
> > discussed (or even fixed) somewhere else?
> 
> Thanks, and no, not fixed yet. I've not found the time to run the
> experiments Matt needs, but a colleague has offered to look into it.

Of course, if you are willing to help with debugging, Thorsten, it
would be much appreciated and this bug might get fixed sooner.


Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)

2016-10-24 Thread Matt Fleming
On Fri, 21 Oct, at 05:18:30PM, Andy Lutomirski wrote:
> 
> This is asking for trouble if any of the variable length parameters
> are on the stack.  How about adding a "size_t size" parameter and
> doing:
> 
> if (!va) {
>   return 0;
> } else if (virt_addr_valid(va)) {
>   return virt_to_phys(va);
> } else {
>   /* A fully aligned variable on the stack is guaranteed not to cross
> a page boundary. */
>   WARN_ON(!IS_ALIGNED((uintptr_t)va, size) || size > PAGE_SIZE);
>   return slow_virt_to_phys(va);
> }

Ah, good catch. Something like this?

---

>From d2c17f46686076677da3bf04caa2f69d654f8d8a Mon Sep 17 00:00:00 2001
From: Matt Fleming <m...@codeblueprint.co.uk>
Date: Thu, 20 Oct 2016 22:17:21 +0100
Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with
 CONFIG_VMAP_STACK

Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497d5 ("x86/mm/64: Enable vmapped stacks 
(CONFIG_HAVE_ARCH_VMAP_STACK=y)")

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski <l...@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: "H. Peter Anvin" <h...@zytor.com>
Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
---
 arch/x86/platform/efi/efi_64.c | 79 +-
 1 file changed, 55 insertions(+), 24 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f801f66f..010544293dda 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -211,11 +212,36 @@ void efi_sync_low_kernel_mappings(void)
memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
 }
 
+/*
+ * Wrapper for slow_virt_to_phys() that handles NULL addresses.
+ */
+static inline phys_addr_t
+virt_to_phys_or_null_size(void *va, unsigned long size)
+{
+   if (!va)
+   return 0;
+
+   if (virt_addr_valid(va))
+   return virt_to_phys(va);
+
+   /*
+* A fully aligned variable on the stack is guaranteed not to
+* cross a page bounary.
+*/
+   WARN_ON(!IS_ALIGNED((unsigned long)va, size) || size > PAGE_SIZE);
+
+   return slow_virt_to_phys(va);
+}
+
+#define virt_to_phys_or_null(addr) \
+   virt_to_phys_or_null_size((addr), sizeof(*(addr)))
+
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
unsigned long pfn, text;
struct page *page;
unsigned npages;
+   void *addr;
pgd_t *pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
@@ -251,7 +277,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
if (!page)
panic("Unable to allocate EFI runtime stack < 4GB\n");
 
-   efi_scratch.phys_stack = virt_to_phys(page_address(page));
+   addr = page_address(page);
+   efi_scratch.phys_stack = virt_to_phys_or_null_size(addr, PAGE_SIZE);
efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */
 
npages = (_etext - _text) >> PAGE_SHIFT;
@@ -494,8 +521,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, 
efi_time_cap_t *tc)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
-   phys_tc = virt_to_phys(tc);
+   phys_tm = virt_to_phys_or_null(tm);
+   phys_tc = virt_to_phys_or_null(tc);
 
status = efi_thunk(get_time, phys_tm, phys_tc);
 
@@ -511,7 +538,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_time, phys_tm);
 
@@ -529,9 +556,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t 
*pending,
 
spin_lock(_lock);
 
-   phys_enabled = virt_to_phys(enabled);
-   phys_pending = virt_to_phys(pending);
-   phys_tm = virt_to_phys(tm);
+   phys_enabled = virt_to_phys_or_null(enabled);
+   phys_pending = virt_to_phys_or_null(pending);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(get_wakeup_time, phys_enabled,
 phys_pending, phys_tm);
@@ -549,7 +576,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
 
spin_lock(_lock);
 
-   phys_tm = v

Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)

2016-10-24 Thread Matt Fleming
On Fri, 21 Oct, at 05:18:30PM, Andy Lutomirski wrote:
> 
> This is asking for trouble if any of the variable length parameters
> are on the stack.  How about adding a "size_t size" parameter and
> doing:
> 
> if (!va) {
>   return 0;
> } else if (virt_addr_valid(va)) {
>   return virt_to_phys(va);
> } else {
>   /* A fully aligned variable on the stack is guaranteed not to cross
> a page boundary. */
>   WARN_ON(!IS_ALIGNED((uintptr_t)va, size) || size > PAGE_SIZE);
>   return slow_virt_to_phys(va);
> }

Ah, good catch. Something like this?

---

>From d2c17f46686076677da3bf04caa2f69d654f8d8a Mon Sep 17 00:00:00 2001
From: Matt Fleming 
Date: Thu, 20 Oct 2016 22:17:21 +0100
Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with
 CONFIG_VMAP_STACK

Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497d5 ("x86/mm/64: Enable vmapped stacks 
(CONFIG_HAVE_ARCH_VMAP_STACK=y)")

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski 
Cc: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Signed-off-by: Matt Fleming 
---
 arch/x86/platform/efi/efi_64.c | 79 +-
 1 file changed, 55 insertions(+), 24 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f801f66f..010544293dda 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -211,11 +212,36 @@ void efi_sync_low_kernel_mappings(void)
memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
 }
 
+/*
+ * Wrapper for slow_virt_to_phys() that handles NULL addresses.
+ */
+static inline phys_addr_t
+virt_to_phys_or_null_size(void *va, unsigned long size)
+{
+   if (!va)
+   return 0;
+
+   if (virt_addr_valid(va))
+   return virt_to_phys(va);
+
+   /*
+* A fully aligned variable on the stack is guaranteed not to
+* cross a page bounary.
+*/
+   WARN_ON(!IS_ALIGNED((unsigned long)va, size) || size > PAGE_SIZE);
+
+   return slow_virt_to_phys(va);
+}
+
+#define virt_to_phys_or_null(addr) \
+   virt_to_phys_or_null_size((addr), sizeof(*(addr)))
+
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
unsigned long pfn, text;
struct page *page;
unsigned npages;
+   void *addr;
pgd_t *pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
@@ -251,7 +277,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
if (!page)
panic("Unable to allocate EFI runtime stack < 4GB\n");
 
-   efi_scratch.phys_stack = virt_to_phys(page_address(page));
+   addr = page_address(page);
+   efi_scratch.phys_stack = virt_to_phys_or_null_size(addr, PAGE_SIZE);
efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */
 
npages = (_etext - _text) >> PAGE_SHIFT;
@@ -494,8 +521,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, 
efi_time_cap_t *tc)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
-   phys_tc = virt_to_phys(tc);
+   phys_tm = virt_to_phys_or_null(tm);
+   phys_tc = virt_to_phys_or_null(tc);
 
status = efi_thunk(get_time, phys_tm, phys_tc);
 
@@ -511,7 +538,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_time, phys_tm);
 
@@ -529,9 +556,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t 
*pending,
 
spin_lock(_lock);
 
-   phys_enabled = virt_to_phys(enabled);
-   phys_pending = virt_to_phys(pending);
-   phys_tm = virt_to_phys(tm);
+   phys_enabled = virt_to_phys_or_null(enabled);
+   phys_pending = virt_to_phys_or_null(pending);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(get_wakeup_time, phys_enabled,
 phys_pending, phys_tm);
@@ -549,7 +576,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_wakeup_time, enabled, phys_tm);
 
@@ -558,6 +585,10 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm

Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-21 Thread Matt Fleming
On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote:
> 
> FYI, I've been able to reproduce some crash when using your EFI memory
> map layout under Qemu and forcing the ESRT driver to reserve the space.
 
Nope, that was a bug in my hack. I can't get Qemu to crash while using
your memory map layout.

Any chance you can insert "while(1)" loops into the EFI boot paths for
a kernel that is known to reboot or trigger a triple fault in kernels
that hang, so that we can narrow in on the issue. See,

  http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-21 Thread Matt Fleming
On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote:
> 
> FYI, I've been able to reproduce some crash when using your EFI memory
> map layout under Qemu and forcing the ESRT driver to reserve the space.
 
Nope, that was a bug in my hack. I can't get Qemu to crash while using
your memory map layout.

Any chance you can insert "while(1)" loops into the EFI boot paths for
a kernel that is known to reboot or trigger a triple fault in kernels
that hang, so that we can narrow in on the issue. See,

  http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-21 Thread Matt Fleming
On Thu, 20 Oct, at 12:37:16PM, Dan Williams wrote:
> 
> I am able to build a kernel and boot the platform with the following
> set of reverts:
> 
>   Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
>   Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
>   Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
>   Revert "efi: Allow drivers to reserve boot services forever"

FYI, I've been able to reproduce some crash when using your EFI memory
map layout under Qemu and forcing the ESRT driver to reserve the space.

It looks like the new EFI memmap we allocate as part of the
reservation is smaller than the old one - which is backwards.

Still debugging...


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-21 Thread Matt Fleming
On Thu, 20 Oct, at 12:37:16PM, Dan Williams wrote:
> 
> I am able to build a kernel and boot the platform with the following
> set of reverts:
> 
>   Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
>   Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
>   Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
>   Revert "efi: Allow drivers to reserve boot services forever"

FYI, I've been able to reproduce some crash when using your EFI memory
map layout under Qemu and forcing the ESRT driver to reserve the space.

It looks like the new EFI memmap we allocate as part of the
reservation is smaller than the old one - which is backwards.

Still debugging...


Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)

2016-10-21 Thread Matt Fleming
On Wed, 24 Aug, at 06:03:04AM, tip-bot for Andy Lutomirski wrote:
> Commit-ID:  e37e43a497d5a8b7c0cc1736d56986f432c394c9
> Gitweb: http://git.kernel.org/tip/e37e43a497d5a8b7c0cc1736d56986f432c394c9
> Author: Andy Lutomirski <l...@kernel.org>
> AuthorDate: Thu, 11 Aug 2016 02:35:23 -0700
> Committer:  Ingo Molnar <mi...@kernel.org>
> CommitDate: Wed, 24 Aug 2016 12:11:42 +0200
> 
> x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)
> 
> This allows x86_64 kernels to enable vmapped stacks by setting
> HAVE_ARCH_VMAP_STACK=y - which enables the CONFIG_VMAP_STACK=y
> high level Kconfig option.
> 
> There are a couple of interesting bits:

This commit broke booting EFI mixed mode kernels. Here's what I've got
queued up to fix it.

---
>From acf11e55bfcef7a1dca7d1735f4a780e0cdb1c89 Mon Sep 17 00:00:00 2001
From: Matt Fleming <m...@codeblueprint.co.uk>
Date: Thu, 20 Oct 2016 22:17:21 +0100
Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with
 CONFIG_VMAP_STACK

Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497d5 ("x86/mm/64: Enable vmapped stacks 
(CONFIG_HAVE_ARCH_VMAP_STACK=y)")

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski <l...@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: "H. Peter Anvin" <h...@zytor.com>
Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
---
 arch/x86/platform/efi/efi_64.c | 59 +-
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f801f66f..e3569a00885b 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -211,6 +211,17 @@ void efi_sync_low_kernel_mappings(void)
memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
 }
 
+/*
+ * Wrapper for slow_virt_to_phys() that handles NULL addresses.
+ */
+static inline phys_addr_t virt_to_phys_or_null(void *va)
+{
+   if (!va)
+   return 0;
+
+   return slow_virt_to_phys(va);
+}
+
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
unsigned long pfn, text;
@@ -251,7 +262,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
if (!page)
panic("Unable to allocate EFI runtime stack < 4GB\n");
 
-   efi_scratch.phys_stack = virt_to_phys(page_address(page));
+   efi_scratch.phys_stack = virt_to_phys_or_null(page_address(page));
efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */
 
npages = (_etext - _text) >> PAGE_SHIFT;
@@ -494,8 +505,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, 
efi_time_cap_t *tc)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
-   phys_tc = virt_to_phys(tc);
+   phys_tm = virt_to_phys_or_null(tm);
+   phys_tc = virt_to_phys_or_null(tc);
 
status = efi_thunk(get_time, phys_tm, phys_tc);
 
@@ -511,7 +522,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_time, phys_tm);
 
@@ -529,9 +540,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t 
*pending,
 
spin_lock(_lock);
 
-   phys_enabled = virt_to_phys(enabled);
-   phys_pending = virt_to_phys(pending);
-   phys_tm = virt_to_phys(tm);
+   phys_enabled = virt_to_phys_or_null(enabled);
+   phys_pending = virt_to_phys_or_null(pending);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(get_wakeup_time, phys_enabled,
 phys_pending, phys_tm);
@@ -549,7 +560,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_wakeup_time, enabled, phys_tm);
 
@@ -567,11 +578,11 @@ efi_thunk_get_variable(efi_char16_t *name, efi_guid_t 
*vendor,
u32 phys_name, phys_vendor, phys_attr;
u32 phys_data_size, phys_data;
 
-   phys_data_size = virt_to_phys(data_size);
-   phys_vendor = virt_to_phys(vendor);
-   phys_name = virt_to_phys(name);
-   phys_attr = virt_to_phys(attr);
-   phys_data = virt_to_p

Re: [tip:x86/asm] x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)

2016-10-21 Thread Matt Fleming
On Wed, 24 Aug, at 06:03:04AM, tip-bot for Andy Lutomirski wrote:
> Commit-ID:  e37e43a497d5a8b7c0cc1736d56986f432c394c9
> Gitweb: http://git.kernel.org/tip/e37e43a497d5a8b7c0cc1736d56986f432c394c9
> Author: Andy Lutomirski 
> AuthorDate: Thu, 11 Aug 2016 02:35:23 -0700
> Committer:  Ingo Molnar 
> CommitDate: Wed, 24 Aug 2016 12:11:42 +0200
> 
> x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)
> 
> This allows x86_64 kernels to enable vmapped stacks by setting
> HAVE_ARCH_VMAP_STACK=y - which enables the CONFIG_VMAP_STACK=y
> high level Kconfig option.
> 
> There are a couple of interesting bits:

This commit broke booting EFI mixed mode kernels. Here's what I've got
queued up to fix it.

---
>From acf11e55bfcef7a1dca7d1735f4a780e0cdb1c89 Mon Sep 17 00:00:00 2001
From: Matt Fleming 
Date: Thu, 20 Oct 2016 22:17:21 +0100
Subject: [PATCH] x86/efi: Prevent mixed mode boot corruption with
 CONFIG_VMAP_STACK

Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497d5 ("x86/mm/64: Enable vmapped stacks 
(CONFIG_HAVE_ARCH_VMAP_STACK=y)")

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski 
Cc: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Signed-off-by: Matt Fleming 
---
 arch/x86/platform/efi/efi_64.c | 59 +-
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f801f66f..e3569a00885b 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -211,6 +211,17 @@ void efi_sync_low_kernel_mappings(void)
memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
 }
 
+/*
+ * Wrapper for slow_virt_to_phys() that handles NULL addresses.
+ */
+static inline phys_addr_t virt_to_phys_or_null(void *va)
+{
+   if (!va)
+   return 0;
+
+   return slow_virt_to_phys(va);
+}
+
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
unsigned long pfn, text;
@@ -251,7 +262,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
if (!page)
panic("Unable to allocate EFI runtime stack < 4GB\n");
 
-   efi_scratch.phys_stack = virt_to_phys(page_address(page));
+   efi_scratch.phys_stack = virt_to_phys_or_null(page_address(page));
efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */
 
npages = (_etext - _text) >> PAGE_SHIFT;
@@ -494,8 +505,8 @@ static efi_status_t efi_thunk_get_time(efi_time_t *tm, 
efi_time_cap_t *tc)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
-   phys_tc = virt_to_phys(tc);
+   phys_tm = virt_to_phys_or_null(tm);
+   phys_tc = virt_to_phys_or_null(tc);
 
status = efi_thunk(get_time, phys_tm, phys_tc);
 
@@ -511,7 +522,7 @@ static efi_status_t efi_thunk_set_time(efi_time_t *tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_time, phys_tm);
 
@@ -529,9 +540,9 @@ efi_thunk_get_wakeup_time(efi_bool_t *enabled, efi_bool_t 
*pending,
 
spin_lock(_lock);
 
-   phys_enabled = virt_to_phys(enabled);
-   phys_pending = virt_to_phys(pending);
-   phys_tm = virt_to_phys(tm);
+   phys_enabled = virt_to_phys_or_null(enabled);
+   phys_pending = virt_to_phys_or_null(pending);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(get_wakeup_time, phys_enabled,
 phys_pending, phys_tm);
@@ -549,7 +560,7 @@ efi_thunk_set_wakeup_time(efi_bool_t enabled, efi_time_t 
*tm)
 
spin_lock(_lock);
 
-   phys_tm = virt_to_phys(tm);
+   phys_tm = virt_to_phys_or_null(tm);
 
status = efi_thunk(set_wakeup_time, enabled, phys_tm);
 
@@ -567,11 +578,11 @@ efi_thunk_get_variable(efi_char16_t *name, efi_guid_t 
*vendor,
u32 phys_name, phys_vendor, phys_attr;
u32 phys_data_size, phys_data;
 
-   phys_data_size = virt_to_phys(data_size);
-   phys_vendor = virt_to_phys(vendor);
-   phys_name = virt_to_phys(name);
-   phys_attr = virt_to_phys(attr);
-   phys_data = virt_to_phys(data);
+   phys_data_size = virt_to_phys_or_null(data_size);
+   phys_vendor = virt_to_phys_or_null(vendor);
+   phys_name = virt_to_phys_or_null(name);
+   phys_attr = virt_to_phys_or_null(attr);
+   phys_data = virt_to_p

Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-20 Thread Matt Fleming
On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
> Hi,
> 
> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
> 
> The symptom is a reboot before the video console is available.
> 
> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
> services forever".  However, that commit is known to be broken.  The
> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
> 
> During the bisect some of the stopping points landed on commits that
> caused the boot process to hang rather than cause a reboot.  The
> commits that resulted in a hang are marked "git bisect skip" in this
> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
> 
> I'll try treating those hangs as bad bisect results and re-run the
> full bisect tomorrow.  In the meantime I wonder if the bisect log
> implicates a better regression candidate?

Could you mail the dmesg output when booting a known working kernel
with efi=debug ?


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-20 Thread Matt Fleming
On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
> Hi,
> 
> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
> 
> The symptom is a reboot before the video console is available.
> 
> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
> services forever".  However, that commit is known to be broken.  The
> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
> 
> During the bisect some of the stopping points landed on commits that
> caused the boot process to hang rather than cause a reboot.  The
> commits that resulted in a hang are marked "git bisect skip" in this
> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
> 
> I'll try treating those hangs as bad bisect results and re-run the
> full bisect tomorrow.  In the meantime I wonder if the bisect log
> implicates a better regression candidate?

Could you mail the dmesg output when booting a known working kernel
with efi=debug ?


Re: [PATCH v2] x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates

2016-10-20 Thread Matt Fleming
On Wed, 19 Oct, at 08:48:51PM, Alex Thorlton wrote:
> Some time ago, we brought our UV BIOS callback code up to speed with the
> new EFI memory mapping scheme, in commit:
> 
> d1be84a232e3 ("x86/uv: Update uv_bios_call() to use 
> efi_call_virt_pointer()")
> 
> By leveraging some changes that I made to a few of the EFI runtime
> callback mechanisms, in commit:
> 
> 80e75596079f ("efi: Convert efi_call_virt() to efi_call_virt_pointer()")
> 
> This got everything running smoothly on UV, with the new EFI mapping
> code.  However, this left one, small loose end, in that EFI_OLD_MEMMAP
> (a.k.a. efi=old_map) will no longer work on UV, on kernels that include
> the aforementioned changes.
> 
> At the time this was not a major issue (in fact, it still really isn't),
> but there's no reason that EFI_OLD_MEMMAP *shouldn't* work on our
> systems.  This commit adds a check into uv_bios_call, to see if we have
> the EFI_OLD_MEMMAP bit set in efi.flags.  If it is set, we fall back to
> using our old callback method, which uses efi_call directly on the __va
> of our function pointer.
> 
> v2: Invert if-statement and add unlikely() hint.
> 
> Signed-off-by: Alex Thorlton <athorl...@sgi.com>
> Cc: Mike Travis <tra...@sgi.com>
> Cc: Russ Anderson <r...@sgi.com>
> Cc: Dimitri Sivanich <sivan...@sgi.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: "H. Peter Anvin" <h...@zytor.com>
> Cc: Matt Fleming <m...@codeblueprint.co.uk>
> Cc: Masahiro Yamada <yamada.masah...@socionext.com>
> Cc: x...@kernel.org
> ---
>  arch/x86/platform/uv/bios_uv.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)

Looks fine to me. Applied to 'next'. Thanks!


Re: [PATCH v2] x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates

2016-10-20 Thread Matt Fleming
On Wed, 19 Oct, at 08:48:51PM, Alex Thorlton wrote:
> Some time ago, we brought our UV BIOS callback code up to speed with the
> new EFI memory mapping scheme, in commit:
> 
> d1be84a232e3 ("x86/uv: Update uv_bios_call() to use 
> efi_call_virt_pointer()")
> 
> By leveraging some changes that I made to a few of the EFI runtime
> callback mechanisms, in commit:
> 
> 80e75596079f ("efi: Convert efi_call_virt() to efi_call_virt_pointer()")
> 
> This got everything running smoothly on UV, with the new EFI mapping
> code.  However, this left one, small loose end, in that EFI_OLD_MEMMAP
> (a.k.a. efi=old_map) will no longer work on UV, on kernels that include
> the aforementioned changes.
> 
> At the time this was not a major issue (in fact, it still really isn't),
> but there's no reason that EFI_OLD_MEMMAP *shouldn't* work on our
> systems.  This commit adds a check into uv_bios_call, to see if we have
> the EFI_OLD_MEMMAP bit set in efi.flags.  If it is set, we fall back to
> using our old callback method, which uses efi_call directly on the __va
> of our function pointer.
> 
> v2: Invert if-statement and add unlikely() hint.
> 
> Signed-off-by: Alex Thorlton 
> Cc: Mike Travis 
> Cc: Russ Anderson 
> Cc: Dimitri Sivanich 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Matt Fleming 
> Cc: Masahiro Yamada 
> Cc: x...@kernel.org
> ---
>  arch/x86/platform/uv/bios_uv.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)

Looks fine to me. Applied to 'next'. Thanks!


Re: [REGRESSION] EFI mixed mode patch triggers boot failure

2016-10-20 Thread Matt Fleming
On Wed, 19 Oct, at 02:13:00PM, Laura Abbott wrote:
> On 10/19/2016 01:04 PM, Laura Abbott wrote:
> >Hi,
> >
> >Fedora received a bug report 
> >https://bugzilla.redhat.com/show_bug.cgi?id=1384238
> >of a bootup failure with stable 4.7.6. efi=noruntime fixed the bootup 
> >problem.
> >1297667083d5442aafe3e337b9413bf02b114edb was linked as the cause
> >of the problem.
> >
> >
> >x86/efi: Only map RAM into EFI page tables if in mixed-mode
> >
> >Waiman reported that booting with CONFIG_EFI_MIXED enabled on his
> >multi-terabyte HP machine results in boot crashes, because the EFI
> >region mapping functions loop forever while trying to map those
> >regions describing RAM.
> >
> >While this patch doesn't fix the underlying hang, there's really no
> >reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables
> >when mixed-mode is not in use at runtime.
> >
> >Reported-by: Waiman Long <waiman.l...@hpe.com>
> >Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
> >Cc: Borislav Petkov <b...@alien8.de>
> >Cc: Linus Torvalds <torva...@linux-foundation.org>
> >CC: Theodore Ts'o <ty...@mit.edu>
> >Cc: Arnd Bergmann <a...@arndb.de>
> >Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> >Cc: Scott J Norton <scott.nor...@hpe.com>
> >Cc: Douglas Hatch <doug.ha...@hpe.com>
> >Cc: <sta...@vger.kernel.org> # v4.6+
> >Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
> >
> >I made a request in the bugzilla for the reporter to
> >give a bootlog with efi=debug which I'm still waiting on.
> >
> >Any ideas?
> >
> >Thanks,
> >Laura
> 
> dmesg with efi=debug from the reporter is attached

> [0.00] DMI: Hewlett-Packard Compaq CQ58 Notebook PC/188B, BIOS F.36 
> 06/07/2013

Hmm.. this is a fairly old machine, what kernel versions has the
reporter successfully run on it? The code that was nop'd out in the
above commit for the native case only came into existence in v4.6.

The fact that booting with efi=old_map does *not* seem to result in a
booting kernel is very suspicious. Could you ask them to double-check?

Maybe try this patch too on v4.7.6.

---

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 964c7022d31d..b07183c6b470 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -244,9 +244,12 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
 * text and allocate a new stack because we can't rely on the
 * stack pointer being < 4GB.
 */
-   if (!IS_ENABLED(CONFIG_EFI_MIXED) || efi_is_native())
+   if (!IS_ENABLED(CONFIG_EFI_MIXED))
return 0;
 
+   if (efi_is_native())
+   goto map_text;
+
/*
 * Map all of RAM so that we can access arguments in the 1:1
 * mapping when making EFI runtime calls.
@@ -273,6 +276,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
efi_scratch.phys_stack = virt_to_phys(page_address(page));
efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */
 
+map_text:
npages = (_etext - _text) >> PAGE_SHIFT;
text = __pa(_text);
pfn = text >> PAGE_SHIFT;


Re: [REGRESSION] EFI mixed mode patch triggers boot failure

2016-10-20 Thread Matt Fleming
On Wed, 19 Oct, at 02:13:00PM, Laura Abbott wrote:
> On 10/19/2016 01:04 PM, Laura Abbott wrote:
> >Hi,
> >
> >Fedora received a bug report 
> >https://bugzilla.redhat.com/show_bug.cgi?id=1384238
> >of a bootup failure with stable 4.7.6. efi=noruntime fixed the bootup 
> >problem.
> >1297667083d5442aafe3e337b9413bf02b114edb was linked as the cause
> >of the problem.
> >
> >
> >x86/efi: Only map RAM into EFI page tables if in mixed-mode
> >
> >Waiman reported that booting with CONFIG_EFI_MIXED enabled on his
> >multi-terabyte HP machine results in boot crashes, because the EFI
> >region mapping functions loop forever while trying to map those
> >regions describing RAM.
> >
> >While this patch doesn't fix the underlying hang, there's really no
> >reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables
> >when mixed-mode is not in use at runtime.
> >
> >Reported-by: Waiman Long 
> >Cc: Ard Biesheuvel 
> >Cc: Borislav Petkov 
> >Cc: Linus Torvalds 
> >CC: Theodore Ts'o 
> >Cc: Arnd Bergmann 
> >Cc: Greg Kroah-Hartman 
> >Cc: Scott J Norton 
> >Cc: Douglas Hatch 
> >Cc:  # v4.6+
> >Signed-off-by: Matt Fleming 
> >
> >I made a request in the bugzilla for the reporter to
> >give a bootlog with efi=debug which I'm still waiting on.
> >
> >Any ideas?
> >
> >Thanks,
> >Laura
> 
> dmesg with efi=debug from the reporter is attached

> [0.00] DMI: Hewlett-Packard Compaq CQ58 Notebook PC/188B, BIOS F.36 
> 06/07/2013

Hmm.. this is a fairly old machine, what kernel versions has the
reporter successfully run on it? The code that was nop'd out in the
above commit for the native case only came into existence in v4.6.

The fact that booting with efi=old_map does *not* seem to result in a
booting kernel is very suspicious. Could you ask them to double-check?

Maybe try this patch too on v4.7.6.

---

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 964c7022d31d..b07183c6b470 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -244,9 +244,12 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
 * text and allocate a new stack because we can't rely on the
 * stack pointer being < 4GB.
 */
-   if (!IS_ENABLED(CONFIG_EFI_MIXED) || efi_is_native())
+   if (!IS_ENABLED(CONFIG_EFI_MIXED))
return 0;
 
+   if (efi_is_native())
+   goto map_text;
+
/*
 * Map all of RAM so that we can access arguments in the 1:1
 * mapping when making EFI runtime calls.
@@ -273,6 +276,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
efi_scratch.phys_stack = virt_to_phys(page_address(page));
efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */
 
+map_text:
npages = (_etext - _text) >> PAGE_SHIFT;
text = __pa(_text);
pfn = text >> PAGE_SHIFT;


[tip:sched/core] sched/fair: Kill the unused 'sched_shares_window_ns' tunable

2016-10-20 Thread tip-bot for Matt Fleming
Commit-ID:  3c3fcb45d524feb5d14a14f332e3eec7f2aff8f3
Gitweb: http://git.kernel.org/tip/3c3fcb45d524feb5d14a14f332e3eec7f2aff8f3
Author: Matt Fleming <m...@codeblueprint.co.uk>
AuthorDate: Wed, 19 Oct 2016 15:10:59 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Thu, 20 Oct 2016 08:44:57 +0200

sched/fair: Kill the unused 'sched_shares_window_ns' tunable

The last user of this tunable was removed in 2012 in commit:

  82958366cfea ("sched: Replace update_shares weight distribution with 
per-entity computation")

Delete it since its very existence confuses people.

Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dietmar Eggemann <dietmar.eggem...@arm.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Mike Galbraith <umgwanakikb...@gmail.com>
Cc: Paul Turner <p...@google.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Link: http://lkml.kernel.org/r/20161019141059.26408-1-m...@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 include/linux/sched/sysctl.h | 1 -
 kernel/sched/fair.c  | 7 ---
 kernel/sysctl.c  | 7 ---
 3 files changed, 15 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 22db1e6..4411453 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -36,7 +36,6 @@ extern unsigned int sysctl_numa_balancing_scan_size;
 extern unsigned int sysctl_sched_migration_cost;
 extern unsigned int sysctl_sched_nr_migrate;
 extern unsigned int sysctl_sched_time_avg;
-extern unsigned int sysctl_sched_shares_window;
 
 int sched_proc_update_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d941c97..79d464a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -93,13 +93,6 @@ unsigned int normalized_sysctl_sched_wakeup_granularity = 
100UL;
 
 const_debug unsigned int sysctl_sched_migration_cost = 50UL;
 
-/*
- * The exponential sliding  window over which load is averaged for shares
- * distribution.
- * (default: 10msec)
- */
-unsigned int __read_mostly sysctl_sched_shares_window = 1000UL;
-
 #ifdef CONFIG_CFS_BANDWIDTH
 /*
  * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 706309f..739fb17 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -347,13 +347,6 @@ static struct ctl_table kern_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec,
},
-   {
-   .procname   = "sched_shares_window_ns",
-   .data   = _sched_shares_window,
-   .maxlen = sizeof(unsigned int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec,
-   },
 #ifdef CONFIG_SCHEDSTATS
{
.procname   = "sched_schedstats",


[tip:sched/core] sched/fair: Kill the unused 'sched_shares_window_ns' tunable

2016-10-20 Thread tip-bot for Matt Fleming
Commit-ID:  3c3fcb45d524feb5d14a14f332e3eec7f2aff8f3
Gitweb: http://git.kernel.org/tip/3c3fcb45d524feb5d14a14f332e3eec7f2aff8f3
Author: Matt Fleming 
AuthorDate: Wed, 19 Oct 2016 15:10:59 +0100
Committer:  Ingo Molnar 
CommitDate: Thu, 20 Oct 2016 08:44:57 +0200

sched/fair: Kill the unused 'sched_shares_window_ns' tunable

The last user of this tunable was removed in 2012 in commit:

  82958366cfea ("sched: Replace update_shares weight distribution with 
per-entity computation")

Delete it since its very existence confuses people.

Signed-off-by: Matt Fleming 
Cc: Dietmar Eggemann 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Paul Turner 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20161019141059.26408-1-m...@codeblueprint.co.uk
Signed-off-by: Ingo Molnar 
---
 include/linux/sched/sysctl.h | 1 -
 kernel/sched/fair.c  | 7 ---
 kernel/sysctl.c  | 7 ---
 3 files changed, 15 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 22db1e6..4411453 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -36,7 +36,6 @@ extern unsigned int sysctl_numa_balancing_scan_size;
 extern unsigned int sysctl_sched_migration_cost;
 extern unsigned int sysctl_sched_nr_migrate;
 extern unsigned int sysctl_sched_time_avg;
-extern unsigned int sysctl_sched_shares_window;
 
 int sched_proc_update_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d941c97..79d464a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -93,13 +93,6 @@ unsigned int normalized_sysctl_sched_wakeup_granularity = 
100UL;
 
 const_debug unsigned int sysctl_sched_migration_cost = 50UL;
 
-/*
- * The exponential sliding  window over which load is averaged for shares
- * distribution.
- * (default: 10msec)
- */
-unsigned int __read_mostly sysctl_sched_shares_window = 1000UL;
-
 #ifdef CONFIG_CFS_BANDWIDTH
 /*
  * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 706309f..739fb17 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -347,13 +347,6 @@ static struct ctl_table kern_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec,
},
-   {
-   .procname   = "sched_shares_window_ns",
-   .data   = _sched_shares_window,
-   .maxlen = sizeof(unsigned int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec,
-   },
 #ifdef CONFIG_SCHEDSTATS
{
.procname   = "sched_schedstats",


[PATCH] sched/fair: Kill the unused sched_shares_window_ns tunable

2016-10-19 Thread Matt Fleming
The last user of this tunable was removed in 2012 in commit 8295836
("sched: Replace update_shares weight distribution with per-entity
computation").

Delete it since its very existence confuses people.

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggem...@arm.com>
Cc: Paul Turner <p...@google.com>
Cc: Mike Galbraith <umgwanakikb...@gmail.com>
Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
---
 include/linux/sched/sysctl.h | 1 -
 kernel/sched/fair.c  | 7 ---
 kernel/sysctl.c  | 7 ---
 3 files changed, 15 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 22db1e63707e..441145351301 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -36,7 +36,6 @@ extern unsigned int sysctl_numa_balancing_scan_size;
 extern unsigned int sysctl_sched_migration_cost;
 extern unsigned int sysctl_sched_nr_migrate;
 extern unsigned int sysctl_sched_time_avg;
-extern unsigned int sysctl_sched_shares_window;
 
 int sched_proc_update_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 543b2f291152..f026050c15cc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -93,13 +93,6 @@ unsigned int normalized_sysctl_sched_wakeup_granularity = 
100UL;
 
 const_debug unsigned int sysctl_sched_migration_cost = 50UL;
 
-/*
- * The exponential sliding  window over which load is averaged for shares
- * distribution.
- * (default: 10msec)
- */
-unsigned int __read_mostly sysctl_sched_shares_window = 1000UL;
-
 #ifdef CONFIG_CFS_BANDWIDTH
 /*
  * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index a13bbdaab47d..2a361dc6e1eb 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -347,13 +347,6 @@ static struct ctl_table kern_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec,
},
-   {
-   .procname   = "sched_shares_window_ns",
-   .data   = _sched_shares_window,
-   .maxlen = sizeof(unsigned int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec,
-   },
 #ifdef CONFIG_SCHEDSTATS
{
.procname   = "sched_schedstats",
-- 
2.10.0



<    1   2   3   4   5   6   7   8   9   10   >