RE: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-07 Thread Prakhya, Sai Praneeth
> >> > pstore writes could potentially be invoked in interrupt context and
> >> > it uses set_variable<>() and query_variable_info<>() to store logs.
> >> > If we invoke efi_runtime_services() through efi_rts_wq while in
> >> > atomic() kernel issues a warning ("scheduling wile in atomic") and
> >> > prints stack trace. One way to overcome this is to not make the
> >> > caller process wait for the worker thread to finish. This approach
> >> > breaks pstore i.e. the log messages aren't written to efi
> >> > variables. Hence, pstore calls
> >> > efi_runtime_services() without using efi_rts_wq or in other words
> >> > efi_rts_wq will be used unconditionally for all the
> >> > efi_runtime_services() except set_variable<>() and
> >> > query_variable_info<>()
> >>
> >> Can't NMIs still come in during this?
> >>
> >> ... or do we assume that since something has already gone wrong, this
> >> doesn't matter?
> >>
> >
> > I think they can come. AFAIK, pstore (if enabled) runs only when we crashed.
> > Since, we are still executing stuff to log messages and NMI's can't be
> > masked, there is still a possibility for NMI's to occur (please correct me 
> > if I am
> wrong).
> > But, as you said earlier, I guess it doesn't matter because anyways we are
> going down.
> 
> The problem is that we are not always in a "already going down"
> condition for typical set_variable and query_variable_info calls.

That's correct.

> So are we actually fixing anything with this patchset?

When we are _not_ in interrupt context (eg: process context)
we still use efi_rts_wq to invoke *all* efi_runtime_services().
This solves *someone trying to access user space* while in EFI runtime services
mapping problem. A instance could be, some user space process requests us to
execute efi_runtime_service(), so, kernel switches to efi_pgd (which doesn’t 
have
user space part of process address space) and while executing 
efi_runtime_service()
perf NMI comes along to profile user data.

> In other words if the NMI vs EFI
> mapping problem requires the workqueue context then we can't have any EFI
> calls outside of that context.  Am I missing how this scheme addresses that
> problem?

I think so, because, we are not trying to solve NMI vs EFI Runtime Service 
mappings.
AFAIK, they both work together (for x86_64). The problem is, as stated above,
"someone trying to access user space while executing EFI runtime services".
The problem exists because EFI runtime services mappings don’t have user space
part of process address space.

I think the problem still persists when we are already in interrupt context and 
then
we were requested to execute some efi_runtime_service() and then NMI happens
and that NMI handler touches user space.

Please let me know if my explanation didn’t make sense or if I misunderstood
your question.

Ard and Matt, please correct me if I stated something that is incorrect.


Re: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-07 Thread Dan Williams
On Wed, Mar 7, 2018 at 8:11 PM, Prakhya, Sai Praneeth
 wrote:
> 05, 2018 at 03:23:10PM -0800, Sai Praneeth Prakhya wrote:
>> > From: Sai Praneeth 
>> >
>> > Presently, efi_runtime_services() are executed by firmware in process
>> > context. To execute efi_runtime_service(), kernel switches the page
>> > directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have
>> > any user space mappings. A potential issue could be, for instance, an
>> > NMI interrupt (like perf) trying to profile some user data while in 
>> > efi_pgd.
>>
>> It might be worth pointing out that this could result in disaster (e.g.
>> if the frame pointer happens to point at MMIO in the EFI runtime services
>> mappings).
>>
>
> Sorry! I didn't get it. I would like to add this point, so could you please
> explain it more?
>
>> [...]
>>
>> > pstore writes could potentially be invoked in interrupt context and it
>> > uses set_variable<>() and query_variable_info<>() to store logs. If we
>> > invoke efi_runtime_services() through efi_rts_wq while in atomic()
>> > kernel issues a warning ("scheduling wile in atomic") and prints stack
>> > trace. One way to overcome this is to not make the caller process wait
>> > for the worker thread to finish. This approach breaks pstore i.e. the
>> > log messages aren't written to efi variables. Hence, pstore calls
>> > efi_runtime_services() without using efi_rts_wq or in other words
>> > efi_rts_wq will be used unconditionally for all the
>> > efi_runtime_services() except set_variable<>() and
>> > query_variable_info<>()
>>
>> Can't NMIs still come in during this?
>>
>> ... or do we assume that since something has already gone wrong, this doesn't
>> matter?
>>
>
> I think they can come. AFAIK, pstore (if enabled) runs only when we crashed.
> Since, we are still executing stuff to log messages and NMI's can't be masked,
> there is still a possibility for NMI's to occur (please correct me if I am 
> wrong).
> But, as you said earlier, I guess it doesn't matter because anyways we are 
> going down.

The problem is that we are not always in a "already going down"
condition for typical set_variable and query_variable_info calls. So
are we actually fixing anything with this patchset? In other words if
the NMI vs EFI mapping problem requires the workqueue context then we
can't have any EFI calls outside of that context.  Am I missing how
this scheme addresses that problem?
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-07 Thread Prakhya, Sai Praneeth
05, 2018 at 03:23:10PM -0800, Sai Praneeth Prakhya wrote:
> > From: Sai Praneeth 
> >
> > Presently, efi_runtime_services() are executed by firmware in process
> > context. To execute efi_runtime_service(), kernel switches the page
> > directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have
> > any user space mappings. A potential issue could be, for instance, an
> > NMI interrupt (like perf) trying to profile some user data while in efi_pgd.
> 
> It might be worth pointing out that this could result in disaster (e.g.
> if the frame pointer happens to point at MMIO in the EFI runtime services
> mappings).
> 

Sorry! I didn't get it. I would like to add this point, so could you please
explain it more?

> [...]
> 
> > pstore writes could potentially be invoked in interrupt context and it
> > uses set_variable<>() and query_variable_info<>() to store logs. If we
> > invoke efi_runtime_services() through efi_rts_wq while in atomic()
> > kernel issues a warning ("scheduling wile in atomic") and prints stack
> > trace. One way to overcome this is to not make the caller process wait
> > for the worker thread to finish. This approach breaks pstore i.e. the
> > log messages aren't written to efi variables. Hence, pstore calls
> > efi_runtime_services() without using efi_rts_wq or in other words
> > efi_rts_wq will be used unconditionally for all the
> > efi_runtime_services() except set_variable<>() and
> > query_variable_info<>()
> 
> Can't NMIs still come in during this?
> 
> ... or do we assume that since something has already gone wrong, this doesn't
> matter?
>

I think they can come. AFAIK, pstore (if enabled) runs only when we crashed.
Since, we are still executing stuff to log messages and NMI's can't be masked,
there is still a possibility for NMI's to occur (please correct me if I am 
wrong).
But, as you said earlier, I guess it doesn't matter because anyways we are 
going down.
 
> Otherwise, this doesn't seem to break basic stuff on arm64 platforms. I can 
> boot
> up, read the EFI RTC, and reboot. I ahd lockdep and KASAN enabled, I received
> no splats from either.
> 
> FWIW:
> 
> Tested-by: Mark Rutland  
> Thanks,
> Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-06 Thread Mark Rutland
On Mon, Mar 05, 2018 at 03:23:10PM -0800, Sai Praneeth Prakhya wrote:
> From: Sai Praneeth 
> 
> Presently, efi_runtime_services() are executed by firmware in process
> context. To execute efi_runtime_service(), kernel switches the page
> directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
> user space mappings. A potential issue could be, for instance, an NMI
> interrupt (like perf) trying to profile some user data while in efi_pgd.

It might be worth pointing out that this could result in disaster (e.g.
if the frame pointer happens to point at MMIO in the EFI runtime
services mappings).

[...]

> pstore writes could potentially be invoked in interrupt context and it
> uses set_variable<>() and query_variable_info<>() to store logs. If we
> invoke efi_runtime_services() through efi_rts_wq while in atomic()
> kernel issues a warning ("scheduling wile in atomic") and prints stack
> trace. One way to overcome this is to not make the caller process wait
> for the worker thread to finish. This approach breaks pstore i.e. the
> log messages aren't written to efi variables. Hence, pstore calls
> efi_runtime_services() without using efi_rts_wq or in other words
> efi_rts_wq will be used unconditionally for all the
> efi_runtime_services() except set_variable<>() and
> query_variable_info<>()

Can't NMIs still come in during this?

... or do we assume that since something has already gone wrong, this
doesn't matter?

Otherwise, this doesn't seem to break basic stuff on arm64 platforms. I
can boot up, read the EFI RTC, and reboot. I ahd lockdep and KASAN
enabled, I received no splats from either.

FWIW:

Tested-by: Mark Rutland 

RE: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-05 Thread Prakhya, Sai Praneeth
> > Presently, efi_runtime_services() are executed by firmware in process
> > context. To execute efi_runtime_service(), kernel switches the page
> > directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have
> > any user space mappings. A potential issue could be, for instance, an
> > NMI interrupt (like perf) trying to profile some user data while in efi_pgd.
> >
> > A solution for this issue could be to use kthread to run
> > efi_runtime_service(). When a user/kernel thread requests to execute
> > efi_runtime_service(), kernel off-loads this work to kthread which in
> > turn uses efi_pgd. Anything that tries to touch user space addresses
> > while in kthread is terminally broken. This patch adds support to efi
> > subsystem to handle all calls to efi_runtime_services() using a work
> > queue (which in turn uses kthread).
> >
> > Implementation summary:
> > ---
> > 1. When user/kernel thread requests to execute efi_runtime_service(),
> > enqueue work to efi_rts_workqueue.
> > 2. Caller thread waits until the work is finished because it's
> > dependent on the return status of efi_runtime_service().
> >
> > Semantics to pack arguments in efi_runtime_work (has void pointers):
> > 1. If argument is a pointer (of any type), pass it as is.
> > 2. If argument is a value (of any type), address of the value is
> > passed.
> >
> > Introduce a handler function (called efi_call_rts()) that
> > a. understands efi_runtime_work and
> > b. invokes the appropriate efi_runtime_service() with the
> > appropriate arguments
> >
> > Semantics followed by efi_call_rts() to understand efi_runtime_work:
> > 1. If argument was a pointer, recast it from void pointer to original
> > pointer type.
> > 2. If argument was a value, recast it from void pointer to original
> > pointer type and dereference it.
> >
> > pstore writes could potentially be invoked in interrupt context and it
> > uses set_variable<>() and query_variable_info<>() to store logs. If we
> > invoke efi_runtime_services() through efi_rts_wq while in atomic()
> > kernel issues a warning ("scheduling wile in atomic") and prints stack
> > trace. One way to overcome this is to not make the caller process wait
> > for the worker thread to finish. This approach breaks pstore i.e. the
> > log messages aren't written to efi variables. Hence, pstore calls
> > efi_runtime_services() without using efi_rts_wq or in other words
> > efi_rts_wq will be used unconditionally for all the
> > efi_runtime_services() except set_variable<>() and
> > query_variable_info<>()
> 
> 
> Is there a place in the system reboot path where we can try to flush these
> asynchronous pstore writes from interrupt context?

I don't think so because, the issue is not with the pstore writes but with 
pstore
using efi as backing store. Anything could register as pstore backend, eg: RAM,
ACPI-ERST etc.. and AFAIK, they don’t use work queues to store logs. Now that
efi_runtime_services() uses work queues, we unfortunately have to have this 
hack.

> It seems unfortunate that
> we need to have this wide exception for all
> set_variable() calls.

True, basically any efi_runtime_service() that might get called in interrupt 
context.
I am not very happy to have the hack too, but didn’t find other way.

Either that or switch to an explicit "emergency mode" where
> we stop caring about protecting the system from EFI runtime code because
> we're already crashing.

Should we care about extra warning (scheduling while in atomic) when we are 
already
crashing? This sounds kind of debatable. I will wait for feedback from 
community it they
think it's OK or maybe a better solution.

Regards,
Sai


Re: [PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-05 Thread Dan Williams
On Mon, Mar 5, 2018 at 3:23 PM, Sai Praneeth Prakhya
 wrote:
> From: Sai Praneeth 
>
> Presently, efi_runtime_services() are executed by firmware in process
> context. To execute efi_runtime_service(), kernel switches the page
> directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
> user space mappings. A potential issue could be, for instance, an NMI
> interrupt (like perf) trying to profile some user data while in efi_pgd.
>
> A solution for this issue could be to use kthread to run
> efi_runtime_service(). When a user/kernel thread requests to execute
> efi_runtime_service(), kernel off-loads this work to kthread which in
> turn uses efi_pgd. Anything that tries to touch user space addresses
> while in kthread is terminally broken. This patch adds support to efi
> subsystem to handle all calls to efi_runtime_services() using a work
> queue (which in turn uses kthread).
>
> Implementation summary:
> ---
> 1. When user/kernel thread requests to execute efi_runtime_service(),
> enqueue work to efi_rts_workqueue.
> 2. Caller thread waits until the work is finished because it's dependent
> on the return status of efi_runtime_service().
>
> Semantics to pack arguments in efi_runtime_work (has void pointers):
> 1. If argument is a pointer (of any type), pass it as is.
> 2. If argument is a value (of any type), address of the value is
> passed.
>
> Introduce a handler function (called efi_call_rts()) that
> a. understands efi_runtime_work and
> b. invokes the appropriate efi_runtime_service() with the
> appropriate arguments
>
> Semantics followed by efi_call_rts() to understand efi_runtime_work:
> 1. If argument was a pointer, recast it from void pointer to original
> pointer type.
> 2. If argument was a value, recast it from void pointer to original
> pointer type and dereference it.
>
> pstore writes could potentially be invoked in interrupt context and it
> uses set_variable<>() and query_variable_info<>() to store logs. If we
> invoke efi_runtime_services() through efi_rts_wq while in atomic()
> kernel issues a warning ("scheduling wile in atomic") and prints stack
> trace. One way to overcome this is to not make the caller process wait
> for the worker thread to finish. This approach breaks pstore i.e. the
> log messages aren't written to efi variables. Hence, pstore calls
> efi_runtime_services() without using efi_rts_wq or in other words
> efi_rts_wq will be used unconditionally for all the
> efi_runtime_services() except set_variable<>() and
> query_variable_info<>()


Is there a place in the system reboot path where we can try to flush
these asynchronous pstore writes from interrupt context? It seems
unfortunate that we need to have this wide exception for all
set_variable() calls. Either that or switch to an explicit "emergency
mode" where we stop caring about protecting the system from EFI
runtime code because we're already crashing.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html