RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-08-07 Thread Prakhya, Sai Praneeth
> > 1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above
> > solution doesn't work if efi_reset_system() is buggy. We cannot
> > neglect it either because that's the issue faced by Al Stone and hence
> > I started the patch set. So, I am thinking to use long jump technique 
> > *only* for
> efi_reset_system(). Would you be OK with that?
> > Or please propose a new solution.
> 
> The long jump thing is super ugly.  I'd be okay with it if it were somehow
> factored out into a testable mechanism, but it's a nasty complicated bit of
> assembly, it will only be used very very rarely, and it more or less 
> duplicates
> switch_to().

Yes, I agree that it's complicated assembly. But I tested it by hacking OVMF.
And.. you are correct that it's very very rarely used.. so probably there is no 
point adding complicated code for it.

> 
> Could you just do a fallback reboot from the fault handler if
> efi_reset_system() fails?
>

Sure! I will try that and will update you on how that goes. Hopefully, well.
 
> >
> > 2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got
> > this working with the existing "efi_rts_wq" setup. Please note that,
> > the existing "efi_rts_wq", has only one worker thread. Could you
> > please brief me on why we should have a dedicated kthread and not leverage
> "efi_rts_wq" (I am unclear on this).
> 
> Two reasons:
> 
> 1. If you used a dedicated thread, then you could probably just set that 
> thread's -
> >mm to the EFI mm rather than manually switching the mm.

That means, we don't need efi_switch_mm() anymore and could just get away
by changing efi_kthread ->mm = efi_mm.
Is that what you meant? Just wanted to make sure, I understood you correctly.

> 
> 2. If you used a dedicated thread, you would be a lot closer to being able to 
> call
> into EFI at CPL 3.  In x86 Linux (and probably most or all other arches), the 
> kernel
> never calls into CPL 3.  Instead it
> *returns* into CPL 3.  So you would set up a thread and make that thread's
> task_pt_regs() have the correct context for a little stub that calls into EFI 
> and
> then does "syscall".  Then you would return all the way out of the kernel 
> into the
> stub.

I got it.. not 100% though.. but I will surely work on this because it sounds 
very interesting to me.

But still, I have a question, this article https://lwn.net/Articles/23634/ says 
that
"each workqueue has one or more dedicated worker threads (one per CPU, by 
default) associated with it"
So, do you think we still need our own efi dedicated kthread?

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-08-07 Thread Andy Lutomirski
On Tue, Aug 7, 2018 at 11:01 AM, Prakhya, Sai Praneeth
 wrote:
> Hi All,
>
>> > >> The main problem is that we have just merged Sai's code to use a
>> > >> work queue for invoking EFI services, and killing kworker threads
>> > >> is obviously not going to make EFI any new friends.
>> > >>
>> > >> So I guess we should probably rework that code to use a dedicated
>> > >> kthread, and just freeze it when it performs an illegal memory
>> > >> access, and signal the completion in a way that makes the calling
>> > >> thread understand that a) the call failed and b) runtime services
>> > >> are no longer
>> > available.
>> > >
>> > > Yes, this makes sense to me.
>> > > Initially I did use a dedicated kthread for efi but moved to work
>> > > queues later so
>> > that the synchronization is well handled. I am ok to rework on the
>> > patches, could we ask Ingo to hold onto efi_workqueue patches?
>> > >
>> >
>> > I am fine with keeping them. We will have a different approach in
>> > v4.19 than in subsequent kernels, but the workqueue approach is still
>> > better than nothing at all.
>
> As discussed in previous mails, I have implemented the below:
>
> If kernel detects an illegal access by firmware to any efi region other than
> efi boot time code/data,
> a. It freezes "efi_rts_wq" (efi runtime services work queue).
> b. Signals the requested process that an error occurred.
> c. Disables EFI Runtime Services forever.
> d. Explicitly calls the scheduler to schedule another process.
>
> But, I have couple of questions:
>
> 1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above solution
> doesn't work if efi_reset_system() is buggy. We cannot neglect it either 
> because
> that's the issue faced by Al Stone and hence I started the patch set. So, I 
> am thinking
> to use long jump technique *only* for efi_reset_system(). Would you be OK 
> with that?
> Or please propose a new solution.

The long jump thing is super ugly.  I'd be okay with it if it were
somehow factored out into a testable mechanism, but it's a nasty
complicated bit of assembly, it will only be used very very rarely,
and it more or less duplicates switch_to().

Could you just do a fallback reboot from the fault handler if
efi_reset_system() fails?

>
> 2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got this 
> working with
> the existing "efi_rts_wq" setup. Please note that, the existing "efi_rts_wq", 
> has only one
> worker thread. Could you please brief me on why we should have a dedicated 
> kthread
> and not leverage "efi_rts_wq" (I am unclear on this).

Two reasons:

1. If you used a dedicated thread, then you could probably just set
that thread's ->mm to the EFI mm rather than manually switching the
mm.

2. If you used a dedicated thread, you would be a lot closer to being
able to call into EFI at CPL 3.  In x86 Linux (and probably most or
all other arches), the kernel never calls into CPL 3.  Instead it
*returns* into CPL 3.  So you would set up a thread and make that
thread's task_pt_regs() have the correct context for a little stub
that calls into EFI and then does "syscall".  Then you would return
all the way out of the kernel into the stub.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-08-07 Thread Prakhya, Sai Praneeth
> > As discussed in previous mails, I have implemented the below:
> >
> > If kernel detects an illegal access by firmware to any efi region
> > other than efi boot time code/data, a. It freezes "efi_rts_wq" (efi
> > runtime services work queue).
> > b. Signals the requested process that an error occurred.
> > c. Disables EFI Runtime Services forever.
> > d. Explicitly calls the scheduler to schedule another process.
> >
> > But, I have couple of questions:
> >
> > 1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above
> > solution doesn't work if efi_reset_system() is buggy. We cannot
> > neglect it either because that's the issue faced by Al Stone and hence
> > I started the patch set. So, I am thinking to use long jump technique 
> > *only* for
> efi_reset_system(). Would you be OK with that?
> > Or please propose a new solution.
> >
> 
> What would be the point of that? Can you reasonably expect the kernel to still
> run reliably after it has taken itself down, called
> efi_reset_system() and failed?
> 

The problem with efi_reset_system() being buggy is,
If user calls "reboot", kernel would call efi_reset_system() and if unhandled, 
kernel 
hangs and never reboots. But, if handled and returning error (either using long 
jump 
or some other technique), kernel could continue rebooting through other fall 
backs. 
I think, this is true only for x86 though.

Please note that native_machine_emergency_restart() would call "BOOT_BIOS", 
if efi_reboot() returns in arch/x86/kernel/reboot

> > 2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got
> > this working with the existing "efi_rts_wq" setup. Please note that,
> > the existing "efi_rts_wq", has only one worker thread. Could you
> > please brief me on why we should have a dedicated kthread and not leverage
> "efi_rts_wq" (I am unclear on this).
> >
> 
> Does every work queue have its own dedicated kernel thread?

Hmm.. I think so.. I referred to the below articles while working on 
"efi_rts_wq"

They say
1. Each workqueue has one or more dedicated worker threads (one per CPU, by 
default) associated with it.
2. Tasks running out of a special-purpose workqueue can sleep indefinitely, 
since they will not interfere with tasks in any other queue.

[1] https://lwn.net/Articles/23634/
[2] https://lwn.net/Articles/11360/

Regards,
Sai


Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-08-07 Thread Ard Biesheuvel
On 7 August 2018 at 20:01, Prakhya, Sai Praneeth
 wrote:
> Hi All,
>
>> > >> The main problem is that we have just merged Sai's code to use a
>> > >> work queue for invoking EFI services, and killing kworker threads
>> > >> is obviously not going to make EFI any new friends.
>> > >>
>> > >> So I guess we should probably rework that code to use a dedicated
>> > >> kthread, and just freeze it when it performs an illegal memory
>> > >> access, and signal the completion in a way that makes the calling
>> > >> thread understand that a) the call failed and b) runtime services
>> > >> are no longer
>> > available.
>> > >
>> > > Yes, this makes sense to me.
>> > > Initially I did use a dedicated kthread for efi but moved to work
>> > > queues later so
>> > that the synchronization is well handled. I am ok to rework on the
>> > patches, could we ask Ingo to hold onto efi_workqueue patches?
>> > >
>> >
>> > I am fine with keeping them. We will have a different approach in
>> > v4.19 than in subsequent kernels, but the workqueue approach is still
>> > better than nothing at all.
>
> As discussed in previous mails, I have implemented the below:
>
> If kernel detects an illegal access by firmware to any efi region other than
> efi boot time code/data,
> a. It freezes "efi_rts_wq" (efi runtime services work queue).
> b. Signals the requested process that an error occurred.
> c. Disables EFI Runtime Services forever.
> d. Explicitly calls the scheduler to schedule another process.
>
> But, I have couple of questions:
>
> 1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above solution
> doesn't work if efi_reset_system() is buggy. We cannot neglect it either 
> because
> that's the issue faced by Al Stone and hence I started the patch set. So, I 
> am thinking
> to use long jump technique *only* for efi_reset_system(). Would you be OK 
> with that?
> Or please propose a new solution.
>

What would be the point of that? Can you reasonably expect the kernel
to still run reliably after it has taken itself down, called
efi_reset_system() and failed?

> 2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got this 
> working with
> the existing "efi_rts_wq" setup. Please note that, the existing "efi_rts_wq", 
> has only one
> worker thread. Could you please brief me on why we should have a dedicated 
> kthread
> and not leverage "efi_rts_wq" (I am unclear on this).
>

Does every work queue have its own dedicated kernel thread?
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-08-07 Thread Prakhya, Sai Praneeth
Hi All,

> > >> The main problem is that we have just merged Sai's code to use a
> > >> work queue for invoking EFI services, and killing kworker threads
> > >> is obviously not going to make EFI any new friends.
> > >>
> > >> So I guess we should probably rework that code to use a dedicated
> > >> kthread, and just freeze it when it performs an illegal memory
> > >> access, and signal the completion in a way that makes the calling
> > >> thread understand that a) the call failed and b) runtime services
> > >> are no longer
> > available.
> > >
> > > Yes, this makes sense to me.
> > > Initially I did use a dedicated kthread for efi but moved to work
> > > queues later so
> > that the synchronization is well handled. I am ok to rework on the
> > patches, could we ask Ingo to hold onto efi_workqueue patches?
> > >
> >
> > I am fine with keeping them. We will have a different approach in
> > v4.19 than in subsequent kernels, but the workqueue approach is still
> > better than nothing at all.

As discussed in previous mails, I have implemented the below:

If kernel detects an illegal access by firmware to any efi region other than 
efi boot time code/data,
a. It freezes "efi_rts_wq" (efi runtime services work queue).
b. Signals the requested process that an error occurred.
c. Disables EFI Runtime Services forever.
d. Explicitly calls the scheduler to schedule another process.

But, I have couple of questions:

1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above solution 
doesn't work if efi_reset_system() is buggy. We cannot neglect it either 
because 
that's the issue faced by Al Stone and hence I started the patch set. So, I am 
thinking 
to use long jump technique *only* for efi_reset_system(). Would you be OK with 
that?
Or please propose a new solution.

2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got this 
working with 
the existing "efi_rts_wq" setup. Please note that, the existing "efi_rts_wq", 
has only one 
worker thread. Could you please brief me on why we should have a dedicated 
kthread 
and not leverage "efi_rts_wq" (I am unclear on this).

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Prakhya, Sai Praneeth
> >> The main problem is that we have just merged Sai's code to use a work
> >> queue for invoking EFI services, and killing kworker threads is
> >> obviously not going to make EFI any new friends.
> >>
> >> So I guess we should probably rework that code to use a dedicated
> >> kthread, and just freeze it when it performs an illegal memory
> >> access, and signal the completion in a way that makes the calling
> >> thread understand that a) the call failed and b) runtime services are no 
> >> longer
> available.
> >
> > Yes, this makes sense to me.
> > Initially I did use a dedicated kthread for efi but moved to work queues 
> > later so
> that the synchronization is well handled. I am ok to rework on the patches, 
> could
> we ask Ingo to hold onto efi_workqueue patches?
> >
> 
> I am fine with keeping them. We will have a different approach in
> v4.19 than in subsequent kernels, but the workqueue approach is still better 
> than
> nothing at all.

Makes sense to me.



Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Ard Biesheuvel
On 26 July 2018 at 20:51, Prakhya, Sai Praneeth
 wrote:
>> > The basic idea is that, when you get an exception from a context that
>> > is busted (i.e. it wasn't user code with a signal handler or kernel
>> > code with a fixup), you can't return from the exception handler.  That
>> > leaves two choices:
>> >
>> > 1. Kill the task without returning.  You could call do_exit(), or,
>> > roughly equivalently, let yourself OOPS (fall through to die(), etc).
>> > Then you have to make sure that the code that called into the thread
>> > can handle it by signaling some completion first or whatever you need
>> > to do.
>> >
>> > 2. Sleep forever.  For example, set the task state to
>> > TASK_UNINTERRUPTIBLE and reschedule.
>> >
>> > If the thread is a real kthread, (1) may be a bad idea. I’m not sure a
>> > kthread can tolerate do_exit(), since the kthread core plays awful
>> > games involving storing important data structures on the stack. Also,
>> > with (2), it might be possible to enable some after-the-fact
>> > debugging, since the full crashing state is still there.
>> >
>>
>> The main problem is that we have just merged Sai's code to use a work queue
>> for invoking EFI services, and killing kworker threads is obviously not 
>> going to
>> make EFI any new friends.
>>
>> So I guess we should probably rework that code to use a dedicated kthread, 
>> and
>> just freeze it when it performs an illegal memory access, and signal the
>> completion in a way that makes the calling thread understand that a) the call
>> failed and b) runtime services are no longer available.
>
> Yes, this makes sense to me.
> Initially I did use a dedicated kthread for efi but moved to work queues 
> later so that the synchronization is well handled. I am ok to rework on the 
> patches, could we ask Ingo to hold onto efi_workqueue patches?
>

I am fine with keeping them. We will have a different approach in
v4.19 than in subsequent kernels, but the workqueue approach is still
better than nothing at all.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Prakhya, Sai Praneeth
> > The basic idea is that, when you get an exception from a context that
> > is busted (i.e. it wasn't user code with a signal handler or kernel
> > code with a fixup), you can't return from the exception handler.  That
> > leaves two choices:
> >
> > 1. Kill the task without returning.  You could call do_exit(), or,
> > roughly equivalently, let yourself OOPS (fall through to die(), etc).
> > Then you have to make sure that the code that called into the thread
> > can handle it by signaling some completion first or whatever you need
> > to do.
> >
> > 2. Sleep forever.  For example, set the task state to
> > TASK_UNINTERRUPTIBLE and reschedule.
> >
> > If the thread is a real kthread, (1) may be a bad idea. I’m not sure a
> > kthread can tolerate do_exit(), since the kthread core plays awful
> > games involving storing important data structures on the stack. Also,
> > with (2), it might be possible to enable some after-the-fact
> > debugging, since the full crashing state is still there.
> >
> 
> The main problem is that we have just merged Sai's code to use a work queue
> for invoking EFI services, and killing kworker threads is obviously not going 
> to
> make EFI any new friends.
> 
> So I guess we should probably rework that code to use a dedicated kthread, and
> just freeze it when it performs an illegal memory access, and signal the
> completion in a way that makes the calling thread understand that a) the call
> failed and b) runtime services are no longer available.

Yes, this makes sense to me.
Initially I did use a dedicated kthread for efi but moved to work queues later 
so that the synchronization is well handled. I am ok to rework on the patches, 
could we ask Ingo to hold onto efi_workqueue patches?

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Ard Biesheuvel
On 26 July 2018 at 20:09, Andy Lutomirski  wrote:
> [sorry for totally busted formatting -- your email client and Gmail
> don't get along]
>
> On Jul 26, 2018, at 10:23 AM, Prakhya, Sai Praneeth
>  wrote:
>
>> I have added some x86/intel folks to cc.
>>
>> I am fine with these patches, and I think it is useful to be able to
>> detect and recover from buggy UEFI implementations that use boot time
>> regions at runtime.
>>
>> However, I need help from the x86 maintainers/developers to review
>> this so please cc them on these patches.
>>
>>
>>
>> I'm okay with the general concept, but I'm not really thrilled by the 
>> longjmp-like approach.
>>
>>
>>
>> Wasn't there a bunch of talk of having an actual kernel thread (kefid?) that 
>> makes runtime services calls?  Did that actually get implemented?  IMO a 
>> much nicer approach would be to handle the page fault by killing the thread, 
>> more or less like how we kill unruly user threads.  (And it's yet another 
>> step toward calling EFI runtime services at CPL 3!)
>>
>>
>>
>> Hi Andy,
>>
>>
>>
>> Thanks for the feedback J.
>>
>>
>>
>> We have efi_kthread implemented and I did briefly think about killing the 
>> efi_kthread approach, but I thought it might not be possible (I might be 
>> wrong) because, we are in page fault handler and if we kill efi_kthread, the 
>> page fault handler still returns to firmware (because a firmware instruction 
>> caused page fault) and firmware will try to perform illegal access again 
>> thinking that the page fault handler might have fixed the fault. So, I took 
>> this approach of jumping out of firmware.
>>
>>
>>
>> Please let me know If you think I missed something.
>
> The basic idea is that, when you get an exception from a context that
> is busted (i.e. it wasn't user code with a signal handler or kernel
> code with a fixup), you can't return from the exception handler.  That
> leaves two choices:
>
> 1. Kill the task without returning.  You could call do_exit(), or,
> roughly equivalently, let yourself OOPS (fall through to die(), etc).
> Then you have to make sure that the code that called into the thread
> can handle it by signaling some completion first or whatever you need
> to do.
>
> 2. Sleep forever.  For example, set the task state to
> TASK_UNINTERRUPTIBLE and reschedule.
>
> If the thread is a real kthread, (1) may be a bad idea. I’m not sure a
> kthread can tolerate do_exit(), since the kthread core plays awful
> games involving storing important data structures on the stack. Also,
> with (2), it might be possible to enable some after-the-fact
> debugging, since the full crashing state is still there.
>

The main problem is that we have just merged Sai's code to use a work
queue for invoking EFI services, and killing kworker threads is
obviously not going to make EFI any new friends.

So I guess we should probably rework that code to use a dedicated
kthread, and just freeze it when it performs an illegal memory access,
and signal the completion in a way that makes the calling thread
understand that a) the call failed and b) runtime services are no
longer available.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Andy Lutomirski
[sorry for totally busted formatting -- your email client and Gmail
don't get along]

On Jul 26, 2018, at 10:23 AM, Prakhya, Sai Praneeth
 wrote:

> I have added some x86/intel folks to cc.
>
> I am fine with these patches, and I think it is useful to be able to
> detect and recover from buggy UEFI implementations that use boot time
> regions at runtime.
>
> However, I need help from the x86 maintainers/developers to review
> this so please cc them on these patches.
>
>
>
> I'm okay with the general concept, but I'm not really thrilled by the 
> longjmp-like approach.
>
>
>
> Wasn't there a bunch of talk of having an actual kernel thread (kefid?) that 
> makes runtime services calls?  Did that actually get implemented?  IMO a much 
> nicer approach would be to handle the page fault by killing the thread, more 
> or less like how we kill unruly user threads.  (And it's yet another step 
> toward calling EFI runtime services at CPL 3!)
>
>
>
> Hi Andy,
>
>
>
> Thanks for the feedback J.
>
>
>
> We have efi_kthread implemented and I did briefly think about killing the 
> efi_kthread approach, but I thought it might not be possible (I might be 
> wrong) because, we are in page fault handler and if we kill efi_kthread, the 
> page fault handler still returns to firmware (because a firmware instruction 
> caused page fault) and firmware will try to perform illegal access again 
> thinking that the page fault handler might have fixed the fault. So, I took 
> this approach of jumping out of firmware.
>
>
>
> Please let me know If you think I missed something.

The basic idea is that, when you get an exception from a context that
is busted (i.e. it wasn't user code with a signal handler or kernel
code with a fixup), you can't return from the exception handler.  That
leaves two choices:

1. Kill the task without returning.  You could call do_exit(), or,
roughly equivalently, let yourself OOPS (fall through to die(), etc).
Then you have to make sure that the code that called into the thread
can handle it by signaling some completion first or whatever you need
to do.

2. Sleep forever.  For example, set the task state to
TASK_UNINTERRUPTIBLE and reschedule.

If the thread is a real kthread, (1) may be a bad idea. I’m not sure a
kthread can tolerate do_exit(), since the kthread core plays awful
games involving storing important data structures on the stack. Also,
with (2), it might be possible to enable some after-the-fact
debugging, since the full crashing state is still there.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Prakhya, Sai Praneeth
I have added some x86/intel folks to cc.

I am fine with these patches, and I think it is useful to be able to
detect and recover from buggy UEFI implementations that use boot time
regions at runtime.

However, I need help from the x86 maintainers/developers to review
this so please cc them on these patches.

I'm okay with the general concept, but I'm not really thrilled by the 
longjmp-like approach.

Wasn't there a bunch of talk of having an actual kernel thread (kefid?) that 
makes runtime services calls?  Did that actually get implemented?  IMO a much 
nicer approach would be to handle the page fault by killing the thread, more or 
less like how we kill unruly user threads.  (And it's yet another step toward 
calling EFI runtime services at CPL 3!)

Hi Andy,

Thanks for the feedback ☺.

We have efi_kthread implemented and I did briefly think about killing the 
efi_kthread approach, but I thought it might not be possible (I might be wrong) 
because, we are in page fault handler and if we kill efi_kthread, the page 
fault handler still returns to firmware (because a firmware instruction caused 
page fault) and firmware will try to perform illegal access again thinking that 
the page fault handler might have fixed the fault. So, I took this approach of 
jumping out of firmware.

Please let me know If you think I missed something.

Regards,
Sai


Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Andy Lutomirski
On Wed, Jul 25, 2018 at 2:39 AM, Ard Biesheuvel 
wrote:

> (+ Ingo, Andy, Peter)
>
> Hello Sai,
>
> I have added some x86/intel folks to cc.
>
> I am fine with these patches, and I think it is useful to be able to
> detect and recover from buggy UEFI implementations that use boot time
> regions at runtime.
>
> However, I need help from the x86 maintainers/developers to review
> this so please cc them on these patches.
>
>
I'm okay with the general concept, but I'm not really thrilled by the
longjmp-like approach.

Wasn't there a bunch of talk of having an actual kernel thread (kefid?)
that makes runtime services calls?  Did that actually get implemented?  IMO
a much nicer approach would be to handle the page fault by killing the
thread, more or less like how we kill unruly user threads.  (And it's yet
another step toward calling EFI runtime services at CPL 3!)

--Andy


RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Prakhya, Sai Praneeth
> > AFAIK, efi runtime services are not reentrant. With this in mind, if 
> > something
> like above happens, I have completely turned off EFI runtime services in 
> kernel.
> Is that OK? Or should we keep them enabled hoping to catch further illegal
> accesses (assuming that this feature is not used in production kernels).
> >
> 
> I think it is reasonable to turn off services after that. The only problem is 
> that
> distros will never be able to enable this, given that it may break systems 
> that are
> working fine today.

Actually these patches shouldn't break any existing behavior. Below are the 
possible illegal accesses.

1. If the illegal access was to boot time region, presently, it works during 
kernel boot but not after kernel boot, because we free boot time regions after 
set_virtual_address_map() is called. Please see 
efi_reserve/free_boot_services(). With the patches, we save boot time regions 
forever and hence illegal access could be fixed even after kernel boot. So, 
distros shouldn't see anything different here.

2. If the illegal access was to any other region except boot time region, 
presently, kernel panics both during and after kernel boot (this is the case 
reported by Al Stone). With these patches, we exit firmware context and hence 
fixup page fault handler. So, distros here, instead of seeing a kernel panic 
would see EFI Runtime Services disabled.

Regards,
Sai


Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-26 Thread Ard Biesheuvel
On 25 July 2018 at 19:32, Prakhya, Sai Praneeth
 wrote:
>> I have added some x86/intel folks to cc.
>>
>> I am fine with these patches, and I think it is useful to be able to detect 
>> and
>> recover from buggy UEFI implementations that use boot time regions at
>> runtime.
>>
>> However, I need help from the x86 maintainers/developers to review this so
>> please cc them on these patches.
>
> Hi Ard,
>
> Sure! I will keep them cc'ed.
>
> Could you also please let me know you thoughts on this approach
>
> If the illegal access occurs to any EFI region other than EFI boot time 
> regions (Eg: EFI conventional memory or EFI loader code/data), these patches 
> will exit firmware context and return to kernel i.e. we are adjusting RIP and 
> RSP in efi page fault handler and leaving runtime service execution abruptly. 
> Is that OK?
>

I need the x86 guys to tell me if that is OK. This is essentially an
open coded longjmp(), which smells dodgy to me, but this is an x86
question not an EFI question.

> This code in "[PATCH RFC 4/8] x86/efi: Add page fault handler to 
> fixup/recover from page faults caused by firmware"
> +   regs->sp = xmm_regs_rsp;
> +   regs->ip = exit_fw_ctx_rip;
> +   exited_fw_ctx = true;
> +   clear_bit(EFI_RUNTIME_SERVICES, );
> +   pr_info("Exited Firmware context and disabled EFI Runtime 
> Services\n");
>
> AFAIK, efi runtime services are not reentrant. With this in mind, if 
> something like above happens, I have completely turned off EFI runtime 
> services in kernel. Is that OK? Or should we keep them enabled hoping to 
> catch further illegal accesses (assuming that this feature is not used in 
> production kernels).
>

I think it is reasonable to turn off services after that. The only
problem is that distros will never be able to enable this, given that
it may break systems that are working fine today.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-25 Thread Prakhya, Sai Praneeth
> I have added some x86/intel folks to cc.
> 
> I am fine with these patches, and I think it is useful to be able to detect 
> and
> recover from buggy UEFI implementations that use boot time regions at
> runtime.
> 
> However, I need help from the x86 maintainers/developers to review this so
> please cc them on these patches.

Hi Ard,

Sure! I will keep them cc'ed.

Could you also please let me know you thoughts on this approach

If the illegal access occurs to any EFI region other than EFI boot time regions 
(Eg: EFI conventional memory or EFI loader code/data), these patches will exit 
firmware context and return to kernel i.e. we are adjusting RIP and RSP in efi 
page fault handler and leaving runtime service execution abruptly. Is that OK?

This code in "[PATCH RFC 4/8] x86/efi: Add page fault handler to fixup/recover 
from page faults caused by firmware"
+   regs->sp = xmm_regs_rsp;
+   regs->ip = exit_fw_ctx_rip;
+   exited_fw_ctx = true;
+   clear_bit(EFI_RUNTIME_SERVICES, );
+   pr_info("Exited Firmware context and disabled EFI Runtime Services\n");

AFAIK, efi runtime services are not reentrant. With this in mind, if something 
like above happens, I have completely turned off EFI runtime services in 
kernel. Is that OK? Or should we keep them enabled hoping to catch further 
illegal accesses (assuming that this feature is not used in production kernels).

Regards,
Sai
N�r��y���b�X��ǧv�^�)޺{.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�

Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-25 Thread Ard Biesheuvel
(+ Ingo, Andy, Peter)

Hello Sai,

I have added some x86/intel folks to cc.

I am fine with these patches, and I think it is useful to be able to
detect and recover from buggy UEFI implementations that use boot time
regions at runtime.

However, I need help from the x86 maintainers/developers to review
this so please cc them on these patches.

Thanks,
Ard.

On 22 July 2018 at 07:41, Sai Praneeth Prakhya
 wrote:
> From: Sai Praneeth 
>
> There may exist some buggy UEFI firmware implementations that access efi
> memory regions other than EFI_RUNTIME_SERVICES_ even after
> kernel has assumed control of the platform. This violates UEFI
> specification. Here, we provide a debug config option which when enabled
> detects and fixes up/recovers from page faults caused by buggy firmware.
>
> The above said illegal accesses trigger page fault in ring 0 because
> firmware executes at ring 0 and if unhandled it hangs the kernel. We
> provide an efi specific page fault handler to:
> 1. Avoid panics/hangs caused by buggy firmware.
> 2. Shout loud that the firmware is buggy and hence can save ourselves
> from being blamed for not a fault of ours.
>
> Depending on the illegally accessed efi region, the efi page fault
> handler handles illegal accesses differently.
> 1. If the illegally accessed region is EFI_BOOT_SERVICES_,
> the page fault handler fixes it up by mapping the requested region.
> 2. If the illegally accessed region is any other efi region (like
> EFI_CONVENTIONAL_MEMORY or EFI_LOADER_), the page fault
> handler exits firmware context and disables EFI Runtime Services, so
> that we will never again call buggy firmware.
>
> Page faults to efi regions are handled differently because, presently
> during kernel boot, EFI_BOOT_SERVICES_ regions are reserved
> by kernel and hence it's OK to dynamically map these regions in page
> fault handler. The same approach cannot be followed for other efi
> regions like EFI_CONVENTIONAL_MEMORY and EFI_LOADER_ as they
> are very huge in size and reserving them could make the kernel
> un-bootable. Hence, we take a different approach (exiting firmware
> context) while dealing with page faults to these regions. This also
> saves us from executing buggy firmware further.
>
> Exiting firmware context means that on every entry to firmware we save
> the kernel context before calling firmware and if the firmware
> misbehaves, in the page fault handler, we roll back to the saved kernel
> context. Saving kernel context means saving the stack pointer and the
> instruction that gets executed when firmware returns. In the page fault
> handler we fix up these two things (RIP and RSP) so that when returning
> from page fault handler it looks as if firmware has called RET.
>
> This issue was reported by Al Stone when he saw that reboot via EFI hangs
> the machine. Upon debugging, I found that it's efi_reset_system() that's
> touching memory regions which it shouldn't. To reproduce the same
> behavior, I have hacked OVMF and made efi_reset_system() buggy.
>
> Testing the patch set:
> --
> 1. Download buggy firmware from here [1].
> 2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
> Add reboot=efi to the kernel command line arguments and after the kernel
> is up and running, type "reboot". The kernel should hang while rebooting.
> 3. With the same setup, boot kernel after applying patches and the
> reboot should work fine. Also please notice warning/error messages
> printed by kernel.
>
> Note:
> -
> Patch set based on "next" branch in efi tree.
>
> [1] https://drive.google.com/open?id=1tkvT7GaVX2zSlzy1HK1T4Tv8cT36GP6R
>
> Sai Praneeth (8):
>   x86/efi: Remove __init attribute from memory mapping functions
>   x86/efi: Permanently save the EFI_MEMORY_MAP passed by firmware
>   x86/efi: Save kernel context before calling EFI Runtime Services
>   x86/efi: Add page fault handler to fixup/recover from page faults
> caused by firmware
>   x86/efi: If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled don't call
> efi_free_boot_services()
>   x86/efi: Map EFI_BOOT_SERVICES_ regions only when
> EFI_WARN_ON_ILLEGAL_ACCESSES is disabled
>   x86/mm: If in_atomic(), allocate pages without sleeping
>   x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES
>
>  arch/x86/Kconfig|  17 +++
>  arch/x86/include/asm/efi.h  |  42 ++-
>  arch/x86/mm/fault.c |   9 ++
>  arch/x86/mm/pageattr.c  |  16 ++-
>  arch/x86/platform/efi/efi.c |  10 +-
>  arch/x86/platform/efi/efi_32.c  |   2 +-
>  arch/x86/platform/efi/efi_64.c  |  16 ++-
>  arch/x86/platform/efi/efi_stub_64.S | 101 -
>  arch/x86/platform/efi/quirks.c  | 193 
> 
>  drivers/firmware/efi/efi.c  |   6 +-
>  drivers/firmware/efi/runtime-wrappers.c |   6 +
>  include/linux/efi.h |  16 ++-
>  init/main.c

[PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_ even after
kernel has assumed control of the platform. This violates UEFI
specification. Here, we provide a debug config option which when enabled
detects and fixes up/recovers from page faults caused by buggy firmware.

The above said illegal accesses trigger page fault in ring 0 because
firmware executes at ring 0 and if unhandled it hangs the kernel. We
provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence can save ourselves
from being blamed for not a fault of ours.

Depending on the illegally accessed efi region, the efi page fault
handler handles illegal accesses differently.
1. If the illegally accessed region is EFI_BOOT_SERVICES_,
the page fault handler fixes it up by mapping the requested region.
2. If the illegally accessed region is any other efi region (like
EFI_CONVENTIONAL_MEMORY or EFI_LOADER_), the page fault
handler exits firmware context and disables EFI Runtime Services, so
that we will never again call buggy firmware.

Page faults to efi regions are handled differently because, presently
during kernel boot, EFI_BOOT_SERVICES_ regions are reserved
by kernel and hence it's OK to dynamically map these regions in page
fault handler. The same approach cannot be followed for other efi
regions like EFI_CONVENTIONAL_MEMORY and EFI_LOADER_ as they
are very huge in size and reserving them could make the kernel
un-bootable. Hence, we take a different approach (exiting firmware
context) while dealing with page faults to these regions. This also
saves us from executing buggy firmware further.

Exiting firmware context means that on every entry to firmware we save
the kernel context before calling firmware and if the firmware
misbehaves, in the page fault handler, we roll back to the saved kernel
context. Saving kernel context means saving the stack pointer and the
instruction that gets executed when firmware returns. In the page fault
handler we fix up these two things (RIP and RSP) so that when returning
from page fault handler it looks as if firmware has called RET.

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy.

Testing the patch set:
--
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
Add reboot=efi to the kernel command line arguments and after the kernel
is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the
reboot should work fine. Also please notice warning/error messages
printed by kernel.

Note:
-
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/open?id=1tkvT7GaVX2zSlzy1HK1T4Tv8cT36GP6R

Sai Praneeth (8):
  x86/efi: Remove __init attribute from memory mapping functions
  x86/efi: Permanently save the EFI_MEMORY_MAP passed by firmware
  x86/efi: Save kernel context before calling EFI Runtime Services
  x86/efi: Add page fault handler to fixup/recover from page faults
caused by firmware
  x86/efi: If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled don't call
efi_free_boot_services()
  x86/efi: Map EFI_BOOT_SERVICES_ regions only when
EFI_WARN_ON_ILLEGAL_ACCESSES is disabled
  x86/mm: If in_atomic(), allocate pages without sleeping
  x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

 arch/x86/Kconfig|  17 +++
 arch/x86/include/asm/efi.h  |  42 ++-
 arch/x86/mm/fault.c |   9 ++
 arch/x86/mm/pageattr.c  |  16 ++-
 arch/x86/platform/efi/efi.c |  10 +-
 arch/x86/platform/efi/efi_32.c  |   2 +-
 arch/x86/platform/efi/efi_64.c  |  16 ++-
 arch/x86/platform/efi/efi_stub_64.S | 101 -
 arch/x86/platform/efi/quirks.c  | 193 
 drivers/firmware/efi/efi.c  |   6 +-
 drivers/firmware/efi/runtime-wrappers.c |   6 +
 include/linux/efi.h |  16 ++-
 init/main.c |   3 +-
 13 files changed, 415 insertions(+), 22 deletions(-)

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Cc: Al Stone 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html