RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
> > 1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above > > solution doesn't work if efi_reset_system() is buggy. We cannot > > neglect it either because that's the issue faced by Al Stone and hence > > I started the patch set. So, I am thinking to use long jump technique > > *only* for > efi_reset_system(). Would you be OK with that? > > Or please propose a new solution. > > The long jump thing is super ugly. I'd be okay with it if it were somehow > factored out into a testable mechanism, but it's a nasty complicated bit of > assembly, it will only be used very very rarely, and it more or less > duplicates > switch_to(). Yes, I agree that it's complicated assembly. But I tested it by hacking OVMF. And.. you are correct that it's very very rarely used.. so probably there is no point adding complicated code for it. > > Could you just do a fallback reboot from the fault handler if > efi_reset_system() fails? > Sure! I will try that and will update you on how that goes. Hopefully, well. > > > > 2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got > > this working with the existing "efi_rts_wq" setup. Please note that, > > the existing "efi_rts_wq", has only one worker thread. Could you > > please brief me on why we should have a dedicated kthread and not leverage > "efi_rts_wq" (I am unclear on this). > > Two reasons: > > 1. If you used a dedicated thread, then you could probably just set that > thread's - > >mm to the EFI mm rather than manually switching the mm. That means, we don't need efi_switch_mm() anymore and could just get away by changing efi_kthread ->mm = efi_mm. Is that what you meant? Just wanted to make sure, I understood you correctly. > > 2. If you used a dedicated thread, you would be a lot closer to being able to > call > into EFI at CPL 3. In x86 Linux (and probably most or all other arches), the > kernel > never calls into CPL 3. Instead it > *returns* into CPL 3. So you would set up a thread and make that thread's > task_pt_regs() have the correct context for a little stub that calls into EFI > and > then does "syscall". Then you would return all the way out of the kernel > into the > stub. I got it.. not 100% though.. but I will surely work on this because it sounds very interesting to me. But still, I have a question, this article https://lwn.net/Articles/23634/ says that "each workqueue has one or more dedicated worker threads (one per CPU, by default) associated with it" So, do you think we still need our own efi dedicated kthread? Regards, Sai N�r��y���b�X��ǧv�^�){.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�
Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
On Tue, Aug 7, 2018 at 11:01 AM, Prakhya, Sai Praneeth wrote: > Hi All, > >> > >> The main problem is that we have just merged Sai's code to use a >> > >> work queue for invoking EFI services, and killing kworker threads >> > >> is obviously not going to make EFI any new friends. >> > >> >> > >> So I guess we should probably rework that code to use a dedicated >> > >> kthread, and just freeze it when it performs an illegal memory >> > >> access, and signal the completion in a way that makes the calling >> > >> thread understand that a) the call failed and b) runtime services >> > >> are no longer >> > available. >> > > >> > > Yes, this makes sense to me. >> > > Initially I did use a dedicated kthread for efi but moved to work >> > > queues later so >> > that the synchronization is well handled. I am ok to rework on the >> > patches, could we ask Ingo to hold onto efi_workqueue patches? >> > > >> > >> > I am fine with keeping them. We will have a different approach in >> > v4.19 than in subsequent kernels, but the workqueue approach is still >> > better than nothing at all. > > As discussed in previous mails, I have implemented the below: > > If kernel detects an illegal access by firmware to any efi region other than > efi boot time code/data, > a. It freezes "efi_rts_wq" (efi runtime services work queue). > b. Signals the requested process that an error occurred. > c. Disables EFI Runtime Services forever. > d. Explicitly calls the scheduler to schedule another process. > > But, I have couple of questions: > > 1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above solution > doesn't work if efi_reset_system() is buggy. We cannot neglect it either > because > that's the issue faced by Al Stone and hence I started the patch set. So, I > am thinking > to use long jump technique *only* for efi_reset_system(). Would you be OK > with that? > Or please propose a new solution. The long jump thing is super ugly. I'd be okay with it if it were somehow factored out into a testable mechanism, but it's a nasty complicated bit of assembly, it will only be used very very rarely, and it more or less duplicates switch_to(). Could you just do a fallback reboot from the fault handler if efi_reset_system() fails? > > 2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got this > working with > the existing "efi_rts_wq" setup. Please note that, the existing "efi_rts_wq", > has only one > worker thread. Could you please brief me on why we should have a dedicated > kthread > and not leverage "efi_rts_wq" (I am unclear on this). Two reasons: 1. If you used a dedicated thread, then you could probably just set that thread's ->mm to the EFI mm rather than manually switching the mm. 2. If you used a dedicated thread, you would be a lot closer to being able to call into EFI at CPL 3. In x86 Linux (and probably most or all other arches), the kernel never calls into CPL 3. Instead it *returns* into CPL 3. So you would set up a thread and make that thread's task_pt_regs() have the correct context for a little stub that calls into EFI and then does "syscall". Then you would return all the way out of the kernel into the stub. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
> > As discussed in previous mails, I have implemented the below: > > > > If kernel detects an illegal access by firmware to any efi region > > other than efi boot time code/data, a. It freezes "efi_rts_wq" (efi > > runtime services work queue). > > b. Signals the requested process that an error occurred. > > c. Disables EFI Runtime Services forever. > > d. Explicitly calls the scheduler to schedule another process. > > > > But, I have couple of questions: > > > > 1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above > > solution doesn't work if efi_reset_system() is buggy. We cannot > > neglect it either because that's the issue faced by Al Stone and hence > > I started the patch set. So, I am thinking to use long jump technique > > *only* for > efi_reset_system(). Would you be OK with that? > > Or please propose a new solution. > > > > What would be the point of that? Can you reasonably expect the kernel to still > run reliably after it has taken itself down, called > efi_reset_system() and failed? > The problem with efi_reset_system() being buggy is, If user calls "reboot", kernel would call efi_reset_system() and if unhandled, kernel hangs and never reboots. But, if handled and returning error (either using long jump or some other technique), kernel could continue rebooting through other fall backs. I think, this is true only for x86 though. Please note that native_machine_emergency_restart() would call "BOOT_BIOS", if efi_reboot() returns in arch/x86/kernel/reboot > > 2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got > > this working with the existing "efi_rts_wq" setup. Please note that, > > the existing "efi_rts_wq", has only one worker thread. Could you > > please brief me on why we should have a dedicated kthread and not leverage > "efi_rts_wq" (I am unclear on this). > > > > Does every work queue have its own dedicated kernel thread? Hmm.. I think so.. I referred to the below articles while working on "efi_rts_wq" They say 1. Each workqueue has one or more dedicated worker threads (one per CPU, by default) associated with it. 2. Tasks running out of a special-purpose workqueue can sleep indefinitely, since they will not interfere with tasks in any other queue. [1] https://lwn.net/Articles/23634/ [2] https://lwn.net/Articles/11360/ Regards, Sai
Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
On 7 August 2018 at 20:01, Prakhya, Sai Praneeth wrote: > Hi All, > >> > >> The main problem is that we have just merged Sai's code to use a >> > >> work queue for invoking EFI services, and killing kworker threads >> > >> is obviously not going to make EFI any new friends. >> > >> >> > >> So I guess we should probably rework that code to use a dedicated >> > >> kthread, and just freeze it when it performs an illegal memory >> > >> access, and signal the completion in a way that makes the calling >> > >> thread understand that a) the call failed and b) runtime services >> > >> are no longer >> > available. >> > > >> > > Yes, this makes sense to me. >> > > Initially I did use a dedicated kthread for efi but moved to work >> > > queues later so >> > that the synchronization is well handled. I am ok to rework on the >> > patches, could we ask Ingo to hold onto efi_workqueue patches? >> > > >> > >> > I am fine with keeping them. We will have a different approach in >> > v4.19 than in subsequent kernels, but the workqueue approach is still >> > better than nothing at all. > > As discussed in previous mails, I have implemented the below: > > If kernel detects an illegal access by firmware to any efi region other than > efi boot time code/data, > a. It freezes "efi_rts_wq" (efi runtime services work queue). > b. Signals the requested process that an error occurred. > c. Disables EFI Runtime Services forever. > d. Explicitly calls the scheduler to schedule another process. > > But, I have couple of questions: > > 1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above solution > doesn't work if efi_reset_system() is buggy. We cannot neglect it either > because > that's the issue faced by Al Stone and hence I started the patch set. So, I > am thinking > to use long jump technique *only* for efi_reset_system(). Would you be OK > with that? > Or please propose a new solution. > What would be the point of that? Can you reasonably expect the kernel to still run reliably after it has taken itself down, called efi_reset_system() and failed? > 2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got this > working with > the existing "efi_rts_wq" setup. Please note that, the existing "efi_rts_wq", > has only one > worker thread. Could you please brief me on why we should have a dedicated > kthread > and not leverage "efi_rts_wq" (I am unclear on this). > Does every work queue have its own dedicated kernel thread? -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
Hi All, > > >> The main problem is that we have just merged Sai's code to use a > > >> work queue for invoking EFI services, and killing kworker threads > > >> is obviously not going to make EFI any new friends. > > >> > > >> So I guess we should probably rework that code to use a dedicated > > >> kthread, and just freeze it when it performs an illegal memory > > >> access, and signal the completion in a way that makes the calling > > >> thread understand that a) the call failed and b) runtime services > > >> are no longer > > available. > > > > > > Yes, this makes sense to me. > > > Initially I did use a dedicated kthread for efi but moved to work > > > queues later so > > that the synchronization is well handled. I am ok to rework on the > > patches, could we ask Ingo to hold onto efi_workqueue patches? > > > > > > > I am fine with keeping them. We will have a different approach in > > v4.19 than in subsequent kernels, but the workqueue approach is still > > better than nothing at all. As discussed in previous mails, I have implemented the below: If kernel detects an illegal access by firmware to any efi region other than efi boot time code/data, a. It freezes "efi_rts_wq" (efi runtime services work queue). b. Signals the requested process that an error occurred. c. Disables EFI Runtime Services forever. d. Explicitly calls the scheduler to schedule another process. But, I have couple of questions: 1. Since we don't use "efi_rts_wq" for efi_reset_system(), the above solution doesn't work if efi_reset_system() is buggy. We cannot neglect it either because that's the issue faced by Al Stone and hence I started the patch set. So, I am thinking to use long jump technique *only* for efi_reset_system(). Would you be OK with that? Or please propose a new solution. 2. Also, as suggested, I haven’t used a dedicated "efi_kthread", I got this working with the existing "efi_rts_wq" setup. Please note that, the existing "efi_rts_wq", has only one worker thread. Could you please brief me on why we should have a dedicated kthread and not leverage "efi_rts_wq" (I am unclear on this). Regards, Sai N�r��y���b�X��ǧv�^�){.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�
RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
> >> The main problem is that we have just merged Sai's code to use a work > >> queue for invoking EFI services, and killing kworker threads is > >> obviously not going to make EFI any new friends. > >> > >> So I guess we should probably rework that code to use a dedicated > >> kthread, and just freeze it when it performs an illegal memory > >> access, and signal the completion in a way that makes the calling > >> thread understand that a) the call failed and b) runtime services are no > >> longer > available. > > > > Yes, this makes sense to me. > > Initially I did use a dedicated kthread for efi but moved to work queues > > later so > that the synchronization is well handled. I am ok to rework on the patches, > could > we ask Ingo to hold onto efi_workqueue patches? > > > > I am fine with keeping them. We will have a different approach in > v4.19 than in subsequent kernels, but the workqueue approach is still better > than > nothing at all. Makes sense to me.
Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
On 26 July 2018 at 20:51, Prakhya, Sai Praneeth wrote: >> > The basic idea is that, when you get an exception from a context that >> > is busted (i.e. it wasn't user code with a signal handler or kernel >> > code with a fixup), you can't return from the exception handler. That >> > leaves two choices: >> > >> > 1. Kill the task without returning. You could call do_exit(), or, >> > roughly equivalently, let yourself OOPS (fall through to die(), etc). >> > Then you have to make sure that the code that called into the thread >> > can handle it by signaling some completion first or whatever you need >> > to do. >> > >> > 2. Sleep forever. For example, set the task state to >> > TASK_UNINTERRUPTIBLE and reschedule. >> > >> > If the thread is a real kthread, (1) may be a bad idea. I’m not sure a >> > kthread can tolerate do_exit(), since the kthread core plays awful >> > games involving storing important data structures on the stack. Also, >> > with (2), it might be possible to enable some after-the-fact >> > debugging, since the full crashing state is still there. >> > >> >> The main problem is that we have just merged Sai's code to use a work queue >> for invoking EFI services, and killing kworker threads is obviously not >> going to >> make EFI any new friends. >> >> So I guess we should probably rework that code to use a dedicated kthread, >> and >> just freeze it when it performs an illegal memory access, and signal the >> completion in a way that makes the calling thread understand that a) the call >> failed and b) runtime services are no longer available. > > Yes, this makes sense to me. > Initially I did use a dedicated kthread for efi but moved to work queues > later so that the synchronization is well handled. I am ok to rework on the > patches, could we ask Ingo to hold onto efi_workqueue patches? > I am fine with keeping them. We will have a different approach in v4.19 than in subsequent kernels, but the workqueue approach is still better than nothing at all. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
> > The basic idea is that, when you get an exception from a context that > > is busted (i.e. it wasn't user code with a signal handler or kernel > > code with a fixup), you can't return from the exception handler. That > > leaves two choices: > > > > 1. Kill the task without returning. You could call do_exit(), or, > > roughly equivalently, let yourself OOPS (fall through to die(), etc). > > Then you have to make sure that the code that called into the thread > > can handle it by signaling some completion first or whatever you need > > to do. > > > > 2. Sleep forever. For example, set the task state to > > TASK_UNINTERRUPTIBLE and reschedule. > > > > If the thread is a real kthread, (1) may be a bad idea. I’m not sure a > > kthread can tolerate do_exit(), since the kthread core plays awful > > games involving storing important data structures on the stack. Also, > > with (2), it might be possible to enable some after-the-fact > > debugging, since the full crashing state is still there. > > > > The main problem is that we have just merged Sai's code to use a work queue > for invoking EFI services, and killing kworker threads is obviously not going > to > make EFI any new friends. > > So I guess we should probably rework that code to use a dedicated kthread, and > just freeze it when it performs an illegal memory access, and signal the > completion in a way that makes the calling thread understand that a) the call > failed and b) runtime services are no longer available. Yes, this makes sense to me. Initially I did use a dedicated kthread for efi but moved to work queues later so that the synchronization is well handled. I am ok to rework on the patches, could we ask Ingo to hold onto efi_workqueue patches? Regards, Sai N�r��y���b�X��ǧv�^�){.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�
Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
On 26 July 2018 at 20:09, Andy Lutomirski wrote: > [sorry for totally busted formatting -- your email client and Gmail > don't get along] > > On Jul 26, 2018, at 10:23 AM, Prakhya, Sai Praneeth > wrote: > >> I have added some x86/intel folks to cc. >> >> I am fine with these patches, and I think it is useful to be able to >> detect and recover from buggy UEFI implementations that use boot time >> regions at runtime. >> >> However, I need help from the x86 maintainers/developers to review >> this so please cc them on these patches. >> >> >> >> I'm okay with the general concept, but I'm not really thrilled by the >> longjmp-like approach. >> >> >> >> Wasn't there a bunch of talk of having an actual kernel thread (kefid?) that >> makes runtime services calls? Did that actually get implemented? IMO a >> much nicer approach would be to handle the page fault by killing the thread, >> more or less like how we kill unruly user threads. (And it's yet another >> step toward calling EFI runtime services at CPL 3!) >> >> >> >> Hi Andy, >> >> >> >> Thanks for the feedback J. >> >> >> >> We have efi_kthread implemented and I did briefly think about killing the >> efi_kthread approach, but I thought it might not be possible (I might be >> wrong) because, we are in page fault handler and if we kill efi_kthread, the >> page fault handler still returns to firmware (because a firmware instruction >> caused page fault) and firmware will try to perform illegal access again >> thinking that the page fault handler might have fixed the fault. So, I took >> this approach of jumping out of firmware. >> >> >> >> Please let me know If you think I missed something. > > The basic idea is that, when you get an exception from a context that > is busted (i.e. it wasn't user code with a signal handler or kernel > code with a fixup), you can't return from the exception handler. That > leaves two choices: > > 1. Kill the task without returning. You could call do_exit(), or, > roughly equivalently, let yourself OOPS (fall through to die(), etc). > Then you have to make sure that the code that called into the thread > can handle it by signaling some completion first or whatever you need > to do. > > 2. Sleep forever. For example, set the task state to > TASK_UNINTERRUPTIBLE and reschedule. > > If the thread is a real kthread, (1) may be a bad idea. I’m not sure a > kthread can tolerate do_exit(), since the kthread core plays awful > games involving storing important data structures on the stack. Also, > with (2), it might be possible to enable some after-the-fact > debugging, since the full crashing state is still there. > The main problem is that we have just merged Sai's code to use a work queue for invoking EFI services, and killing kworker threads is obviously not going to make EFI any new friends. So I guess we should probably rework that code to use a dedicated kthread, and just freeze it when it performs an illegal memory access, and signal the completion in a way that makes the calling thread understand that a) the call failed and b) runtime services are no longer available. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
[sorry for totally busted formatting -- your email client and Gmail don't get along] On Jul 26, 2018, at 10:23 AM, Prakhya, Sai Praneeth wrote: > I have added some x86/intel folks to cc. > > I am fine with these patches, and I think it is useful to be able to > detect and recover from buggy UEFI implementations that use boot time > regions at runtime. > > However, I need help from the x86 maintainers/developers to review > this so please cc them on these patches. > > > > I'm okay with the general concept, but I'm not really thrilled by the > longjmp-like approach. > > > > Wasn't there a bunch of talk of having an actual kernel thread (kefid?) that > makes runtime services calls? Did that actually get implemented? IMO a much > nicer approach would be to handle the page fault by killing the thread, more > or less like how we kill unruly user threads. (And it's yet another step > toward calling EFI runtime services at CPL 3!) > > > > Hi Andy, > > > > Thanks for the feedback J. > > > > We have efi_kthread implemented and I did briefly think about killing the > efi_kthread approach, but I thought it might not be possible (I might be > wrong) because, we are in page fault handler and if we kill efi_kthread, the > page fault handler still returns to firmware (because a firmware instruction > caused page fault) and firmware will try to perform illegal access again > thinking that the page fault handler might have fixed the fault. So, I took > this approach of jumping out of firmware. > > > > Please let me know If you think I missed something. The basic idea is that, when you get an exception from a context that is busted (i.e. it wasn't user code with a signal handler or kernel code with a fixup), you can't return from the exception handler. That leaves two choices: 1. Kill the task without returning. You could call do_exit(), or, roughly equivalently, let yourself OOPS (fall through to die(), etc). Then you have to make sure that the code that called into the thread can handle it by signaling some completion first or whatever you need to do. 2. Sleep forever. For example, set the task state to TASK_UNINTERRUPTIBLE and reschedule. If the thread is a real kthread, (1) may be a bad idea. I’m not sure a kthread can tolerate do_exit(), since the kthread core plays awful games involving storing important data structures on the stack. Also, with (2), it might be possible to enable some after-the-fact debugging, since the full crashing state is still there. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
I have added some x86/intel folks to cc. I am fine with these patches, and I think it is useful to be able to detect and recover from buggy UEFI implementations that use boot time regions at runtime. However, I need help from the x86 maintainers/developers to review this so please cc them on these patches. I'm okay with the general concept, but I'm not really thrilled by the longjmp-like approach. Wasn't there a bunch of talk of having an actual kernel thread (kefid?) that makes runtime services calls? Did that actually get implemented? IMO a much nicer approach would be to handle the page fault by killing the thread, more or less like how we kill unruly user threads. (And it's yet another step toward calling EFI runtime services at CPL 3!) Hi Andy, Thanks for the feedback ☺. We have efi_kthread implemented and I did briefly think about killing the efi_kthread approach, but I thought it might not be possible (I might be wrong) because, we are in page fault handler and if we kill efi_kthread, the page fault handler still returns to firmware (because a firmware instruction caused page fault) and firmware will try to perform illegal access again thinking that the page fault handler might have fixed the fault. So, I took this approach of jumping out of firmware. Please let me know If you think I missed something. Regards, Sai
Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
On Wed, Jul 25, 2018 at 2:39 AM, Ard Biesheuvel wrote: > (+ Ingo, Andy, Peter) > > Hello Sai, > > I have added some x86/intel folks to cc. > > I am fine with these patches, and I think it is useful to be able to > detect and recover from buggy UEFI implementations that use boot time > regions at runtime. > > However, I need help from the x86 maintainers/developers to review > this so please cc them on these patches. > > I'm okay with the general concept, but I'm not really thrilled by the longjmp-like approach. Wasn't there a bunch of talk of having an actual kernel thread (kefid?) that makes runtime services calls? Did that actually get implemented? IMO a much nicer approach would be to handle the page fault by killing the thread, more or less like how we kill unruly user threads. (And it's yet another step toward calling EFI runtime services at CPL 3!) --Andy
RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
> > AFAIK, efi runtime services are not reentrant. With this in mind, if > > something > like above happens, I have completely turned off EFI runtime services in > kernel. > Is that OK? Or should we keep them enabled hoping to catch further illegal > accesses (assuming that this feature is not used in production kernels). > > > > I think it is reasonable to turn off services after that. The only problem is > that > distros will never be able to enable this, given that it may break systems > that are > working fine today. Actually these patches shouldn't break any existing behavior. Below are the possible illegal accesses. 1. If the illegal access was to boot time region, presently, it works during kernel boot but not after kernel boot, because we free boot time regions after set_virtual_address_map() is called. Please see efi_reserve/free_boot_services(). With the patches, we save boot time regions forever and hence illegal access could be fixed even after kernel boot. So, distros shouldn't see anything different here. 2. If the illegal access was to any other region except boot time region, presently, kernel panics both during and after kernel boot (this is the case reported by Al Stone). With these patches, we exit firmware context and hence fixup page fault handler. So, distros here, instead of seeing a kernel panic would see EFI Runtime Services disabled. Regards, Sai
Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
On 25 July 2018 at 19:32, Prakhya, Sai Praneeth wrote: >> I have added some x86/intel folks to cc. >> >> I am fine with these patches, and I think it is useful to be able to detect >> and >> recover from buggy UEFI implementations that use boot time regions at >> runtime. >> >> However, I need help from the x86 maintainers/developers to review this so >> please cc them on these patches. > > Hi Ard, > > Sure! I will keep them cc'ed. > > Could you also please let me know you thoughts on this approach > > If the illegal access occurs to any EFI region other than EFI boot time > regions (Eg: EFI conventional memory or EFI loader code/data), these patches > will exit firmware context and return to kernel i.e. we are adjusting RIP and > RSP in efi page fault handler and leaving runtime service execution abruptly. > Is that OK? > I need the x86 guys to tell me if that is OK. This is essentially an open coded longjmp(), which smells dodgy to me, but this is an x86 question not an EFI question. > This code in "[PATCH RFC 4/8] x86/efi: Add page fault handler to > fixup/recover from page faults caused by firmware" > + regs->sp = xmm_regs_rsp; > + regs->ip = exit_fw_ctx_rip; > + exited_fw_ctx = true; > + clear_bit(EFI_RUNTIME_SERVICES, ); > + pr_info("Exited Firmware context and disabled EFI Runtime > Services\n"); > > AFAIK, efi runtime services are not reentrant. With this in mind, if > something like above happens, I have completely turned off EFI runtime > services in kernel. Is that OK? Or should we keep them enabled hoping to > catch further illegal accesses (assuming that this feature is not used in > production kernels). > I think it is reasonable to turn off services after that. The only problem is that distros will never be able to enable this, given that it may break systems that are working fine today. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
> I have added some x86/intel folks to cc. > > I am fine with these patches, and I think it is useful to be able to detect > and > recover from buggy UEFI implementations that use boot time regions at > runtime. > > However, I need help from the x86 maintainers/developers to review this so > please cc them on these patches. Hi Ard, Sure! I will keep them cc'ed. Could you also please let me know you thoughts on this approach If the illegal access occurs to any EFI region other than EFI boot time regions (Eg: EFI conventional memory or EFI loader code/data), these patches will exit firmware context and return to kernel i.e. we are adjusting RIP and RSP in efi page fault handler and leaving runtime service execution abruptly. Is that OK? This code in "[PATCH RFC 4/8] x86/efi: Add page fault handler to fixup/recover from page faults caused by firmware" + regs->sp = xmm_regs_rsp; + regs->ip = exit_fw_ctx_rip; + exited_fw_ctx = true; + clear_bit(EFI_RUNTIME_SERVICES, ); + pr_info("Exited Firmware context and disabled EFI Runtime Services\n"); AFAIK, efi runtime services are not reentrant. With this in mind, if something like above happens, I have completely turned off EFI runtime services in kernel. Is that OK? Or should we keep them enabled hoping to catch further illegal accesses (assuming that this feature is not used in production kernels). Regards, Sai N�r��y���b�X��ǧv�^�){.n�+{�y��^n�r��z���h&���G���h�(�階�ݢj"���m�z�ޖ���f���h���~�m�
Re: [PATCH RFC 0/8] Add efi page fault handler to fix/recover from
(+ Ingo, Andy, Peter) Hello Sai, I have added some x86/intel folks to cc. I am fine with these patches, and I think it is useful to be able to detect and recover from buggy UEFI implementations that use boot time regions at runtime. However, I need help from the x86 maintainers/developers to review this so please cc them on these patches. Thanks, Ard. On 22 July 2018 at 07:41, Sai Praneeth Prakhya wrote: > From: Sai Praneeth > > There may exist some buggy UEFI firmware implementations that access efi > memory regions other than EFI_RUNTIME_SERVICES_ even after > kernel has assumed control of the platform. This violates UEFI > specification. Here, we provide a debug config option which when enabled > detects and fixes up/recovers from page faults caused by buggy firmware. > > The above said illegal accesses trigger page fault in ring 0 because > firmware executes at ring 0 and if unhandled it hangs the kernel. We > provide an efi specific page fault handler to: > 1. Avoid panics/hangs caused by buggy firmware. > 2. Shout loud that the firmware is buggy and hence can save ourselves > from being blamed for not a fault of ours. > > Depending on the illegally accessed efi region, the efi page fault > handler handles illegal accesses differently. > 1. If the illegally accessed region is EFI_BOOT_SERVICES_, > the page fault handler fixes it up by mapping the requested region. > 2. If the illegally accessed region is any other efi region (like > EFI_CONVENTIONAL_MEMORY or EFI_LOADER_), the page fault > handler exits firmware context and disables EFI Runtime Services, so > that we will never again call buggy firmware. > > Page faults to efi regions are handled differently because, presently > during kernel boot, EFI_BOOT_SERVICES_ regions are reserved > by kernel and hence it's OK to dynamically map these regions in page > fault handler. The same approach cannot be followed for other efi > regions like EFI_CONVENTIONAL_MEMORY and EFI_LOADER_ as they > are very huge in size and reserving them could make the kernel > un-bootable. Hence, we take a different approach (exiting firmware > context) while dealing with page faults to these regions. This also > saves us from executing buggy firmware further. > > Exiting firmware context means that on every entry to firmware we save > the kernel context before calling firmware and if the firmware > misbehaves, in the page fault handler, we roll back to the saved kernel > context. Saving kernel context means saving the stack pointer and the > instruction that gets executed when firmware returns. In the page fault > handler we fix up these two things (RIP and RSP) so that when returning > from page fault handler it looks as if firmware has called RET. > > This issue was reported by Al Stone when he saw that reboot via EFI hangs > the machine. Upon debugging, I found that it's efi_reset_system() that's > touching memory regions which it shouldn't. To reproduce the same > behavior, I have hacked OVMF and made efi_reset_system() buggy. > > Testing the patch set: > -- > 1. Download buggy firmware from here [1]. > 2. Run a qemu instance with this buggy BIOS and boot mainline kernel. > Add reboot=efi to the kernel command line arguments and after the kernel > is up and running, type "reboot". The kernel should hang while rebooting. > 3. With the same setup, boot kernel after applying patches and the > reboot should work fine. Also please notice warning/error messages > printed by kernel. > > Note: > - > Patch set based on "next" branch in efi tree. > > [1] https://drive.google.com/open?id=1tkvT7GaVX2zSlzy1HK1T4Tv8cT36GP6R > > Sai Praneeth (8): > x86/efi: Remove __init attribute from memory mapping functions > x86/efi: Permanently save the EFI_MEMORY_MAP passed by firmware > x86/efi: Save kernel context before calling EFI Runtime Services > x86/efi: Add page fault handler to fixup/recover from page faults > caused by firmware > x86/efi: If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled don't call > efi_free_boot_services() > x86/efi: Map EFI_BOOT_SERVICES_ regions only when > EFI_WARN_ON_ILLEGAL_ACCESSES is disabled > x86/mm: If in_atomic(), allocate pages without sleeping > x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES > > arch/x86/Kconfig| 17 +++ > arch/x86/include/asm/efi.h | 42 ++- > arch/x86/mm/fault.c | 9 ++ > arch/x86/mm/pageattr.c | 16 ++- > arch/x86/platform/efi/efi.c | 10 +- > arch/x86/platform/efi/efi_32.c | 2 +- > arch/x86/platform/efi/efi_64.c | 16 ++- > arch/x86/platform/efi/efi_stub_64.S | 101 - > arch/x86/platform/efi/quirks.c | 193 > > drivers/firmware/efi/efi.c | 6 +- > drivers/firmware/efi/runtime-wrappers.c | 6 + > include/linux/efi.h | 16 ++- > init/main.c
[PATCH RFC 0/8] Add efi page fault handler to fix/recover from
From: Sai Praneeth There may exist some buggy UEFI firmware implementations that access efi memory regions other than EFI_RUNTIME_SERVICES_ even after kernel has assumed control of the platform. This violates UEFI specification. Here, we provide a debug config option which when enabled detects and fixes up/recovers from page faults caused by buggy firmware. The above said illegal accesses trigger page fault in ring 0 because firmware executes at ring 0 and if unhandled it hangs the kernel. We provide an efi specific page fault handler to: 1. Avoid panics/hangs caused by buggy firmware. 2. Shout loud that the firmware is buggy and hence can save ourselves from being blamed for not a fault of ours. Depending on the illegally accessed efi region, the efi page fault handler handles illegal accesses differently. 1. If the illegally accessed region is EFI_BOOT_SERVICES_, the page fault handler fixes it up by mapping the requested region. 2. If the illegally accessed region is any other efi region (like EFI_CONVENTIONAL_MEMORY or EFI_LOADER_), the page fault handler exits firmware context and disables EFI Runtime Services, so that we will never again call buggy firmware. Page faults to efi regions are handled differently because, presently during kernel boot, EFI_BOOT_SERVICES_ regions are reserved by kernel and hence it's OK to dynamically map these regions in page fault handler. The same approach cannot be followed for other efi regions like EFI_CONVENTIONAL_MEMORY and EFI_LOADER_ as they are very huge in size and reserving them could make the kernel un-bootable. Hence, we take a different approach (exiting firmware context) while dealing with page faults to these regions. This also saves us from executing buggy firmware further. Exiting firmware context means that on every entry to firmware we save the kernel context before calling firmware and if the firmware misbehaves, in the page fault handler, we roll back to the saved kernel context. Saving kernel context means saving the stack pointer and the instruction that gets executed when firmware returns. In the page fault handler we fix up these two things (RIP and RSP) so that when returning from page fault handler it looks as if firmware has called RET. This issue was reported by Al Stone when he saw that reboot via EFI hangs the machine. Upon debugging, I found that it's efi_reset_system() that's touching memory regions which it shouldn't. To reproduce the same behavior, I have hacked OVMF and made efi_reset_system() buggy. Testing the patch set: -- 1. Download buggy firmware from here [1]. 2. Run a qemu instance with this buggy BIOS and boot mainline kernel. Add reboot=efi to the kernel command line arguments and after the kernel is up and running, type "reboot". The kernel should hang while rebooting. 3. With the same setup, boot kernel after applying patches and the reboot should work fine. Also please notice warning/error messages printed by kernel. Note: - Patch set based on "next" branch in efi tree. [1] https://drive.google.com/open?id=1tkvT7GaVX2zSlzy1HK1T4Tv8cT36GP6R Sai Praneeth (8): x86/efi: Remove __init attribute from memory mapping functions x86/efi: Permanently save the EFI_MEMORY_MAP passed by firmware x86/efi: Save kernel context before calling EFI Runtime Services x86/efi: Add page fault handler to fixup/recover from page faults caused by firmware x86/efi: If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled don't call efi_free_boot_services() x86/efi: Map EFI_BOOT_SERVICES_ regions only when EFI_WARN_ON_ILLEGAL_ACCESSES is disabled x86/mm: If in_atomic(), allocate pages without sleeping x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES arch/x86/Kconfig| 17 +++ arch/x86/include/asm/efi.h | 42 ++- arch/x86/mm/fault.c | 9 ++ arch/x86/mm/pageattr.c | 16 ++- arch/x86/platform/efi/efi.c | 10 +- arch/x86/platform/efi/efi_32.c | 2 +- arch/x86/platform/efi/efi_64.c | 16 ++- arch/x86/platform/efi/efi_stub_64.S | 101 - arch/x86/platform/efi/quirks.c | 193 drivers/firmware/efi/efi.c | 6 +- drivers/firmware/efi/runtime-wrappers.c | 6 + include/linux/efi.h | 16 ++- init/main.c | 3 +- 13 files changed, 415 insertions(+), 22 deletions(-) Signed-off-by: Sai Praneeth Prakhya Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Cc: Al Stone Cc: Lee Chun-Yi Cc: Borislav Petkov Cc: Bhupesh Sharma Cc: Ard Biesheuvel -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html