Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Fri, Sep 06, 2019 at 02:51:01PM +0200, Miroslav Benes wrote:
> > > Now, I don't think that replacing .ko on disk is a good idea. We've already discussed it. It would lead to a maintenance/packaging problem, because you never know which version of the module is loaded in the system. The state space grows rather rapidly there.
> >
> > What exactly are your concerns?
> >
> > Either the old version of the module is loaded, and it's livepatched; or the new version of the module is loaded, and it's not livepatched.
>
> Let's have module foo.ko with function a().
>
> Live patch 1 (LP1) fixes it to a'(), which calls new function b() (present in LP1). LP1 is used only if foo.ko is loaded. foo.ko is replaced with foo'.ko on disk. It contains both a'() (fixed a(), to be precise) and the new b().
>
> Now there is LP2 with new function c() (or c'(), it does not matter) calling b(). Either foo.ko or foo'.ko can be loaded and you don't know which one. The implementation of LP2 would be different in the two cases.
>
> You could say that it does not matter. If LP2 is implemented for foo.ko, the same could work for foo'.ko (b() would be a part of LP2 and would not be called directly from foo'.ko). LP2 would only be unnecessarily larger. That is true in the case of functions, but if symbol b is not a function but a global variable, it is different.

Assuming atomic replace, I don't see how this could be a problem.  LP2 replaces LP1, so why would LP2 need to access LP1's (or foo'.ko's) symbol b?  All live patches should be built against, and targeted for, the original foo.ko.

However... it might break atomic replace functionality in another way.  If LP2 is an 'atomic replace' partial revert of LP1, and foo'.ko were loaded, then when loading LP2 the atomic replace code wouldn't be able to detect which functions were "patched" in foo'.ko.  So if the LP2 functions are not a superset of the LP1 functions, the "patched" functions in foo'.ko wouldn't get reverted.

What if foo'.ko were really just the original foo.ko, plus livepatch metadata grafted onto it somehow, such that it patches itself when it loads?  Then the patched state would always be the same regardless of whether the patch came from the LP or from foo'.ko.

> Moreover, in this case foo'.ko is an "LP superset", meaning that it contains only fixes which are present in LP1. What if it is not? We usually preserve kABI, so there could be a module in two or more versions compiled from slightly different code (older/newer and so on) and you don't know which one is loaded. To be fair, we don't allow that (I think) at SUSE, except for KMPs (kernel module packages) (the issue of course exists even now and we haven't solved it yet, because it is rare) and out-of-tree modules, which we don't support with LP. It could be solved with srcversion, but it complicates things a lot. The "blue sky" idea could extend the issue to all modules, given the above is real.

I'm having trouble understanding what this issue is and how "blue sky" would extend it.

--
Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On 9/6/19 1:51 PM, Miroslav Benes wrote:
> > > Now, I don't think that replacing .ko on disk is a good idea. We've already discussed it. It would lead to a maintenance/packaging problem, because you never know which version of the module is loaded in the system. The state space grows rather rapidly there.
> >
> > What exactly are your concerns?
> >
> > Either the old version of the module is loaded, and it's livepatched; or the new version of the module is loaded, and it's not livepatched.
>
> Let's have module foo.ko with function a().
>
> Live patch 1 (LP1) fixes it to a'(), which calls new function b() (present in LP1). LP1 is used only if foo.ko is loaded. foo.ko is replaced with foo'.ko on disk. It contains both a'() (fixed a(), to be precise) and the new b().
>
> Now there is LP2 with new function c() (or c'(), it does not matter) calling b(). Either foo.ko or foo'.ko can be loaded and you don't know which one. The implementation of LP2 would be different in the two cases.
>
> You could say that it does not matter. If LP2 is implemented for foo.ko, the same could work for foo'.ko (b() would be a part of LP2 and would not be called directly from foo'.ko). LP2 would only be unnecessarily larger. That is true in the case of functions, but if symbol b is not a function but a global variable, it is different.
>
> Moreover, in this case foo'.ko is an "LP superset", meaning that it contains only fixes which are present in LP1. What if it is not? We usually preserve kABI, so there could be a module in two or more versions compiled from slightly different code (older/newer and so on) and you don't know which one is loaded. To be fair, we don't allow that (I think) at SUSE, except for KMPs (kernel module packages) (the issue of course exists even now and we haven't solved it yet, because it is rare) and out-of-tree modules, which we don't support with LP. It could be solved with srcversion, but it complicates things a lot. The "blue sky" idea could extend the issue to all modules, given the above is real.
>
> Does it make sense?

If I understand correctly, you're saying that this would add another dimension to the potential system state that livepatches need to consider?  e.g. when updating a livepatch to v3, a v2-patched module may or may not be loaded.  So are we updating livepatch v2 code or module v2 code...

I agree that for functions we could probably get away with repeating code, but not necessarily for new global variables.

Then there's the question of: module v3 == module v{1,2} + livepatch v3?

Is this scenario similar to one where a customer somehow finds and loads module v3 before loading livepatch v3?  Livepatch doesn't have a srcversion whitelist, so this should be entirely possible.  I suppose it is a bit different in that module v3 would be starting from a fresh load and not something that livepatch v3 has hotpatched from an unknown source/base.

-- Joe
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
> > Now, I don't think that replacing .ko on disk is a good idea. We've already discussed it. It would lead to a maintenance/packaging problem, because you never know which version of the module is loaded in the system. The state space grows rather rapidly there.
>
> What exactly are your concerns?
>
> Either the old version of the module is loaded, and it's livepatched; or the new version of the module is loaded, and it's not livepatched.

Let's have module foo.ko with function a().

Live patch 1 (LP1) fixes it to a'(), which calls new function b() (present in LP1). LP1 is used only if foo.ko is loaded. foo.ko is replaced with foo'.ko on disk. It contains both a'() (fixed a(), to be precise) and the new b().

Now there is LP2 with new function c() (or c'(), it does not matter) calling b(). Either foo.ko or foo'.ko can be loaded and you don't know which one. The implementation of LP2 would be different in the two cases.

You could say that it does not matter. If LP2 is implemented for foo.ko, the same could work for foo'.ko (b() would be a part of LP2 and would not be called directly from foo'.ko). LP2 would only be unnecessarily larger. That is true in the case of functions, but if symbol b is not a function but a global variable, it is different.

Moreover, in this case foo'.ko is an "LP superset", meaning that it contains only fixes which are present in LP1. What if it is not? We usually preserve kABI, so there could be a module in two or more versions compiled from slightly different code (older/newer and so on) and you don't know which one is loaded. To be fair, we don't allow that (I think) at SUSE, except for KMPs (kernel module packages) (the issue of course exists even now and we haven't solved it yet, because it is rare) and out-of-tree modules, which we don't support with LP. It could be solved with srcversion, but it complicates things a lot. The "blue sky" idea could extend the issue to all modules, given the above is real.

Does it make sense?

Miroslav
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu, Sep 05, 2019 at 03:52:59PM +0200, Petr Mladek wrote:
> On Thu 2019-09-05 08:15:02, Josh Poimboeuf wrote:
> > On Thu, Sep 05, 2019 at 08:08:32AM -0500, Josh Poimboeuf wrote:
> > > On Thu, Sep 05, 2019 at 01:09:55PM +0200, Petr Mladek wrote:
> > > > > I don't have a number, but it's very common to patch a function which uses jump labels or alternatives.
> > > >
> > > > Really? My impression is that both alternatives and jump_labels are used in hot paths. I would expect them mostly in core code that is always loaded.
> > > >
> > > > Alternatives are often used in assembly that we are not able to livepatch anyway.
> > > >
> > > > Or are they spread widely via some macros or inlined functions?
> > >
> > > Jump labels are used everywhere. Looking at vmlinux.o in my kernel:
> > >
> > >   Relocation section [19621] '.rela__jump_table' for section [19620] '__jump_table' at offset 0x197873c8 contains 11913 entries:
> > >
> > > Each jump table entry has 3 relocation entries, so 11913/3 = 3971 jump labels.
> > >
> > >   $ readelf -s vmlinux.o | grep FUNC | wc -l
> > >   46902
> > >
> > > 3971/46902 = ~8.5%
> > >
> > > ~8.5% of functions use jump labels.
> >
> > Obviously some functions may use more than one jump label, so this isn't exactly bulletproof math. But it gives a rough idea of how widespread they are.
>
> It looks scary. I just wonder why we have never met this problem during the last few years.

Who knows what can happen when you disable jump label patching.  Sometimes it may be harmless.  A panic is probably the worst case.  There may be other failure modes which are harder to detect.

> My only guess is that most of these functions are either in the core kernel or in code that we do not livepatch.

This is definitely not the case.  We recently introduced jump label checking in kpatch-build, and it complains a lot.  The workaround is to replace such uses with static_key_enabled().

--
Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu 2019-09-05 08:15:02, Josh Poimboeuf wrote:
> On Thu, Sep 05, 2019 at 08:08:32AM -0500, Josh Poimboeuf wrote:
> > On Thu, Sep 05, 2019 at 01:09:55PM +0200, Petr Mladek wrote:
> > > > I don't have a number, but it's very common to patch a function which uses jump labels or alternatives.
> > >
> > > Really? My impression is that both alternatives and jump_labels are used in hot paths. I would expect them mostly in core code that is always loaded.
> > >
> > > Alternatives are often used in assembly that we are not able to livepatch anyway.
> > >
> > > Or are they spread widely via some macros or inlined functions?
> >
> > Jump labels are used everywhere. Looking at vmlinux.o in my kernel:
> >
> >   Relocation section [19621] '.rela__jump_table' for section [19620] '__jump_table' at offset 0x197873c8 contains 11913 entries:
> >
> > Each jump table entry has 3 relocation entries, so 11913/3 = 3971 jump labels.
> >
> >   $ readelf -s vmlinux.o | grep FUNC | wc -l
> >   46902
> >
> > 3971/46902 = ~8.5%
> >
> > ~8.5% of functions use jump labels.
>
> Obviously some functions may use more than one jump label, so this isn't exactly bulletproof math. But it gives a rough idea of how widespread they are.

It looks scary. I just wonder why we have never met this problem during the last few years.

My only guess is that most of these functions are either in the core kernel or in code that we do not livepatch.

I do not want to say that we should ignore it. I want to understand the cost and impact of the various approaches.

Regards,
Petr
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu, Sep 05, 2019 at 03:31:56PM +0200, Jiri Kosina wrote:
> On Thu, 5 Sep 2019, Josh Poimboeuf wrote:
> > > All the indirect jumps are turned into alternatives when retpolines are in place.
> >
> > Actually in C code those are done by the compiler as calls/jumps to __x86_indirect_thunk_*.
>
> Sure, and the thunks do the redirection via JMP_NOSPEC / CALL_NOSPEC, which have alternatives in them.

But the thunks are isolated to arch/x86/lib/retpoline.S.  We can't patch that code anyway.

--
Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu, 5 Sep 2019, Josh Poimboeuf wrote:
> > All the indirect jumps are turned into alternatives when retpolines are in place.
>
> Actually in C code those are done by the compiler as calls/jumps to __x86_indirect_thunk_*.

Sure, and the thunks do the redirection via JMP_NOSPEC / CALL_NOSPEC, which have alternatives in them.

--
Jiri Kosina
SUSE Labs
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu, Sep 05, 2019 at 01:19:06PM +0200, Jiri Kosina wrote:
> On Thu, 5 Sep 2019, Petr Mladek wrote:
> > > I don't have a number, but it's very common to patch a function which uses jump labels or alternatives.
> >
> > Really? My impression is that both alternatives and jump_labels are used in hot paths. I would expect them mostly in core code that is always loaded.
> >
> > Alternatives are often used in assembly that we are not able to livepatch anyway.
> >
> > Or are they spread widely via some macros or inlined functions?
>
> All the indirect jumps are turned into alternatives when retpolines are in place.

Actually in C code those are done by the compiler as calls/jumps to __x86_indirect_thunk_*.

But there are still a bunch of paravirt patched instructions and alternatives used throughout the kernel.

--
Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu, Sep 05, 2019 at 08:08:32AM -0500, Josh Poimboeuf wrote:
> On Thu, Sep 05, 2019 at 01:09:55PM +0200, Petr Mladek wrote:
> > > I don't have a number, but it's very common to patch a function which uses jump labels or alternatives.
> >
> > Really? My impression is that both alternatives and jump_labels are used in hot paths. I would expect them mostly in core code that is always loaded.
> >
> > Alternatives are often used in assembly that we are not able to livepatch anyway.
> >
> > Or are they spread widely via some macros or inlined functions?
>
> Jump labels are used everywhere. Looking at vmlinux.o in my kernel:
>
>   Relocation section [19621] '.rela__jump_table' for section [19620] '__jump_table' at offset 0x197873c8 contains 11913 entries:
>
> Each jump table entry has 3 relocation entries, so 11913/3 = 3971 jump labels.
>
>   $ readelf -s vmlinux.o | grep FUNC | wc -l
>   46902
>
> 3971/46902 = ~8.5%
>
> ~8.5% of functions use jump labels.

Obviously some functions may use more than one jump label, so this isn't exactly bulletproof math. But it gives a rough idea of how widespread they are.

> > > > + How often do new problematic features appear?
> > >
> > > I'm not exactly sure what you mean, but it seems that anytime we add a new feature, we have to try to wrap our heads around how it interacts with the weirdness of late module patching.
> >
> > I agree that we need to think about it and it makes complications. Anyway, I think that these are never the biggest problems.
> >
> > I would be more concerned about arch-specific features that might need special handling in the livepatch code. Everyone talks only about alternatives and jump_labels, which were added a long time ago.
>
> Jump labels have been around for many years, but we somehow missed implementing klp.arch for them. As I said, this resulted in panics.
>
> There may be other similar cases lurking, both in x86 and other arches. It's not a comforting thought!
>
> And each case requires special klp code in addition to the real code.
>
> --
> Josh

--
Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu, Sep 05, 2019 at 01:09:55PM +0200, Petr Mladek wrote:
> > I don't have a number, but it's very common to patch a function which uses jump labels or alternatives.
>
> Really? My impression is that both alternatives and jump_labels are used in hot paths. I would expect them mostly in core code that is always loaded.
>
> Alternatives are often used in assembly that we are not able to livepatch anyway.
>
> Or are they spread widely via some macros or inlined functions?

Jump labels are used everywhere. Looking at vmlinux.o in my kernel:

  Relocation section [19621] '.rela__jump_table' for section [19620] '__jump_table' at offset 0x197873c8 contains 11913 entries:

Each jump table entry has 3 relocation entries, so 11913/3 = 3971 jump labels.

  $ readelf -s vmlinux.o | grep FUNC | wc -l
  46902

3971/46902 = ~8.5%

~8.5% of functions use jump labels.

> > > + How often do new problematic features appear?
> >
> > I'm not exactly sure what you mean, but it seems that anytime we add a new feature, we have to try to wrap our heads around how it interacts with the weirdness of late module patching.
>
> I agree that we need to think about it and it makes complications. Anyway, I think that these are never the biggest problems.
>
> I would be more concerned about arch-specific features that might need special handling in the livepatch code. Everyone talks only about alternatives and jump_labels, which were added a long time ago.

Jump labels have been around for many years, but we somehow missed implementing klp.arch for them.  As I said, this resulted in panics.

There may be other similar cases lurking, both in x86 and other arches.  It's not a comforting thought!

And each case requires special klp code in addition to the real code.

--
Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu, Sep 05, 2019 at 02:16:51PM +0200, Miroslav Benes wrote:
> > > > A full demo would require packaging up replacement .ko's with a livepatch, as well as "blacklisting" those deprecated .ko's, etc. But that's all I had time to cook up last week before our holiday weekend here.
> > >
> > > Frankly, I'm not sure about this approach. I'm kind of torn. The current solution is far from ideal, but I'm not excited about the other options either. It seems like the choice is basically between "general but technically complicated fragile solution with nontrivial maintenance burden", or "something safer and maybe cleaner, but limiting for users/distros". Of course it depends on whether the limitation is even real and how big it is. Unfortunately we cannot quantify it much, and that is probably why our opinions (in the email thread) differ.
> >
> > How would this option be "limiting for users/distros"?  If the packaging part of the solution is done correctly then I don't see how it would be limiting.
>
> I'll try to explain my worries.
>
> Blacklisting first. Yes, I agree that it would make things a lot simpler, but I am afraid it would not fly at SUSE. Petr meanwhile explained it elsewhere, but I don't think we can limit our customers that much. We perceive live patching as a product that is as transparent and as unintrusive as possible. One thing is to forbid removing a module; the other is to forbid loading it.
>
> We could warn the admin. Something like "there is a fix for module foo, which is not currently loaded. It will not be patched and the system will still be vulnerable if you load the module, unless a new fixed version is provided."

No.  We just distribute the new .ko with the livepatch.  It should be transparent to the user.

> Yes, we can distribute the new version of the .ko with a livepatch. What is the reason for blacklisting then? I probably don't understand, but either a module is loaded and we can patch it (without late module patching), or it is not and we could replace the .ko on disk.

I think the blacklisting is a failsafe to prevent the old module from accidentally getting loaded after patching.

> Now, I don't think that replacing .ko on disk is a good idea. We've already discussed it. It would lead to a maintenance/packaging problem, because you never know which version of the module is loaded in the system. The state space grows rather rapidly there.

What exactly are your concerns?

Either the old version of the module is loaded, and it's livepatched; or the new version of the module is loaded, and it's not livepatched.

Anyway, that could be reported to the user somehow, e.g. by reporting srcversion in sysfs.

--
Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu, 5 Sep 2019, Josh Poimboeuf wrote:
> On Thu, Sep 05, 2019 at 02:03:34PM +0200, Miroslav Benes wrote:
> > > > + I would like to better understand the scope of the current problems. It is about modifying code in the livepatch that depends on the position of the related code:
> > > >
> > > >   + relocations are rather clear; we will need them anyway to access non-public (static) API from the original code.
> > > >
> > > >   + What are the other changes?
> > >
> > > I think the .klp.arch sections are the big ones:
> > >
> > >   .klp.arch.altinstructions
> > >   .klp.arch.parainstructions
> > >   .klp.arch.jump_labels (doesn't exist yet)
> > >
> > > And that's just x86...
> >
> > I may misunderstand, but we have the .klp.arch sections because para and alternatives have to be processed after relocations. And if we cannot get rid of relocations completely, because of static symbols, then we cannot get rid of the .klp.arch sections either.
>
> With late module patching gone, the module code can just process the klp relocations at the same time it processes normal relocations.
>
> Then the normal module alt/para/jump_label processing code can be used instead of arch_klp_init_object_loaded().

Ah, of course. I obviously cannot grasp the idea of not having late module patching :)

> Note this also means that Joe's patches can remove copy_module_elf() and free_module_elf(). And module_arch_freeing_init() in s390.

Correct. So yes, it would simplify the code a lot. I am still worried about the consequences.

> > > And then of course there's the klp coming/going notifiers, which have also been an additional source of complexity.
> >
> > True, but I think we (me and Petr) do not consider it as much of a problem as you do.
>
> It's less of an issue than .klp.arch and all the related code which can be removed.

Ok.

Miroslav
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu, Sep 05, 2019 at 02:03:34PM +0200, Miroslav Benes wrote:
> > > + I would like to better understand the scope of the current problems. It is about modifying code in the livepatch that depends on the position of the related code:
> > >
> > >   + relocations are rather clear; we will need them anyway to access non-public (static) API from the original code.
> > >
> > >   + What are the other changes?
> >
> > I think the .klp.arch sections are the big ones:
> >
> >   .klp.arch.altinstructions
> >   .klp.arch.parainstructions
> >   .klp.arch.jump_labels (doesn't exist yet)
> >
> > And that's just x86...
>
> I may misunderstand, but we have the .klp.arch sections because para and alternatives have to be processed after relocations. And if we cannot get rid of relocations completely, because of static symbols, then we cannot get rid of the .klp.arch sections either.

With late module patching gone, the module code can just process the klp relocations at the same time it processes normal relocations.

Then the normal module alt/para/jump_label processing code can be used instead of arch_klp_init_object_loaded().

Note this also means that Joe's patches can remove copy_module_elf() and free_module_elf(). And module_arch_freeing_init() in s390.

> > And then of course there's the klp coming/going notifiers, which have also been an additional source of complexity.
>
> True, but I think we (me and Petr) do not consider it as much of a problem as you do.

It's less of an issue than .klp.arch and all the related code which can be removed.

--
Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Wed, 4 Sep 2019, Josh Poimboeuf wrote:
> On Tue, Sep 03, 2019 at 03:02:34PM +0200, Miroslav Benes wrote:
> > On Mon, 2 Sep 2019, Joe Lawrence wrote:
> > > On 9/2/19 12:13 PM, Miroslav Benes wrote:
> > > >> I can easily foresee more problems like those in the future. Going forward we have to always keep track of which special sections are needed for which architectures. Those special sections can change over time, or can simply be overlooked for a given architecture. It's fragile.
> > > >
> > > > Indeed. It bothers me a lot. Even the x86 "port" is not feature complete in this regard (jump labels, alternatives, ...) and who knows what lurks in the corners of the other architectures we support.
> > > >
> > > > So it is in itself reason enough to do something about late module patching.
> > >
> > > Hi Miroslav,
> > >
> > > I was tinkering with the "blue-sky" ideas that I mentioned to Josh the other day.
> > >
> > > I dunno if you had a chance to look at what removing that code looks like, but I can continue to flesh out that idea if it looks interesting:
> >
> > Unfortunately no, and I don't think I'll come up with something useful before LPC, so anything is really welcome.
> >
> > >   https://github.com/joe-lawrence/linux/tree/blue-sky
>
> I like this a lot.
>
> > > A full demo would require packaging up replacement .ko's with a livepatch, as well as "blacklisting" those deprecated .ko's, etc. But that's all I had time to cook up last week before our holiday weekend here.
> >
> > Frankly, I'm not sure about this approach. I'm kind of torn. The current solution is far from ideal, but I'm not excited about the other options either. It seems like the choice is basically between "general but technically complicated fragile solution with nontrivial maintenance burden", or "something safer and maybe cleaner, but limiting for users/distros". Of course it depends on whether the limitation is even real and how big it is. Unfortunately we cannot quantify it much, and that is probably why our opinions (in the email thread) differ.
>
> How would this option be "limiting for users/distros"?  If the packaging part of the solution is done correctly then I don't see how it would be limiting.

I'll try to explain my worries.

Blacklisting first. Yes, I agree that it would make things a lot simpler, but I am afraid it would not fly at SUSE. Petr meanwhile explained it elsewhere, but I don't think we can limit our customers that much. We perceive live patching as a product that is as transparent and as unintrusive as possible. One thing is to forbid removing a module; the other is to forbid loading it.

We could warn the admin. Something like "there is a fix for module foo, which is not currently loaded. It will not be patched and the system will still be vulnerable if you load the module, unless a new fixed version is provided."

Yes, we can distribute the new version of the .ko with a livepatch. What is the reason for blacklisting then? I probably don't understand, but either a module is loaded and we can patch it (without late module patching), or it is not and we could replace the .ko on disk.

Now, I don't think that replacing .ko on disk is a good idea. We've already discussed it. It would lead to a maintenance/packaging problem, because you never know which version of the module is loaded in the system. The state space grows rather rapidly there.

But I may be wrong in my understanding, so bear with me.

Miroslav
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Wed, 4 Sep 2019, Josh Poimboeuf wrote:
> On Wed, Sep 04, 2019 at 10:49:32AM +0200, Petr Mladek wrote:
> > On Tue 2019-09-03 15:02:34, Miroslav Benes wrote:
> > > On Mon, 2 Sep 2019, Joe Lawrence wrote:
> > > > On 9/2/19 12:13 PM, Miroslav Benes wrote:
> > > > >> I can easily foresee more problems like those in the future. Going forward we have to always keep track of which special sections are needed for which architectures. Those special sections can change over time, or can simply be overlooked for a given architecture. It's fragile.
> > > > >
> > > > > Indeed. It bothers me a lot. Even the x86 "port" is not feature complete in this regard (jump labels, alternatives, ...) and who knows what lurks in the corners of the other architectures we support.
> > > > >
> > > > > So it is in itself reason enough to do something about late module patching.
> > > >
> > > > Hi Miroslav,
> > > >
> > > > I was tinkering with the "blue-sky" ideas that I mentioned to Josh the other day.
> > > >
> > > > I dunno if you had a chance to look at what removing that code looks like, but I can continue to flesh out that idea if it looks interesting:
> > >
> > > Unfortunately no, and I don't think I'll come up with something useful before LPC, so anything is really welcome.
> > >
> > > >   https://github.com/joe-lawrence/linux/tree/blue-sky
> > > >
> > > > A full demo would require packaging up replacement .ko's with a livepatch, as well as "blacklisting" those deprecated .ko's, etc. But that's all I had time to cook up last week before our holiday weekend here.
> > >
> > > Frankly, I'm not sure about this approach. I'm kind of torn. The current solution is far from ideal, but I'm not excited about the other options either. It seems like the choice is basically between "general but technically complicated fragile solution with nontrivial maintenance burden", or "something safer and maybe cleaner, but limiting for users/distros". Of course it depends on whether the limitation is even real and how big it is. Unfortunately we cannot quantify it much, and that is probably why our opinions (in the email thread) differ.
> >
> > I wonder what is necessary for a productive discussion on Plumbers:
> >
> >   + Josh would like to see what code can get removed when late handling of modules gets removed. I think that it might be partially visible from Joe's blue-sky patches.
>
> Yes, and I like what I see. Especially the removal of the .klp.arch nastiness!
>
> > + I would like to better understand the scope of the current problems. It is about modifying code in the livepatch that depends on the position of the related code:
> >
> >   + relocations are rather clear; we will need them anyway to access non-public (static) API from the original code.
> >
> >   + What are the other changes?
>
> I think the .klp.arch sections are the big ones:
>
>   .klp.arch.altinstructions
>   .klp.arch.parainstructions
>   .klp.arch.jump_labels (doesn't exist yet)
>
> And that's just x86...

I may misunderstand, but we have the .klp.arch sections because para and alternatives have to be processed after relocations. And if we cannot get rid of relocations completely, because of static symbols, then we cannot get rid of the .klp.arch sections either.

> And then of course there's the klp coming/going notifiers which have also been an additional source of complexity.

True, but I think we (me and Petr) do not consider it as much of a problem as you do.

> I'd like to hear more specific negatives about Joe's recent patches, which IMO are the best option we've discussed so far.

I'll reply elsewhere...

Miroslav
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
[...]

> I wonder what is necessary for a productive discussion on Plumbers:

[...]

> + It might be useful to prepare an overview of the existing proposals and agree on the positives and negatives. I am afraid that some of them might depend on the customer base and use cases. Sometimes we might not have enough information. But it might be good to get on the same page where possible.
>
> Anyway, it might rule out some variants so that we could better concentrate on the acceptable ones. Or come up with yet another proposal that would avoid the real blockers.

My plan is to describe the problem first for the public in the room. Then describe the proposals and their advantages and disadvantages. This should start the discussion pretty well.

I would be happy if we managed to settle at least on the requirements for a solution. It seems that our experience with users and their use cases differs a lot, and I doubt we could come up with a good solution without stating what we want (and don't want) first.

A silly hope is that there may be someone with a perfect solution in the room. After all, that is what conferences are for.

> Would it be better to discuss this in a separate room with a whiteboard or paperboard?

Maybe.

Miroslav
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On 9/5/19 7:09 AM, Petr Mladek wrote:
> On Wed 2019-09-04 21:50:55, Josh Poimboeuf wrote:
> > On Wed, Sep 04, 2019 at 10:49:32AM +0200, Petr Mladek wrote:
> > > I wonder what is necessary for a productive discussion on Plumbers:
> > >
> > >   + Josh would like to see what code can get removed when late
> > >     handling of modules gets removed. I think that it might be
> > >     partially visible from Joe's blue-sky patches.
> >
> > Yes, and I like what I see. Especially the removal of the .klp.arch
> > nastiness!
>
> Could we get rid of it? Is there any other way to get access
> to static variables and functions from the livepatched code?

Hi Petr,

I think the question is whether .klp (not-arch specific) relocations
would be sufficient (without late module patching). It would be a great
simplification if those were all we needed. I'm not 100% sure about
this quite yet, but am hoping that is the case.

> > > Anyway, it might rule out some variants so that we could better
> > > concentrate on the acceptable ones. Or come with yet another
> > > proposal that would avoid the real blockers.
> >
> > I'd like to hear more specific negatives about Joe's recent patches,
> > which IMO, are the best option we've discussed so far.
>
> I discussed this approach with our project manager. He was not much
> excited about this solution. His first idea was that it would block
> attaching USB devices. They are used by admins when taking care of
> the servers. And there might be other scenarios where a new module
> might need loading to solve some situation.
>
> Customers understand Livepatching as a way how to secure system
> without immediate reboot and with minimal (invisible) effect on
> the workload. They might get pretty surprised when the system
> suddenly blocks their "normal" workflow.

FWIW the complete blue-sky idea was that the package delivered to the
customer (RPM, deb, whatever) would provide:

  - livepatch .ko, blacklists known vulnerable srcversions
  - updated .ko's for the blacklisted modules

The second part would maintain the module loading workflow, albeit with
a new set of .ko files.

> As Miroslav said. No solution is perfect. We need to find the most
> acceptable compromise. It seems that you are more concerned about
> saving code, reducing complexity and risk. I am more concerned
> about user satisfaction.
>
> It is almost impossible to predict effects on user satisfaction
> because they have different workflow, use case, expectation, and
> tolerance. We could better estimate the technical side of each
> solution:
>
>   + implementation cost
>   + maintenance cost
>   + risks
>   + possible improvements and hardening
>   + user visible effects
>   + complication and limits with creating livepatches
>
> From my POV, the most problematic is the arch-specific code. It is
> hard to maintain and we do not have it fully under control. And I do
> not believe that we could remove all arch specific code when we do
> not allow delayed livepatching of modules.

No doubt there will probably always be some arch-specific code, and
even my blue-sky branch didn't move all that much. But I think the idea
could be a bigger simplification in terms of the mental model, should
the solution be acceptable by the criteria you mention above.

-- Joe
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu, 5 Sep 2019, Petr Mladek wrote: > > I don't have a number, but it's very common to patch a function which > > uses jump labels or alternatives. > > Really? My impression is that both alternatives and jump_labels > are used in hot paths. I would expect them mostly in core code > that is always loaded. > > Alternatives are often used in assembly that we are not able > to livepatch anyway. > > Or are they spread widely via some macros or inlined functions? All the indirect jumps are turned into alternatives when retpolines are in place. -- Jiri Kosina SUSE Labs
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Wed 2019-09-04 21:50:55, Josh Poimboeuf wrote:
> On Wed, Sep 04, 2019 at 10:49:32AM +0200, Petr Mladek wrote:
> > I wonder what is necessary for a productive discussion on Plumbers:
> >
> >   + Josh would like to see what code can get removed when late
> >     handling of modules gets removed. I think that it might be
> >     partially visible from Joe's blue-sky patches.
>
> Yes, and I like what I see. Especially the removal of the .klp.arch
> nastiness!

Could we get rid of it? Is there any other way to get access
to static variables and functions from the livepatched code?

> I think the .klp.arch sections are the big ones:
>
>   .klp.arch.altinstructions
>   .klp.arch.parainstructions
>   .klp.arch.jump_labels (doesn't exist yet)
>
> And that's just x86...
>
> And then of course there's the klp coming/going notifiers which have
> also been an additional source of complexity.
>
> > + Do we use them in livepatches? How often?
>
> I don't have a number, but it's very common to patch a function which
> uses jump labels or alternatives.

Really? My impression is that both alternatives and jump_labels
are used in hot paths. I would expect them mostly in core code
that is always loaded.

Alternatives are often used in assembly that we are not able
to livepatch anyway.

Or are they spread widely via some macros or inlined functions?

> > + How often new problematic features appear?
>
> I'm not exactly sure what you mean, but it seems that anytime we add a
> new feature, we have to try to wrap our heads around how it interacts
> with the weirdness of late module patching.

I agree that we need to think about it and it makes complications.
Anyway, I think that these are never the biggest problems.

I would be more concerned about arch-specific features that might
need special handling in the livepatch code. Everyone talks only
about alternatives and jump_labels that were added long time ago.

> > Anyway, it might rule out some variants so that we could better
> > concentrate on the acceptable ones. Or come with yet another
> > proposal that would avoid the real blockers.
>
> I'd like to hear more specific negatives about Joe's recent patches,
> which IMO, are the best option we've discussed so far.

I discussed this approach with our project manager. He was not much
excited about this solution. His first idea was that it would block
attaching USB devices. They are used by admins when taking care of
the servers. And there might be other scenarios where a new module
might need loading to solve some situation.

Customers understand Livepatching as a way how to secure system
without immediate reboot and with minimal (invisible) effect on
the workload. They might get pretty surprised when the system
suddenly blocks their "normal" workflow.

As Miroslav said. No solution is perfect. We need to find the most
acceptable compromise. It seems that you are more concerned about
saving code, reducing complexity and risk. I am more concerned
about user satisfaction.

It is almost impossible to predict effects on user satisfaction
because they have different workflow, use case, expectation, and
tolerance. We could better estimate the technical side of each
solution:

  + implementation cost
  + maintenance cost
  + risks
  + possible improvements and hardening
  + user visible effects
  + complication and limits with creating livepatches

From my POV, the most problematic is the arch-specific code. It is
hard to maintain and we do not have it fully under control. And I do
not believe that we could remove all arch specific code when we do
not allow delayed livepatching of modules.

Best Regards,
Petr
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Wed, Sep 04, 2019 at 10:49:32AM +0200, Petr Mladek wrote: > On Tue 2019-09-03 15:02:34, Miroslav Benes wrote: > > On Mon, 2 Sep 2019, Joe Lawrence wrote: > > > > > On 9/2/19 12:13 PM, Miroslav Benes wrote: > > > >> I can easily foresee more problems like those in the future. Going > > > >> forward we have to always keep track of which special sections are > > > >> needed for which architectures. Those special sections can change over > > > >> time, or can simply be overlooked for a given architecture. It's > > > >> fragile. > > > > > > > > Indeed. It bothers me a lot. Even x86 "port" is not feature complete in > > > > this regard (jump labels, alternatives,...) and who knows what lurks in > > > > the corners of the other architectures we support. > > > > > > > > So it is in itself reason enough to do something about late module > > > > patching. > > > > > > > > > > Hi Miroslav, > > > > > > I was tinkering with the "blue-sky" ideas that I mentioned to Josh the > > > other > > > day. > > > > > I dunno if you had a chance to look at what removing that code looks > > > like, but I can continue to flesh out that idea if it looks interesting: > > > > Unfortunately no and I don't think I'll come up with something useful > > before LPC, so anything is really welcome. > > > > > > > > https://github.com/joe-lawrence/linux/tree/blue-sky > > > > > > A full demo would require packaging up replacement .ko's with a > > > livepatch, as > > > well as "blacklisting" those deprecated .kos, etc. But that's all I had > > > time > > > to cook up last week before our holiday weekend here. > > > > Frankly, I'm not sure about this approach. I'm kind of torn. The current > > solution is far from ideal, but I'm not excited about the other options > > either. It seems like the choice is basically between "general but > > technically complicated fragile solution with nontrivial maintenance > > burden", or "something safer and maybe cleaner, but limiting for > > users/distros". 
Of course it depends on whether the limitation is even > > real and how big it is. Unfortunately we cannot quantify it much and that > > is probably why our opinions (in the email thread) differ. > > I wonder what is necessary for a productive discussion on Plumbers: > > + Josh would like to see what code can get removed when late > handling of modules gets removed. I think that it might be > partially visible from Joe's blue-sky patches. Yes, and I like what I see. Especially the removal of the .klp.arch nastiness! > + I would like to better understand the scope of the current > problems. It is about modifying code in the livepatch that > depends on position of the related code: > > + relocations are rather clear; we will need them anyway > to access non-public (static) API from the original code. > > + What are the other changes? I think the .klp.arch sections are the big ones: .klp.arch.altinstructions .klp.arch.parainstructions .klp.arch.jump_labels (doesn't exist yet) And that's just x86... And then of course there's the klp coming/going notifiers which have also been an additional source of complexity. > + Do we use them in livepatches? How often? I don't have a number, but it's very common to patch a function which uses jump labels or alternatives. > + How often new problematic features appear? I'm not exactly sure what you mean, but it seems that anytime we add a new feature, we have to try to wrap our heads around how it interacts with the weirdness of late module patching. > + Would be possible to detect potential problems, for example > by comparing the code in the binary and in memory when > the module is loaded the normal way? Perhaps, though I assume this would be some out-of-band testing thing. > + Would be possible to reset the livepatch code in memory > when the related module is unloaded and safe us half > of the troubles? 
Maybe, but I think that would solve a much lower percentage of our troubles than half :-/ > + It might be useful to prepare overview of the existing proposals > and agree on the positives and negatives. I am afraid that some > of them might depend on the customer base and > use cases. Sometimes we might not have enough information. > But it might be good to get on the same page where possible. I think we've already done that for the existing proposals. Maybe Miroslav can summarize them at the LPC session. > Anyway, it might rule out some variants so that we could better > concentrate on the acceptable ones. Or come with yet another > proposal that would avoid the real blockers. I'd like to hear more specific negatives about Joe's recent patches, which IMO, are the best option we've discussed so far. -- Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Tue, Sep 03, 2019 at 03:02:34PM +0200, Miroslav Benes wrote: > On Mon, 2 Sep 2019, Joe Lawrence wrote: > > > On 9/2/19 12:13 PM, Miroslav Benes wrote: > > >> I can easily foresee more problems like those in the future. Going > > >> forward we have to always keep track of which special sections are > > >> needed for which architectures. Those special sections can change over > > >> time, or can simply be overlooked for a given architecture. It's > > >> fragile. > > > > > > Indeed. It bothers me a lot. Even x86 "port" is not feature complete in > > > this regard (jump labels, alternatives,...) and who knows what lurks in > > > the corners of the other architectures we support. > > > > > > So it is in itself reason enough to do something about late module > > > patching. > > > > > > > Hi Miroslav, > > > > I was tinkering with the "blue-sky" ideas that I mentioned to Josh the other > > day. > > > I dunno if you had a chance to look at what removing that code looks > > like, but I can continue to flesh out that idea if it looks interesting: > > Unfortunately no and I don't think I'll come up with something useful > before LPC, so anything is really welcome. > > > > > https://github.com/joe-lawrence/linux/tree/blue-sky I like this a lot. > > A full demo would require packaging up replacement .ko's with a livepatch, > > as > > well as "blacklisting" those deprecated .kos, etc. But that's all I had > > time > > to cook up last week before our holiday weekend here. > > Frankly, I'm not sure about this approach. I'm kind of torn. The current > solution is far from ideal, but I'm not excited about the other options > either. It seems like the choice is basically between "general but > technically complicated fragile solution with nontrivial maintenance > burden", or "something safer and maybe cleaner, but limiting for > users/distros". Of course it depends on whether the limitation is even > real and how big it is. 
Unfortunately we cannot quantify it much and that > is probably why our opinions (in the email thread) differ. How would this option be "limiting for users/distros"? If the packaging part of the solution is done correctly then I don't see how it would be limiting. -- Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On 9/4/19 4:49 AM, Petr Mladek wrote:
> On Tue 2019-09-03 15:02:34, Miroslav Benes wrote:
> > On Mon, 2 Sep 2019, Joe Lawrence wrote:
> >
> > > On 9/2/19 12:13 PM, Miroslav Benes wrote:
> > > > > I can easily foresee more problems like those in the future. Going
> > > > > forward we have to always keep track of which special sections are
> > > > > needed for which architectures. Those special sections can change over
> > > > > time, or can simply be overlooked for a given architecture. It's
> > > > > fragile.
> > > >
> > > > Indeed. It bothers me a lot. Even x86 "port" is not feature complete in
> > > > this regard (jump labels, alternatives,...) and who knows what lurks in
> > > > the corners of the other architectures we support.
> > > >
> > > > So it is in itself reason enough to do something about late module
> > > > patching.
> > >
> > > Hi Miroslav,
> > >
> > > I was tinkering with the "blue-sky" ideas that I mentioned to Josh the
> > > other day. I dunno if you had a chance to look at what removing that
> > > code looks like, but I can continue to flesh out that idea if it looks
> > > interesting:
> >
> > Unfortunately no and I don't think I'll come up with something useful
> > before LPC, so anything is really welcome.
> >
> > > https://github.com/joe-lawrence/linux/tree/blue-sky
> > >
> > > A full demo would require packaging up replacement .ko's with a
> > > livepatch, as well as "blacklisting" those deprecated .kos, etc. But
> > > that's all I had time to cook up last week before our holiday weekend
> > > here.
> >
> > Frankly, I'm not sure about this approach. I'm kind of torn. The current
> > solution is far from ideal, but I'm not excited about the other options
> > either. It seems like the choice is basically between "general but
> > technically complicated fragile solution with nontrivial maintenance
> > burden", or "something safer and maybe cleaner, but limiting for
> > users/distros". Of course it depends on whether the limitation is even
> > real and how big it is. Unfortunately we cannot quantify it much and that
> > is probably why our opinions (in the email thread) differ.
>
> I wonder what is necessary for a productive discussion on Plumbers:

Pre-planning this part of the miniconf is a great idea.

>   + Josh would like to see what code can get removed when late
>     handling of modules gets removed. I think that it might be
>     partially visible from Joe's blue-sky patches.
>
>   + I would like to better understand the scope of the current
>     problems. It is about modifying code in the livepatch that
>     depends on position of the related code:
>
>     + relocations are rather clear; we will need them anyway
>       to access non-public (static) API from the original code.
>
>     + What are the other changes?
>
>     + Do we use them in livepatches? How often?
>
>     + How often new problematic features appear?
>
>     + Would be possible to detect potential problems, for example
>       by comparing the code in the binary and in memory when
>       the module is loaded the normal way?
>
>     + Would be possible to reset the livepatch code in memory
>       when the related module is unloaded and safe us half
>       of the troubles?
>
>   + It might be useful to prepare overview of the existing proposals
>     and agree on the positives and negatives. I am afraid that some
>     of them might depend on the customer base and
>     use cases. Sometimes we might not have enough information.
>     But it might be good to get on the same page where possible.
>
>     Anyway, it might rule out some variants so that we could better
>     concentrate on the acceptable ones. Or come with yet another
>     proposal that would avoid the real blockers.
>
> Any other ideas?

I'll just add to your list that late module patching introduces
complexity for klp-convert / livepatch style relocation support.
Without worrying about unloaded modules, I *think* klp-convert might
already be able to handle relocations in special sections (altinsts,
parainst, etc.). I've put the current klp-convert patchset on top of
the blue-sky branch to see if this is indeed the case, but I'm not sure
if I'll get through that experiment before LPC.

> Would it be better to discuss this in a separate room with
> a whiteboard or paperboard?

Whiteboard would probably be ideal, but paper would work and be more
transportable than the former.

-- Joe
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Tue 2019-09-03 15:02:34, Miroslav Benes wrote: > On Mon, 2 Sep 2019, Joe Lawrence wrote: > > > On 9/2/19 12:13 PM, Miroslav Benes wrote: > > >> I can easily foresee more problems like those in the future. Going > > >> forward we have to always keep track of which special sections are > > >> needed for which architectures. Those special sections can change over > > >> time, or can simply be overlooked for a given architecture. It's > > >> fragile. > > > > > > Indeed. It bothers me a lot. Even x86 "port" is not feature complete in > > > this regard (jump labels, alternatives,...) and who knows what lurks in > > > the corners of the other architectures we support. > > > > > > So it is in itself reason enough to do something about late module > > > patching. > > > > > > > Hi Miroslav, > > > > I was tinkering with the "blue-sky" ideas that I mentioned to Josh the other > > day. > > > I dunno if you had a chance to look at what removing that code looks > > like, but I can continue to flesh out that idea if it looks interesting: > > Unfortunately no and I don't think I'll come up with something useful > before LPC, so anything is really welcome. > > > > > https://github.com/joe-lawrence/linux/tree/blue-sky > > > > A full demo would require packaging up replacement .ko's with a livepatch, > > as > > well as "blacklisting" those deprecated .kos, etc. But that's all I had > > time > > to cook up last week before our holiday weekend here. > > Frankly, I'm not sure about this approach. I'm kind of torn. The current > solution is far from ideal, but I'm not excited about the other options > either. It seems like the choice is basically between "general but > technically complicated fragile solution with nontrivial maintenance > burden", or "something safer and maybe cleaner, but limiting for > users/distros". Of course it depends on whether the limitation is even > real and how big it is. 
Unfortunately we cannot quantify it much and that > is probably why our opinions (in the email thread) differ. I wonder what is necessary for a productive discussion on Plumbers: + Josh would like to see what code can get removed when late handling of modules gets removed. I think that it might be partially visible from Joe's blue-sky patches. + I would like to better understand the scope of the current problems. It is about modifying code in the livepatch that depends on position of the related code: + relocations are rather clear; we will need them anyway to access non-public (static) API from the original code. + What are the other changes? + Do we use them in livepatches? How often? + How often new problematic features appear? + Would be possible to detect potential problems, for example by comparing the code in the binary and in memory when the module is loaded the normal way? + Would be possible to reset the livepatch code in memory when the related module is unloaded and safe us half of the troubles? + It might be useful to prepare overview of the existing proposals and agree on the positives and negatives. I am afraid that some of them might depend on the customer base and use cases. Sometimes we might not have enough information. But it might be good to get on the same page where possible. Anyway, it might rule out some variants so that we could better concentrate on the acceptable ones. Or come with yet another proposal that would avoid the real blockers. Any other ideas? Would it be better to discuss this in a separate room with a whiteboard or paperboard? Best Regards, Petr
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Mon, 2 Sep 2019, Joe Lawrence wrote: > On 9/2/19 12:13 PM, Miroslav Benes wrote: > >> I can easily foresee more problems like those in the future. Going > >> forward we have to always keep track of which special sections are > >> needed for which architectures. Those special sections can change over > >> time, or can simply be overlooked for a given architecture. It's > >> fragile. > > > > Indeed. It bothers me a lot. Even x86 "port" is not feature complete in > > this regard (jump labels, alternatives,...) and who knows what lurks in > > the corners of the other architectures we support. > > > > So it is in itself reason enough to do something about late module > > patching. > > > > Hi Miroslav, > > I was tinkering with the "blue-sky" ideas that I mentioned to Josh the other > day. > I dunno if you had a chance to look at what removing that code looks > like, but I can continue to flesh out that idea if it looks interesting: Unfortunately no and I don't think I'll come up with something useful before LPC, so anything is really welcome. > > https://github.com/joe-lawrence/linux/tree/blue-sky > > A full demo would require packaging up replacement .ko's with a livepatch, as > well as "blacklisting" those deprecated .kos, etc. But that's all I had time > to cook up last week before our holiday weekend here. Frankly, I'm not sure about this approach. I'm kind of torn. The current solution is far from ideal, but I'm not excited about the other options either. It seems like the choice is basically between "general but technically complicated fragile solution with nontrivial maintenance burden", or "something safer and maybe cleaner, but limiting for users/distros". Of course it depends on whether the limitation is even real and how big it is. Unfortunately we cannot quantify it much and that is probably why our opinions (in the email thread) differ. Not much constructive email, but I have to think about it some more (before LPC). Regards Miroslav
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On 9/2/19 12:13 PM, Miroslav Benes wrote:
> > I can easily foresee more problems like those in the future. Going
> > forward we have to always keep track of which special sections are
> > needed for which architectures. Those special sections can change over
> > time, or can simply be overlooked for a given architecture. It's
> > fragile.
>
> Indeed. It bothers me a lot. Even x86 "port" is not feature complete in
> this regard (jump labels, alternatives,...) and who knows what lurks in
> the corners of the other architectures we support.
>
> So it is in itself reason enough to do something about late module
> patching.

Hi Miroslav,

I was tinkering with the "blue-sky" ideas that I mentioned to Josh the
other day. I dunno if you had a chance to look at what removing that
code looks like, but I can continue to flesh out that idea if it looks
interesting:

  https://github.com/joe-lawrence/linux/tree/blue-sky

A full demo would require packaging up replacement .ko's with a
livepatch, as well as "blacklisting" those deprecated .kos, etc. But
that's all I had time to cook up last week before our holiday weekend
here.

Regards,

-- Joe
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
> I can easily foresee more problems like those in the future. Going > forward we have to always keep track of which special sections are > needed for which architectures. Those special sections can change over > time, or can simply be overlooked for a given architecture. It's > fragile. Indeed. It bothers me a lot. Even x86 "port" is not feature complete in this regard (jump labels, alternatives,...) and who knows what lurks in the corners of the other architectures we support. So it is in itself reason enough to do something about late module patching. Miroslav
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Tue, Aug 27, 2019 at 11:05:51AM -0400, Joe Lawrence wrote: > > Sure, it introduces risk. But we have to compare that risk (which only > > affects rare edge cases) with the ones introduced by the late module > > patching code. I get the feeling that "late module patching" introduces > > risk to a broader range of use cases than "occasional loading of unused > > modules". > > > > The latter risk could be minimized by introducing a disabled state for > > modules - load it in memory, but don't expose it to users until > > explicitly loaded. Just a brainstormed idea; not sure whether it would > > work in practice. > > > > Interesting idea. We would need to ensure consistency between the > loaded-but-not-enabled module and the version on disk. Does module init run > when it's enabled? Etc. I don't think there can be a mismatch unless somebody is mucking with the .ko files directly -- and anyway that would already be a problem today, because the patch module already assumes that the runtime version of the module matches what the patch module was built against. > > > What about folding this the other way? ie, if a livepatch targets unloaded > module foo, loaded module bar, and vmlinux ... it effectively patches bar > and vmlinux, but the foo changes are dropped. Responsibility is placed on > the admin to install an updated foo before loading it (in which case, > livepatching core will again ignore foo.) > > Building on this idea, perhaps loading that livepatch would also blacklist > specific, known vulnerable (unloaded) module versions. If the admin tries > to load one, a debug msg is generated explaining why it can't be loaded by > default. > > I like this. One potential tweak: the updated modules could be delivered with the patch module, and either replaced on disk or otherwise hooked into modprobe. > > > > >+ It might open more security holes that are not fixed by > > > > > the livepatch. 
> > > > > > > > Following the same line of thinking, the livepatch infrastructure might > > > > open security holes because of the inherent complexity of late module > > > > patching. > > > > > > Could you be more specific, please? > > > Has there been any known security hole in the late module > > > livepatching code? > > > > Just off the top of my head, I can think of two recent bugs which can be > > blamed on late module patching: > > > > 1) There was a RHEL-only bug which caused arch_klp_init_object_loaded() > > to not be loaded. This resulted in a panic when certain patched code > > was executed. > > > > 2) arch_klp_init_object_loaded() currently doesn't have any jump label > > specific code. This has recently caused panics for patched code > > which relies on static keys. The workaround is to not use jump > > labels in patched code. The real fix is to add support for them in > > arch_klp_init_object_loaded(). > > > > I can easily foresee more problems like those in the future. Going > > forward we have to always keep track of which special sections are > > needed for which architectures. Those special sections can change over > > time, or can simply be overlooked for a given architecture. It's > > fragile. > > FWIW, the static keys case is more involved than simple deferred relocations > -- those keys are added to lists and then the static key code futzes with > them when it needs to update code sites. That means the code managing the > data structures, kernel/jump_label.c, will need to understand livepatch's > strangely loaded-but-not-initialized variants. > > I don't think the other special sections will require such invasive changes, > but it's something to keep in mind with respect to late module patching. Maybe it could be implemented in a way that such differences are transparent (insert lots of hand-waving). 
So as far as I can tell, we currently have three feasible options:

1) drop unloaded module changes (and blacklist the old .ko and/or
   replace it)

2) use per-object patches (with no exported function changes)

3) half-load unloaded modules so we can patch them

I think I like #1, if we could figure out a simple way to do it.

-- Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On 8/26/19 10:54 AM, Josh Poimboeuf wrote:
> On Fri, Aug 23, 2019 at 10:13:06AM +0200, Petr Mladek wrote:
>> On Thu 2019-08-22 17:36:49, Josh Poimboeuf wrote:
>>> On Fri, Aug 16, 2019 at 11:46:08AM +0200, Petr Mladek wrote:
>>>> On Wed 2019-08-14 10:12:44, Josh Poimboeuf wrote:
>>>>> On Wed, Aug 14, 2019 at 01:06:09PM +0200, Miroslav Benes wrote:
>>>>>>> Really, we should be going in the opposite direction, by creating
>>>>>>> module dependencies, like all other kernel modules do, ensuring
>>>>>>> that a module is loaded *before* we patch it. That would also
>>>>>>> eliminate this bug.
>>>>>
>>>>> We should look at whether it makes sense to destabilize live patching
>>>>> for everybody, for a small minority of people who care about a small
>>>>> minority of edge cases.
>>>>
>>>> I do not see it that simple. Forcing livepatched modules to be
>>>> loaded would mean loading "random" new modules when updating
>>>> livepatches:
>>>
>>> I don't want to start a long debate on this, because this idea isn't
>>> even my first choice. But we shouldn't dismiss it outright.
>>
>> I am glad to hear that this is not your first choice.
>>
>>>> + It means more actions and higher risk to destabilize
>>>> the system. Different modules have different quality.
>>>
>>> Maybe the distro shouldn't ship modules which would destabilize the
>>> system.
>>
>> Is this realistic? Even the best QA could not check all scenarios.
>> My point is that the more actions we do the bigger the risk is.
>
> Sure, it introduces risk. But we have to compare that risk (which only
> affects rare edge cases) with the ones introduced by the late module
> patching code. I get the feeling that "late module patching" introduces
> risk to a broader range of use cases than "occasional loading of unused
> modules".
>
> The latter risk could be minimized by introducing a disabled state for
> modules - load it in memory, but don't expose it to users until
> explicitly loaded. Just a brainstormed idea; not sure whether it would
> work in practice.

Interesting idea. We would need to ensure consistency between the
loaded-but-not-enabled module and the version on disk. Does module init
run when it's enabled? Etc.

What about folding this the other way? i.e., if a livepatch targets
unloaded module foo, loaded module bar, and vmlinux ... it effectively
patches bar and vmlinux, but the foo changes are dropped. Responsibility
is placed on the admin to install an updated foo before loading it (in
which case the livepatching core will again ignore foo).

Building on this idea, perhaps loading that livepatch would also
blacklist specific, known-vulnerable (unloaded) module versions. If the
admin tries to load one, a debug message is generated explaining why it
can't be loaded by default.

>>>> + It might open more security holes that are not fixed by
>>>> the livepatch.
>>>
>>> Following the same line of thinking, the livepatch infrastructure might
>>> open security holes because of the inherent complexity of late module
>>> patching.
>>
>> Could you be more specific, please?
>> Has there been any known security hole in the late module
>> livepatching code?
>
> Just off the top of my head, I can think of two recent bugs which can
> be blamed on late module patching:
>
> 1) There was a RHEL-only bug which caused arch_klp_init_object_loaded()
>    to not be called. This resulted in a panic when certain patched code
>    was executed.
>
> 2) arch_klp_init_object_loaded() currently doesn't have any jump label
>    specific code. This has recently caused panics for patched code
>    which relies on static keys. The workaround is to not use jump
>    labels in patched code. The real fix is to add support for them in
>    arch_klp_init_object_loaded().
>
> I can easily foresee more problems like those in the future. Going
> forward we have to always keep track of which special sections are
> needed for which architectures. Those special sections can change over
> time, or can simply be overlooked for a given architecture. It's
> fragile.

FWIW, the static keys case is more involved than simple deferred
relocations -- those keys are added to lists and then the static key
code futzes with them when it needs to update code sites. That means the
code managing the data structures, kernel/jump_label.c, will need to
understand livepatch's strangely loaded-but-not-initialized variants. I
don't think the other special sections will require such invasive
changes, but it's something to keep in mind with respect to late module
patching.

-- Joe
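The deferral problem with static key code sites can be pictured with a toy model. This is a hedged userspace sketch, not kernel code: the struct fields and function names are invented for illustration, standing in for jump_label's per-key lists of code sites, some of which may belong to a loaded-but-not-initialized livepatch object.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: a static-key toggle walks its list of code sites and
 * rewrites each one, but sites belonging to an object whose target
 * module is not fully initialized must be skipped and revisited later.
 * All names here are illustrative, not the kernel's jump label API.
 */
struct site {
	int ready;	/* target object fully initialized? */
	int enabled;	/* has this code site been rewritten? */
};

/* Rewrite every site that is safe to touch; report how many were deferred. */
static size_t key_enable(struct site *sites, size_t n)
{
	size_t deferred = 0;

	for (size_t i = 0; i < n; i++) {
		if (!sites[i].ready) {
			deferred++;	/* must be handled when the object initializes */
			continue;
		}
		sites[i].enabled = 1;
	}
	return deferred;
}
```

The generic code having to carry that `ready` distinction at all is the invasiveness Joe is pointing at.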
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Mon, Aug 26, 2019 at 03:44:21PM +0200, Nicolai Stange wrote:
> Josh Poimboeuf writes:
>
> > On Wed, Aug 14, 2019 at 01:06:09PM +0200, Miroslav Benes wrote:
> >> > Really, we should be going in the opposite direction, by creating module
> >> > dependencies, like all other kernel modules do, ensuring that a module
> >> > is loaded *before* we patch it. That would also eliminate this bug.
> >>
> >> Yes, but it is not ideal either with cumulative one-fixes-all patch
> >> modules. It would also load modules which are not necessary for a
> >> customer and I know that at least some customers care about this. They
> >> want to deploy only things which are crucial for their systems.
>
> Security concerns set aside, some of the patched modules might get
> distributed separately from the main kernel through some sort of
> kernel-*-extra packages and thus, not be found on some target system
> at all. Or they might have been blacklisted.

True.

> > If you frame the question as "do you want to destabilize the live
> > patching infrastructure" then the answer might be different.
> >
> > We should look at whether it makes sense to destabilize live patching
> > for everybody, for a small minority of people who care about a small
> > minority of edge cases.
> >
> > Or maybe there's some other solution we haven't thought about, which
> > fits more in the framework of how kernel modules already work.
> >
> >> We could split patch modules as you proposed in the past, but that has
> >> issues as well.
> >
> > Right, I'm not really crazy about that solution either.
> >
> > Here's another idea: per-object patch modules. Patches to vmlinux are
> > in a vmlinux patch module. Patches to kvm.ko are in a kvm patch module.
> > That would require:
> >
> > - Careful management of dependencies between object-specific patches.
> >   Maybe that just means that exported function ABIs shouldn't change.
> >
> > - Some kind of hooking into modprobe to ensure the patch module gets
> >   loaded with the real one.
> >
> > - Changing 'atomic replace' to allow patch modules to be per-object.
>
> Perhaps I'm misunderstanding, but supporting only per-object livepatch
> modules would make livepatch creation for something like commit
> 15fab63e1e57 ("fs: prevent page refcount overflow in pipe_buf_get"),
> CVE-2019-11487 really cumbersome (see the fuse part)?

Just don't change exported interfaces. In this case you could leave
generic_pipe_buf_get() alone and then instead add a
generic_pipe_buf_get__patched() which is called by the patched fuse
module.

If you build the fuse-specific livepatch module right, it would
automatically have a dependency on the vmlinux-specific livepatch
module.

> I think I've seen similar interdependencies between e.g. kvm.ko <->
> kvm_intel.ko, but can't find an example right now.

-- Josh
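The dependency shape Josh describes can be sketched in a few lines of plain userspace C. This is a hedged illustration, not kernel code: the symbol names mirror the discussion (`generic_pipe_buf_get`, fuse), the `__patched` variant and the fuse function name are hypothetical, and the point is only that the vmlinux livepatch exports a new symbol instead of changing an existing exported interface, so the fuse livepatch gets an ordinary symbol dependency on it.

```c
/* vmlinux: the original exported interface stays untouched. */
static int generic_pipe_buf_get(void)
{
	return 0;	/* unpatched behavior */
}

/* vmlinux-specific livepatch module: exports a patched variant instead
 * of modifying the original exported symbol. */
static int generic_pipe_buf_get__patched(void)
{
	return 1;	/* fixed behavior (e.g. the refcount overflow check) */
}

/* fuse-specific livepatch module: its replacement function calls the
 * patched variant, which creates a normal module dependency on the
 * vmlinux-specific livepatch module. */
static int fuse_dev_splice_read__patched(void)
{
	return generic_pipe_buf_get__patched();
}
```

Callers that were never patched keep resolving `generic_pipe_buf_get()` as before; only the patched fuse path picks up the fix.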
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Fri, Aug 23, 2019 at 10:13:06AM +0200, Petr Mladek wrote: > On Thu 2019-08-22 17:36:49, Josh Poimboeuf wrote: > > On Fri, Aug 16, 2019 at 11:46:08AM +0200, Petr Mladek wrote: > > > On Wed 2019-08-14 10:12:44, Josh Poimboeuf wrote: > > > > On Wed, Aug 14, 2019 at 01:06:09PM +0200, Miroslav Benes wrote: > > > > > > Really, we should be going in the opposite direction, by creating > > > > > > module > > > > > > dependencies, like all other kernel modules do, ensuring that a > > > > > > module > > > > > > is loaded *before* we patch it. That would also eliminate this bug. > > > > > > > > We should look at whether it makes sense to destabilize live patching > > > > for everybody, for a small minority of people who care about a small > > > > minority of edge cases. > > > > > > I do not see it that simple. Forcing livepatched modules to be > > > loaded would mean loading "random" new modules when updating > > > livepatches: > > > > I don't want to start a long debate on this, because this idea isn't > > even my first choice. But we shouldn't dismiss it outright. > > I am glad to hear that this is not your first choice. > > > > > + It means more actions and higher risk to destabilize > > > the system. Different modules have different quality. > > > > Maybe the distro shouldn't ship modules which would destabilize the > > system. > > Is this realistic? Even the best QA could not check all scenarios. > My point is that the more actions we do the bigger the risk is. Sure, it introduces risk. But we have to compare that risk (which only affects rare edge cases) with the ones introduced by the late module patching code. I get the feeling that "late module patching" introduces risk to a broader range of use cases than "occasional loading of unused modules". The latter risk could be minimized by introducing a disabled state for modules - load it in memory, but don't expose it to users until explicitly loaded. 
Just a brainstormed idea; not sure whether it would work in practice.

> Anyway, this approach might cause loading modules that are never
> or rarely loaded together. Real life systems have limited number of
> peripherals.
>
> I wonder if it might actually break certification of some
> hardware. It is just an idea. I do not know how certifications
> are done and what is the scope or limits.
>
> > > + It might open more security holes that are not fixed by
> > > the livepatch.
> >
> > Following the same line of thinking, the livepatch infrastructure might
> > open security holes because of the inherent complexity of late module
> > patching.
>
> Could you be more specific, please?
> Has there been any known security hole in the late module
> livepatching code?

Just off the top of my head, I can think of two recent bugs which can be
blamed on late module patching:

1) There was a RHEL-only bug which caused arch_klp_init_object_loaded()
   to not be called. This resulted in a panic when certain patched code
   was executed.

2) arch_klp_init_object_loaded() currently doesn't have any jump label
   specific code. This has recently caused panics for patched code which
   relies on static keys. The workaround is to not use jump labels in
   patched code. The real fix is to add support for them in
   arch_klp_init_object_loaded().

I can easily foresee more problems like those in the future. Going
forward we have to always keep track of which special sections are
needed for which architectures. Those special sections can change over
time, or can simply be overlooked for a given architecture. It's
fragile.

Not to mention that most of the bugs we've fixed over the years seem to
be related to klp_init_object_loaded() and klp_module_coming/going(). I
would expect that to continue given the hackish nature of late module
loading. With live patching, almost any bug can be a security bug.

> > > + It might require some extra configuration actions to handle
> > > the newly opened interfaces (devices).
> > > For example, updating
> > > SELinux policies.
> >
> > I assume you mean user-created policies, not distro ones? Is this even
> > a realistic concern?
>
> Honestly, I do not know. I am not familiar with this area. There are
> also containers. They are going to be everywhere. They also need a lot
> of rules to keep stuff separated. And it is another area where I have
> no idea if newly and unexpectedly loaded modules might need special
> handling.
>
> > > + Are there conflicting modules that might need to get
> > > livepatched?
> >
> > Again, is this realistic?
>
> I do not know. But I could imagine it.
>
> > > This approach has a strong no-go from my side.
> >
> > I agree it's not ideal, but nothing is ideal at this point. Let's not
> > rule it out prematurely. I do feel that our current approach is not
> > the best. It will continue to create problems for us until we fix it.
>
> I am sure that we could do better. I just think that forcibly
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
Josh Poimboeuf writes:

> On Wed, Aug 14, 2019 at 01:06:09PM +0200, Miroslav Benes wrote:
>> > Really, we should be going in the opposite direction, by creating module
>> > dependencies, like all other kernel modules do, ensuring that a module
>> > is loaded *before* we patch it. That would also eliminate this bug.
>>
>> Yes, but it is not ideal either with cumulative one-fixes-all patch
>> modules. It would also load modules which are not necessary for a
>> customer and I know that at least some customers care about this. They
>> want to deploy only things which are crucial for their systems.

Security concerns set aside, some of the patched modules might get
distributed separately from the main kernel through some sort of
kernel-*-extra packages and thus, not be found on some target system
at all. Or they might have been blacklisted.

> If you frame the question as "do you want to destabilize the live
> patching infrastructure" then the answer might be different.
>
> We should look at whether it makes sense to destabilize live patching
> for everybody, for a small minority of people who care about a small
> minority of edge cases.
>
> Or maybe there's some other solution we haven't thought about, which
> fits more in the framework of how kernel modules already work.
>
>> We could split patch modules as you proposed in the past, but that has
>> issues as well.
>
> Right, I'm not really crazy about that solution either.
>
> Here's another idea: per-object patch modules. Patches to vmlinux are
> in a vmlinux patch module. Patches to kvm.ko are in a kvm patch module.
> That would require:
>
> - Careful management of dependencies between object-specific patches.
>   Maybe that just means that exported function ABIs shouldn't change.
>
> - Some kind of hooking into modprobe to ensure the patch module gets
>   loaded with the real one.
>
> - Changing 'atomic replace' to allow patch modules to be per-object.

Perhaps I'm misunderstanding, but supporting only per-object livepatch
modules would make livepatch creation for something like commit
15fab63e1e57 ("fs: prevent page refcount overflow in pipe_buf_get"),
CVE-2019-11487 really cumbersome (see the fuse part)?

I think I've seen similar interdependencies between e.g. kvm.ko <->
kvm_intel.ko, but can't find an example right now.

Thanks,

Nicolai

--
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 247165, AG München), GF: Felix Imendörffer
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Thu 2019-08-22 17:36:49, Josh Poimboeuf wrote: > On Fri, Aug 16, 2019 at 11:46:08AM +0200, Petr Mladek wrote: > > On Wed 2019-08-14 10:12:44, Josh Poimboeuf wrote: > > > On Wed, Aug 14, 2019 at 01:06:09PM +0200, Miroslav Benes wrote: > > > > > Really, we should be going in the opposite direction, by creating > > > > > module > > > > > dependencies, like all other kernel modules do, ensuring that a module > > > > > is loaded *before* we patch it. That would also eliminate this bug. > > > > > > We should look at whether it makes sense to destabilize live patching > > > for everybody, for a small minority of people who care about a small > > > minority of edge cases. > > > > I do not see it that simple. Forcing livepatched modules to be > > loaded would mean loading "random" new modules when updating > > livepatches: > > I don't want to start a long debate on this, because this idea isn't > even my first choice. But we shouldn't dismiss it outright. I am glad to hear that this is not your first choice. > > + It means more actions and higher risk to destabilize > > the system. Different modules have different quality. > > Maybe the distro shouldn't ship modules which would destabilize the > system. Is this realistic? Even the best QA could not check all scenarios. My point is that the more actions we do the bigger the risk is. Anyway, this approach might cause loading modules that are never or rarely loaded together. Real life systems have limited number of peripherals. I wonder if it might actually break certification of some hardware. It is just an idea. I do not know how certifications are done and what is the scope or limits. > > + It might open more security holes that are not fixed by > > the livepatch. > > Following the same line of thinking, the livepatch infrastructure might > open security holes because of the inherent complexity of late module > patching. Could you be more specific, please? 
Has there been any known security hole in the late module
livepatching code?

> > + It might require some extra configuration actions to handle
> > the newly opened interfaces (devices). For example, updating
> > SELinux policies.
>
> I assume you mean user-created policies, not distro ones? Is this even
> a realistic concern?

Honestly, I do not know. I am not familiar with this area. There are
also containers. They are going to be everywhere. They also need a lot
of rules to keep stuff separated. And it is another area where I have
no idea if newly and unexpectedly loaded modules might need special
handling.

> > + Are there conflicting modules that might need to get
> > livepatched?
>
> Again, is this realistic?

I do not know. But I could imagine it.

> > This approach has a strong no-go from my side.
>
> I agree it's not ideal, but nothing is ideal at this point. Let's not
> rule it out prematurely. I do feel that our current approach is not
> the best. It will continue to create problems for us until we fix it.

I am sure that we could do better. I just think that forcibly loading
modules is opening too huge a can of worms. Basically all other
approaches have more limited or better defined effects. For example,
the newly added code that clears the relocations is something that can
be tested. The behavior of a "random" mix of loaded modules opens
possibilities that have never been explored before.

> > > - Changing 'atomic replace' to allow patch modules to be per-object.
> >
> > The problem might be how to transition all loaded objects atomically
> > when the needed code is loaded from different modules.
>
> I'm not sure what you mean.
>
> My idea was that each patch module would be specific to an object, with
> no inter-module change dependencies. So when using atomic replace, if
> the patch module is only targeted to vmlinux, then only vmlinux-targeted
> patch modules would be replaced.
>
> In other words, 'atomic replace' would be object-specific.
> > Alternative would be to support only per-object consistency. But it
> > might reduce the number of supported scenarios too much. Also it
> > would make livepatching more error-prone.

By per-object consistency I mean the same as you with "each patch module
would be specific to an object, with no inter-module change
dependencies".

My concern is that it would prevent semantic changes in shared code.
Semantic changes are rare. But changes in shared code are not. If we
reduce the consistency to per-object consistency, will the consistency
still make sense then? We might want to go back to trees, I mean
immediate mode.

Best Regards,
Petr
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Fri, Aug 16, 2019 at 11:46:08AM +0200, Petr Mladek wrote:
> On Wed 2019-08-14 10:12:44, Josh Poimboeuf wrote:
> > On Wed, Aug 14, 2019 at 01:06:09PM +0200, Miroslav Benes wrote:
> > > > Really, we should be going in the opposite direction, by creating module
> > > > dependencies, like all other kernel modules do, ensuring that a module
> > > > is loaded *before* we patch it. That would also eliminate this bug.
> > >
> > > Yes, but it is not ideal either with cumulative one-fixes-all patch
> > > modules. It would also load modules which are not necessary for a
> > > customer and I know that at least some customers care about this. They
> > > want to deploy only things which are crucial for their systems.
> >
> > If you frame the question as "do you want to destabilize the live
> > patching infrastructure" then the answer might be different.
> >
> > We should look at whether it makes sense to destabilize live patching
> > for everybody, for a small minority of people who care about a small
> > minority of edge cases.
>
> I do not see it that simple. Forcing livepatched modules to be
> loaded would mean loading "random" new modules when updating
> livepatches:

I don't want to start a long debate on this, because this idea isn't
even my first choice. But we shouldn't dismiss it outright.

> + It means more actions and higher risk to destabilize
> the system. Different modules have different quality.

Maybe the distro shouldn't ship modules which would destabilize the
system.

> + It might open more security holes that are not fixed by
> the livepatch.

Following the same line of thinking, the livepatch infrastructure might
open security holes because of the inherent complexity of late module
patching.

> + It might require some extra configuration actions to handle
> the newly opened interfaces (devices). For example, updating
> SELinux policies.

I assume you mean user-created policies, not distro ones? Is this even
a realistic concern?
> + Are there conflicting modules that might need to get
> livepatched?

Again, is this realistic?

> This approach has a strong no-go from my side.

I agree it's not ideal, but nothing is ideal at this point. Let's not
rule it out prematurely. I do feel that our current approach is not the
best. It will continue to create problems for us until we fix it.

> > Or maybe there's some other solution we haven't thought about, which
> > fits more in the framework of how kernel modules already work.
> >
> > > We could split patch modules as you proposed in the past, but that has
> > > issues as well.
> >
> > Right, I'm not really crazy about that solution either.
>
> Yes, this would just move the problem somewhere else.
>
> > Here's another idea: per-object patch modules. Patches to vmlinux are
> > in a vmlinux patch module. Patches to kvm.ko are in a kvm patch module.
> > That would require:
> >
> > - Careful management of dependencies between object-specific patches.
> >   Maybe that just means that exported function ABIs shouldn't change.
> >
> > - Some kind of hooking into modprobe to ensure the patch module gets
> >   loaded with the real one.
>
> I see this just as a particular approach how to split livepatches
> per-object. The above points suggest how to handle dependencies
> on the kernel side.

Yes, they would need to be done on the distro / patch creation /
operational side. They probably wouldn't affect livepatch code.

> > - Changing 'atomic replace' to allow patch modules to be per-object.
>
> The problem might be how to transition all loaded objects atomically
> when the needed code is loaded from different modules.

I'm not sure what you mean.

My idea was that each patch module would be specific to an object, with
no inter-module change dependencies. So when using atomic replace, if
the patch module is only targeted to vmlinux, then only vmlinux-targeted
patch modules would be replaced.

In other words, 'atomic replace' would be object-specific.
> Alternative would be to support only per-object consistency. But it
> might reduce the number of supported scenarios too much. Also it
> would make livepatching more error-prone.

Again, I don't follow.

-- Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Wed 2019-08-14 10:12:44, Josh Poimboeuf wrote:
> On Wed, Aug 14, 2019 at 01:06:09PM +0200, Miroslav Benes wrote:
> > > Really, we should be going in the opposite direction, by creating module
> > > dependencies, like all other kernel modules do, ensuring that a module
> > > is loaded *before* we patch it. That would also eliminate this bug.
> >
> > Yes, but it is not ideal either with cumulative one-fixes-all patch
> > modules. It would also load modules which are not necessary for a
> > customer and I know that at least some customers care about this. They
> > want to deploy only things which are crucial for their systems.
>
> If you frame the question as "do you want to destabilize the live
> patching infrastructure" then the answer might be different.
>
> We should look at whether it makes sense to destabilize live patching
> for everybody, for a small minority of people who care about a small
> minority of edge cases.

I do not see it that simple. Forcing livepatched modules to be
loaded would mean loading "random" new modules when updating
livepatches:

+ It means more actions and higher risk to destabilize
  the system. Different modules have different quality.

+ It might open more security holes that are not fixed by
  the livepatch.

+ It might require some extra configuration actions to handle
  the newly opened interfaces (devices). For example, updating
  SELinux policies.

+ Are there conflicting modules that might need to get
  livepatched?

This approach has a strong no-go from my side.

> Or maybe there's some other solution we haven't thought about, which
> fits more in the framework of how kernel modules already work.
>
> > We could split patch modules as you proposed in the past, but that has
> > issues as well.
>
> Right, I'm not really crazy about that solution either.

Yes, this would just move the problem somewhere else.

> Here's another idea: per-object patch modules. Patches to vmlinux are
> in a vmlinux patch module.
> Patches to kvm.ko are in a kvm patch module.
> That would require:
>
> - Careful management of dependencies between object-specific patches.
>   Maybe that just means that exported function ABIs shouldn't change.
>
> - Some kind of hooking into modprobe to ensure the patch module gets
>   loaded with the real one.

I see this just as a particular approach to splitting livepatches
per-object. The above points suggest how to handle dependencies
on the kernel side.

> - Changing 'atomic replace' to allow patch modules to be per-object.

The problem might be how to transition all loaded objects atomically
when the needed code is loaded from different modules.

An alternative would be to support only per-object consistency. But it
might reduce the number of supported scenarios too much. Also it
would make livepatching more error-prone.

I would like to see an updated variant of this patch to see how much
arch-specific code is really necessary. IMHO, if reverting relocations
is too complicated then the least painful compromise is to "deny the
patched modules to be removed".

> > Anyway, that is why I proposed "Rethinking late module patching" talk at
> > LPC and we should try to come up with a solution there.
>
> Thanks, I saw that. It's definitely worthy of more discussion. The
> talk may be more productive if there were code to look at. For example,
> a patch which removes all the "late module patching" gunk, so we can at
> least quantify the cost of the current approach.

+1

Best Regards,
Petr
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Wed, Aug 14, 2019 at 01:06:09PM +0200, Miroslav Benes wrote:
> > Really, we should be going in the opposite direction, by creating module
> > dependencies, like all other kernel modules do, ensuring that a module
> > is loaded *before* we patch it. That would also eliminate this bug.
>
> Yes, but it is not ideal either with cumulative one-fixes-all patch
> modules. It would also load modules which are not necessary for a
> customer and I know that at least some customers care about this. They
> want to deploy only things which are crucial for their systems.

If you frame the question as "do you want to destabilize the live
patching infrastructure" then the answer might be different.

We should look at whether it makes sense to destabilize live patching
for everybody, for a small minority of people who care about a small
minority of edge cases.

Or maybe there's some other solution we haven't thought about, which
fits more in the framework of how kernel modules already work.

> We could split patch modules as you proposed in the past, but that has
> issues as well.

Right, I'm not really crazy about that solution either.

Here's another idea: per-object patch modules. Patches to vmlinux are
in a vmlinux patch module. Patches to kvm.ko are in a kvm patch module.
That would require:

- Careful management of dependencies between object-specific patches.
  Maybe that just means that exported function ABIs shouldn't change.

- Some kind of hooking into modprobe to ensure the patch module gets
  loaded with the real one.

- Changing 'atomic replace' to allow patch modules to be per-object.

> Anyway, that is why I proposed "Rethinking late module patching" talk at
> LPC and we should try to come up with a solution there.

Thanks, I saw that. It's definitely worthy of more discussion. The
talk may be more productive if there were code to look at. For example,
a patch which removes all the "late module patching" gunk, so we can at
least quantify the cost of the current approach.

-- Josh
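One conceivable shape for the "hooking into modprobe" bullet is a generated modprobe.d fragment using the existing softdep directive; the module name livepatch_kvm below is hypothetical:

```
# /etc/modprobe.d/livepatch.conf (sketch, generated when the patch is installed)
# Ask modprobe to load the kvm-specific patch module after kvm itself.
softdep kvm post: livepatch_kvm
```

Note that softdep is best-effort (a failure to load the post: module does not fail the kvm load), so a hard guarantee would still need an install rule or a kernel-side dependency.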
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Mon, 22 Jul 2019, Petr Mladek wrote: > On Fri 2019-07-19 14:28:40, Miroslav Benes wrote: > > Josh reported a bug: > > > > When the object to be patched is a module, and that module is > > rmmod'ed and reloaded, it fails to load with: > > > > module: x86/modules: Skipping invalid relocation target, existing value > > is nonzero for type 2, loc ba0302e9, val a03e293c > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' > > (-8) > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to > > load module 'nfsd' > > > > The livepatch module has a relocation which references a symbol > > in the _previous_ loading of nfsd. When apply_relocate_add() > > tries to replace the old relocation with a new one, it sees that > > the previous one is nonzero and it errors out. > > > > On ppc64le, we have a similar issue: > > > > module_64: livepatch_nfsd: Expected nop after call, got e8410018 at > > e_show+0x60/0x548 [livepatch_nfsd] > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' > > (-8) > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to > > load module 'nfsd' > > > > He also proposed three different solutions. We could remove the error > > check in apply_relocate_add() introduced by commit eda9cec4c9a1 > > ("x86/module: Detect and skip invalid relocations"). However the check > > is useful for detecting corrupted modules. > > > > We could also deny the patched modules to be removed. If it proved to be > > a major drawback for users, we could still implement a different > > approach. The solution would also complicate the existing code a lot. > > > > We thus decided to reverse the relocation patching (clear all relocation > > targets on x86_64, or return back nops on powerpc). The solution is not > > universal and is too much arch-specific, but it may prove to be simpler > > in the end. 
> > Reported-by: Josh Poimboeuf
> > Signed-off-by: Miroslav Benes
> > ---
> >  arch/powerpc/kernel/Makefile    |  1 +
> >  arch/powerpc/kernel/livepatch.c | 75 +
> >  arch/powerpc/kernel/module.h    | 15 +++
> >  arch/powerpc/kernel/module_64.c |  7 +--
> >  arch/x86/kernel/livepatch.c     | 67 +
> >  include/linux/livepatch.h       |  5 +++
> >  kernel/livepatch/core.c         | 17 +---
> >  7 files changed, 176 insertions(+), 11 deletions(-)
> >  create mode 100644 arch/powerpc/kernel/livepatch.c
> >  create mode 100644 arch/powerpc/kernel/module.h
> >
> > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> > index 0ea6c4aa3a20..639000f78dc3 100644
> > --- a/arch/powerpc/kernel/Makefile
> > +++ b/arch/powerpc/kernel/Makefile
> > @@ -154,6 +154,7 @@ endif
> >
> >  obj-$(CONFIG_EPAPR_PARAVIRT) += epapr_paravirt.o epapr_hcalls.o
> >  obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o
> > +obj-$(CONFIG_LIVEPATCH) += livepatch.o
> >
> >  # Disable GCOV, KCOV & sanitizers in odd or sensitive code
> >  GCOV_PROFILE_prom_init.o := n
> > diff --git a/arch/powerpc/kernel/livepatch.c b/arch/powerpc/kernel/livepatch.c
> > new file mode 100644
> > index ..6f2468c60695
> > --- /dev/null
> > +++ b/arch/powerpc/kernel/livepatch.c
> > @@ -0,0 +1,75 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * livepatch.c - powerpc-specific Kernel Live Patching Core
> > + */
> > +
> > +#include
> > +#include
> > +#include "module.h"
> > +
> > +void arch_klp_free_object_loaded(struct klp_patch *patch,
> > +				 struct klp_object *obj)

> If I get it correctly then this function reverts changes done by
> klp_write_object_relocations(). Therefore it should get called
> klp_clear_object_relocations() or so.

Good point. Especially when we want to split the function to
arch-independent and arch-specific parts.

> There is also arch_klp_init_object_loaded() but it does different
> things, for example it applies alternatives or paravirt instructions.
> Do we need to revert these as well?

No, I don't think so. They should not change because the patch module
stays loaded.

> > +{
> > +	const char *objname, *secname, *symname;
> > +	char sec_objname[MODULE_NAME_LEN];
> > +	struct klp_modinfo *info;
> > +	Elf64_Shdr *s;
> > +	Elf64_Rela *rel;
> > +	Elf64_Sym *sym;
> > +	void *loc;
> > +	u32 *instruction;
> > +	int i, cnt;
> > +
> > +	info = patch->mod->klp_info;
> > +	objname = klp_is_module(obj) ? obj->name : "vmlinux";
> > +
> > +	/* See livepatch core code for BUILD_BUG_ON() explanation */
> > +	BUILD_BUG_ON(MODULE_NAME_LEN < 56 || KSYM_NAME_LEN != 128);
> > +
> > +	/* For each klp relocation section */
> > +	for (s = info->sechdrs; s < info->sechdrs + info->hdr.e_shnum; s++) {
> > +		if (!(s->sh_flags & SHF_RELA_LIVEPATCH))
> > +			continue;
> > +
> > +		/*
> > +		 * Fo
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Sun, 28 Jul 2019, Josh Poimboeuf wrote:

> On Fri, Jul 19, 2019 at 02:28:40PM +0200, Miroslav Benes wrote:
> > Josh reported a bug:
> >
> > When the object to be patched is a module, and that module is
> > rmmod'ed and reloaded, it fails to load with:
> >
> > module: x86/modules: Skipping invalid relocation target, existing value
> > is nonzero for type 2, loc ba0302e9, val a03e293c
> > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to
> > load module 'nfsd'
> >
> > The livepatch module has a relocation which references a symbol
> > in the _previous_ loading of nfsd. When apply_relocate_add()
> > tries to replace the old relocation with a new one, it sees that
> > the previous one is nonzero and it errors out.
> >
> > On ppc64le, we have a similar issue:
> >
> > module_64: livepatch_nfsd: Expected nop after call, got e8410018 at
> > e_show+0x60/0x548 [livepatch_nfsd]
> > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to
> > load module 'nfsd'
> >
> > He also proposed three different solutions. We could remove the error
> > check in apply_relocate_add() introduced by commit eda9cec4c9a1
> > ("x86/module: Detect and skip invalid relocations"). However the check
> > is useful for detecting corrupted modules.
> >
> > We could also deny the patched modules to be removed. If it proved to be
> > a major drawback for users, we could still implement a different
> > approach. The solution would also complicate the existing code a lot.
> >
> > We thus decided to reverse the relocation patching (clear all relocation
> > targets on x86_64, or return back nops on powerpc). The solution is not
> > universal and is too much arch-specific, but it may prove to be simpler
> > in the end.
>
> Thanks for the patch Miroslav.
>
> However, I really don't like it.
> All this extra convoluted arch-specific code, just so users can unload
> a patched module.

Yes, it is unfortunate.

> Remind me why we didn't do the "deny the patched modules to be removed"
> option?

Petr came up with a couple of issues in the patch. Nothing unfixable, but
it would complicate the code a bit, so we wanted to explore the
arch-specific approach first. I'll return to it, fix it and we'll see the
outcome.

> Really, we should be going in the opposite direction, by creating module
> dependencies, like all other kernel modules do, ensuring that a module
> is loaded *before* we patch it. That would also eliminate this bug.

Yes, but it is not ideal either with cumulative one-fixes-all patch
modules. It would also load modules which are not necessary for a
customer, and I know that at least some customers care about this. They
want to deploy only things which are crucial for their systems. We could
split patch modules as you proposed in the past, but that has issues as
well. Anyway, that is why I proposed the "Rethinking late module
patching" talk at LPC, and we should try to come up with a solution
there.

> And I think it would also help us remove a lot of nasty code, like the
> coming/going notifiers and the .klp.arch mess. Which, BTW, seem to be
> the sources of most of our bugs...

Yes.

> Yes, there's the "but it's less flexible!" argument. Does anybody
> really need the flexibility? I strongly doubt it. I would love to see
> an RFC patch which enforces that restriction, to see all the nasty code
> we could remove. I would much rather live patching be stable than
> flexible.

I agree that unloading a module does not make much sense (famous last
words), so we could try.

Miroslav
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Fri, Jul 19, 2019 at 02:28:40PM +0200, Miroslav Benes wrote:
> Josh reported a bug:
>
> When the object to be patched is a module, and that module is
> rmmod'ed and reloaded, it fails to load with:
>
> module: x86/modules: Skipping invalid relocation target, existing value is
> nonzero for type 2, loc ba0302e9, val a03e293c
> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to
> load module 'nfsd'
>
> The livepatch module has a relocation which references a symbol
> in the _previous_ loading of nfsd. When apply_relocate_add()
> tries to replace the old relocation with a new one, it sees that
> the previous one is nonzero and it errors out.
>
> On ppc64le, we have a similar issue:
>
> module_64: livepatch_nfsd: Expected nop after call, got e8410018 at
> e_show+0x60/0x548 [livepatch_nfsd]
> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to
> load module 'nfsd'
>
> He also proposed three different solutions. We could remove the error
> check in apply_relocate_add() introduced by commit eda9cec4c9a1
> ("x86/module: Detect and skip invalid relocations"). However the check
> is useful for detecting corrupted modules.
>
> We could also deny the patched modules to be removed. If it proved to be
> a major drawback for users, we could still implement a different
> approach. The solution would also complicate the existing code a lot.
>
> We thus decided to reverse the relocation patching (clear all relocation
> targets on x86_64, or return back nops on powerpc). The solution is not
> universal and is too much arch-specific, but it may prove to be simpler
> in the end.

Thanks for the patch Miroslav.

However, I really don't like it. All this extra convoluted arch-specific
code, just so users can unload a patched module.
Remind me why we didn't do the "deny the patched modules to be removed"
option?

Really, we should be going in the opposite direction, by creating module
dependencies, like all other kernel modules do, ensuring that a module is
loaded *before* we patch it. That would also eliminate this bug.

And I think it would also help us remove a lot of nasty code, like the
coming/going notifiers and the .klp.arch mess. Which, BTW, seem to be the
sources of most of our bugs...

Yes, there's the "but it's less flexible!" argument. Does anybody really
need the flexibility? I strongly doubt it. I would love to see an RFC
patch which enforces that restriction, to see all the nasty code we could
remove. I would much rather live patching be stable than flexible.

-- 
Josh
Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal
On Fri 2019-07-19 14:28:40, Miroslav Benes wrote:
> Josh reported a bug:
>
> When the object to be patched is a module, and that module is
> rmmod'ed and reloaded, it fails to load with:
>
> module: x86/modules: Skipping invalid relocation target, existing value is
> nonzero for type 2, loc ba0302e9, val a03e293c
> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to
> load module 'nfsd'
>
> The livepatch module has a relocation which references a symbol
> in the _previous_ loading of nfsd. When apply_relocate_add()
> tries to replace the old relocation with a new one, it sees that
> the previous one is nonzero and it errors out.
>
> On ppc64le, we have a similar issue:
>
> module_64: livepatch_nfsd: Expected nop after call, got e8410018 at
> e_show+0x60/0x548 [livepatch_nfsd]
> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to
> load module 'nfsd'
>
> He also proposed three different solutions. We could remove the error
> check in apply_relocate_add() introduced by commit eda9cec4c9a1
> ("x86/module: Detect and skip invalid relocations"). However the check
> is useful for detecting corrupted modules.
>
> We could also deny the patched modules to be removed. If it proved to be
> a major drawback for users, we could still implement a different
> approach. The solution would also complicate the existing code a lot.
>
> We thus decided to reverse the relocation patching (clear all relocation
> targets on x86_64, or return back nops on powerpc). The solution is not
> universal and is too much arch-specific, but it may prove to be simpler
> in the end.
>
> Reported-by: Josh Poimboeuf
> Signed-off-by: Miroslav Benes
> ---
>  arch/powerpc/kernel/Makefile    |  1 +
>  arch/powerpc/kernel/livepatch.c | 75 +
>  arch/powerpc/kernel/module.h    | 15 +++
>  arch/powerpc/kernel/module_64.c |  7 +--
>  arch/x86/kernel/livepatch.c     | 67 +
>  include/linux/livepatch.h       |  5 +++
>  kernel/livepatch/core.c         | 17 +---
>  7 files changed, 176 insertions(+), 11 deletions(-)
>  create mode 100644 arch/powerpc/kernel/livepatch.c
>  create mode 100644 arch/powerpc/kernel/module.h
>
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 0ea6c4aa3a20..639000f78dc3 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -154,6 +154,7 @@ endif
>
>  obj-$(CONFIG_EPAPR_PARAVIRT) += epapr_paravirt.o epapr_hcalls.o
>  obj-$(CONFIG_KVM_GUEST)      += kvm.o kvm_emul.o
> +obj-$(CONFIG_LIVEPATCH)      += livepatch.o
>
>  # Disable GCOV, KCOV & sanitizers in odd or sensitive code
>  GCOV_PROFILE_prom_init.o := n
> diff --git a/arch/powerpc/kernel/livepatch.c b/arch/powerpc/kernel/livepatch.c
> new file mode 100644
> index ..6f2468c60695
> --- /dev/null
> +++ b/arch/powerpc/kernel/livepatch.c
> @@ -0,0 +1,75 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * livepatch.c - powerpc-specific Kernel Live Patching Core
> + */
> +
> +#include
> +#include
> +#include "module.h"
> +
> +void arch_klp_free_object_loaded(struct klp_patch *patch,
> +				 struct klp_object *obj)

If I get it correctly then this function reverts changes done by
klp_write_object_relocations(). Therefore it should get called
klp_clear_object_relocations() or so.

There is also arch_klp_init_object_loaded() but it does different
things, for example it applies alternatives or paravirt instructions.
Do we need to revert these as well?
> +{
> +	const char *objname, *secname, *symname;
> +	char sec_objname[MODULE_NAME_LEN];
> +	struct klp_modinfo *info;
> +	Elf64_Shdr *s;
> +	Elf64_Rela *rel;
> +	Elf64_Sym *sym;
> +	void *loc;
> +	u32 *instruction;
> +	int i, cnt;
> +
> +	info = patch->mod->klp_info;
> +	objname = klp_is_module(obj) ? obj->name : "vmlinux";
> +
> +	/* See livepatch core code for BUILD_BUG_ON() explanation */
> +	BUILD_BUG_ON(MODULE_NAME_LEN < 56 || KSYM_NAME_LEN != 128);
> +
> +	/* For each klp relocation section */
> +	for (s = info->sechdrs; s < info->sechdrs + info->hdr.e_shnum; s++) {
> +		if (!(s->sh_flags & SHF_RELA_LIVEPATCH))
> +			continue;
> +
> +		/*
> +		 * Format: .klp.rela.sec_objname.section_name
> +		 */
> +		secname = info->secstrings + s->sh_name;
> +		cnt = sscanf(secname, ".klp.rela.%55[^.]", sec_objname);
> +		if (cnt != 1) {
> +			pr_err("section %s has an incorrectly formatted name\n",
> +			       secname);
> +			continue;
> +		}
> +
> +		if (strcmp(ob