Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Tue, Feb 20, 2018 at 9:16 AM, Igor Stoppawrote: > > > On 13/02/18 20:10, Laura Abbott wrote: >> On 02/13/2018 07:20 AM, Igor Stoppa wrote: >>> Why alterations of page properties are not considered a risk and the >>> physmap is? >>> And how would it be easier (i suppose) to attack the latter? >> >> Alterations are certainly a risk but with the physmap the >> mapping is already there. Find the address and you have >> access vs. needing to actually modify the properties >> then do the access. I could also be complete off base >> on my threat model here so please correct me if I'm >> wrong. > > It's difficult for me to comment on this without knowing *how* the > attack would be performed, in your model. > > Ex: my expectation is that the attacked has R/W access to kernel data > and has knowledge of the location of static variables. > > This is not just a guess, but a real-life scenario, found in attacks > that, among other things, are capable of disabling SELinux, to proceed > toward gaining full root capability. > > At that point, I think that variables which are allocated dynamically, > in vmalloc address space, are harder to locate, because of the virtual > mapping and the randomness of the address chosen (this I have not > confirmed yet, but I suppose there is some randomness in picking the > address to assign to a certain allocation request to vmalloc, otherwise, > it could be added). Machine-to-machine runtime variation certainly affects the mapping location, but for early boot allocations, these become surprisingly deterministic, especially across similar hardware/memory layouts (both the virtmap and physmap locations). However, using CONFIG_RANDOMIZE_MEMORY makes it MUCH more difficult. (Note that RANDOMIZE_BASE on arm64 effectively includes RANDOMIZE_MEMORY, as it uses the entropy for multiple base offsets, including the physmap, IIRC.) >> I think your other summaries are good points though >> and should go in the cover letter. > > Ok, I'm just afraid it risks becoming a lengthy dissertation :-) It's rare to have anyone say "your commit log is too long". :) -Kees -- Kees Cook Pixel Security
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Tue, Feb 20, 2018 at 9:16 AM, Igor Stoppa wrote: > > > On 13/02/18 20:10, Laura Abbott wrote: >> On 02/13/2018 07:20 AM, Igor Stoppa wrote: >>> Why alterations of page properties are not considered a risk and the >>> physmap is? >>> And how would it be easier (i suppose) to attack the latter? >> >> Alterations are certainly a risk but with the physmap the >> mapping is already there. Find the address and you have >> access vs. needing to actually modify the properties >> then do the access. I could also be complete off base >> on my threat model here so please correct me if I'm >> wrong. > > It's difficult for me to comment on this without knowing *how* the > attack would be performed, in your model. > > Ex: my expectation is that the attacked has R/W access to kernel data > and has knowledge of the location of static variables. > > This is not just a guess, but a real-life scenario, found in attacks > that, among other things, are capable of disabling SELinux, to proceed > toward gaining full root capability. > > At that point, I think that variables which are allocated dynamically, > in vmalloc address space, are harder to locate, because of the virtual > mapping and the randomness of the address chosen (this I have not > confirmed yet, but I suppose there is some randomness in picking the > address to assign to a certain allocation request to vmalloc, otherwise, > it could be added). Machine-to-machine runtime variation certainly affects the mapping location, but for early boot allocations, these become surprisingly deterministic, especially across similar hardware/memory layouts (both the virtmap and physmap locations). However, using CONFIG_RANDOMIZE_MEMORY makes it MUCH more difficult. (Note that RANDOMIZE_BASE on arm64 effectively includes RANDOMIZE_MEMORY, as it uses the entropy for multiple base offsets, including the physmap, IIRC.) >> I think your other summaries are good points though >> and should go in the cover letter. > > Ok, I'm just afraid it risks becoming a lengthy dissertation :-) It's rare to have anyone say "your commit log is too long". :) -Kees -- Kees Cook Pixel Security
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Tue, Feb 20, 2018 at 8:28 AM, Igor Stoppawrote: > > > On 14/02/18 21:29, Kees Cook wrote: >> On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbott wrote: > > [...] > >>> Kernel code should be fine, if it isn't that is a bug that should be >>> fixed. Modules yes are not fully protected. The conclusion from past >> >> I think that's a pretty serious problem: we can't have aliases with >> mismatched permissions; this degrades a deterministic protection >> (read-only) to a probabilistic protection (knowing where the alias of >> a target is mapped). Having an attack be "needs some info leaks" >> instead of "need execution control to change perms" is a much lower >> bar, IMO. > > Why "need execution control to change permission"? > Or, iow, what does it mean exactly? > ROP/JOP? Data-oriented control flow hijack? Right, I mean, if an attacker has already gained execute control, they can just call the needed functions to change memory permissions. But that isn't needed if there is a mismatch between physmap and virtmap: i.e. they can write to the physmap without needing to change perms first. > One can argue that this sort of R/W activity probably does require some > form of execution control, but AFAIK, the only way to to prevent it, is > to have CFI - btw, is there any standardization in that sense? I meant that I don't want a difference in protection between physmap and virtmap. I'd like to be able to reason the smae about the exposures in either. > So, from my (pessimistic?) perspective, the best that can be hoped for, > is to make it much harder to figure out where the data is located. > > Virtual mapping has this side effect, compared to linear mapping. Right, this is good, for sure. No complaints there at all. It's why I think pmalloc and arm64 physmap perms are separate issues. -Kees -- Kees Cook Pixel Security
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Tue, Feb 20, 2018 at 8:28 AM, Igor Stoppa wrote: > > > On 14/02/18 21:29, Kees Cook wrote: >> On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbott wrote: > > [...] > >>> Kernel code should be fine, if it isn't that is a bug that should be >>> fixed. Modules yes are not fully protected. The conclusion from past >> >> I think that's a pretty serious problem: we can't have aliases with >> mismatched permissions; this degrades a deterministic protection >> (read-only) to a probabilistic protection (knowing where the alias of >> a target is mapped). Having an attack be "needs some info leaks" >> instead of "need execution control to change perms" is a much lower >> bar, IMO. > > Why "need execution control to change permission"? > Or, iow, what does it mean exactly? > ROP/JOP? Data-oriented control flow hijack? Right, I mean, if an attacker has already gained execute control, they can just call the needed functions to change memory permissions. But that isn't needed if there is a mismatch between physmap and virtmap: i.e. they can write to the physmap without needing to change perms first. > One can argue that this sort of R/W activity probably does require some > form of execution control, but AFAIK, the only way to to prevent it, is > to have CFI - btw, is there any standardization in that sense? I meant that I don't want a difference in protection between physmap and virtmap. I'd like to be able to reason the smae about the exposures in either. > So, from my (pessimistic?) perspective, the best that can be hoped for, > is to make it much harder to figure out where the data is located. > > Virtual mapping has this side effect, compared to linear mapping. Right, this is good, for sure. No complaints there at all. It's why I think pmalloc and arm64 physmap perms are separate issues. -Kees -- Kees Cook Pixel Security
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 13/02/18 20:10, Laura Abbott wrote: > On 02/13/2018 07:20 AM, Igor Stoppa wrote: >> Why alterations of page properties are not considered a risk and the physmap >> is? >> And how would it be easier (i suppose) to attack the latter? > > Alterations are certainly a risk but with the physmap the > mapping is already there. Find the address and you have > access vs. needing to actually modify the properties > then do the access. I could also be complete off base > on my threat model here so please correct me if I'm > wrong. It's difficult for me to comment on this without knowing *how* the attack would be performed, in your model. Ex: my expectation is that the attacked has R/W access to kernel data and has knowledge of the location of static variables. This is not just a guess, but a real-life scenario, found in attacks that, among other things, are capable of disabling SELinux, to proceed toward gaining full root capability. At that point, I think that variables which are allocated dynamically, in vmalloc address space, are harder to locate, because of the virtual mapping and the randomness of the address chosen (this I have not confirmed yet, but I suppose there is some randomness in picking the address to assign to a certain allocation request to vmalloc, otherwise, it could be added). > I think your other summaries are good points though > and should go in the cover letter. Ok, I'm just afraid it risks becoming a lengthy dissertation :-) -- igor
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 13/02/18 20:10, Laura Abbott wrote: > On 02/13/2018 07:20 AM, Igor Stoppa wrote: >> Why alterations of page properties are not considered a risk and the physmap >> is? >> And how would it be easier (i suppose) to attack the latter? > > Alterations are certainly a risk but with the physmap the > mapping is already there. Find the address and you have > access vs. needing to actually modify the properties > then do the access. I could also be complete off base > on my threat model here so please correct me if I'm > wrong. It's difficult for me to comment on this without knowing *how* the attack would be performed, in your model. Ex: my expectation is that the attacked has R/W access to kernel data and has knowledge of the location of static variables. This is not just a guess, but a real-life scenario, found in attacks that, among other things, are capable of disabling SELinux, to proceed toward gaining full root capability. At that point, I think that variables which are allocated dynamically, in vmalloc address space, are harder to locate, because of the virtual mapping and the randomness of the address chosen (this I have not confirmed yet, but I suppose there is some randomness in picking the address to assign to a certain allocation request to vmalloc, otherwise, it could be added). > I think your other summaries are good points though > and should go in the cover letter. Ok, I'm just afraid it risks becoming a lengthy dissertation :-) -- igor
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On 14/02/18 21:29, Kees Cook wrote: > On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbottwrote: [...] >> Kernel code should be fine, if it isn't that is a bug that should be >> fixed. Modules yes are not fully protected. The conclusion from past > > I think that's a pretty serious problem: we can't have aliases with > mismatched permissions; this degrades a deterministic protection > (read-only) to a probabilistic protection (knowing where the alias of > a target is mapped). Having an attack be "needs some info leaks" > instead of "need execution control to change perms" is a much lower > bar, IMO. Why "need execution control to change permission"? Or, iow, what does it mean exactly? ROP/JOP? Data-oriented control flow hijack? Unless I misunderstand the meaning of "need execution control", I think that "need write capability to arbitrary data address" should be sufficient, albeit uncomfortable to use. OTOH, "need read/write capability from/to arbitrary data address" would be enough, I think, assuming that one knows the offset where to write to - but that information could be inferred, for example, by scanning the memory for known patterns. IMHO the attack surface is so vast that it's not unreasonable to expect that it will be possible to fish out means to perform arbitrary R/W into kernel address space. Ex: some more recent/less tested driver. One can argue that this sort of R/W activity probably does require some form of execution control, but AFAIK, the only way to to prevent it, is to have CFI - btw, is there any standardization in that sense? So, from my (pessimistic?) perspective, the best that can be hoped for, is to make it much harder to figure out where the data is located. Virtual mapping has this side effect, compared to linear mapping. But, once easier attack targets are removed, I suspect the page mapping will become the next target. -- igor
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On 14/02/18 21:29, Kees Cook wrote: > On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbott wrote: [...] >> Kernel code should be fine, if it isn't that is a bug that should be >> fixed. Modules yes are not fully protected. The conclusion from past > > I think that's a pretty serious problem: we can't have aliases with > mismatched permissions; this degrades a deterministic protection > (read-only) to a probabilistic protection (knowing where the alias of > a target is mapped). Having an attack be "needs some info leaks" > instead of "need execution control to change perms" is a much lower > bar, IMO. Why "need execution control to change permission"? Or, iow, what does it mean exactly? ROP/JOP? Data-oriented control flow hijack? Unless I misunderstand the meaning of "need execution control", I think that "need write capability to arbitrary data address" should be sufficient, albeit uncomfortable to use. OTOH, "need read/write capability from/to arbitrary data address" would be enough, I think, assuming that one knows the offset where to write to - but that information could be inferred, for example, by scanning the memory for known patterns. IMHO the attack surface is so vast that it's not unreasonable to expect that it will be possible to fish out means to perform arbitrary R/W into kernel address space. Ex: some more recent/less tested driver. One can argue that this sort of R/W activity probably does require some form of execution control, but AFAIK, the only way to to prevent it, is to have CFI - btw, is there any standardization in that sense? So, from my (pessimistic?) perspective, the best that can be hoped for, is to make it much harder to figure out where the data is located. Virtual mapping has this side effect, compared to linear mapping. But, once easier attack targets are removed, I suspect the page mapping will become the next target. -- igor
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Wed, Feb 14, 2018 at 2:13 PM, Tycho Andersenwrote: > On Wed, Feb 14, 2018 at 11:48:38AM -0800, Kees Cook wrote: >> On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbott wrote: >> > fixed. Modules yes are not fully protected. The conclusion from past >> > experience has been that we cannot safely break down larger page sizes >> > at runtime like x86 does. We could theoretically >> > add support for fixing up the alias if PAGE_POISONING is enabled but >> > I don't know who would actually use that in production. Performance >> > is very poor at that point. >> >> XPFO forces 4K pages on the physmap[1] for similar reasons. I have no >> doubt about performance changes, but I'd be curious to see real >> numbers. Did anyone do benchmarks on just the huge/4K change? (Without >> also the XPFO overhead?) >> >> If this, XPFO, and PAGE_POISONING all need it, I think we have to >> start a closer investigation. :) > > I haven't but it shouldn't be too hard. What benchmarks are you > thinking? Unless I'm looking at some specific micro benchmark, I tend to default to looking at kernel build benchmarks but that gets pretty noisy. Laura regularly uses hackbench, IIRC. I'm not finding the pastebin I had for that, though. I wonder if we need a benchmark subdirectory in tools/testing/, so we could collect some of these common tools? All benchmarks are terrible, but at least we'd have the same terrible benchmarks. :) -Kees -- Kees Cook Pixel Security
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Wed, Feb 14, 2018 at 2:13 PM, Tycho Andersen wrote: > On Wed, Feb 14, 2018 at 11:48:38AM -0800, Kees Cook wrote: >> On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbott wrote: >> > fixed. Modules yes are not fully protected. The conclusion from past >> > experience has been that we cannot safely break down larger page sizes >> > at runtime like x86 does. We could theoretically >> > add support for fixing up the alias if PAGE_POISONING is enabled but >> > I don't know who would actually use that in production. Performance >> > is very poor at that point. >> >> XPFO forces 4K pages on the physmap[1] for similar reasons. I have no >> doubt about performance changes, but I'd be curious to see real >> numbers. Did anyone do benchmarks on just the huge/4K change? (Without >> also the XPFO overhead?) >> >> If this, XPFO, and PAGE_POISONING all need it, I think we have to >> start a closer investigation. :) > > I haven't but it shouldn't be too hard. What benchmarks are you > thinking? Unless I'm looking at some specific micro benchmark, I tend to default to looking at kernel build benchmarks but that gets pretty noisy. Laura regularly uses hackbench, IIRC. I'm not finding the pastebin I had for that, though. I wonder if we need a benchmark subdirectory in tools/testing/, so we could collect some of these common tools? All benchmarks are terrible, but at least we'd have the same terrible benchmarks. :) -Kees -- Kees Cook Pixel Security
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Wed, Feb 14, 2018 at 11:48:38AM -0800, Kees Cook wrote: > On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbottwrote: > > fixed. Modules yes are not fully protected. The conclusion from past > > experience has been that we cannot safely break down larger page sizes > > at runtime like x86 does. We could theoretically > > add support for fixing up the alias if PAGE_POISONING is enabled but > > I don't know who would actually use that in production. Performance > > is very poor at that point. > > XPFO forces 4K pages on the physmap[1] for similar reasons. I have no > doubt about performance changes, but I'd be curious to see real > numbers. Did anyone do benchmarks on just the huge/4K change? (Without > also the XPFO overhead?) > > If this, XPFO, and PAGE_POISONING all need it, I think we have to > start a closer investigation. :) I haven't but it shouldn't be too hard. What benchmarks are you thinking? Tycho
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Wed, Feb 14, 2018 at 11:48:38AM -0800, Kees Cook wrote: > On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbott wrote: > > fixed. Modules yes are not fully protected. The conclusion from past > > experience has been that we cannot safely break down larger page sizes > > at runtime like x86 does. We could theoretically > > add support for fixing up the alias if PAGE_POISONING is enabled but > > I don't know who would actually use that in production. Performance > > is very poor at that point. > > XPFO forces 4K pages on the physmap[1] for similar reasons. I have no > doubt about performance changes, but I'd be curious to see real > numbers. Did anyone do benchmarks on just the huge/4K change? (Without > also the XPFO overhead?) > > If this, XPFO, and PAGE_POISONING all need it, I think we have to > start a closer investigation. :) I haven't but it shouldn't be too hard. What benchmarks are you thinking? Tycho
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On 02/14/2018 11:28 AM, Ard Biesheuvel wrote: On 14 February 2018 at 19:06, Laura Abbottwrote: On 02/13/2018 01:43 PM, Kees Cook wrote: On Tue, Feb 13, 2018 at 8:09 AM, Laura Abbott wrote: No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING does use 4K pages which could be adjusted at runtime. So yes, you are right we would have physmap exposure on arm64 as well. Errr, so that means even modules and kernel code are writable via the arm64 physmap? That seems extraordinarily bad. :( -Kees (adding linux-arm-kernel and changing the subject) Kernel code should be fine, if it isn't that is a bug that should be fixed. We take care to ensure that the linear alias of the core kernel's .text and .rodata segments are mapped read-only. When we first moved the kernel out of the linear region, we did not map it there at all anymore, but that broke hibernation so we had to put something back. Modules yes are not fully protected. The conclusion from past experience has been that we cannot safely break down larger page sizes at runtime like x86 does. We could theoretically add support for fixing up the alias if PAGE_POISONING is enabled but I don't know who would actually use that in production. Performance is very poor at that point. As long as the linear alias of the module is mapped down to pages, we should be able to tweak the permissions. I take it that PAGE_POISONING does more than just that? Page poisoning does exactly that. The argument I was trying to make was that if nobody really uses page poisoning except for debugging it might not be worth it to fix up the alias. Thinking a bit more, this is a terrible argument for many reasons so yes I agree that we can just fix up the alias if PAGE_POISONING (or other features) are enabled. Thanks, Laura
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On 02/14/2018 11:28 AM, Ard Biesheuvel wrote: On 14 February 2018 at 19:06, Laura Abbott wrote: On 02/13/2018 01:43 PM, Kees Cook wrote: On Tue, Feb 13, 2018 at 8:09 AM, Laura Abbott wrote: No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING does use 4K pages which could be adjusted at runtime. So yes, you are right we would have physmap exposure on arm64 as well. Errr, so that means even modules and kernel code are writable via the arm64 physmap? That seems extraordinarily bad. :( -Kees (adding linux-arm-kernel and changing the subject) Kernel code should be fine, if it isn't that is a bug that should be fixed. We take care to ensure that the linear alias of the core kernel's .text and .rodata segments are mapped read-only. When we first moved the kernel out of the linear region, we did not map it there at all anymore, but that broke hibernation so we had to put something back. Modules yes are not fully protected. The conclusion from past experience has been that we cannot safely break down larger page sizes at runtime like x86 does. We could theoretically add support for fixing up the alias if PAGE_POISONING is enabled but I don't know who would actually use that in production. Performance is very poor at that point. As long as the linear alias of the module is mapped down to pages, we should be able to tweak the permissions. I take it that PAGE_POISONING does more than just that? Page poisoning does exactly that. The argument I was trying to make was that if nobody really uses page poisoning except for debugging it might not be worth it to fix up the alias. Thinking a bit more, this is a terrible argument for many reasons so yes I agree that we can just fix up the alias if PAGE_POISONING (or other features) are enabled. Thanks, Laura
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbottwrote: > fixed. Modules yes are not fully protected. The conclusion from past > experience has been that we cannot safely break down larger page sizes > at runtime like x86 does. We could theoretically > add support for fixing up the alias if PAGE_POISONING is enabled but > I don't know who would actually use that in production. Performance > is very poor at that point. XPFO forces 4K pages on the physmap[1] for similar reasons. I have no doubt about performance changes, but I'd be curious to see real numbers. Did anyone do benchmarks on just the huge/4K change? (Without also the XPFO overhead?) If this, XPFO, and PAGE_POISONING all need it, I think we have to start a closer investigation. :) -Kees [1] http://www.openwall.com/lists/kernel-hardening/2017/09/07/13 -- Kees Cook Pixel Security
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbott wrote: > fixed. Modules yes are not fully protected. The conclusion from past > experience has been that we cannot safely break down larger page sizes > at runtime like x86 does. We could theoretically > add support for fixing up the alias if PAGE_POISONING is enabled but > I don't know who would actually use that in production. Performance > is very poor at that point. XPFO forces 4K pages on the physmap[1] for similar reasons. I have no doubt about performance changes, but I'd be curious to see real numbers. Did anyone do benchmarks on just the huge/4K change? (Without also the XPFO overhead?) If this, XPFO, and PAGE_POISONING all need it, I think we have to start a closer investigation. :) -Kees [1] http://www.openwall.com/lists/kernel-hardening/2017/09/07/13 -- Kees Cook Pixel Security
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Wed, Feb 14, 2018 at 11:29 AM, Kees Cookwrote: > Why does using finer granularity on the physmap degrade performance? I > assume TLB pressure, but what is heavily using that area? (I must not > be understanding what physmap actually gets used for -- I thought it > was just a convenience to have a 1:1 virt/phys map for some lookups?) Jann has sorted me out: it's that physmap isn't an _alias_ for the buddy allocator memory areas; it's used directly. -Kees -- Kees Cook Pixel Security
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Wed, Feb 14, 2018 at 11:29 AM, Kees Cook wrote: > Why does using finer granularity on the physmap degrade performance? I > assume TLB pressure, but what is heavily using that area? (I must not > be understanding what physmap actually gets used for -- I thought it > was just a convenience to have a 1:1 virt/phys map for some lookups?) Jann has sorted me out: it's that physmap isn't an _alias_ for the buddy allocator memory areas; it's used directly. -Kees -- Kees Cook Pixel Security
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbottwrote: > On 02/13/2018 01:43 PM, Kees Cook wrote: >> >> On Tue, Feb 13, 2018 at 8:09 AM, Laura Abbott wrote: >>> >>> No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger >>> page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING >>> does use 4K pages which could be adjusted at runtime. So yes, you are >>> right we would have physmap exposure on arm64 as well. >> >> >> Errr, so that means even modules and kernel code are writable via the >> arm64 physmap? That seems extraordinarily bad. :( >> >> -Kees >> > > (adding linux-arm-kernel and changing the subject) > > Kernel code should be fine, if it isn't that is a bug that should be > fixed. Modules yes are not fully protected. The conclusion from past I think that's a pretty serious problem: we can't have aliases with mismatched permissions; this degrades a deterministic protection (read-only) to a probabilistic protection (knowing where the alias of a target is mapped). Having an attack be "needs some info leaks" instead of "need execution control to change perms" is a much lower bar, IMO. > experience has been that we cannot safely break down larger page sizes > at runtime like x86 does. We could theoretically > add support for fixing up the alias if PAGE_POISONING is enabled but > I don't know who would actually use that in production. Performance > is very poor at that point. Why does using finer granularity on the physmap degrade performance? I assume TLB pressure, but what is heavily using that area? (I must not be understanding what physmap actually gets used for -- I thought it was just a convenience to have a 1:1 virt/phys map for some lookups?) -Kees -- Kees Cook Pixel Security
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbott wrote: > On 02/13/2018 01:43 PM, Kees Cook wrote: >> >> On Tue, Feb 13, 2018 at 8:09 AM, Laura Abbott wrote: >>> >>> No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger >>> page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING >>> does use 4K pages which could be adjusted at runtime. So yes, you are >>> right we would have physmap exposure on arm64 as well. >> >> >> Errr, so that means even modules and kernel code are writable via the >> arm64 physmap? That seems extraordinarily bad. :( >> >> -Kees >> > > (adding linux-arm-kernel and changing the subject) > > Kernel code should be fine, if it isn't that is a bug that should be > fixed. Modules yes are not fully protected. The conclusion from past I think that's a pretty serious problem: we can't have aliases with mismatched permissions; this degrades a deterministic protection (read-only) to a probabilistic protection (knowing where the alias of a target is mapped). Having an attack be "needs some info leaks" instead of "need execution control to change perms" is a much lower bar, IMO. > experience has been that we cannot safely break down larger page sizes > at runtime like x86 does. We could theoretically > add support for fixing up the alias if PAGE_POISONING is enabled but > I don't know who would actually use that in production. Performance > is very poor at that point. Why does using finer granularity on the physmap degrade performance? I assume TLB pressure, but what is heavily using that area? (I must not be understanding what physmap actually gets used for -- I thought it was just a convenience to have a 1:1 virt/phys map for some lookups?) -Kees -- Kees Cook Pixel Security
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On 14 February 2018 at 19:06, Laura Abbottwrote: > On 02/13/2018 01:43 PM, Kees Cook wrote: >> >> On Tue, Feb 13, 2018 at 8:09 AM, Laura Abbott wrote: >>> >>> No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger >>> page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING >>> does use 4K pages which could be adjusted at runtime. So yes, you are >>> right we would have physmap exposure on arm64 as well. >> >> >> Errr, so that means even modules and kernel code are writable via the >> arm64 physmap? That seems extraordinarily bad. :( >> >> -Kees >> > > (adding linux-arm-kernel and changing the subject) > > Kernel code should be fine, if it isn't that is a bug that should be > fixed. We take care to ensure that the linear alias of the core kernel's .text and .rodata segments are mapped read-only. When we first moved the kernel out of the linear region, we did not map it there at all anymore, but that broke hibernation so we had to put something back. > Modules yes are not fully protected. The conclusion from past > experience has been that we cannot safely break down larger page sizes > at runtime like x86 does. We could theoretically > add support for fixing up the alias if PAGE_POISONING is enabled but > I don't know who would actually use that in production. Performance > is very poor at that point. > As long as the linear alias of the module is mapped down to pages, we should be able to tweak the permissions. I take it that PAGE_POISONING does more than just that?
Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On 14 February 2018 at 19:06, Laura Abbott wrote: > On 02/13/2018 01:43 PM, Kees Cook wrote: >> >> On Tue, Feb 13, 2018 at 8:09 AM, Laura Abbott wrote: >>> >>> No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger >>> page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING >>> does use 4K pages which could be adjusted at runtime. So yes, you are >>> right we would have physmap exposure on arm64 as well. >> >> >> Errr, so that means even modules and kernel code are writable via the >> arm64 physmap? That seems extraordinarily bad. :( >> >> -Kees >> > > (adding linux-arm-kernel and changing the subject) > > Kernel code should be fine, if it isn't that is a bug that should be > fixed. We take care to ensure that the linear alias of the core kernel's .text and .rodata segments are mapped read-only. When we first moved the kernel out of the linear region, we did not map it there at all anymore, but that broke hibernation so we had to put something back. > Modules yes are not fully protected. The conclusion from past > experience has been that we cannot safely break down larger page sizes > at runtime like x86 does. We could theoretically > add support for fixing up the alias if PAGE_POISONING is enabled but > I don't know who would actually use that in production. Performance > is very poor at that point. > As long as the linear alias of the module is mapped down to pages, we should be able to tweak the permissions. I take it that PAGE_POISONING does more than just that?
arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On 02/13/2018 01:43 PM, Kees Cook wrote: On Tue, Feb 13, 2018 at 8:09 AM, Laura Abbottwrote: No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING does use 4K pages which could be adjusted at runtime. So yes, you are right we would have physmap exposure on arm64 as well. Errr, so that means even modules and kernel code are writable via the arm64 physmap? That seems extraordinarily bad. :( -Kees (adding linux-arm-kernel and changing the subject) Kernel code should be fine, if it isn't that is a bug that should be fixed. Modules yes are not fully protected. The conclusion from past experience has been that we cannot safely break down larger page sizes at runtime like x86 does. We could theoretically add support for fixing up the alias if PAGE_POISONING is enabled but I don't know who would actually use that in production. Performance is very poor at that point. Thanks, Laura
arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)
On 02/13/2018 01:43 PM, Kees Cook wrote: On Tue, Feb 13, 2018 at 8:09 AM, Laura Abbott wrote: No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING does use 4K pages which could be adjusted at runtime. So yes, you are right we would have physmap exposure on arm64 as well. Errr, so that means even modules and kernel code are writable via the arm64 physmap? That seems extraordinarily bad. :( -Kees (adding linux-arm-kernel and changing the subject) Kernel code should be fine, if it isn't that is a bug that should be fixed. Modules yes are not fully protected. The conclusion from past experience has been that we cannot safely break down larger page sizes at runtime like x86 does. We could theoretically add support for fixing up the alias if PAGE_POISONING is enabled but I don't know who would actually use that in production. Performance is very poor at that point. Thanks, Laura
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Tue, Feb 13, 2018 at 8:09 AM, Laura Abbottwrote: > No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger > page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING > does use 4K pages which could be adjusted at runtime. So yes, you are > right we would have physmap exposure on arm64 as well. Errr, so that means even modules and kernel code are writable via the arm64 physmap? That seems extraordinarily bad. :( -Kees -- Kees Cook Pixel Security
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Tue, Feb 13, 2018 at 8:09 AM, Laura Abbott wrote: > No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger > page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING > does use 4K pages which could be adjusted at runtime. So yes, you are > right we would have physmap exposure on arm64 as well. Errr, so that means even modules and kernel code are writable via the arm64 physmap? That seems extraordinarily bad. :( -Kees -- Kees Cook Pixel Security
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 02/13/2018 07:20 AM, Igor Stoppa wrote: Why alterations of page properties are not considered a risk and the physmap is? And how would it be easier (i suppose) to attack the latter? Alterations are certainly a risk but with the physmap the mapping is already there. Find the address and you have access vs. needing to actually modify the properties then do the access. I could also be complete off base on my threat model here so please correct me if I'm wrong. I think your other summaries are good points though and should go in the cover letter. Thanks, Laura
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 02/13/2018 07:20 AM, Igor Stoppa wrote: Why alterations of page properties are not considered a risk and the physmap is? And how would it be easier (i suppose) to attack the latter? Alterations are certainly a risk but with the physmap the mapping is already there. Find the address and you have access vs. needing to actually modify the properties then do the access. I could also be complete off base on my threat model here so please correct me if I'm wrong. I think your other summaries are good points though and should go in the cover letter. Thanks, Laura
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 02/12/2018 07:39 PM, Jann Horn wrote: On Tue, Feb 13, 2018 at 2:25 AM, Kees Cookwrote: On Mon, Feb 12, 2018 at 4:40 PM, Laura Abbott wrote: On 02/12/2018 03:27 PM, Kees Cook wrote: On Sun, Feb 4, 2018 at 7:05 AM, Igor Stoppa wrote: On 04/02/18 00:29, Boris Lukashev wrote: On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa wrote: [...] What you are suggesting, if I have understood it correctly, is that, when the pool is protected, the addresses already given out, will become traps that get resolved through a lookup table that is built based on the content of each allocation. That seems to generate a lot of overhead, not to mention the fact that it might not play very well with the MMU. That is effectively what i'm suggesting - as a form of protection for consumers against direct reads of data which may have been corrupted by some irrelevant means. In the context of pmalloc, it would probably be a separate type of ro+verified pool ok, that seems more like an extension though. ATM I am having problems gaining traction to get even the basic merged :-) I would consider this as a possibility for future work, unless it is said that it's necessary for pmalloc to be accepted ... I would agree: let's get basic functionality in first. Both verification and the physmap part can be done separately, IMO. Skipping over physmap leaves a pretty big area of exposure that could be difficult to solve later. I appreciate this might block basic functionality but I don't think we should just gloss over it without at least some idea of what we would do. What's our exposure on physmap for other regions? e.g. things that are executable, or made read-only later (like __ro_after_init)? I just checked on a system with a 4.9 kernel, and there seems to be no physical memory that is mapped as writable in the init PGD and executable elsewhere. Ah, I think I missed something. At least on X86, set_memory_ro, set_memory_rw, set_memory_nx and set_memory_x all use change_page_attr_clear/change_page_attr_set, which use change_page_attr_set_clr, which calls __change_page_attr_set_clr() with a second parameter "checkalias" that is set to 1 unless the bit being changed is the NX bit, and that parameter causes the invocation of cpa_process_alias(), which will, for mapped ranges, also change the attributes of physmap ranges. set_memory_ro() and so on are also used by the module loading code. But in the ARM64 code, I don't see anything similar. Does anyone with a better understanding of ARM64 want to check whether I missed something? Or maybe, with a recent kernel, check whether executable module pages show up with a second writable mapping in the "kernel_page_tables" file in debugfs? No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING does use 4K pages which could be adjusted at runtime. So yes, you are right we would have physmap exposure on arm64 as well. To the original question, it does sound like we are actually okay with the physmap. Thanks, Laura
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 02/12/2018 07:39 PM, Jann Horn wrote: On Tue, Feb 13, 2018 at 2:25 AM, Kees Cook wrote: On Mon, Feb 12, 2018 at 4:40 PM, Laura Abbott wrote: On 02/12/2018 03:27 PM, Kees Cook wrote: On Sun, Feb 4, 2018 at 7:05 AM, Igor Stoppa wrote: On 04/02/18 00:29, Boris Lukashev wrote: On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa wrote: [...] What you are suggesting, if I have understood it correctly, is that, when the pool is protected, the addresses already given out, will become traps that get resolved through a lookup table that is built based on the content of each allocation. That seems to generate a lot of overhead, not to mention the fact that it might not play very well with the MMU. That is effectively what i'm suggesting - as a form of protection for consumers against direct reads of data which may have been corrupted by some irrelevant means. In the context of pmalloc, it would probably be a separate type of ro+verified pool ok, that seems more like an extension though. ATM I am having problems gaining traction to get even the basic merged :-) I would consider this as a possibility for future work, unless it is said that it's necessary for pmalloc to be accepted ... I would agree: let's get basic functionality in first. Both verification and the physmap part can be done separately, IMO. Skipping over physmap leaves a pretty big area of exposure that could be difficult to solve later. I appreciate this might block basic functionality but I don't think we should just gloss over it without at least some idea of what we would do. What's our exposure on physmap for other regions? e.g. things that are executable, or made read-only later (like __ro_after_init)? I just checked on a system with a 4.9 kernel, and there seems to be no physical memory that is mapped as writable in the init PGD and executable elsewhere. Ah, I think I missed something. At least on X86, set_memory_ro, set_memory_rw, set_memory_nx and set_memory_x all use change_page_attr_clear/change_page_attr_set, which use change_page_attr_set_clr, which calls __change_page_attr_set_clr() with a second parameter "checkalias" that is set to 1 unless the bit being changed is the NX bit, and that parameter causes the invocation of cpa_process_alias(), which will, for mapped ranges, also change the attributes of physmap ranges. set_memory_ro() and so on are also used by the module loading code. But in the ARM64 code, I don't see anything similar. Does anyone with a better understanding of ARM64 want to check whether I missed something? Or maybe, with a recent kernel, check whether executable module pages show up with a second writable mapping in the "kernel_page_tables" file in debugfs? No, arm64 doesn't fixup the aliases, mostly because arm64 uses larger page sizes which can't be broken down at runtime. CONFIG_PAGE_POISONING does use 4K pages which could be adjusted at runtime. So yes, you are right we would have physmap exposure on arm64 as well. To the original question, it does sound like we are actually okay with the physmap. Thanks, Laura
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Tue, Feb 13, 2018 at 2:25 AM, Kees Cookwrote: > On Mon, Feb 12, 2018 at 4:40 PM, Laura Abbott wrote: >> On 02/12/2018 03:27 PM, Kees Cook wrote: >>> >>> On Sun, Feb 4, 2018 at 7:05 AM, Igor Stoppa >>> wrote: On 04/02/18 00:29, Boris Lukashev wrote: > > On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa > wrote: [...] >> What you are suggesting, if I have understood it correctly, is that, >> when the pool is protected, the addresses already given out, will >> become >> traps that get resolved through a lookup table that is built based on >> the content of each allocation. >> >> That seems to generate a lot of overhead, not to mention the fact that >> it might not play very well with the MMU. > > > That is effectively what i'm suggesting - as a form of protection for > consumers against direct reads of data which may have been corrupted > by some irrelevant means. In the context of pmalloc, it would probably > be a separate type of ro+verified pool ok, that seems more like an extension though. ATM I am having problems gaining traction to get even the basic merged :-) I would consider this as a possibility for future work, unless it is said that it's necessary for pmalloc to be accepted ... >>> >>> >>> I would agree: let's get basic functionality in first. Both >>> verification and the physmap part can be done separately, IMO. >> >> >> Skipping over physmap leaves a pretty big area of exposure that could >> be difficult to solve later. I appreciate this might block basic >> functionality but I don't think we should just gloss over it without >> at least some idea of what we would do. > > What's our exposure on physmap for other regions? e.g. things that are > executable, or made read-only later (like __ro_after_init)? I just checked on a system with a 4.9 kernel, and there seems to be no physical memory that is mapped as writable in the init PGD and executable elsewhere. Ah, I think I missed something. At least on X86, set_memory_ro, set_memory_rw, set_memory_nx and set_memory_x all use change_page_attr_clear/change_page_attr_set, which use change_page_attr_set_clr, which calls __change_page_attr_set_clr() with a second parameter "checkalias" that is set to 1 unless the bit being changed is the NX bit, and that parameter causes the invocation of cpa_process_alias(), which will, for mapped ranges, also change the attributes of physmap ranges. set_memory_ro() and so on are also used by the module loading code. But in the ARM64 code, I don't see anything similar. Does anyone with a better understanding of ARM64 want to check whether I missed something? Or maybe, with a recent kernel, check whether executable module pages show up with a second writable mapping in the "kernel_page_tables" file in debugfs?
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Tue, Feb 13, 2018 at 2:25 AM, Kees Cook wrote: > On Mon, Feb 12, 2018 at 4:40 PM, Laura Abbott wrote: >> On 02/12/2018 03:27 PM, Kees Cook wrote: >>> >>> On Sun, Feb 4, 2018 at 7:05 AM, Igor Stoppa >>> wrote: On 04/02/18 00:29, Boris Lukashev wrote: > > On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa > wrote: [...] >> What you are suggesting, if I have understood it correctly, is that, >> when the pool is protected, the addresses already given out, will >> become >> traps that get resolved through a lookup table that is built based on >> the content of each allocation. >> >> That seems to generate a lot of overhead, not to mention the fact that >> it might not play very well with the MMU. > > > That is effectively what i'm suggesting - as a form of protection for > consumers against direct reads of data which may have been corrupted > by some irrelevant means. In the context of pmalloc, it would probably > be a separate type of ro+verified pool ok, that seems more like an extension though. ATM I am having problems gaining traction to get even the basic merged :-) I would consider this as a possibility for future work, unless it is said that it's necessary for pmalloc to be accepted ... >>> >>> >>> I would agree: let's get basic functionality in first. Both >>> verification and the physmap part can be done separately, IMO. >> >> >> Skipping over physmap leaves a pretty big area of exposure that could >> be difficult to solve later. I appreciate this might block basic >> functionality but I don't think we should just gloss over it without >> at least some idea of what we would do. > > What's our exposure on physmap for other regions? e.g. things that are > executable, or made read-only later (like __ro_after_init)? I just checked on a system with a 4.9 kernel, and there seems to be no physical memory that is mapped as writable in the init PGD and executable elsewhere. Ah, I think I missed something. At least on X86, set_memory_ro, set_memory_rw, set_memory_nx and set_memory_x all use change_page_attr_clear/change_page_attr_set, which use change_page_attr_set_clr, which calls __change_page_attr_set_clr() with a second parameter "checkalias" that is set to 1 unless the bit being changed is the NX bit, and that parameter causes the invocation of cpa_process_alias(), which will, for mapped ranges, also change the attributes of physmap ranges. set_memory_ro() and so on are also used by the module loading code. But in the ARM64 code, I don't see anything similar. Does anyone with a better understanding of ARM64 want to check whether I missed something? Or maybe, with a recent kernel, check whether executable module pages show up with a second writable mapping in the "kernel_page_tables" file in debugfs?
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Mon, Feb 12, 2018 at 4:40 PM, Laura Abbottwrote: > On 02/12/2018 03:27 PM, Kees Cook wrote: >> >> On Sun, Feb 4, 2018 at 7:05 AM, Igor Stoppa >> wrote: >>> >>> On 04/02/18 00:29, Boris Lukashev wrote: On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa wrote: >>> >>> >>> [...] >>> > What you are suggesting, if I have understood it correctly, is that, > when the pool is protected, the addresses already given out, will > become > traps that get resolved through a lookup table that is built based on > the content of each allocation. > > That seems to generate a lot of overhead, not to mention the fact that > it might not play very well with the MMU. That is effectively what i'm suggesting - as a form of protection for consumers against direct reads of data which may have been corrupted by some irrelevant means. In the context of pmalloc, it would probably be a separate type of ro+verified pool >>> >>> ok, that seems more like an extension though. >>> >>> ATM I am having problems gaining traction to get even the basic merged >>> :-) >>> >>> I would consider this as a possibility for future work, unless it is >>> said that it's necessary for pmalloc to be accepted ... >> >> >> I would agree: let's get basic functionality in first. Both >> verification and the physmap part can be done separately, IMO. > > > Skipping over physmap leaves a pretty big area of exposure that could > be difficult to solve later. I appreciate this might block basic > functionality but I don't think we should just gloss over it without > at least some idea of what we would do. What's our exposure on physmap for other regions? e.g. things that are executable, or made read-only later (like __ro_after_init)? -Kees -- Kees Cook Pixel Security
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Mon, Feb 12, 2018 at 4:40 PM, Laura Abbott wrote: > On 02/12/2018 03:27 PM, Kees Cook wrote: >> >> On Sun, Feb 4, 2018 at 7:05 AM, Igor Stoppa >> wrote: >>> >>> On 04/02/18 00:29, Boris Lukashev wrote: On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa wrote: >>> >>> >>> [...] >>> > What you are suggesting, if I have understood it correctly, is that, > when the pool is protected, the addresses already given out, will > become > traps that get resolved through a lookup table that is built based on > the content of each allocation. > > That seems to generate a lot of overhead, not to mention the fact that > it might not play very well with the MMU. That is effectively what i'm suggesting - as a form of protection for consumers against direct reads of data which may have been corrupted by some irrelevant means. In the context of pmalloc, it would probably be a separate type of ro+verified pool >>> >>> ok, that seems more like an extension though. >>> >>> ATM I am having problems gaining traction to get even the basic merged >>> :-) >>> >>> I would consider this as a possibility for future work, unless it is >>> said that it's necessary for pmalloc to be accepted ... >> >> >> I would agree: let's get basic functionality in first. Both >> verification and the physmap part can be done separately, IMO. > > > Skipping over physmap leaves a pretty big area of exposure that could > be difficult to solve later. I appreciate this might block basic > functionality but I don't think we should just gloss over it without > at least some idea of what we would do. What's our exposure on physmap for other regions? e.g. things that are executable, or made read-only later (like __ro_after_init)? -Kees -- Kees Cook Pixel Security
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 02/12/2018 03:27 PM, Kees Cook wrote: On Sun, Feb 4, 2018 at 7:05 AM, Igor Stoppawrote: On 04/02/18 00:29, Boris Lukashev wrote: On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa wrote: [...] What you are suggesting, if I have understood it correctly, is that, when the pool is protected, the addresses already given out, will become traps that get resolved through a lookup table that is built based on the content of each allocation. That seems to generate a lot of overhead, not to mention the fact that it might not play very well with the MMU. That is effectively what i'm suggesting - as a form of protection for consumers against direct reads of data which may have been corrupted by some irrelevant means. In the context of pmalloc, it would probably be a separate type of ro+verified pool ok, that seems more like an extension though. ATM I am having problems gaining traction to get even the basic merged :-) I would consider this as a possibility for future work, unless it is said that it's necessary for pmalloc to be accepted ... I would agree: let's get basic functionality in first. Both verification and the physmap part can be done separately, IMO. Skipping over physmap leaves a pretty big area of exposure that could be difficult to solve later. I appreciate this might block basic functionality but I don't think we should just gloss over it without at least some idea of what we would do. Thanks, Laura
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 02/12/2018 03:27 PM, Kees Cook wrote: On Sun, Feb 4, 2018 at 7:05 AM, Igor Stoppa wrote: On 04/02/18 00:29, Boris Lukashev wrote: On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa wrote: [...] What you are suggesting, if I have understood it correctly, is that, when the pool is protected, the addresses already given out, will become traps that get resolved through a lookup table that is built based on the content of each allocation. That seems to generate a lot of overhead, not to mention the fact that it might not play very well with the MMU. That is effectively what i'm suggesting - as a form of protection for consumers against direct reads of data which may have been corrupted by some irrelevant means. In the context of pmalloc, it would probably be a separate type of ro+verified pool ok, that seems more like an extension though. ATM I am having problems gaining traction to get even the basic merged :-) I would consider this as a possibility for future work, unless it is said that it's necessary for pmalloc to be accepted ... I would agree: let's get basic functionality in first. Both verification and the physmap part can be done separately, IMO. Skipping over physmap leaves a pretty big area of exposure that could be difficult to solve later. I appreciate this might block basic functionality but I don't think we should just gloss over it without at least some idea of what we would do. Thanks, Laura
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Sun, Feb 4, 2018 at 7:05 AM, Igor Stoppawrote: > On 04/02/18 00:29, Boris Lukashev wrote: >> On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa wrote: > > [...] > >>> What you are suggesting, if I have understood it correctly, is that, >>> when the pool is protected, the addresses already given out, will become >>> traps that get resolved through a lookup table that is built based on >>> the content of each allocation. >>> >>> That seems to generate a lot of overhead, not to mention the fact that >>> it might not play very well with the MMU. >> >> That is effectively what i'm suggesting - as a form of protection for >> consumers against direct reads of data which may have been corrupted >> by some irrelevant means. In the context of pmalloc, it would probably >> be a separate type of ro+verified pool > ok, that seems more like an extension though. > > ATM I am having problems gaining traction to get even the basic merged :-) > > I would consider this as a possibility for future work, unless it is > said that it's necessary for pmalloc to be accepted ... I would agree: let's get basic functionality in first. Both verification and the physmap part can be done separately, IMO. -Kees -- Kees Cook Pixel Security
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Sun, Feb 4, 2018 at 7:05 AM, Igor Stoppa wrote: > On 04/02/18 00:29, Boris Lukashev wrote: >> On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa wrote: > > [...] > >>> What you are suggesting, if I have understood it correctly, is that, >>> when the pool is protected, the addresses already given out, will become >>> traps that get resolved through a lookup table that is built based on >>> the content of each allocation. >>> >>> That seems to generate a lot of overhead, not to mention the fact that >>> it might not play very well with the MMU. >> >> That is effectively what i'm suggesting - as a form of protection for >> consumers against direct reads of data which may have been corrupted >> by some irrelevant means. In the context of pmalloc, it would probably >> be a separate type of ro+verified pool > ok, that seems more like an extension though. > > ATM I am having problems gaining traction to get even the basic merged :-) > > I would consider this as a possibility for future work, unless it is > said that it's necessary for pmalloc to be accepted ... I would agree: let's get basic functionality in first. Both verification and the physmap part can be done separately, IMO. -Kees -- Kees Cook Pixel Security
[PATCH 4/6] Protectable Memory
The MMU available in many systems running Linux can often provide R/O protection to the memory pages it handles. However, the MMU-based protection works efficiently only when said pages contain exclusively data that will not need further modifications. Statically allocated variables can be segregated into a dedicated section, but this does not sit very well with dynamically allocated ones. Dynamic allocation does not provide, currently, any means for grouping variables in memory pages that would contain exclusively data suitable for conversion to read only access mode. The allocator here provided (pmalloc - protectable memory allocator) introduces the concept of pools of protectable memory. A module can request a pool and then refer any allocation request to the pool handler it has received. Once all the chunks of memory associated to a specific pool are initialized, the pool can be protected. After this point, the pool can only be destroyed (it is up to the module to avoid any further references to the memory from the pool, after the destruction is invoked). The latter case is mainly meant for releasing memory, when a module is unloaded. A module can have as many pools as needed, for example to support the protection of data that is initialized in sufficiently distinct phases. Signed-off-by: Igor Stoppa--- include/linux/genalloc.h | 3 + include/linux/pmalloc.h | 242 +++ include/linux/vmalloc.h | 1 + lib/genalloc.c | 27 +++ mm/Kconfig | 6 + mm/Makefile | 1 + mm/pmalloc.c | 499 +++ mm/usercopy.c| 33 8 files changed, 812 insertions(+) create mode 100644 include/linux/pmalloc.h create mode 100644 mm/pmalloc.c diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index dcaa33e74b1c..b6c4cea9fbd8 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + +extern void gen_pool_flush_chunk(struct gen_pool *pool, +struct gen_pool_chunk *chunk); extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); extern size_t gen_pool_avail(struct gen_pool *); diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h new file mode 100644 index ..afc2068d5545 --- /dev/null +++ b/include/linux/pmalloc.h @@ -0,0 +1,242 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * pmalloc.h: Header for Protectable Memory Allocator + * + * (C) Copyright 2017 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#ifndef _LINUX_PMALLOC_H +#define _LINUX_PMALLOC_H + + +#include +#include + +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) + +/* + * Library for dynamic allocation of pools of memory that can be, + * after initialization, marked as read-only. + * + * This is intended to complement __read_only_after_init, for those cases + * where either it is not possible to know the initialization value before + * init is completed, or the amount of data is variable and can be + * determined only at run-time. + * + * ***WARNING*** + * The user of the API is expected to synchronize: + * 1) allocation, + * 2) writes to the allocated memory, + * 3) write protection of the pool, + * 4) freeing of the allocated memory, and + * 5) destruction of the pool. + * + * For a non-threaded scenario, this type of locking is not even required. + * + * Even if the library were to provide support for locking, point 2) + * would still depend on the user taking the lock. + */ + + +/** + * pmalloc_create_pool() - create a new protectable memory pool + * @name: the name of the pool, enforced to be unique + * @min_alloc_order: log2 of the minimum allocation size obtainable + * from the pool + * + * Creates a new (empty) memory pool for allocation of protectable + * memory. Memory will be allocated upon request (through pmalloc). + * + * Return: + * * pointer to the new pool - success + * * NULL - error + */ +struct gen_pool *pmalloc_create_pool(const char *name, +int min_alloc_order); + +/** + * is_pmalloc_object() - validates the existence of an alleged object + * @ptr: address of the object + * @n: size of the object, in bytes + * + * Return: + * * 0 - the object does not belong to pmalloc + * * 1 - the object belongs to pmalloc + * * \-1 - the object overlaps pmalloc memory incorrectly + */ +int is_pmalloc_object(const void *ptr, const unsigned long n); + +/** + * pmalloc_prealloc() - tries to allocate a memory chunk of the requested size + *
[PATCH 4/6] Protectable Memory
The MMU available in many systems running Linux can often provide R/O protection to the memory pages it handles. However, the MMU-based protection works efficiently only when said pages contain exclusively data that will not need further modifications. Statically allocated variables can be segregated into a dedicated section, but this does not sit very well with dynamically allocated ones. Dynamic allocation does not provide, currently, any means for grouping variables in memory pages that would contain exclusively data suitable for conversion to read only access mode. The allocator here provided (pmalloc - protectable memory allocator) introduces the concept of pools of protectable memory. A module can request a pool and then refer any allocation request to the pool handler it has received. Once all the chunks of memory associated to a specific pool are initialized, the pool can be protected. After this point, the pool can only be destroyed (it is up to the module to avoid any further references to the memory from the pool, after the destruction is invoked). The latter case is mainly meant for releasing memory, when a module is unloaded. A module can have as many pools as needed, for example to support the protection of data that is initialized in sufficiently distinct phases. Signed-off-by: Igor Stoppa --- include/linux/genalloc.h | 3 + include/linux/pmalloc.h | 242 +++ include/linux/vmalloc.h | 1 + lib/genalloc.c | 27 +++ mm/Kconfig | 6 + mm/Makefile | 1 + mm/pmalloc.c | 499 +++ mm/usercopy.c| 33 8 files changed, 812 insertions(+) create mode 100644 include/linux/pmalloc.h create mode 100644 mm/pmalloc.c diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index dcaa33e74b1c..b6c4cea9fbd8 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + +extern void gen_pool_flush_chunk(struct gen_pool *pool, +struct gen_pool_chunk *chunk); extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); extern size_t gen_pool_avail(struct gen_pool *); diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h new file mode 100644 index ..afc2068d5545 --- /dev/null +++ b/include/linux/pmalloc.h @@ -0,0 +1,242 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * pmalloc.h: Header for Protectable Memory Allocator + * + * (C) Copyright 2017 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#ifndef _LINUX_PMALLOC_H +#define _LINUX_PMALLOC_H + + +#include +#include + +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) + +/* + * Library for dynamic allocation of pools of memory that can be, + * after initialization, marked as read-only. + * + * This is intended to complement __read_only_after_init, for those cases + * where either it is not possible to know the initialization value before + * init is completed, or the amount of data is variable and can be + * determined only at run-time. + * + * ***WARNING*** + * The user of the API is expected to synchronize: + * 1) allocation, + * 2) writes to the allocated memory, + * 3) write protection of the pool, + * 4) freeing of the allocated memory, and + * 5) destruction of the pool. + * + * For a non-threaded scenario, this type of locking is not even required. + * + * Even if the library were to provide support for locking, point 2) + * would still depend on the user taking the lock. + */ + + +/** + * pmalloc_create_pool() - create a new protectable memory pool + * @name: the name of the pool, enforced to be unique + * @min_alloc_order: log2 of the minimum allocation size obtainable + * from the pool + * + * Creates a new (empty) memory pool for allocation of protectable + * memory. Memory will be allocated upon request (through pmalloc). + * + * Return: + * * pointer to the new pool - success + * * NULL - error + */ +struct gen_pool *pmalloc_create_pool(const char *name, +int min_alloc_order); + +/** + * is_pmalloc_object() - validates the existence of an alleged object + * @ptr: address of the object + * @n: size of the object, in bytes + * + * Return: + * * 0 - the object does not belong to pmalloc + * * 1 - the object belongs to pmalloc + * * \-1 - the object overlaps pmalloc memory incorrectly + */ +int is_pmalloc_object(const void *ptr, const unsigned long n); + +/** + * pmalloc_prealloc() - tries to allocate a memory chunk of the requested size + * @pool: handle to the pool to be used for memory
Re: [PATCH 4/6] Protectable Memory
On 12/02/18 17:31, Mike Rapoport wrote: [...] > Seems that kernel-doc does not consider () as a valid match for the > identifier :) > > Can you please check with the below patch? yes, it works now, than you! -- igor
Re: [PATCH 4/6] Protectable Memory
On 12/02/18 17:31, Mike Rapoport wrote: [...] > Seems that kernel-doc does not consider () as a valid match for the > identifier :) > > Can you please check with the below patch? yes, it works now, than you! -- igor
Re: [PATCH 4/6] Protectable Memory
On Mon, Feb 12, 2018 at 03:41:57PM +0200, Igor Stoppa wrote: > > > On 12/02/18 14:53, Mike Rapoport wrote: > > 'scripts/kernel-doc -v -none > > That has a quite interesting behavior. > > I run it on genalloc.c while I am in the process of adding the brackets > to the function names in the kernel-doc description. > > The brackets confuse the script and it fails to output the name of the > function in the log: > > lib/genalloc.c:123: info: Scanning doc for get_bitmap_entry > lib/genalloc.c:139: info: Scanning doc for > lib/genalloc.c:152: info: Scanning doc for > lib/genalloc.c:164: info: Scanning doc for > > The first function does not have the brackets. > The others do. So what should I do with the missing brackets? > Add them, according to the kernel docs, or leave them out? Seems that kernel-doc does not consider () as a valid match for the identifier :) Can you please check with the below patch? > I'd lean toward adding them. > > -- > igor -- Sincerely yours, Mike. >From 35255bc2d7d2a63be4f78a7bf4eec83ab0dc4f3f Mon Sep 17 00:00:00 2001 From: Mike RapoportDate: Mon, 12 Feb 2018 17:19:04 +0200 Subject: [PATCH] scripts: kernel_doc: fixup reporting of function identifiers When function description includes brackets after the function name as suggested by Documentation/doc-guide/kernel-doc, the kernel-doc script omits the function name from "Scanning doc for" report. Extending match for identifier name with optional brackets fixes this issue. Signed-off-by: Mike Rapoport --- scripts/kernel-doc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/kernel-doc b/scripts/kernel-doc index fee8952037b1..a6a9a8ef116c 100755 --- a/scripts/kernel-doc +++ b/scripts/kernel-doc @@ -1873,7 +1873,7 @@ sub process_file($) { } elsif (/$doc_decl/o) { $identifier = $1; - if (/\s*([\w\s]+?)\s*-/) { + if (/\s*([\w\s]+?)(\(\))?\s*-/) { $identifier = $1; } -- 2.7.4
Re: [PATCH 4/6] Protectable Memory
On Mon, Feb 12, 2018 at 03:41:57PM +0200, Igor Stoppa wrote: > > > On 12/02/18 14:53, Mike Rapoport wrote: > > 'scripts/kernel-doc -v -none > > That has a quite interesting behavior. > > I run it on genalloc.c while I am in the process of adding the brackets > to the function names in the kernel-doc description. > > The brackets confuse the script and it fails to output the name of the > function in the log: > > lib/genalloc.c:123: info: Scanning doc for get_bitmap_entry > lib/genalloc.c:139: info: Scanning doc for > lib/genalloc.c:152: info: Scanning doc for > lib/genalloc.c:164: info: Scanning doc for > > The first function does not have the brackets. > The others do. So what should I do with the missing brackets? > Add them, according to the kernel docs, or leave them out? Seems that kernel-doc does not consider () as a valid match for the identifier :) Can you please check with the below patch? > I'd lean toward adding them. > > -- > igor -- Sincerely yours, Mike. >From 35255bc2d7d2a63be4f78a7bf4eec83ab0dc4f3f Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Mon, 12 Feb 2018 17:19:04 +0200 Subject: [PATCH] scripts: kernel_doc: fixup reporting of function identifiers When function description includes brackets after the function name as suggested by Documentation/doc-guide/kernel-doc, the kernel-doc script omits the function name from "Scanning doc for" report. Extending match for identifier name with optional brackets fixes this issue. Signed-off-by: Mike Rapoport --- scripts/kernel-doc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/kernel-doc b/scripts/kernel-doc index fee8952037b1..a6a9a8ef116c 100755 --- a/scripts/kernel-doc +++ b/scripts/kernel-doc @@ -1873,7 +1873,7 @@ sub process_file($) { } elsif (/$doc_decl/o) { $identifier = $1; - if (/\s*([\w\s]+?)\s*-/) { + if (/\s*([\w\s]+?)(\(\))?\s*-/) { $identifier = $1; } -- 2.7.4
Re: [PATCH 4/6] Protectable Memory
On 12/02/18 14:53, Mike Rapoport wrote: > 'scripts/kernel-doc -v -none That has a quite interesting behavior. I run it on genalloc.c while I am in the process of adding the brackets to the function names in the kernel-doc description. The brackets confuse the script and it fails to output the name of the function in the log: lib/genalloc.c:123: info: Scanning doc for get_bitmap_entry lib/genalloc.c:139: info: Scanning doc for lib/genalloc.c:152: info: Scanning doc for lib/genalloc.c:164: info: Scanning doc for The first function does not have the brackets. The others do. So what should I do with the missing brackets? Add them, according to the kernel docs, or leave them out? I'd lean toward adding them. -- igor
Re: [PATCH 4/6] Protectable Memory
On 12/02/18 14:53, Mike Rapoport wrote: > 'scripts/kernel-doc -v -none That has a quite interesting behavior. I run it on genalloc.c while I am in the process of adding the brackets to the function names in the kernel-doc description. The brackets confuse the script and it fails to output the name of the function in the log: lib/genalloc.c:123: info: Scanning doc for get_bitmap_entry lib/genalloc.c:139: info: Scanning doc for lib/genalloc.c:152: info: Scanning doc for lib/genalloc.c:164: info: Scanning doc for The first function does not have the brackets. The others do. So what should I do with the missing brackets? Add them, according to the kernel docs, or leave them out? I'd lean toward adding them. -- igor
Re: [PATCH 4/6] Protectable Memory
On Mon, Feb 12, 2018 at 01:43:11PM +0200, Mike Rapoport wrote: > On Mon, Feb 12, 2018 at 01:26:28PM +0200, Igor Stoppa wrote: > > On 11/02/18 14:37, Mike Rapoport wrote: > > > On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote: > > > > >> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs > > >> to > > >> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly. > > > > > > typo:^ memory > > > > thanks :-( > > > > [...] > > > > >> +/** > > >> + * When the sysfs is ready to receive registrations, connect all the > > >> + * pools previously created. Also enable further pools to be connected > > >> + * right away. > > >> + */ > > > > > > This does not seem as kernel-doc comment. Please either remove the second > > > * > > > from the opening comment mark or reformat the comment. > > > > For this too, I thought I had caught them all, but I was wrong ... > > > > I didn't find any mention of automated checking for comments. > > Is there such tool? > > I don't know if there is a tool. I couldn't find anything in scripts, maybe > somebody have such tool out of tree. > > For now, I've added mm-api.rst that includes all mm .c files and run 'make > htmldocs' which spits plenty of warnings and errors. Actually, you can run 'scripts/kernel-doc -v -none ' to check comments starting with '/**'. I afraid it won't catch formatted blocks that start with '/*' -- Sincerely yours, Mike.
Re: [PATCH 4/6] Protectable Memory
On Mon, Feb 12, 2018 at 01:43:11PM +0200, Mike Rapoport wrote: > On Mon, Feb 12, 2018 at 01:26:28PM +0200, Igor Stoppa wrote: > > On 11/02/18 14:37, Mike Rapoport wrote: > > > On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote: > > > > >> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs > > >> to > > >> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly. > > > > > > typo:^ memory > > > > thanks :-( > > > > [...] > > > > >> +/** > > >> + * When the sysfs is ready to receive registrations, connect all the > > >> + * pools previously created. Also enable further pools to be connected > > >> + * right away. > > >> + */ > > > > > > This does not seem as kernel-doc comment. Please either remove the second > > > * > > > from the opening comment mark or reformat the comment. > > > > For this too, I thought I had caught them all, but I was wrong ... > > > > I didn't find any mention of automated checking for comments. > > Is there such tool? > > I don't know if there is a tool. I couldn't find anything in scripts, maybe > somebody have such tool out of tree. > > For now, I've added mm-api.rst that includes all mm .c files and run 'make > htmldocs' which spits plenty of warnings and errors. Actually, you can run 'scripts/kernel-doc -v -none ' to check comments starting with '/**'. I afraid it won't catch formatted blocks that start with '/*' -- Sincerely yours, Mike.
Re: [PATCH 4/6] Protectable Memory
On Mon, Feb 12, 2018 at 01:26:28PM +0200, Igor Stoppa wrote: > On 11/02/18 14:37, Mike Rapoport wrote: > > On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote: > > >> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to > >> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly. > > > > typo:^ memory > > thanks :-( > > [...] > > >> +/** > >> + * When the sysfs is ready to receive registrations, connect all the > >> + * pools previously created. Also enable further pools to be connected > >> + * right away. > >> + */ > > > > This does not seem as kernel-doc comment. Please either remove the second * > > from the opening comment mark or reformat the comment. > > For this too, I thought I had caught them all, but I was wrong ... > > I didn't find any mention of automated checking for comments. > Is there such tool? I don't know if there is a tool. I couldn't find anything in scripts, maybe somebody have such tool out of tree. For now, I've added mm-api.rst that includes all mm .c files and run 'make htmldocs' which spits plenty of warnings and errors. > -- > thanks, igor > -- Sincerely yours, Mike.
Re: [PATCH 4/6] Protectable Memory
On Mon, Feb 12, 2018 at 01:26:28PM +0200, Igor Stoppa wrote: > On 11/02/18 14:37, Mike Rapoport wrote: > > On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote: > > >> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to > >> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly. > > > > typo:^ memory > > thanks :-( > > [...] > > >> +/** > >> + * When the sysfs is ready to receive registrations, connect all the > >> + * pools previously created. Also enable further pools to be connected > >> + * right away. > >> + */ > > > > This does not seem as kernel-doc comment. Please either remove the second * > > from the opening comment mark or reformat the comment. > > For this too, I thought I had caught them all, but I was wrong ... > > I didn't find any mention of automated checking for comments. > Is there such tool? I don't know if there is a tool. I couldn't find anything in scripts, maybe somebody have such tool out of tree. For now, I've added mm-api.rst that includes all mm .c files and run 'make htmldocs' which spits plenty of warnings and errors. > -- > thanks, igor > -- Sincerely yours, Mike.
Re: [PATCH 4/6] Protectable Memory
On 11/02/18 14:37, Mike Rapoport wrote: > On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote: >> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to >> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly. > > typo:^ memory thanks :-( [...] >> +/** >> + * When the sysfs is ready to receive registrations, connect all the >> + * pools previously created. Also enable further pools to be connected >> + * right away. >> + */ > > This does not seem as kernel-doc comment. Please either remove the second * > from the opening comment mark or reformat the comment. For this too, I thought I had caught them all, but I was wrong ... I didn't find any mention of automated checking for comments. Is there such tool? -- thanks, igor
Re: [PATCH 4/6] Protectable Memory
On 11/02/18 14:37, Mike Rapoport wrote: > On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote: >> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to >> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly. > > typo:^ memory thanks :-( [...] >> +/** >> + * When the sysfs is ready to receive registrations, connect all the >> + * pools previously created. Also enable further pools to be connected >> + * right away. >> + */ > > This does not seem as kernel-doc comment. Please either remove the second * > from the opening comment mark or reformat the comment. For this too, I thought I had caught them all, but I was wrong ... I didn't find any mention of automated checking for comments. Is there such tool? -- thanks, igor
Re: [PATCH 4/6] Protectable Memory
On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote: > The MMU available in many systems running Linux can often provide R/O > protection to the memory pages it handles. > > However, the MMU-based protection works efficiently only when said pages > contain exclusively data that will not need further modifications. > > Statically allocated variables can be segregated into a dedicated > section, but this does not sit very well with dynamically allocated > ones. > > Dynamic allocation does not provide, currently, any means for grouping > variables in memory pages that would contain exclusively data suitable > for conversion to read only access mode. > > The allocator here provided (pmalloc - protectable memory allocator) > introduces the concept of pools of protectable memory. > > A module can request a pool and then refer any allocation request to the > pool handler it has received. > > Once all the chunks of memory associated to a specific pool are > initialized, the pool can be protected. > > After this point, the pool can only be destroyed (it is up to the module > to avoid any further references to the memory from the pool, after > the destruction is invoked). > > The latter case is mainly meant for releasing memory, when a module is > unloaded. > > A module can have as many pools as needed, for example to support the > protection of data that is initialized in sufficiently distinct phases. > > Signed-off-by: Igor Stoppa> --- > include/linux/genalloc.h | 3 + > include/linux/pmalloc.h | 222 + > include/linux/vmalloc.h | 1 + > lib/genalloc.c | 27 +++ > mm/Kconfig | 6 + > mm/Makefile | 1 + > mm/pmalloc.c | 497 > +++ > mm/usercopy.c| 33 > 8 files changed, 790 insertions(+) > create mode 100644 include/linux/pmalloc.h > create mode 100644 mm/pmalloc.c > > diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h > index dcaa33e74b1c..b6c4cea9fbd8 100644 > --- a/include/linux/genalloc.h > +++ b/include/linux/genalloc.h > @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool > *, size_t, > extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, > dma_addr_t *dma); > extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); > + > +extern void gen_pool_flush_chunk(struct gen_pool *pool, > + struct gen_pool_chunk *chunk); > extern void gen_pool_for_each_chunk(struct gen_pool *, > void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); > extern size_t gen_pool_avail(struct gen_pool *); > diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h > new file mode 100644 > index ..624379a937c5 > --- /dev/null > +++ b/include/linux/pmalloc.h > @@ -0,0 +1,222 @@ > +/* SPDX-License-Identifier: GPL-2.0 > + * > + * pmalloc.h: Header for Protectable Memory Allocator > + * > + * (C) Copyright 2017 Huawei Technologies Co. Ltd. > + * Author: Igor Stoppa > + */ > + > +#ifndef _LINUX_PMALLOC_H > +#define _LINUX_PMALLOC_H > + > + > +#include > +#include > + > +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) > + > +/* > + * Library for dynamic allocation of pools of memory that can be, > + * after initialization, marked as read-only. > + * > + * This is intended to complement __read_only_after_init, for those cases > + * where either it is not possible to know the initialization value before > + * init is completed, or the amount of data is variable and can be > + * determined only at run-time. > + * > + * ***WARNING*** > + * The user of the API is expected to synchronize: > + * 1) allocation, > + * 2) writes to the allocated memory, > + * 3) write protection of the pool, > + * 4) freeing of the allocated memory, and > + * 5) destruction of the pool. > + * > + * For a non-threaded scenario, this type of locking is not even required. > + * > + * Even if the library were to provide support for locking, point 2) > + * would still depend on the user taking the lock. > + */ > + > + > +/** > + * pmalloc_create_pool - create a new protectable memory pool > + * @name: the name of the pool, enforced to be unique > + * @min_alloc_order: log2 of the minimum allocation size obtainable > + * from the pool > + * > + * Creates a new (empty) memory pool for allocation of protectable > + * memory. Memory will be allocated upon request (through pmalloc). > + * > + * Return: pointer to the new pool upon success, otherwise a NULL. > + */ > +struct gen_pool *pmalloc_create_pool(const char *name, > + int min_alloc_order); > + > +/** > + * is_pmalloc_object - validates the existence of an alleged object > + * @ptr: address of the object > + * @n: size of the object, in bytes > + * > + * Return: 0 if the object does not belong to pmalloc, 1 if it
Re: [PATCH 4/6] Protectable Memory
On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote: > The MMU available in many systems running Linux can often provide R/O > protection to the memory pages it handles. > > However, the MMU-based protection works efficiently only when said pages > contain exclusively data that will not need further modifications. > > Statically allocated variables can be segregated into a dedicated > section, but this does not sit very well with dynamically allocated > ones. > > Dynamic allocation does not provide, currently, any means for grouping > variables in memory pages that would contain exclusively data suitable > for conversion to read only access mode. > > The allocator here provided (pmalloc - protectable memory allocator) > introduces the concept of pools of protectable memory. > > A module can request a pool and then refer any allocation request to the > pool handler it has received. > > Once all the chunks of memory associated to a specific pool are > initialized, the pool can be protected. > > After this point, the pool can only be destroyed (it is up to the module > to avoid any further references to the memory from the pool, after > the destruction is invoked). > > The latter case is mainly meant for releasing memory, when a module is > unloaded. > > A module can have as many pools as needed, for example to support the > protection of data that is initialized in sufficiently distinct phases. > > Signed-off-by: Igor Stoppa > --- > include/linux/genalloc.h | 3 + > include/linux/pmalloc.h | 222 + > include/linux/vmalloc.h | 1 + > lib/genalloc.c | 27 +++ > mm/Kconfig | 6 + > mm/Makefile | 1 + > mm/pmalloc.c | 497 > +++ > mm/usercopy.c| 33 > 8 files changed, 790 insertions(+) > create mode 100644 include/linux/pmalloc.h > create mode 100644 mm/pmalloc.c > > diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h > index dcaa33e74b1c..b6c4cea9fbd8 100644 > --- a/include/linux/genalloc.h > +++ b/include/linux/genalloc.h > @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool > *, size_t, > extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, > dma_addr_t *dma); > extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); > + > +extern void gen_pool_flush_chunk(struct gen_pool *pool, > + struct gen_pool_chunk *chunk); > extern void gen_pool_for_each_chunk(struct gen_pool *, > void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); > extern size_t gen_pool_avail(struct gen_pool *); > diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h > new file mode 100644 > index ..624379a937c5 > --- /dev/null > +++ b/include/linux/pmalloc.h > @@ -0,0 +1,222 @@ > +/* SPDX-License-Identifier: GPL-2.0 > + * > + * pmalloc.h: Header for Protectable Memory Allocator > + * > + * (C) Copyright 2017 Huawei Technologies Co. Ltd. > + * Author: Igor Stoppa > + */ > + > +#ifndef _LINUX_PMALLOC_H > +#define _LINUX_PMALLOC_H > + > + > +#include > +#include > + > +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) > + > +/* > + * Library for dynamic allocation of pools of memory that can be, > + * after initialization, marked as read-only. > + * > + * This is intended to complement __read_only_after_init, for those cases > + * where either it is not possible to know the initialization value before > + * init is completed, or the amount of data is variable and can be > + * determined only at run-time. > + * > + * ***WARNING*** > + * The user of the API is expected to synchronize: > + * 1) allocation, > + * 2) writes to the allocated memory, > + * 3) write protection of the pool, > + * 4) freeing of the allocated memory, and > + * 5) destruction of the pool. > + * > + * For a non-threaded scenario, this type of locking is not even required. > + * > + * Even if the library were to provide support for locking, point 2) > + * would still depend on the user taking the lock. > + */ > + > + > +/** > + * pmalloc_create_pool - create a new protectable memory pool > + * @name: the name of the pool, enforced to be unique > + * @min_alloc_order: log2 of the minimum allocation size obtainable > + * from the pool > + * > + * Creates a new (empty) memory pool for allocation of protectable > + * memory. Memory will be allocated upon request (through pmalloc). > + * > + * Return: pointer to the new pool upon success, otherwise a NULL. > + */ > +struct gen_pool *pmalloc_create_pool(const char *name, > + int min_alloc_order); > + > +/** > + * is_pmalloc_object - validates the existence of an alleged object > + * @ptr: address of the object > + * @n: size of the object, in bytes > + * > + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to > + * pmalloc, -1 if it partially
[PATCH 4/6] Protectable Memory
The MMU available in many systems running Linux can often provide R/O protection to the memory pages it handles. However, the MMU-based protection works efficiently only when said pages contain exclusively data that will not need further modifications. Statically allocated variables can be segregated into a dedicated section, but this does not sit very well with dynamically allocated ones. Dynamic allocation does not provide, currently, any means for grouping variables in memory pages that would contain exclusively data suitable for conversion to read only access mode. The allocator here provided (pmalloc - protectable memory allocator) introduces the concept of pools of protectable memory. A module can request a pool and then refer any allocation request to the pool handler it has received. Once all the chunks of memory associated to a specific pool are initialized, the pool can be protected. After this point, the pool can only be destroyed (it is up to the module to avoid any further references to the memory from the pool, after the destruction is invoked). The latter case is mainly meant for releasing memory, when a module is unloaded. A module can have as many pools as needed, for example to support the protection of data that is initialized in sufficiently distinct phases. Signed-off-by: Igor Stoppa--- include/linux/genalloc.h | 3 + include/linux/pmalloc.h | 222 + include/linux/vmalloc.h | 1 + lib/genalloc.c | 27 +++ mm/Kconfig | 6 + mm/Makefile | 1 + mm/pmalloc.c | 497 +++ mm/usercopy.c| 33 8 files changed, 790 insertions(+) create mode 100644 include/linux/pmalloc.h create mode 100644 mm/pmalloc.c diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index dcaa33e74b1c..b6c4cea9fbd8 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + +extern void gen_pool_flush_chunk(struct gen_pool *pool, +struct gen_pool_chunk *chunk); extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); extern size_t gen_pool_avail(struct gen_pool *); diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h new file mode 100644 index ..624379a937c5 --- /dev/null +++ b/include/linux/pmalloc.h @@ -0,0 +1,222 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * pmalloc.h: Header for Protectable Memory Allocator + * + * (C) Copyright 2017 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#ifndef _LINUX_PMALLOC_H +#define _LINUX_PMALLOC_H + + +#include +#include + +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) + +/* + * Library for dynamic allocation of pools of memory that can be, + * after initialization, marked as read-only. + * + * This is intended to complement __read_only_after_init, for those cases + * where either it is not possible to know the initialization value before + * init is completed, or the amount of data is variable and can be + * determined only at run-time. + * + * ***WARNING*** + * The user of the API is expected to synchronize: + * 1) allocation, + * 2) writes to the allocated memory, + * 3) write protection of the pool, + * 4) freeing of the allocated memory, and + * 5) destruction of the pool. + * + * For a non-threaded scenario, this type of locking is not even required. + * + * Even if the library were to provide support for locking, point 2) + * would still depend on the user taking the lock. + */ + + +/** + * pmalloc_create_pool - create a new protectable memory pool + * @name: the name of the pool, enforced to be unique + * @min_alloc_order: log2 of the minimum allocation size obtainable + * from the pool + * + * Creates a new (empty) memory pool for allocation of protectable + * memory. Memory will be allocated upon request (through pmalloc). + * + * Return: pointer to the new pool upon success, otherwise a NULL. + */ +struct gen_pool *pmalloc_create_pool(const char *name, +int min_alloc_order); + +/** + * is_pmalloc_object - validates the existence of an alleged object + * @ptr: address of the object + * @n: size of the object, in bytes + * + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly. + */ +int is_pmalloc_object(const void *ptr, const unsigned long n); + +/** + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size + * @pool: handle to the pool to be used for memory allocation + * @size:
[PATCH 4/6] Protectable Memory
The MMU available in many systems running Linux can often provide R/O protection to the memory pages it handles. However, the MMU-based protection works efficiently only when said pages contain exclusively data that will not need further modifications. Statically allocated variables can be segregated into a dedicated section, but this does not sit very well with dynamically allocated ones. Dynamic allocation does not provide, currently, any means for grouping variables in memory pages that would contain exclusively data suitable for conversion to read only access mode. The allocator here provided (pmalloc - protectable memory allocator) introduces the concept of pools of protectable memory. A module can request a pool and then refer any allocation request to the pool handler it has received. Once all the chunks of memory associated to a specific pool are initialized, the pool can be protected. After this point, the pool can only be destroyed (it is up to the module to avoid any further references to the memory from the pool, after the destruction is invoked). The latter case is mainly meant for releasing memory, when a module is unloaded. A module can have as many pools as needed, for example to support the protection of data that is initialized in sufficiently distinct phases. Signed-off-by: Igor Stoppa --- include/linux/genalloc.h | 3 + include/linux/pmalloc.h | 222 + include/linux/vmalloc.h | 1 + lib/genalloc.c | 27 +++ mm/Kconfig | 6 + mm/Makefile | 1 + mm/pmalloc.c | 497 +++ mm/usercopy.c| 33 8 files changed, 790 insertions(+) create mode 100644 include/linux/pmalloc.h create mode 100644 mm/pmalloc.c diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index dcaa33e74b1c..b6c4cea9fbd8 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + +extern void gen_pool_flush_chunk(struct gen_pool *pool, +struct gen_pool_chunk *chunk); extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); extern size_t gen_pool_avail(struct gen_pool *); diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h new file mode 100644 index ..624379a937c5 --- /dev/null +++ b/include/linux/pmalloc.h @@ -0,0 +1,222 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * pmalloc.h: Header for Protectable Memory Allocator + * + * (C) Copyright 2017 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#ifndef _LINUX_PMALLOC_H +#define _LINUX_PMALLOC_H + + +#include +#include + +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) + +/* + * Library for dynamic allocation of pools of memory that can be, + * after initialization, marked as read-only. + * + * This is intended to complement __read_only_after_init, for those cases + * where either it is not possible to know the initialization value before + * init is completed, or the amount of data is variable and can be + * determined only at run-time. + * + * ***WARNING*** + * The user of the API is expected to synchronize: + * 1) allocation, + * 2) writes to the allocated memory, + * 3) write protection of the pool, + * 4) freeing of the allocated memory, and + * 5) destruction of the pool. + * + * For a non-threaded scenario, this type of locking is not even required. + * + * Even if the library were to provide support for locking, point 2) + * would still depend on the user taking the lock. + */ + + +/** + * pmalloc_create_pool - create a new protectable memory pool + * @name: the name of the pool, enforced to be unique + * @min_alloc_order: log2 of the minimum allocation size obtainable + * from the pool + * + * Creates a new (empty) memory pool for allocation of protectable + * memory. Memory will be allocated upon request (through pmalloc). + * + * Return: pointer to the new pool upon success, otherwise a NULL. + */ +struct gen_pool *pmalloc_create_pool(const char *name, +int min_alloc_order); + +/** + * is_pmalloc_object - validates the existence of an alleged object + * @ptr: address of the object + * @n: size of the object, in bytes + * + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly. + */ +int is_pmalloc_object(const void *ptr, const unsigned long n); + +/** + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size + * @pool: handle to the pool to be used for memory allocation + * @size: amount of memory (in bytes) requested + * + *
Re: [PATCH 4/6] Protectable Memory
On 05/02/18 00:06, Randy Dunlap wrote: > On 02/04/2018 08:47 AM, Igor Stoppa wrote: [...] >> + * pmalloc_create_pool - create a new protectable memory pool - > > Drop trailing " -". yes >> + * @name: the name of the pool, must be unique > > Is that enforced? Will return NULL if @name is duplicated? ok, I'll state it more clearly that it's enforced [...] >> + * @pool: handler to the pool to be used for memory allocation > > handle (I think) yes, also for all the other ones [...] >> + * avoid sleping during allocation. > > sleeping yes [...] >> + * opposite to what is allocated on-demand when pmalloc runs out of free > > opposed to yes >> + * space already existing in the pool and has to invoke vmalloc. >> + * >> + * Returns true if the vmalloc call was successful, false otherwise. > > Where is the allocated memory (pointer)? I.e., how does the caller know > where that memory is? > Oh, that memory isn't yet available to the caller until it calls pmalloc(), > right? yes, it's a way to: - preemptively beef up the pool, before entering atomic context (unlikely that it will be needed, but possible), so that there is no need to allocate extra pages (assuming one can estimate the max memory that will be requested) - avoid fragmentation caused by allocating smaller groups of pages I'll add explanation for this. [...] >> + * @size: amount of memory (in bytes) requested >> + * @gfp: flags for page allocation >> + * >> + * Allocates memory from an unprotected pool. If the pool doesn't have >> + * enough memory, and the request did not include GFP_ATOMIC, an attempt >> + * is made to add a new chunk of memory to the pool >> + * (a multiple of PAGE_SIZE), in order to fit the new request. > > fill > What if @size is > PAGE_SIZE? Nothing special, it gets rounded up to the nearest multiple of PAGE_SIZE. vmalloc doesn't have only drawbacks ;-) [...] >> + * Returns the pointer to the memory requested upon success, >> + * NULL otherwise (either no memory available or pool already read-only). > > It would be good to use the > * Return: > kernel-doc notation for return values. yes, good point, I'm fixing it everywhere in the patchset [...] >> + * will be availabel for further allocations. > > available yes [...] >> +/** > > /** means that the following comments are kernel-doc notation, but these > comments are not, so just use /* there, please. yes, also to the others [...] >> +/* is_pmalloc_object gets called pretty late, so chances are high >> + * that the object is indeed of vmalloc type >> + */ > > Multi-line comment style is > /* >* comment1 >* comment..N >*/ yes, also to the others -- igor
Re: [PATCH 4/6] Protectable Memory
On 05/02/18 00:06, Randy Dunlap wrote: > On 02/04/2018 08:47 AM, Igor Stoppa wrote: [...] >> + * pmalloc_create_pool - create a new protectable memory pool - > > Drop trailing " -". yes >> + * @name: the name of the pool, must be unique > > Is that enforced? Will return NULL if @name is duplicated? ok, I'll state it more clearly that it's enforced [...] >> + * @pool: handler to the pool to be used for memory allocation > > handle (I think) yes, also for all the other ones [...] >> + * avoid sleping during allocation. > > sleeping yes [...] >> + * opposite to what is allocated on-demand when pmalloc runs out of free > > opposed to yes >> + * space already existing in the pool and has to invoke vmalloc. >> + * >> + * Returns true if the vmalloc call was successful, false otherwise. > > Where is the allocated memory (pointer)? I.e., how does the caller know > where that memory is? > Oh, that memory isn't yet available to the caller until it calls pmalloc(), > right? yes, it's a way to: - preemptively beef up the pool, before entering atomic context (unlikely that it will be needed, but possible), so that there is no need to allocate extra pages (assuming one can estimate the max memory that will be requested) - avoid fragmentation caused by allocating smaller groups of pages I'll add explanation for this. [...] >> + * @size: amount of memory (in bytes) requested >> + * @gfp: flags for page allocation >> + * >> + * Allocates memory from an unprotected pool. If the pool doesn't have >> + * enough memory, and the request did not include GFP_ATOMIC, an attempt >> + * is made to add a new chunk of memory to the pool >> + * (a multiple of PAGE_SIZE), in order to fit the new request. > > fill > What if @size is > PAGE_SIZE? Nothing special, it gets rounded up to the nearest multiple of PAGE_SIZE. vmalloc doesn't have only drawbacks ;-) [...] >> + * Returns the pointer to the memory requested upon success, >> + * NULL otherwise (either no memory available or pool already read-only). > > It would be good to use the > * Return: > kernel-doc notation for return values. yes, good point, I'm fixing it everywhere in the patchset [...] >> + * will be availabel for further allocations. > > available yes [...] >> +/** > > /** means that the following comments are kernel-doc notation, but these > comments are not, so just use /* there, please. yes, also to the others [...] >> +/* is_pmalloc_object gets called pretty late, so chances are high >> + * that the object is indeed of vmalloc type >> + */ > > Multi-line comment style is > /* >* comment1 >* comment..N >*/ yes, also to the others -- igor
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 05/02/18 17:40, Christopher Lameter wrote: > On Sat, 3 Feb 2018, Igor Stoppa wrote: > >>> We could even do this in a more thorough way. Can we use a ring 1 / 2 >>> distinction to create a hardened OS core that policies the rest of >>> the ever expanding kernel with all its modules and this and that feature? >> >> What would be the differentiating criteria? Furthermore, what are the >> chances >> of invalidating the entire concept, because there is already an >> hypervisor using >> the higher level features? >> That is what you are proposing, if I understand correctly. > > Were there not 4 rings as well as methods by the processor vendors to > virtualize them as well? I think you are talking x86, mostly. On ARM there are ELx and they are often (typically?) already used. For x86 I cannot comment. >>> I think that will long term be a better approach and allow more than the >>> current hardening approaches can get you. It seems that we are willing to >>> tolerate significant performance regressions now. So lets use the >>> protection mechanisms that the hardware offers. >> >> I would rather *not* propose significant performance regression :-P > > But we already have implemented significant kernel hardening which causes > performance regressions. Using hardware capabilities allows the processor > vendor to further optimize these mechanisms whereas the software > preventative measures are eating up more and more performance as the pile > them on. Plus these are methods that can be worked around. Restrictions > implemented in a higher ring can be enforced and are much better than > just "hardening" (which is making life difficult for the hackers and > throwing away performannce for the average user). What you are proposing requires major restructuring of the memory management - at the very least - provided that it doesn't cause the conflicts I mentioned above. Even after you do that, the system will still be working with memory pages, there will be still a need to segregate data within certain pages, or pay the penalty of handling exceptions, when data with different permissions coexist within the same page. The way the pmalloc API is designed is meant to facilitate the segregation and to actually improve performance, by grouping types of data with same scope and permission. WRT the implementation, there is a minimal exposure to the memory provider, both for allocation and release. Same goes for the protection mechanism. It's a single call to the function which makes pages read only. It would be trivial to swap it out with a call to whatever framework you want to come up with, for implementing ring/EL based protection. >From this perspective, you can easily provide patches that implement what you are proposing, against pmalloc, if you really think that it's the way to go. I'll be happy to use them, if they provide improved performance and same or better protection. The way I designed pmalloc was really to be able to switch to some alternate memory provider and/or protection mechanism, should a better one arise. But it can be done in a separate step, I think, since you are not proposing to just change pmalloc, you are proposing to re-design how the overall kernel memory hardening works (including executable pages, const data, __ro_after_init, etc.) -- igor
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 05/02/18 17:40, Christopher Lameter wrote: > On Sat, 3 Feb 2018, Igor Stoppa wrote: > >>> We could even do this in a more thorough way. Can we use a ring 1 / 2 >>> distinction to create a hardened OS core that policies the rest of >>> the ever expanding kernel with all its modules and this and that feature? >> >> What would be the differentiating criteria? Furthermore, what are the >> chances >> of invalidating the entire concept, because there is already an >> hypervisor using >> the higher level features? >> That is what you are proposing, if I understand correctly. > > Were there not 4 rings as well as methods by the processor vendors to > virtualize them as well? I think you are talking x86, mostly. On ARM there are ELx and they are often (typically?) already used. For x86 I cannot comment. >>> I think that will long term be a better approach and allow more than the >>> current hardening approaches can get you. It seems that we are willing to >>> tolerate significant performance regressions now. So lets use the >>> protection mechanisms that the hardware offers. >> >> I would rather *not* propose significant performance regression :-P > > But we already have implemented significant kernel hardening which causes > performance regressions. Using hardware capabilities allows the processor > vendor to further optimize these mechanisms whereas the software > preventative measures are eating up more and more performance as the pile > them on. Plus these are methods that can be worked around. Restrictions > implemented in a higher ring can be enforced and are much better than > just "hardening" (which is making life difficult for the hackers and > throwing away performannce for the average user). What you are proposing requires major restructuring of the memory management - at the very least - provided that it doesn't cause the conflicts I mentioned above. Even after you do that, the system will still be working with memory pages, there will be still a need to segregate data within certain pages, or pay the penalty of handling exceptions, when data with different permissions coexist within the same page. The way the pmalloc API is designed is meant to facilitate the segregation and to actually improve performance, by grouping types of data with same scope and permission. WRT the implementation, there is a minimal exposure to the memory provider, both for allocation and release. Same goes for the protection mechanism. It's a single call to the function which makes pages read only. It would be trivial to swap it out with a call to whatever framework you want to come up with, for implementing ring/EL based protection. >From this perspective, you can easily provide patches that implement what you are proposing, against pmalloc, if you really think that it's the way to go. I'll be happy to use them, if they provide improved performance and same or better protection. The way I designed pmalloc was really to be able to switch to some alternate memory provider and/or protection mechanism, should a better one arise. But it can be done in a separate step, I think, since you are not proposing to just change pmalloc, you are proposing to re-design how the overall kernel memory hardening works (including executable pages, const data, __ro_after_init, etc.) -- igor
Re: [PATCH 4/6] Protectable Memory
Hi Igor, Thank you for the patch! Yet something to improve: [auto build test ERROR on kees/for-next/pstore] [also build test ERROR on v4.15] [cannot apply to linus/master mmotm/master next-20180207] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180207-171252 base: https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore config: um-allmodconfig (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=um All errors (new ones prefixed by >>): arch/um/drivers/vde.o: In function `vde_open_real': (.text+0x951): warning: Using 'getgrnam' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking (.text+0x79c): warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking (.text+0xab5): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametoaddr': (.text+0xdee5): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametonetaddr': (.text+0xdf85): warning: Using 'getnetbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametoproto': (.text+0xe1a5): warning: Using 'getprotobyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametoport': (.text+0xdfd7): warning: Using 'getservbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking mm/usercopy.o: In function `__check_object_size': >> (.text+0x3aa): undefined reference to `is_pmalloc_object' >> collect2: error: ld returned 1 exit status --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 4/6] Protectable Memory
Hi Igor, Thank you for the patch! Yet something to improve: [auto build test ERROR on kees/for-next/pstore] [also build test ERROR on v4.15] [cannot apply to linus/master mmotm/master next-20180207] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180207-171252 base: https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore config: um-allmodconfig (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=um All errors (new ones prefixed by >>): arch/um/drivers/vde.o: In function `vde_open_real': (.text+0x951): warning: Using 'getgrnam' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking (.text+0x79c): warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking (.text+0xab5): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametoaddr': (.text+0xdee5): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametonetaddr': (.text+0xdf85): warning: Using 'getnetbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametoproto': (.text+0xe1a5): warning: Using 'getprotobyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametoport': (.text+0xdfd7): warning: Using 'getservbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking mm/usercopy.o: In function `__check_object_size': >> (.text+0x3aa): undefined reference to `is_pmalloc_object' >> collect2: error: ld returned 1 exit status --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 4/6] Protectable Memory
Hi Igor, Thank you for the patch! Yet something to improve: [auto build test ERROR on kees/for-next/pstore] [also build test ERROR on v4.15] [cannot apply to linus/master mmotm/master next-20180206] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180207-171252 base: https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore config: i386-tinyconfig (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=i386 Note: the linux-review/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180207-171252 HEAD 99d0cb7905216da7595ef08a781a9be16a8ce687 builds fine. It only hurts bisectibility. All errors (new ones prefixed by >>): >> mm/pmalloc.c:24:10: fatal error: pmalloc-selftest.h: No such file or >> directory #include "pmalloc-selftest.h" ^~~~ compilation terminated. vim +24 mm/pmalloc.c 23 > 24 #include "pmalloc-selftest.h" 25 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 4/6] Protectable Memory
Hi Igor, Thank you for the patch! Yet something to improve: [auto build test ERROR on kees/for-next/pstore] [also build test ERROR on v4.15] [cannot apply to linus/master mmotm/master next-20180206] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180207-171252 base: https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore config: i386-tinyconfig (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=i386 Note: the linux-review/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180207-171252 HEAD 99d0cb7905216da7595ef08a781a9be16a8ce687 builds fine. It only hurts bisectibility. All errors (new ones prefixed by >>): >> mm/pmalloc.c:24:10: fatal error: pmalloc-selftest.h: No such file or >> directory #include "pmalloc-selftest.h" ^~~~ compilation terminated. vim +24 mm/pmalloc.c 23 > 24 #include "pmalloc-selftest.h" 25 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Sat, 3 Feb 2018, Igor Stoppa wrote: > > We could even do this in a more thorough way. Can we use a ring 1 / 2 > > distinction to create a hardened OS core that policies the rest of > > the ever expanding kernel with all its modules and this and that feature? > > What would be the differentiating criteria? Furthermore, what are the > chances > of invalidating the entire concept, because there is already an > hypervisor using > the higher level features? > That is what you are proposing, if I understand correctly. Were there not 4 rings as well as methods by the processor vendors to virtualize them as well? > > I think that will long term be a better approach and allow more than the > > current hardening approaches can get you. It seems that we are willing to > > tolerate significant performance regressions now. So lets use the > > protection mechanisms that the hardware offers. > > I would rather *not* propose significant performance regression :-P But we already have implemented significant kernel hardening which causes performance regressions. Using hardware capabilities allows the processor vendor to further optimize these mechanisms whereas the software preventative measures are eating up more and more performance as the pile them on. Plus these are methods that can be worked around. Restrictions implemented in a higher ring can be enforced and are much better than just "hardening" (which is making life difficult for the hackers and throwing away performannce for the average user).
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Sat, 3 Feb 2018, Igor Stoppa wrote: > > We could even do this in a more thorough way. Can we use a ring 1 / 2 > > distinction to create a hardened OS core that policies the rest of > > the ever expanding kernel with all its modules and this and that feature? > > What would be the differentiating criteria? Furthermore, what are the > chances > of invalidating the entire concept, because there is already an > hypervisor using > the higher level features? > That is what you are proposing, if I understand correctly. Were there not 4 rings as well as methods by the processor vendors to virtualize them as well? > > I think that will long term be a better approach and allow more than the > > current hardening approaches can get you. It seems that we are willing to > > tolerate significant performance regressions now. So lets use the > > protection mechanisms that the hardware offers. > > I would rather *not* propose significant performance regression :-P But we already have implemented significant kernel hardening which causes performance regressions. Using hardware capabilities allows the processor vendor to further optimize these mechanisms whereas the software preventative measures are eating up more and more performance as the pile them on. Plus these are methods that can be worked around. Restrictions implemented in a higher ring can be enforced and are much better than just "hardening" (which is making life difficult for the hackers and throwing away performannce for the average user).
Re: [PATCH 4/6] Protectable Memory
On 02/04/2018 08:47 AM, Igor Stoppa wrote: > The MMU available in many systems running Linux can often provide R/O > protection to the memory pages it handles. > > However, the MMU-based protection works efficiently only when said pages > contain exclusively data that will not need further modifications. > > Statically allocated variables can be segregated into a dedicated > section, but this does not sit very well with dynamically allocated > ones. > > Dynamic allocation does not provide, currently, any means for grouping > variables in memory pages that would contain exclusively data suitable > for conversion to read only access mode. > > The allocator here provided (pmalloc - protectable memory allocator) > introduces the concept of pools of protectable memory. > > A module can request a pool and then refer any allocation request to the > pool handler it has received. > > Once all the chunks of memory associated to a specific pool are > initialized, the pool can be protected. > > After this point, the pool can only be destroyed (it is up to the module > to avoid any further references to the memory from the pool, after > the destruction is invoked). > > The latter case is mainly meant for releasing memory, when a module is > unloaded. > > A module can have as many pools as needed, for example to support the > protection of data that is initialized in sufficiently distinct phases. > > Signed-off-by: Igor Stoppa> --- > include/linux/genalloc.h | 3 + > include/linux/pmalloc.h | 213 > include/linux/vmalloc.h | 1 + > lib/genalloc.c | 27 +++ > mm/Makefile | 1 + > mm/pmalloc.c | 514 > +++ > mm/usercopy.c| 25 ++- > 7 files changed, 780 insertions(+), 4 deletions(-) > create mode 100644 include/linux/pmalloc.h > create mode 100644 mm/pmalloc.c > > diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h > index dcaa33e74b1c..b6c4cea9fbd8 100644 > --- a/include/linux/genalloc.h > +++ b/include/linux/genalloc.h > @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool > *, size_t, > extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, > dma_addr_t *dma); > extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); > + > +extern void gen_pool_flush_chunk(struct gen_pool *pool, > + struct gen_pool_chunk *chunk); > extern void gen_pool_for_each_chunk(struct gen_pool *, > void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); > extern size_t gen_pool_avail(struct gen_pool *); > diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h > new file mode 100644 > index ..5fa8a78be819 > --- /dev/null > +++ b/include/linux/pmalloc.h > @@ -0,0 +1,213 @@ > +/* SPDX-License-Identifier: GPL-2.0 > + * > + * pmalloc.h: Header for Protectable Memory Allocator > + * > + * (C) Copyright 2017 Huawei Technologies Co. Ltd. > + * Author: Igor Stoppa > + */ > + > +#ifndef _PMALLOC_H > +#define _PMALLOC_H use_LINUX_PMALLOC_H_ > + > + > +#include > +#include > + > +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) > + > +/* > + * Library for dynamic allocation of pools of memory that can be, > + * after initialization, marked as read-only. > + * > + * This is intended to complement __read_only_after_init, for those cases > + * where either it is not possible to know the initialization value before > + * init is completed, or the amount of data is variable and can be > + * determined only at run-time. > + * > + * ***WARNING*** > + * The user of the API is expected to synchronize: > + * 1) allocation, > + * 2) writes to the allocated memory, > + * 3) write protection of the pool, > + * 4) freeing of the allocated memory, and > + * 5) destruction of the pool. > + * > + * For a non-threaded scenario, this type of locking is not even required. > + * > + * Even if the library were to provide support for locking, point 2) > + * would still depend on the user taking the lock. > + */ > + > + > +/** > + * pmalloc_create_pool - create a new protectable memory pool - Drop trailing " -". > + * @name: the name of the pool, must be unique Is that enforced? Will return NULL if @name is duplicated? > + * @min_alloc_order: log2 of the minimum allocation size obtainable > + * from the pool > + * > + * Creates a new (empty) memory pool for allocation of protectable > + * memory. Memory will be allocated upon request (through pmalloc). > + * > + * Returns a pointer to the new pool upon success, otherwise a NULL. > + */ > +struct gen_pool *pmalloc_create_pool(const char *name, > + int min_alloc_order); > + > + > +int is_pmalloc_object(const void *ptr, const unsigned long n); > + > +/** > + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size > +
Re: [PATCH 4/6] Protectable Memory
On 02/04/2018 08:47 AM, Igor Stoppa wrote: > The MMU available in many systems running Linux can often provide R/O > protection to the memory pages it handles. > > However, the MMU-based protection works efficiently only when said pages > contain exclusively data that will not need further modifications. > > Statically allocated variables can be segregated into a dedicated > section, but this does not sit very well with dynamically allocated > ones. > > Dynamic allocation does not provide, currently, any means for grouping > variables in memory pages that would contain exclusively data suitable > for conversion to read only access mode. > > The allocator here provided (pmalloc - protectable memory allocator) > introduces the concept of pools of protectable memory. > > A module can request a pool and then refer any allocation request to the > pool handler it has received. > > Once all the chunks of memory associated to a specific pool are > initialized, the pool can be protected. > > After this point, the pool can only be destroyed (it is up to the module > to avoid any further references to the memory from the pool, after > the destruction is invoked). > > The latter case is mainly meant for releasing memory, when a module is > unloaded. > > A module can have as many pools as needed, for example to support the > protection of data that is initialized in sufficiently distinct phases. > > Signed-off-by: Igor Stoppa > --- > include/linux/genalloc.h | 3 + > include/linux/pmalloc.h | 213 > include/linux/vmalloc.h | 1 + > lib/genalloc.c | 27 +++ > mm/Makefile | 1 + > mm/pmalloc.c | 514 > +++ > mm/usercopy.c| 25 ++- > 7 files changed, 780 insertions(+), 4 deletions(-) > create mode 100644 include/linux/pmalloc.h > create mode 100644 mm/pmalloc.c > > diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h > index dcaa33e74b1c..b6c4cea9fbd8 100644 > --- a/include/linux/genalloc.h > +++ b/include/linux/genalloc.h > @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool > *, size_t, > extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, > dma_addr_t *dma); > extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); > + > +extern void gen_pool_flush_chunk(struct gen_pool *pool, > + struct gen_pool_chunk *chunk); > extern void gen_pool_for_each_chunk(struct gen_pool *, > void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); > extern size_t gen_pool_avail(struct gen_pool *); > diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h > new file mode 100644 > index ..5fa8a78be819 > --- /dev/null > +++ b/include/linux/pmalloc.h > @@ -0,0 +1,213 @@ > +/* SPDX-License-Identifier: GPL-2.0 > + * > + * pmalloc.h: Header for Protectable Memory Allocator > + * > + * (C) Copyright 2017 Huawei Technologies Co. Ltd. > + * Author: Igor Stoppa > + */ > + > +#ifndef _PMALLOC_H > +#define _PMALLOC_H use_LINUX_PMALLOC_H_ > + > + > +#include > +#include > + > +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) > + > +/* > + * Library for dynamic allocation of pools of memory that can be, > + * after initialization, marked as read-only. > + * > + * This is intended to complement __read_only_after_init, for those cases > + * where either it is not possible to know the initialization value before > + * init is completed, or the amount of data is variable and can be > + * determined only at run-time. > + * > + * ***WARNING*** > + * The user of the API is expected to synchronize: > + * 1) allocation, > + * 2) writes to the allocated memory, > + * 3) write protection of the pool, > + * 4) freeing of the allocated memory, and > + * 5) destruction of the pool. > + * > + * For a non-threaded scenario, this type of locking is not even required. > + * > + * Even if the library were to provide support for locking, point 2) > + * would still depend on the user taking the lock. > + */ > + > + > +/** > + * pmalloc_create_pool - create a new protectable memory pool - Drop trailing " -". > + * @name: the name of the pool, must be unique Is that enforced? Will return NULL if @name is duplicated? > + * @min_alloc_order: log2 of the minimum allocation size obtainable > + * from the pool > + * > + * Creates a new (empty) memory pool for allocation of protectable > + * memory. Memory will be allocated upon request (through pmalloc). > + * > + * Returns a pointer to the new pool upon success, otherwise a NULL. > + */ > +struct gen_pool *pmalloc_create_pool(const char *name, > + int min_alloc_order); > + > + > +int is_pmalloc_object(const void *ptr, const unsigned long n); > + > +/** > + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size > + * @pool: handler to the pool to be used for
[PATCH 4/6] Protectable Memory
The MMU available in many systems running Linux can often provide R/O protection to the memory pages it handles. However, the MMU-based protection works efficiently only when said pages contain exclusively data that will not need further modifications. Statically allocated variables can be segregated into a dedicated section, but this does not sit very well with dynamically allocated ones. Dynamic allocation does not provide, currently, any means for grouping variables in memory pages that would contain exclusively data suitable for conversion to read only access mode. The allocator here provided (pmalloc - protectable memory allocator) introduces the concept of pools of protectable memory. A module can request a pool and then refer any allocation request to the pool handler it has received. Once all the chunks of memory associated to a specific pool are initialized, the pool can be protected. After this point, the pool can only be destroyed (it is up to the module to avoid any further references to the memory from the pool, after the destruction is invoked). The latter case is mainly meant for releasing memory, when a module is unloaded. A module can have as many pools as needed, for example to support the protection of data that is initialized in sufficiently distinct phases. Signed-off-by: Igor Stoppa--- include/linux/genalloc.h | 3 + include/linux/pmalloc.h | 213 include/linux/vmalloc.h | 1 + lib/genalloc.c | 27 +++ mm/Makefile | 1 + mm/pmalloc.c | 514 +++ mm/usercopy.c| 25 ++- 7 files changed, 780 insertions(+), 4 deletions(-) create mode 100644 include/linux/pmalloc.h create mode 100644 mm/pmalloc.c diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index dcaa33e74b1c..b6c4cea9fbd8 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + +extern void gen_pool_flush_chunk(struct gen_pool *pool, +struct gen_pool_chunk *chunk); extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); extern size_t gen_pool_avail(struct gen_pool *); diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h new file mode 100644 index ..5fa8a78be819 --- /dev/null +++ b/include/linux/pmalloc.h @@ -0,0 +1,213 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * pmalloc.h: Header for Protectable Memory Allocator + * + * (C) Copyright 2017 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#ifndef _PMALLOC_H +#define _PMALLOC_H + + +#include +#include + +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) + +/* + * Library for dynamic allocation of pools of memory that can be, + * after initialization, marked as read-only. + * + * This is intended to complement __read_only_after_init, for those cases + * where either it is not possible to know the initialization value before + * init is completed, or the amount of data is variable and can be + * determined only at run-time. + * + * ***WARNING*** + * The user of the API is expected to synchronize: + * 1) allocation, + * 2) writes to the allocated memory, + * 3) write protection of the pool, + * 4) freeing of the allocated memory, and + * 5) destruction of the pool. + * + * For a non-threaded scenario, this type of locking is not even required. + * + * Even if the library were to provide support for locking, point 2) + * would still depend on the user taking the lock. + */ + + +/** + * pmalloc_create_pool - create a new protectable memory pool - + * @name: the name of the pool, must be unique + * @min_alloc_order: log2 of the minimum allocation size obtainable + * from the pool + * + * Creates a new (empty) memory pool for allocation of protectable + * memory. Memory will be allocated upon request (through pmalloc). + * + * Returns a pointer to the new pool upon success, otherwise a NULL. + */ +struct gen_pool *pmalloc_create_pool(const char *name, +int min_alloc_order); + + +int is_pmalloc_object(const void *ptr, const unsigned long n); + +/** + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size + * @pool: handler to the pool to be used for memory allocation + * @size: amount of memory (in bytes) requested + * + * Prepares a chunk of the requested size. + * This is intended to both minimize latency in later memory requests and + * avoid sleping during allocation. + * Memory allocated with prealloc is stored in one single chunk, as + * opposite to what is allocated on-demand when pmalloc runs out of
[PATCH 4/6] Protectable Memory
The MMU available in many systems running Linux can often provide R/O protection to the memory pages it handles. However, the MMU-based protection works efficiently only when said pages contain exclusively data that will not need further modifications. Statically allocated variables can be segregated into a dedicated section, but this does not sit very well with dynamically allocated ones. Dynamic allocation does not provide, currently, any means for grouping variables in memory pages that would contain exclusively data suitable for conversion to read only access mode. The allocator here provided (pmalloc - protectable memory allocator) introduces the concept of pools of protectable memory. A module can request a pool and then refer any allocation request to the pool handler it has received. Once all the chunks of memory associated to a specific pool are initialized, the pool can be protected. After this point, the pool can only be destroyed (it is up to the module to avoid any further references to the memory from the pool, after the destruction is invoked). The latter case is mainly meant for releasing memory, when a module is unloaded. A module can have as many pools as needed, for example to support the protection of data that is initialized in sufficiently distinct phases. Signed-off-by: Igor Stoppa --- include/linux/genalloc.h | 3 + include/linux/pmalloc.h | 213 include/linux/vmalloc.h | 1 + lib/genalloc.c | 27 +++ mm/Makefile | 1 + mm/pmalloc.c | 514 +++ mm/usercopy.c| 25 ++- 7 files changed, 780 insertions(+), 4 deletions(-) create mode 100644 include/linux/pmalloc.h create mode 100644 mm/pmalloc.c diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index dcaa33e74b1c..b6c4cea9fbd8 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + +extern void gen_pool_flush_chunk(struct gen_pool *pool, +struct gen_pool_chunk *chunk); extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); extern size_t gen_pool_avail(struct gen_pool *); diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h new file mode 100644 index ..5fa8a78be819 --- /dev/null +++ b/include/linux/pmalloc.h @@ -0,0 +1,213 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * pmalloc.h: Header for Protectable Memory Allocator + * + * (C) Copyright 2017 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#ifndef _PMALLOC_H +#define _PMALLOC_H + + +#include +#include + +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) + +/* + * Library for dynamic allocation of pools of memory that can be, + * after initialization, marked as read-only. + * + * This is intended to complement __read_only_after_init, for those cases + * where either it is not possible to know the initialization value before + * init is completed, or the amount of data is variable and can be + * determined only at run-time. + * + * ***WARNING*** + * The user of the API is expected to synchronize: + * 1) allocation, + * 2) writes to the allocated memory, + * 3) write protection of the pool, + * 4) freeing of the allocated memory, and + * 5) destruction of the pool. + * + * For a non-threaded scenario, this type of locking is not even required. + * + * Even if the library were to provide support for locking, point 2) + * would still depend on the user taking the lock. + */ + + +/** + * pmalloc_create_pool - create a new protectable memory pool - + * @name: the name of the pool, must be unique + * @min_alloc_order: log2 of the minimum allocation size obtainable + * from the pool + * + * Creates a new (empty) memory pool for allocation of protectable + * memory. Memory will be allocated upon request (through pmalloc). + * + * Returns a pointer to the new pool upon success, otherwise a NULL. + */ +struct gen_pool *pmalloc_create_pool(const char *name, +int min_alloc_order); + + +int is_pmalloc_object(const void *ptr, const unsigned long n); + +/** + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size + * @pool: handler to the pool to be used for memory allocation + * @size: amount of memory (in bytes) requested + * + * Prepares a chunk of the requested size. + * This is intended to both minimize latency in later memory requests and + * avoid sleping during allocation. + * Memory allocated with prealloc is stored in one single chunk, as + * opposite to what is allocated on-demand when pmalloc runs out of free + * space already existing in the pool and
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 04/02/18 00:29, Boris Lukashev wrote: > On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppawrote: [...] >> What you are suggesting, if I have understood it correctly, is that, >> when the pool is protected, the addresses already given out, will become >> traps that get resolved through a lookup table that is built based on >> the content of each allocation. >> >> That seems to generate a lot of overhead, not to mention the fact that >> it might not play very well with the MMU. > > That is effectively what i'm suggesting - as a form of protection for > consumers against direct reads of data which may have been corrupted > by some irrelevant means. In the context of pmalloc, it would probably > be a separate type of ro+verified pool ok, that seems more like an extension though. ATM I am having problems gaining traction to get even the basic merged :-) I would consider this as a possibility for future work, unless it is said that it's necessary for pmalloc to be accepted ... -- igor
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 04/02/18 00:29, Boris Lukashev wrote: > On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa wrote: [...] >> What you are suggesting, if I have understood it correctly, is that, >> when the pool is protected, the addresses already given out, will become >> traps that get resolved through a lookup table that is built based on >> the content of each allocation. >> >> That seems to generate a lot of overhead, not to mention the fact that >> it might not play very well with the MMU. > > That is effectively what i'm suggesting - as a form of protection for > consumers against direct reads of data which may have been corrupted > by some irrelevant means. In the context of pmalloc, it would probably > be a separate type of ro+verified pool ok, that seems more like an extension though. ATM I am having problems gaining traction to get even the basic merged :-) I would consider this as a possibility for future work, unless it is said that it's necessary for pmalloc to be accepted ... -- igor
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppawrote: > > > On 03/02/18 22:12, Boris Lukashev wrote: > >> Regarding the notion of validated protected memory, is there a method >> by which the resulting checksum could be used in a lookup >> table/function to resolve the location of the protected data? > > What I have in mind is a checksum at page/vmap_area level, so there > would be no 1:1 mapping between a specific allocation and the checksum. > > An extreme case would be the one where an allocation crosses one or more > page boundaries, while the checksum refers to a (partially) overlapping > memory area. > > Code accessing a pool could perform one (relatively expensive) > validation. But still something that would require a more sophisticated > attack, to subvert. > >> Effectively a hash table of protected allocations, with a benefit of >> dedup since any data matching the same key would be the same data >> (multiple identical cred structs being pushed around). Should leave >> the resolver address/csum in recent memory to check against, right? > > I see where you are trying to land, but I do not see how it would work > without a further intermediate step. > > pmalloc dishes out virtual memory addresses, when called. > > It doesn't know what the user of the allocation will put in it. > The user, otoh, has the direct address of the memory it got. > > What you are suggesting, if I have understood it correctly, is that, > when the pool is protected, the addresses already given out, will become > traps that get resolved through a lookup table that is built based on > the content of each allocation. > > That seems to generate a lot of overhead, not to mention the fact that > it might not play very well with the MMU. That is effectively what i'm suggesting - as a form of protection for consumers against direct reads of data which may have been corrupted by some irrelevant means. In the context of pmalloc, it would probably be a separate type of ro+verified pool which consumers would explicitly opt into. Say there's a maintenance cycle on a and it wants to make sure that the instructions it read in are what they should have been before running them, those consumers might well take the penalty if it keeps from doing . If such a resolver could be implemented in a manner which doesnt break all the things (including acceptable performance for at least a significant number of workloads), it might be useful as a general tool for handing out memory to userspace, even in rw, as it provides execution context in which other requirements can be forcibly resolved, preventing unauthorized access to pages the consumer shouldn't get in a very generic way. Spectre comes to mind as a potential class of issues to be addressed this way, since speculative load could be prevented if the resolution were to fail. > > If I misunderstood, then I'd need a step by step description of what > happens, because it's not clear to me how else the data would be > accessed if not through the address that was obtained when pmalloc was > invoked. > > -- > igor -- Boris Lukashev Systems Architect Semper Victus
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa wrote: > > > On 03/02/18 22:12, Boris Lukashev wrote: > >> Regarding the notion of validated protected memory, is there a method >> by which the resulting checksum could be used in a lookup >> table/function to resolve the location of the protected data? > > What I have in mind is a checksum at page/vmap_area level, so there > would be no 1:1 mapping between a specific allocation and the checksum. > > An extreme case would be the one where an allocation crosses one or more > page boundaries, while the checksum refers to a (partially) overlapping > memory area. > > Code accessing a pool could perform one (relatively expensive) > validation. But still something that would require a more sophisticated > attack, to subvert. > >> Effectively a hash table of protected allocations, with a benefit of >> dedup since any data matching the same key would be the same data >> (multiple identical cred structs being pushed around). Should leave >> the resolver address/csum in recent memory to check against, right? > > I see where you are trying to land, but I do not see how it would work > without a further intermediate step. > > pmalloc dishes out virtual memory addresses, when called. > > It doesn't know what the user of the allocation will put in it. > The user, otoh, has the direct address of the memory it got. > > What you are suggesting, if I have understood it correctly, is that, > when the pool is protected, the addresses already given out, will become > traps that get resolved through a lookup table that is built based on > the content of each allocation. > > That seems to generate a lot of overhead, not to mention the fact that > it might not play very well with the MMU. That is effectively what i'm suggesting - as a form of protection for consumers against direct reads of data which may have been corrupted by some irrelevant means. In the context of pmalloc, it would probably be a separate type of ro+verified pool which consumers would explicitly opt into. Say there's a maintenance cycle on a and it wants to make sure that the instructions it read in are what they should have been before running them, those consumers might well take the penalty if it keeps from doing . If such a resolver could be implemented in a manner which doesnt break all the things (including acceptable performance for at least a significant number of workloads), it might be useful as a general tool for handing out memory to userspace, even in rw, as it provides execution context in which other requirements can be forcibly resolved, preventing unauthorized access to pages the consumer shouldn't get in a very generic way. Spectre comes to mind as a potential class of issues to be addressed this way, since speculative load could be prevented if the resolution were to fail. > > If I misunderstood, then I'd need a step by step description of what > happens, because it's not clear to me how else the data would be > accessed if not through the address that was obtained when pmalloc was > invoked. > > -- > igor -- Boris Lukashev Systems Architect Semper Victus
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 03/02/18 22:12, Boris Lukashev wrote: > Regarding the notion of validated protected memory, is there a method > by which the resulting checksum could be used in a lookup > table/function to resolve the location of the protected data? What I have in mind is a checksum at page/vmap_area level, so there would be no 1:1 mapping between a specific allocation and the checksum. An extreme case would be the one where an allocation crosses one or more page boundaries, while the checksum refers to a (partially) overlapping memory area. Code accessing a pool could perform one (relatively expensive) validation. But still something that would require a more sophisticated attack, to subvert. > Effectively a hash table of protected allocations, with a benefit of > dedup since any data matching the same key would be the same data > (multiple identical cred structs being pushed around). Should leave > the resolver address/csum in recent memory to check against, right? I see where you are trying to land, but I do not see how it would work without a further intermediate step. pmalloc dishes out virtual memory addresses, when called. It doesn't know what the user of the allocation will put in it. The user, otoh, has the direct address of the memory it got. What you are suggesting, if I have understood it correctly, is that, when the pool is protected, the addresses already given out, will become traps that get resolved through a lookup table that is built based on the content of each allocation. That seems to generate a lot of overhead, not to mention the fact that it might not play very well with the MMU. If I misunderstood, then I'd need a step by step description of what happens, because it's not clear to me how else the data would be accessed if not through the address that was obtained when pmalloc was invoked. -- igor
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 03/02/18 22:12, Boris Lukashev wrote: > Regarding the notion of validated protected memory, is there a method > by which the resulting checksum could be used in a lookup > table/function to resolve the location of the protected data? What I have in mind is a checksum at page/vmap_area level, so there would be no 1:1 mapping between a specific allocation and the checksum. An extreme case would be the one where an allocation crosses one or more page boundaries, while the checksum refers to a (partially) overlapping memory area. Code accessing a pool could perform one (relatively expensive) validation. But still something that would require a more sophisticated attack, to subvert. > Effectively a hash table of protected allocations, with a benefit of > dedup since any data matching the same key would be the same data > (multiple identical cred structs being pushed around). Should leave > the resolver address/csum in recent memory to check against, right? I see where you are trying to land, but I do not see how it would work without a further intermediate step. pmalloc dishes out virtual memory addresses, when called. It doesn't know what the user of the allocation will put in it. The user, otoh, has the direct address of the memory it got. What you are suggesting, if I have understood it correctly, is that, when the pool is protected, the addresses already given out, will become traps that get resolved through a lookup table that is built based on the content of each allocation. That seems to generate a lot of overhead, not to mention the fact that it might not play very well with the MMU. If I misunderstood, then I'd need a step by step description of what happens, because it's not clear to me how else the data would be accessed if not through the address that was obtained when pmalloc was invoked. -- igor
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Sat, Feb 3, 2018 at 2:57 PM, Igor Stoppawrote: >>> On Thu, 25 Jan 2018, Matthew Wilcox wrote: > It's worth having a discussion about whether we want the pmalloc API or whether we want a slab-based API. > I'd love to have some feedback specifically about the API. > > I have also some idea about userspace and how to extend the pmalloc > concept to it: > > http://www.openwall.com/lists/kernel-hardening/2018/01/30/20 > > I'll be AFK intermittently for about 2 weeks, so i might not be able to > reply immediately, but from my perspective this would be just the > beginning of a broader hardening of both kernel and userspace that I'd > like to pursue. > > -- > igor Regarding the notion of validated protected memory, is there a method by which the resulting checksum could be used in a lookup table/function to resolve the location of the protected data? Effectively a hash table of protected allocations, with a benefit of dedup since any data matching the same key would be the same data (multiple identical cred structs being pushed around). Should leave the resolver address/csum in recent memory to check against, right? -- Boris Lukashev Systems Architect Semper Victus
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Sat, Feb 3, 2018 at 2:57 PM, Igor Stoppa wrote: >>> On Thu, 25 Jan 2018, Matthew Wilcox wrote: > It's worth having a discussion about whether we want the pmalloc API or whether we want a slab-based API. > I'd love to have some feedback specifically about the API. > > I have also some idea about userspace and how to extend the pmalloc > concept to it: > > http://www.openwall.com/lists/kernel-hardening/2018/01/30/20 > > I'll be AFK intermittently for about 2 weeks, so i might not be able to > reply immediately, but from my perspective this would be just the > beginning of a broader hardening of both kernel and userspace that I'd > like to pursue. > > -- > igor Regarding the notion of validated protected memory, is there a method by which the resulting checksum could be used in a lookup table/function to resolve the location of the protected data? Effectively a hash table of protected allocations, with a benefit of dedup since any data matching the same key would be the same data (multiple identical cred structs being pushed around). Should leave the resolver address/csum in recent memory to check against, right? -- Boris Lukashev Systems Architect Semper Victus
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
>> On Thu, 25 Jan 2018, Matthew Wilcox wrote: >>> It's worth having a discussion about whether we want the pmalloc API >>> or whether we want a slab-based API. I'd love to have some feedback specifically about the API. I have also some idea about userspace and how to extend the pmalloc concept to it: http://www.openwall.com/lists/kernel-hardening/2018/01/30/20 I'll be AFK intermittently for about 2 weeks, so i might not be able to reply immediately, but from my perspective this would be just the beginning of a broader hardening of both kernel and userspace that I'd like to pursue. -- igor
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
>> On Thu, 25 Jan 2018, Matthew Wilcox wrote: >>> It's worth having a discussion about whether we want the pmalloc API >>> or whether we want a slab-based API. I'd love to have some feedback specifically about the API. I have also some idea about userspace and how to extend the pmalloc concept to it: http://www.openwall.com/lists/kernel-hardening/2018/01/30/20 I'll be AFK intermittently for about 2 weeks, so i might not be able to reply immediately, but from my perspective this would be just the beginning of a broader hardening of both kernel and userspace that I'd like to pursue. -- igor
[PATCH 4/6] Protectable Memory
The MMU available in many systems running Linux can often provide R/O protection to the memory pages it handles. However, the MMU-based protection works efficiently only when said pages contain exclusively data that will not need further modifications. Statically allocated variables can be segregated into a dedicated section, but this does not sit very well with dynamically allocated ones. Dynamic allocation does not provide, currently, any means for grouping variables in memory pages that would contain exclusively data suitable for conversion to read only access mode. The allocator here provided (pmalloc - protectable memory allocator) introduces the concept of pools of protectable memory. A module can request a pool and then refer any allocation request to the pool handler it has received. Once all the chunks of memory associated to a specific pool are initialized, the pool can be protected. After this point, the pool can only be destroyed (it is up to the module to avoid any further references to the memory from the pool, after the destruction is invoked). The latter case is mainly meant for releasing memory, when a module is unloaded. A module can have as many pools as needed, for example to support the protection of data that is initialized in sufficiently distinct phases. Signed-off-by: Igor Stoppa--- include/linux/genalloc.h | 3 + include/linux/pmalloc.h | 211 +++ include/linux/vmalloc.h | 1 + lib/genalloc.c | 27 +++ mm/Makefile | 1 + mm/pmalloc.c | 514 +++ mm/usercopy.c| 25 ++- 7 files changed, 778 insertions(+), 4 deletions(-) create mode 100644 include/linux/pmalloc.h create mode 100644 mm/pmalloc.c diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index dcaa33e..b6c4cea 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + +extern void gen_pool_flush_chunk(struct gen_pool *pool, +struct gen_pool_chunk *chunk); extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); extern size_t gen_pool_avail(struct gen_pool *); diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h new file mode 100644 index 000..6d4a24e --- /dev/null +++ b/include/linux/pmalloc.h @@ -0,0 +1,211 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * pmalloc.h: Header for Protectable Memory Allocator + * + * (C) Copyright 2017 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#ifndef _PMALLOC_H +#define _PMALLOC_H + + +#include +#include + +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) + +/* + * Library for dynamic allocation of pools of memory that can be, + * after initialization, marked as read-only. + * + * This is intended to complement __read_only_after_init, for those cases + * where either it is not possible to know the initialization value before + * init is completed, or the amount of data is variable and can be + * determined only at run-time. + * + * ***WARNING*** + * The user of the API is expected to synchronize: + * 1) allocation, + * 2) writes to the allocated memory, + * 3) write protection of the pool, + * 4) freeing of the allocated memory, and + * 5) destruction of the pool. + * + * For a non-threaded scenario, this type of locking is not even required. + * + * Even if the library were to provide support for locking, point 2) + * would still depend on the user taking the lock. + */ + + +/** + * pmalloc_create_pool - create a new protectable memory pool - + * @name: the name of the pool, must be unique + * @min_alloc_order: log2 of the minimum allocation size obtainable + * from the pool + * + * Creates a new (empty) memory pool for allocation of protectable + * memory. Memory will be allocated upon request (through pmalloc). + * + * Returns a pointer to the new pool upon success, otherwise a NULL. + */ +struct gen_pool *pmalloc_create_pool(const char *name, +int min_alloc_order); + + +int is_pmalloc_object(const void *ptr, const unsigned long n); + +/** + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size + * @pool: handler to the pool to be used for memory allocation + * @size: amount of memory (in bytes) requested + * + * Prepares a chunk of the requested size. + * This is intended to both minimize latency in later memory requests and + * avoid sleping during allocation. + * Memory allocated with prealloc is stored in one single chunk, as + * opposite to what is allocated on-demand when pmalloc runs out of free + * space
[PATCH 4/6] Protectable Memory
The MMU available in many systems running Linux can often provide R/O protection to the memory pages it handles. However, the MMU-based protection works efficiently only when said pages contain exclusively data that will not need further modifications. Statically allocated variables can be segregated into a dedicated section, but this does not sit very well with dynamically allocated ones. Dynamic allocation does not provide, currently, any means for grouping variables in memory pages that would contain exclusively data suitable for conversion to read only access mode. The allocator here provided (pmalloc - protectable memory allocator) introduces the concept of pools of protectable memory. A module can request a pool and then refer any allocation request to the pool handler it has received. Once all the chunks of memory associated to a specific pool are initialized, the pool can be protected. After this point, the pool can only be destroyed (it is up to the module to avoid any further references to the memory from the pool, after the destruction is invoked). The latter case is mainly meant for releasing memory, when a module is unloaded. A module can have as many pools as needed, for example to support the protection of data that is initialized in sufficiently distinct phases. Signed-off-by: Igor Stoppa --- include/linux/genalloc.h | 3 + include/linux/pmalloc.h | 211 +++ include/linux/vmalloc.h | 1 + lib/genalloc.c | 27 +++ mm/Makefile | 1 + mm/pmalloc.c | 514 +++ mm/usercopy.c| 25 ++- 7 files changed, 778 insertions(+), 4 deletions(-) create mode 100644 include/linux/pmalloc.h create mode 100644 mm/pmalloc.c diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index dcaa33e..b6c4cea 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + +extern void gen_pool_flush_chunk(struct gen_pool *pool, +struct gen_pool_chunk *chunk); extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); extern size_t gen_pool_avail(struct gen_pool *); diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h new file mode 100644 index 000..6d4a24e --- /dev/null +++ b/include/linux/pmalloc.h @@ -0,0 +1,211 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * pmalloc.h: Header for Protectable Memory Allocator + * + * (C) Copyright 2017 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#ifndef _PMALLOC_H +#define _PMALLOC_H + + +#include +#include + +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) + +/* + * Library for dynamic allocation of pools of memory that can be, + * after initialization, marked as read-only. + * + * This is intended to complement __read_only_after_init, for those cases + * where either it is not possible to know the initialization value before + * init is completed, or the amount of data is variable and can be + * determined only at run-time. + * + * ***WARNING*** + * The user of the API is expected to synchronize: + * 1) allocation, + * 2) writes to the allocated memory, + * 3) write protection of the pool, + * 4) freeing of the allocated memory, and + * 5) destruction of the pool. + * + * For a non-threaded scenario, this type of locking is not even required. + * + * Even if the library were to provide support for locking, point 2) + * would still depend on the user taking the lock. + */ + + +/** + * pmalloc_create_pool - create a new protectable memory pool - + * @name: the name of the pool, must be unique + * @min_alloc_order: log2 of the minimum allocation size obtainable + * from the pool + * + * Creates a new (empty) memory pool for allocation of protectable + * memory. Memory will be allocated upon request (through pmalloc). + * + * Returns a pointer to the new pool upon success, otherwise a NULL. + */ +struct gen_pool *pmalloc_create_pool(const char *name, +int min_alloc_order); + + +int is_pmalloc_object(const void *ptr, const unsigned long n); + +/** + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size + * @pool: handler to the pool to be used for memory allocation + * @size: amount of memory (in bytes) requested + * + * Prepares a chunk of the requested size. + * This is intended to both minimize latency in later memory requests and + * avoid sleping during allocation. + * Memory allocated with prealloc is stored in one single chunk, as + * opposite to what is allocated on-demand when pmalloc runs out of free + * space already existing in the pool and has to invoke
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
+Boris Lukashev On 02/02/18 20:39, Christopher Lameter wrote: > On Thu, 25 Jan 2018, Matthew Wilcox wrote: > >> It's worth having a discussion about whether we want the pmalloc API >> or whether we want a slab-based API. We can have a separate discussion >> about an API to remove pages from the physmap. > > We could even do this in a more thorough way. Can we use a ring 1 / 2 > distinction to create a hardened OS core that policies the rest of > the ever expanding kernel with all its modules and this and that feature? What would be the differentiating criteria? Furthermore, what are the chances of invalidating the entire concept, because there is already an hypervisor using the higher level features? That is what you are proposing, if I understand correctly. But more on this below ... > I think that will long term be a better approach and allow more than the > current hardening approaches can get you. It seems that we are willing to > tolerate significant performance regressions now. So lets use the > protection mechanisms that the hardware offers. I would rather *not* propose significant performance regression :-P There might be some one-off case or anyway rare event which is penalized, but my preference goes to not introducing any significant performance penalty, during regular use. After all, the lower the penalty, the wider the (potential) adoption. More in detail: there are 2 major cases for wanting some form of read-only protection. 1) extra ward against accidental corruption The kernel provides many debugging tools and they can detect lots of errors during development, but they require time and knowledge to use them, which are not always available. Furthermore, it is objectively true that not all the code has the same level of maturity, especially when non-upstream code is used in some custom product. It's not my main goal, but it would be nice if that case too could be addressed by the protection. Corruption *can* happen. Having live guards against it, will definitely help spotting bugs or, at the very least, crash/reboot a device before it can cause permanent data corruption. Protection against accidental corruption should be used as widely as possible, therefore it cannot have an high price tag, in terms of lost performance. Otherwise, there's the risk that it will be just a debug feature, more like lockdep or ubsan. 2) protection against malicious attacks This is harder, of course, but what is realistically to be expected? If an attacker can gain full control of the kernel, the only way to do damage control is to have HW and/or higher privilege SW that can somehow limit the reach of the attacker. To make it work for real, it should be mandated that either these extra HW/SW means can tell apart legitimate kernel activity from rogue actions, or they operate so independently from the kernel that a compromise kernel cannot use any API to influence them. The consensus seems to be to put aside (for now) this concern and instead focus on what is a typical scenario: - some bug is found that allows to read/write kernel memory - some other bug is found, which leaks the address of a well known variable, effectively revealing the randomized offset of each symbol placed in linear memory, once their relative location is known. What is described above is a toolkit that effectively can allow - with patience - to attack anything that is writable by the kernel. Including page tables and permissions. However the typical attack is more like: "let's flip some bit(s)". Which is where __ro_after_init has its purpose to exist. My proposal is to extend the same sort of protection also to variables allocated dynamically. * make the pages read only, once the data is initialized * use vmalloc to prevent that exfiltrating the address of an unrelated variable can easily give away the location of the real target, because of the individual page mapping vs linear mapping. Boris Lukashev proposed additional hardening, when accessing a certain variable, in the form of hash/checksum, but I could not come up with an implementation that did not have too much overhead. Re-considering this, one option would be to have a function "pool_validate()" - probably expensive - that could be invoked by a piece of code before using the data from the pool. Not perfect, because it would not be atomic, but it could be used once, at the beginning of a function, without adding overhead to each access to the pool that the function would perform. An attacker would have to time the attack so that the corruption of the data wold happen after the pool is validated and before the data is read from it. Possible, but way tricker than the current unprotected situation. What I am trying to say, is that even after having multi-ring implementation (which would be more dependent on HW features), there would be still the problem of validating the legitimacy of the use of the API that such implementation would expose. I'd rather try to
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
+Boris Lukashev On 02/02/18 20:39, Christopher Lameter wrote: > On Thu, 25 Jan 2018, Matthew Wilcox wrote: > >> It's worth having a discussion about whether we want the pmalloc API >> or whether we want a slab-based API. We can have a separate discussion >> about an API to remove pages from the physmap. > > We could even do this in a more thorough way. Can we use a ring 1 / 2 > distinction to create a hardened OS core that policies the rest of > the ever expanding kernel with all its modules and this and that feature? What would be the differentiating criteria? Furthermore, what are the chances of invalidating the entire concept, because there is already an hypervisor using the higher level features? That is what you are proposing, if I understand correctly. But more on this below ... > I think that will long term be a better approach and allow more than the > current hardening approaches can get you. It seems that we are willing to > tolerate significant performance regressions now. So lets use the > protection mechanisms that the hardware offers. I would rather *not* propose significant performance regression :-P There might be some one-off case or anyway rare event which is penalized, but my preference goes to not introducing any significant performance penalty, during regular use. After all, the lower the penalty, the wider the (potential) adoption. More in detail: there are 2 major cases for wanting some form of read-only protection. 1) extra ward against accidental corruption The kernel provides many debugging tools and they can detect lots of errors during development, but they require time and knowledge to use them, which are not always available. Furthermore, it is objectively true that not all the code has the same level of maturity, especially when non-upstream code is used in some custom product. It's not my main goal, but it would be nice if that case too could be addressed by the protection. Corruption *can* happen. Having live guards against it, will definitely help spotting bugs or, at the very least, crash/reboot a device before it can cause permanent data corruption. Protection against accidental corruption should be used as widely as possible, therefore it cannot have an high price tag, in terms of lost performance. Otherwise, there's the risk that it will be just a debug feature, more like lockdep or ubsan. 2) protection against malicious attacks This is harder, of course, but what is realistically to be expected? If an attacker can gain full control of the kernel, the only way to do damage control is to have HW and/or higher privilege SW that can somehow limit the reach of the attacker. To make it work for real, it should be mandated that either these extra HW/SW means can tell apart legitimate kernel activity from rogue actions, or they operate so independently from the kernel that a compromise kernel cannot use any API to influence them. The consensus seems to be to put aside (for now) this concern and instead focus on what is a typical scenario: - some bug is found that allows to read/write kernel memory - some other bug is found, which leaks the address of a well known variable, effectively revealing the randomized offset of each symbol placed in linear memory, once their relative location is known. What is described above is a toolkit that effectively can allow - with patience - to attack anything that is writable by the kernel. Including page tables and permissions. However the typical attack is more like: "let's flip some bit(s)". Which is where __ro_after_init has its purpose to exist. My proposal is to extend the same sort of protection also to variables allocated dynamically. * make the pages read only, once the data is initialized * use vmalloc to prevent that exfiltrating the address of an unrelated variable can easily give away the location of the real target, because of the individual page mapping vs linear mapping. Boris Lukashev proposed additional hardening, when accessing a certain variable, in the form of hash/checksum, but I could not come up with an implementation that did not have too much overhead. Re-considering this, one option would be to have a function "pool_validate()" - probably expensive - that could be invoked by a piece of code before using the data from the pool. Not perfect, because it would not be atomic, but it could be used once, at the beginning of a function, without adding overhead to each access to the pool that the function would perform. An attacker would have to time the attack so that the corruption of the data wold happen after the pool is validated and before the data is read from it. Possible, but way tricker than the current unprotected situation. What I am trying to say, is that even after having multi-ring implementation (which would be more dependent on HW features), there would be still the problem of validating the legitimacy of the use of the API that such implementation would expose. I'd rather try to
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Thu, 25 Jan 2018, Matthew Wilcox wrote: > It's worth having a discussion about whether we want the pmalloc API > or whether we want a slab-based API. We can have a separate discussion > about an API to remove pages from the physmap. We could even do this in a more thorough way. Can we use a ring 1 / 2 distinction to create a hardened OS core that policies the rest of the ever expanding kernel with all its modules and this and that feature? I think that will long term be a better approach and allow more than the current hardening approaches can get you. It seems that we are willing to tolerate significant performance regressions now. So lets use the protection mechanisms that the hardware offers.
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Thu, 25 Jan 2018, Matthew Wilcox wrote: > It's worth having a discussion about whether we want the pmalloc API > or whether we want a slab-based API. We can have a separate discussion > about an API to remove pages from the physmap. We could even do this in a more thorough way. Can we use a ring 1 / 2 distinction to create a hardened OS core that policies the rest of the ever expanding kernel with all its modules and this and that feature? I think that will long term be a better approach and allow more than the current hardening approaches can get you. It seems that we are willing to tolerate significant performance regressions now. So lets use the protection mechanisms that the hardware offers.
Re: [PATCH 4/6] Protectable Memory
Hi Igor, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.15] [cannot apply to next-20180201] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437 config: i386-tinyconfig (attached as .config) compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): mm/pmalloc.o: In function `pmalloc_pool_show_chunks': >> pmalloc.c:(.text+0x50): undefined reference to `gen_pool_for_each_chunk' mm/pmalloc.o: In function `pmalloc_pool_show_size': >> pmalloc.c:(.text+0x6e): undefined reference to `gen_pool_size' mm/pmalloc.o: In function `pmalloc_pool_show_avail': >> pmalloc.c:(.text+0x8a): undefined reference to `gen_pool_avail' mm/pmalloc.o: In function `pmalloc_chunk_free': >> pmalloc.c:(.text+0x171): undefined reference to `gen_pool_flush_chunk' mm/pmalloc.o: In function `pmalloc_create_pool': >> pmalloc.c:(.text+0x19b): undefined reference to `gen_pool_create' >> pmalloc.c:(.text+0x2bb): undefined reference to `gen_pool_destroy' mm/pmalloc.o: In function `pmalloc_prealloc': >> pmalloc.c:(.text+0x350): undefined reference to `gen_pool_add_virt' mm/pmalloc.o: In function `pmalloc': >> pmalloc.c:(.text+0x3a7): undefined reference to `gen_pool_alloc' pmalloc.c:(.text+0x3f1): undefined reference to `gen_pool_add_virt' pmalloc.c:(.text+0x401): undefined reference to `gen_pool_alloc' mm/pmalloc.o: In function `pmalloc_destroy_pool': pmalloc.c:(.text+0x4a1): undefined reference to `gen_pool_for_each_chunk' pmalloc.c:(.text+0x4a8): undefined reference to `gen_pool_destroy' --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 4/6] Protectable Memory
Hi Igor, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.15] [cannot apply to next-20180201] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437 config: i386-tinyconfig (attached as .config) compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): mm/pmalloc.o: In function `pmalloc_pool_show_chunks': >> pmalloc.c:(.text+0x50): undefined reference to `gen_pool_for_each_chunk' mm/pmalloc.o: In function `pmalloc_pool_show_size': >> pmalloc.c:(.text+0x6e): undefined reference to `gen_pool_size' mm/pmalloc.o: In function `pmalloc_pool_show_avail': >> pmalloc.c:(.text+0x8a): undefined reference to `gen_pool_avail' mm/pmalloc.o: In function `pmalloc_chunk_free': >> pmalloc.c:(.text+0x171): undefined reference to `gen_pool_flush_chunk' mm/pmalloc.o: In function `pmalloc_create_pool': >> pmalloc.c:(.text+0x19b): undefined reference to `gen_pool_create' >> pmalloc.c:(.text+0x2bb): undefined reference to `gen_pool_destroy' mm/pmalloc.o: In function `pmalloc_prealloc': >> pmalloc.c:(.text+0x350): undefined reference to `gen_pool_add_virt' mm/pmalloc.o: In function `pmalloc': >> pmalloc.c:(.text+0x3a7): undefined reference to `gen_pool_alloc' pmalloc.c:(.text+0x3f1): undefined reference to `gen_pool_add_virt' pmalloc.c:(.text+0x401): undefined reference to `gen_pool_alloc' mm/pmalloc.o: In function `pmalloc_destroy_pool': pmalloc.c:(.text+0x4a1): undefined reference to `gen_pool_for_each_chunk' pmalloc.c:(.text+0x4a8): undefined reference to `gen_pool_destroy' --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 4/6] Protectable Memory
Hi Igor, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.15] [cannot apply to next-20180201] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437 config: i386-randconfig-x071-201804 (attached as .config) compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025 reproduce: # save the attached .config to linux build tree make ARCH=i386 All warnings (new ones prefixed by >>): mm/pmalloc.c: In function 'pmalloc_pool_show_avail': >> mm/pmalloc.c:71:25: warning: format '%lu' expects argument of type 'long >> unsigned int', but argument 3 has type 'size_t {aka unsigned int}' >> [-Wformat=] return sprintf(buf, "%lu\n", gen_pool_avail(data->pool)); ~~^ ~~ %u mm/pmalloc.c: In function 'pmalloc_pool_show_size': mm/pmalloc.c:81:25: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=] return sprintf(buf, "%lu\n", gen_pool_size(data->pool)); ~~^ ~ %u vim +71 mm/pmalloc.c 63 64 static ssize_t pmalloc_pool_show_avail(struct kobject *dev, 65 struct kobj_attribute *attr, 66 char *buf) 67 { 68 struct pmalloc_data *data; 69 70 data = container_of(attr, struct pmalloc_data, attr_avail); > 71 return sprintf(buf, "%lu\n", gen_pool_avail(data->pool)); 72 } 73 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 4/6] Protectable Memory
Hi Igor, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.15] [cannot apply to next-20180201] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437 config: i386-randconfig-x071-201804 (attached as .config) compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025 reproduce: # save the attached .config to linux build tree make ARCH=i386 All warnings (new ones prefixed by >>): mm/pmalloc.c: In function 'pmalloc_pool_show_avail': >> mm/pmalloc.c:71:25: warning: format '%lu' expects argument of type 'long >> unsigned int', but argument 3 has type 'size_t {aka unsigned int}' >> [-Wformat=] return sprintf(buf, "%lu\n", gen_pool_avail(data->pool)); ~~^ ~~ %u mm/pmalloc.c: In function 'pmalloc_pool_show_size': mm/pmalloc.c:81:25: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=] return sprintf(buf, "%lu\n", gen_pool_size(data->pool)); ~~^ ~ %u vim +71 mm/pmalloc.c 63 64 static ssize_t pmalloc_pool_show_avail(struct kobject *dev, 65 struct kobj_attribute *attr, 66 char *buf) 67 { 68 struct pmalloc_data *data; 69 70 data = container_of(attr, struct pmalloc_data, attr_avail); > 71 return sprintf(buf, "%lu\n", gen_pool_avail(data->pool)); 72 } 73 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
[PATCH 4/6] Protectable Memory
The MMU available in many systems running Linux can often provide R/O protection to the memory pages it handles. However, the MMU-based protection works efficiently only when said pages contain exclusively data that will not need further modifications. Statically allocated variables can be segregated into a dedicated section, but this does not sit very well with dynamically allocated ones. Dynamic allocation does not provide, currently, any means for grouping variables in memory pages that would contain exclusively data suitable for conversion to read only access mode. The allocator here provided (pmalloc - protectable memory allocator) introduces the concept of pools of protectable memory. A module can request a pool and then refer any allocation request to the pool handler it has received. Once all the chunks of memory associated to a specific pool are initialized, the pool can be protected. After this point, the pool can only be destroyed (it is up to the module to avoid any further references to the memory from the pool, after the destruction is invoked). The latter case is mainly meant for releasing memory, when a module is unloaded. A module can have as many pools as needed, for example to support the protection of data that is initialized in sufficiently distinct phases. Signed-off-by: Igor Stoppa--- include/linux/genalloc.h | 3 + include/linux/pmalloc.h | 216 include/linux/vmalloc.h | 1 + lib/genalloc.c | 27 +++ mm/Makefile | 1 + mm/pmalloc.c | 513 +++ mm/usercopy.c| 25 ++- 7 files changed, 782 insertions(+), 4 deletions(-) create mode 100644 include/linux/pmalloc.h create mode 100644 mm/pmalloc.c diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index 0377681..a486a26 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + +extern void gen_pool_flush_chunk(struct gen_pool *pool, +struct gen_pool_chunk *chunk); extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); extern size_t gen_pool_avail(struct gen_pool *); diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h new file mode 100644 index 000..ad7d557 --- /dev/null +++ b/include/linux/pmalloc.h @@ -0,0 +1,216 @@ +/* + * pmalloc.h: Header for Protectable Memory Allocator + * + * (C) Copyright 2017 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#ifndef _PMALLOC_H +#define _PMALLOC_H + + +#include +#include +#include + +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) + +/* + * Library for dynamic allocation of pools of memory that can be, + * after initialization, marked as read-only. + * + * This is intended to complement __read_only_after_init, for those cases + * where either it is not possible to know the initialization value before + * init is completed, or the amount of data is variable and can be + * determined only at run-time. + * + * ***WARNING*** + * The user of the API is expected to synchronize: + * 1) allocation, + * 2) writes to the allocated memory, + * 3) write protection of the pool, + * 4) freeing of the allocated memory, and + * 5) destruction of the pool. + * + * For a non-threaded scenario, this type of locking is not even required. + * + * Even if the library were to provide support for locking, point 2) + * would still depend on the user taking the lock. + */ + + +/** + * pmalloc_create_pool - create a new protectable memory pool - + * @name: the name of the pool, must be unique + * @min_alloc_order: log2 of the minimum allocation size obtainable + * from the pool + * + * Creates a new (empty) memory pool for allocation of protectable + * memory. Memory will be allocated upon request (through pmalloc). + * + * Returns a pointer to the new pool upon success, otherwise a NULL. + */ +struct gen_pool *pmalloc_create_pool(const char *name, +int min_alloc_order); + + +int is_pmalloc_object(const void *ptr, const unsigned long n); + +/** + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size + * @pool: handler to the pool to be used for memory allocation + * @size: amount of memory (in bytes) requested + * + * Prepares a chunk of the requested size. + * This is intended to both minimize latency in later memory requests and + *
[PATCH 4/6] Protectable Memory
The MMU available in many systems running Linux can often provide R/O protection to the memory pages it handles. However, the MMU-based protection works efficiently only when said pages contain exclusively data that will not need further modifications. Statically allocated variables can be segregated into a dedicated section, but this does not sit very well with dynamically allocated ones. Dynamic allocation does not provide, currently, any means for grouping variables in memory pages that would contain exclusively data suitable for conversion to read only access mode. The allocator here provided (pmalloc - protectable memory allocator) introduces the concept of pools of protectable memory. A module can request a pool and then refer any allocation request to the pool handler it has received. Once all the chunks of memory associated to a specific pool are initialized, the pool can be protected. After this point, the pool can only be destroyed (it is up to the module to avoid any further references to the memory from the pool, after the destruction is invoked). The latter case is mainly meant for releasing memory, when a module is unloaded. A module can have as many pools as needed, for example to support the protection of data that is initialized in sufficiently distinct phases. Signed-off-by: Igor Stoppa --- include/linux/genalloc.h | 3 + include/linux/pmalloc.h | 216 include/linux/vmalloc.h | 1 + lib/genalloc.c | 27 +++ mm/Makefile | 1 + mm/pmalloc.c | 513 +++ mm/usercopy.c| 25 ++- 7 files changed, 782 insertions(+), 4 deletions(-) create mode 100644 include/linux/pmalloc.h create mode 100644 mm/pmalloc.c diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index 0377681..a486a26 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + +extern void gen_pool_flush_chunk(struct gen_pool *pool, +struct gen_pool_chunk *chunk); extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); extern size_t gen_pool_avail(struct gen_pool *); diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h new file mode 100644 index 000..ad7d557 --- /dev/null +++ b/include/linux/pmalloc.h @@ -0,0 +1,216 @@ +/* + * pmalloc.h: Header for Protectable Memory Allocator + * + * (C) Copyright 2017 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#ifndef _PMALLOC_H +#define _PMALLOC_H + + +#include +#include +#include + +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1) + +/* + * Library for dynamic allocation of pools of memory that can be, + * after initialization, marked as read-only. + * + * This is intended to complement __read_only_after_init, for those cases + * where either it is not possible to know the initialization value before + * init is completed, or the amount of data is variable and can be + * determined only at run-time. + * + * ***WARNING*** + * The user of the API is expected to synchronize: + * 1) allocation, + * 2) writes to the allocated memory, + * 3) write protection of the pool, + * 4) freeing of the allocated memory, and + * 5) destruction of the pool. + * + * For a non-threaded scenario, this type of locking is not even required. + * + * Even if the library were to provide support for locking, point 2) + * would still depend on the user taking the lock. + */ + + +/** + * pmalloc_create_pool - create a new protectable memory pool - + * @name: the name of the pool, must be unique + * @min_alloc_order: log2 of the minimum allocation size obtainable + * from the pool + * + * Creates a new (empty) memory pool for allocation of protectable + * memory. Memory will be allocated upon request (through pmalloc). + * + * Returns a pointer to the new pool upon success, otherwise a NULL. + */ +struct gen_pool *pmalloc_create_pool(const char *name, +int min_alloc_order); + + +int is_pmalloc_object(const void *ptr, const unsigned long n); + +/** + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size + * @pool: handler to the pool to be used for memory allocation + * @size: amount of memory (in bytes) requested + * + * Prepares a chunk of the requested size. + * This is intended to both minimize latency in later memory requests and + * avoid sleping during allocation. + * Memory
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 26/01/18 18:36, Boris Lukashev wrote: > I like the idea of making the verification call optional for consumers > allowing for fast/slow+hard paths depending on their needs. > Cant see any additional vectors for abuse (other than the original > ones effecting out-of-band modification) introduced by having > verify/normal callers, but i've not had enough coffee yet. Any access > races or things like that come to mind for anyone? Well, the devil is in the details. In this case, the question is how to perform the verification in a way that is sufficiently robust against races. After thinking about it for a while, I doubt it can be done reliably. It might work for some small data types, but the typical use case I have found myself dealing with, is protecting data structures. That also brings up a separate problem: what would be the size of data to hash? At one extreme there is a page, but it's probably too much, so what is the correct size? it cannot be smaller than a specific allocation, however that would imply looking for the hash related to the data being accessed, with extra overhead. And the data being accessed might be a field in a struct, for which we would not have any hash. There would be a hash only for the containing struct that was allocated ... Overall, it seems a good idea in theory, but when I think about its implementation, it seems like the overhead is so big that it would discourage its use for almost any practical purpose. If one really wants to be paranoid could, otoh have redundancy in a different pool. -- igor
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 26/01/18 18:36, Boris Lukashev wrote: > I like the idea of making the verification call optional for consumers > allowing for fast/slow+hard paths depending on their needs. > Cant see any additional vectors for abuse (other than the original > ones effecting out-of-band modification) introduced by having > verify/normal callers, but i've not had enough coffee yet. Any access > races or things like that come to mind for anyone? Well, the devil is in the details. In this case, the question is how to perform the verification in a way that is sufficiently robust against races. After thinking about it for a while, I doubt it can be done reliably. It might work for some small data types, but the typical use case I have found myself dealing with, is protecting data structures. That also brings up a separate problem: what would be the size of data to hash? At one extreme there is a page, but it's probably too much, so what is the correct size? it cannot be smaller than a specific allocation, however that would imply looking for the hash related to the data being accessed, with extra overhead. And the data being accessed might be a field in a struct, for which we would not have any hash. There would be a hash only for the containing struct that was allocated ... Overall, it seems a good idea in theory, but when I think about its implementation, it seems like the overhead is so big that it would discourage its use for almost any practical purpose. If one really wants to be paranoid could, otoh have redundancy in a different pool. -- igor
Re: [PATCH 4/6] Protectable Memory
On 24/01/18 19:56, Igor Stoppa wrote: [...] > +bool pmalloc_prealloc(struct gen_pool *pool, size_t size) > +{ [...] > +abort: > + vfree(chunk); this should be vfree_atomic() [...] > +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp) > +{ [...] > +free: > + vfree(chunk); and this one too I will fix them in the next iteration. I am waiting to see if any more comments arrive. Otherwise, I'll send it out probably next Tuesday. -- igor
Re: [PATCH 4/6] Protectable Memory
On 24/01/18 19:56, Igor Stoppa wrote: [...] > +bool pmalloc_prealloc(struct gen_pool *pool, size_t size) > +{ [...] > +abort: > + vfree(chunk); this should be vfree_atomic() [...] > +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp) > +{ [...] > +free: > + vfree(chunk); and this one too I will fix them in the next iteration. I am waiting to see if any more comments arrive. Otherwise, I'll send it out probably next Tuesday. -- igor
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Fri, Jan 26, 2018 at 7:28 AM, Igor Stoppawrote: > On 25/01/18 17:38, Jerome Glisse wrote: >> On Thu, Jan 25, 2018 at 10:14:28AM -0500, Boris Lukashev wrote: >>> On Thu, Jan 25, 2018 at 6:59 AM, Igor Stoppa wrote: >> >> [...] >> >>> DMA/physmap access coupled with a knowledge of which virtual mappings >>> are in the physical space should be enough for an attacker to bypass >>> the gating mechanism this work imposes. Not trivial, but not >>> impossible. Since there's no way to prevent that sort of access in >>> current hardware (especially something like a NIC or GPU working >>> independently of the CPU altogether) > > [...] > >> I am not saying that this can not happen but that we are trying our best >> to avoid it. > > How about an opt-in verification, similar to what proposed by Boris > Lukashev? > > When reading back the data, one could access the pointer directly and > bypass the verification, or could use a function that explicitly checks > the integrity of the data. > > Starting from an unprotected kmalloc allocation, even just turning the > data into R/O is an improvement, but if one can afford the overhead of > performing the verification, why not? > I like the idea of making the verification call optional for consumers allowing for fast/slow+hard paths depending on their needs. Cant see any additional vectors for abuse (other than the original ones effecting out-of-band modification) introduced by having verify/normal callers, but i've not had enough coffee yet. Any access races or things like that come to mind for anyone? Shouldn't happen with a write-once allocation, but again, lacking coffee. > It would still be better if the service was provided by the library, > instead than implemented by individual users, I think. > > -- > igor -Boris
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On Fri, Jan 26, 2018 at 7:28 AM, Igor Stoppa wrote: > On 25/01/18 17:38, Jerome Glisse wrote: >> On Thu, Jan 25, 2018 at 10:14:28AM -0500, Boris Lukashev wrote: >>> On Thu, Jan 25, 2018 at 6:59 AM, Igor Stoppa wrote: >> >> [...] >> >>> DMA/physmap access coupled with a knowledge of which virtual mappings >>> are in the physical space should be enough for an attacker to bypass >>> the gating mechanism this work imposes. Not trivial, but not >>> impossible. Since there's no way to prevent that sort of access in >>> current hardware (especially something like a NIC or GPU working >>> independently of the CPU altogether) > > [...] > >> I am not saying that this can not happen but that we are trying our best >> to avoid it. > > How about an opt-in verification, similar to what proposed by Boris > Lukashev? > > When reading back the data, one could access the pointer directly and > bypass the verification, or could use a function that explicitly checks > the integrity of the data. > > Starting from an unprotected kmalloc allocation, even just turning the > data into R/O is an improvement, but if one can afford the overhead of > performing the verification, why not? > I like the idea of making the verification call optional for consumers allowing for fast/slow+hard paths depending on their needs. Cant see any additional vectors for abuse (other than the original ones effecting out-of-band modification) introduced by having verify/normal callers, but i've not had enough coffee yet. Any access races or things like that come to mind for anyone? Shouldn't happen with a write-once allocation, but again, lacking coffee. > It would still be better if the service was provided by the library, > instead than implemented by individual users, I think. > > -- > igor -Boris
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 25/01/18 17:38, Jerome Glisse wrote: > On Thu, Jan 25, 2018 at 10:14:28AM -0500, Boris Lukashev wrote: >> On Thu, Jan 25, 2018 at 6:59 AM, Igor Stoppawrote: > > [...] > >> DMA/physmap access coupled with a knowledge of which virtual mappings >> are in the physical space should be enough for an attacker to bypass >> the gating mechanism this work imposes. Not trivial, but not >> impossible. Since there's no way to prevent that sort of access in >> current hardware (especially something like a NIC or GPU working >> independently of the CPU altogether) [...] > I am not saying that this can not happen but that we are trying our best > to avoid it. How about an opt-in verification, similar to what proposed by Boris Lukashev? When reading back the data, one could access the pointer directly and bypass the verification, or could use a function that explicitly checks the integrity of the data. Starting from an unprotected kmalloc allocation, even just turning the data into R/O is an improvement, but if one can afford the overhead of performing the verification, why not? It would still be better if the service was provided by the library, instead than implemented by individual users, I think. -- igor
Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
On 25/01/18 17:38, Jerome Glisse wrote: > On Thu, Jan 25, 2018 at 10:14:28AM -0500, Boris Lukashev wrote: >> On Thu, Jan 25, 2018 at 6:59 AM, Igor Stoppa wrote: > > [...] > >> DMA/physmap access coupled with a knowledge of which virtual mappings >> are in the physical space should be enough for an attacker to bypass >> the gating mechanism this work imposes. Not trivial, but not >> impossible. Since there's no way to prevent that sort of access in >> current hardware (especially something like a NIC or GPU working >> independently of the CPU altogether) [...] > I am not saying that this can not happen but that we are trying our best > to avoid it. How about an opt-in verification, similar to what proposed by Boris Lukashev? When reading back the data, one could access the pointer directly and bypass the verification, or could use a function that explicitly checks the integrity of the data. Starting from an unprotected kmalloc allocation, even just turning the data into R/O is an improvement, but if one can afford the overhead of performing the verification, why not? It would still be better if the service was provided by the library, instead than implemented by individual users, I think. -- igor