Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-19 Thread Igor Stoppa
On 14/03/18 19:33, Matthew Wilcox wrote:
> I think an implementation of
> pmalloc which used a page_frag-style allocator would be larger than
> 100 lines, but I don't think it would have to be significantly larger
> than that.

I have some doubt about what is the best way to implement it using
vmalloced memory.

1. Since I can allocate an arbitrary number of pages, I think allocating
a rounded up amount of memory, so that it's multiple of PAGE_SIZE should
be enough.

But maybe I could do better than that:
a) support pre-allocation of x pages

b) define, as pool parameter, the minimum number of pages to allocate
every time there is a refill

c) both a and b





2. the flavor of page_frag from page_alloc relies on page->_refcount,
however neither vmap_area, nor vm_struct seem to have anything like
that. (My reasoning is that I should do the accounting not on page
level, but based on the virtual area that I get when I allocate new
memory) What would be the best way to do refcounting for the area?

a) use the the page->_refcount from the first page that belongs to the area

b) add the _refcount to either vm_struct or vmap_area (I am not really
sure of why these two structures exist as separate entities, rather than
a single one - cache optimization?)




3. I will have to add a list of chunks (in genalloc lingo, or areas, if
we refer to the new implementation), because I will still need to
iterate over all the memory that belongs to a pool, for either write
protecting it or for destroying the pool. I have two options:

a) handle the chunks within the pmalloc pool

b) create an intermediate type of pool (vfrag_pool?) and then include it
in the pmalloc pool structure.

I'd lean toward option a, but I thought I might as well ask for advice
before I implement the less desirable option (whatever it might be).

--
thanks, igor


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-19 Thread Igor Stoppa
On 14/03/18 19:33, Matthew Wilcox wrote:
> I think an implementation of
> pmalloc which used a page_frag-style allocator would be larger than
> 100 lines, but I don't think it would have to be significantly larger
> than that.

I have some doubt about what is the best way to implement it using
vmalloced memory.

1. Since I can allocate an arbitrary number of pages, I think allocating
a rounded up amount of memory, so that it's multiple of PAGE_SIZE should
be enough.

But maybe I could do better than that:
a) support pre-allocation of x pages

b) define, as pool parameter, the minimum number of pages to allocate
every time there is a refill

c) both a and b





2. the flavor of page_frag from page_alloc relies on page->_refcount,
however neither vmap_area, nor vm_struct seem to have anything like
that. (My reasoning is that I should do the accounting not on page
level, but based on the virtual area that I get when I allocate new
memory) What would be the best way to do refcounting for the area?

a) use the the page->_refcount from the first page that belongs to the area

b) add the _refcount to either vm_struct or vmap_area (I am not really
sure of why these two structures exist as separate entities, rather than
a single one - cache optimization?)




3. I will have to add a list of chunks (in genalloc lingo, or areas, if
we refer to the new implementation), because I will still need to
iterate over all the memory that belongs to a pool, for either write
protecting it or for destroying the pool. I have two options:

a) handle the chunks within the pmalloc pool

b) create an intermediate type of pool (vfrag_pool?) and then include it
in the pmalloc pool structure.

I'd lean toward option a, but I thought I might as well ask for advice
before I implement the less desirable option (whatever it might be).

--
thanks, igor


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-15 Thread Igor Stoppa


On 14/03/2018 19:33, Matthew Wilcox wrote:
> On Wed, Mar 14, 2018 at 06:11:22PM +0200, Igor Stoppa wrote:

[...]

>> Probably page_frag does well with relatively large allocations, while
>> genalloc seems to be better for small (few allocation units) allocations.
> 
> I don't understand why you would think that.  If you allocate 4096 1-byte
> elements, page_frag will just use up a page.  Doing the same thing with
> genalloc requires allocating two bits per byte (1kB of bitmap), plus
> other overheads.

I had misunderstood the amount of page_frag structures needed.

--
igor


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-15 Thread Igor Stoppa


On 14/03/2018 19:33, Matthew Wilcox wrote:
> On Wed, Mar 14, 2018 at 06:11:22PM +0200, Igor Stoppa wrote:

[...]

>> Probably page_frag does well with relatively large allocations, while
>> genalloc seems to be better for small (few allocation units) allocations.
> 
> I don't understand why you would think that.  If you allocate 4096 1-byte
> elements, page_frag will just use up a page.  Doing the same thing with
> genalloc requires allocating two bits per byte (1kB of bitmap), plus
> other overheads.

I had misunderstood the amount of page_frag structures needed.

--
igor


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Matthew Wilcox
On Wed, Mar 14, 2018 at 06:11:22PM +0200, Igor Stoppa wrote:
> On 14/03/18 15:04, Matthew Wilcox wrote:
> > but the principle it uses
> > seems like a better match to me than the rather complex genalloc.
> 
> It uses meta data in a different way than genalloc.
> There is probably a tipping point where one implementation becomes more
> space-efficient than the other.

Certainly there are always tradeoffs in writing a memory allocator.

> Probably page_frag does well with relatively large allocations, while
> genalloc seems to be better for small (few allocation units) allocations.

I don't understand why you would think that.  If you allocate 4096 1-byte
elements, page_frag will just use up a page.  Doing the same thing with
genalloc requires allocating two bits per byte (1kB of bitmap), plus
other overheads.

> Also, in case of high variance in the size of the allocations, genalloc
> requires the allocation unit to be small enough to fit the smallest
> request (otherwise one must accept some slack), while page_frag doesn't
> care if the allocation is small or large.

Right; internal versus external fragmentation.  The bane of memory
allocators ;-)

> page_frag otoh, seems to not support the reuse of space that was freed,
> since there is only

To a certain extent it does.  If you free everything on a page, and that
page is still in the page_frag_cache, it will get reused.

> But could you please explain to what you are referring to, when you say
> that page_frag has "significantly lower overhead" ?

Less CPU time taken per allocation, less metadata stored per object.

> Ex: if the pfree is called only on error paths, is it ok to not claim
> back the memory released, if it's less than one page?

Yes, I think that's a great example.

> To be clear: I do not want to hold to genalloc just because I have
> already implemented it. I can at least sketch a version with page_frag,
> but I would like to understand why its trade-offs are better :-)
> 
> > Just allocate some pages and track the offset within those pages that 
> 
> > is the current allocation point.
> 
> 
> > It's less than 100 lines of code!
> 
> Strictly speaking it is true, but it all relies on other functions,
> which must be rewritten, because they use linear address, while this
> must work with virtual (vmalloc) addresses.

No, that's basically the whole thing.  I think an implementation of
pmalloc which used a page_frag-style allocator would be larger than
100 lines, but I don't think it would have to be significantly larger
than that.

> Also, I see that the code relies a lot on order of allocation.
> I think we had similar discussion wrt compound pages.
> 
> It seems to me wasteful, if I have a request of, say, 5 pages, and I end
> up allocating 8.

Yes, but the other three pages are available for use by the pmalloc pool.
Now, at pmalloc_protect() time, you might well want to release the unused
pages by calling make_alloc_exact() and hand those three pages back to the
page allocator.

> I do not recall anyone giving a justification like:
> "yeah, it uses extra pages, but it's preferable, for reasons X, Y and Z,
> so it's a good trade-off"

Sometimes it is, sometimes it isn't.

> Could it be that it's normal RAM is considered less precious than the
> special memory genalloc is written for, so normal RAM is not really
> proactively reused, while special memory is treated as a more valuable
> resource that should not be wasted?

We're certainly at the point where normal RAM is pretty cheap.  A 16GB
DIMM is $200, so that's $12.50 per gigabyte.  We have more of a problem
with fragmentation than we do with squeezing every last byte out of
the system.

Of course, Linux still runs on tiny systems, and we don't want to
unnecessarily bloat the kernel.  And cachelines are also a precious
resource; the fewer we touch, the faster the system runs.  The bitmap
in genalloc can easily occupy several cachelines; the page_frag allocator
touches a single cacheline for most allocations.


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Matthew Wilcox
On Wed, Mar 14, 2018 at 06:11:22PM +0200, Igor Stoppa wrote:
> On 14/03/18 15:04, Matthew Wilcox wrote:
> > but the principle it uses
> > seems like a better match to me than the rather complex genalloc.
> 
> It uses meta data in a different way than genalloc.
> There is probably a tipping point where one implementation becomes more
> space-efficient than the other.

Certainly there are always tradeoffs in writing a memory allocator.

> Probably page_frag does well with relatively large allocations, while
> genalloc seems to be better for small (few allocation units) allocations.

I don't understand why you would think that.  If you allocate 4096 1-byte
elements, page_frag will just use up a page.  Doing the same thing with
genalloc requires allocating two bits per byte (1kB of bitmap), plus
other overheads.

> Also, in case of high variance in the size of the allocations, genalloc
> requires the allocation unit to be small enough to fit the smallest
> request (otherwise one must accept some slack), while page_frag doesn't
> care if the allocation is small or large.

Right; internal versus external fragmentation.  The bane of memory
allocators ;-)

> page_frag otoh, seems to not support the reuse of space that was freed,
> since there is only

To a certain extent it does.  If you free everything on a page, and that
page is still in the page_frag_cache, it will get reused.

> But could you please explain to what you are referring to, when you say
> that page_frag has "significantly lower overhead" ?

Less CPU time taken per allocation, less metadata stored per object.

> Ex: if the pfree is called only on error paths, is it ok to not claim
> back the memory released, if it's less than one page?

Yes, I think that's a great example.

> To be clear: I do not want to hold to genalloc just because I have
> already implemented it. I can at least sketch a version with page_frag,
> but I would like to understand why its trade-offs are better :-)
> 
> > Just allocate some pages and track the offset within those pages that 
> 
> > is the current allocation point.
> 
> 
> > It's less than 100 lines of code!
> 
> Strictly speaking it is true, but it all relies on other functions,
> which must be rewritten, because they use linear address, while this
> must work with virtual (vmalloc) addresses.

No, that's basically the whole thing.  I think an implementation of
pmalloc which used a page_frag-style allocator would be larger than
100 lines, but I don't think it would have to be significantly larger
than that.

> Also, I see that the code relies a lot on order of allocation.
> I think we had similar discussion wrt compound pages.
> 
> It seems to me wasteful, if I have a request of, say, 5 pages, and I end
> up allocating 8.

Yes, but the other three pages are available for use by the pmalloc pool.
Now, at pmalloc_protect() time, you might well want to release the unused
pages by calling make_alloc_exact() and hand those three pages back to the
page allocator.

> I do not recall anyone giving a justification like:
> "yeah, it uses extra pages, but it's preferable, for reasons X, Y and Z,
> so it's a good trade-off"

Sometimes it is, sometimes it isn't.

> Could it be that it's normal RAM is considered less precious than the
> special memory genalloc is written for, so normal RAM is not really
> proactively reused, while special memory is treated as a more valuable
> resource that should not be wasted?

We're certainly at the point where normal RAM is pretty cheap.  A 16GB
DIMM is $200, so that's $12.50 per gigabyte.  We have more of a problem
with fragmentation than we do with squeezing every last byte out of
the system.

Of course, Linux still runs on tiny systems, and we don't want to
unnecessarily bloat the kernel.  And cachelines are also a precious
resource; the fewer we touch, the faster the system runs.  The bitmap
in genalloc can easily occupy several cachelines; the page_frag allocator
touches a single cacheline for most allocations.


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Igor Stoppa


On 14/03/18 15:04, Matthew Wilcox wrote:

> I don't necessarily think you should use it as-is,

I think I simply cannot use it as-is, because it seems to use linear
memory, while I need virtual. This reason alone would require a rewrite
of several parts.

> but the principle it uses
> seems like a better match to me than the rather complex genalloc.

It uses meta data in a different way than genalloc.
There is probably a tipping point where one implementation becomes more
space-efficient than the other.

Probably page_frag does well with relatively large allocations, while
genalloc seems to be better for small (few allocation units) allocations.

Also, in case of high variance in the size of the allocations, genalloc
requires the allocation unit to be small enough to fit the smallest
request (otherwise one must accept some slack), while page_frag doesn't
care if the allocation is small or large.

page_frag otoh, seems to not support the reuse of space that was freed,
since there is only

But could you please explain to what you are referring to, when you say
that page_frag has "significantly lower overhead" ?

Is it because it doesn't try to reclaim space that was freed, until the
whole page is empty?

I see different trade-offs, but I am probably either missing or
underestimating the main reason why you think this is better.

And probably I am missing the capability of judging what is acceptable
in certain cases.

Ex: if the pfree is called only on error paths, is it ok to not claim
back the memory released, if it's less than one page?

To be clear: I do not want to hold to genalloc just because I have
already implemented it. I can at least sketch a version with page_frag,
but I would like to understand why its trade-offs are better :-)

> Just allocate some pages and track the offset within those pages that 

> is the current allocation point.


> It's less than 100 lines of code!

Strictly speaking it is true, but it all relies on other functions,
which must be rewritten, because they use linear address, while this
must work with virtual (vmalloc) addresses.

Also, I see that the code relies a lot on order of allocation.
I think we had similar discussion wrt compound pages.

It seems to me wasteful, if I have a request of, say, 5 pages, and I end
up allocating 8.

I do not recall anyone giving a justification like:
"yeah, it uses extra pages, but it's preferable, for reasons X, Y and Z,
so it's a good trade-off"

Could it be that it's normal RAM is considered less precious than the
special memory genalloc is written for, so normal RAM is not really
proactively reused, while special memory is treated as a more valuable
resource that should not be wasted?


--
igor


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Igor Stoppa


On 14/03/18 15:04, Matthew Wilcox wrote:

> I don't necessarily think you should use it as-is,

I think I simply cannot use it as-is, because it seems to use linear
memory, while I need virtual. This reason alone would require a rewrite
of several parts.

> but the principle it uses
> seems like a better match to me than the rather complex genalloc.

It uses meta data in a different way than genalloc.
There is probably a tipping point where one implementation becomes more
space-efficient than the other.

Probably page_frag does well with relatively large allocations, while
genalloc seems to be better for small (few allocation units) allocations.

Also, in case of high variance in the size of the allocations, genalloc
requires the allocation unit to be small enough to fit the smallest
request (otherwise one must accept some slack), while page_frag doesn't
care if the allocation is small or large.

page_frag otoh, seems to not support the reuse of space that was freed,
since there is only

But could you please explain to what you are referring to, when you say
that page_frag has "significantly lower overhead" ?

Is it because it doesn't try to reclaim space that was freed, until the
whole page is empty?

I see different trade-offs, but I am probably either missing or
underestimating the main reason why you think this is better.

And probably I am missing the capability of judging what is acceptable
in certain cases.

Ex: if the pfree is called only on error paths, is it ok to not claim
back the memory released, if it's less than one page?

To be clear: I do not want to hold to genalloc just because I have
already implemented it. I can at least sketch a version with page_frag,
but I would like to understand why its trade-offs are better :-)

> Just allocate some pages and track the offset within those pages that 

> is the current allocation point.


> It's less than 100 lines of code!

Strictly speaking it is true, but it all relies on other functions,
which must be rewritten, because they use linear address, while this
must work with virtual (vmalloc) addresses.

Also, I see that the code relies a lot on order of allocation.
I think we had similar discussion wrt compound pages.

It seems to me wasteful, if I have a request of, say, 5 pages, and I end
up allocating 8.

I do not recall anyone giving a justification like:
"yeah, it uses extra pages, but it's preferable, for reasons X, Y and Z,
so it's a good trade-off"

Could it be that it's normal RAM is considered less precious than the
special memory genalloc is written for, so normal RAM is not really
proactively reused, while special memory is treated as a more valuable
resource that should not be wasted?


--
igor


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Matthew Wilcox
On Wed, Mar 14, 2018 at 02:55:10PM +0200, Igor Stoppa wrote:
> >  The page_frag allocator seems like a much better place to
> > start than genalloc.  It has a significantly lower overhead and is
> > much more suited to the kind of probably-identical-lifespan that the
> > pmalloc API is going to persuade its users to have.
> 
> Could you please provide me a pointer?
> I did a quick search on 4.16-rc5 and found the definition of page_frag
> and sk_page_frag(). Is this what you are referring to?

It's a blink-and-you'll-miss-it allocator buried deep in mm/page_alloc.c:
void *page_frag_alloc(struct page_frag_cache *nc,
  unsigned int fragsz, gfp_t gfp_mask)
void page_frag_free(void *addr)

I don't necessarily think you should use it as-is, but the principle it uses
seems like a better match to me than the rather complex genalloc.  Just
allocate some pages and track the offset within those pages that is the
current allocation point.  It's less than 100 lines of code!


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Matthew Wilcox
On Wed, Mar 14, 2018 at 02:55:10PM +0200, Igor Stoppa wrote:
> >  The page_frag allocator seems like a much better place to
> > start than genalloc.  It has a significantly lower overhead and is
> > much more suited to the kind of probably-identical-lifespan that the
> > pmalloc API is going to persuade its users to have.
> 
> Could you please provide me a pointer?
> I did a quick search on 4.16-rc5 and found the definition of page_frag
> and sk_page_frag(). Is this what you are referring to?

It's a blink-and-you'll-miss-it allocator buried deep in mm/page_alloc.c:
void *page_frag_alloc(struct page_frag_cache *nc,
  unsigned int fragsz, gfp_t gfp_mask)
void page_frag_free(void *addr)

I don't necessarily think you should use it as-is, but the principle it uses
seems like a better match to me than the rather complex genalloc.  Just
allocate some pages and track the offset within those pages that is the
current allocation point.  It's less than 100 lines of code!


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Igor Stoppa


On 14/03/18 13:56, Matthew Wilcox wrote:
> On Wed, Mar 14, 2018 at 01:21:54PM +0200, Igor Stoppa wrote:

[...]

> You misread my proposal.  I did not suggest storing the 'start', but the
> 'end'.

Ok, but doesn't that only change the race scenario?

Attempting to free one allocation, while it is in progress, so that all
the "space" bits are written, but the "end bit" is not yet written.
That will eat up also the following, complete allocation, if there is no
locking in place.

[...]

>> The implementation which interleaves "space" and "start" does not suffer
>> from this sort of races, because the alteration of the interleaved
>> bitmaps is atomic.
> 
> This would be a bug in the allocator implementation.  Obviously it has to
> maintain the integrity of its own data structures.

But I cannot imagine how to do it, with the split bitmaps, without a
lock :-/
And genalloc is supposed to be lockless.

>> Does this justification for the use of interleaved bitmaps (iow the
>> current implementation) make sense?
> 
> I think you're making a mistake by basing the pmalloc allocator on
> genalloc.

It was recommended to me because it was a close match to the allocator
that I was writing from scratch and, when I looked at it, I could only
agree that it was very close.

But I have no particular reason for preferring it, if something better
is available. It was just never brought up before.
At least not that I noticed.

>  The page_frag allocator seems like a much better place to
> start than genalloc.  It has a significantly lower overhead and is
> much more suited to the kind of probably-identical-lifespan that the
> pmalloc API is going to persuade its users to have.


Could you please provide me a pointer?
I did a quick search on 4.16-rc5 and found the definition of page_frag
and sk_page_frag(). Is this what you are referring to?

--
igor


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Igor Stoppa


On 14/03/18 13:56, Matthew Wilcox wrote:
> On Wed, Mar 14, 2018 at 01:21:54PM +0200, Igor Stoppa wrote:

[...]

> You misread my proposal.  I did not suggest storing the 'start', but the
> 'end'.

Ok, but doesn't that only change the race scenario?

Attempting to free one allocation, while it is in progress, so that all
the "space" bits are written, but the "end bit" is not yet written.
That will eat up also the following, complete allocation, if there is no
locking in place.

[...]

>> The implementation which interleaves "space" and "start" does not suffer
>> from this sort of races, because the alteration of the interleaved
>> bitmaps is atomic.
> 
> This would be a bug in the allocator implementation.  Obviously it has to
> maintain the integrity of its own data structures.

But I cannot imagine how to do it, with the split bitmaps, without a
lock :-/
And genalloc is supposed to be lockless.

>> Does this justification for the use of interleaved bitmaps (iow the
>> current implementation) make sense?
> 
> I think you're making a mistake by basing the pmalloc allocator on
> genalloc.

It was recommended to me because it was a close match to the allocator
that I was writing from scratch and, when I looked at it, I could only
agree that it was very close.

But I have no particular reason for preferring it, if something better
is available. It was just never brought up before.
At least not that I noticed.

>  The page_frag allocator seems like a much better place to
> start than genalloc.  It has a significantly lower overhead and is
> much more suited to the kind of probably-identical-lifespan that the
> pmalloc API is going to persuade its users to have.


Could you please provide me a pointer?
I did a quick search on 4.16-rc5 and found the definition of page_frag
and sk_page_frag(). Is this what you are referring to?

--
igor


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Matthew Wilcox
On Wed, Mar 14, 2018 at 01:21:54PM +0200, Igor Stoppa wrote:
> > * @Kees Cook proposed to turn the self testing into modules.
> >   My answer was that the functionality is intentionally tested very early
> >   in the boot phase, to prevent unexplainable errors, should the feature
> >   really fail.
> 
> This could be workable, if it's acceptable that the early testing is
> performed only when the module is compiled in.
> I do not expect the module-based testing to bring much value, but it
> doesn't do harm. Is this acceptable?

Something I've been doing recently is building tests in both userspace and
kernel space.  Here's an example:
http://git.infradead.org/users/willy/linux-dax.git/commitdiff/717f2aa1d4040f65966bb9dab64035962576b0f9

Essentially, tools/ contains a reasonably good set of functions which
emulate kernel functions.  So you write your test suite as a kernel module
and then build it in userspace as well.

> > * @Matthew Wilcox proposed to use a different mechanism for the genalloc
> >   bitmap: 2 bitmaps, one for occupation and one for start.
> >   And possibly use an rbtree for the starts.
> >   My answer was that this solution is less optimized, because it scatters
> >   the data of one allocation across multiple words/pages, plus is not
> >   a transaction anymore. And the particular distribution of sizes of
> >   allocation is likely to eat up much more memory than the bitmap.
> 
> I think I can describe a scenario where the split bitmaps would not work
> (based on my understanding of the proposal), but I would appreciate a
> review. Here it is:

You misread my proposal.  I did not suggest storing the 'start', but the
'end'.

> * One allocation (let's call it allocation A) is already present in both
> bitmaps:
>   - its units of allocation are marked in the "space" bitmap
>   - its starting bit is marked in the "starts" bitmap
> 
> * Another allocation (let's call it allocation B) is undergoing:
>   - some of its units of allocation (starting from the beginning) are
> marked in the "space" bitmap
>   - the starting bit is *not* yet marked in the "starts" bitmap
> 
> * B occupies the space immediately after A
> 
> * While B is being written, A is freed
> 
> * Having to determine the length of A, the "space" bitmap will be
>   searched, then the "starts" bitmap
> 
> 
> The space initially allocated for B will be wrongly accounted for A,
> because there is no empty gap in-between and the beginning of B is not
> yet marked.
> 
> The implementation which interleaves "space" and "start" does not suffer
> from this sort of races, because the alteration of the interleaved
> bitmaps is atomic.

This would be a bug in the allocator implementation.  Obviously it has to
maintain the integrity of its own data structures.

> Does this justification for the use of interleaved bitmaps (iow the
> current implementation) make sense?

I think you're making a mistake by basing the pmalloc allocator on
genalloc.  The page_frag allocator seems like a much better place to
start than genalloc.  It has a significantly lower overhead and is
much more suited to the kind of probably-identical-lifespan that the
pmalloc API is going to persuade its users to have.


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Matthew Wilcox
On Wed, Mar 14, 2018 at 01:21:54PM +0200, Igor Stoppa wrote:
> > * @Kees Cook proposed to turn the self testing into modules.
> >   My answer was that the functionality is intentionally tested very early
> >   in the boot phase, to prevent unexplainable errors, should the feature
> >   really fail.
> 
> This could be workable, if it's acceptable that the early testing is
> performed only when the module is compiled in.
> I do not expect the module-based testing to bring much value, but it
> doesn't do harm. Is this acceptable?

Something I've been doing recently is building tests in both userspace and
kernel space.  Here's an example:
http://git.infradead.org/users/willy/linux-dax.git/commitdiff/717f2aa1d4040f65966bb9dab64035962576b0f9

Essentially, tools/ contains a reasonably good set of functions which
emulate kernel functions.  So you write your test suite as a kernel module
and then build it in userspace as well.

> > * @Matthew Wilcox proposed to use a different mechanism for the genalloc
> >   bitmap: 2 bitmaps, one for occupation and one for start.
> >   And possibly use an rbtree for the starts.
> >   My answer was that this solution is less optimized, because it scatters
> >   the data of one allocation across multiple words/pages, plus is not
> >   a transaction anymore. And the particular distribution of sizes of
> >   allocation is likely to eat up much more memory than the bitmap.
> 
> I think I can describe a scenario where the split bitmaps would not work
> (based on my understanding of the proposal), but I would appreciate a
> review. Here it is:

You misread my proposal.  I did not suggest storing the 'start', but the
'end'.

> * One allocation (let's call it allocation A) is already present in both
> bitmaps:
>   - its units of allocation are marked in the "space" bitmap
>   - its starting bit is marked in the "starts" bitmap
> 
> * Another allocation (let's call it allocation B) is undergoing:
>   - some of its units of allocation (starting from the beginning) are
> marked in the "space" bitmap
>   - the starting bit is *not* yet marked in the "starts" bitmap
> 
> * B occupies the space immediately after A
> 
> * While B is being written, A is freed
> 
> * Having to determine the length of A, the "space" bitmap will be
>   searched, then the "starts" bitmap
> 
> 
> The space initially allocated for B will be wrongly accounted for A,
> because there is no empty gap in-between and the beginning of B is not
> yet marked.
> 
> The implementation which interleaves "space" and "start" does not suffer
> from this sort of races, because the alteration of the interleaved
> bitmaps is atomic.

This would be a bug in the allocator implementation.  Obviously it has to
maintain the integrity of its own data structures.

> Does this justification for the use of interleaved bitmaps (iow the
> current implementation) make sense?

I think you're making a mistake by basing the pmalloc allocator on
genalloc.  The page_frag allocator seems like a much better place to
start than genalloc.  It has a significantly lower overhead and is
much more suited to the kind of probably-identical-lifespan that the
pmalloc API is going to persuade its users to have.


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Igor Stoppa
On 13/03/18 23:45, Igor Stoppa wrote:

[...]

Some more thoughts about the open topics:

> Discussion topics that are unclear if they are closed and would need
> comment from those who initiated them, if my answers are accepted or not:
> 
> * @Kees Cook proposed to have first self testing for genalloc, to
>   validate the following patch, adding tracing of allocations
>   My answer was that such tests would also need patching, therefore they 
>   could not certify that the functionality is corect both before and
>   after the genalloc bitmap modification.

This is the only one where I still couldn't find a solution.
If Matthew Wilcox's proposal about implementing the the genalloc bitmap
would work, then this too could be done, but I think this alternate
bitmap proposed has problems. More on it below.

> * @Kees Cook proposed to turn the self testing into modules.
>   My answer was that the functionality is intentionally tested very early
>   in the boot phase, to prevent unexplainable errors, should the feature
>   really fail.

This could be workable, if it's acceptable that the early testing is
performed only when the module is compiled in.
I do not expect the module-based testing to bring much value, but it
doesn't do harm. Is this acceptable?

> * @Matthew Wilcox proposed to use a different mechanism for the genalloc
>   bitmap: 2 bitmaps, one for occupation and one for start.
>   And possibly use an rbtree for the starts.
>   My answer was that this solution is less optimized, because it scatters
>   the data of one allocation across multiple words/pages, plus is not
>   a transaction anymore. And the particular distribution of sizes of
>   allocation is likely to eat up much more memory than the bitmap.

I think I can describe a scenario where the split bitmaps would not work
(based on my understanding of the proposal), but I would appreciate a
review. Here it is:

* One allocation (let's call it allocation A) is already present in both
bitmaps:
  - its units of allocation are marked in the "space" bitmap
  - its starting bit is marked in the "starts" bitmap

* Another allocation (let's call it allocation B) is undergoing:
  - some of its units of allocation (starting from the beginning) are
marked in the "space" bitmap
  - the starting bit is *not* yet marked in the "starts" bitmap

* B occupies the space immediately after A

* While B is being written, A is freed

* Having to determine the length of A, the "space" bitmap will be
  searched, then the "starts" bitmap


The space initially allocated for B will be wrongly accounted for A,
because there is no empty gap in-between and the beginning of B is not
yet marked.

The implementation which interleaves "space" and "start" does not suffer
from this sort of races, because the alteration of the interleaved
bitmaps is atomic.

However, at the very least, some more explanation is needed in the
documentation/code, because this scenario is not exactly obvious.

Does this justification for the use of interleaved bitmaps (iow the
current implementation) make sense?

--
igor


Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-14 Thread Igor Stoppa
On 13/03/18 23:45, Igor Stoppa wrote:

[...]

Some more thoughts about the open topics:

> Discussion topics that are unclear if they are closed and would need
> comment from those who initiated them, if my answers are accepted or not:
> 
> * @Kees Cook proposed to have first self testing for genalloc, to
>   validate the following patch, adding tracing of allocations
>   My answer was that such tests would also need patching, therefore they 
>   could not certify that the functionality is corect both before and
>   after the genalloc bitmap modification.

This is the only one where I still couldn't find a solution.
If Matthew Wilcox's proposal about implementing the the genalloc bitmap
would work, then this too could be done, but I think this alternate
bitmap proposed has problems. More on it below.

> * @Kees Cook proposed to turn the self testing into modules.
>   My answer was that the functionality is intentionally tested very early
>   in the boot phase, to prevent unexplainable errors, should the feature
>   really fail.

This could be workable, if it's acceptable that the early testing is
performed only when the module is compiled in.
I do not expect the module-based testing to bring much value, but it
doesn't do harm. Is this acceptable?

> * @Matthew Wilcox proposed to use a different mechanism for the genalloc
>   bitmap: 2 bitmaps, one for occupation and one for start.
>   And possibly use an rbtree for the starts.
>   My answer was that this solution is less optimized, because it scatters
>   the data of one allocation across multiple words/pages, plus is not
>   a transaction anymore. And the particular distribution of sizes of
>   allocation is likely to eat up much more memory than the bitmap.

I think I can describe a scenario where the split bitmaps would not work
(based on my understanding of the proposal), but I would appreciate a
review. Here it is:

* One allocation (let's call it allocation A) is already present in both
bitmaps:
  - its units of allocation are marked in the "space" bitmap
  - its starting bit is marked in the "starts" bitmap

* Another allocation (let's call it allocation B) is undergoing:
  - some of its units of allocation (starting from the beginning) are
marked in the "space" bitmap
  - the starting bit is *not* yet marked in the "starts" bitmap

* B occupies the space immediately after A

* While B is being written, A is freed

* Having to determine the length of A, the "space" bitmap will be
  searched, then the "starts" bitmap


The space initially allocated for B will be wrongly accounted for A,
because there is no empty gap in-between and the beginning of B is not
yet marked.

The implementation which interleaves "space" and "start" does not suffer
from this sort of races, because the alteration of the interleaved
bitmaps is atomic.

However, at the very least, some more explanation is needed in the
documentation/code, because this scenario is not exactly obvious.

Does this justification for the use of interleaved bitmaps (iow the
current implementation) make sense?

--
igor


[RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-13 Thread Igor Stoppa
This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is turned into R/O,
all the memory that is part of it, will become R/O.

A R/O pool can be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.

This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.

However the data might need to be released, for example as part of module
unloading.
To do this, the memory must first be freed, then the pool can be destroyed.

An example is provided, in the form of self-testing.

Changes since v18:

[http://www.openwall.com/lists/kernel-hardening/2018/02/28/21]

* Code refactoring in pmalloc & genalloc:
  - simplify the logic for handling pools before and after sysf init
  - reduced section holding mutex on pmalloc list, when adding a pool
  - reduced the steps in finding length of an existing allocation
  - split various functions into smaller ones
* clarified in the comments the need for pfree()
* Various fixes to the documentation:
  - remove kerneldoc duplicates
  - added cross-reference lables
  - miscellaneous typos
* improved error notifications: use WARNs with specific messages
* added missing tests for possible error conditions


Discussion topics that are unclear if they are closed and would need
comment from those who initiated them, if my answers are accepted or not:

* @Kees Cook proposed to have first self testing for genalloc, to
  validate the following patch, adding tracing of allocations
  My answer was that such tests would also need patching, therefore they 
  could not certify that the functionality is corect both before and
  after the genalloc bitmap modification.

* @Kees Cook proposed to turn the self testing into modules.
  My answer was that the functionality is intentionally tested very early
  in the boot phase, to prevent unexplainable errors, should the feature
  really fail.

* @Matthew Wilcox proposed to use a different mechanism for th genalloc
  bitmap: 2 bitmaps, one for occupation and one for start.
  And possibly use an rbtree for the starts.
  My answer was that this solution is less optimized, because it scatters
  the data of one allocation across multiple words/pages, plus is not
  a transaction anymore. And the particular distribution of sizes of
  allocation is likely to eat up much more memory than the bitmap.

Igor Stoppa (8):
  genalloc: track beginning of allocations
  Add label to genalloc.rst for cross reference
  genalloc: selftest
  struct page: add field for vm_struct
  Protectable Memory
  Pmalloc selftest
  lkdtm: crash on overwriting protected pmalloc var
  Documentation for Pmalloc

 Documentation/core-api/genalloc.rst |   2 +
 Documentation/core-api/index.rst|   1 +
 Documentation/core-api/pmalloc.rst  | 111 ++
 drivers/misc/lkdtm.h|   1 +
 drivers/misc/lkdtm_core.c   |   3 +
 drivers/misc/lkdtm_perms.c  |  28 ++
 include/linux/genalloc.h| 116 +++---
 include/linux/mm_types.h|   1 +
 include/linux/pmalloc.h | 163 
 include/linux/test_genalloc.h   |  26 ++
 include/linux/test_pmalloc.h|  24 ++
 include/linux/vmalloc.h |   1 +
 init/main.c |   4 +
 lib/Kconfig |  15 +
 lib/Makefile|   1 +
 lib/genalloc.c  | 765 ++--
 lib/test_genalloc.c | 410 +++
 mm/Kconfig  |  17 +
 mm/Makefile |   2 +
 mm/pmalloc.c| 643 ++
 mm/test_pmalloc.c   | 238 +++
 mm/usercopy.c   |  33 ++
 mm/vmalloc.c|   2 +
 23 files changed, 2352 insertions(+), 255 deletions(-)
 create mode 100644 Documentation/core-api/pmalloc.rst
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 include/linux/test_genalloc.h
 create mode 100644 include/linux/test_pmalloc.h
 create mode 100644 lib/test_genalloc.c
 create mode 100644 mm/pmalloc.c
 create mode 100644 mm/test_pmalloc.c

-- 
2.14.1



[RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

2018-03-13 Thread Igor Stoppa
This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is turned into R/O,
all the memory that is part of it, will become R/O.

A R/O pool can be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.

This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.

However the data might need to be released, for example as part of module
unloading.
To do this, the memory must first be freed, then the pool can be destroyed.

An example is provided, in the form of self-testing.

Changes since v18:

[http://www.openwall.com/lists/kernel-hardening/2018/02/28/21]

* Code refactoring in pmalloc & genalloc:
  - simplify the logic for handling pools before and after sysf init
  - reduced section holding mutex on pmalloc list, when adding a pool
  - reduced the steps in finding length of an existing allocation
  - split various functions into smaller ones
* clarified in the comments the need for pfree()
* Various fixes to the documentation:
  - remove kerneldoc duplicates
  - added cross-reference lables
  - miscellaneous typos
* improved error notifications: use WARNs with specific messages
* added missing tests for possible error conditions


Discussion topics that are unclear if they are closed and would need
comment from those who initiated them, if my answers are accepted or not:

* @Kees Cook proposed to have first self testing for genalloc, to
  validate the following patch, adding tracing of allocations
  My answer was that such tests would also need patching, therefore they 
  could not certify that the functionality is corect both before and
  after the genalloc bitmap modification.

* @Kees Cook proposed to turn the self testing into modules.
  My answer was that the functionality is intentionally tested very early
  in the boot phase, to prevent unexplainable errors, should the feature
  really fail.

* @Matthew Wilcox proposed to use a different mechanism for th genalloc
  bitmap: 2 bitmaps, one for occupation and one for start.
  And possibly use an rbtree for the starts.
  My answer was that this solution is less optimized, because it scatters
  the data of one allocation across multiple words/pages, plus is not
  a transaction anymore. And the particular distribution of sizes of
  allocation is likely to eat up much more memory than the bitmap.

Igor Stoppa (8):
  genalloc: track beginning of allocations
  Add label to genalloc.rst for cross reference
  genalloc: selftest
  struct page: add field for vm_struct
  Protectable Memory
  Pmalloc selftest
  lkdtm: crash on overwriting protected pmalloc var
  Documentation for Pmalloc

 Documentation/core-api/genalloc.rst |   2 +
 Documentation/core-api/index.rst|   1 +
 Documentation/core-api/pmalloc.rst  | 111 ++
 drivers/misc/lkdtm.h|   1 +
 drivers/misc/lkdtm_core.c   |   3 +
 drivers/misc/lkdtm_perms.c  |  28 ++
 include/linux/genalloc.h| 116 +++---
 include/linux/mm_types.h|   1 +
 include/linux/pmalloc.h | 163 
 include/linux/test_genalloc.h   |  26 ++
 include/linux/test_pmalloc.h|  24 ++
 include/linux/vmalloc.h |   1 +
 init/main.c |   4 +
 lib/Kconfig |  15 +
 lib/Makefile|   1 +
 lib/genalloc.c  | 765 ++--
 lib/test_genalloc.c | 410 +++
 mm/Kconfig  |  17 +
 mm/Makefile |   2 +
 mm/pmalloc.c| 643 ++
 mm/test_pmalloc.c   | 238 +++
 mm/usercopy.c   |  33 ++
 mm/vmalloc.c|   2 +
 23 files changed, 2352 insertions(+), 255 deletions(-)
 create mode 100644 Documentation/core-api/pmalloc.rst
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 include/linux/test_genalloc.h
 create mode 100644 include/linux/test_pmalloc.h
 create mode 100644 lib/test_genalloc.c
 create mode 100644 mm/pmalloc.c
 create mode 100644 mm/test_pmalloc.c

-- 
2.14.1