Re: [PATCH 10/17] prmem: documentation

2018-11-21 Thread Igor Stoppa

Hi,

On 13/11/2018 20:36, Andy Lutomirski wrote:

> On Tue, Nov 13, 2018 at 10:33 AM Igor Stoppa  wrote:
>>
>> I forgot one sentence :-(
>>
>> On 13/11/2018 20:31, Igor Stoppa wrote:
>>> On 13/11/2018 19:47, Andy Lutomirski wrote:
>>>
>>>> For general rare-writish stuff, I don't think we want IRQs running
>>>> with them mapped anywhere for write.  For AVC and IMA, I'm less sure.
>>>
>>> Why would these be less sensitive?
>>>
>>> But I see a big difference between my initial implementation and this one.
>>>
>>> In my case, by using a shared mapping, visible to all cores, freezing
>>> the core that is performing the write would have exposed the writable
>>> mapping to a potential attack run from another core.
>>>
>>> If the mapping is private to the core performing the write, even if it
>>> is frozen, it's much harder to figure out what it had mapped and where,
>>> from another core.
>>>
>>> To access that mapping, the attack should be performed from the ISR, I
>>> think.
>>
>> Unless the secondary mapping is also available to other cores, through
>> the shared mm_struct ?
>
> I don't think this matters much.  The other cores will only be able to
> use that mapping when they're doing a rare write.

I'm still mulling over this.
There might be other reasons for replicating the mm_struct.

If I understand correctly how the text patching works, it happens
sequentially, because of the text_mutex used by arch_jump_label_transform().


Which might be fine for this specific case, but I think I shouldn't
introduce a global mutex when it comes to data.
Most likely, if two or more cores want to perform a write-rare operation,
there is no correlation between them, so they could proceed in parallel.
And if there really is one, then the user of the API should introduce its
own locking, for that specific case.
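
To make it concrete, the caller-side locking could look roughly like the
sketch below; the lock and the helper are made-up names, and wr_memcpy()
is the (src, dst, size) interface proposed elsewhere in this thread:

   /* Hypothetical per-subsystem lock; nothing global in the wr API. */
   static DEFINE_SPINLOCK(foo_wr_lock);

   static void foo_wr_set(long *protected_var, long val)
   {
           /* serialize only the writers that actually share this data */
           spin_lock(&foo_wr_lock);
           wr_memcpy(&val, protected_var, sizeof(val));
           spin_unlock(&foo_wr_lock);
   }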


A somewhat unrelated question, still about text patching: I see that each
patching operation is validated individually, but wouldn't it be more
robust to first validate all of them, and only once they are all found to
be compliant, proceed with the actual modifications?
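
What I have in mind is a two-pass scheme along these lines; both helpers
and the struct are invented names, purely for illustration:

   static int apply_patch_batch(struct patch *patches, int n)
   {
           int i;

           /* pass 1: validate the whole batch before touching anything */
           for (i = 0; i < n; i++)
                   if (!patch_is_valid(&patches[i]))
                           return -EINVAL; /* nothing modified yet */

           /* pass 2: all entries are known-good, apply them */
           for (i = 0; i < n; i++)
                   apply_patch(&patches[i]);
           return 0;
   }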


And about the actual implementation of write rare for statically
allocated variables: is it expected that I use Nadav's function?

Or that I refactor the code?

The name, referring to text, would definitely not be OK for data.
And I would also have to generalize it, to deal with larger amounts of data.

I would find it easier, as a first cut, to replicate its behavior and
refactor only later, once it has stabilized and possibly Nadav's patches
have been acked.


--
igor


Re: [PATCH 10/17] prmem: documentation

2018-11-13 Thread Andy Lutomirski
On Tue, Nov 13, 2018 at 10:33 AM Igor Stoppa  wrote:
>
> I forgot one sentence :-(
>
> On 13/11/2018 20:31, Igor Stoppa wrote:
> > On 13/11/2018 19:47, Andy Lutomirski wrote:
> >
> >> For general rare-writish stuff, I don't think we want IRQs running
> >> with them mapped anywhere for write.  For AVC and IMA, I'm less sure.
> >
> > Why would these be less sensitive?
> >
> > But I see a big difference between my initial implementation and this one.
> >
> > In my case, by using a shared mapping, visible to all cores, freezing
> > the core that is performing the write would have exposed the writable
> > mapping to a potential attack run from another core.
> >
> > If the mapping is private to the core performing the write, even if it
> > is frozen, it's much harder to figure out what it had mapped and where,
> > from another core.
> >
> > To access that mapping, the attack should be performed from the ISR, I
> > think.
>
> Unless the secondary mapping is also available to other cores, through
> the shared mm_struct ?
>

I don't think this matters much.  The other cores will only be able to
use that mapping when they're doing a rare write.


Re: [PATCH 10/17] prmem: documentation

2018-11-13 Thread Igor Stoppa
On 13/11/2018 19:16, Andy Lutomirski wrote:
> On Tue, Nov 13, 2018 at 6:25 AM Igor Stoppa  wrote:

[...]

>> How about having one mm_struct for each writer (core or thread)?
>>
> 
> I don't think that helps anything.  I think the mm_struct used for
> prmem (or rare_write or whatever you want to call it)

write_rare / rarely can be shortened to wr_, which is somewhat less
confusing than rare_write, since the latter would become rw_ and would be
easier to confuse with R/W.

Any advice for better naming is welcome.

> should be
> entirely abstracted away by an appropriate API, so neither SELinux nor
> IMA need to be aware that there's an mm_struct involved.

Yes, that is fine. In my proposal I was thinking about tying it to the
core/thread that performs the actual write.

The high level API could be something like:

wr_memcpy(void *src, void *dst, uint_t size)
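
Purely as a sketch of what I have in mind - every helper here (wr_mm,
wr_offset, use_temporary_mm()) is a made-up name, size_t stands in for
uint_t, and the constant-offset alias follows Andy's suggestion below:

   void *wr_memcpy(void *src, void *dst, size_t size)
   {
           struct mm_struct *prev;
           void *wr_dst = dst + wr_offset; /* kernel address -> alias */

           local_irq_disable();            /* keep the alias CPU-private */
           prev = use_temporary_mm(wr_mm); /* hypothetical: enter the wr mm */
           memcpy(wr_dst, src, size);      /* the alias is writable, dst is not */
           unuse_temporary_mm(prev);       /* hypothetical: leave the wr mm */
           local_irq_enable();
           return dst;
   }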

>  It's also
> entirely possible that some architectures won't even use an mm_struct
> behind the scenes -- x86, for example, could have avoided it if there
> were a kernel equivalent of PKRU.  Sadly, there isn't.

The mm_struct - or whatever is the means to do the write on that
architecture - can be kept hidden from the API.

But the reason why I was proposing to have one mm_struct per writer is
that, iiuc, the secondary mapping is created in the secondary mm_struct
for each writer using it.

So the updating of IMA measurements would have, theoretically, also
write access to the SELinux AVC. Which I was trying to avoid.
And similarly any other write rare updater. Is this correct?

>> 2) Iiuc, the purpose of the 2 pages being remapped is that the target of
>> the patch might spill across the page boundary, however if I deal with
>> the modification of generic data, I shouldn't (shouldn't I?) assume that
>> the data will not span across multiple pages.
> 
> The reason for the particular architecture of text_poke() is to avoid
> memory allocation to get it working.  i think that prmem/rare_write
> should have each rare-writable kernel address map to a unique user
> address, possibly just by offsetting everything by a constant.  For
> rare_write, you don't actually need it to work as such until fairly
> late in boot, since the rare_writable data will just be writable early
> on.

Yes, that is true. I think it's safe to assume, from an attack standpoint,
that as long as user space is not started, the system can be considered
ok. Even user-space code run from initrd should be ok, since it can be
bundled (and signed) as a single binary with the kernel.

Modules loaded from a regular filesystem are a bit more risky, because
an attack might inject a rogue key in the key-ring and use it to load
malicious modules.

>> If the data spans across multiple pages, in unknown amount, I suppose
>> that I should not keep interrupts disabled for an unknown time, as it
>> would hurt preemption.
>>
>> What I thought, in my initial patch-set, was to iterate over each page
>> that must be written to, in a loop, re-enabling interrupts in-between
>> iterations, to give pending interrupts a chance to be served.
>>
>> This would mean that the data being written to would not be consistent,
>> but it's a problem that would have to be addressed anyways, since it can
>> be still read by other cores, while the write is ongoing.
> 
> This probably makes sense, except that enabling and disabling
> interrupts means you also need to restore the original mm_struct (most
> likely), which is slow.  I don't think there's a generic way to check
> whether an interrupt is pending without turning interrupts on.

The only "excuse" I have is that write_rare is opt-in and is "rare".
Maybe the enabling/disabling of interrupts - and the consequent switch
of mm_struct - could be somehow tied to the latency configuration?

If preemption is disabled, the expectations on the system latency are
anyway more relaxed.

But I'm not sure how it would work against I/O.
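
For reference, the page-by-page loop I described above would be something
like this sketch, with the map/copy/unmap step left as a placeholder:

   static void wr_memcpy_chunked(void *src, void *dst, size_t size)
   {
           /*
            * Copy at most up to the next page boundary per iteration,
            * re-enabling interrupts in between, so pending ones can be
            * served. Readers may observe a partially updated object.
            */
           while (size) {
                   size_t chunk = min_t(size_t, size,
                                        PAGE_SIZE - offset_in_page(dst));

                   local_irq_disable();
                   /* ... map the page writable, copy 'chunk' bytes, unmap ... */
                   local_irq_enable();     /* window for pending interrupts */

                   dst += chunk;
                   src += chunk;
                   size -= chunk;
           }
   }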

--
igor


Re: [PATCH 10/17] prmem: documentation

2018-11-13 Thread Nadav Amit
From: Andy Lutomirski
Sent: November 13, 2018 at 5:47:16 PM GMT
> To: Nadav Amit 
> Cc: [long recipient list trimmed]
> Subject: Re: [PATCH 10/17] prmem: documentation
> 
> 
> On Tue, Nov 13, 2018 at 9:43 AM Nadav Amit  wrote:
>> From: Andy Lutomirski
>> Sent: November 13, 2018 at 5:16:09 PM GMT
>>> To: Igor Stoppa 
>>> Cc: [long recipient list trimmed]
>>> Subject: Re: [PATCH 10/17] prmem: documentation
>>> 
>>> 
>>> On Tue, Nov 13, 2018 at 6:25 AM Igor Stoppa  wrote:
>>>> Hello,
>>>> I've been studying v4 of the patch-set [1] that Nadav has been working on.
>>>> Incidentally, I think it would be useful to cc also the
>>>> security/hardening ml.
>>>> The patch-set seems to be close to final, so I am resuming this discussion.
>>>> 
>>>> On 30/10/2018 19:06, Andy Lutomirski wrote:
>>>> 
>>>>> I support the addition of a rare-write mechanism to the upstream kernel.  
>>>>> And I think that there is only one sane way to implement it: using an 
>>>>> mm_struct. That mm_struct, just like any sane mm_struct, should only 
>>>>> differ from init_mm in that it has extra mappings in the *user* region.
>>>> 
>>>> After reading the code, I see what you meant.
>>>> I think I can work with it.
>>>> 
>>>> But I have a couple of questions wrt the use of this mechanism, in the
>>>> context of write rare.
>>>> 
>>>> 
>>>> 1) mm_struct.
>>>> 
>>>> Iiuc, the purpose of the patchset is mostly (only?) to patch kernel code
>>>> (live patch?), which seems to happen sequentially and in a relatively
>>>> standardized way, like replacing the NOPs specifically placed in the
>>>> functions that need patching.
>>>> 
>>>> This is a bit different from the more generic write-rare case, applied
>>>> to data.
>>>> 
>>>> As example, I have in mind a system where both IMA and SELinux are in use.
>>>> 
>>>> In this system, a file is accessed for the first time.
>>>> 
>>>> That would trigger 2 things:
>>>> - evaluation of the SELinux rules and probably update of the AVC cache
>>>> - IMA measurement and update of the measurements
>>>> 
>>>> Both of them could be write protected, meaning that they would both have
>>>> to be modified through the write rare mechanism.
>>>> 
>>>> While the events, for 1 specific file, would be sequential, it's not
>>>> difficult to imagine that multiple files could be accessed at the same 
>>>> time.
>>>> 
>>>> If the update of the data structures in both IMA and SELinux must use
>>>> the same mm_struct, that would have to be somehow regulated and it would
>>>> introduce an unnecessary (imho) dependency.
>>>> 
>>>> How about having one mm_struct for each writer (core or thread)?
>>> 
>>> I don't think that helps anything.  I think the mm_struct used for
>>> prmem (or rare_write or whatever you want to call it) should be
>>> entirely abstracted away by an appropriate API, so neither SELinux nor
>>> IMA need to be aware that there's an mm_struct involved.  It's also
>>> entirely possible that some architectures won't even use an mm_struct
>>> behind the scenes -- x86, for example, could have avoided it if there
>>> were a kernel equivalent of PKRU.  Sadly, there isn't.
>>> 
>>>> 2) Iiuc, the purpose of the 2 pages being remapped is that the target of
>>>> the patch might spill across the page boundary, however if I deal with
>>>> the modification of generic data, I shouldn't (shouldn't I?) assume that
>>>> the data will not

Re: [PATCH 10/17] prmem: documentation

2018-10-31 Thread Igor Stoppa

On 01/11/2018 01:19, Andy Lutomirski wrote:

> ISTM you don't need that atomic operation -- you could take a spinlock
> and then just add one directly to the variable.

It was my intention to provide a 1:1 conversion of existing code, as it 
should be easier to verify the correctness of the conversion, as long as 
there isn't any significant degradation in performance.


The rework could be done afterward.

--
igor


Re: [PATCH 10/17] prmem: documentation

2018-10-31 Thread Andy Lutomirski
On Wed, Oct 31, 2018 at 4:10 PM Igor Stoppa  wrote:
>
>
>
> On 01/11/2018 00:57, Andy Lutomirski wrote:
> >
> >
> >> On Oct 31, 2018, at 2:00 PM, Peter Zijlstra  wrote:
>
>
>
> >> I _think_ the use-case for atomics is updating the reference counts of
> >> objects that are in this write-rare domain. But I'm not entirely clear
> >> on that myself either. I just really want to avoid duplicating that
> >> stuff.
> >
> > Sounds nuts. Doing a rare-write is many hundreds of cycles at best. Using 
> > that for a reference count sounds wacky.
> >
> > Can we see a *real* use case before we over complicate the API?
> >
>
>
> Does patch #14 of this set not qualify? ima_htable.len ?
>
> https://www.openwall.com/lists/kernel-hardening/2018/10/23/20
>

Do you mean this (sorry for whitespace damage):

-	atomic_long_inc(&ima_htable.len);
+	pratomic_long_inc(&ima_htable.len);
 	if (update_htable) {
 		key = ima_hash_key(entry->digest);
-		hlist_add_head_rcu(&entry->hnext, &ima_htable.queue[key]);
+		prhlist_add_head_rcu(&entry->hnext, &ima_htable.queue[key]);
 	}

ISTM you don't need that atomic operation -- you could take a spinlock
and then just add one directly to the variable.
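
Roughly like the following sketch - the lock is hypothetical,
ima_htable.len is demoted to a plain long, and wr_memcpy() stands for a
(src, dst, size) write-rare helper discussed elsewhere in this thread:

   static DEFINE_SPINLOCK(ima_htable_lock);        /* hypothetical lock */

   static void ima_htable_inc_len(void)
   {
           long new_len;

           spin_lock(&ima_htable_lock);
           new_len = ima_htable.len + 1;   /* plain long, no wr atomic needed */
           wr_memcpy(&new_len, &ima_htable.len, sizeof(new_len));
           spin_unlock(&ima_htable_lock);
   }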

--Andy


Re: [PATCH 10/17] prmem: documentation

2018-10-30 Thread Igor Stoppa

On 30/10/2018 23:02, Andy Lutomirski wrote:

> On Oct 30, 2018, at 1:43 PM, Igor Stoppa  wrote:
>
>> There is no need to process each of these tens of thousands allocations
>> and initialization as write-rare.
>>
>> Would it be possible to do the same here?
>
> I don’t see why not, although getting the API right will be a tad
> complicated.

yes, I have some first-hand experience with this :-/

>>> To subsequently modify q,
>>>
>>> p = rare_modify(q);
>>> q->a = y;
>>
>> Do you mean
>>
>>  p->a = y;
>>
>> here? I assume the intent is that q isn't writable ever, but that's
>> the one we have in the structure at rest.
>
> Yes, that was my intent, thanks.
>
> To handle the list case that Igor has pointed out, you might want to
> do something like this:
>
> list_for_each_entry(x, &head, entry) {
>         struct foo *writable = rare_modify(entry);
>         /* ... update the fields through 'writable' ... */
>         rare_protect(writable);
> }

>> Would this mapping be impossible to spoof by other cores?
>
> Indeed. Only the core with the special mm loaded could see it.
>
> But I dislike allowing regular writes in the protected region. We really
> only need four write primitives:
>
> 1. Just write one value.  Call at any time (except NMI).
>
> 2. Just copy some bytes. Same as (1) but any number of bytes.
>
> 3,4: Same as 1 and 2 but must be called inside a special rare write
> region. This is purely an optimization.

Atomic? RCU?

Yes, they are technically just memory writes, but shouldn't the "normal"
operation be executed on the writable mapping?

It is possible to sandwich any call between a rare_modify/rare_protect,
however isn't that pretty close to having a write-rare version of these
plain functions?
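
For reference, Andy's four primitives could be declared along the lines
of the sketch below - every name and signature is an assumption, with the
(src, dst, size) order taken from the wr_memcpy() proposal elsewhere in
this thread:

   /* 1: write a single value; callable at any time (except NMI) */
   void wr_write_long(unsigned long *dst, unsigned long val);

   /* 2: copy an arbitrary number of bytes */
   void *wr_memcpy(void *src, void *dst, size_t size);

   /* 3,4: as 1 and 2, but only inside an open rare-write region */
   void wr_region_open(struct wr_state *state);
   void __wr_write_long(unsigned long *dst, unsigned long val);
   void *__wr_memcpy(void *src, void *dst, size_t size);
   void wr_region_close(struct wr_state *state);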


--
igor


Re: [PATCH 10/17] prmem: documentation

2018-10-26 Thread Markus Heiser

On 26.10.18 at 11:26, Peter Zijlstra wrote:
>> Jon,
>>
>> So the below document is a prime example for why I think RST sucks. As a
>> text document readability is greatly diminished by all the markup
>> nonsense.
>>
>> This stuff should not become write-only content like html and other
>> gunk. The actual text file is still the primary means of reading this.
>
> I think Igor neglected to read doc-guide/sphinx.rst:
>
> Specific guidelines for the kernel documentation
> ================================================
>
> Here are some specific guidelines for the kernel documentation:
>
> * Please don't go overboard with reStructuredText markup. Keep it
>   simple. For the most part the documentation should be plain text with
>   just enough consistency in formatting that it can be converted to
>   other formats.
>
> I agree that it's overly marked up.

To add my two cents ..

> WTH is with all those ** ?

I guess Igor was looking for a definition list ...

>> +Available protections for kernel data
>> +-------------------------------------
>> +
>> +- **constant**
>> +   Labelled as **const**, the data is never supposed to be altered.
>> +   It is statically allocated - if it has any memory footprint at all.
>> +   The compiler can even optimize it away, where possible, by replacing
>> +   references to a **const** with its actual value.


+Available protections for kernel data
+-------------------------------------
+
+constant
+   Labelled as const, the data is never supposed to be altered.
+   It is statically allocated - if it has any memory footprint at all.
+   The compiler can even optimize it away, where possible, by replacing
+   references to a const with its actual value.

see "Lists and Quote-like blocks" [1]

[1] 
http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks


-- Markus --


Re: [PATCH 10/17] prmem: documentation

2018-10-26 Thread Matthew Wilcox
On Fri, Oct 26, 2018 at 11:26:09AM +0200, Peter Zijlstra wrote:
> Jon,
> 
> So the below document is a prime example for why I think RST sucks. As a
> text document readability is greatly diminished by all the markup
> nonsense.
> 
> This stuff should not become write-only content like html and other
> gunk. The actual text file is still the primary means of reading this.

I think Igor neglected to read doc-guide/sphinx.rst:

Specific guidelines for the kernel documentation
================================================

Here are some specific guidelines for the kernel documentation:

* Please don't go overboard with reStructuredText markup. Keep it
  simple. For the most part the documentation should be plain text with
  just enough consistency in formatting that it can be converted to
  other formats.

I agree that it's overly marked up.

