Re: [PATCH] mmu notifiers #v6

2008-02-21 Thread Jack Steiner
> I really want suggestions on Jack's concern about issuing an
> invalidate per pte entry (per-pte) instead of per-range. I'll answer
> that in a separate email. For KVM my patch is already close to optimal
> because each single spte invalidate requires a fixed amount of work,
> but for GRU a large invalidate-range would be more efficient.
>
> To address the GRU _valid_ concern, I can create a second version of
> my patch with range_begin/end instead of invalidate_pages, that still

I don't know how much significance to place on this data, but it is
a real data point.

I ran the GRU regression test suite on kernels with both types of
mmu_notifiers. The kernel/driver using Christoph's patch had 1/7 as
many TLB invalidates as the one using Andrea's patch.

This reduction is due to both differences I mentioned yesterday:
- different location of callout for address space teardown
- range callouts

Unfortunately, the current driver does not allow me to quantify
which of the differences is more significant.

Also, I'll try to post the driver within the next few days. It is
still in development but it compiles and can successfully run most
workloads on a system simulator.

--- jack


Re: [PATCH] mmu notifiers #v6

2008-02-21 Thread Andrea Arcangeli
On Thu, Feb 21, 2008 at 05:54:30AM +0100, Nick Piggin wrote:
> will send you incremental changes that can be discussed more easily
> that way (nothing major, mainly style and minor things).

I don't need to say you're very welcome ;).

> I agree: your coherent, non-sleeping mmu notifiers are pretty simple
> and unintrusive. The sleeping version is fundamentally going to either
> need to change VM locks, or be non-coherent, so I don't think there is
> a question of making one solution fit everybody. So the sleeping /
> xrmap patch should be kept either completely independent, or as an
> add-on to this one.

The need to change the VM locks to fit the sleepable "mmu notifier"
requirements is, I think, the major reason why the sleeping patch
should be a separate config option, unless you think the i_mmap_lock
change will benefit the VM for its own good regardless of the
sleepable mmu notifiers. Otherwise we'll end up merging into mainline
an API that can only satisfy the needs of the "sleeping users", who
are only interested in anonymous memory. But the basic concept of the
mmu notifiers is to cover the whole user-visible address space, not
just anonymous memory! Furthermore, XPMEM users have already asked for
it to work on tmpfs/MAP_SHARED too...

Originally, the trick I was using to try to remove the "atomic" param
was to defer the invalidate_range until after dropping the
i_mmap_lock. But clearly in truncate we'll have no guarantee that
either the vma or the MM still exists after spin_unlock(i_mmap_lock)
is called... So it's simply impossible to call the mmu notifier
outside the i_mmap_lock for truncate, and Christoph's patch looks
unfixable without altering the VM core locking. Christoph's
one-config-fits-all API can't really fit all; it only fits anonymous
memory.
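
For concreteness, a rough sketch of the truncate-path constraint
(illustrative pseudo-kernel code, not actual mm/ code from either
patch; it only borrows the mmu_notifier() invocation macro from my
patch later in this digest):

static void sketch_truncate_teardown(struct address_space *mapping,
				     struct vm_area_struct *vma,
				     unsigned long start, unsigned long end)
{
	spin_lock(&mapping->i_mmap_lock);
	/* ... zap the ptes; a non-sleeping notifier can run right here ... */
	mmu_notifier(invalidate_pages, vma->vm_mm, start, end);
	spin_unlock(&mapping->i_mmap_lock);
	/*
	 * Too late to notify here: with the lock released, the vma and
	 * even the mm may already be gone, so a deferred (sleepable)
	 * callout has nothing safe to dereference without changing the
	 * VM core locking itself.
	 */
}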

However, if I wear a KVM hat, I couldn't care less what is merged as
long as .25 will be able to fully and reliably swap a virtualized
guest OS ;). This is why I'm totally willing to support any decision
in favor of anything (including your own patch, which would only work
for KVM) that can be merged.

> I will post some suggestions to you when I get a chance.

I really want suggestions on Jack's concern about issuing an
invalidate per pte entry (per-pte) instead of per-range. I'll answer
that in a separate email. For KVM my patch is already close to optimal
because each single spte invalidate requires a fixed amount of work,
but for GRU a large invalidate-range would be more efficient.

To address the GRU _valid_ concern, I can create a second version of
my patch with range_begin/end instead of invalidate_pages, that still
won't support sleeping users like XPMEM but only KVM and GRU. Then
it's up to Christoph, when he comes back, to alter the VM locking so
that those calls can sleep too... But that will require a much bigger
change, and then perhaps xpmem can share the same mmu notifiers when
the config option to make the mmu notifiers sleepable is enabled. That
part had better be incremental, though, as it's not so obviously safe
to merge as the mmu notifiers themselves.
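
To make the two callout shapes concrete, here is a rough sketch (the
names follow this thread's discussion; neither prototype is quoted
verbatim from my patch or from Christoph's):

struct mmu_notifier_ops_sketch {
	/* #v6 style: one non-sleeping callout covering a batch of pages */
	void (*invalidate_pages)(struct mmu_notifier *mn,
				 struct mm_struct *mm,
				 unsigned long start, unsigned long end);

	/*
	 * Range style: bracket the whole teardown with a begin/end pair,
	 * so a driver like the GRU can issue one range flush in between.
	 */
	void (*invalidate_range_begin)(struct mmu_notifier *mn,
				       struct mm_struct *mm,
				       unsigned long start,
				       unsigned long end);
	void (*invalidate_range_end)(struct mmu_notifier *mn,
				     struct mm_struct *mm,
				     unsigned long start,
				     unsigned long end);
};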


Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Nick Piggin
On Wed, Feb 20, 2008 at 01:03:24PM +0100, Andrea Arcangeli wrote:
> If there's agreement that the VM should alter its locking from
> spinlock to mutex for its own good, then Christoph's
> one-config-option-fits-all becomes a lot more appealing (replacing RCU
> with a mutex in the mmu notifier list registration locking isn't my
> main worry and the non-sleeping-users may be ok to live with it).

Just from a high-level view, in some cases we can just say no, we
aren't going to support this. And this may well be one of those cases.

The more constraints placed on the VM, the harder it becomes to
improve and adapt in future. And this seems like a pretty big
restriction (especially if we can, e.g., work around it completely by
having a special-purpose driver get_user_pages on comm buffers, as I
suggested in the other mail).

At any rate, I believe Andrea's patch really places minimal or no
constraints beyond those of a regular CPU TLB (or the hash tables that
some archs implement). So we're kind of in two different leagues here.


Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Nick Piggin
On Wed, Feb 20, 2008 at 11:39:42AM +0100, Andrea Arcangeli wrote:
> Given Nick's comments I ported my version of the mmu notifiers to
> latest mainline. There are no known bugs AFAIK and it's obviously safe
> (nothing is allowed to schedule inside rcu_read_lock taken by
> mmu_notifier() with my patch).

Thanks! Yes the seqlock you are using now ends up looking similar
to what I did and I couldn't find a hole in that either. So I
think this is going to work.
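
(For readers following along, a minimal sketch of the idea, with
assumed names -- mmu_notifier_lock is hypothetical and the real patch
may arrange the walk differently. Retrying is safe because
re-invalidating an already-dropped secondary-MMU mapping is harmless.)

static void sketch_invalidate_page(struct mm_struct *mm,
				   unsigned long address)
{
	struct mmu_notifier *mn;
	struct hlist_node *pos;
	unsigned seq;

	rcu_read_lock();	/* callees must not schedule in here */
	do {
		/* assumed: a seqlock_t guarding list modifications */
		seq = read_seqbegin(&mm->mmu_notifier_lock);
		hlist_for_each_entry_rcu(mn, pos, &mm->mmu_notifier.head,
					 hlist)
			if (mn->ops->invalidate_page)
				mn->ops->invalidate_page(mn, mm, address);
		/* a registration raced with the walk: redo it */
	} while (read_seqretry(&mm->mmu_notifier_lock, seq));
	rcu_read_unlock();
}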

I do prefer some parts of my patch; however, for everyone's sanity, I
think you should be the maintainer of the mmu notifiers, and I will
send you incremental changes that can be discussed more easily that
way (nothing major, mainly style and minor things).


> XPMEM simply can't use RCU for the registration locking if it wants to
> schedule inside the mmu notifier calls. So I guess it's better to add
> the XPMEM invalidate_range_end/begin/external-rmap as a whole
> different subsystem that will have to use a mutex (not RCU) to
> serialize, and at the same time that CONFIG_XPMEM will also have to
> switch the i_mmap_lock to a mutex. I doubt xpmem fits inside a
> CONFIG_MMU_NOTIFIER anymore, or we'll all run a bit slower because of
> it. It's really a call of how much we want to optimize the MMU
> notifier, by keeping things like RCU for the registration.

I agree: your coherent, non-sleeping mmu notifiers are pretty simple
and unintrusive. The sleeping version is fundamentally going to either
need to change VM locks, or be non-coherent, so I don't think there is
a question of making one solution fit everybody. So the sleeping /
xrmap patch should be kept either completely independent, or as an
add-on to this one.

I will post some suggestions to you when I get a chance.

 


Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Jack Steiner
On Wed, Feb 20, 2008 at 11:39:42AM +0100, Andrea Arcangeli wrote:
> Given Nick's comments I ported my version of the mmu notifiers to
> latest mainline. There are no known bugs AFAIK and it's obviously safe
> (nothing is allowed to schedule inside rcu_read_lock taken by
> mmu_notifier() with my patch).
> 

I ported the GRU driver to use the latest #v6 patch and ran a series of
tests on it using our system simulator. The simulator is slow, so true
stress or swapping is not possible - at least not within a finite amount
of time.

Functionally, the #v6 patch seems to work for the GRU. However, I did
notice two significant differences that make the #v6 performance worse
for the GRU than Christoph's patch.  I think one difference is easily
fixable but the other is more difficult:

- the location of the mmu_notifier_release() callout is at a
  different place in the two patches. Christoph has the callout
  BEFORE the call to unmap_vmas() whereas you have it AFTER. The
  net result is that the GRU does a LOT of 1-page TLB flushes
  during process teardown.  These flushes are not done with
  Christoph's patch.

- the range callouts in Christoph's patch benefit the GRU because
  multiple TLB entries can be flushed with a single GRU
  instruction (the GRU hardware supports a range flush using a
  vaddr & length).  The #v6 patch does a TLB flush for each page in
  the range.  Flushing on the GRU is slow, so being able to flush
  multiple pages with a single request is a benefit.

Seems like the latter difference could be significant for other users
of mmu notifiers.
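
For illustration, here is roughly what the two shapes cost the GRU
(a sketch only; gru_flush_tlb_range() is a stand-in name, not the
real driver API):

/* range callout: the whole region costs one hardware flush */
static void gru_range_sketch(struct mmu_notifier *mn, struct mm_struct *mm,
			     unsigned long start, unsigned long end)
{
	gru_flush_tlb_range(mm, start, end - start);
}

/* per-page callout: the same region costs one slow flush per page */
static void gru_page_sketch(struct mmu_notifier *mn, struct mm_struct *mm,
			    unsigned long address)
{
	gru_flush_tlb_range(mm, address, PAGE_SIZE);
}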


--- jack


Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 08:41:55AM -0600, Robin Holt wrote:
> On Wed, Feb 20, 2008 at 11:39:42AM +0100, Andrea Arcangeli wrote:
> > XPMEM simply can't use RCU for the registration locking if it wants to
> > schedule inside the mmu notifier calls. So I guess it's better to add
> 
> Whoa there.  In Christoph's patch, we did not use rcu for the list.  It
> was a simple hlist_head.  The list manipulations were done under
> down_write(&current->mm->mmap_sem) and would therefore not be racy.  All
> the callout locations are already acquiring the mmap_sem at least for
> read, so we should be safe.  Maybe I missed a race somewhere.

You missed quite a few; see, for example, when atomic=1 and when
mmu_rmap_notifier is invoked.


Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Robin Holt
On Wed, Feb 20, 2008 at 11:39:42AM +0100, Andrea Arcangeli wrote:
> XPMEM simply can't use RCU for the registration locking if it wants to
> schedule inside the mmu notifier calls. So I guess it's better to add

Whoa there.  In Christoph's patch, we did not use rcu for the list.  It
was a simple hlist_head.  The list manipulations were done under
down_write(&current->mm->mmap_sem) and would therefore not be racy.  All
the callout locations are already acquiring the mmap_sem at least for
read, so we should be safe.  Maybe I missed a race somewhere.
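
(A sketch of the scheme just described, reusing the field names from
the #v6 patch later in this digest; the register function is
illustrative, not a quote from Christoph's patch:)

static void sketch_mmu_notifier_register(struct mmu_notifier *mn,
					 struct mm_struct *mm)
{
	/* writers serialize on mmap_sem, so the plain hlist needs no RCU */
	down_write(&mm->mmap_sem);
	hlist_add_head(&mn->hlist, &mm->mmu_notifier.head);
	up_write(&mm->mmap_sem);
}
/* callout sites already hold mmap_sem at least for read, so they can
 * walk mm->mmu_notifier.head without racing a register/unregister */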

Thanks,
Robin


Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Robin Holt
On Wed, Feb 20, 2008 at 01:32:36PM +0100, Andrea Arcangeli wrote:
> On Wed, Feb 20, 2008 at 06:24:24AM -0600, Robin Holt wrote:
> > We do not need to do any allocation in the messaging layer, all
> > structures used for messaging are allocated at module load time.
> > The allocation discussions we had early on were about trying to
> > rearrange your notifiers to allow a separate worker thread to do the
> > invalidate and then the main thread would spin waiting for the worker to
> > complete.  That was canned by moving your notifier to before the
> > lock was grabbed, which led us to the point of needing a _begin and _end.
> 
> I thought you called some net/* function inside the mmu notifier
> methods. Those always require several RAM allocations internally.

Nope, that was the discussions with the IB folks.  We only use XPC and
both the messages we send and the XPC internals do not need to allocate.

> > So, fundamentally, how would they be different?  Would we be required to
> > add another notifier list to the mm and have two separate callout
> > points?  Reduction would end up with the same half-registered
> > half-not-registered situation you point out above.  Then further
> > reduction would lead to the elimination of the callouts you have just
> > proposed and using the _begin/_end callouts and we are back to
> > Christoph's current patch.
> 
> Did you miss Nick's argument that we'd need to change some VM lock to
> mutex and solve lock issues first? Are you implying mutexes are more
> efficient for the VM? (you may seek support from preempt-rt folks at
> least) or are you implying the VM would be better off running slower
> with mutexes in order to have a single config option?

That would be the case if we needed to support file-backed mappings and
hugetlbfs mappings.  Currently (and for the last 6 years), XPMEM has not
supported either of those.  I don't view either as being a realistic
possibility, but it is certainly something we would need to address
before either could be supported.

Robin


Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 06:24:24AM -0600, Robin Holt wrote:
> We do not need to do any allocation in the messaging layer, all
> structures used for messaging are allocated at module load time.
> The allocation discussions we had early on were about trying to
> rearrange your notifiers to allow a separate worker thread to do the
> invalidate and then the main thread would spin waiting for the worker to
> complete.  That was canned by moving your notifier to before the
> lock was grabbed, which led us to the point of needing a _begin and _end.

I thought you called some net/* function inside the mmu notifier
methods. Those always require several RAM allocations internally.

> So, fundamentally, how would they be different?  Would we be required to
> add another notifier list to the mm and have two separate callout
> points?  Reduction would end up with the same half-registered
> half-not-registered situation you point out above.  Then further
> reduction would lead to the elimination of the callouts you have just
> proposed and using the _begin/_end callouts and we are back to
> Christoph's current patch.

Did you miss Nick's argument that we'd need to change some VM lock to
mutex and solve lock issues first? Are you implying mutexes are more
efficient for the VM? (you may seek support from preempt-rt folks at
least) or are you implying the VM would be better off running slower
with mutexes in order to have a single config option?


Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Robin Holt
On Wed, Feb 20, 2008 at 01:03:24PM +0100, Andrea Arcangeli wrote:
> I'm unconvinced both the main linux VM and the mmu notifier should be
> changed like this just to support xpmem. None of the non-sleeping users
> need that. Nevertheless I'm fully willing to support xpmem (and it's
> not my call nor my interest to comment on whether allocating skbs in
> try_to_unmap in order to unpin pages is workable; let's assume it's
> workable for the sake of this discussion) with a new config option
> that will also alter how the core VM works, in order to fully support
> the sleeping users for file-backed mappings.

We do not need to do any allocation in the messaging layer, all
structures used for messaging are allocated at module load time.
The allocation discussions we had early on were about trying to
rearrange your notifiers to allow a separate worker thread to do the
invalidate and then the main thread would spin waiting for the worker to
complete.  That was canned by moving your notifier to before the
lock was grabbed, which led us to the point of needing a _begin and _end.

> This will also create less confusion in the registration. With
> Christoph's one-config-option-fits-all you had to half register into
> the mmu notifier (the sleeping calls, so not invalidate_page) and fully
> register in the external rmap notifier, and I had to only half
> register into the mmu notifier (not range_begin) and not register in
> the external rmap notifier.
> 
> With two separate config options for sleeping and non-sleeping users,
> I'll 100% register in the mmu notifier methods, and the sleeping
> users will 100% register the xpmem methods. You won't have to have
> designed the mmu notifier patches to understand how to use them.

So, fundamentally, how would they be different?  Would we be required to
add another notifier list to the mm and have two separate callout
points?  Reduction would end up with the same half-registered
half-not-registered situation you point out above.  Then further
reduction would lead to the elimination of the callouts you have just
proposed and using the _begin/_end callouts and we are back to
Christoph's current patch.

Robin


Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
On Wed, Feb 20, 2008 at 05:33:13AM -0600, Robin Holt wrote:
> But won't that other "subsystem" cause us to have two separate callouts
> that do equivalent things, and therefore force a removal of this and a
> return to what Christoph has currently proposed?

The point is that a new kind of notifier that only supports sleeping
users will allow us to keep optimizing the mmu notifier patch for the
non-sleeping users. If we keep going Christoph's way of having a
single notifier that fits all, he will have to:

1) drop the entire RCU locking from his patches (making all previous
   RCU discussions and fixes void); those discussions only made sense
   applied to _my_ patch, not Christoph's, as long as you intend to
   sleep in any of his mmu notifier methods like invalidate_range_*

2) probably modify the linux VM to replace the i_mmap_lock and perhaps
   the PT lock with a mutex (see Nick's comments for details)

I'm unconvinced both the main linux VM and the mmu notifier should be
changed like this just to support xpmem. None of the non-sleeping users
need that. Nevertheless I'm fully willing to support xpmem (and it's
not my call nor my interest to comment on whether allocating skbs in
try_to_unmap in order to unpin pages is workable; let's assume it's
workable for the sake of this discussion) with a new config option
that will also alter how the core VM works, in order to fully support
the sleeping users for file-backed mappings.

This will also create less confusion in the registration. With
Christoph's one-config-option-fits-all you had to half register into
the mmu notifier (the sleeping calls, so not invalidate_page) and fully
register in the external rmap notifier, and I had to only half
register into the mmu notifier (not range_begin) and not register in
the external rmap notifier.

With two separate config options for sleeping and non-sleeping users,
I'll 100% register in the mmu notifier methods, and the sleeping
users will 100% register the xpmem methods. You won't have to have
designed the mmu notifier patches to understand how to use them.

In theory both KVM and GRU are free to use the xpmem methods too (the
invalidate_page will be page_t based instead of [mm,addr] based, but
that's possible to handle with KVM changes if one wants to), but if a
distro only wants to support the non-sleeping users in their binary
kernel images, they won't be forced to alter how the VM works to do
that.

If there's agreement that the VM should alter its locking from
spinlock to mutex for its own good, then Christoph's
one-config-option-fits-all becomes a lot more appealing (replacing RCU
with a mutex in the mmu notifier list registration locking isn't my
main worry and the non-sleeping-users may be ok to live with it).


Re: [PATCH] mmu notifiers #v6

2008-02-20 Thread Robin Holt
On Wed, Feb 20, 2008 at 11:39:42AM +0100, Andrea Arcangeli wrote:
> Given Nick's comments I ported my version of the mmu notifiers to
> latest mainline. There are no known bugs AFAIK and it's obviously safe
> (nothing is allowed to schedule inside rcu_read_lock taken by
> mmu_notifier() with my patch).
> 
> XPMEM simply can't use RCU for the registration locking if it wants to
> schedule inside the mmu notifier calls. So I guess it's better to add
> the XPMEM invalidate_range_end/begin/external-rmap as a whole
> different subsystem that will have to use a mutex (not RCU) to
> serialize, and at the same time that CONFIG_XPMEM will also have to
> switch the i_mmap_lock to a mutex. I doubt xpmem fits inside a
> CONFIG_MMU_NOTIFIER anymore, or we'll all run a bit slower because of
> it. It's really a call of how much we want to optimize the MMU
> notifier, by keeping things like RCU for the registration.

But won't that other "subsystem" cause us to have two separate callouts
that do equivalent things, and therefore force a removal of this and a
return to what Christoph has currently proposed?

Robin


[PATCH] mmu notifiers #v6

2008-02-20 Thread Andrea Arcangeli
Given Nick's comments I ported my version of the mmu notifiers to
latest mainline. There are no known bugs AFAIK and it's obviously safe
(nothing is allowed to schedule inside rcu_read_lock taken by
mmu_notifier() with my patch).

XPMEM simply can't use RCU for the registration locking if it wants to
schedule inside the mmu notifier calls. So I guess it's better to add
the XPMEM invalidate_range_end/begin/external-rmap as a whole
different subsystem that will have to use a mutex (not RCU) to
serialize, and at the same time that CONFIG_XPMEM will also have to
switch the i_mmap_lock to a mutex. I doubt xpmem fits inside a
CONFIG_MMU_NOTIFIER anymore, or we'll all run a bit slower because of
it. It's really a call of how much we want to optimize the MMU
notifier, by keeping things like RCU for the registration.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -46,6 +46,7 @@
__young = ptep_test_and_clear_young(__vma, __address, __ptep);  \
if (__young)\
flush_tlb_page(__vma, __address);   \
+   __young |= mmu_notifier_age_page((__vma)->vm_mm, __address);\
__young;\
 })
 #endif
@@ -86,6 +87,7 @@ do {  \
pte_t __pte;\
__pte = ptep_get_and_clear((__vma)->vm_mm, __address, __ptep);  \
flush_tlb_page(__vma, __address);   \
+   mmu_notifier(invalidate_page, (__vma)->vm_mm, __address);   \
__pte;  \
 })
 #endif
diff --git a/include/asm-s390/pgtable.h b/include/asm-s390/pgtable.h
--- a/include/asm-s390/pgtable.h
+++ b/include/asm-s390/pgtable.h
@@ -735,6 +735,7 @@ static inline pte_t ptep_clear_flush(str
 {
pte_t pte = *ptep;
ptep_invalidate(vma->vm_mm, address, ptep);
+   mmu_notifier(invalidate_page, vma->vm_mm, address);
return pte;
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -10,6 +10,7 @@
 #include <linux/rbtree.h>
 #include <linux/rwsem.h>
 #include <linux/completion.h>
+#include <linux/mmu_notifier.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
 
@@ -228,6 +229,8 @@ struct mm_struct {
 #ifdef CONFIG_CGROUP_MEM_CONT
struct mem_cgroup *mem_cgroup;
 #endif
+
+   struct mmu_notifier_head mmu_notifier; /* MMU notifier list */
 };
 
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
new file mode 100644
--- /dev/null
+++ b/include/linux/mmu_notifier.h
@@ -0,0 +1,132 @@
+#ifndef _LINUX_MMU_NOTIFIER_H
+#define _LINUX_MMU_NOTIFIER_H
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+
+struct mmu_notifier;
+
+struct mmu_notifier_ops {
+   /*
+* Called when nobody can register any more notifier in the mm
+* and after the "mn" notifier has been disarmed already.
+*/
+   void (*release)(struct mmu_notifier *mn,
+   struct mm_struct *mm);
+
+   /*
+* invalidate_page[s] is called in atomic context
+* after any pte has been updated and before
+* dropping the PT lock required to update any Linux pte.
+* Once the PT lock will be released the pte will have its
+* final value to export through the secondary MMU.
+* Before this is invoked any secondary MMU is still ok
+* to read/write to the page previously pointed by the
+* Linux pte because the old page hasn't been freed yet.
+* If required set_page_dirty has to be called internally
+* to this method.
+*/
+   void (*invalidate_page)(struct mmu_notifier *mn,
+   struct mm_struct *mm,
+   unsigned long address);
+   void (*invalidate_pages)(struct mmu_notifier *mn,
+struct mm_struct *mm,
+unsigned long start, unsigned long end);
+
+   /*
+* Age page is called in atomic context inside the PT lock
+* right after the VM is test-and-clearing the young/accessed
+* bitflag in the pte. This way the VM will provide proper aging
+* to the accesses to the page through the secondary MMUs
+* and not only to the ones through the Linux pte.
+*/
+   int (*age_page)(struct mmu_notifier *mn,
+   struct mm_struct *mm,
+   unsigned long address);
+};
+
+struct mmu_notifier {
+   struct hlist_node hlist;
+   const struct mmu_notifier_ops *ops;
+};
+
+#ifdef CONFIG_MMU_NOTIFIER
+
+struct mmu_notifier_head {
+   struct hlist_head head;
+   spinlock_t lock;
+};
+
+#include 
+
+/*
+ * RCU is used to 
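
[The patch is truncated in the archive. For orientation, a minimal
usage sketch against the ops above; mmu_notifier_register() is an
assumed helper whose definition falls in the truncated portion:]

/* drop any secondary-MMU mapping of this virtual address */
static void my_invalidate_page(struct mmu_notifier *mn,
			       struct mm_struct *mm, unsigned long address)
{
	/* e.g. flush the device TLB entry for (mm, address) */
}

static const struct mmu_notifier_ops my_ops = {
	.invalidate_page = my_invalidate_page,
};

static struct mmu_notifier my_notifier = {
	.ops = &my_ops,
};

/* attach to an address space, e.g. from a driver open/ioctl path */
static void my_attach(struct mm_struct *mm)
{
	mmu_notifier_register(&my_notifier, mm);	/* assumed name */
}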
