Re: changing mm->mmap_sem (was: Re: system call for process information?)

2001-03-19 Thread Stephen C. Tweedie

Hi,

On Sun, Mar 18, 2001 at 10:34:38AM +0100, Manfred Spraul wrote:

> > The problem is that mmap_sem seems to be protecting the list
> > of VMAs, so taking _only_ the page_table_lock could let a VMA
> > change under us while a page fault is underway ...
> 
> No, that can't happen.

It can.  Page faults often need to block, so they have to be able to
drop the page_table_lock.  Holding the mmap_sem is all that keeps the
vma intact until the IO is complete.

Cheers,
 Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: changing mm-mmap_sem (was: Re: system call for process information?)

2001-03-19 Thread Stephen C. Tweedie

Hi,

On Sun, Mar 18, 2001 at 10:34:38AM +0100, Manfred Spraul wrote:

  The problem is that mmap_sem seems to be protecting the list
  of VMAs, so taking _only_ the page_table_lock could let a VMA
  change under us while a page fault is underway ...
 
 No, that can't happen.

It can.  Page faults often need to block, so they have to be able to
drop the page_table_lock.  Holding the mmap_sem is all that keeps the
vma intact until the IO is complete.

Cheers,
 Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: changing mm->mmap_sem (was: Re: system call for process information?)

2001-03-18 Thread Manfred Spraul

>
> The problem is that mmap_sem seems to be protecting the list
> of VMAs, so taking _only_ the page_table_lock could let a VMA
> change under us while a page fault is underway ...

No, that can't happen.
VMA changes only happen if both the mmap_sem and the page table lock is
acquired. (check insert_vm() at the end of mm/mmap.c)
The page fault path uses the map_sem, kswaps uses page_table_lock.

<< from your patch:
--- linux-2.4.2-ac20-vm/mm/vmscan.c.origSat Mar 17 11:30:49 2001
+++ linux-2.4.2-ac20-vm/mm/vmscan.c Sat Mar 17 20:53:10 2001
@@ -231,6 +231,7 @@
 * Find the proper vm-area after freezing the vma chain
 * and ptes.
 */
+   down_read(>mmap_sem);
spin_lock(>page_table_lock);
 

Why do you acquire the mmap semaphore in swapout_mm()? The old rule was
that kswapd should never sleep on the mmap semaphore. Isn't there a
deadlock if mmap sem is already acquired? I don't remember the details.

>
> The problem is that mmap_sem seems to be protecting the list
> of VMAs, so taking _only_ the page_table_lock could let a VMA
> change under us while a page fault is underway ...

I remember that the pmd_alloc() and pte_alloc() functions need
additional locking.

--
Manfred

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: changing mm-mmap_sem (was: Re: system call for process information?)

2001-03-18 Thread Manfred Spraul


 The problem is that mmap_sem seems to be protecting the list
 of VMAs, so taking _only_ the page_table_lock could let a VMA
 change under us while a page fault is underway ...

No, that can't happen.
VMA changes only happen if both the mmap_sem and the page table lock is
acquired. (check insert_vm() at the end of mm/mmap.c)
The page fault path uses the map_sem, kswaps uses page_table_lock.

 from your patch:
--- linux-2.4.2-ac20-vm/mm/vmscan.c.origSat Mar 17 11:30:49 2001
+++ linux-2.4.2-ac20-vm/mm/vmscan.c Sat Mar 17 20:53:10 2001
@@ -231,6 +231,7 @@
 * Find the proper vm-area after freezing the vma chain
 * and ptes.
 */
+   down_read(mm-mmap_sem);
spin_lock(mm-page_table_lock);
 

Why do you acquire the mmap semaphore in swapout_mm()? The old rule was
that kswapd should never sleep on the mmap semaphore. Isn't there a
deadlock if mmap sem is already acquired? I don't remember the details.


 The problem is that mmap_sem seems to be protecting the list
 of VMAs, so taking _only_ the page_table_lock could let a VMA
 change under us while a page fault is underway ...

I remember that the pmd_alloc() and pte_alloc() functions need
additional locking.

--
Manfred

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: changing mm->mmap_sem (was: Re: system call for process information?)

2001-03-16 Thread Stephen C. Tweedie

Hi,

On Fri, Mar 16, 2001 at 08:50:25AM -0300, Rik van Riel wrote:
> On Fri, 16 Mar 2001, Stephen C. Tweedie wrote:
> 
> > > Write locks would be used in the code where we actually want
> > > to change the VMA list and page faults would use an extra lock
> > > to protect against each other (possibly a per-pagetable lock
> > 
> > Why do we need another lock?  The critical section where we do the
> > final update on the pte _already_ takes the page table spinlock to
> > avoid races against the swapper.
> 
> The problem is that mmap_sem seems to be protecting the list
> of VMAs, so taking _only_ the page_table_lock could let a VMA
> change under us while a page fault is underway ...

Right, I'm not suggesting removing that: making the mmap_sem
read/write is fine, but yes, we still need that semaphore.  But as for
the "page faults would use an extra lock to protect against each
other" bit --- we already have another lock, the page table lock,
which can be used in this way, so ANOTHER lock should be unnecessary.

--Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: changing mm->mmap_sem (was: Re: system call for process information?)

2001-03-16 Thread Stephen C. Tweedie

Hi,

On Thu, Mar 15, 2001 at 09:24:59AM -0300, Rik van Riel wrote:
> On Wed, 14 Mar 2001, Rik van Riel wrote:

> The mmap_sem is used in procfs to prevent the list of VMAs
> from changing. In the page fault code it seems to be used
> to prevent other page faults to happen at the same time with
> the current page fault (and to prevent VMAs from changing
> while a page fault is underway).

The page table spinlock should be quite sufficient to let us avoid
races in the page fault code.  We've had to deal with this before
there was ever a mmap_sem anyway: in ancient times, every page fault
had to do things like check to see if the pte had changed after IO was
complete and once the BKL had been retaken.  We can do the same with
the page fault spinlock without much pain.

> Maybe we should change the mmap_sem into a R/W semaphore ?

Definitely.

> Write locks would be used in the code where we actually want
> to change the VMA list and page faults would use an extra lock
> to protect against each other (possibly a per-pagetable lock

Why do we need another lock?  The critical section where we do the
final update on the pte _already_ takes the page table spinlock to
avoid races against the swapper.

Cheers,
 Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: changing mm-mmap_sem (was: Re: system call for process information?)

2001-03-16 Thread Stephen C. Tweedie

Hi,

On Thu, Mar 15, 2001 at 09:24:59AM -0300, Rik van Riel wrote:
 On Wed, 14 Mar 2001, Rik van Riel wrote:

 The mmap_sem is used in procfs to prevent the list of VMAs
 from changing. In the page fault code it seems to be used
 to prevent other page faults to happen at the same time with
 the current page fault (and to prevent VMAs from changing
 while a page fault is underway).

The page table spinlock should be quite sufficient to let us avoid
races in the page fault code.  We've had to deal with this before
there was ever a mmap_sem anyway: in ancient times, every page fault
had to do things like check to see if the pte had changed after IO was
complete and once the BKL had been retaken.  We can do the same with
the page fault spinlock without much pain.

 Maybe we should change the mmap_sem into a R/W semaphore ?

Definitely.

 Write locks would be used in the code where we actually want
 to change the VMA list and page faults would use an extra lock
 to protect against each other (possibly a per-pagetable lock

Why do we need another lock?  The critical section where we do the
final update on the pte _already_ takes the page table spinlock to
avoid races against the swapper.

Cheers,
 Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: changing mm-mmap_sem (was: Re: system call for process information?)

2001-03-16 Thread Stephen C. Tweedie

Hi,

On Fri, Mar 16, 2001 at 08:50:25AM -0300, Rik van Riel wrote:
 On Fri, 16 Mar 2001, Stephen C. Tweedie wrote:
 
   Write locks would be used in the code where we actually want
   to change the VMA list and page faults would use an extra lock
   to protect against each other (possibly a per-pagetable lock
  
  Why do we need another lock?  The critical section where we do the
  final update on the pte _already_ takes the page table spinlock to
  avoid races against the swapper.
 
 The problem is that mmap_sem seems to be protecting the list
 of VMAs, so taking _only_ the page_table_lock could let a VMA
 change under us while a page fault is underway ...

Right, I'm not suggesting removing that: making the mmap_sem
read/write is fine, but yes, we still need that semaphore.  But as for
the "page faults would use an extra lock to protect against each
other" bit --- we already have another lock, the page table lock,
which can be used in this way, so ANOTHER lock should be unnecessary.

--Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



changing mm->mmap_sem (was: Re: system call for process information?)

2001-03-15 Thread Rik van Riel

On Wed, 14 Mar 2001, Rik van Riel wrote:
> On Wed, 14 Mar 2001, george anzinger wrote:
> 
> > Is it REALLY necessary to prevent them from seeing an
> > inconsistent state?  Seems to me that in the total picture (i.e.
> > system wide) they will never see a consistent state, so why be
> > concerned with a small corner of the system.
> 
> You're right.

Mmmm, I've looked at the code today and it turned out that
we're NOT right ;)

The mmap_sem is used in procfs to prevent the list of VMAs
from changing. In the page fault code it seems to be used
to prevent other page faults to happen at the same time with
the current page fault (and to prevent VMAs from changing
while a page fault is underway).

Maybe we should change the mmap_sem into a R/W semaphore ?

Since page faults seem to be the "common cause" of blocking
procfs access *and* since both page faults and procfs only
need to prevent the VMA list from changing, a read lock would
help here.

Write locks would be used in the code where we actually want
to change the VMA list and page faults would use an extra lock
to protect against each other (possibly a per-pagetable lock so
multithreaded apps can pagefault in different memory regions at
the same time ???).

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



changing mm-mmap_sem (was: Re: system call for process information?)

2001-03-15 Thread Rik van Riel

On Wed, 14 Mar 2001, Rik van Riel wrote:
 On Wed, 14 Mar 2001, george anzinger wrote:
 
  Is it REALLY necessary to prevent them from seeing an
  inconsistent state?  Seems to me that in the total picture (i.e.
  system wide) they will never see a consistent state, so why be
  concerned with a small corner of the system.
 
 You're right.

Mmmm, I've looked at the code today and it turned out that
we're NOT right ;)

The mmap_sem is used in procfs to prevent the list of VMAs
from changing. In the page fault code it seems to be used
to prevent other page faults to happen at the same time with
the current page fault (and to prevent VMAs from changing
while a page fault is underway).

Maybe we should change the mmap_sem into a R/W semaphore ?

Since page faults seem to be the "common cause" of blocking
procfs access *and* since both page faults and procfs only
need to prevent the VMA list from changing, a read lock would
help here.

Write locks would be used in the code where we actually want
to change the VMA list and page faults would use an extra lock
to protect against each other (possibly a per-pagetable lock so
multithreaded apps can pagefault in different memory regions at
the same time ???).

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/