Re: changing mm->mmap_sem (was: Re: system call for process information?)
Hi,

On Sun, Mar 18, 2001 at 10:34:38AM +0100, Manfred Spraul wrote:

> > The problem is that mmap_sem seems to be protecting the list
> > of VMAs, so taking _only_ the page_table_lock could let a VMA
> > change under us while a page fault is underway ...
>
> No, that can't happen.

It can.  Page faults often need to block, so they have to be able to
drop the page_table_lock.  Holding the mmap_sem is all that keeps the
vma intact until the IO is complete.

Cheers,
 Stephen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: changing mm->mmap_sem (was: Re: system call for process information?)
> The problem is that mmap_sem seems to be protecting the list
> of VMAs, so taking _only_ the page_table_lock could let a VMA
> change under us while a page fault is underway ...

No, that can't happen.
VMA changes only happen if both the mmap_sem and the page table lock are
acquired. (check insert_vm() at the end of mm/mmap.c)
The page fault path uses the mmap_sem, kswapd uses page_table_lock.

From your patch:

    --- linux-2.4.2-ac20-vm/mm/vmscan.c.orig	Sat Mar 17 11:30:49 2001
    +++ linux-2.4.2-ac20-vm/mm/vmscan.c	Sat Mar 17 20:53:10 2001
    @@ -231,6 +231,7 @@
     	 * Find the proper vm-area after freezing the vma chain
     	 * and ptes.
     	 */
    +	down_read(&mm->mmap_sem);
     	spin_lock(&mm->page_table_lock);

Why do you acquire the mmap semaphore in swapout_mm()?
The old rule was that kswapd should never sleep on the mmap semaphore.
Isn't there a deadlock if the mmap_sem is already acquired? I don't
remember the details.

> The problem is that mmap_sem seems to be protecting the list
> of VMAs, so taking _only_ the page_table_lock could let a VMA
> change under us while a page fault is underway ...

I remember that the pmd_alloc() and pte_alloc() functions need
additional locking.

--
	Manfred
Re: changing mm->mmap_sem (was: Re: system call for process information?)
Hi,

On Fri, Mar 16, 2001 at 08:50:25AM -0300, Rik van Riel wrote:
> On Fri, 16 Mar 2001, Stephen C. Tweedie wrote:
>
> > > Write locks would be used in the code where we actually want
> > > to change the VMA list and page faults would use an extra lock
> > > to protect against each other (possibly a per-pagetable lock
> >
> > Why do we need another lock?  The critical section where we do the
> > final update on the pte _already_ takes the page table spinlock to
> > avoid races against the swapper.
>
> The problem is that mmap_sem seems to be protecting the list
> of VMAs, so taking _only_ the page_table_lock could let a VMA
> change under us while a page fault is underway ...

Right, I'm not suggesting removing that: making the mmap_sem
read/write is fine, but yes, we still need that semaphore.  But as for
the "page faults would use an extra lock to protect against each
other" bit --- we already have another lock, the page table lock,
which can be used in this way, so ANOTHER lock should be unnecessary.

--Stephen
Re: changing mm->mmap_sem (was: Re: system call for process information?)
Hi,

On Thu, Mar 15, 2001 at 09:24:59AM -0300, Rik van Riel wrote:
> On Wed, 14 Mar 2001, Rik van Riel wrote:

> The mmap_sem is used in procfs to prevent the list of VMAs
> from changing. In the page fault code it seems to be used
> to prevent other page faults to happen at the same time with
> the current page fault (and to prevent VMAs from changing
> while a page fault is underway).

The page table spinlock should be quite sufficient to let us avoid
races in the page fault code.  We've had to deal with this before
there was ever a mmap_sem anyway: in ancient times, every page fault
had to do things like check to see if the pte had changed after IO was
complete and once the BKL had been retaken.  We can do the same with
the page fault spinlock without much pain.

> Maybe we should change the mmap_sem into a R/W semaphore ?

Definitely.

> Write locks would be used in the code where we actually want
> to change the VMA list and page faults would use an extra lock
> to protect against each other (possibly a per-pagetable lock

Why do we need another lock?  The critical section where we do the
final update on the pte _already_ takes the page table spinlock to
avoid races against the swapper.

Cheers,
 Stephen
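The "check whether the pte changed once the lock has been retaken" pattern described above can be sketched in userspace roughly as follows. This is an illustration under stated assumptions, not the kernel's fault handler: a pthread mutex stands in for the page table spinlock, a single unsigned long stands in for the pte, and struct toy_mm, fault_install_pte() and racing_fault() are hypothetical names made up for the example.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in: one lock and one pretend page table entry. */
struct toy_mm {
    pthread_mutex_t page_table_lock; /* plays the page table spinlock */
    unsigned long   pte;             /* 0 means "not present" */
};

/* Callback type for the blocking step; runs with the lock dropped. */
typedef void (*io_fn)(struct toy_mm *mm);

static void toy_mm_init(struct toy_mm *mm)
{
    pthread_mutex_init(&mm->page_table_lock, NULL);
    mm->pte = 0;
}

/* Returns 1 if new_pte was installed, 0 if the pte changed while the
 * lock was dropped and the fault backed off. */
static int fault_install_pte(struct toy_mm *mm, unsigned long new_pte,
                             io_fn do_io)
{
    unsigned long seen;
    int installed = 0;

    /* Snapshot the entry under the lock. */
    pthread_mutex_lock(&mm->page_table_lock);
    seen = mm->pte;
    pthread_mutex_unlock(&mm->page_table_lock);

    /* Blocking work happens without the lock held. */
    if (do_io != NULL)
        do_io(mm);

    /* Retake the lock and re-check before the final update. */
    pthread_mutex_lock(&mm->page_table_lock);
    if (mm->pte == seen) {
        mm->pte = new_pte;
        installed = 1;
    }
    pthread_mutex_unlock(&mm->page_table_lock);

    return installed;
}

/* Simulates another fault or the swapper touching the pte during I/O. */
static void racing_fault(struct toy_mm *mm)
{
    pthread_mutex_lock(&mm->page_table_lock);
    mm->pte = 0xbeefUL;
    pthread_mutex_unlock(&mm->page_table_lock);
}
```

With no interference (do_io == NULL) the new value is installed; when racing_fault is passed as the I/O callback, the re-check fails and the racing value is left in place. A real handler would typically retry or simply return and let the access fault again.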
changing mm->mmap_sem (was: Re: system call for process information?)
On Wed, 14 Mar 2001, Rik van Riel wrote:
> On Wed, 14 Mar 2001, george anzinger wrote:
>
> > Is it REALLY necessary to prevent them from seeing an
> > inconsistent state?  Seems to me that in the total picture (i.e.
> > system wide) they will never see a consistent state, so why be
> > concerned with a small corner of the system.
>
> You're right.

Mmmm, I've looked at the code today and it turned out that
we're NOT right ;)

The mmap_sem is used in procfs to prevent the list of VMAs
from changing. In the page fault code it seems to be used
to prevent other page faults to happen at the same time with
the current page fault (and to prevent VMAs from changing
while a page fault is underway).

Maybe we should change the mmap_sem into a R/W semaphore ?

Since page faults seem to be the "common cause" of blocking
procfs access *and* since both page faults and procfs only
need to prevent the VMA list from changing, a read lock would
help here.

Write locks would be used in the code where we actually want
to change the VMA list and page faults would use an extra lock
to protect against each other (possibly a per-pagetable lock
so multithreaded apps can pagefault in different memory
regions at the same time ???).

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/