Re: kswapd deadlock 2.4.3-pre6
Marcelo Tosatti writes:
 > Are you sure flush_page_to_ram()/flush_icache_page() without
 > page_table_lock held is ok for all archs?

This is actually a sticky area. For example, I remember that a long
time ago I noticed that it wasn't necessarily guaranteed that even all
the flush_tlb_{page,mm,range}() stuff was done under the page table
lock. Maybe things have changed since then, but...

Furthermore I did not specify that flush_page_to_ram/flush_icache_page
can assume this, see Documentation/cachetlb.txt

I have no problem adding this invariant to some of the interfaces, but
we have to audit things.

 > Looking at arch/mips/mm/r2300.c:

The r2300 port is UP-only btw...

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: kswapd deadlock 2.4.3-pre6
On Wed, 21 Mar 2001, Linus Torvalds wrote:

> diff -u --recursive --new-file pre6/linux/mm/memory.c linux/mm/memory.c
> --- pre6/linux/mm/memory.c	Tue Mar 20 23:13:03 2001
> +++ linux/mm/memory.c	Wed Mar 21 22:21:27 2001
> @@ -1031,18 +1031,20 @@
>  	struct vm_area_struct * vma, unsigned long address,
>  	pte_t * page_table, swp_entry_t entry, int write_access)
>  {
> -	struct page *page = lookup_swap_cache(entry);
> +	struct page *page;
>  	pte_t pte;
>
> +	spin_unlock(&mm->page_table_lock);
> +	page = lookup_swap_cache(entry);
>  	if (!page) {
> -		spin_unlock(&mm->page_table_lock);
>  		lock_kernel();
>  		swapin_readahead(entry);
>  		page = read_swap_cache(entry);
>  		unlock_kernel();
> -		spin_lock(&mm->page_table_lock);
> -		if (!page)
> +		if (!page) {
> +			spin_lock(&mm->page_table_lock);
>  			return -1;
> +		}
>
>  	flush_page_to_ram(page);
>  	flush_icache_page(vma, page);

Are you sure flush_page_to_ram()/flush_icache_page() without
page_table_lock held is ok for all archs?

Looking at arch/mips/mm/r2300.c:

static void r3k_flush_cache_page(struct vm_area_struct *vma,
				 unsigned long page)
{
	struct mm_struct *mm = vma->vm_mm;
	...
	if ((physpage = get_phys_page(page, vma->vm_mm)))
		r3k_flush_icache_range(physpage, PAGE_SIZE);
	...
}

static inline unsigned long get_phys_page (unsigned long addr,
					   struct mm_struct *mm)
{
	pgd_t *pgd;
	pmd_t *pmd;
	pte_t *pte;
	unsigned long physpage;

	pgd = pgd_offset(mm, addr);
	pmd = pmd_offset(pgd, addr);
	pte = pte_offset(pmd, addr);

	if ((physpage = pte_val(*pte)) & _PAGE_VALID)
		return KSEG1ADDR(physpage & PAGE_MASK);
	else
		return 0;
	...
}
Re: kswapd deadlock 2.4.3-pre6
On Wed, 21 Mar 2001, Linus Torvalds wrote:

> Ho humm. Does the appended patch fix it for you? Looks obvious enough, but

Confirmed.

	-Mike
Re: kswapd deadlock 2.4.3-pre6
On Wed, 21 Mar 2001, Linus Torvalds wrote:

> On Wed, 21 Mar 2001, Linus Torvalds wrote:
> >
> > The deadlock implies that somebody scheduled with page_table_lock held.
> > Which would be really bad.
>
> ..and it is probably do_swap_page().
>
> Despite the name, "lookup_swap_cache()" does more than a lookup - it will
> wait for the page that it looked up. And we call it with the
> page_table_lock held in do_swap_page().

Darn, you're too quick. (just figured it out and was about to report:)

Trace; c012785b <__lock_page+83/ac>
Trace; c012789b <lock_page+17/1c>
Trace; c01279a1 <__find_lock_page+81/f0>
Trace; c013008b <lookup_swap_cache+4b/164>
Trace; c0125362 <do_swap_page+12/1cc>
Trace; c012571f <handle_mm_fault+77/c4>
Trace; c01148b4 <do_page_fault+0/426>
Trace; c0114a17 <do_page_fault+163/426>
Trace; c01148b4 <do_page_fault+0/426>
Trace; c011581e <schedule+3e6/5ec>
Trace; c010908c <error_code+34/3c>

> Ho humm. Does the appended patch fix it for you? Looks obvious enough, but
> this bug is actually hidden on true SMP, and I'm too lazy to test with
> "num_cpus=1" or something..

I'm sure it will, but will be back in a few with confirmation.

>
> 		Linus
>
> -
> diff -u --recursive --new-file pre6/linux/mm/memory.c linux/mm/memory.c
> --- pre6/linux/mm/memory.c	Tue Mar 20 23:13:03 2001
> +++ linux/mm/memory.c	Wed Mar 21 22:21:27 2001
> @@ -1031,18 +1031,20 @@
>  	struct vm_area_struct * vma, unsigned long address,
>  	pte_t * page_table, swp_entry_t entry, int write_access)
>  {
> -	struct page *page = lookup_swap_cache(entry);
> +	struct page *page;
>  	pte_t pte;
>
> +	spin_unlock(&mm->page_table_lock);
> +	page = lookup_swap_cache(entry);
>  	if (!page) {
> -		spin_unlock(&mm->page_table_lock);
>  		lock_kernel();
>  		swapin_readahead(entry);
>  		page = read_swap_cache(entry);
>  		unlock_kernel();
> -		spin_lock(&mm->page_table_lock);
> -		if (!page)
> +		if (!page) {
> +			spin_lock(&mm->page_table_lock);
>  			return -1;
> +		}
>
>  	flush_page_to_ram(page);
>  	flush_icache_page(vma, page);
> @@ -1053,13 +1055,13 @@
>  	 * Must lock page before transferring our swap count to already
>  	 * obtained page count.
>  	 */
> -	spin_unlock(&mm->page_table_lock);
>  	lock_page(page);
> -	spin_lock(&mm->page_table_lock);
>
>  	/*
> -	 * Back out if somebody else faulted in this pte while we slept.
> +	 * Back out if somebody else faulted in this pte while we
> +	 * released the page table lock.
>  	 */
> +	spin_lock(&mm->page_table_lock);
>  	if (pte_present(*page_table)) {
>  		UnlockPage(page);
>  		page_cache_release(page);

(oh well, I had the right idea at least)
Re: kswapd deadlock 2.4.3-pre6
On Wed, 21 Mar 2001, Linus Torvalds wrote:
>
> The deadlock implies that somebody scheduled with page_table_lock held.
> Which would be really bad.

..and it is probably do_swap_page().

Despite the name, "lookup_swap_cache()" does more than a lookup - it will
wait for the page that it looked up. And we call it with the
page_table_lock held in do_swap_page().

Ho humm. Does the appended patch fix it for you? Looks obvious enough, but
this bug is actually hidden on true SMP, and I'm too lazy to test with
"num_cpus=1" or something..

		Linus

-
diff -u --recursive --new-file pre6/linux/mm/memory.c linux/mm/memory.c
--- pre6/linux/mm/memory.c	Tue Mar 20 23:13:03 2001
+++ linux/mm/memory.c	Wed Mar 21 22:21:27 2001
@@ -1031,18 +1031,20 @@
 	struct vm_area_struct * vma, unsigned long address,
 	pte_t * page_table, swp_entry_t entry, int write_access)
 {
-	struct page *page = lookup_swap_cache(entry);
+	struct page *page;
 	pte_t pte;

+	spin_unlock(&mm->page_table_lock);
+	page = lookup_swap_cache(entry);
 	if (!page) {
-		spin_unlock(&mm->page_table_lock);
 		lock_kernel();
 		swapin_readahead(entry);
 		page = read_swap_cache(entry);
 		unlock_kernel();
-		spin_lock(&mm->page_table_lock);
-		if (!page)
+		if (!page) {
+			spin_lock(&mm->page_table_lock);
 			return -1;
+		}

 	flush_page_to_ram(page);
 	flush_icache_page(vma, page);
@@ -1053,13 +1055,13 @@
 	 * Must lock page before transferring our swap count to already
 	 * obtained page count.
 	 */
-	spin_unlock(&mm->page_table_lock);
 	lock_page(page);
-	spin_lock(&mm->page_table_lock);

 	/*
-	 * Back out if somebody else faulted in this pte while we slept.
+	 * Back out if somebody else faulted in this pte while we
+	 * released the page table lock.
 	 */
+	spin_lock(&mm->page_table_lock);
 	if (pte_present(*page_table)) {
 		UnlockPage(page);
 		page_cache_release(page);
Re: kswapd deadlock 2.4.3-pre6
On Wed, 21 Mar 2001, Mike Galbraith wrote:
>
> I have a repeatable deadlock when SMP is enabled on my UP box.
>
> >>EIP; c021e29a <stext_lock+1556/677b>   <=====

When you see something like this, please do

	gdb vmlinux
	(gdb) x/10i 0xc021e29a

and it will basically show you where the code jumps back to. It's almost
certainly the beginning of swap_out_mm() where we get the
page_table_lock, but it would still be good to verify.

The deadlock implies that somebody scheduled with page_table_lock held.
Which would be really bad.

You should be able to do something like

	if (current->mm && spin_is_locked(&current->mm->page_table_lock))
		BUG();

in the scheduler to see if it triggers (this only works on UP hardware
with a SMP kernel - on a real SMP machine it's entirely legal to have the
lock during a schedule, as the lock may be held by any of the _other_
CPU's, of course, and the above assert would be the wrong thing to do in
general).

Of course, it might not be somebody scheduling with a spinlock, it might
just be a recursive lock bug, but that sounds really unlikely.

> ac20+2.4.2-ac20-rwmmap_sem3 does not deadlock doing the same
> churn/burn via make -j30 bzImage.

It won't do the page table locking for page table allocations, so it will
have other bugs, though.

		Linus
Re: kswapd deadlock 2.4.3-pre6
On Wed, 21 Mar 2001, Rik van Riel wrote:
> On Wed, 21 Mar 2001, Mike Galbraith wrote:
>
> > I have a repeatable deadlock when SMP is enabled on my UP box.
>
> Linus' version of do_anonymous_page() is racy too...

Umm, forget that, I was reading too much code at once and
missed a few lines ... ;)

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://www.conectiva.com/
http://distro.conectiva.com.br/
Re: kswapd deadlock 2.4.3-pre6
On Wed, 21 Mar 2001, Mike Galbraith wrote:

> I have a repeatable deadlock when SMP is enabled on my UP box.

Linus' version of do_anonymous_page() is racy too...

I know the one in my patch is uglier, but at least it doesn't
leak memory or lose data ;)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://www.conectiva.com/
http://distro.conectiva.com.br/