Re: Re: unused swap offset / bad page map.
On 27/08/13 17:30, Dave Jones wrote:
> Seems to do the trick.

We are running many virtualization hosts with Linux 3.11.3, qemu 1.6.1 + kvm and ksm. The hosts have 128GB RAM, 10GB swap and 24x AMD Opteron 6238 cores.

Several times a few weeks ago, we saw the OOM killer come to life and quickly kill a large number of VMs on a host, even when there appeared to be free memory on that host at the start. However, the OOM killings are preceded by some other traces, similar to the ones that Dave reported a couple of months ago in this very thread (https://lkml.org/lkml/2013/8/7/27). The relevant kernel log lines read:

20:30:44 kernel: swap_free: Unused swap file entry 2000200
20:30:44 kernel: BUG: Bad page map in process qemu-system-x86 pte:00040002 pmd:1ecc0d4067
20:30:44 kernel: addr:7f5b8b404000 vm_flags:80100073 anon_vma:880ff0e9df00 mapping: (null) index:7f5b8b404
20:30:44 kernel: CPU: 9 PID: 22652 Comm: qemu-system-x86 Not tainted 3.11.2-elastic #2
20:30:44 kernel: Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 2.0b 03/01/2012
20:30:44 kernel: 7f5b8b404000 8807b76b1ab8 817ee7a6 000400f6
20:30:44 kernel: 880ea36a0e60 8807b76b1b08 81135ed5 000e
20:30:44 kernel: 0007f5b8b404 8807b76b1b08 7f5b8b404000 880ea36a0e60
20:30:44 kernel: Call Trace:
20:30:44 kernel: [] dump_stack+0x55/0x86
20:30:44 kernel: [] print_bad_pte+0x1f5/0x213
20:30:44 kernel: [] unmap_single_vma+0x509/0x6d6
20:30:44 kernel: [] unmap_vmas+0x4d/0x80
20:30:44 kernel: [] exit_mmap+0x93/0x11e
20:30:44 kernel: [] mmput+0x51/0xdb
20:30:44 kernel: [] do_exit+0x33c/0x8a2
20:30:44 kernel: [] ? get_futex_key+0x87/0x20c
20:30:44 kernel: [] ? __dequeue_signal+0x16/0x114
20:30:44 kernel: [] do_group_exit+0x6a/0x9d
20:30:44 kernel: [] get_signal_to_deliver+0x488/0x4a7
20:30:44 kernel: [] do_signal+0x47/0x48f
20:30:44 kernel: [] ? rcu_eqs_enter+0x7d/0x82
20:30:44 kernel: [] ? account_user_time+0x6a/0x95
20:30:44 kernel: [] ? vtime_account_user+0x5d/0x65
20:30:44 kernel: [] do_notify_resume+0x28/0x6a
20:30:44 kernel: [] int_signal+0x12/0x17
20:30:44 kernel: Disabling lock debugging due to kernel taint
20:30:44 kernel: 33550335 pages RAM
20:30:44 kernel: 561601 pages reserved
20:30:44 kernel: 24628376 pages shared
20:30:44 kernel: 7190750 pages non-shared

Since we are using a 3.11.3 kernel, it already contains Cyrill's fix. However, our kernel log is very similar to Dave's report, so we are wondering whether our mass OOM kill is another problem in the same area. Any thoughts on this?

I can provide more information from the logs, if necessary, and my colleague Richard originally reported the mass OOM kill in detail at http://article.gmane.org/gmane.linux.kernel.mm/108703.

Cheers,
Alin.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: unused swap offset / bad page map.
On Tue, Aug 27, 2013 at 12:24:27PM -0400, Dave Jones wrote:
> > I managed to trigger the issue as well. The patch below fixes it.
> > Dave, could you please give it a shot once time permit?
>
> Seems to do the trick.
>
> Tested-by: Dave Jones

Thanks a lot, Dave!
Re: unused swap offset / bad page map.
On Tue, Aug 27, 2013 at 12:37:18PM +0400, Cyrill Gorcunov wrote:
> On Mon, Aug 26, 2013 at 06:28:33PM -0400, Dave Jones wrote:
> > > I've not tried matching up bits with Dave's reports, and just going
> > > into a meeting now, but this patch looks worth a try: probably Cyrill
> > > can improve it meanwhile to what he actually wants there (I'm
> > > surprised anything special is needed for just moving a pte).
> > >
> > > Hugh
> > >
> > > --- 3.11-rc7/mm/mremap.c 2013-07-14 17:10:16.640003652 -0700
> > > +++ linux/mm/mremap.c 2013-08-26 14:46:14.460027627 -0700
> > > @@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
> > >                  continue;
> > >          pte = ptep_get_and_clear(mm, old_addr, old_pte);
> > >          pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
> > > -        set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
> > > +        set_pte_at(mm, new_addr, new_pte, pte);
> > >      }
> >
> > I'll give this a shot once I'm done with the bisect.
>
> I managed to trigger the issue as well. The patch below fixes it.
> Dave, could you please give it a shot once time permit?

Seems to do the trick.

Tested-by: Dave Jones

    Dave
Re: unused swap offset / bad page map.
On Mon, Aug 26, 2013 at 06:28:33PM -0400, Dave Jones wrote:
> > I've not tried matching up bits with Dave's reports, and just going
> > into a meeting now, but this patch looks worth a try: probably Cyrill
> > can improve it meanwhile to what he actually wants there (I'm
> > surprised anything special is needed for just moving a pte).
> >
> > Hugh
> >
> > --- 3.11-rc7/mm/mremap.c 2013-07-14 17:10:16.640003652 -0700
> > +++ linux/mm/mremap.c 2013-08-26 14:46:14.460027627 -0700
> > @@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
> >                  continue;
> >          pte = ptep_get_and_clear(mm, old_addr, old_pte);
> >          pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
> > -        set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
> > +        set_pte_at(mm, new_addr, new_pte, pte);
> >      }
>
> I'll give this a shot once I'm done with the bisect.

I managed to trigger the issue as well. The patch below fixes it.
Dave, could you please give it a shot once time permits?

Pavel, I kept the 'make it dirty on move' logic, but I somewhat doubt it: won't plain pte copying (as in Hugh's patch) work for us?
---
From: Cyrill Gorcunov
Subject: [PATCH] mm: move_ptes -- Set soft dirty bit depending on pte type

Dave reported corrupted swap entries

 | [ 4588.541886] swap_free: Unused swap offset entry 2d15
 | [ 4588.541952] BUG: Bad page map in process trinity-kid12 pte:005a2a80 pmd:22c01f067

and Hugh pointed out that in move_ptes the _PAGE_SOFT_DIRTY bit is set regardless of the type of entry the pte holds. The trick here is that when we carry soft dirty status in swap entries we are to use _PAGE_SWP_SOFT_DIRTY instead, because this is the only place in the pte which can be used for our own needs without intersecting with bits owned by the swap entry type/offset.

Reported-by: Dave Jones
Signed-off-by: Cyrill Gorcunov
Cc: Pavel Emelyanov
Cc: Linus Torvalds
Cc: Hugh Dickins
Cc: Hillf Danton
Cc: Andrew Morton
---
 mm/mremap.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

Index: linux-2.6.git/mm/mremap.c
===
--- linux-2.6.git.orig/mm/mremap.c
+++ linux-2.6.git/mm/mremap.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -69,6 +70,23 @@ static pmd_t *alloc_new_pmd(struct mm_st
         return pmd;
 }
 
+static pte_t move_soft_dirty_pte(pte_t pte)
+{
+        /*
+         * Set soft dirty bit so we can notice
+         * in userspace the ptes were moved.
+         */
+#ifdef CONFIG_MEM_SOFT_DIRTY
+        if (pte_present(pte))
+                pte = pte_mksoft_dirty(pte);
+        else if (is_swap_pte(pte))
+                pte = pte_swp_mksoft_dirty(pte);
+        else if (pte_file(pte))
+                pte = pte_file_mksoft_dirty(pte);
+#endif
+        return pte;
+}
+
 static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
                 unsigned long old_addr, unsigned long old_end,
                 struct vm_area_struct *new_vma, pmd_t *new_pmd,
@@ -126,7 +144,8 @@ static void move_ptes(struct vm_area_str
                         continue;
                 pte = ptep_get_and_clear(mm, old_addr, old_pte);
                 pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
-                set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
+                pte = move_soft_dirty_pte(pte);
+                set_pte_at(mm, new_addr, new_pte, pte);
         }
 
         arch_leave_lazy_mmu_mode();
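The type-dependent marking in the patch above can be tried out in userspace with a toy model. This is only a sketch under stated assumptions, not kernel code: the `toy_*` names are invented for illustration, bits 11 and 7 are the positions Linus quotes elsewhere in this thread, and the present bit (bit 0) and the swap offset shift (9) are assumptions about x86-64 kernels of this period. The pte_file() case from the patch is left out to keep the model small.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy 64-bit pte model for illustration only; this is NOT the kernel's pte_t.
 * Bits 11 and 7 are the soft-dirty positions quoted in the thread.
 * The present bit and the offset shift are assumptions.
 */
#define TOY_PRESENT          (1ULL << 0)   /* assumption: bit 0 = present */
#define TOY_SOFT_DIRTY       (1ULL << 11)  /* for present ptes */
#define TOY_SWP_SOFT_DIRTY   (1ULL << 7)   /* for swap-format ptes */
#define TOY_SWP_OFFSET_SHIFT 9             /* assumption: offset starts at bit 9 */

static bool toy_pte_present(uint64_t pte)
{
    return (pte & TOY_PRESENT) != 0;
}

/* Extract the swap offset from a swap-format toy pte. */
static uint64_t toy_swp_offset(uint64_t pte)
{
    return pte >> TOY_SWP_OFFSET_SHIFT;
}

/* Behaviour before the patch: bit 11 is set whatever the pte holds. */
static uint64_t toy_mark_unconditional(uint64_t pte)
{
    return pte | TOY_SOFT_DIRTY;
}

/* Behaviour after the patch, reduced to the present and swap cases. */
static uint64_t toy_mark_by_type(uint64_t pte)
{
    if (toy_pte_present(pte))
        return pte | TOY_SOFT_DIRTY;
    if (pte != 0)                      /* non-empty, non-present: treat as swap */
        return pte | TOY_SWP_SOFT_DIRTY;
    return pte;                        /* empty entry: nothing to mark */
}
```

In this model a swap-format entry with offset 0x100 reads back as offset 0x104 after `toy_mark_unconditional()`, because bit 11 falls inside the offset field, while `toy_mark_by_type()` leaves the offset at 0x100 and sets only bit 7.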
Re: unused swap offset / bad page map.
On Mon, Aug 26, 2013 at 04:15:00PM -0700, Linus Torvalds wrote:
> On Mon, Aug 26, 2013 at 3:08 PM, Hugh Dickins wrote:
> >
> > I just did a quick diff of 3.11-rc7/mm against 3.10, and here's
> > a line in mremap which worries me. That set_pte_at() is operating
> > on anything that isn't pte_none(), so the pte_mksoft_dirty() looks
> > prone to corrupt a swap entry.
>
> Uhhuh. I think you hit the nail on the head here.
>
> I checked all the pte_swp_*soft_dirty() users (they should be used on
> swp entries), because that came up in another thread. But you're
> right, the non-swp ones only work on present pte entries (or on
> file-offset entries, I guess), and at least that mremap() case seems
> bogus.

Oh my :( Indeed it sets _PAGE_SOFT_DIRTY unconditionally, sigh. This nit comes from the former soft-dirty commit. Let me check all other places where we set the soft dirty bit (Pavel CC'ed).

> I'm not seeing the point of marking the thing soft-dirty at all,
> although I guess it's "dirty" in the sense that it changed the
> contents at that virtual address. But for that code to work, it would
> have to have the same bit for swap entries as for present pages (and
> for file mapping entries), and that's not true. They are two different
> bits (_PAGE_SOFT_DIRTY is bit #11 vs _PAGE_SWP_SOFT_DIRTY is bit #7).
>
> Ugh. Cyrill, this is a mess.

Linus, I simply had no place in the pte entry to carry soft-dirty status when the pte is encoded in swap format, so it was an unpleasant but necessary decision. That's why the bit accesses are wrapped in their own macros with a 'swp' prefix, so a reader can easily grep for them.
Re: unused swap offset / bad page map.
On Mon, Aug 26, 2013 at 3:08 PM, Hugh Dickins wrote:
>
> I just did a quick diff of 3.11-rc7/mm against 3.10, and here's
> a line in mremap which worries me. That set_pte_at() is operating
> on anything that isn't pte_none(), so the pte_mksoft_dirty() looks
> prone to corrupt a swap entry.

Uhhuh. I think you hit the nail on the head here.

I checked all the pte_swp_*soft_dirty() users (they should be used on swp entries), because that came up in another thread. But you're right, the non-swp ones only work on present pte entries (or on file-offset entries, I guess), and at least that mremap() case seems bogus.

I'm not seeing the point of marking the thing soft-dirty at all, although I guess it's "dirty" in the sense that it changed the contents at that virtual address. But for that code to work, it would have to have the same bit for swap entries as for present pages (and for file mapping entries), and that's not true. They are two different bits (_PAGE_SOFT_DIRTY is bit #11 vs _PAGE_SWP_SOFT_DIRTY is bit #7).

Ugh. Cyrill, this is a mess.

    Linus
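The two bit positions named above can be checked against the values in Dave's report ("pte:005a2a80", "Unused swap offset entry 2d15"). The helpers below are a userspace sketch, not kernel code: the function names are made up for this example, and the offset shift of 9 is an assumption about how x86-64 kernels of this period laid out a swap-format pte.

```c
#include <stdint.h>

/* Bit positions quoted in the thread. */
#define SOFT_DIRTY_BIT           11  /* _PAGE_SOFT_DIRTY, for present ptes */
#define SWP_SOFT_DIRTY_BIT        7  /* _PAGE_SWP_SOFT_DIRTY, for swap ptes */

/* ASSUMPTION: the swap offset starts at bit 9 of a swap-format pte. */
#define ASSUMED_SWP_OFFSET_SHIFT  9

/* Return 1 if the given bit is set in the raw pte value, else 0. */
static int bit_is_set(uint64_t pte, unsigned bit)
{
    return (int)((pte >> bit) & 1);
}

/* Return the raw pte value with one bit cleared. */
static uint64_t with_bit_cleared(uint64_t pte, unsigned bit)
{
    return pte & ~(1ULL << bit);
}

/* Decode the swap offset under the assumed layout. */
static uint64_t assumed_swp_offset(uint64_t pte)
{
    return pte >> ASSUMED_SWP_OFFSET_SHIFT;
}
```

Under these assumptions, 0x005a2a80 has both bit 11 and bit 7 set and decodes to offset 0x2d15, the value printed by swap_free. With bit 11 cleared it would decode to 0x2d11. Whether 0x2d11 was the entry actually in use is not something the thread confirms.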
Re: unused swap offset / bad page map.
On Mon, Aug 26, 2013 at 03:08:45PM -0700, Hugh Dickins wrote:
> > That said, google does find "swap_free: Unused swap offset entry"
> > reports from over the years. Most of them seem to be single-bit
> > errors, though (ie when the entry is 0100 or similar I'm more
> > inclined to blame a bit error
>
> Yes, historically they have usually represented either single-bit
> errors, or corruption of page tables by other kernel data. The
> swap subsystem discovers it, but it's rarely an error of swap.

Just to rule out bad hardware, I've seen this on two systems (admittedly the exact same spec, but still..)

> So I don't care for Dave's suggestion much earlier in this thread,
> that swapoff should fail with -EINVAL if there has been a bad page
> taint: that doesn't necessarily interfere with swapoff at all.
>
> And besides, swapoff is killable: yes, if counts go wrong, it
> can cycle around endlessly, but it checks for signal_pending()
> each time around the loop.

It might be killable, but if I've done /sbin/reboot, and the kernel dies in sys_swapoff because of the corruption, I won't get a chance to kill it, because at that point the shutdown process has killed my shell, sshd, and just about everything else. It means a grumpy walk to the other side of the house to prod a reset button.

So yeah, it might not be a mergeable thing, but at least while bisecting it's pretty much a must-have.

> I just did a quick diff of 3.11-rc7/mm against 3.10, and here's
> a line in mremap which worries me. That set_pte_at() is operating
> on anything that isn't pte_none(), so the pte_mksoft_dirty() looks
> prone to corrupt a swap entry.
>
> I've not tried matching up bits with Dave's reports, and just going
> into a meeting now, but this patch looks worth a try: probably Cyrill
> can improve it meanwhile to what he actually wants there (I'm
> surprised anything special is needed for just moving a pte).
>
> Hugh
>
> --- 3.11-rc7/mm/mremap.c 2013-07-14 17:10:16.640003652 -0700
> +++ linux/mm/mremap.c 2013-08-26 14:46:14.460027627 -0700
> @@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
>                  continue;
>          pte = ptep_get_and_clear(mm, old_addr, old_pte);
>          pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
> -        set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
> +        set_pte_at(mm, new_addr, new_pte, pte);
>      }

I'll give this a shot once I'm done with the bisect.

    Dave
Re: unused swap offset / bad page map.
On Mon, 26 Aug 2013, Linus Torvalds wrote:
> On Mon, Aug 26, 2013 at 1:15 PM, Linus Torvalds wrote:
> >
> > So I'm almost likely to think that we are more likely to have
> > something wrong in the messy magical special cases.
>
> Of course, the good news would be if it actually ends up being the
> soft-dirty stuff, and bisection hits something recent.

I suspect so.

> So maybe I'm overly pessimistic. That messy swap_map[] code really
> _is_ messy, but at the same time it should also be pretty well-tested.
> I don't think it's been touched in years.

Blame me for the byte-instead-of-short continuation stuff. But it's never yet shown any problem (okay, perhaps that's because it's so rare to need any continuation anyway).

> That said, google does find "swap_free: Unused swap offset entry"
> reports from over the years. Most of them seem to be single-bit
> errors, though (ie when the entry is 0100 or similar I'm more
> inclined to blame a bit error

Yes, historically they have usually represented either single-bit errors, or corruption of page tables by other kernel data. The swap subsystem discovers it, but it's rarely an error of swap.

So I don't care for Dave's suggestion much earlier in this thread, that swapoff should fail with -EINVAL if there has been a bad page taint: that doesn't necessarily interfere with swapoff at all.

And besides, swapoff is killable: yes, if counts go wrong, it can cycle around endlessly, but it checks for signal_pending() each time around the loop.

> - in contrast your values look like "real" swap entries).

Indeed they do.

I just did a quick diff of 3.11-rc7/mm against 3.10, and here's a line in mremap which worries me. That set_pte_at() is operating on anything that isn't pte_none(), so the pte_mksoft_dirty() looks prone to corrupt a swap entry.

I've not tried matching up bits with Dave's reports, and just going into a meeting now, but this patch looks worth a try: probably Cyrill can improve it meanwhile to what he actually wants there (I'm surprised anything special is needed for just moving a pte).

Hugh

--- 3.11-rc7/mm/mremap.c 2013-07-14 17:10:16.640003652 -0700
+++ linux/mm/mremap.c 2013-08-26 14:46:14.460027627 -0700
@@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
                         continue;
                 pte = ptep_get_and_clear(mm, old_addr, old_pte);
                 pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
-                set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
+                set_pte_at(mm, new_addr, new_pte, pte);
         }
 
         arch_leave_lazy_mmu_mode();
Re: unused swap offset / bad page map.
On Tue, Aug 27, 2013 at 01:49:40AM +0400, Cyrill Gorcunov wrote:
> On Mon, Aug 26, 2013 at 05:42:44PM -0400, Dave Jones wrote:
> >
> > Yeah, for reproducing this bug, I'd stick to running it as a user, without
> > --dangerous.
> > you might still hit a few fairly-easy to trigger warn-on/printks. I run with
> > this applied: http://paste.fedoraproject.org/34960/55323613/raw/ to make things
> > a little less noisy.
>
> Ah, thanks, pulling it in. Btw, have you seen this problem earlier than -rc4
> at all?

I just hit it on 3.11rc1. Couldn't reproduce on 3.10.

    Dave
Re: unused swap offset / bad page map.
On Mon, Aug 26, 2013 at 05:42:44PM -0400, Dave Jones wrote:
>
> Yeah, for reproducing this bug, I'd stick to running it as a user, without
> --dangerous.
> you might still hit a few fairly-easy to trigger warn-on/printks. I run with
> this applied: http://paste.fedoraproject.org/34960/55323613/raw/ to make things
> a little less noisy.

Ah, thanks, pulling it in. Btw, have you seen this problem earlier than -rc4 at all?
Re: unused swap offset / bad page map.
On Tue, Aug 27, 2013 at 01:37:54AM +0400, Cyrill Gorcunov wrote:
> On Tue, Aug 27, 2013 at 12:42:03AM +0400, Cyrill Gorcunov wrote:
> > On Mon, Aug 26, 2013 at 04:37:02PM -0400, Dave Jones wrote:
> > >
> > > Try adding the -C64 to the invocation in scripts/test-multi.sh,
> > > and perhaps up'ing the NR_PROCESSES variable there too.
> >
> > Thanks! I'll ping you if I manage to crash my instance.
>
> So trinity tainted the kernel, but definitely not in the place I'm interested in.
>
> [ 320.904506] raw_sendmsg: trinity-child14 forgot to set AF_INET. Fix it!
> [ 329.570812] [ cut here ]
> [ 329.571650] WARNING: CPU: 0 PID: 1982 at kernel/lockdep.c:3552 check_flags+0x18a/0x1c1()
> [ 329.571650] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
> [ 329.571650] Modules linked in:
> [ 329.571650] CPU: 0 PID: 1982 Comm: trinity-child4 Not tainted 3.11.0-rc6-dirty #386
> [ 329.571650] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 329.571650] 0009 88001ee03b10 8157ac8a 0006
> [ 329.571650] 88001ee03b60 88001ee03b50 81045bb2 81583840
> [ 329.571650] 81092620 880002b48000 0046 81a2f750
> [ 329.571650] Call Trace:
> [ 329.571650] [] dump_stack+0x4f/0x84
> [ 329.571650] [] warn_slowpath_common+0x81/0x9b
> [ 329.571650] [] ? ftrace_call+0x5/0x2f
> [ 329.571650] [] ? check_flags+0x18a/0x1c1
> [ 329.571650] [] warn_slowpath_fmt+0x46/0x48
> [ 329.571650] [] ? warn_slowpath_fmt+0x5/0x48
> [ 329.571650] [] check_flags+0x18a/0x1c1
> [ 329.571650] [] lock_is_held+0x30/0x5f
> [ 329.571650] [] rcu_read_lock_held+0x36/0x38
> [ 329.571650] [] perf_tp_event+0x92/0x220
> [ 329.571650] [] ? perf_tp_event+0x20e/0x220
> [ 329.571650] [] ? __local_bh_enable+0x9a/0x9e
> [ 329.571650] [] ? get_parent_ip+0x3f/0x3f
> [ 329.571650] [] ? __local_bh_enable+0x9a/0x9e
> [ 329.571650] [] perf_ftrace_function_call+0xce/0xdc

when it rains, it pours..

> (since my config pretty similar to yours I tried to run trinity without
> kernel recompilation. At first i loaded swap space with crap data
>
> [root@ovz trinity]# free
>              total       used       free     shared    buffers     cached
> Mem:        493228     480188      13040          0       2912      12112
> -/+ buffers/cache:     465164      28064
> Swap:      2063356    1741304     322052
>
> then run it as
>
> [root@ovz trinity]# ./trinity -C64 --dangerous)

Yeah, for reproducing this bug, I'd stick to running it as a user, without --dangerous. You might still hit a few fairly-easy to trigger warn-on/printks. I run with this applied: http://paste.fedoraproject.org/34960/55323613/raw/ to make things a little less noisy.

    Dave
Re: unused swap offset / bad page map.
On Tue, Aug 27, 2013 at 12:42:03AM +0400, Cyrill Gorcunov wrote:
> On Mon, Aug 26, 2013 at 04:37:02PM -0400, Dave Jones wrote:
> >
> > Try adding the -C64 to the invocation in scripts/test-multi.sh,
> > and perhaps up'ing the NR_PROCESSES variable there too.
>
> Thanks! I'll ping you if I manage to crash my instance.

So trinity tainted the kernel, but definitely not in the place I'm interested in.

[ 320.904506] raw_sendmsg: trinity-child14 forgot to set AF_INET. Fix it!
[ 329.570812] [ cut here ]
[ 329.571650] WARNING: CPU: 0 PID: 1982 at kernel/lockdep.c:3552 check_flags+0x18a/0x1c1()
[ 329.571650] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
[ 329.571650] Modules linked in:
[ 329.571650] CPU: 0 PID: 1982 Comm: trinity-child4 Not tainted 3.11.0-rc6-dirty #386
[ 329.571650] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 329.571650] 0009 88001ee03b10 8157ac8a 0006
[ 329.571650] 88001ee03b60 88001ee03b50 81045bb2 81583840
[ 329.571650] 81092620 880002b48000 0046 81a2f750
[ 329.571650] Call Trace:
[ 329.571650] [] dump_stack+0x4f/0x84
[ 329.571650] [] warn_slowpath_common+0x81/0x9b
[ 329.571650] [] ? ftrace_call+0x5/0x2f
[ 329.571650] [] ? check_flags+0x18a/0x1c1
[ 329.571650] [] warn_slowpath_fmt+0x46/0x48
[ 329.571650] [] ? warn_slowpath_fmt+0x5/0x48
[ 329.571650] [] check_flags+0x18a/0x1c1
[ 329.571650] [] lock_is_held+0x30/0x5f
[ 329.571650] [] rcu_read_lock_held+0x36/0x38
[ 329.571650] [] perf_tp_event+0x92/0x220
[ 329.571650] [] ? perf_tp_event+0x20e/0x220
[ 329.571650] [] ? __local_bh_enable+0x9a/0x9e
[ 329.571650] [] ? get_parent_ip+0x3f/0x3f
[ 329.571650] [] ? __local_bh_enable+0x9a/0x9e
[ 329.571650] [] perf_ftrace_function_call+0xce/0xdc
...

(Since my config is pretty similar to yours I tried to run trinity without kernel recompilation. At first I loaded swap space with crap data

[root@ovz trinity]# free
             total       used       free     shared    buffers     cached
Mem:        493228     480188      13040          0       2912      12112
-/+ buffers/cache:     465164      28064
Swap:      2063356    1741304     322052

then ran it as

[root@ovz trinity]# ./trinity -C64 --dangerous)

I'll continue tomorrow with your config and test-multi.sh.
Re: unused swap offset / bad page map.
On Mon, Aug 26, 2013 at 1:15 PM, Linus Torvalds wrote:
>
> So I'm almost likely to think that we are more likely to have
> something wrong in the messy magical special cases.

Of course, the good news would be if it actually ends up being the soft-dirty stuff, and bisection hits something recent.

So maybe I'm overly pessimistic. That messy swap_map[] code really _is_ messy, but at the same time it should also be pretty well-tested. I don't think it's been touched in years.

That said, google does find "swap_free: Unused swap offset entry" reports from over the years. Most of them seem to be single-bit errors, though (ie when the entry is 0100 or similar I'm more inclined to blame a bit error - in contrast your values look like "real" swap entries).

    Linus
Re: unused swap offset / bad page map.
On Mon, Aug 26, 2013 at 04:37:02PM -0400, Dave Jones wrote:
>
> Try adding the -C64 to the invocation in scripts/test-multi.sh,
> and perhaps up'ing the NR_PROCESSES variable there too.

Thanks! I'll ping you if I manage to crash my instance.
Re: unused swap offset / bad page map.
On Tue, Aug 27, 2013 at 12:18:46AM +0400, Cyrill Gorcunov wrote:
> On Mon, Aug 26, 2013 at 03:08:22PM -0400, Dave Jones wrote:
> > On Mon, Aug 26, 2013 at 11:45:53AM +0800, Hillf Danton wrote:
> > > On Fri, Aug 23, 2013 at 11:53 AM, Dave Jones wrote:
> > > >
> > > > It actually seems worse, seems I can trigger it even easier now, as if
> > > > there's a leak.
> > > >
> > > Can you please try the new fix for TLB flush?
> > >
> > > commit 2b047252d087be7f2ba
> > > Fix TLB gather virtual address range invalidation corner cases
> >
> > No luck.
>
> Hi Dave, could you please put your .config somewhere so i would try
> to repeat this problem? (i've tried trinity with -C64 but it didn't
> trigger the issue).

http://paste.fedoraproject.org/34944/77549285

The machine I'm using has 8gb ram, 8gb swap, and 4 cores.

Try adding the -C64 to the invocation in scripts/test-multi.sh, and perhaps up'ing the NR_PROCESSES variable there too.

    Dave
Re: unused swap offset / bad page map.
On Mon, Aug 26, 2013 at 03:08:22PM -0400, Dave Jones wrote:
> On Mon, Aug 26, 2013 at 11:45:53AM +0800, Hillf Danton wrote:
> > On Fri, Aug 23, 2013 at 11:53 AM, Dave Jones wrote:
> > >
> > > It actually seems worse, seems I can trigger it even easier now, as if
> > > there's a leak.
> > >
> > Can you please try the new fix for TLB flush?
> >
> > commit 2b047252d087be7f2ba
> > Fix TLB gather virtual address range invalidation corner cases
>
> No luck.

Hi Dave, could you please put your .config somewhere so I can try to reproduce this problem? (I've tried trinity with -C64 but it didn't trigger the issue.)
Re: unused swap offset / bad page map.
On Mon, Aug 26, 2013 at 12:08 PM, Dave Jones wrote:
>
> [ 4588.541886] swap_free: Unused swap offset entry 2d15
> [ 4588.541952] BUG: Bad page map in process trinity-kid12 pte:005a2a80 pmd:22c01f067
>
> I can reproduce this pretty quickly by driving the system into swapping using
> a few instances of 'trinity -C64' (this creates 64 threads)
>
> I'm not sure how far back this bug goes, so I'll try some older kernels
> and see if I can bisect it, because we don't seem to be getting closer
> to figuring out what's actually happening..

Bisecting would indeed be good. But I get the feeling that you'll need to go back a *long* time, because the swap_map[] code hasn't changed in ages.

I'm adding Hugh Dickins to the cc just in case he hasn't seen this on linux-mm, because the swap_map[] code is complex as hell, and Hugh did touch some of it last.

The whole swap_map[] thing is complicated by:

 - it's a single byte per swap entry

 - it's not even a *structured* byte, but a single counter that has several "fields" by hand

 - it has a count in the low 6 bits, with a magic "bad" value (which is also a magic "continuation" value if one of the high bits are set)

 - it has two magic bits: HAS_CACHE and CONTINUED

 - it has a _third_ magic value (SWAP_MAP_SHMEM) which is "CONTINUED+BAD"

 - we increment this nasty pseudo-counter wildly hackily, and have magic special case checks for the odd cases

and if we get any of the special cases wrong, we'll increment/decrement it wrong, and we're screwed.

The *locking* looks pretty simple, though. It's a simple spinlock. We do some optimistic tests outside the spinlock, but the actual allocation and modification seem to all be inside the lock and re-check any optimistic values afaik.

So I'm almost likely to think that we are more likely to have something wrong in the messy magical special cases.

I'm wondering if we should get rid of the continuation crap, for example, and expand the "one byte per swap page" to two bytes instead.

Hugh, I think you know this code best, because you added the last special case (that SWAP_MAP_SHMEM value). Comments?

    Linus
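The byte layout in the list above can be written down as a small decoder. This is a userspace sketch for illustration, not the kernel's code: the `TOY_*` names and `toy_decode_swap_map()` are invented here, and the numeric values are assumptions chosen to match the description (a 6-bit count whose all-ones value is "bad", two flag bits above it, and a SHMEM value equal to CONTINUED plus BAD). Which of the two high bits is HAS_CACHE and which is CONTINUED is likewise an assumption.

```c
#include <stdbool.h>

/* ASSUMED values, picked to fit the description in the message above. */
#define TOY_SWAP_MAP_BAD     0x3fu  /* all-ones 6-bit count = "bad" */
#define TOY_SWAP_HAS_CACHE   0x40u  /* flag bit */
#define TOY_COUNT_CONTINUED  0x80u  /* flag bit */
#define TOY_SWAP_MAP_SHMEM   0xbfu  /* CONTINUED + BAD */

struct toy_swap_map_fields {
    unsigned count;     /* low 6 bits */
    bool has_cache;
    bool continued;
    bool bad;           /* the magic "bad" value, ignoring HAS_CACHE */
    bool shmem;         /* the magic "CONTINUED+BAD" value, ignoring HAS_CACHE */
};

/* Split one swap_map-style byte into the hand-rolled "fields". */
static struct toy_swap_map_fields toy_decode_swap_map(unsigned char ent)
{
    struct toy_swap_map_fields f;
    unsigned no_cache = ent & ~TOY_SWAP_HAS_CACHE;  /* byte without the cache flag */

    f.count     = ent & 0x3fu;
    f.has_cache = (ent & TOY_SWAP_HAS_CACHE) != 0;
    f.continued = (ent & TOY_COUNT_CONTINUED) != 0;
    f.bad       = (no_cache == TOY_SWAP_MAP_BAD);
    f.shmem     = (no_cache == TOY_SWAP_MAP_SHMEM);
    return f;
}
```

With these assumed values, a byte of 0x41 decodes as count 1 with the cache flag set, and 0xbf decodes as the SHMEM marker rather than as a continued count of 63, which is the kind of overloading the message complains about.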
Re: unused swap offset / bad page map.
On Mon, Aug 26, 2013 at 11:45:53AM +0800, Hillf Danton wrote:
> On Fri, Aug 23, 2013 at 11:53 AM, Dave Jones wrote:
> >
> > It actually seems worse, seems I can trigger it even easier now, as if
> > there's a leak.
> >
> Can you please try the new fix for TLB flush?
>
> commit 2b047252d087be7f2ba
> Fix TLB gather virtual address range invalidation corner cases

No luck.

[ 4588.541886] swap_free: Unused swap offset entry 2d15
[ 4588.541952] BUG: Bad page map in process trinity-kid12 pte:005a2a80 pmd:22c01f067
[ 4588.541979] addr:7f0e95fa8000 vm_flags:00100073 anon_vma:880217665550 mapping: (null) index:1a42
[ 4588.542011] Modules linked in: snd_seq_dummy fuse hidp bnep scsi_transport_iscsi rfcomm ipt_ULOG can_bcm can_raw nfnetlink nfc caif_socket caif af_802154 phonet af_rxrpc bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 xfs libcrc32c snd_hda_codec_realtek snd_hda_intel e1000e snd_hda_codec snd_hwdep ptp snd_seq snd_seq_device snd_pcm usb_debug pps_core pcspkr snd_page_alloc snd_timer snd soundcore
[ 4588.542245] CPU: 2 PID: 25390 Comm: trinity-kid12 Not tainted 3.11.0-rc7+ #13
[ 4588.542321] 88021ba33c98 816f9ddf 7f0e95fa8000
[ 4588.542354] 88021ba33ce0 81177047 005a2a80 1a42
[ 4588.542386] 7f0e9600 88022c01fd40 005a2a80 88021ba33e00
[ 4588.542418] Call Trace:
[ 4588.542435] [] dump_stack+0x54/0x74
[ 4588.542457] [] print_bad_pte+0x187/0x220
[ 4588.542478] [] unmap_single_vma+0x524/0x850
[ 4588.542500] [] unmap_vmas+0x49/0x90
[ 4588.542521] [] exit_mmap+0xc5/0x170
[ 4588.542542] [] mmput+0x77/0x100
[ 4588.542562] [] do_exit+0x28d/0xcd0
[ 4588.542583] [] ? trace_hardirqs_on_caller+0x115/0x1e0
[ 4588.542607] [] ? trace_hardirqs_on+0xd/0x10
[ 4588.542629] [] do_group_exit+0x4c/0xc0
[ 4588.543534] [] SyS_exit_group+0x14/0x20
[ 4588.544438] [] tracesys+0xdd/0xe2

I can reproduce this pretty quickly by driving the system into swapping using a few instances of 'trinity -C64' (this creates 64 threads)

I'm not sure how far back this bug goes, so I'll try some older kernels and see if I can bisect it, because we don't seem to be getting closer to figuring out what's actually happening..

    Dave
Re: unused swap offset / bad page map.
On Fri, Aug 23, 2013 at 11:53 AM, Dave Jones wrote:
>
> It actually seems worse, seems I can trigger it even easier now, as if
> there's a leak.
>
Can you please try the new fix for TLB flush?

commit 2b047252d087be7f2ba
Fix TLB gather virtual address range invalidation corner cases
Re: unused swap offset / bad page map.
On Fri, Aug 23, 2013 at 11:27:29AM +0800, Hillf Danton wrote:
> On Fri, Aug 23, 2013 at 11:21 AM, Dave Jones wrote:
> >
> > I still see the swap_free messages with this applied.
> >
> Decremented?

It actually seems worse, seems I can trigger it even easier now, as if
there's a leak.

Dave
Re: unused swap offset / bad page map.
On Thu, Aug 22, 2013 at 11:21:28AM +0800, Hillf Danton wrote:
> On Thu, Aug 22, 2013 at 4:49 AM, Dave Jones wrote:
> >
> > didn't hit the bug_on, but got a bunch of
> >
> > [ 424.077993] swap_free: Unused swap offset entry 000187d5
> > [ 439.377194] swap_free: Unused swap offset entry 000187e7
> > [ 441.998411] swap_free: Unused swap offset entry 000187ee
> > [ 446.956551] swap_free: Unused swap offset entry 245f
> >
> If page is reused, its swap entry is freed.
>
> reuse_swap_page()
>   delete_from_swap_cache()
>     swapcache_free()
>       count = swap_entry_free(p, entry, SWAP_HAS_CACHE);
>
> If count drops to zero, then swap_free() gives warning.
>
>
> --- a/mm/memory.c Wed Aug 7 16:29:34 2013
> +++ b/mm/memory.c Thu Aug 22 10:44:32 2013
> @@ -3123,6 +3123,7 @@ static int do_swap_page(struct mm_struct
>  	/* It's better to call commit-charge after rmap is established */
>  	mem_cgroup_commit_charge_swapin(page, ptr);
>
> +	if (!exclusive)
>  	swap_free(entry);
>  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
>  		try_to_free_swap(page);
> --

I still see the swap_free messages with this applied.

Dave
Re: unused swap offset / bad page map.
On Thu, Aug 22, 2013 at 4:49 AM, Dave Jones wrote:
>
> didn't hit the bug_on, but got a bunch of
>
> [ 424.077993] swap_free: Unused swap offset entry 000187d5
> [ 439.377194] swap_free: Unused swap offset entry 000187e7
> [ 441.998411] swap_free: Unused swap offset entry 000187ee
> [ 446.956551] swap_free: Unused swap offset entry 245f
>
If page is reused, its swap entry is freed.

reuse_swap_page()
  delete_from_swap_cache()
    swapcache_free()
      count = swap_entry_free(p, entry, SWAP_HAS_CACHE);

If count drops to zero, then swap_free() gives warning.


--- a/mm/memory.c Wed Aug 7 16:29:34 2013
+++ b/mm/memory.c Thu Aug 22 10:44:32 2013
@@ -3123,6 +3123,7 @@ static int do_swap_page(struct mm_struct
 	/* It's better to call commit-charge after rmap is established */
 	mem_cgroup_commit_charge_swapin(page, ptr);

+	if (!exclusive)
 	swap_free(entry);
 	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
 		try_to_free_swap(page);
--
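The counting argument in the message above can be illustrated with a small userspace sketch in C. This is not kernel code: it assumes a simplified model in which every swap slot has a one-byte usage value whose low bits count references and whose 0x40 bit stands in for SWAP_HAS_CACHE. The names model_swap_map, model_entry_free and model_swap_free are hypothetical, and the sketch makes no claim about which kernel path is actually at fault. It only shows that the "Unused swap offset entry" message needs one more free than the slot had users.

```c
#include <stdio.h>

#define MODEL_HAS_CACHE 0x40u /* stand-in for SWAP_HAS_CACHE (assumed value) */
#define MODEL_SLOTS     8u    /* hypothetical size of the modelled swap area */

/* One usage byte per slot: low bits = reference count, 0x40 = "in swap cache". */
static unsigned char model_swap_map[MODEL_SLOTS];

/*
 * Drop one kind of usage from a slot: either the cache flag or one
 * reference. Returns the usage byte that remains afterwards.
 */
static unsigned char model_entry_free(unsigned int off, unsigned char usage)
{
	unsigned char v = model_swap_map[off];

	if (usage == MODEL_HAS_CACHE)
		v &= (unsigned char)~MODEL_HAS_CACHE;
	else if ((v & ~MODEL_HAS_CACHE) != 0)
		v--;
	model_swap_map[off] = v;
	return v;
}

/*
 * Modelled swap_free(): returns 0 when a reference was dropped, or -1
 * (after printing the warning) when the slot was already unused.
 */
static int model_swap_free(unsigned int off)
{
	if (off >= MODEL_SLOTS || model_swap_map[off] == 0) {
		fprintf(stderr, "swap_free: Unused swap offset entry %08x\n", off);
		return -1;
	}
	model_entry_free(off, 1);
	return 0;
}
```

Starting from a slot with one reference plus the cache flag (0x41), dropping the cache flag leaves 0x01, one model_swap_free() call brings the slot to zero, and only a second call produces the warning.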
Re: unused swap offset / bad page map.
On Thu, Aug 22, 2013 at 4:49 AM, Dave Jones wrote:
>
> didn't hit the bug_on, but got a bunch of
>
> [ 424.077993] swap_free: Unused swap offset entry 000187d5
> [ 439.377194] swap_free: Unused swap offset entry 000187e7
> [ 441.998411] swap_free: Unused swap offset entry 000187ee
> [ 446.956551] swap_free: Unused swap offset entry 245f
>
Related to the regression reported?

Regression: x86/mm: new _PTE_SWP_SOFT_DIRTY bit conflicts with existing use
https://lkml.org/lkml/2013/8/21/294
Re: unused swap offset / bad page map.
On Tue, Aug 20, 2013 at 12:39:05PM +0800, Hillf Danton wrote:
> On Tue, Aug 20, 2013 at 7:18 AM, Dave Jones wrote:
>
> --- a/mm/memory.c Wed Aug 7 16:29:34 2013
> +++ b/mm/memory.c Tue Aug 20 11:13:06 2013
> @@ -933,8 +933,10 @@ again:
>  		if (progress >= 32) {
>  			progress = 0;
>  			if (need_resched() ||
> -			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
> +			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl)) {
> +				BUG_ON(entry.val);
>  				break;
> +			}
>  		}
>  		if (pte_none(*src_pte)) {
>  			progress++;

didn't hit the bug_on, but got a bunch of

[ 424.077993] swap_free: Unused swap offset entry 000187d5
[ 439.377194] swap_free: Unused swap offset entry 000187e7
[ 441.998411] swap_free: Unused swap offset entry 000187ee
[ 446.956551] swap_free: Unused swap offset entry 245f

Dave
Re: unused swap offset / bad page map.
On Tue, Aug 20, 2013 at 7:18 AM, Dave Jones wrote:
>
> btw, anyone have thoughts on a patch something like below ?

And another (sorry if the message is reformatted by the mail agent; it took
me an hour trying to get the agent back to the correct format, without
success, and thanks a lot for any howto on sending plain text messages).

Hillf

--- a/mm/memory.c Wed Aug 7 16:29:34 2013
+++ b/mm/memory.c Tue Aug 20 11:13:06 2013
@@ -933,8 +933,10 @@ again:
 		if (progress >= 32) {
 			progress = 0;
 			if (need_resched() ||
-			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
+			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl)) {
+				BUG_ON(entry.val);
 				break;
+			}
 		}
 		if (pte_none(*src_pte)) {
 			progress++;
--

> It's really annoying to debug stuff like this and have to walk
> over to the machine and reboot it by hand after it wedges during swapoff.
>
> Dave
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 6cf2e60..bbb1192 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1587,6 +1587,10 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>
> +	/* If we have hit memory corruption, we could hang during swapoff, so don't even try. */
> +	if (test_taint(TAINT_BAD_PAGE))
> +		return -EINVAL;
> +
>  	BUG_ON(!current->mm);
>
>  	pathname = getname(specialfile);
Re: unused swap offset / bad page map.
On Thu, Aug 08, 2013 at 11:20:28PM +0800, Hillf Danton wrote:
> On Wed, Aug 7, 2013 at 11:30 PM, Dave Jones wrote:
> > printk didn't trigger.
> >
> Is a corrupted page table entry encountered, according to the
> comment of swap_duplicate()?
>
>
> --- a/mm/swapfile.c Wed Aug 7 17:27:22 2013
> +++ b/mm/swapfile.c Thu Aug 8 23:12:30 2013
> @@ -770,6 +770,7 @@ int free_swap_and_cache(swp_entry_t entr
>  		unlock_page(page);
>  		page_cache_release(page);
>  	}
> +	return 1;
>  	return p != NULL;
>  }
>
> --

[sorry for delay, been travelling]

With this applied, I no longer see the 'bad page' warning, but I do
still get a bunch of messages like..

[ 340.342436] swap_free: Unused swap offset entry 3bb4
[ 340.952980] swap_free: Unused swap offset entry 298d
[ 340.953016] swap_free: Unused swap offset entry 2996
[ 340.953048] swap_free: Unused swap offset entry 299d

btw, anyone have thoughts on a patch something like below ?
It's really annoying to debug stuff like this and have to walk
over to the machine and reboot it by hand after it wedges during swapoff.

Dave

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6cf2e60..bbb1192 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1587,6 +1587,10 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;

+	/* If we have hit memory corruption, we could hang during swapoff, so don't even try. */
+	if (test_taint(TAINT_BAD_PAGE))
+		return -EINVAL;
+
 	BUG_ON(!current->mm);

 	pathname = getname(specialfile);
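The guard proposed in the patch above can be approximated from userspace, without patching the kernel, by looking at the taint mask before calling swapoff. The following is a minimal sketch in C, assuming the mask is read as a decimal number from /proc/sys/kernel/tainted and that the bad-page flag is bit 5 (value 32). The helper names taint_has_bad_page and read_taint_mask are hypothetical and are not part of any existing tool.

```c
#include <stdio.h>

/* Bit position of the bad-page taint flag ('B'); assumed to be bit 5. */
#define MODEL_TAINT_BAD_PAGE_BIT 5

/* Return 1 if the taint mask has the bad-page bit set, 0 otherwise. */
static int taint_has_bad_page(unsigned long mask)
{
	return (int)((mask >> MODEL_TAINT_BAD_PAGE_BIT) & 1UL);
}

/*
 * Read a decimal taint mask from a file such as /proc/sys/kernel/tainted.
 * Returns 0 on success and -1 if the file cannot be opened or parsed.
 */
static int read_taint_mask(const char *path, unsigned long *mask)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", mask) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A test script wrapper could call read_taint_mask("/proc/sys/kernel/tainted", &mask) and skip swapoff when taint_has_bad_page(mask) returns 1. That avoids the hang described above, but unlike the kernel patch it does nothing for other callers of swapoff.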
Re: unused swap offset / bad page map.
On Thu, Aug 08, 2013 at 11:20:28PM +0800, Hillf Danton wrote:
> On Wed, Aug 7, 2013 at 11:30 PM, Dave Jones wrote:
> > printk didn't trigger.
> >
> Is a corrupted page table entry encountered, according to the
> comment of swap_duplicate()?
>
>
> --- a/mm/swapfile.c Wed Aug 7 17:27:22 2013
> +++ b/mm/swapfile.c Thu Aug 8 23:12:30 2013
> @@ -770,6 +770,7 @@ int free_swap_and_cache(swp_entry_t entr
>  		unlock_page(page);
>  		page_cache_release(page);
>  	}
> +	return 1;
>  	return p != NULL;
>  }

Travelling for a week, I'll check it out when I get back.

Dave
Re: unused swap offset / bad page map.
On Wed, Aug 7, 2013 at 11:30 PM, Dave Jones wrote:
> printk didn't trigger.
>
Is a corrupted page table entry encountered, according to the
comment of swap_duplicate()?


--- a/mm/swapfile.c Wed Aug 7 17:27:22 2013
+++ b/mm/swapfile.c Thu Aug 8 23:12:30 2013
@@ -770,6 +770,7 @@ int free_swap_and_cache(swp_entry_t entr
 		unlock_page(page);
 		page_cache_release(page);
 	}
+	return 1;
 	return p != NULL;
 }

--
Re: unused swap offset / bad page map.
void __lru_cache_add(struct page *page)
{
	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

	page_cache_get(page);
	if (!pagevec_space(pvec))
		__pagevec_lru_add(pvec);
	pagevec_add(pvec, page);
	put_cpu_var(lru_add_pvec);
}

I added a printk, and found that pagevec_add frequently returns 0.
Is that ok ? What happens to 'page' in this case ?

Dave
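The question above comes down to what a return value of 0 from pagevec_add() means. Below is a minimal userspace sketch in C, assuming (as far as can be told from kernels of this era) that pagevec_add() stores the page unconditionally and then returns the number of free slots left. All model_* names are hypothetical, and the capacity of 14 is an assumption about PAGEVEC_SIZE rather than something stated in this thread.

```c
#define MODEL_PAGEVEC_SIZE 14 /* assumed capacity; the real constant may differ */

struct model_pagevec {
	unsigned int nr;                 /* number of pages currently stored */
	void *pages[MODEL_PAGEVEC_SIZE];
};

/* Number of free slots left in the vector. */
static unsigned int model_pagevec_space(const struct model_pagevec *pvec)
{
	return MODEL_PAGEVEC_SIZE - pvec->nr;
}

/*
 * Store the page first, then report the remaining space. A return of 0
 * therefore means "stored, and the vector is now full", not "dropped".
 */
static unsigned int model_pagevec_add(struct model_pagevec *pvec, void *page)
{
	pvec->pages[pvec->nr++] = page;
	return model_pagevec_space(pvec);
}

/* Stand-in for __pagevec_lru_add(): empty the vector, return pages drained. */
static unsigned int model_pagevec_drain(struct model_pagevec *pvec)
{
	unsigned int n = pvec->nr;

	pvec->nr = 0;
	return n;
}

/* Same shape as the quoted __lru_cache_add(): drain first if full, then add. */
static unsigned int model_lru_cache_add(struct model_pagevec *pvec, void *page)
{
	if (!model_pagevec_space(pvec))
		model_pagevec_drain(pvec);
	return model_pagevec_add(pvec, page);
}
```

Under this model, the add that fills the last slot returns 0 and the page stays in the vector until the next call drains it. If the real helper behaves the same way, a frequent 0 return is expected and the page is not lost.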
Re: unused swap offset / bad page map.
On Wed, Aug 07, 2013 at 06:04:20PM +0800, Hillf Danton wrote:
> > There were a slew of these. same trace, different addr/anon_vma/index.
> > mapping always null.
> >
> Would you please run again with the debug info added?
> ---
> --- a/mm/swapfile.c Wed Aug 7 17:27:22 2013
> +++ b/mm/swapfile.c Wed Aug 7 17:57:20 2013
> @@ -509,6 +509,7 @@ static struct swap_info_struct *swap_inf
>  {
>  	struct swap_info_struct *p;
>  	unsigned long offset, type;
> +	int race = 0;
>
>  	if (!entry.val)
>  		goto out;
> @@ -524,10 +525,17 @@ static struct swap_info_struct *swap_inf
>  	if (!p->swap_map[offset])
>  		goto bad_free;
>  	spin_lock(&p->lock);
> +	if (!p->swap_map[offset]) {
> +		race = 1;
> +		spin_unlock(&p->lock);
> +		goto bad_free;
> +	}
>  	return p;
>
>  bad_free:
>  	printk(KERN_ERR "swap_free: %s%08lx\n", Unused_offset, entry.val);
> +	if (race)
> +		printk(KERN_ERR "but due to race\n");
>  	goto out;
>  bad_offset:
>  	printk(KERN_ERR "swap_free: %s%08lx\n", Bad_offset, entry.val);
> --

printk didn't trigger.

This time around the oom killer was going off the same time.
I'm wondering if we have some allocations somewhere in the swap code
that don't handle failure correctly.

Dave
Re: unused swap offset / bad page map.
Hello Dave

On Wed, Aug 7, 2013 at 1:51 PM, Dave Jones wrote:
> Seen while fuzzing with lots of child processes.
>
> swap_free: Unused swap offset entry 001263f5
> BUG: Bad page map in process trinity-child29 pte:24c7ea00 pmd:09fec067
> addr:7f9db958d000 vm_flags:00100073 anon_vma:88022c004ba0 mapping: (null) index:f99
> Modules linked in: fuse ipt_ULOG snd_seq_dummy tun sctp scsi_transport_iscsi can_raw can_bcm rfcomm bnep nfnetlink hidp appletalk bluetooth rose can af_802154 phonet x25 af_rxrpc llc2 nfc rfkill af_key pppoe rds pppox ppp_generic slhc caif_socket caif irda crc_ccitt atm netrom ax25 ipx p8023 psnap p8022 llc snd_hda_codec_realtek pcspkr usb_debug snd_seq snd_seq_device snd_hda_intel snd_hda_codec snd_hwdep e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
> CPU: 1 PID: 2624 Comm: trinity-child29 Not tainted 3.11.0-rc4+ #1
> 8801fd7ddc90 81700f2c 7f9db958d000
> 8801fd7ddcd8 8117cba7 24c7ea00 0f99
> 7f9db960 880009fecc68 24c7ea00 8801fd7dde00
> Call Trace:
> [] dump_stack+0x4e/0x82
> [] print_bad_pte+0x187/0x220
> [] unmap_single_vma+0x535/0x890
> [] unmap_vmas+0x49/0x90
> [] exit_mmap+0xc1/0x170
> [] mmput+0x6f/0x100
> [] do_exit+0x288/0xcd0
> [] ? trace_hardirqs_on_caller+0x115/0x1e0
> [] ? trace_hardirqs_on+0xd/0x10
> [] do_group_exit+0x4c/0xc0
> [] SyS_exit_group+0x14/0x20
> [] tracesys+0xdd/0xe2
>
> There were a slew of these. same trace, different addr/anon_vma/index.
> mapping always null.
>
Would you please run again with the debug info added?
---
--- a/mm/swapfile.c Wed Aug 7 17:27:22 2013
+++ b/mm/swapfile.c Wed Aug 7 17:57:20 2013
@@ -509,6 +509,7 @@ static struct swap_info_struct *swap_inf
 {
 	struct swap_info_struct *p;
 	unsigned long offset, type;
+	int race = 0;

 	if (!entry.val)
 		goto out;
@@ -524,10 +525,17 @@ static struct swap_info_struct *swap_inf
 	if (!p->swap_map[offset])
 		goto bad_free;
 	spin_lock(&p->lock);
+	if (!p->swap_map[offset]) {
+		race = 1;
+		spin_unlock(&p->lock);
+		goto bad_free;
+	}
 	return p;

 bad_free:
 	printk(KERN_ERR "swap_free: %s%08lx\n", Unused_offset, entry.val);
+	if (race)
+		printk(KERN_ERR "but due to race\n");
 	goto out;
 bad_offset:
 	printk(KERN_ERR "swap_free: %s%08lx\n", Bad_offset, entry.val);
--
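The debug patch above adds a second look at the slot after the lock is taken, so that a slot freed in the window between the unlocked check and spin_lock() can be told apart from one that was already unused. A single-threaded userspace sketch of that check, lock, re-check shape follows. It is a model under stated assumptions, not the kernel function: the lock is a plain flag, the concurrent free is simulated by a hypothetical before_lock hook, and all model_* names are invented for illustration.

```c
struct model_swap_info {
	int locked;                 /* stand-in for the spinlock */
	unsigned char swap_map[8];  /* usage count per slot */
	/* Hypothetical hook: runs in the window before the lock is taken. */
	void (*before_lock)(struct model_swap_info *p, unsigned int off);
};

enum model_get_result {
	MODEL_GOT_LOCKED = 0,  /* slot in use; returned with the lock held */
	MODEL_UNUSED = 1,      /* slot was already zero at the first check */
	MODEL_UNUSED_RACE = 2  /* slot became zero between check and lock */
};

/* Example hook: simulates another CPU freeing the slot in the window. */
static void model_concurrent_free(struct model_swap_info *p, unsigned int off)
{
	p->swap_map[off] = 0;
}

static enum model_get_result
model_swap_info_get(struct model_swap_info *p, unsigned int off)
{
	if (!p->swap_map[off])
		return MODEL_UNUSED;          /* unlocked check, as before the patch */
	if (p->before_lock)
		p->before_lock(p, off);       /* the race window */
	p->locked = 1;
	if (!p->swap_map[off]) {          /* re-check added by the debug patch */
		p->locked = 0;
		return MODEL_UNUSED_RACE;
	}
	return MODEL_GOT_LOCKED;          /* caller is responsible for unlocking */
}
```

In this model the "but due to race" message corresponds to MODEL_UNUSED_RACE. The follow-up in this thread reports that the printk did not trigger, which in these terms means the slot was already zero at the first, unlocked check.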