Re: 2.6.22-rc5-yesterdaygit with VM debug: BUG in mm/rmap.c:66: anon_vma_link ?

2007-06-25 Thread Nick Piggin

Hugh Dickins wrote:

On Mon, 25 Jun 2007, Petr Vandrovec wrote:


Hello,
 to catch some memory corruption bug in our code I've modified malloc to do
mmap + mprotect - which has unfortunate effect that it creates thousands and
thousands of VMAs.  Everything works (though rather slowly on kernel with
CONFIG_VM_DEBUG) until application does fork() - kernel crashes on fork()
because copy_process()'s anon_vma_link complains that it could not find anon
vma after walking through 10 elements of anon list - which seems strange,
as I did not touch system wide limit (which is 65536 vmas), and mprotect()s
started failing after creating 65536 vmas, as expected.

Full output of test program and full kernel dmesg are at
http://buk.vc.cvut.cz/linux/rmap.



Thanks for finding that, Petr.  Patch below just solves the problem
by removing validate_anon_vma; but in the past both Nick and Andrea
have been less eager to delete old debug code than I am, so it would
be rude to put this patch in without an Ack from at least one of them
- they may prefer to tinker with the limit instead, but removing the
whole function is my preference.

You were puzzled by the numbers.  What happens is that the parent
builds up to 65536 vmas, and from that point on is not allowed to
split vmas any more, so the mprotects fail as you expected and
observed.  But further mmaps succeed, up to your own 131072 limit,
because each added area can simply extend the last vma.

All the vmas of interest here (i.e. not the executable, libs, stack
etc.), for better or worse, share the same anon_vma: so that if
mprotect were later used to undo the difference between neighbouring
vmas, they could be merged together - assigning different anon_vmas
would obstruct that merge (but yes, we've a guessed tradeoff there).

So the parent has around 65500 vmas all linked to the same anon_vma;
and in the course of its fork, links the child's dup vmas one by one
to that same anon_vma, until it hits the validate_anon_vma's 10
BUG_ON.  It's very much the nature of the anon_vma, to be shared
between parent and child: anon pages may be shared between both.

If we raised the 10 limit to 2*sysctl_max_map_count, then your
program would be safe (setting aside changes to that max_map_count),
but another program in which the child also forked would then BUG.



[PATCH] kill validate_anon_vma to avoid mapcount BUG


Fine by me. I had been meaning to get rid of that so DEBUG_VM is more
useful to be turned on in betas or even production kernels.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc5-yesterdaygit with VM debug: BUG in mm/rmap.c:66: anon_vma_link ?

2007-06-25 Thread Hugh Dickins
On Mon, 25 Jun 2007, Petr Vandrovec wrote:
> Hello,
>   to catch some memory corruption bug in our code I've modified malloc to do
> mmap + mprotect - which has unfortunate effect that it creates thousands and
> thousands of VMAs.  Everything works (though rather slowly on kernel with
> CONFIG_VM_DEBUG) until application does fork() - kernel crashes on fork()
> because copy_process()'s anon_vma_link complains that it could not find anon
> vma after walking through 10 elements of anon list - which seems strange,
> as I did not touch system wide limit (which is 65536 vmas), and mprotect()s
> started failing after creating 65536 vmas, as expected.
> 
> Full output of test program and full kernel dmesg are at
> http://buk.vc.cvut.cz/linux/rmap.

Thanks for finding that, Petr.  Patch below just solves the problem
by removing validate_anon_vma; but in the past both Nick and Andrea
have been less eager to delete old debug code than I am, so it would
be rude to put this patch in without an Ack from at least one of them
- they may prefer to tinker with the limit instead, but removing the
whole function is my preference.

You were puzzled by the numbers.  What happens is that the parent
builds up to 65536 vmas, and from that point on is not allowed to
split vmas any more, so the mprotects fail as you expected and
observed.  But further mmaps succeed, up to your own 131072 limit,
because each added area can simply extend the last vma.

All the vmas of interest here (i.e. not the executable, libs, stack
etc.), for better or worse, share the same anon_vma: so that if
mprotect were later used to undo the difference between neighbouring
vmas, they could be merged together - assigning different anon_vmas
would obstruct that merge (but yes, we've a guessed tradeoff there).

So the parent has around 65500 vmas all linked to the same anon_vma;
and in the course of its fork, links the child's dup vmas one by one
to that same anon_vma, until it hits the validate_anon_vma's 10
BUG_ON.  It's very much the nature of the anon_vma, to be shared
between parent and child: anon pages may be shared between both.

If we raised the 10 limit to 2*sysctl_max_map_count, then your
program would be safe (setting aside changes to that max_map_count),
but another program in which the child also forked would then BUG.



[PATCH] kill validate_anon_vma to avoid mapcount BUG

validate_anon_vma gave a useful check on the integrity of the anon_vma list
when Andrea was developing obj rmap; but it was not enabled in SLES9 itself,
nor in mainline, until Nick changed commented-out RMAP_DEBUG to configurable
CONFIG_DEBUG_VM in 2.6.17.  Now Petr Vandrovec reports that its
BUG_ON(mapcount > 10) can easily crash a CONFIG_DEBUG_VM=y system.

That limit was just an arbitrary number to protect against an infinite loop.
We could raise it to something enormous (depending on sizeof struct vma and
size of memory?); but I rather think validate_anon_vma has outlived its
usefulness, and is better just removed - which gives a magnificent
performance boost to anything like Petr's test program ;)

Of course, a very long anon_vma list is bad news for preemption latency,
and I believe there has been one recent report of such: let's not forget
that, but validate_anon_vma only makes it worse not better.

Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>
---

 mm/rmap.c |   24 +---
 1 file changed, 1 insertion(+), 23 deletions(-)

--- 2.6.22-rc6/mm/rmap.c2007-05-19 07:36:34.0 +0100
+++ linux/mm/rmap.c 2007-06-25 21:01:01.0 +0100
@@ -53,24 +53,6 @@
 
 struct kmem_cache *anon_vma_cachep;
 
-static inline void validate_anon_vma(struct vm_area_struct *find_vma)
-{
-#ifdef CONFIG_DEBUG_VM
-   struct anon_vma *anon_vma = find_vma->anon_vma;
-   struct vm_area_struct *vma;
-   unsigned int mapcount = 0;
-   int found = 0;
-
-   list_for_each_entry(vma, _vma->head, anon_vma_node) {
-   mapcount++;
-   BUG_ON(mapcount > 10);
-   if (vma == find_vma)
-   found = 1;
-   }
-   BUG_ON(!found);
-#endif
-}
-
 /* This must be called under the mmap_sem. */
 int anon_vma_prepare(struct vm_area_struct *vma)
 {
@@ -121,10 +103,8 @@ void __anon_vma_link(struct vm_area_stru
 {
struct anon_vma *anon_vma = vma->anon_vma;
 
-   if (anon_vma) {
+   if (anon_vma)
list_add_tail(>anon_vma_node, _vma->head);
-   validate_anon_vma(vma);
-   }
 }
 
 void anon_vma_link(struct vm_area_struct *vma)
@@ -134,7 +114,6 @@ void anon_vma_link(struct vm_area_struct
if (anon_vma) {
spin_lock(_vma->lock);
list_add_tail(>anon_vma_node, _vma->head);
-   validate_anon_vma(vma);
spin_unlock(_vma->lock);
}
 }
@@ -148,7 +127,6 @@ void anon_vma_unlink(struct vm_area_stru
return;
 
spin_lock(_vma->lock);
-   

Re: 2.6.22-rc5-yesterdaygit with VM debug: BUG in mm/rmap.c:66: anon_vma_link ?

2007-06-25 Thread Andrea Arcangeli
On Mon, Jun 25, 2007 at 10:05:09PM +0100, Hugh Dickins wrote:
> size of memory?); but I rather think validate_anon_vma has outlived its
> usefulness, and is better just removed - which gives a magnificent

Probably yes. But the most fundamental issue is that this code
probably was never meant to be enabled through a menuconfig tweak but
only by editing the source.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


2.6.22-rc5-yesterdaygit with VM debug: BUG in mm/rmap.c:66: anon_vma_link ?

2007-06-25 Thread Petr Vandrovec

Hello,
  to catch some memory corruption bug in our code I've modified malloc 
to do mmap + mprotect - which has unfortunate effect that it creates 
thousands and thousands of VMAs.  Everything works (though rather slowly 
on kernel with CONFIG_VM_DEBUG) until application does fork() - kernel 
crashes on fork() because copy_process()'s anon_vma_link complains that 
it could not find anon vma after walking through 10 elements of anon 
list - which seems strange, as I did not touch system wide limit (which 
is 65536 vmas), and mprotect()s started failing after creating 65536 
vmas, as expected.


Full output of test program and full kernel dmesg are at 
http://buk.vc.cvut.cz/linux/rmap.

Thanks,
Petr Vandrovec


#include 
#include 
#include 
#include 

#define TRY_REGIONS 131072

int main(void) {
unsigned char* ptr[TRY_REGIONS];
int i;
int fd;
int badmprot = 0;
char buf[16384];
ssize_t l;

printf("PID=%u\n", getpid());
for (i = 0; i < TRY_REGIONS; i++) {
		ptr[i] = mmap(0, 8192, PROT_READ | PROT_WRITE, MAP_PRIVATE | 
MAP_ANONYMOUS, -1, 0);

if (ptr[i] == MAP_FAILED) {
break;
}
if (mprotect(ptr[i] + 4096, 4096, PROT_NONE)) {
badmprot++;
}
}
printf("Allocated %u regions, %u mprotects failed\n", i, badmprot);
fflush(stdout);
fd = open("/proc/self/maps", O_RDONLY);
while ((l = read(fd, buf, sizeof buf)) > 0) {
write(1, buf, l);
}
close(fd);
fork();
return 0;
}

PID=6101
Allocated 131072 regions, 98310 mprotects failed
08048000-08049000 r-xp  08:05 1163513 
 /root/test
08049000-0804a000 rw-p  08:05 1163513 
 /root/test

b7e37000-e7e44000 rw-p b7e37000 00:00 0
e7e44000-e7e45000 ---p e7e44000 00:00 0
e7e45000-e7e46000 rw-p e7e45000 00:00 0
[65525 lines removed]
f7f7b000-f7f7c000 ---p f7f7b000 00:00 0
f7f7c000-f7f7f000 rw-p f7f7c000 00:00 0
f7f7f000-f7f9a000 r-xp  08:05 15581230 
 /lib/ld-2.5.so
f7f9a000-f7f9c000 rw-p 0001b000 08:05 15581230 
 /lib/ld-2.5.so
ff869000-ff8ef000 rw-p ff869000 00:00 0 
 [stack]
e000-f000 r-xp e000 00:00 0 
 [vdso]


[ cut here ]
kernel BUG at /usr/src/linus/linux-2.6.22-rc5-7515/mm/rmap.c:66!
invalid opcode:  [1] PREEMPT SMP
CPU 0
Modules linked in: binfmt_misc rfcomm l2cap nfs nfsd exportfs lockd 
nfs_acl sunrpc ipx p8022 psnap llc p8023 ppdev lp af_packet aoe deflate 
zlib_deflate zlib_inflate twofish twofish_common camellia serpent 
blowfish des cbc ecb blkcipher aes xcbc sha256 sha1 crypto_null hmac 
crypto_hash cryptomgr af_key nls_utf8 nls_iso8859_2 ntfs fuse sbp2 loop 
hci_usb raw1394 dv1394 bluetooth usb_storage usbhid libusual 
snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_timer firewire_ohci 
firewire_core sg snd crc_itu_t parport_pc parport sky2 k8temp 8250_pnp 
8250 serial_core sr_mod serio_raw hwmon sata_sil24 ohci1394 ieee1394 
ohci_hcd ehci_hcd cdrom snd_page_alloc usbcore i2c_nforce2

Pid: 6101, comm: test Not tainted 2.6.22-rc5-7515-64 #1
RIP: 0010:[]  [] anon_vma_link+0x8b/0xa0
RSP: 0018:810111d4fd88  EFLAGS: 00010202
RAX: 8101109fb9e0 RBX: 8101109fb978 RCX: 8101109fb978
RDX: 810112fccf10 RSI: 000186a1 RDI: 
RBP: 810111d4fd98 R08: 810112fccf10 R09: 
R10: 8029874b R11:  R12: 810112fccee0
R13: 01200011 R14: 810111590080 R15: 810111590080
FS:  () GS:80652000(0063) knlGS:f7e196c0
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2: f7eaa7c0 CR3: 000111c17000 CR4: 06e0
Process test (pid: 6101, threadinfo 810111d4e000, task 810112fcf080)
Stack:  0001 8101109fb978 810111d4fe78 8023817c
 8061d688 8101140e00e0 8101115901b8 81012560d148
 8101115906a0 810111ebd760 f7e19708 
Call Trace:
 [] copy_process+0xb9c/0x1760
 [] alloc_pid+0x212/0x320
 [] do_fork+0xa3/0x290
 [] _spin_unlock+0x30/0x60
 [] __fput+0x176/0x1c0
 [] sys32_clone+0x27/0x30
 [] ia32_ptregs_common+0x25/0x50


Code: 0f 0b eb fe 0f 0b eb fe 66 0f 1f 44 00 00 0f 1f 80 00 00 00
RIP  [] anon_vma_link+0x8b/0xa0
 RSP 
note: test[6101] exited with preempt_count 1
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


2.6.22-rc5-yesterdaygit with VM debug: BUG in mm/rmap.c:66: anon_vma_link ?

2007-06-25 Thread Petr Vandrovec

Hello,
  to catch some memory corruption bug in our code I've modified malloc 
to do mmap + mprotect - which has unfortunate effect that it creates 
thousands and thousands of VMAs.  Everything works (though rather slowly 
on kernel with CONFIG_VM_DEBUG) until application does fork() - kernel 
crashes on fork() because copy_process()'s anon_vma_link complains that 
it could not find anon vma after walking through 10 elements of anon 
list - which seems strange, as I did not touch system wide limit (which 
is 65536 vmas), and mprotect()s started failing after creating 65536 
vmas, as expected.


Full output of test program and full kernel dmesg are at 
http://buk.vc.cvut.cz/linux/rmap.

Thanks,
Petr Vandrovec


#include sys/mman.h
#include stdio.h
#include unistd.h
#include fcntl.h

#define TRY_REGIONS 131072

int main(void) {
unsigned char* ptr[TRY_REGIONS];
int i;
int fd;
int badmprot = 0;
char buf[16384];
ssize_t l;

printf(PID=%u\n, getpid());
for (i = 0; i  TRY_REGIONS; i++) {
		ptr[i] = mmap(0, 8192, PROT_READ | PROT_WRITE, MAP_PRIVATE | 
MAP_ANONYMOUS, -1, 0);

if (ptr[i] == MAP_FAILED) {
break;
}
if (mprotect(ptr[i] + 4096, 4096, PROT_NONE)) {
badmprot++;
}
}
printf(Allocated %u regions, %u mprotects failed\n, i, badmprot);
fflush(stdout);
fd = open(/proc/self/maps, O_RDONLY);
while ((l = read(fd, buf, sizeof buf))  0) {
write(1, buf, l);
}
close(fd);
fork();
return 0;
}

PID=6101
Allocated 131072 regions, 98310 mprotects failed
08048000-08049000 r-xp  08:05 1163513 
 /root/test
08049000-0804a000 rw-p  08:05 1163513 
 /root/test

b7e37000-e7e44000 rw-p b7e37000 00:00 0
e7e44000-e7e45000 ---p e7e44000 00:00 0
e7e45000-e7e46000 rw-p e7e45000 00:00 0
[65525 lines removed]
f7f7b000-f7f7c000 ---p f7f7b000 00:00 0
f7f7c000-f7f7f000 rw-p f7f7c000 00:00 0
f7f7f000-f7f9a000 r-xp  08:05 15581230 
 /lib/ld-2.5.so
f7f9a000-f7f9c000 rw-p 0001b000 08:05 15581230 
 /lib/ld-2.5.so
ff869000-ff8ef000 rw-p ff869000 00:00 0 
 [stack]
e000-f000 r-xp e000 00:00 0 
 [vdso]


[ cut here ]
kernel BUG at /usr/src/linus/linux-2.6.22-rc5-7515/mm/rmap.c:66!
invalid opcode:  [1] PREEMPT SMP
CPU 0
Modules linked in: binfmt_misc rfcomm l2cap nfs nfsd exportfs lockd 
nfs_acl sunrpc ipx p8022 psnap llc p8023 ppdev lp af_packet aoe deflate 
zlib_deflate zlib_inflate twofish twofish_common camellia serpent 
blowfish des cbc ecb blkcipher aes xcbc sha256 sha1 crypto_null hmac 
crypto_hash cryptomgr af_key nls_utf8 nls_iso8859_2 ntfs fuse sbp2 loop 
hci_usb raw1394 dv1394 bluetooth usb_storage usbhid libusual 
snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_timer firewire_ohci 
firewire_core sg snd crc_itu_t parport_pc parport sky2 k8temp 8250_pnp 
8250 serial_core sr_mod serio_raw hwmon sata_sil24 ohci1394 ieee1394 
ohci_hcd ehci_hcd cdrom snd_page_alloc usbcore i2c_nforce2

Pid: 6101, comm: test Not tainted 2.6.22-rc5-7515-64 #1
RIP: 0010:[802987bb]  [802987bb] anon_vma_link+0x8b/0xa0
RSP: 0018:810111d4fd88  EFLAGS: 00010202
RAX: 8101109fb9e0 RBX: 8101109fb978 RCX: 8101109fb978
RDX: 810112fccf10 RSI: 000186a1 RDI: 
RBP: 810111d4fd98 R08: 810112fccf10 R09: 
R10: 8029874b R11:  R12: 810112fccee0
R13: 01200011 R14: 810111590080 R15: 810111590080
FS:  () GS:80652000(0063) knlGS:f7e196c0
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2: f7eaa7c0 CR3: 000111c17000 CR4: 06e0
Process test (pid: 6101, threadinfo 810111d4e000, task 810112fcf080)
Stack:  0001 8101109fb978 810111d4fe78 8023817c
 8061d688 8101140e00e0 8101115901b8 81012560d148
 8101115906a0 810111ebd760 f7e19708 
Call Trace:
 [8023817c] copy_process+0xb9c/0x1760
 [8024dd72] alloc_pid+0x212/0x320
 [80238ed3] do_fork+0xa3/0x290
 [804c6880] _spin_unlock+0x30/0x60
 [802aebb6] __fput+0x176/0x1c0
 [80227ce7] sys32_clone+0x27/0x30
 [802279d5] ia32_ptregs_common+0x25/0x50


Code: 0f 0b eb fe 0f 0b eb fe 66 0f 1f 44 00 00 0f 1f 80 00 00 00
RIP  [802987bb] anon_vma_link+0x8b/0xa0
 RSP 810111d4fd88
note: test[6101] exited with preempt_count 1
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc5-yesterdaygit with VM debug: BUG in mm/rmap.c:66: anon_vma_link ?

2007-06-25 Thread Andrea Arcangeli
On Mon, Jun 25, 2007 at 10:05:09PM +0100, Hugh Dickins wrote:
 size of memory?); but I rather think validate_anon_vma has outlived its
 usefulness, and is better just removed - which gives a magnificent

Probably yes. But the most fundamental issue is that this code
probably was never meant to be enabled through a menuconfig tweak but
only by editing the source.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc5-yesterdaygit with VM debug: BUG in mm/rmap.c:66: anon_vma_link ?

2007-06-25 Thread Hugh Dickins
On Mon, 25 Jun 2007, Petr Vandrovec wrote:
 Hello,
   to catch some memory corruption bug in our code I've modified malloc to do
 mmap + mprotect - which has unfortunate effect that it creates thousands and
 thousands of VMAs.  Everything works (though rather slowly on kernel with
 CONFIG_VM_DEBUG) until application does fork() - kernel crashes on fork()
 because copy_process()'s anon_vma_link complains that it could not find anon
 vma after walking through 10 elements of anon list - which seems strange,
 as I did not touch system wide limit (which is 65536 vmas), and mprotect()s
 started failing after creating 65536 vmas, as expected.
 
 Full output of test program and full kernel dmesg are at
 http://buk.vc.cvut.cz/linux/rmap.

Thanks for finding that, Petr.  Patch below just solves the problem
by removing validate_anon_vma; but in the past both Nick and Andrea
have been less eager to delete old debug code than I am, so it would
be rude to put this patch in without an Ack from at least one of them
- they may prefer to tinker with the limit instead, but removing the
whole function is my preference.

You were puzzled by the numbers.  What happens is that the parent
builds up to 65536 vmas, and from that point on is not allowed to
split vmas any more, so the mprotects fail as you expected and
observed.  But further mmaps succeed, up to your own 131072 limit,
because each added area can simply extend the last vma.

All the vmas of interest here (i.e. not the executable, libs, stack
etc.), for better or worse, share the same anon_vma: so that if
mprotect were later used to undo the difference between neighbouring
vmas, they could be merged together - assigning different anon_vmas
would obstruct that merge (but yes, we've a guessed tradeoff there).

So the parent has around 65500 vmas all linked to the same anon_vma;
and in the course of its fork, links the child's dup vmas one by one
to that same anon_vma, until it hits the validate_anon_vma's 10
BUG_ON.  It's very much the nature of the anon_vma, to be shared
between parent and child: anon pages may be shared between both.

If we raised the 10 limit to 2*sysctl_max_map_count, then your
program would be safe (setting aside changes to that max_map_count),
but another program in which the child also forked would then BUG.



[PATCH] kill validate_anon_vma to avoid mapcount BUG

validate_anon_vma gave a useful check on the integrity of the anon_vma list
when Andrea was developing obj rmap; but it was not enabled in SLES9 itself,
nor in mainline, until Nick changed commented-out RMAP_DEBUG to configurable
CONFIG_DEBUG_VM in 2.6.17.  Now Petr Vandrovec reports that its
BUG_ON(mapcount  10) can easily crash a CONFIG_DEBUG_VM=y system.

That limit was just an arbitrary number to protect against an infinite loop.
We could raise it to something enormous (depending on sizeof struct vma and
size of memory?); but I rather think validate_anon_vma has outlived its
usefulness, and is better just removed - which gives a magnificent
performance boost to anything like Petr's test program ;)

Of course, a very long anon_vma list is bad news for preemption latency,
and I believe there has been one recent report of such: let's not forget
that, but validate_anon_vma only makes it worse not better.

Signed-off-by: Hugh Dickins [EMAIL PROTECTED]
---

 mm/rmap.c |   24 +---
 1 file changed, 1 insertion(+), 23 deletions(-)

--- 2.6.22-rc6/mm/rmap.c2007-05-19 07:36:34.0 +0100
+++ linux/mm/rmap.c 2007-06-25 21:01:01.0 +0100
@@ -53,24 +53,6 @@
 
 struct kmem_cache *anon_vma_cachep;
 
-static inline void validate_anon_vma(struct vm_area_struct *find_vma)
-{
-#ifdef CONFIG_DEBUG_VM
-   struct anon_vma *anon_vma = find_vma-anon_vma;
-   struct vm_area_struct *vma;
-   unsigned int mapcount = 0;
-   int found = 0;
-
-   list_for_each_entry(vma, anon_vma-head, anon_vma_node) {
-   mapcount++;
-   BUG_ON(mapcount  10);
-   if (vma == find_vma)
-   found = 1;
-   }
-   BUG_ON(!found);
-#endif
-}
-
 /* This must be called under the mmap_sem. */
 int anon_vma_prepare(struct vm_area_struct *vma)
 {
@@ -121,10 +103,8 @@ void __anon_vma_link(struct vm_area_stru
 {
struct anon_vma *anon_vma = vma-anon_vma;
 
-   if (anon_vma) {
+   if (anon_vma)
list_add_tail(vma-anon_vma_node, anon_vma-head);
-   validate_anon_vma(vma);
-   }
 }
 
 void anon_vma_link(struct vm_area_struct *vma)
@@ -134,7 +114,6 @@ void anon_vma_link(struct vm_area_struct
if (anon_vma) {
spin_lock(anon_vma-lock);
list_add_tail(vma-anon_vma_node, anon_vma-head);
-   validate_anon_vma(vma);
spin_unlock(anon_vma-lock);
}
 }
@@ -148,7 +127,6 @@ void anon_vma_unlink(struct vm_area_stru
return;
 
spin_lock(anon_vma-lock);
-   

Re: 2.6.22-rc5-yesterdaygit with VM debug: BUG in mm/rmap.c:66: anon_vma_link ?

2007-06-25 Thread Nick Piggin

Hugh Dickins wrote:

On Mon, 25 Jun 2007, Petr Vandrovec wrote:


Hello,
 to catch some memory corruption bug in our code I've modified malloc to do
mmap + mprotect - which has unfortunate effect that it creates thousands and
thousands of VMAs.  Everything works (though rather slowly on kernel with
CONFIG_VM_DEBUG) until application does fork() - kernel crashes on fork()
because copy_process()'s anon_vma_link complains that it could not find anon
vma after walking through 10 elements of anon list - which seems strange,
as I did not touch system wide limit (which is 65536 vmas), and mprotect()s
started failing after creating 65536 vmas, as expected.

Full output of test program and full kernel dmesg are at
http://buk.vc.cvut.cz/linux/rmap.



Thanks for finding that, Petr.  Patch below just solves the problem
by removing validate_anon_vma; but in the past both Nick and Andrea
have been less eager to delete old debug code than I am, so it would
be rude to put this patch in without an Ack from at least one of them
- they may prefer to tinker with the limit instead, but removing the
whole function is my preference.

You were puzzled by the numbers.  What happens is that the parent
builds up to 65536 vmas, and from that point on is not allowed to
split vmas any more, so the mprotects fail as you expected and
observed.  But further mmaps succeed, up to your own 131072 limit,
because each added area can simply extend the last vma.

All the vmas of interest here (i.e. not the executable, libs, stack
etc.), for better or worse, share the same anon_vma: so that if
mprotect were later used to undo the difference between neighbouring
vmas, they could be merged together - assigning different anon_vmas
would obstruct that merge (but yes, we've a guessed tradeoff there).

So the parent has around 65500 vmas all linked to the same anon_vma;
and in the course of its fork, links the child's dup vmas one by one
to that same anon_vma, until it hits the validate_anon_vma's 10
BUG_ON.  It's very much the nature of the anon_vma, to be shared
between parent and child: anon pages may be shared between both.

If we raised the 10 limit to 2*sysctl_max_map_count, then your
program would be safe (setting aside changes to that max_map_count),
but another program in which the child also forked would then BUG.



[PATCH] kill validate_anon_vma to avoid mapcount BUG


Fine by me. I had been meaning to get rid of that so DEBUG_VM is more
useful to be turned on in betas or even production kernels.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/