Re: Memory modified after free in "MAP ENTRY" zone (vm_map_entry_t->read_ahead)

2016-02-22 Thread Andriy Gapon
On 19/02/2016 10:38, Andriy Gapon wrote:
> On 18/02/2016 17:13, Konstantin Belousov wrote:
>> So this is arguably a fallout from r188331.
> The following is a somewhat tentative attempt to fix the problem.
> 
> Kostik,
> 
> thank you very much, I am testing the patch.

The patch is holding up well so far.  No regressions.

-- 
Andriy Gapon


Re: Memory modified after free in "MAP ENTRY" zone (vm_map_entry_t->read_ahead)

2016-02-19 Thread Andriy Gapon
On 18/02/2016 17:13, Konstantin Belousov wrote:
> So this is arguably a fallout from r188331.
> The following is a somewhat tentative attempt to fix the problem.

Kostik,

thank you very much, I am testing the patch.

> diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
> index a7e3d37..cddf1eb 100644
> --- a/sys/vm/vm_fault.c
> +++ b/sys/vm/vm_fault.c
> @@ -291,7 +291,8 @@ vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
>   struct faultstate fs;
>   struct vnode *vp;
>   vm_page_t m;
> - int ahead, behind, cluster_offset, error, locked;
> + int ahead, behind, cluster_offset, error, locked, rv;
> + u_char behavior;
>  
>   hardfault = 0;
>   growstack = TRUE;
> @@ -550,9 +551,18 @@ readrest:
>* zero-filled pages.
>*/
>   if (fs.object->type != OBJT_DEFAULT) {
> - int rv;
> - u_char behavior = vm_map_entry_behavior(fs.entry);
> -
> + if (!fs.lookup_still_valid) {
> + locked = vm_map_trylock_read(fs.map);
> + if (locked)
> + fs.lookup_still_valid = TRUE;
> + if (!locked || fs.map->timestamp !=
> + map_generation) {
> + release_page();
> + unlock_and_deallocate();
> + goto RetryFault;
> + }
> + }
> + behavior = vm_map_entry_behavior(fs.entry);
>   era = fs.entry->read_ahead;
>   if (behavior == MAP_ENTRY_BEHAV_RANDOM ||
>   P_KILLED(curproc)) {
> @@ -603,6 +613,7 @@ readrest:
>   }
>   ahead = ulmin(ahead, atop(fs.entry->end - vaddr) - 1);
>   if (era != nera)
> + /* XXX only read-lock on map */
>   fs.entry->read_ahead = nera;
>  
>   /*
> 


-- 
Andriy Gapon


Re: Memory modified after free in "MAP ENTRY" zone (vm_map_entry_t->read_ahead)

2016-02-18 Thread Konstantin Belousov
On Mon, Feb 15, 2016 at 11:07:11AM +0200, Andriy Gapon wrote:
> On 15/02/2016 00:27, Alan Cox wrote:
> > 
> > On Sun, Feb 14, 2016 at 8:09 AM, Andriy Gapon wrote:
> > 
> > On 10/02/2016 23:28, Andriy Gapon wrote:
> > >
> > > Over a span of approximately 3 weeks I have got two slightly different
> > > panics of the same kind.  The affected system is a several months old
> > > amd64 head.
> > 
> > I added a small assertion and it got tripped:
> [snip]
> > 
> > So, it seems that the code allows modification of the read_ahead field of
> > an entry without holding a map lock.  I'd guess that that should be mostly
> > harmless as the read_ahead value is not critical, but it is still not nice.
> > 
> > Not intentionally.  The nearby code to the read_ahead assignment expects
> > the map to be locked, so I wouldn't categorize this as mostly harmless.
> > 
> > In general, you shouldn't get to the read_ahead assignment without the map
> > being locked, and almost all of the code paths that unlock the map jump to
> > RetryFault shortly thereafter, so it's hard to imagine how the map lock
> > wouldn't be reacquired.  I speculate that the root cause of your panic is a
> > case where vm_pager_get_pages() fails, and in such a case we might loop
> > around, descending to the next backing object without reacquiring the map
> > lock.  In other words, your above assertion failure is happening on the
> > next iteration after an initial vm_pager_get_pages() failure, and we're
> > about to do another vm_pager_get_pages() call on a different object.
> > 
> > In summary, I have no trouble believing that the code that handles a
> > failure by vm_pager_get_pages() in vm_fault() is not quite right, and I
> > think that's what's biting you.
> 
> Alan,
> 
> thank you very much for the very insightful analysis.
> Indeed, I see that in this case the object chain is default -> swap -> swap.
> I am not sure how such a chain was created.  It seems that going default ->
> swap is not a problem as the map lock is not dropped in this case.  But going
> swap -> swap the way you described (pager error, e.g. the page is just not
> found) has exactly the problem that you suggested.
> 

So this is arguably a fallout from r188331.
The following is a somewhat tentative attempt to fix the problem.

diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index a7e3d37..cddf1eb 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -291,7 +291,8 @@ vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
struct faultstate fs;
struct vnode *vp;
vm_page_t m;
-   int ahead, behind, cluster_offset, error, locked;
+   int ahead, behind, cluster_offset, error, locked, rv;
+   u_char behavior;
 
hardfault = 0;
growstack = TRUE;
@@ -550,9 +551,18 @@ readrest:
 * zero-filled pages.
 */
if (fs.object->type != OBJT_DEFAULT) {
-   int rv;
-   u_char behavior = vm_map_entry_behavior(fs.entry);
-
+   if (!fs.lookup_still_valid) {
+   locked = vm_map_trylock_read(fs.map);
+   if (locked)
+   fs.lookup_still_valid = TRUE;
+   if (!locked || fs.map->timestamp !=
+   map_generation) {
+   release_page();
+   unlock_and_deallocate();
+   goto RetryFault;
+   }
+   }
+   behavior = vm_map_entry_behavior(fs.entry);
era = fs.entry->read_ahead;
if (behavior == MAP_ENTRY_BEHAV_RANDOM ||
P_KILLED(curproc)) {
@@ -603,6 +613,7 @@ readrest:
}
ahead = ulmin(ahead, atop(fs.entry->end - vaddr) - 1);
if (era != nera)
+   /* XXX only read-lock on map */
fs.entry->read_ahead = nera;
 
/*
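
For readers following the diff: the core of the change is an optimistic
revalidation pattern.  If the map lock was dropped (lookup_still_valid is
false), the code tries to re-take it in read mode and compares the map's
timestamp against the generation recorded at lookup time; if either step
fails, everything is undone and the fault restarts at RetryFault.  Below is a
minimal, self-contained userland sketch of that pattern.  It uses a pthreads
rwlock as a stand-in for the map lock, and every name in it is invented for
illustration; it is not the vm_fault() code.

	#include <pthread.h>
	#include <stdbool.h>

	struct toy_map {
		pthread_rwlock_t lock;
		unsigned int timestamp;		/* bumped on every structural change */
	};

	struct toy_lookup {
		struct toy_map *map;
		unsigned int generation;	/* map->timestamp seen at lookup time */
		bool still_valid;		/* true while we hold map->lock */
	};

	/*
	 * Returns true if the cached lookup result may still be used;
	 * returns false if the caller must discard its state and redo
	 * the lookup (the analogue of goto RetryFault).
	 */
	static bool
	toy_revalidate(struct toy_lookup *l)
	{

		if (l->still_valid)
			return (true);
		/* Try to re-take the lock without sleeping. */
		if (pthread_rwlock_tryrdlock(&l->map->lock) != 0)
			return (false);
		l->still_valid = true;
		/* Lock held, but did the map change while it was dropped? */
		if (l->map->timestamp != l->generation) {
			pthread_rwlock_unlock(&l->map->lock);
			l->still_valid = false;
			return (false);
		}
		return (true);
	}

In the patch, the retry redoes the map lookup with the lock held, so the
later write to fs.entry->read_ahead is again covered by the map lock; the XXX
comment in the diff notes that it now happens under only a read lock.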


Re: Memory modified after free in "MAP ENTRY" zone (vm_map_entry_t->read_ahead)

2016-02-15 Thread Andriy Gapon
On 15/02/2016 00:27, Alan Cox wrote:
> 
> On Sun, Feb 14, 2016 at 8:09 AM, Andriy Gapon wrote:
> 
> On 10/02/2016 23:28, Andriy Gapon wrote:
> >
> > Over a span of approximately 3 weeks I have got two slightly different
> > panics of the same kind.  The affected system is a several months old
> > amd64 head.
> 
> I added a small assertion and it got tripped:
[snip]
> 
> So, it seems that the code allows modification of the read_ahead field of an
> entry without holding a map lock.  I'd guess that that should be mostly
> harmless as the read_ahead value is not critical, but it is still not nice.
> 
> Not intentionally.  The nearby code to the read_ahead assignment expects the
> map to be locked, so I wouldn't categorize this as mostly harmless.
> 
> In general, you shouldn't get to the read_ahead assignment without the map
> being locked, and almost all of the code paths that unlock the map jump to
> RetryFault shortly thereafter, so it's hard to imagine how the map lock
> wouldn't be reacquired.  I speculate that the root cause of your panic is a
> case where vm_pager_get_pages() fails, and in such a case we might loop
> around, descending to the next backing object without reacquiring the map
> lock.  In other words, your above assertion failure is happening on the next
> iteration after an initial vm_pager_get_pages() failure, and we're about to
> do another vm_pager_get_pages() call on a different object.
> 
> In summary, I have no trouble believing that the code that handles a failure
> by vm_pager_get_pages() in vm_fault() is not quite right, and I think that's
> what's biting you.

Alan,

thank you very much for the very insightful analysis.
Indeed, I see that in this case the object chain is default -> swap -> swap.  I
am not sure how such a chain was created.  It seems that going default -> swap is
not a problem as the map lock is not dropped in this case.  But going swap ->
swap the way you described (pager error, e.g. the page is just not found) has
exactly the problem that you suggested.

-- 
Andriy Gapon


Re: Memory modified after free in "MAP ENTRY" zone (vm_map_entry_t->read_ahead)

2016-02-14 Thread Alan Cox
On Sun, Feb 14, 2016 at 8:09 AM, Andriy Gapon wrote:

> On 10/02/2016 23:28, Andriy Gapon wrote:
> >
> > Over a span of approximately 3 weeks I have got two slightly different
> > panics of the same kind.  The affected system is a several months old
> > amd64 head.
>
> I added a small assertion and it got tripped:
> --- a/sys/vm/vm_fault.c
> +++ b/sys/vm/vm_fault.c
> @@ -608,8 +608,11 @@ readrest:
> cluster_offset;
> }
> ahead = ulmin(ahead, atop(fs.entry->end - vaddr) - 1);
> -   if (era != nera)
> +   if (era != nera) {
> +   KASSERT(fs.lookup_still_valid,
> +   ("modifying read_ahead without map lock"));
> fs.entry->read_ahead = nera;
> +   }
>
> /*
>  * Call the pager to retrieve the data, if any, after
>
> So, it seems that the code allows modification of the read_ahead field of an
> entry without holding a map lock.  I'd guess that that should be mostly
> harmless as the read_ahead value is not critical, but it is still not nice.
>
>

Not intentionally.  The nearby code to the read_ahead assignment expects
the map to be locked, so I wouldn't categorize this as mostly harmless.

In general, you shouldn't get to the read_ahead assignment without the map
being locked, and almost all of the code paths that unlock the map jump to
RetryFault shortly thereafter, so it's hard to imagine how the map lock
wouldn't be reacquired.  I speculate that the root cause of your panic is a
case where vm_pager_get_pages() fails, and in such a case we might loop
around, descending to the next backing object without reacquiring the map
lock.  In other words, your above assertion failure is happening on the
next iteration after an initial vm_pager_get_pages() failure, and we're
about to do another vm_pager_get_pages() call on a different object.

In summary, I have no trouble believing that the code that handles a
failure by vm_pager_get_pages() in vm_fault() is not quite right, and I
think that's what's biting you.
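
To make the failure mode above concrete, here is a deliberately tiny,
compilable sketch of the control flow being described.  It is not vm_fault(),
and every type and helper in it is invented for illustration; the point is
only that a pager failure can send the loop back to the read_ahead store
after the map lock has already been dropped.

	#include <stdbool.h>
	#include <stddef.h>

	struct toy_object { struct toy_object *backing_object; };
	struct toy_entry  { int read_ahead; };

	struct toy_faultstate {
		bool lookup_still_valid;	/* true while the map lock is held */
		struct toy_object *object;
		struct toy_entry *entry;
	};

	/* Stand-in for a pager call that fails and drops the map lock on the way. */
	static int
	toy_getpages(struct toy_faultstate *fs)
	{

		fs->lookup_still_valid = false;	/* lock dropped while paging */
		return (-1);			/* pager error */
	}

	static void
	toy_fault(struct toy_faultstate *fs, int nera)
	{

		for (;;) {
			/*
			 * On the first pass the map lock is held and this store
			 * is safe.  After a pager failure below we come back
			 * here with lookup_still_valid == false: an unlocked
			 * store, which is what the KASSERT quoted above trips on.
			 */
			fs->entry->read_ahead = nera;

			if (toy_getpages(fs) == 0)
				return;
			if (fs->object->backing_object == NULL)
				return;
			/* Descend to the backing object without relocking the map. */
			fs->object = fs->object->backing_object;
		}
	}

The patch posted later in the thread closes exactly this window by
revalidating the map lock before the read_ahead bookkeeping is redone.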



> Some details from the panic caused by the assertion can be found here:
> http://dpaste.com/39BYV7S.txt
>
>
> >  1 ===
> > Unread portion of the kernel message buffer:
> > panic: Memory modified after free 0xf8008c15ac80(128) val=adc0de @ 0xf8008c15acdc
> >
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at 0x8041e90b = db_trace_self_wrapper+0x2b/frame 0xfe04f5349530
> > kdb_backtrace() at 0x80669a09 = kdb_backtrace+0x39/frame 0xfe04f53495e0
> > vpanic() at 0x80634dec = vpanic+0x14c/frame 0xfe04f5349620
> > panic() at 0x80634b33 = panic+0x43/frame 0xfe04f5349680
> > trash_ctor() at 0x807de9c8 = trash_ctor+0x48/frame 0xfe04f5349690
> > uma_zalloc_arg() at 0x807da785 = uma_zalloc_arg+0x475/frame 0xfe04f5349720
> > uma_zalloc() at 0x807e44af = uma_zalloc+0xf/frame 0xfe04f5349730
> > vm_map_entry_create() at 0x807e5a2e = vm_map_entry_create+0x2e/frame 0xfe04f5349740
> > _vm_map_clip_start() at 0x807e69e3 = _vm_map_clip_start+0x123/frame 0xfe04f5349770
> > vm_map_wire() at 0x807e797d = vm_map_wire+0x11d/frame 0xfe04f53497f0
> > vslock() at 0x807e1a2b = vslock+0x6b/frame 0xfe04f5349810
> > sysctl_wire_old_buffer() at 0x8064148a = sysctl_wire_old_buffer+0x4a/frame 0xfe04f5349830
> > sysctl_kern_proc() at 0x8062259b = sysctl_kern_proc+0x8b/frame 0xfe04f5349880
> > sysctl_root_handler_locked() at 0x80641a8e = sysctl_root_handler_locked+0x8e/frame 0xfe04f53498c0
> > sysctl_root() at 0x806412fe = sysctl_root+0x13e/frame 0xfe04f5349940
> > userland_sysctl() at 0x8064183d = userland_sysctl+0x16d/frame 0xfe04f53499e0
> > sys___sysctl() at 0x80641694 = sys___sysctl+0x74/frame 0xfe04f5349a90
> > syscallenter() at 0x8081fa20 = syscallenter+0x320/frame 0xfe04f5349b00
> > amd64_syscall() at 0x8081f5ef = amd64_syscall+0x1f/frame 0xfe04f5349bf0
> > Xfast_syscall() at 0x80807c5b = Xfast_syscall+0xfb/frame 0xfe04f5349bf0
> > --- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x8042225ea, rsp = 0x7fffcef8, rbp = 0x7fffcf30 ---
> > Uptime: 14d22h38m17s
> > ==
> >
> >  2 ===
> > Unread portion of the kernel message buffer:
> > panic: Memory modified after free 0xf80176692680(128) val=adc0de @ 0xf801766926dc
> >
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at 0x8041e90b =
> 

Re: Memory modified after free in "MAP ENTRY" zone (vm_map_entry_t->read_ahead)

2016-02-14 Thread Andriy Gapon
On 10/02/2016 23:28, Andriy Gapon wrote:
> 
> Over a span of approximately 3 weeks I have got two slightly different
> panics of the same kind.  The affected system is a several months old amd64
> head.

I added a small assertion and it got tripped:
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -608,8 +608,11 @@ readrest:
cluster_offset;
}
ahead = ulmin(ahead, atop(fs.entry->end - vaddr) - 1);
-   if (era != nera)
+   if (era != nera) {
+   KASSERT(fs.lookup_still_valid,
+   ("modifying read_ahead without map lock"));
fs.entry->read_ahead = nera;
+   }

/*
 * Call the pager to retrieve the data, if any, after

So, it seems that the code allows modification of the read_ahead field of an
entry without holding a map lock.  I'd guess that that should be mostly
harmless as the read_ahead value is not critical, but it is still not nice.

Some details from the panic caused by the assertion can be found here:
http://dpaste.com/39BYV7S.txt
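
As background on the panic message itself: "Memory modified after free ...
val=adc0de" comes from the kernel's INVARIANTS-time trash checking, which
fills freed UMA items with a junk pattern and verifies it when the item is
handed out again; a word that no longer matches means something wrote through
a stale pointer.  Here is a minimal userland sketch of the idea, for
illustration only; it is not the actual UMA trash_ctor()/trash_dtor() code,
and the pattern value is just the one used by this sketch.

	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>

	#define	TRASH_PATTERN	0xdeadc0deU	/* junk value used by this sketch */

	/* On free: fill the item with the junk pattern. */
	static void
	sketch_trash_dtor(void *item, size_t size)
	{
		uint32_t *p = item;
		size_t i;

		for (i = 0; i < size / sizeof(*p); i++)
			p[i] = TRASH_PATTERN;
	}

	/* On the next allocation: any other value means a use-after-free write. */
	static void
	sketch_trash_ctor(void *item, size_t size)
	{
		uint32_t *p = item;
		size_t i;

		for (i = 0; i < size / sizeof(*p); i++)
			if (p[i] != TRASH_PATTERN) {
				fprintf(stderr,
				    "Memory modified after free %p(%zu) val=%x @ %p\n",
				    item, size, p[i], (void *)&p[i]);
				abort();
			}
	}

In the backtraces quoted below, trash_ctor() fires from uma_zalloc() while
vm_map_entry_create() is allocating a new map entry, i.e. an already-freed
"MAP ENTRY" item was written to after it went back to the zone.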


>  1 ===
> Unread portion of the kernel message buffer:
> panic: Memory modified after free 0xf8008c15ac80(128) val=adc0de @ 0xf8008c15acdc
> 
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x8041e90b = db_trace_self_wrapper+0x2b/frame 0xfe04f5349530
> kdb_backtrace() at 0x80669a09 = kdb_backtrace+0x39/frame 0xfe04f53495e0
> vpanic() at 0x80634dec = vpanic+0x14c/frame 0xfe04f5349620
> panic() at 0x80634b33 = panic+0x43/frame 0xfe04f5349680
> trash_ctor() at 0x807de9c8 = trash_ctor+0x48/frame 0xfe04f5349690
> uma_zalloc_arg() at 0x807da785 = uma_zalloc_arg+0x475/frame 0xfe04f5349720
> uma_zalloc() at 0x807e44af = uma_zalloc+0xf/frame 0xfe04f5349730
> vm_map_entry_create() at 0x807e5a2e = vm_map_entry_create+0x2e/frame 0xfe04f5349740
> _vm_map_clip_start() at 0x807e69e3 = _vm_map_clip_start+0x123/frame 0xfe04f5349770
> vm_map_wire() at 0x807e797d = vm_map_wire+0x11d/frame 0xfe04f53497f0
> vslock() at 0x807e1a2b = vslock+0x6b/frame 0xfe04f5349810
> sysctl_wire_old_buffer() at 0x8064148a = sysctl_wire_old_buffer+0x4a/frame 0xfe04f5349830
> sysctl_kern_proc() at 0x8062259b = sysctl_kern_proc+0x8b/frame 0xfe04f5349880
> sysctl_root_handler_locked() at 0x80641a8e = sysctl_root_handler_locked+0x8e/frame 0xfe04f53498c0
> sysctl_root() at 0x806412fe = sysctl_root+0x13e/frame 0xfe04f5349940
> userland_sysctl() at 0x8064183d = userland_sysctl+0x16d/frame 0xfe04f53499e0
> sys___sysctl() at 0x80641694 = sys___sysctl+0x74/frame 0xfe04f5349a90
> syscallenter() at 0x8081fa20 = syscallenter+0x320/frame 0xfe04f5349b00
> amd64_syscall() at 0x8081f5ef = amd64_syscall+0x1f/frame 0xfe04f5349bf0
> Xfast_syscall() at 0x80807c5b = Xfast_syscall+0xfb/frame 0xfe04f5349bf0
> --- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x8042225ea, rsp = 0x7fffcef8, rbp = 0x7fffcf30 ---
> Uptime: 14d22h38m17s
> ==
> 
>  2 ===
> Unread portion of the kernel message buffer:
> panic: Memory modified after free 0xf80176692680(128) val=adc0de @ 0xf801766926dc
> 
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x8041e90b = db_trace_self_wrapper+0x2b/frame 0xfe04f507c5a0
> kdb_backtrace() at 0x80669a09 = kdb_backtrace+0x39/frame 0xfe04f507c650
> vpanic() at 0x80634dec = vpanic+0x14c/frame 0xfe04f507c690
> panic() at 0x80634b33 = panic+0x43/frame 0xfe04f507c6f0
> trash_ctor() at 0x807de9c8 = trash_ctor+0x48/frame 0xfe04f507c700
> uma_zalloc_arg() at 0x807da785 = uma_zalloc_arg+0x475/frame 0xfe04f507c790
> uma_zalloc() at 0x807e44af = uma_zalloc+0xf/frame 0xfe04f507c7a0
> vm_map_entry_create() at 0x807e5a2e = vm_map_entry_create+0x2e/frame 0xfe04f507c7b0
> vm_map_insert() at 0x807e558a = vm_map_insert+0x2fa/frame 0xfe04f507c850
> vm_map_stack_locked() at 0x807e63cb = vm_map_stack_locked+0x13b/frame 0xfe04f507c8b0
> vm_map_find() at 0x807e65d3 = vm_map_find+0x183/frame 0xfe04f507c950
> vm_mmap_object() at 0x807eac99 = vm_mmap_object+0x329/frame 0xfe04f507c9d0
> sys_mmap() at 0x807ea8ec = sys_mmap+0x41c/frame 0xfe04f507ca90
> syscallenter() at 0x8081fa20 = syscallenter+0x320/frame 0xfe04f507cb00
> amd64_syscall() at 0x8081f5ef =