Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-25 Thread Stephen C. Tweedie

Hi,

On Sat, Mar 24, 2001 at 10:05:18PM -0300, Rik van Riel wrote:
> On Sun, 25 Mar 2001, Stephen C. Tweedie wrote:
> 
> > Rik, do you think it is really necessary to take the page lock and
> > release it inside lookup_swap_cache?  I may be overlooking something,
> > but I can't see the benefit of it ---
> 
> I don't think we need to do this, except to protect us from
> using a page which isn't up-to-date yet and locked because
> of disk IO.

But it doesn't --- page_launder can try to lock the page after it
checks the refcount, without taking any locks which protect us against
running lookup_swap_cache in parallel.  If we get our reference after
page_launder checks the count, we can find the page getting locked out
from underneath our feet.

> Reclaim_page() takes the pagecache_lock before trying to
> free anything, so there's no reason to lock against that.

Exactly.  We're not in danger of _losing_ the page, because
reclaim_page is locked more aggressively than page_launder.  We still
risk having the page locked against us after lookup_swap_cache does
its own UnlockPage.

So, if lookup_swap_cache doesn't actually ensure that the page is
unlocked, are there any callers which implicitly rely on
lookup_swap_cache() doing a wait_on_page?

--Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-25 Thread Stephen C. Tweedie

Hi,

On Sat, Mar 24, 2001 at 10:05:18PM -0300, Rik van Riel wrote:
 On Sun, 25 Mar 2001, Stephen C. Tweedie wrote:
 
  Rik, do you think it is really necessary to take the page lock and
  release it inside lookup_swap_cache?  I may be overlooking something,
  but I can't see the benefit of it ---
 
 I don't think we need to do this, except to protect us from
 using a page which isn't up-to-date yet and locked because
 of disk IO.

But it doesn't --- page_launder can try to lock the page after it
checks the refcount, without taking any locks which protect us against
running lookup_swap_cache in parallel.  If we get our reference after
page_launder checks the count, we can find the page getting locked out
from underneath our feet.

 Reclaim_page() takes the pagecache_lock before trying to
 free anything, so there's no reason to lock against that.

Exactly.  We're not in danger of _losing_ the page, because
reclaim_page is locked more aggressively than page_launder.  We still
risk having the page locked against us after lookup_swap_cache does
its own UnlockPage.

So, if lookup_swap_cache doesn't actually ensure that the page is
unlocked, are there any callers which implicitly rely on
lookup_swap_cache() doing a wait_on_page?

--Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-24 Thread Rik van Riel

On Sun, 25 Mar 2001, Stephen C. Tweedie wrote:

> Rik, do you think it is really necessary to take the page lock and
> release it inside lookup_swap_cache?  I may be overlooking something,
> but I can't see the benefit of it ---

I don't think we need to do this, except to protect us from
using a page which isn't up-to-date yet and locked because
of disk IO.

Reclaim_page() takes the pagecache_lock before trying to
free anything, so there's no reason to lock against that.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-24 Thread Stephen C. Tweedie

Hi,

On Fri, Mar 23, 2001 at 11:58:50AM -0800, Linus Torvalds wrote:

> Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.

Uggh --- the shmem code already does, see:

shmem_truncate->shmem_truncate_part->shmem_free_swp->
lookup_swap_cache->find_lock_page

It looks messy: lookup_swap_cache seems to be abusing the page lock
gratuitously, but there are probably callers of it which rely on the
assumption that it performs an implicit wait_on_page().

Rik, do you think it is really necessary to take the page lock and
release it inside lookup_swap_cache?  I may be overlooking something,
but I can't see the benefit of it --- we can still race against
page_launder, so the page may still get locked behind our backs after
we get the reference from lookup_swap_cache (page_launder explicitly
avoids taking the pagecache hash spinlock which might avoid this
particular race).

--Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-24 Thread Stephen C. Tweedie

Hi,

On Fri, Mar 23, 2001 at 11:58:50AM -0800, Linus Torvalds wrote:

 Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.

Uggh --- the shmem code already does, see:

shmem_truncate-shmem_truncate_part-shmem_free_swp-
lookup_swap_cache-find_lock_page

It looks messy: lookup_swap_cache seems to be abusing the page lock
gratuitously, but there are probably callers of it which rely on the
assumption that it performs an implicit wait_on_page().

Rik, do you think it is really necessary to take the page lock and
release it inside lookup_swap_cache?  I may be overlooking something,
but I can't see the benefit of it --- we can still race against
page_launder, so the page may still get locked behind our backs after
we get the reference from lookup_swap_cache (page_launder explicitly
avoids taking the pagecache hash spinlock which might avoid this
particular race).

--Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-24 Thread Rik van Riel

On Sun, 25 Mar 2001, Stephen C. Tweedie wrote:

 Rik, do you think it is really necessary to take the page lock and
 release it inside lookup_swap_cache?  I may be overlooking something,
 but I can't see the benefit of it ---

I don't think we need to do this, except to protect us from
using a page which isn't up-to-date yet and locked because
of disk IO.

Reclaim_page() takes the pagecache_lock before trying to
free anything, so there's no reason to lock against that.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Linus Torvalds



On Fri, 23 Mar 2001, Alan Cox wrote:
>
> __find_get_page has I think a misleading comment ?

Ehh..

I only said the _naming_ makes sense. [ Wild hand-waving ]

I suspect that what happened was that we split off the functions (one to
just get the page, one to lock it), and the comment that was associated
with the original "find_page()" never got removed, and just happens to sit
above one of the helper functions now - the one that didn't lock.

I'll fix the comment.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Alan Cox

> If you don't want to sleep, you need to use one of the wrappers for
> "__find_page_nolock()". Something like "find_get_page()", which only
> "gets" the page.

 * a rather lightweight function, finding and getting a reference to a
 * hashed page atomically, waiting for it if it's locked.

__find_get_page has I think a misleading comment ?

> The naming actually does make sense in this area.

Yep

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread David S. Miller


Alan Cox writes:
 > Umm find_lock_page doesnt sleep does it ?

It does lock_page, which sleeps to get the lock if necessary.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Alan Cox

> > >   +   page = find_lock_page(mapping, idx);
> > > 
> > > Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.
> > 
> > Umm find_lock_page doesnt sleep does it ?
> 
> It certainly does. find_lock_page() -> __find_lock_page() -> lock_page() ->
> -> __lock_page() -> schedule().

Ok I missed the lock page one. Yep.

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Alexander Viro



On Fri, 23 Mar 2001, Alan Cox wrote:

> > On Fri, 23 Mar 2001, Stephen C. Tweedie wrote:
> > >
> > > The patch below is for two races in sysV shared memory.
> > 
> > +   spin_lock (>lock);
> > +
> > +   /* The shmem_swp_entry() call may have blocked, and
> > +* shmem_writepage may have been moving a page between the page
> > +* cache and swap cache.  We need to recheck the page cache
> > +* under the protection of the info->lock spinlock. */
> > +
> > +   page = find_lock_page(mapping, idx);
> > 
> > Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.
> 
> Umm find_lock_page doesnt sleep does it ?

It certainly does. find_lock_page() -> __find_lock_page() -> lock_page() ->
-> __lock_page() -> schedule().

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Linus Torvalds



On Fri, 23 Mar 2001, Alan Cox wrote:
> >
> > +   spin_lock (>lock);
> > +
> > +   /* The shmem_swp_entry() call may have blocked, and
> > +* shmem_writepage may have been moving a page between the page
> > +* cache and swap cache.  We need to recheck the page cache
> > +* under the protection of the info->lock spinlock. */
> > +
> > +   page = find_lock_page(mapping, idx);
> >
> > Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.
>
> Umm find_lock_page doesnt sleep does it ?

Sure it does. Note the "lock" in find_lock_page(). It will lock the page,
which implies sleeping if somebody is accessing it at the same time.

If you don't want to sleep, you need to use one of the wrappers for
"__find_page_nolock()". Something like "find_get_page()", which only
"gets" the page.

The naming actually does make sense in this area.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Alan Cox

> On Fri, 23 Mar 2001, Stephen C. Tweedie wrote:
> >
> > The patch below is for two races in sysV shared memory.
> 
>   +   spin_lock (>lock);
>   +
>   +   /* The shmem_swp_entry() call may have blocked, and
>   +* shmem_writepage may have been moving a page between the page
>   +* cache and swap cache.  We need to recheck the page cache
>   +* under the protection of the info->lock spinlock. */
>   +
>   +   page = find_lock_page(mapping, idx);
> 
> Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.

Umm find_lock_page doesnt sleep does it ?

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Linus Torvalds



On Fri, 23 Mar 2001, Stephen C. Tweedie wrote:
>
> The patch below is for two races in sysV shared memory.

+   spin_lock (>lock);
+
+   /* The shmem_swp_entry() call may have blocked, and
+* shmem_writepage may have been moving a page between the page
+* cache and swap cache.  We need to recheck the page cache
+* under the protection of the info->lock spinlock. */
+
+   page = find_lock_page(mapping, idx);

Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Linus Torvalds



On Fri, 23 Mar 2001, Stephen C. Tweedie wrote:

 The patch below is for two races in sysV shared memory.

+   spin_lock (info-lock);
+
+   /* The shmem_swp_entry() call may have blocked, and
+* shmem_writepage may have been moving a page between the page
+* cache and swap cache.  We need to recheck the page cache
+* under the protection of the info-lock spinlock. */
+
+   page = find_lock_page(mapping, idx);

Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Alan Cox

 On Fri, 23 Mar 2001, Stephen C. Tweedie wrote:
 
  The patch below is for two races in sysV shared memory.
 
   +   spin_lock (info-lock);
   +
   +   /* The shmem_swp_entry() call may have blocked, and
   +* shmem_writepage may have been moving a page between the page
   +* cache and swap cache.  We need to recheck the page cache
   +* under the protection of the info-lock spinlock. */
   +
   +   page = find_lock_page(mapping, idx);
 
 Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.

Umm find_lock_page doesnt sleep does it ?

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Alexander Viro



On Fri, 23 Mar 2001, Alan Cox wrote:

  On Fri, 23 Mar 2001, Stephen C. Tweedie wrote:
  
   The patch below is for two races in sysV shared memory.
  
  +   spin_lock (info-lock);
  +
  +   /* The shmem_swp_entry() call may have blocked, and
  +* shmem_writepage may have been moving a page between the page
  +* cache and swap cache.  We need to recheck the page cache
  +* under the protection of the info-lock spinlock. */
  +
  +   page = find_lock_page(mapping, idx);
  
  Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.
 
 Umm find_lock_page doesnt sleep does it ?

It certainly does. find_lock_page() - __find_lock_page() - lock_page() -
- __lock_page() - schedule().

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Linus Torvalds



On Fri, 23 Mar 2001, Alan Cox wrote:
 
  +   spin_lock (info-lock);
  +
  +   /* The shmem_swp_entry() call may have blocked, and
  +* shmem_writepage may have been moving a page between the page
  +* cache and swap cache.  We need to recheck the page cache
  +* under the protection of the info-lock spinlock. */
  +
  +   page = find_lock_page(mapping, idx);
 
  Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.

 Umm find_lock_page doesnt sleep does it ?

Sure it does. Note the "lock" in find_lock_page(). It will lock the page,
which implies sleeping if somebody is accessing it at the same time.

If you don't want to sleep, you need to use one of the wrappers for
"__find_page_nolock()". Something like "find_get_page()", which only
"gets" the page.

The naming actually does make sense in this area.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Alan Cox

 +   page = find_lock_page(mapping, idx);
   
   Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.
  
  Umm find_lock_page doesnt sleep does it ?
 
 It certainly does. find_lock_page() - __find_lock_page() - lock_page() -
 - __lock_page() - schedule().

Ok I missed the lock page one. Yep.

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Alan Cox

 If you don't want to sleep, you need to use one of the wrappers for
 "__find_page_nolock()". Something like "find_get_page()", which only
 "gets" the page.

 * a rather lightweight function, finding and getting a reference to a
 * hashed page atomically, waiting for it if it's locked.

__find_get_page has I think a misleading comment ?

 The naming actually does make sense in this area.

Yep

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread Linus Torvalds



On Fri, 23 Mar 2001, Alan Cox wrote:

 __find_get_page has I think a misleading comment ?

Ehh..

I only said the _naming_ makes sense. [ Wild hand-waving ]

I suspect that what happened was that we split off the functions (one to
just get the page, one to lock it), and the comment that was associated
with the original "find_page()" never got removed, and just happens to sit
above one of the helper functions now - the one that didn't lock.

I'll fix the comment.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-23 Thread David S. Miller


Alan Cox writes:
  Umm find_lock_page doesnt sleep does it ?

It does lock_page, which sleeps to get the lock if necessary.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



[PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-22 Thread Stephen C. Tweedie

Hi,

The patch below is for two races in sysV shared memory.

The first (minor) one is in shmem_free_swp:

swap_free (entry);
*ptr = (swp_entry_t){0};
freed++;
if (!(page = lookup_swap_cache(entry)))
continue;
delete_from_swap_cache(page);
page_cache_release(page);

has a window between the first swap_free() and the
lookup_swap_cache().  If the swap_free() frees the last ref to the
swap entry and another cpu allocates and caches the same entry before
the lookup, we'll end up destroying another task's swap cache.

The second is nastier.  shmem_nopage() uses the inode semaphore to
serialise access to shmem_getpage_locked() for paging in shared memory
segments.  Lookups in the page cache and in the shmem swap vector are
done to locate the entry.  _getpage_ can move entries from swap to
page cache under protection of the shmem's info->lock spinlock.

shmem_writepage() is locked via the page lock, and moves shmem pages
from the page cache to the swap cache under protection of the same
info->lock spinlock.

However, shmem_nopage() does not hold this spinlock while doing its
lookups in the page cache and swap vectors, so it can race with
writepage, with once cpu in the middle of moving the page out of the
page cache in writepage and the other cpu then failing to find the
entry either in the page cache or in the shm swap entry vector.

Feedback welcome.

Cheers, 
 Stephen


--- mm/shmem.c.~1~  Fri Mar 23 00:26:49 2001
+++ mm/shmem.c  Fri Mar 23 00:42:21 2001
@@ -121,13 +121,13 @@
if (!ptr->val)
continue;
entry = *ptr;
-   swap_free (entry);
*ptr = (swp_entry_t){0};
freed++;
-   if (!(page = lookup_swap_cache(entry)))
-   continue;
-   delete_from_swap_cache(page);
-   page_cache_release(page);
+   if ((page = lookup_swap_cache(entry)) != NULL) {
+   delete_from_swap_cache(page);
+   page_cache_release(page);   
+   }
+   swap_free (entry);
}
return freed;
 }
@@ -218,15 +218,24 @@
 }
 
 /*
- * Move the page from the page cache to the swap cache
+ * Move the page from the page cache to the swap cache.
+ *
+ * The page lock prevents multiple occurences of shmem_writepage at
+ * once.  We still need to guard against racing with
+ * shmem_getpage_locked().  
  */
 static int shmem_writepage(struct page * page)
 {
int error;
struct shmem_inode_info *info;
swp_entry_t *entry, swap;
+   struct inode *inode;
 
-   info = >mapping->host->u.shmem_i;
+   if (!PageLocked(page))
+   BUG();
+   
+   inode = page->mapping->host;
+   info = >u.shmem_i;
swap = __get_swap_page(2);
if (!swap.val) {
set_page_dirty(page);
@@ -234,11 +243,11 @@
return -ENOMEM;
}
 
-   spin_lock(>lock);
-   shmem_recalc_inode(page->mapping->host);
entry = shmem_swp_entry(info, page->index);
if (IS_ERR(entry))  /* this had been allocted on page allocation */
BUG();
+   spin_lock(>lock);
+   shmem_recalc_inode(page->mapping->host);
error = -EAGAIN;
if (entry->val) {
__swap_free(swap, 2);
@@ -268,6 +277,10 @@
  * If we allocate a new one we do not mark it dirty. That's up to the
  * vm. If we swap it in we mark it dirty since we also free the swap
  * entry since a page cannot live in both the swap and page cache
+ *
+ * Called with the inode locked, so it cannot race with itself, but we
+ * still need to guard against racing with shm_writepage(), which might
+ * be trying to move the page to the swap cache as we run.
  */
 static struct page * shmem_getpage_locked(struct inode * inode, unsigned long idx)
 {
@@ -276,31 +289,57 @@
struct page * page;
swp_entry_t *entry;
 
-   page = find_lock_page(mapping, idx);;
+   info = >u.shmem_i;
+
+repeat:
+   page = find_lock_page(mapping, idx);
if (page)
return page;
 
-   info = >u.shmem_i;
entry = shmem_swp_entry (info, idx);
if (IS_ERR(entry))
return (void *)entry;
+
+   spin_lock (>lock);
+   
+   /* The shmem_swp_entry() call may have blocked, and
+* shmem_writepage may have been moving a page between the page
+* cache and swap cache.  We need to recheck the page cache
+* under the protection of the info->lock spinlock. */
+
+   page = find_lock_page(mapping, idx);
+   if (page) {
+   spin_unlock (>lock);
+   return page;
+   }
+   
if (entry->val) {
unsigned long flags;
 
/* Look it up and read it in.. */
page = 

[PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-22 Thread Stephen C. Tweedie

Hi,

The patch below is for two races in sysV shared memory.

The first (minor) one is in shmem_free_swp:

swap_free (entry);
*ptr = (swp_entry_t){0};
freed++;
if (!(page = lookup_swap_cache(entry)))
continue;
delete_from_swap_cache(page);
page_cache_release(page);

has a window between the first swap_free() and the
lookup_swap_cache().  If the swap_free() frees the last ref to the
swap entry and another cpu allocates and caches the same entry before
the lookup, we'll end up destroying another task's swap cache.

The second is nastier.  shmem_nopage() uses the inode semaphore to
serialise access to shmem_getpage_locked() for paging in shared memory
segments.  Lookups in the page cache and in the shmem swap vector are
done to locate the entry.  _getpage_ can move entries from swap to
page cache under protection of the shmem's info-lock spinlock.

shmem_writepage() is locked via the page lock, and moves shmem pages
from the page cache to the swap cache under protection of the same
info-lock spinlock.

However, shmem_nopage() does not hold this spinlock while doing its
lookups in the page cache and swap vectors, so it can race with
writepage, with once cpu in the middle of moving the page out of the
page cache in writepage and the other cpu then failing to find the
entry either in the page cache or in the shm swap entry vector.

Feedback welcome.

Cheers, 
 Stephen


--- mm/shmem.c.~1~  Fri Mar 23 00:26:49 2001
+++ mm/shmem.c  Fri Mar 23 00:42:21 2001
@@ -121,13 +121,13 @@
if (!ptr-val)
continue;
entry = *ptr;
-   swap_free (entry);
*ptr = (swp_entry_t){0};
freed++;
-   if (!(page = lookup_swap_cache(entry)))
-   continue;
-   delete_from_swap_cache(page);
-   page_cache_release(page);
+   if ((page = lookup_swap_cache(entry)) != NULL) {
+   delete_from_swap_cache(page);
+   page_cache_release(page);   
+   }
+   swap_free (entry);
}
return freed;
 }
@@ -218,15 +218,24 @@
 }
 
 /*
- * Move the page from the page cache to the swap cache
+ * Move the page from the page cache to the swap cache.
+ *
+ * The page lock prevents multiple occurences of shmem_writepage at
+ * once.  We still need to guard against racing with
+ * shmem_getpage_locked().  
  */
 static int shmem_writepage(struct page * page)
 {
int error;
struct shmem_inode_info *info;
swp_entry_t *entry, swap;
+   struct inode *inode;
 
-   info = page-mapping-host-u.shmem_i;
+   if (!PageLocked(page))
+   BUG();
+   
+   inode = page-mapping-host;
+   info = inode-u.shmem_i;
swap = __get_swap_page(2);
if (!swap.val) {
set_page_dirty(page);
@@ -234,11 +243,11 @@
return -ENOMEM;
}
 
-   spin_lock(info-lock);
-   shmem_recalc_inode(page-mapping-host);
entry = shmem_swp_entry(info, page-index);
if (IS_ERR(entry))  /* this had been allocted on page allocation */
BUG();
+   spin_lock(info-lock);
+   shmem_recalc_inode(page-mapping-host);
error = -EAGAIN;
if (entry-val) {
__swap_free(swap, 2);
@@ -268,6 +277,10 @@
  * If we allocate a new one we do not mark it dirty. That's up to the
  * vm. If we swap it in we mark it dirty since we also free the swap
  * entry since a page cannot live in both the swap and page cache
+ *
+ * Called with the inode locked, so it cannot race with itself, but we
+ * still need to guard against racing with shm_writepage(), which might
+ * be trying to move the page to the swap cache as we run.
  */
 static struct page * shmem_getpage_locked(struct inode * inode, unsigned long idx)
 {
@@ -276,31 +289,57 @@
struct page * page;
swp_entry_t *entry;
 
-   page = find_lock_page(mapping, idx);;
+   info = inode-u.shmem_i;
+
+repeat:
+   page = find_lock_page(mapping, idx);
if (page)
return page;
 
-   info = inode-u.shmem_i;
entry = shmem_swp_entry (info, idx);
if (IS_ERR(entry))
return (void *)entry;
+
+   spin_lock (info-lock);
+   
+   /* The shmem_swp_entry() call may have blocked, and
+* shmem_writepage may have been moving a page between the page
+* cache and swap cache.  We need to recheck the page cache
+* under the protection of the info-lock spinlock. */
+
+   page = find_lock_page(mapping, idx);
+   if (page) {
+   spin_unlock (info-lock);
+   return page;
+   }
+   
if (entry-val) {
unsigned long flags;
 
/* Look it up and read it in.. */