Re: [PATCH 06/11] pipe: no need to confirm page cache buf

2016-09-27 Thread Miklos Szeredi
On Tue, Sep 27, 2016 at 5:40 AM, Al Viro  wrote:
> On Wed, Sep 14, 2016 at 10:37:11AM +0200, Miklos Szeredi wrote:
>
>> Things could happen to that page that make it not uptodate while sitting in
>> the pipe, but it's questionable whether we should care about that.
>> Checking for being uptodate in the face of such page state change is always
>> going to be racy.
>
> I'm not sure it's the right thing to do here; that area looks like a victim
> of serious bitrot - once upon a time it was ->map(), which used to lock
> page, verity that it's valid, and kmap it.  ->unmap() did kunmap + unlock.
>
> Then the validate part got split off (->pin(), later renamed to ->confirm()),
> with lock _not_ held over the kmap/kunmap.  That's the point when it got racy,
> AFAICS.  OTOH, I would really hate to hold a page locked over e.g. copying to
> userland - too easy to get deadlocks that way.
>
> Jens, could you comment?  Pages definitely shouldn't be getting into pipe
> without having been uptodate; the question is what (if anything) should be
> done about having a page go non-uptodate (on truncate, hole-punching, etc.)
> while a reference to it is sitting in a pipe buffer.

Truncate/holepunch is interesting.  It doesn't make the page go
non-uptodate, just clears page->mapping.  Works like a charm, old data
can be read from the pipe just fine.

Partial truncate is even more interesting, because it will result in
partially cleared data (race is there with plain read() as well,
AFAICS, but very narrow).

Page invalidation by filesystems is similar to truncate.  Old data can
be read from the pipe.  And in fact this probably the way we want it
to work, since redoing the page lookup in ->confirm() would be really
messy.

Also just modifying the data sitting in the pipe will also result in
undefined behavior; either the old or the new data can be read out
from the other end of the pipe.

And while not explicitly documented, all the above cases are fine and
implicit in the non-copy behavior of splice.  Perhaps the man page
should note that modifying data after splice but before reading from
the other end of the pipe results in undefined behavior.

I haven't looked at filesystems, but generic code calls
ClearPageUptodate() from only a few places:

end_swap_bio_read(): does it on a non-uptodate page
page_endio(): AFAICS part of the page reading chain, again doing it on
a non-uptodate page
me_swapcache_dirty(): memory error on dirty swapcache page, this is
the only one that would make sense to trigger EIO on reading the pipe
buffer.  But then why only the dirty swapcache case?

Thanks,
Miklos


Re: [PATCH 06/11] pipe: no need to confirm page cache buf

2016-09-27 Thread Miklos Szeredi
On Tue, Sep 27, 2016 at 5:40 AM, Al Viro  wrote:
> On Wed, Sep 14, 2016 at 10:37:11AM +0200, Miklos Szeredi wrote:
>
>> Things could happen to that page that make it not uptodate while sitting in
>> the pipe, but it's questionable whether we should care about that.
>> Checking for being uptodate in the face of such page state change is always
>> going to be racy.
>
> I'm not sure it's the right thing to do here; that area looks like a victim
> of serious bitrot - once upon a time it was ->map(), which used to lock
> page, verity that it's valid, and kmap it.  ->unmap() did kunmap + unlock.
>
> Then the validate part got split off (->pin(), later renamed to ->confirm()),
> with lock _not_ held over the kmap/kunmap.  That's the point when it got racy,
> AFAICS.  OTOH, I would really hate to hold a page locked over e.g. copying to
> userland - too easy to get deadlocks that way.
>
> Jens, could you comment?  Pages definitely shouldn't be getting into pipe
> without having been uptodate; the question is what (if anything) should be
> done about having a page go non-uptodate (on truncate, hole-punching, etc.)
> while a reference to it is sitting in a pipe buffer.

Truncate/holepunch is interesting.  It doesn't make the page go
non-uptodate, just clears page->mapping.  Works like a charm, old data
can be read from the pipe just fine.

Partial truncate is even more interesting, because it will result in
partially cleared data (race is there with plain read() as well,
AFAICS, but very narrow).

Page invalidation by filesystems is similar to truncate.  Old data can
be read from the pipe.  And in fact this probably the way we want it
to work, since redoing the page lookup in ->confirm() would be really
messy.

Also just modifying the data sitting in the pipe will also result in
undefined behavior; either the old or the new data can be read out
from the other end of the pipe.

And while not explicitly documented, all the above cases are fine and
implicit in the non-copy behavior of splice.  Perhaps the man page
should note that modifying data after splice but before reading from
the other end of the pipe results in undefined behavior.

I haven't looked at filesystems, but generic code calls
ClearPageUptodate() from only a few places:

end_swap_bio_read(): does it on a non-uptodate page
page_endio(): AFAICS part of the page reading chain, again doing it on
a non-uptodate page
me_swapcache_dirty(): memory error on dirty swapcache page, this is
the only one that would make sense to trigger EIO on reading the pipe
buffer.  But then why only the dirty swapcache case?

Thanks,
Miklos


Re: [PATCH 06/11] pipe: no need to confirm page cache buf

2016-09-26 Thread Al Viro
On Wed, Sep 14, 2016 at 10:37:11AM +0200, Miklos Szeredi wrote:

> Things could happen to that page that make it not uptodate while sitting in
> the pipe, but it's questionable whether we should care about that.
> Checking for being uptodate in the face of such page state change is always
> going to be racy.

I'm not sure it's the right thing to do here; that area looks like a victim
of serious bitrot - once upon a time it was ->map(), which used to lock
page, verity that it's valid, and kmap it.  ->unmap() did kunmap + unlock.

Then the validate part got split off (->pin(), later renamed to ->confirm()),
with lock _not_ held over the kmap/kunmap.  That's the point when it got racy,
AFAICS.  OTOH, I would really hate to hold a page locked over e.g. copying to
userland - too easy to get deadlocks that way.

Jens, could you comment?  Pages definitely shouldn't be getting into pipe
without having been uptodate; the question is what (if anything) should be
done about having a page go non-uptodate (on truncate, hole-punching, etc.)
while a reference to it is sitting in a pipe buffer.


Re: [PATCH 06/11] pipe: no need to confirm page cache buf

2016-09-26 Thread Al Viro
On Wed, Sep 14, 2016 at 10:37:11AM +0200, Miklos Szeredi wrote:

> Things could happen to that page that make it not uptodate while sitting in
> the pipe, but it's questionable whether we should care about that.
> Checking for being uptodate in the face of such page state change is always
> going to be racy.

I'm not sure it's the right thing to do here; that area looks like a victim
of serious bitrot - once upon a time it was ->map(), which used to lock
page, verity that it's valid, and kmap it.  ->unmap() did kunmap + unlock.

Then the validate part got split off (->pin(), later renamed to ->confirm()),
with lock _not_ held over the kmap/kunmap.  That's the point when it got racy,
AFAICS.  OTOH, I would really hate to hold a page locked over e.g. copying to
userland - too easy to get deadlocks that way.

Jens, could you comment?  Pages definitely shouldn't be getting into pipe
without having been uptodate; the question is what (if anything) should be
done about having a page go non-uptodate (on truncate, hole-punching, etc.)
while a reference to it is sitting in a pipe buffer.


[PATCH 06/11] pipe: no need to confirm page cache buf

2016-09-14 Thread Miklos Szeredi
page_cache_pipe_buf_confirm() is used only in page_cache_pipe_buf_ops.

page_cache_pipe_buf_ops is used in two places:

1) __generic_file_splice_read()

This iterates all the pages, if not uptodate locks page, and if still not
uptodate does ->readpage() which reads the page synchronously.

2) shmem_file_splice_read()

This calls shmem_getpage() on individual pages.  shmem_getpage() swaps in
the page synchronously if not in memory, so it also seems to return an
uptodate page.

So all pages put into the buffer will be uptodate.

Things could happen to that page that make it not uptodate while sitting in
the pipe, but it's questionable whether we should care about that.
Checking for being uptodate in the face of such page state change is always
going to be racy.

Signed-off-by: Miklos Szeredi 
---
 fs/splice.c | 43 +--
 1 file changed, 1 insertion(+), 42 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index ba7a2240d58c..0ecbe3011796 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -92,51 +92,10 @@ static void page_cache_pipe_buf_release(struct 
pipe_inode_info *pipe,
buf->flags &= ~PIPE_BUF_FLAG_LRU;
 }
 
-/*
- * Check whether the contents of buf is OK to access. Since the content
- * is a page cache page, IO may be in flight.
- */
-static int page_cache_pipe_buf_confirm(struct pipe_inode_info *pipe,
-  struct pipe_buffer *buf)
-{
-   struct page *page = buf->page;
-   int err;
-
-   if (!PageUptodate(page)) {
-   lock_page(page);
-
-   /*
-* Page got truncated/unhashed. This will cause a 0-byte
-* splice, if this is the first page.
-*/
-   if (!page->mapping) {
-   err = -ENODATA;
-   goto error;
-   }
-
-   /*
-* Uh oh, read-error from disk.
-*/
-   if (!PageUptodate(page)) {
-   err = -EIO;
-   goto error;
-   }
-
-   /*
-* Page is ok afterall, we are done.
-*/
-   unlock_page(page);
-   }
-
-   return 0;
-error:
-   unlock_page(page);
-   return err;
-}
 
 const struct pipe_buf_operations page_cache_pipe_buf_ops = {
.can_merge = 0,
-   .confirm = page_cache_pipe_buf_confirm,
+   .confirm = generic_pipe_buf_confirm,
.release = page_cache_pipe_buf_release,
.steal = page_cache_pipe_buf_steal,
.get = generic_pipe_buf_get,
-- 
2.5.5



[PATCH 06/11] pipe: no need to confirm page cache buf

2016-09-14 Thread Miklos Szeredi
page_cache_pipe_buf_confirm() is used only in page_cache_pipe_buf_ops.

page_cache_pipe_buf_ops is used in two places:

1) __generic_file_splice_read()

This iterates all the pages, if not uptodate locks page, and if still not
uptodate does ->readpage() which reads the page synchronously.

2) shmem_file_splice_read()

This calls shmem_getpage() on individual pages.  shmem_getpage() swaps in
the page synchronously if not in memory, so it also seems to return an
uptodate page.

So all pages put into the buffer will be uptodate.

Things could happen to that page that make it not uptodate while sitting in
the pipe, but it's questionable whether we should care about that.
Checking for being uptodate in the face of such page state change is always
going to be racy.

Signed-off-by: Miklos Szeredi 
---
 fs/splice.c | 43 +--
 1 file changed, 1 insertion(+), 42 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index ba7a2240d58c..0ecbe3011796 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -92,51 +92,10 @@ static void page_cache_pipe_buf_release(struct 
pipe_inode_info *pipe,
buf->flags &= ~PIPE_BUF_FLAG_LRU;
 }
 
-/*
- * Check whether the contents of buf is OK to access. Since the content
- * is a page cache page, IO may be in flight.
- */
-static int page_cache_pipe_buf_confirm(struct pipe_inode_info *pipe,
-  struct pipe_buffer *buf)
-{
-   struct page *page = buf->page;
-   int err;
-
-   if (!PageUptodate(page)) {
-   lock_page(page);
-
-   /*
-* Page got truncated/unhashed. This will cause a 0-byte
-* splice, if this is the first page.
-*/
-   if (!page->mapping) {
-   err = -ENODATA;
-   goto error;
-   }
-
-   /*
-* Uh oh, read-error from disk.
-*/
-   if (!PageUptodate(page)) {
-   err = -EIO;
-   goto error;
-   }
-
-   /*
-* Page is ok afterall, we are done.
-*/
-   unlock_page(page);
-   }
-
-   return 0;
-error:
-   unlock_page(page);
-   return err;
-}
 
 const struct pipe_buf_operations page_cache_pipe_buf_ops = {
.can_merge = 0,
-   .confirm = page_cache_pipe_buf_confirm,
+   .confirm = generic_pipe_buf_confirm,
.release = page_cache_pipe_buf_release,
.steal = page_cache_pipe_buf_steal,
.get = generic_pipe_buf_get,
-- 
2.5.5