Re: [PATCH] hugetlbfs read() support
On Thu, 19 Jul 2007, Bill Irwin wrote:
> On Thu, Jul 19, 2007 at 10:07:59AM -0700, Nishanth Aravamudan wrote:
> > But I do think a second reason to do this is to make hugetlbfs behave
> > like a normal fs -- that is read(), write(), etc. work on files in the
> > mountpoint. But that is simply my opinion.
>
> Mine as well.

Ditto. Here are a few other things I've run into recently:

It should be possible to use cp(1) to load large datasets into a hugetlbfs.

It should be possible to use ftruncate() on hugetlbfs files. (On a tmpfs
it's required to extend the file before mmap()ing... on hugetlbfs it
returns EINVAL or some such, and mmap() just magically extends files.)

It should be possible to statfs() and get usage info... this works only if
you mount with size=N.

-dean

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] hugetlbfs read() support
On 20.07.2007 [14:47:31 +1000], Nick Piggin wrote:
> Nishanth Aravamudan wrote:
> > On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:
> > > On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> > >
> > > > > > +	}
> > > > > > +
> > > > > > +	offset += ret;
> > > > > > +	retval += ret;
> > > > > > +	len -= ret;
> > > > > > +	index += offset >> HPAGE_SHIFT;
> > > > > > +	offset &= ~HPAGE_MASK;
> > > > > > +
> > > > > > +	page_cache_release(page);
> > > > > > +	if (ret == nr && len)
> > > > > > +		continue;
> > > > > > +	goto out;
> > > > > > +	}
> > > > > > +out:
> > > > > > +	return retval;
> > > > > > +}
> > > > >
> > > > > This code doesn't have all the ghastly tricks which we deploy to
> > > > > handle concurrent truncate.
> > > >
> > > > Do I need to ? Baaahh!! I don't want to deal with them.
> > >
> > > Nick, can you think of any serious consequences of a read/truncate
> > > race in there? I can't..
> > >
> > > > All I want is a simple read() to get my oprofile working. Please
> > > > advise.
> > >
> > > Did you consider changing oprofile userspace to read the executable
> > > with mmap?
> >
> > It's not actually oprofile's code, though, it's libbfd (used by
> > oprofile). And it works fine (presumably) for other binaries.
>
> So... what's the problem with changing it? The fact that it is a
> library doesn't really make a difference except that you'll also help
> everyone else who links with it.

Well, I'm more concerned about testing that change; libbfd is rather
core code and used in a number of places. Also, libbfd's current code
'just works' for every other filesystem concerned, or I'd expect it
would have been changed to mmap() before. I'm also terrified of
binutils code :)

I'm also not sure who I'm 'helping', exactly, by changing it, beyond
users of libhugetlbfs and OProfile, who are equally helped by this
kernel patch (which, again, also has the added benefit of making
hugetlbfs appear to be more like a normal filesystem).

> It won't break backwards compatibility, and it will work on older
> kernels...

Fair enough. I'm looking into it, but I can't make any promises on
timelines.

Thanks,
Nish

--
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center
Re: [PATCH] hugetlbfs read() support
(Sorry if this is a resend... something bad seems to have happened to me.)

Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
>
> > > This code doesn't have all the ghastly tricks which we deploy to
> > > handle concurrent truncate.
> >
> > Do I need to ? Baaahh!! I don't want to deal with them.
>
> Nick, can you think of any serious consequences of a read/truncate race
> in there? I can't..

As it doesn't allow writes, then I _think_ it should be OK. If you ever
did want to add write(2) support, then you would have transient zeroes
problems.

But I'm not completely sure.. we've had a lot of (and still have some
known and probably unknown) bugs just in that single
generic_mapping_read function, most of which are due to our rabid
aversion to doing any locking whatsoever there.

So why not just hold i_mutex around the whole thing to be safe?

--
SUSE Labs, Novell Inc.
Re: [PATCH] hugetlbfs read() support
Nishanth Aravamudan wrote:
> On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:
> > On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> >
> > > > > +	}
> > > > > +
> > > > > +	offset += ret;
> > > > > +	retval += ret;
> > > > > +	len -= ret;
> > > > > +	index += offset >> HPAGE_SHIFT;
> > > > > +	offset &= ~HPAGE_MASK;
> > > > > +
> > > > > +	page_cache_release(page);
> > > > > +	if (ret == nr && len)
> > > > > +		continue;
> > > > > +	goto out;
> > > > > +	}
> > > > > +out:
> > > > > +	return retval;
> > > > > +}
> > > >
> > > > This code doesn't have all the ghastly tricks which we deploy to
> > > > handle concurrent truncate.
> > >
> > > Do I need to ? Baaahh!! I don't want to deal with them.
> >
> > Nick, can you think of any serious consequences of a read/truncate
> > race in there? I can't..
> >
> > > All I want is a simple read() to get my oprofile working. Please
> > > advise.
> >
> > Did you consider changing oprofile userspace to read the executable
> > with mmap?
>
> It's not actually oprofile's code, though, it's libbfd (used by
> oprofile). And it works fine (presumably) for other binaries.

So... what's the problem with changing it? The fact that it is a
library doesn't really make a difference except that you'll also help
everyone else who links with it.

It won't break backwards compatibility, and it will work on older
kernels...

--
SUSE Labs, Novell Inc.
Re: [PATCH] hugetlbfs read() support
On Fri, 2007-07-20 at 14:29 +1000, Nick Piggin wrote:
> Andrew Morton wrote:
> > On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> >
> > > > > +	}
> > > > > +
> > > > > +	offset += ret;
> > > > > +	retval += ret;
> > > > > +	len -= ret;
> > > > > +	index += offset >> HPAGE_SHIFT;
> > > > > +	offset &= ~HPAGE_MASK;
> > > > > +
> > > > > +	page_cache_release(page);
> > > > > +	if (ret == nr && len)
> > > > > +		continue;
> > > > > +	goto out;
> > > > > +	}
> > > > > +out:
> > > > > +	return retval;
> > > > > +}
> > > >
> > > > This code doesn't have all the ghastly tricks which we deploy to
> > > > handle concurrent truncate.
> > >
> > > Do I need to ? Baaahh!! I don't want to deal with them.
> >
> > Nick, can you think of any serious consequences of a read/truncate
> > race in there? I can't..
>
> As it doesn't allow writes, then I _think_ it should be OK. If you
> ever did want to add write(2) support, then you would have transient
> zeroes problems.

I have no plans to add write() support - unless there is a real reason
for doing so.

> But why not just hold i_mutex around the whole thing just to be safe?

Yeah. I can do that, just to be safe for the future..

Thanks,
Badari
Re: [PATCH] hugetlbfs read() support
Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
>
> > > > +	}
> > > > +
> > > > +	offset += ret;
> > > > +	retval += ret;
> > > > +	len -= ret;
> > > > +	index += offset >> HPAGE_SHIFT;
> > > > +	offset &= ~HPAGE_MASK;
> > > > +
> > > > +	page_cache_release(page);
> > > > +	if (ret == nr && len)
> > > > +		continue;
> > > > +	goto out;
> > > > +	}
> > > > +out:
> > > > +	return retval;
> > > > +}
> > >
> > > This code doesn't have all the ghastly tricks which we deploy to
> > > handle concurrent truncate.
> >
> > Do I need to ? Baaahh!! I don't want to deal with them.
>
> Nick, can you think of any serious consequences of a read/truncate race
> in there? I can't..

As it doesn't allow writes, then I _think_ it should be OK. If you ever
did want to add write(2) support, then you would have transient zeroes
problems.

But why not just hold i_mutex around the whole thing just to be safe?

--
SUSE Labs, Novell Inc.
Re: [PATCH] hugetlbfs read() support
On Thu, Jul 19, 2007 at 10:07:59AM -0700, Nishanth Aravamudan wrote:
> But I do think a second reason to do this is to make hugetlbfs behave
> like a normal fs -- that is read(), write(), etc. work on files in the
> mountpoint. But that is simply my opinion.

Mine as well.

-- wli
Re: [PATCH] hugetlbfs read() support
On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
>
> > > > +	}
> > > > +
> > > > +	offset += ret;
> > > > +	retval += ret;
> > > > +	len -= ret;
> > > > +	index += offset >> HPAGE_SHIFT;
> > > > +	offset &= ~HPAGE_MASK;
> > > > +
> > > > +	page_cache_release(page);
> > > > +	if (ret == nr && len)
> > > > +		continue;
> > > > +	goto out;
> > > > +	}
> > > > +out:
> > > > +	return retval;
> > > > +}
> > >
> > > This code doesn't have all the ghastly tricks which we deploy to
> > > handle concurrent truncate.
> >
> > Do I need to ? Baaahh!! I don't want to deal with them.
>
> Nick, can you think of any serious consequences of a read/truncate
> race in there? I can't..
>
> > All I want is a simple read() to get my oprofile working. Please
> > advise.
>
> Did you consider changing oprofile userspace to read the executable
> with mmap?

It's not actually oprofile's code, though, it's libbfd (used by
oprofile). And it works fine (presumably) for other binaries. Just not
for libhugetlbfs-relinked binaries, because hugetlbfs doesn't behave
like a normal ramfs (perhaps it shouldn't, but that's a different
argument).

But I do think a second reason to do this is to make hugetlbfs behave
like a normal fs -- that is read(), write(), etc. work on files in the
mountpoint. But that is simply my opinion.

Thanks,
Nish

--
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center
Re: [PATCH] hugetlbfs read() support
On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:

> > > +	}
> > > +
> > > +	offset += ret;
> > > +	retval += ret;
> > > +	len -= ret;
> > > +	index += offset >> HPAGE_SHIFT;
> > > +	offset &= ~HPAGE_MASK;
> > > +
> > > +	page_cache_release(page);
> > > +	if (ret == nr && len)
> > > +		continue;
> > > +	goto out;
> > > +	}
> > > +out:
> > > +	return retval;
> > > +}
> >
> > This code doesn't have all the ghastly tricks which we deploy to handle
> > concurrent truncate.
>
> Do I need to ? Baaahh!! I don't want to deal with them.

Nick, can you think of any serious consequences of a read/truncate race in
there? I can't..

> All I want is a simple read() to get my oprofile working.
> Please advise.

Did you consider changing oprofile userspace to read the executable with
mmap?
Re: [PATCH] hugetlbfs read() support
On Wed, 2007-07-18 at 22:19 -0700, Andrew Morton wrote:
> On Fri, 13 Jul 2007 18:23:33 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
>
> > Hi Andrew,
> >
> > Here is the patch to support read() for hugetlbfs, needed to get
> > oprofile working on executables backed by largepages.
> >
> > If you plan to consider Christoph Lameter's pagecache cleanup patches,
> > I will re-write this. Otherwise, please consider this for -mm.
> >
> > Thanks,
> > Badari
> >
> > Support for reading from hugetlbfs files. libhugetlbfs lets application
> > text/data to be placed in large pages. When we do that, oprofile doesn't
> > work - since libbfd tries to read from it.
> >
> > This code is very similar to what do_generic_mapping_read() does, but
> > I can't use it since it has PAGE_CACHE_SIZE assumptions.
> >
> > Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> > Acked-by: William Irwin <[EMAIL PROTECTED]>
> > Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
> >
> >  fs/hugetlbfs/inode.c | 113 +++
> >  1 file changed, 113 insertions(+)
> >
> > Index: linux-2.6.22/fs/hugetlbfs/inode.c
> > ===
> > --- linux-2.6.22.orig/fs/hugetlbfs/inode.c	2007-07-08 16:32:17.0 -0700
> > +++ linux-2.6.22/fs/hugetlbfs/inode.c	2007-07-13 19:24:36.0 -0700
> > @@ -156,6 +156,118 @@ full_search:
> >  }
> >  #endif
> >
> > +static int
> > +hugetlbfs_read_actor(struct page *page, unsigned long offset,
> > +			char __user *buf, unsigned long count,
> > +			unsigned long size)
> > +{
> > +	char *kaddr;
> > +	unsigned long left, copied = 0;
> > +	int i, chunksize;
> > +
> > +	if (size > count)
> > +		size = count;
> > +
> > +	/* Find which 4k chunk and offset with in that chunk */
> > +	i = offset >> PAGE_CACHE_SHIFT;
> > +	offset = offset & ~PAGE_CACHE_MASK;
> > +
> > +	while (size) {
> > +		chunksize = PAGE_CACHE_SIZE;
> > +		if (offset)
> > +			chunksize -= offset;
> > +		if (chunksize > size)
> > +			chunksize = size;
> > +		kaddr = kmap(&page[i]);
> > +		left = __copy_to_user(buf, kaddr + offset, chunksize);
> > +		kunmap(&page[i]);
> > +		if (left) {
> > +			copied += (chunksize - left);
> > +			break;
> > +		}
> > +		offset = 0;
> > +		size -= chunksize;
> > +		buf += chunksize;
> > +		copied += chunksize;
> > +		i++;
> > +	}
> > +	return copied ? copied : -EFAULT;
> > +}
>
> This returns -EFAULT when asked to read zero bytes. The caller prevents
> that, but it's a little bit ugly. Livable with.

I can fix that, but I didn't want to come here if length == 0 - so took
a shortcut.

> > +/*
> > + * Support for read() - Find the page attached to f_mapping and copy out the
> > + * data. Its *very* similar to do_generic_mapping_read(), we can't use that
> > + * since it has PAGE_CACHE_SIZE assumptions.
> > + */
> > +ssize_t
> > +hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
> > +{
> > +	struct address_space *mapping = filp->f_mapping;
> > +	struct inode *inode = mapping->host;
> > +	unsigned long index = *ppos >> HPAGE_SHIFT;
> > +	unsigned long end_index;
> > +	loff_t isize;
> > +	unsigned long offset;
> > +	ssize_t retval = 0;
> > +
> > +	/* validate length */
> > +	if (len == 0)
> > +		goto out;
> > +
> > +	isize = i_size_read(inode);
> > +	if (!isize)
> > +		goto out;
> > +
> > +	offset = *ppos & ~HPAGE_MASK;
> > +	end_index = (isize - 1) >> HPAGE_SHIFT;
> > +	for (;;) {
> > +		struct page *page;
> > +		int nr, ret;
> > +
> > +		/* nr is the maximum number of bytes to copy from this page */
> > +		nr = HPAGE_SIZE;
> > +		if (index >= end_index) {
> > +			if (index > end_index)
> > +				goto out;
> > +			nr = ((isize - 1) & ~HPAGE_MASK) + 1;
> > +			if (nr <= offset) {
> > +				goto out;
> > +			}
> > +		}
> > +		nr = nr - offset;
> > +
> > +		/* Find the page */
> > +		page = find_get_page(mapping, index);
> > +		if (unlikely(page == NULL)) {
> > +			/*
> > +			 * We can't find the page in the cache - bail out ?
> > +			 */
> > +			goto out;
> > +		}
> > +		/*
> > +		 * Ok, we have the page, copy it to user space buffer.
> > +		 */
> > +		ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
> > +		if (ret < 0) {
> > +			retval = retval ? : ret;
> > +			goto out;
>
> Missing put_page().

Yes. Thanks for
Re: [PATCH] hugetlbfs read() support
On Fri, 13 Jul 2007 18:23:33 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:

> Hi Andrew,
>
> Here is the patch to support read() for hugetlbfs, needed to get
> oprofile working on executables backed by largepages.
>
> If you plan to consider Christoph Lameter's pagecache cleanup patches,
> I will re-write this. Otherwise, please consider this for -mm.
>
> Thanks,
> Badari
>
> Support for reading from hugetlbfs files. libhugetlbfs lets application
> text/data to be placed in large pages. When we do that, oprofile doesn't
> work - since libbfd tries to read from it.
>
> This code is very similar to what do_generic_mapping_read() does, but
> I can't use it since it has PAGE_CACHE_SIZE assumptions.
>
> Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> Acked-by: William Irwin <[EMAIL PROTECTED]>
> Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
>
>  fs/hugetlbfs/inode.c | 113 +++
>  1 file changed, 113 insertions(+)
>
> Index: linux-2.6.22/fs/hugetlbfs/inode.c
> ===
> --- linux-2.6.22.orig/fs/hugetlbfs/inode.c	2007-07-08 16:32:17.0 -0700
> +++ linux-2.6.22/fs/hugetlbfs/inode.c	2007-07-13 19:24:36.0 -0700
> @@ -156,6 +156,118 @@ full_search:
>  }
>  #endif
>
> +static int
> +hugetlbfs_read_actor(struct page *page, unsigned long offset,
> +			char __user *buf, unsigned long count,
> +			unsigned long size)
> +{
> +	char *kaddr;
> +	unsigned long left, copied = 0;
> +	int i, chunksize;
> +
> +	if (size > count)
> +		size = count;
> +
> +	/* Find which 4k chunk and offset with in that chunk */
> +	i = offset >> PAGE_CACHE_SHIFT;
> +	offset = offset & ~PAGE_CACHE_MASK;
> +
> +	while (size) {
> +		chunksize = PAGE_CACHE_SIZE;
> +		if (offset)
> +			chunksize -= offset;
> +		if (chunksize > size)
> +			chunksize = size;
> +		kaddr = kmap(&page[i]);
> +		left = __copy_to_user(buf, kaddr + offset, chunksize);
> +		kunmap(&page[i]);
> +		if (left) {
> +			copied += (chunksize - left);
> +			break;
> +		}
> +		offset = 0;
> +		size -= chunksize;
> +		buf += chunksize;
> +		copied += chunksize;
> +		i++;
> +	}
> +	return copied ? copied : -EFAULT;
> +}

This returns -EFAULT when asked to read zero bytes. The caller prevents
that, but it's a little bit ugly. Livable with.

> +/*
> + * Support for read() - Find the page attached to f_mapping and copy out the
> + * data. Its *very* similar to do_generic_mapping_read(), we can't use that
> + * since it has PAGE_CACHE_SIZE assumptions.
> + */
> +ssize_t
> +hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
> +{
> +	struct address_space *mapping = filp->f_mapping;
> +	struct inode *inode = mapping->host;
> +	unsigned long index = *ppos >> HPAGE_SHIFT;
> +	unsigned long end_index;
> +	loff_t isize;
> +	unsigned long offset;
> +	ssize_t retval = 0;
> +
> +	/* validate length */
> +	if (len == 0)
> +		goto out;
> +
> +	isize = i_size_read(inode);
> +	if (!isize)
> +		goto out;
> +
> +	offset = *ppos & ~HPAGE_MASK;
> +	end_index = (isize - 1) >> HPAGE_SHIFT;
> +	for (;;) {
> +		struct page *page;
> +		int nr, ret;
> +
> +		/* nr is the maximum number of bytes to copy from this page */
> +		nr = HPAGE_SIZE;
> +		if (index >= end_index) {
> +			if (index > end_index)
> +				goto out;
> +			nr = ((isize - 1) & ~HPAGE_MASK) + 1;
> +			if (nr <= offset) {
> +				goto out;
> +			}
> +		}
> +		nr = nr - offset;
> +
> +		/* Find the page */
> +		page = find_get_page(mapping, index);
> +		if (unlikely(page == NULL)) {
> +			/*
> +			 * We can't find the page in the cache - bail out ?
> +			 */
> +			goto out;
> +		}
> +		/*
> +		 * Ok, we have the page, copy it to user space buffer.
> +		 */
> +		ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
> +		if (ret < 0) {
> +			retval = retval ? : ret;
> +			goto out;

Missing put_page().

> +		}
> +
> +		offset += ret;
> +		retval += ret;
> +		len -= ret;
> +		index += offset >> HPAGE_SHIFT;
> +		offset &= ~HPAGE_MASK;
> +
> +		page_cache_release(page);
> +		if (ret == nr && len)
> +
[PATCH] hugetlbfs read() support
Hi Andrew,

Here is the patch to support read() for hugetlbfs, needed to get
oprofile working on executables backed by largepages.

If you plan to consider Christoph Lameter's pagecache cleanup patches,
I will re-write this. Otherwise, please consider this for -mm.

Thanks,
Badari

Support for reading from hugetlbfs files. libhugetlbfs lets application
text/data be placed in large pages. When we do that, oprofile doesn't
work - since libbfd tries to read from it.

This code is very similar to what do_generic_mapping_read() does, but
I can't use it since it has PAGE_CACHE_SIZE assumptions.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
Acked-by: William Irwin <[EMAIL PROTECTED]>
Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

 fs/hugetlbfs/inode.c | 113 +++
 1 file changed, 113 insertions(+)

Index: linux-2.6.22/fs/hugetlbfs/inode.c
===
--- linux-2.6.22.orig/fs/hugetlbfs/inode.c	2007-07-08 16:32:17.0 -0700
+++ linux-2.6.22/fs/hugetlbfs/inode.c	2007-07-13 19:24:36.0 -0700
@@ -156,6 +156,118 @@ full_search:
 }
 #endif

+static int
+hugetlbfs_read_actor(struct page *page, unsigned long offset,
+			char __user *buf, unsigned long count,
+			unsigned long size)
+{
+	char *kaddr;
+	unsigned long left, copied = 0;
+	int i, chunksize;
+
+	if (size > count)
+		size = count;
+
+	/* Find which 4k chunk and offset with in that chunk */
+	i = offset >> PAGE_CACHE_SHIFT;
+	offset = offset & ~PAGE_CACHE_MASK;
+
+	while (size) {
+		chunksize = PAGE_CACHE_SIZE;
+		if (offset)
+			chunksize -= offset;
+		if (chunksize > size)
+			chunksize = size;
+		kaddr = kmap(&page[i]);
+		left = __copy_to_user(buf, kaddr + offset, chunksize);
+		kunmap(&page[i]);
+		if (left) {
+			copied += (chunksize - left);
+			break;
+		}
+		offset = 0;
+		size -= chunksize;
+		buf += chunksize;
+		copied += chunksize;
+		i++;
+	}
+	return copied ? copied : -EFAULT;
+}
+
+/*
+ * Support for read() - Find the page attached to f_mapping and copy out the
+ * data. Its *very* similar to do_generic_mapping_read(), we can't use that
+ * since it has PAGE_CACHE_SIZE assumptions.
+ */
+ssize_t
+hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
+{
+	struct address_space *mapping = filp->f_mapping;
+	struct inode *inode = mapping->host;
+	unsigned long index = *ppos >> HPAGE_SHIFT;
+	unsigned long end_index;
+	loff_t isize;
+	unsigned long offset;
+	ssize_t retval = 0;
+
+	/* validate length */
+	if (len == 0)
+		goto out;
+
+	isize = i_size_read(inode);
+	if (!isize)
+		goto out;
+
+	offset = *ppos & ~HPAGE_MASK;
+	end_index = (isize - 1) >> HPAGE_SHIFT;
+	for (;;) {
+		struct page *page;
+		int nr, ret;
+
+		/* nr is the maximum number of bytes to copy from this page */
+		nr = HPAGE_SIZE;
+		if (index >= end_index) {
+			if (index > end_index)
+				goto out;
+			nr = ((isize - 1) & ~HPAGE_MASK) + 1;
+			if (nr <= offset) {
+				goto out;
+			}
+		}
+		nr = nr - offset;
+
+		/* Find the page */
+		page = find_get_page(mapping, index);
+		if (unlikely(page == NULL)) {
+			/*
+			 * We can't find the page in the cache - bail out ?
+			 */
+			goto out;
+		}
+		/*
+		 * Ok, we have the page, copy it to user space buffer.
+		 */
+		ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
+		if (ret < 0) {
+			retval = retval ? : ret;
+			goto out;
+		}
+
+		offset += ret;
+		retval += ret;
+		len -= ret;
+		index += offset >> HPAGE_SHIFT;
+		offset &= ~HPAGE_MASK;
+
+		page_cache_release(page);
+		if (ret == nr && len)
+			continue;
+		goto out;
+	}
+out:
+	return retval;
+}
+
 /*
  * Read a page.  Again trivial.  If it didn't already exist
  * in the page cache, it is zero-filled.
@@ -560,6 +672,7 @@ static void init_once(void *foo, struct
 }

 const struct file_operations hugetlbfs_file_operations = {
+	.read