Re: [PATCH] hugetlbfs read() support

2007-07-30 Thread dean gaudet
On Thu, 19 Jul 2007, Bill Irwin wrote:

> On Thu, Jul 19, 2007 at 10:07:59AM -0700, Nishanth Aravamudan wrote:
> > But I do think a second reason to do this is to make hugetlbfs behave
> > like a normal fs -- that is read(), write(), etc. work on files in the
> > mountpoint. But that is simply my opinion.
> 
> Mine as well.

ditto.  here's a few other things i've run into recently:

it should be possible to use cp(1) to load large datasets into a 
hugetlbfs.

it should be possible to use ftruncate() on hugetlbfs files.  (on a tmpfs 
it's req'd to extend the file before mmaping... on hugetlbfs it returns 
EINVAL or somesuch and mmap just magically extends files.)

it should be possible to statfs() and get usage info... this works only if 
you mount with size=N.

-dean
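
For readers less familiar with the tmpfs behaviour mentioned above (extend the file with ftruncate() before mmap()ing it), here is a minimal userspace sketch. It assumes an ordinary file or a tmpfs mount, not hugetlbfs; map_new_file() and the path passed to it are hypothetical names, not part of any library.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) `path`, extend it to `len` bytes with ftruncate(),
 * then map it shared and writable. Returns the mapping, or NULL on error.
 * map_new_file() is a hypothetical helper used only for illustration.
 */
static void *map_new_file(const char *path, size_t len)
{
	void *p;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return NULL;
	/* Without this step, touching the mapping past EOF raises SIGBUS. */
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	return p == MAP_FAILED ? NULL : p;
}
```

On hugetlbfs, as described above, the ftruncate() step reportedly failed at the time and mmap() extended the file by itself, so the same helper would not have carried over unchanged.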




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] hugetlbfs read() support

2007-07-23 Thread Nishanth Aravamudan
On 20.07.2007 [14:47:31 +1000], Nick Piggin wrote:
> Nishanth Aravamudan wrote:
> >On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:
> >
> >>On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> >>
> >>>>>+		}
> >>>>>+
> >>>>>+		offset += ret;
> >>>>>+		retval += ret;
> >>>>>+		len -= ret;
> >>>>>+		index += offset >> HPAGE_SHIFT;
> >>>>>+		offset &= ~HPAGE_MASK;
> >>>>>+
> >>>>>+		page_cache_release(page);
> >>>>>+		if (ret == nr && len)
> >>>>>+			continue;
> >>>>>+		goto out;
> >>>>>+	}
> >>>>>+out:
> >>>>>+	return retval;
> >>>>>+}
> >>>>
> >>>>This code doesn't have all the ghastly tricks which we deploy to
> >>>>handle concurrent truncate.
> >>>
> >>>Do I need to ? Baaahh!!  I don't want to deal with them.
> >>
> >>Nick, can you think of any serious consequences of a read/truncate
> >>race in there?  I can't..
> >>
> >>>All I want is a simple read() to get my oprofile working.  Please
> >>>advise.
> >>
> >>Did you consider changing oprofile userspace to read the executable
> >>with mmap?
> >
> >It's not actually oprofile's code, though, it's libbfd (used by
> >oprofile). And it works fine (presumably) for other binaries.
> 
> So... what's the problem with changing it? The fact that it is a
> library doesn't really make a difference except that you'll also help
> everyone else who links with it.

Well, I'm more concerned about testing that change: libbfd is rather core
code and used in a number of places. Also, libbfd's current code 'just
works' for every other filesystem concerned, or I'd expect it would have
been changed to mmap() before. I'm also terrified of binutils code :)
I'm also not sure who I'm 'helping', exactly, by changing it, beyond
users of libhugetlbfs and OProfile, who are equally helped by this
kernel patch (which, again, also has the added benefit of making
hugetlbfs appear to be more like a normal filesystem).

> It won't break backwards compatibility, and it will work on older
> kernels...

Fair enough. I'm looking into it, but I can't make any promises on
timelines.

Thanks,
Nish

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center


Re: [PATCH] hugetlbfs read() support

2007-07-20 Thread Nick Piggin

(sorry if this is a resend... something bad seems to have happened to me)

Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
>
>>> This code doesn't have all the ghastly tricks which we deploy to handle
>>> concurrent truncate.
>>
>> Do I need to ? Baaahh!!  I don't want to deal with them.
>
> Nick, can you think of any serious consequences of a read/truncate race in
> there?  I can't..

As it doesn't allow writes, then I _think_ it should be OK. If you
ever did want to add write(2) support, then you would have transient
zeroes problems.

But I'm not completely sure.. we've had a lot of (and still have
some known and probably unknown) bugs just in that single
generic_mapping_read function, most of which are due to our rabid
aversion to doing any locking whatsoever there.

So why not just hold i_mutex around the whole thing to be safe?
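
As an illustration of the "hold the lock around the whole thing" idea, here is a userspace model only, not kernel code. A pthread mutex stands in for i_mutex, and struct toy_inode, toy_truncate() and toy_read() are hypothetical names. The point it sketches is that the size check and the copy happen under one lock, so a truncate cannot land between them.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an inode: a size, some data, and a lock playing i_mutex. */
struct toy_inode {
	pthread_mutex_t lock;
	size_t size;
	char data[64];
};

/* Shrink the file; size and data change together under the lock. */
static void toy_truncate(struct toy_inode *ino, size_t new_size)
{
	pthread_mutex_lock(&ino->lock);
	if (new_size < ino->size) {
		memset(ino->data + new_size, 0, ino->size - new_size);
		ino->size = new_size;
	}
	pthread_mutex_unlock(&ino->lock);
}

/*
 * Read with the lock held across both the size check and the copy, so a
 * concurrent toy_truncate() cannot slip in between the two steps.
 */
static size_t toy_read(struct toy_inode *ino, size_t pos, char *buf, size_t len)
{
	size_t n = 0;

	pthread_mutex_lock(&ino->lock);
	if (pos < ino->size) {
		n = ino->size - pos;
		if (n > len)
			n = len;
		memcpy(buf, ino->data + pos, n);
	}
	pthread_mutex_unlock(&ino->lock);
	return n;
}
```

The cost is that readers serialise against each other as well as against truncate, which is presumably why the generic read path avoids such a lock.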

--
SUSE Labs, Novell Inc.


Re: [PATCH] hugetlbfs read() support

2007-07-20 Thread Nick Piggin

Nishanth Aravamudan wrote:
> On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:
>
>> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
>>
>>>>> +		}
>>>>> +
>>>>> +		offset += ret;
>>>>> +		retval += ret;
>>>>> +		len -= ret;
>>>>> +		index += offset >> HPAGE_SHIFT;
>>>>> +		offset &= ~HPAGE_MASK;
>>>>> +
>>>>> +		page_cache_release(page);
>>>>> +		if (ret == nr && len)
>>>>> +			continue;
>>>>> +		goto out;
>>>>> +	}
>>>>> +out:
>>>>> +	return retval;
>>>>> +}
>>>>
>>>> This code doesn't have all the ghastly tricks which we deploy to
>>>> handle concurrent truncate.
>>>
>>> Do I need to ? Baaahh!!  I don't want to deal with them.
>>
>> Nick, can you think of any serious consequences of a read/truncate
>> race in there?  I can't..
>>
>>> All I want is a simple read() to get my oprofile working.  Please
>>> advise.
>>
>> Did you consider changing oprofile userspace to read the executable
>> with mmap?
>
> It's not actually oprofile's code, though, it's libbfd (used by
> oprofile). And it works fine (presumably) for other binaries.


So... what's the problem with changing it? The fact that it is a
library doesn't really make a difference except that you'll also
help everyone else who links with it.

It won't break backwards compatibility, and it will work on older
kernels...

--
SUSE Labs, Novell Inc.


Re: [PATCH] hugetlbfs read() support

2007-07-20 Thread Badari Pulavarty
On Fri, 2007-07-20 at 14:29 +1000, Nick Piggin wrote:
> Andrew Morton wrote:
> > On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> >
> >>>> +		}
> >>>> +
> >>>> +		offset += ret;
> >>>> +		retval += ret;
> >>>> +		len -= ret;
> >>>> +		index += offset >> HPAGE_SHIFT;
> >>>> +		offset &= ~HPAGE_MASK;
> >>>> +
> >>>> +		page_cache_release(page);
> >>>> +		if (ret == nr && len)
> >>>> +			continue;
> >>>> +		goto out;
> >>>> +	}
> >>>> +out:
> >>>> +	return retval;
> >>>> +}
> >>>
> >>>This code doesn't have all the ghastly tricks which we deploy to handle
> >>>concurrent truncate.
> >>
> >>Do I need to ? Baaahh!!  I don't want to deal with them. 
> > 
> > 
> > Nick, can you think of any serious consequences of a read/truncate race in
> > there?  I can't..
> 
> As it doesn't allow writes, then I _think_ it should be OK. If you
> ever did want to add write(2) support, then you would have transient
> zeroes problems.

I have no plans to add write() support - unless there is real reason
for doing so.

> 
> But why not just hold i_mutex around the whole thing just to be safe?

Yeah. I can do that, just to be safe for future..

Thanks,
Badari



Re: [PATCH] hugetlbfs read() support

2007-07-20 Thread Nick Piggin

Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
>
>>>> +		}
>>>> +
>>>> +		offset += ret;
>>>> +		retval += ret;
>>>> +		len -= ret;
>>>> +		index += offset >> HPAGE_SHIFT;
>>>> +		offset &= ~HPAGE_MASK;
>>>> +
>>>> +		page_cache_release(page);
>>>> +		if (ret == nr && len)
>>>> +			continue;
>>>> +		goto out;
>>>> +	}
>>>> +out:
>>>> +	return retval;
>>>> +}
>>>
>>> This code doesn't have all the ghastly tricks which we deploy to handle
>>> concurrent truncate.
>>
>> Do I need to ? Baaahh!!  I don't want to deal with them.
>
> Nick, can you think of any serious consequences of a read/truncate race in
> there?  I can't..


As it doesn't allow writes, then I _think_ it should be OK. If you
ever did want to add write(2) support, then you would have transient
zeroes problems.

But why not just hold i_mutex around the whole thing just to be safe?

--
SUSE Labs, Novell Inc.


Re: [PATCH] hugetlbfs read() support

2007-07-19 Thread Bill Irwin
On Thu, Jul 19, 2007 at 10:07:59AM -0700, Nishanth Aravamudan wrote:
> But I do think a second reason to do this is to make hugetlbfs behave
> like a normal fs -- that is read(), write(), etc. work on files in the
> mountpoint. But that is simply my opinion.

Mine as well.


-- wli


Re: [PATCH] hugetlbfs read() support

2007-07-19 Thread Nishanth Aravamudan
On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> 
> > > > +		}
> > > > +
> > > > +		offset += ret;
> > > > +		retval += ret;
> > > > +		len -= ret;
> > > > +		index += offset >> HPAGE_SHIFT;
> > > > +		offset &= ~HPAGE_MASK;
> > > > +
> > > > +		page_cache_release(page);
> > > > +		if (ret == nr && len)
> > > > +			continue;
> > > > +		goto out;
> > > > +	}
> > > > +out:
> > > > +	return retval;
> > > > +}
> > > 
> > > This code doesn't have all the ghastly tricks which we deploy to
> > > handle concurrent truncate.
> > 
> > Do I need to ? Baaahh!!  I don't want to deal with them. 
> 
> Nick, can you think of any serious consequences of a read/truncate
> race in there?  I can't..
> 
> > All I want is a simple read() to get my oprofile working.  Please
> > advise.
> 
> Did you consider changing oprofile userspace to read the executable
> with mmap?

It's not actually oprofile's code, though, it's libbfd (used by
oprofile). And it works fine (presumably) for other binaries. Just not
for libhugetlbfs-relinked binaries because hugetlbfs doesn't behave like
a normal ramfs (perhaps it shouldn't, but that's a different argument).

But I do think a second reason to do this is to make hugetlbfs behave
like a normal fs -- that is read(), write(), etc. work on files in the
mountpoint. But that is simply my opinion.

Thanks,
Nish

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center


Re: [PATCH] hugetlbfs read() support

2007-07-19 Thread Andrew Morton
On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:

> > > +		}
> > > +
> > > +		offset += ret;
> > > +		retval += ret;
> > > +		len -= ret;
> > > +		index += offset >> HPAGE_SHIFT;
> > > +		offset &= ~HPAGE_MASK;
> > > +
> > > +		page_cache_release(page);
> > > +		if (ret == nr && len)
> > > +			continue;
> > > +		goto out;
> > > +	}
> > > +out:
> > > +	return retval;
> > > +}
> > 
> > This code doesn't have all the ghastly tricks which we deploy to handle
> > concurrent truncate.
> 
> Do I need to ? Baaahh!!  I don't want to deal with them. 

Nick, can you think of any serious consequences of a read/truncate race in
there?  I can't..

> All I want is a simple read() to get my oprofile working.
> Please advise.

Did you consider changing oprofile userspace to read the executable with
mmap?
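
To make the mmap alternative concrete, here is a minimal sketch of a userspace helper that fetches file bytes through a mapping instead of read(). copy_via_mmap() is a hypothetical name; this is not libbfd or oprofile code, and it assumes the file can be mapped read-only in one piece.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to `len` bytes starting at file offset `off` into `dst` by mapping
 * the file instead of calling read(). Returns the number of bytes copied
 * (short at EOF, like read()), or -1 on error.
 */
static ssize_t copy_via_mmap(const char *path, off_t off, void *dst, size_t len)
{
	struct stat st;
	ssize_t ret = -1;
	void *map;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (off < 0 || fstat(fd, &st) < 0)
		goto out;
	if (off >= st.st_size || len == 0) {
		ret = 0;	/* at or past EOF, or nothing requested */
		goto out;
	}
	if (len > (size_t)(st.st_size - off))
		len = (size_t)(st.st_size - off);

	/* Map the whole file so that `off` needs no page alignment. */
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out;
	memcpy(dst, (const char *)map + off, len);
	munmap(map, (size_t)st.st_size);
	ret = (ssize_t)len;
out:
	close(fd);
	return ret;
}
```

Mapping and unmapping on every call is wasteful; a real reader would more likely keep the mapping for the lifetime of the open file.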


Re: [PATCH] hugetlbfs read() support

2007-07-19 Thread Badari Pulavarty
On Wed, 2007-07-18 at 22:19 -0700, Andrew Morton wrote:
> On Fri, 13 Jul 2007 18:23:33 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> 
> > Hi Andrew,
> > 
> > Here is the patch to support read() for hugetlbfs, needed to get
> > oprofile working on executables backed by largepages. 
> > 
> > If you plan to consider Christoph Lameter's pagecache cleanup patches,
> > I will re-write this. Otherwise, please consider this for -mm.
> > 
> > Thanks,
> > Badari
> > 
> > Support for reading from hugetlbfs files. libhugetlbfs lets application
> > text/data be placed in large pages. When we do that, oprofile doesn't
> > work, since libbfd tries to read() from the hugetlbfs-backed file.
> > 
> > This code is very similar to what do_generic_mapping_read() does, but
> > I can't use it since it has PAGE_CACHE_SIZE assumptions.
> > 
> > Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> > Acked-by: William Irwin <[EMAIL PROTECTED]>
> > Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
> > 
> >  fs/hugetlbfs/inode.c |  113 +++
> >  1 file changed, 113 insertions(+)
> > 
> > Index: linux-2.6.22/fs/hugetlbfs/inode.c
> > ===
> > --- linux-2.6.22.orig/fs/hugetlbfs/inode.c  2007-07-08 16:32:17.0 -0700
> > +++ linux-2.6.22/fs/hugetlbfs/inode.c   2007-07-13 19:24:36.0 -0700
> > @@ -156,6 +156,118 @@ full_search:
> >  }
> >  #endif
> >  
> > +static int
> > +hugetlbfs_read_actor(struct page *page, unsigned long offset,
> > +   char __user *buf, unsigned long count,
> > +   unsigned long size)
> > +{
> > +   char *kaddr;
> > +   unsigned long left, copied = 0;
> > +   int i, chunksize;
> > +
> > +   if (size > count)
> > +   size = count;
> > +
> > +   /* Find which 4k chunk and offset with in that chunk */
> > +   i = offset >> PAGE_CACHE_SHIFT;
> > +   offset = offset & ~PAGE_CACHE_MASK;
> > +
> > +   while (size) {
> > +   chunksize = PAGE_CACHE_SIZE;
> > +   if (offset)
> > +   chunksize -= offset;
> > +   if (chunksize > size)
> > +   chunksize = size;
> > +   kaddr = kmap(&page[i]);
> > +   left = __copy_to_user(buf, kaddr + offset, chunksize);
> > +   kunmap(&page[i]);
> > +   if (left) {
> > +   copied += (chunksize - left);
> > +   break;
> > +   }
> > +   offset = 0;
> > +   size -= chunksize;
> > +   buf += chunksize;
> > +   copied += chunksize;
> > +   i++;
> > +   }
> > +   return copied ? copied : -EFAULT;
> > +}
> 
> This returns -EFAULT when asked to read zero bytes.  The caller prevents
> that, but it's a little bit ugly.  Livable with.

I can fix that, but I didn't want to come here if length == 0 - so
took a shortcut.

> 
> > +/*
> > + * Support for read() - Find the page attached to f_mapping and copy out the
> > + * data. Its *very* similar to do_generic_mapping_read(), we can't use that
> > + * since it has PAGE_CACHE_SIZE assumptions.
> > + */
> > +ssize_t
> > +hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
> > +{
> > +   struct address_space *mapping = filp->f_mapping;
> > +   struct inode *inode = mapping->host;
> > +   unsigned long index = *ppos >> HPAGE_SHIFT;
> > +   unsigned long end_index;
> > +   loff_t isize;
> > +   unsigned long offset;
> > +   ssize_t retval = 0;
> > +
> > +   /* validate length */
> > +   if (len == 0)
> > +   goto out;
> > +
> > +   isize = i_size_read(inode);
> > +   if (!isize)
> > +   goto out;
> > +
> > +   offset = *ppos & ~HPAGE_MASK;
> > +   end_index = (isize - 1) >> HPAGE_SHIFT;
> > +   for (;;) {
> > +   struct page *page;
> > +   int nr, ret;
> > +
> > +   /* nr is the maximum number of bytes to copy from this page */
> > +   nr = HPAGE_SIZE;
> > +   if (index >= end_index) {
> > +   if (index > end_index)
> > +   goto out;
> > +   nr = ((isize - 1) & ~HPAGE_MASK) + 1;
> > +   if (nr <= offset) {
> > +   goto out;
> > +   }
> > +   }
> > +   nr = nr - offset;
> > +
> > +   /* Find the page */
> > +   page = find_get_page(mapping, index);
> > +   if (unlikely(page == NULL)) {
> > +   /*
> > +* We can't find the page in the cache - bail out ?
> > +*/
> > +   goto out;
> > +   }
> > +   /*
> > +* Ok, we have the page, copy it to user space buffer.
> > +*/
> > +   ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
> > +   if (ret < 0) {
> > +   retval = retval ? : ret;
> > +   goto out;
> 
> Missing put_page().

Yes. Thanks for

Re: [PATCH] hugetlbfs read() support

2007-07-18 Thread Andrew Morton
On Fri, 13 Jul 2007 18:23:33 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:

> Hi Andrew,
> 
> Here is the patch to support read() for hugetlbfs, needed to get
> oprofile working on executables backed by largepages. 
> 
> If you plan to consider Christoph Lameter's pagecache cleanup patches,
> I will re-write this. Otherwise, please consider this for -mm.
> 
> Thanks,
> Badari
> 
> Support for reading from hugetlbfs files. libhugetlbfs lets application
> text/data be placed in large pages. When we do that, oprofile doesn't
> work, since libbfd tries to read() from the hugetlbfs-backed file.
> 
> This code is very similar to what do_generic_mapping_read() does, but
> I can't use it since it has PAGE_CACHE_SIZE assumptions.
> 
> Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> Acked-by: William Irwin <[EMAIL PROTECTED]>
> Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
> 
>  fs/hugetlbfs/inode.c |  113 +++
>  1 file changed, 113 insertions(+)
> 
> Index: linux-2.6.22/fs/hugetlbfs/inode.c
> ===
> --- linux-2.6.22.orig/fs/hugetlbfs/inode.c  2007-07-08 16:32:17.0 -0700
> +++ linux-2.6.22/fs/hugetlbfs/inode.c 2007-07-13 19:24:36.0 -0700
> @@ -156,6 +156,118 @@ full_search:
>  }
>  #endif
>  
> +static int
> +hugetlbfs_read_actor(struct page *page, unsigned long offset,
> + char __user *buf, unsigned long count,
> + unsigned long size)
> +{
> + char *kaddr;
> + unsigned long left, copied = 0;
> + int i, chunksize;
> +
> + if (size > count)
> + size = count;
> +
> + /* Find which 4k chunk and offset with in that chunk */
> + i = offset >> PAGE_CACHE_SHIFT;
> + offset = offset & ~PAGE_CACHE_MASK;
> +
> + while (size) {
> + chunksize = PAGE_CACHE_SIZE;
> + if (offset)
> + chunksize -= offset;
> + if (chunksize > size)
> + chunksize = size;
> + kaddr = kmap(&page[i]);
> + left = __copy_to_user(buf, kaddr + offset, chunksize);
> + kunmap(&page[i]);
> + if (left) {
> + copied += (chunksize - left);
> + break;
> + }
> + offset = 0;
> + size -= chunksize;
> + buf += chunksize;
> + copied += chunksize;
> + i++;
> + }
> + return copied ? copied : -EFAULT;
> +}

This returns -EFAULT when asked to read zero bytes.  The caller prevents
that, but it's a little bit ugly.  Livable with.
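
One way the zero-byte case could be made explicit is sketched below as a userspace model, not as the kernel fix. toy_copy_to_user() is a hypothetical stand-in for __copy_to_user() (it returns the number of bytes NOT copied), CHUNK_SIZE stands in for PAGE_CACHE_SIZE, and the `writable` parameter exists only so a fault can be simulated.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096UL	/* stands in for PAGE_CACHE_SIZE */

/* Hypothetical __copy_to_user() stand-in: returns bytes NOT copied. */
static unsigned long toy_copy_to_user(char *dst, const char *src,
				      unsigned long n, unsigned long writable)
{
	unsigned long ok = n < writable ? n : writable;

	memcpy(dst, src, ok);
	return n - ok;
}

/*
 * Copy min(size, count) bytes starting at `offset` in `src`, one chunk at
 * a time. Returns the bytes copied, 0 when nothing was requested, and
 * -EFAULT only when a copy failed before any byte was transferred.
 */
static long toy_read_actor(const char *src, unsigned long offset, char *dst,
			   unsigned long count, unsigned long size,
			   unsigned long writable)
{
	unsigned long left, chunk, copied = 0;

	if (size > count)
		size = count;

	while (size) {
		/* Never cross a chunk boundary in a single copy. */
		chunk = CHUNK_SIZE - (offset % CHUNK_SIZE);
		if (chunk > size)
			chunk = size;
		left = toy_copy_to_user(dst + copied, src + offset, chunk,
					writable - copied);
		copied += chunk - left;
		if (left)
			return copied ? (long)copied : -EFAULT;
		offset += chunk;
		size -= chunk;
	}
	return (long)copied;
}
```

Because -EFAULT is returned only from inside the failure branch, a request for zero bytes falls through the loop and yields 0 without relying on the caller's check.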

> +/*
> + * Support for read() - Find the page attached to f_mapping and copy out the
> + * data. Its *very* similar to do_generic_mapping_read(), we can't use that
> + * since it has PAGE_CACHE_SIZE assumptions.
> + */
> +ssize_t
> +hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
> +{
> + struct address_space *mapping = filp->f_mapping;
> + struct inode *inode = mapping->host;
> + unsigned long index = *ppos >> HPAGE_SHIFT;
> + unsigned long end_index;
> + loff_t isize;
> + unsigned long offset;
> + ssize_t retval = 0;
> +
> + /* validate length */
> + if (len == 0)
> + goto out;
> +
> + isize = i_size_read(inode);
> + if (!isize)
> + goto out;
> +
> + offset = *ppos & ~HPAGE_MASK;
> + end_index = (isize - 1) >> HPAGE_SHIFT;
> + for (;;) {
> + struct page *page;
> + int nr, ret;
> +
> + /* nr is the maximum number of bytes to copy from this page */
> + nr = HPAGE_SIZE;
> + if (index >= end_index) {
> + if (index > end_index)
> + goto out;
> + nr = ((isize - 1) & ~HPAGE_MASK) + 1;
> + if (nr <= offset) {
> + goto out;
> + }
> + }
> + nr = nr - offset;
> +
> + /* Find the page */
> + page = find_get_page(mapping, index);
> + if (unlikely(page == NULL)) {
> + /*
> +  * We can't find the page in the cache - bail out ?
> +  */
> + goto out;
> + }
> + /*
> +  * Ok, we have the page, copy it to user space buffer.
> +  */
> + ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
> + if (ret < 0) {
> + retval = retval ? : ret;
> + goto out;

Missing put_page().
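
The shape of the fix can be sketched with a toy reference count; this is a userspace model with hypothetical names (struct toy_page, toy_get_page(), toy_put_page(), toy_read_one()), not the kernel API. What it shows is that the reference taken by the lookup has to be dropped on the error exit as well as on the normal one.

```c
/* Toy page with a reference count; stands in for struct page. */
struct toy_page {
	int refcount;
};

/* Like a lookup that returns the page with an extra reference held. */
static struct toy_page *toy_get_page(struct toy_page *p)
{
	p->refcount++;
	return p;
}

static void toy_put_page(struct toy_page *p)
{
	p->refcount--;
}

/*
 * One loop iteration: take a reference, run the copy step, and drop the
 * reference on every exit path. `copy_result` is the hypothetical outcome
 * of the copy step (bytes copied, or a negative errno).
 */
static long toy_read_one(struct toy_page *p, long copy_result)
{
	struct toy_page *page = toy_get_page(p);
	long ret = copy_result;

	if (ret < 0) {
		toy_put_page(page);	/* the release the error path lacked */
		return ret;
	}

	toy_put_page(page);
	return ret;
}
```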

> + }
> +
> + offset += ret;
> + retval += ret;
> + len -= ret;
> + index += offset >> HPAGE_SHIFT;
> + offset &= ~HPAGE_MASK;
> +
> + page_cache_release(page);
> + if (ret == nr && len)
> +

[PATCH] hugetlbfs read() support

2007-07-13 Thread Badari Pulavarty
Hi Andrew,

Here is the patch to support read() for hugetlbfs, needed to get
oprofile working on executables backed by largepages. 

If you plan to consider Christoph Lameter's pagecache cleanup patches,
I will re-write this. Otherwise, please consider this for -mm.

Thanks,
Badari

Support for reading from hugetlbfs files. libhugetlbfs lets application
text/data be placed in large pages. When we do that, oprofile doesn't
work, since libbfd tries to read() from the hugetlbfs-backed file.

This code is very similar to what do_generic_mapping_read() does, but
I can't use it since it has PAGE_CACHE_SIZE assumptions.
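
To illustrate the two levels of arithmetic involved (huge-page index and offset in the read loop, then small-page chunks inside the actor), here is a standalone sketch. It assumes 2 MiB huge pages and 4 KiB base pages; the real HPAGE_SHIFT depends on the architecture, and every TOY_* and toy_* name is hypothetical.

```c
#include <stdint.h>

#define TOY_HPAGE_SHIFT 21			/* assumption: 2 MiB huge pages */
#define TOY_HPAGE_SIZE  (1ULL << TOY_HPAGE_SHIFT)
#define TOY_HPAGE_MASK  (~(TOY_HPAGE_SIZE - 1))
#define TOY_PAGE_SHIFT  12			/* assumption: 4 KiB base pages */
#define TOY_PAGE_SIZE   (1ULL << TOY_PAGE_SHIFT)

struct toy_pos {
	uint64_t hpage_index;	/* which huge page of the file */
	uint64_t hpage_offset;	/* byte offset inside that huge page */
	uint64_t subpage;	/* which 4 KiB chunk inside the huge page */
	uint64_t sub_offset;	/* byte offset inside that chunk */
};

/* Split a file position the way the read loop and the actor do. */
static struct toy_pos toy_split(uint64_t pos)
{
	struct toy_pos p;

	p.hpage_index = pos >> TOY_HPAGE_SHIFT;
	p.hpage_offset = pos & ~TOY_HPAGE_MASK;
	p.subpage = p.hpage_offset >> TOY_PAGE_SHIFT;
	p.sub_offset = p.hpage_offset & (TOY_PAGE_SIZE - 1);
	return p;
}

/*
 * Mirror of the "nr" computation: how many bytes may be read from huge
 * page `index` starting at `offset`, for a file of `isize` bytes.
 */
static uint64_t toy_max_bytes(uint64_t isize, uint64_t index, uint64_t offset)
{
	uint64_t end_index, nr = TOY_HPAGE_SIZE;

	if (isize == 0)
		return 0;
	end_index = (isize - 1) >> TOY_HPAGE_SHIFT;
	if (index > end_index)
		return 0;
	if (index == end_index) {
		/* The last huge page may be only partly covered by the file. */
		nr = ((isize - 1) & ~TOY_HPAGE_MASK) + 1;
		if (nr <= offset)
			return 0;
	}
	return nr - offset;
}
```

For example, with these assumed sizes a position of 5 MiB + 4100 bytes falls in huge page 2, in 4 KiB chunk 257 of that page, 4 bytes into the chunk.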

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
Acked-by: William Irwin <[EMAIL PROTECTED]>
Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

 fs/hugetlbfs/inode.c |  113 +++
 1 file changed, 113 insertions(+)

Index: linux-2.6.22/fs/hugetlbfs/inode.c
===
--- linux-2.6.22.orig/fs/hugetlbfs/inode.c  2007-07-08 16:32:17.0 -0700
+++ linux-2.6.22/fs/hugetlbfs/inode.c   2007-07-13 19:24:36.0 -0700
@@ -156,6 +156,118 @@ full_search:
 }
 #endif
 
+static int
+hugetlbfs_read_actor(struct page *page, unsigned long offset,
+			char __user *buf, unsigned long count,
+			unsigned long size)
+{
+	char *kaddr;
+	unsigned long left, copied = 0;
+	int i, chunksize;
+
+	if (size > count)
+		size = count;
+
+	/* Find which 4k chunk and offset with in that chunk */
+	i = offset >> PAGE_CACHE_SHIFT;
+	offset = offset & ~PAGE_CACHE_MASK;
+
+	while (size) {
+		chunksize = PAGE_CACHE_SIZE;
+		if (offset)
+			chunksize -= offset;
+		if (chunksize > size)
+			chunksize = size;
+		kaddr = kmap(&page[i]);
+		left = __copy_to_user(buf, kaddr + offset, chunksize);
+		kunmap(&page[i]);
+		if (left) {
+			copied += (chunksize - left);
+			break;
+		}
+		offset = 0;
+		size -= chunksize;
+		buf += chunksize;
+		copied += chunksize;
+		i++;
+	}
+	return copied ? copied : -EFAULT;
+}
+
+/*
+ * Support for read() - Find the page attached to f_mapping and copy out the
+ * data. Its *very* similar to do_generic_mapping_read(), we can't use that
+ * since it has PAGE_CACHE_SIZE assumptions.
+ */
+ssize_t
+hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
+{
+	struct address_space *mapping = filp->f_mapping;
+	struct inode *inode = mapping->host;
+	unsigned long index = *ppos >> HPAGE_SHIFT;
+	unsigned long end_index;
+	loff_t isize;
+	unsigned long offset;
+	ssize_t retval = 0;
+
+	/* validate length */
+	if (len == 0)
+		goto out;
+
+	isize = i_size_read(inode);
+	if (!isize)
+		goto out;
+
+	offset = *ppos & ~HPAGE_MASK;
+	end_index = (isize - 1) >> HPAGE_SHIFT;
+	for (;;) {
+		struct page *page;
+		int nr, ret;
+
+		/* nr is the maximum number of bytes to copy from this page */
+		nr = HPAGE_SIZE;
+		if (index >= end_index) {
+			if (index > end_index)
+				goto out;
+			nr = ((isize - 1) & ~HPAGE_MASK) + 1;
+			if (nr <= offset) {
+				goto out;
+			}
+		}
+		nr = nr - offset;
+
+		/* Find the page */
+		page = find_get_page(mapping, index);
+		if (unlikely(page == NULL)) {
+			/*
+			 * We can't find the page in the cache - bail out ?
+			 */
+			goto out;
+		}
+		/*
+		 * Ok, we have the page, copy it to user space buffer.
+		 */
+		ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
+		if (ret < 0) {
+			retval = retval ? : ret;
+			goto out;
+		}
+
+		offset += ret;
+		retval += ret;
+		len -= ret;
+		index += offset >> HPAGE_SHIFT;
+		offset &= ~HPAGE_MASK;
+
+		page_cache_release(page);
+		if (ret == nr && len)
+			continue;
+		goto out;
+	}
+out:
+	return retval;
+}
+
 /*
  * Read a page. Again trivial. If it didn't already exist
  * in the page cache, it is zero-filled.
@@ -560,6 +672,7 @@ static void init_once(void *foo, struct 
 }
 
 const struct file_operations hugetlbfs_file_operations = {
+   .read