Re: [patch] rfc: introduce /dev/hugetlb

2007-03-25 Thread Arjan van de Ven

> But libraries are hard, for a number of distributional reasons.  

I don't see why this is the case, to be honest.
You can ask distros to ship your library, and if it's a sensible one,
they will. And if you can't wait, you can always bundle the library with
your application; it's really not a big deal to do that properly.

That's not a reason to make it a harder problem by tying a library to
the kernel source... in fact I know enterprise distros are more likely
to uprev a library than to uprev a kernel. Tying them together, you
get the worst of both worlds.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-24 Thread Andrew Morton
On Sat, 24 Mar 2007 07:57:52 +0100 Sam Ravnborg <[EMAIL PROTECTED]> wrote:

> > 
> > But for non-programming reasons, we're just not there yet: people want to
> > program direct to the kernel interfaces simply because of the
> > distribution/coordination problems with libraries.  It would be nice to fix
> > that problem.
> 
> What is then needed to get a small subset of user-space in the 
> kernel-development cycle?

Someone to lead the work, mainly.  It would be a large effort, a lot of
time and email traffic.

> Maybe a topic worth to take up at LKS...

Well, perhaps.  But unless someone with suitable experience has enough time
and energy to spare to make it happen, it won't be happening.



Re: [patch] rfc: introduce /dev/hugetlb

2007-03-24 Thread Andrew Morton
On Sat, 24 Mar 2007 00:11:32 -0700 "Ken Chen" <[EMAIL PROTECTED]> wrote:

> On 3/23/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > a) Ken observes that obtaining private hugetlb memory via hugetlbfs
> >involves "fuss".
> >
> > b) the libhugetlbfs maintainers then go off and implement a no-fuss way of
> >doing this.
> 
> Hmm, what started this thread was libhugetlbfs maintainer complained
> how "fuss" it was to create private hugetlb mapping and suggested an
> even bigger kernel change with pagetable_operations API.

OK.  I wasn't paying particularly close attention.  But my rant still
stands ;)

>  The new API
> was designed with an end goal of introduce /dev/hugetlb (as one of the
> feature, they might be thinking more).  What motivated me here is to
> point out that we can achieve the same goal of having a /dev/hugetlb
> with existing hugetlbfs infrastructure and the implementation is
> relatively straightforward.  What it also buys us is a bit more
> flexibility to the end user who wants to use the interface directly.

OK.

Why is it a "fuss" to do this with hugetlbfs files, btw?

Having read back through the thread, the only substantiation I can really
see is

  The pagetable_operations API opens up possibilities to do some
  additional (and completely sane) things.  For example, I have a patch
  that alters the character device code below to make use of a hugetlb
  ZERO_PAGE.  This eliminates almost all the up-front fault time, allowing
  pages to be COW'ed only when first written to.  We cannot do things like
  this with hugetlbfs anymore because we have a set of complex semantics to
  preserve.


Why is this actually a useful feature?

What does "complex semantics to preserve" mean?


I dunno.  I see a lot of code flying around, but comparatively little
effort to describe the actual problems which we're trying to solve.



Re: [patch] rfc: introduce /dev/hugetlb

2007-03-24 Thread Ken Chen

On 3/23/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> a) Ken observes that obtaining private hugetlb memory via hugetlbfs
>    involves "fuss".
>
> b) the libhugetlbfs maintainers then go off and implement a no-fuss way of
>    doing this.

Hmm, what started this thread was the libhugetlbfs maintainer complaining
about how much "fuss" it was to create a private hugetlb mapping and
suggesting an even bigger kernel change with the pagetable_operations API.
The new API was designed with the end goal of introducing /dev/hugetlb (as
one of its features; they might be thinking of more).  What motivated me
here is to point out that we can achieve the same goal of having a
/dev/hugetlb with the existing hugetlbfs infrastructure, and the
implementation is relatively straightforward.  What it also buys us is a
bit more flexibility for the end user who wants to use the interface
directly.


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Sam Ravnborg
> 
> But for non-programming reasons, we're just not there yet: people want to
> program direct to the kernel interfaces simply because of the
> distribution/coordination problems with libraries.  It would be nice to fix
> that problem.

What is then needed to get a small subset of user-space in the 
kernel-development cycle?
Maybe a topic worth taking up at LKS...

The build system is ready anyway, but that's the smallest issue of all :-(

Sam


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Andrew Morton
On Fri, 23 Mar 2007 22:32:31 -0700 "Nish Aravamudan" <[EMAIL PROTECTED]> wrote:

> > Probably the kernel team should be maintaining, via existing processes, a
> > separate libkernel project, to fix these distributional problems.  The
> > advantage in this case is of course that our new hugetlb functionality
> > would be available to people on 2.6.18 kernels, not only on 2.6.22 and
> > later.
> 
> That sounds like a good idea. For this hugetlb stuff, though, I plan
> on simply taking advantage of /dev/hugetlb (or whatever it is called)
> if it exists, and otherwise falling back to hugetlbfs (which
> admittedly requires some admin intervention (mounting hugetlbfs,
> permissions, and such), but then again, so does using hugepages in the
> first place (either at boot-time or via /proc/sys/vm/nr_hugepages)).
> Is that what you mean by available to 2.6.18 (falling back to
> hugetlbfs) and 2.6.22 (using the chardev)?

My point is:

a) Ken observes that obtaining private hugetlb memory via hugetlbfs
   involves "fuss".

b) the libhugetlbfs maintainers then go off and implement a no-fuss way of
   doing this.

c) voila, people can now use the new no-fuss interface on older kernels.
   Whereas Ken's kernel patch would require that they upgrade to a new
   kernel.

It wasn't a very big point ;) I'm assuming that users find that upgrading
libhugetlb is less costly than upgrading their kernel.
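
As a rough illustration of point (c), a library could hide the choice of
interface behind one call.  The sketch below is hypothetical:
hugepage_fd(), its parameters and the "hugepage-XXXXXX" file name are
invented here and are not libhugetlbfs API.  It tries the proposed
character device first and otherwise creates an unlinked file under a
hugetlbfs mount point supplied by the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return an fd that can be mmap()ed for huge pages.
 *   chardev: path of the proposed char device, e.g. "/dev/hugetlb"
 *   mount:   hugetlbfs mount point to fall back to on older kernels
 * Returns -1 if neither interface is usable.
 */
int hugepage_fd(const char *chardev, const char *mount)
{
	char path[4096];
	int fd, n;

	assert(chardev != NULL && mount != NULL);

	/* Newer kernel: the char device exists, no filesystem setup needed. */
	fd = open(chardev, O_RDWR);
	if (fd >= 0)
		return fd;

	/* Older kernel: create a file in the hugetlbfs mount instead. */
	n = snprintf(path, sizeof(path), "%s/hugepage-XXXXXX", mount);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;

	/* Unlink right away so the pages are freed when the fd is closed. */
	unlink(path);
	return fd;
}
```

A caller would then mmap() the returned fd the same way in both cases;
only the library needs to know which kernel it is running on.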


> > Am I wrong?
> 
> I don't think so. And hugepages are hard enough to use (and with
> enough architecture specific quirks) that it was worth creating
> libhugetlbfs. While having some nice features like segment remapping
> and overriding malloc, it is also meant to provide an API that is
> useful for general use of hugepages: we currently export
> gethugepagesize(), hugetlbfs_test_path() (verify a path is a valid
> hugetlbfs mount), hugetlbfs_find_path() (gives you the hugetlbfs
> mount) and hugetlbfs_unlinked_fd() (gives you an unlinked file in the
> hugetlbfs mount).
> 
> Then again, maybe I'm missing some much bigger picture here and you
> meant something completely different -- sorry for the noise in that
> case :/

You got it.

The fact that a kernel interface is "hard to use" really shouldn't be an
issue for us, because that hardness can be addressed in libraries.  Kernel
interfaces should be good, and complete, and maintainable, and etcetera. 
If that means that they end up hard to use, well, that's not necessarily a
bad thing.  I'm not sure that in all cases we want to be optimising for
ease-of-use just because libraries-are-hard.


But for non-programming reasons, we're just not there yet: people want to
program direct to the kernel interfaces simply because of the
distribution/coordination problems with libraries.  It would be nice to fix
that problem.


For a counter-example, look at futexes.  Their kernel interfaces are
*damned* hard to use.  But practically nobody is affected by that because
glibc solved the problem and programmers just use the pthread API.

More of this, please ;)
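
To make the futex point concrete, here is a small sketch of what calling
the raw interface involves.  glibc exports no futex() wrapper, so callers
go through syscall(2); the helper names raw_futex(), futex_wait_if() and
futex_wake() are made up for this example.  Programs using the pthread API
never see any of this.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin shim over the system call; timeout, uaddr2 and val3 are unused. */
static long raw_futex(uint32_t *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/*
 * Sleep only if *word still equals `expected`.  If the value has already
 * changed, the kernel returns -1 with errno set to EAGAIN instead.
 */
long futex_wait_if(uint32_t *word, uint32_t expected)
{
	assert(word != NULL);
	return raw_futex(word, FUTEX_WAIT, expected);
}

/* Wake up to `n` waiters on `word`; returns the number actually woken. */
long futex_wake(uint32_t *word, int n)
{
	assert(word != NULL);
	return raw_futex(word, FUTEX_WAKE, (uint32_t)n);
}
```

Building a correct mutex on top of these two calls still requires atomic
operations on the word and careful handling of races, which is the work
glibc does once on behalf of everyone.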



Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Nish Aravamudan

On 3/23/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Fri, 23 Mar 2007 01:44:38 -0700 "Ken Chen" <[EMAIL PROTECTED]> wrote:
>
> > On 3/21/07, Adam Litke <[EMAIL PROTECTED]> wrote:
> > > The main reason I am advocating a set of pagetable_operations is to
> > > enable the development of a new hugetlb interface.  During the hugetlb
> > > BOFS at OLS last year, we talked about a character device that would
> > > behave like /dev/zero.  Many of the people were talking about how they
> > > just wanted to create MAP_PRIVATE hugetlb mappings without all the fuss
> > > about the hugetlbfs filesystem.  /dev/zero is a familiar interface for
> > > getting anonymous memory so bringing that model to huge pages would make
> > > programming for anonymous huge pages easier.
> >
> > I think we have enough infrastructure currently in hugetlbfs to
> > implement what Adam wants for something like a /dev/hugetlb char
> > device (except we can't afford to have a zero hugetlb page since it
> > will be too costly on some arch).
> >
> > I really like the idea of having something similar to /dev/zero for
> > hugetlb page.  So I coded it up on top of existing hugetlbfs.  The
> > core change is really small and half of the patch is really just
> > moving things around.  I think this at least can partially fulfill the
> > goal.
>
> Standing back and looking at this...
>
> afaict the whole reason for this work is to provide a quick-n-easy way to
> get private mappings of hugetlb pages.  With the emphasis on quick-n-easy.

I agree.

> We can do the same with hugetlbfs, but that involves (horror) "fuss".

Yes.

> The way to avoid "fuss" is of course to do it once, do it properly then stick
> it in a library which everyone uses.

That's sort of what libhugetlbfs
(http://sourceforge.net/projects/libhugetlbfs for stable releases,
http://libhugetlbfs.ozlabs.org/ for development snapshots/git tree) is
for; while it currently only tries to abstract/provide functionality
via hugetlbfs, that's mostly because that is the only interface
available (or was, pending some sort of char dev being merged).

> But libraries are hard, for a number of distributional reasons.  It is
> easier for us to distribute this functionality within the kernel.  In fact,
> if Linus's tree included a ./userspace/libkernel/libhugetlb/ then we'd
> probably provide this functionality in there.

libhugetlbfs is available for both SLES10 and RHEL5.

> This comes up regularly, and it's pretty sad.

I agree. There is simply some functionality that is *very* closely
tied to the kernel.

> Probably the kernel team should be maintaining, via existing processes, a
> separate libkernel project, to fix these distributional problems.  The
> advantage in this case is of course that our new hugetlb functionality
> would be available to people on 2.6.18 kernels, not only on 2.6.22 and
> later.

That sounds like a good idea. For this hugetlb stuff, though, I plan
on simply taking advantage of /dev/hugetlb (or whatever it is called)
if it exists, and otherwise falling back to hugetlbfs (which
admittedly requires some admin intervention (mounting hugetlbfs,
permissions, and such), but then again, so does using hugepages in the
first place (either at boot-time or via /proc/sys/vm/nr_hugepages)).
Is that what you mean by available to 2.6.18 (falling back to
hugetlbfs) and 2.6.22 (using the chardev)?

> Am I wrong?

I don't think so. And hugepages are hard enough to use (and with
enough architecture specific quirks) that it was worth creating
libhugetlbfs. While having some nice features like segment remapping
and overriding malloc, it is also meant to provide an API that is
useful for general use of hugepages: we currently export
gethugepagesize(), hugetlbfs_test_path() (verify a path is a valid
hugetlbfs mount), hugetlbfs_find_path() (gives you the hugetlbfs
mount) and hugetlbfs_unlinked_fd() (gives you an unlinked file in the
hugetlbfs mount).

Then again, maybe I'm missing some much bigger picture here and you
meant something completely different -- sorry for the noise in that
case :/

Thanks,
Nish


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Andrew Morton
On Fri, 23 Mar 2007 01:44:38 -0700 "Ken Chen" <[EMAIL PROTECTED]> wrote:

> On 3/21/07, Adam Litke <[EMAIL PROTECTED]> wrote:
> > The main reason I am advocating a set of pagetable_operations is to
> > enable the development of a new hugetlb interface.  During the hugetlb
> > BOFS at OLS last year, we talked about a character device that would
> > behave like /dev/zero.  Many of the people were talking about how they
> > just wanted to create MAP_PRIVATE hugetlb mappings without all the fuss
> > about the hugetlbfs filesystem.  /dev/zero is a familiar interface for
> > getting anonymous memory so bringing that model to huge pages would make
> > programming for anonymous huge pages easier.
> 
> I think we have enough infrastructure currently in hugetlbfs to
> implement what Adam wants for something like a /dev/hugetlb char
> device (except we can't afford to have a zero hugetlb page since it
> will be too costly on some arch).
> 
> I really like the idea of having something similar to /dev/zero for
> hugetlb page.  So I coded it up on top of existing hugetlbfs.  The
> core change is really small and half of the patch is really just
> moving things around.  I think this at least can partially fulfill the
> goal.

Standing back and looking at this...

afaict the whole reason for this work is to provide a quick-n-easy way to
get private mappings of hugetlb pages.  With the emphasis on quick-n-easy.
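
For comparison, the /dev/zero idiom that the proposed device imitates is
about this short.  This is only a sketch of the familiar small-page case;
map_private_zero() is a name invented for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of private, zero-filled memory backed by /dev/zero.
 * Returns NULL on failure.
 */
void *map_private_zero(size_t len)
{
	void *p;
	int fd;

	assert(len > 0);

	fd = open("/dev/zero", O_RDWR);
	if (fd < 0)
		return NULL;

	/* MAP_PRIVATE: writes stay local to this process (copy-on-write). */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

	/* The mapping remains valid after the descriptor is closed. */
	close(fd);

	return p == MAP_FAILED ? NULL : p;
}
```

The request in this thread amounts to getting the same two-call pattern
for huge pages, without first locating or mounting a hugetlbfs instance.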

We can do the same with hugetlbfs, but that involves (horror) "fuss".

The way to avoid "fuss" is of course to do it once, do it properly then stick
it in a library which everyone uses.

But libraries are hard, for a number of distributional reasons.  It is
easier for us to distribute this functionality within the kernel.  In fact,
if Linus's tree included a ./userspace/libkernel/libhugetlb/ then we'd
probably provide this functionality in there.

This comes up regularly, and it's pretty sad.

Probably the kernel team should be maintaining, via existing processes, a
separate libkernel project, to fix these distributional problems.  The
advantage in this case is of course that our new hugetlb functionality
would be available to people on 2.6.18 kernels, not only on 2.6.22 and
later.

Am I wrong?


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Ken Chen

On 3/23/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
> On Fri, 23 Mar 2007, William Lee Irwin III wrote:
> >> Lack of compiletesting beyond x86-64 in all probability.
>
> On Fri, Mar 23, 2007 at 03:15:55PM +0000, Mel Gorman wrote:
> > Ok, this will go kablamo on Power then even if it compiles. I don't
> > consider it a fundamental problem though. For the purposes of an RFC, it's
> > grand and something that can be worked with.
>
> He needs to un-#ifdef the prototype (which he already does), but he
> needs to leave the definition under #ifdef while removing the static
> qualifier. A relatively minor fixup.


Yes, sorry about that; I lack access to non-x86-64 machines.  I
needed to move the function prototype to hugetlb.h and evidently
removed the #ifdef by mistake.  I'm not going to touch this in my next
clean-up patch; instead I will just declare a char-specific
file_operations struct in hugetlbfs and then have the char device
reference it.

Nevertheless, the hugetlb_get_unmapped_area function prototype had
better be in a header file somewhere.

- Ken


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Ken Chen

On 3/23/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
> I like this patch a lot, though I'm not likely to get around to testing
> it today. If userspace testcode is available that would be great to see
> posted so I can just boot into things and run that.


Here is the test code that I used:
(warning: x86 centric)

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define SIZE	(4*1024*1024UL)

int main(void)
{
	int fd;
	unsigned long i;
	char *addr;

	fd = open("/dev/hugetlb", O_RDWR);
	if (fd == -1) {
		perror("open failure");
		exit(1);
	}

	addr = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap failure");
		exit(2);
	}

	/* touch one byte every 4 KiB so the whole mapping is faulted in */
	for (i = 0; i < SIZE; i += 4096)
		addr[i] = 1;

	printf("success!\n");
	return 0;
}
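
The hard-coded 4 MB size is what makes the test x86 centric.  One way to
loosen that, sketched here under assumptions, is to read the huge page
size from the "Hugepagesize:" line of /proc/meminfo.  hugepagesize_kb() is
a hypothetical helper that parses text the caller has already read; it is
not the implementation of gethugepagesize() in libhugetlbfs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the "Hugepagesize:" value in kB from /proc/meminfo-style text,
 * or -1 if the field is absent or malformed.
 */
long hugepagesize_kb(const char *meminfo)
{
	const char *key = "Hugepagesize:";
	const char *p;
	long kb;

	assert(meminfo != NULL);

	p = strstr(meminfo, key);
	if (p == NULL)
		return -1;

	/* %ld skips the padding between the key and the number. */
	if (sscanf(p + strlen(key), "%ld", &kb) != 1)
		return -1;

	return kb;
}
```

The test program could then size its mapping as a multiple of that value
instead of assuming 4 MB.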


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Benjamin Herrenschmidt

> -#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
> -unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long 
> addr,
> - unsigned long len, unsigned long pgoff, unsigned long flags);
> -#else
> -static unsigned long
> +unsigned long
>  hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>   unsigned long len, unsigned long pgoff, unsigned long flags)
>  {
> @@ -150,7 +145,6 @@ full_search:
>   addr = ALIGN(vma->vm_end, HPAGE_SIZE);
>   }
>  }
> -#endif

WTF ? get_unmapped_area() -has- to be arch in some platforms like
power...

I'm trying to improve the whole get_unmapped_area() to better handle
multiple constraints (cacheability, page size, ...) though I haven't
quite yet settled on an interface I like.

Ben.




Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread William Lee Irwin III
On Fri, 23 Mar 2007, William Lee Irwin III wrote:
>> Lack of compiletesting beyond x86-64 in all probability.

On Fri, Mar 23, 2007 at 03:15:55PM +0000, Mel Gorman wrote:
> Ok, this will go kablamo on Power then even if it compiles. I don't 
> consider it a fundamental problem though. For the purposes of an RFC, it's 
> grand and something that can be worked with.

He needs to un-#ifdef the prototype (which he already does), but he
needs to leave the definition under #ifdef while removing the static
qualifier. A relatively minor fixup.
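
The shape of that fixup can be sketched in plain C.  All names below
(demo_get_unmapped_area, HAVE_ARCH_DEMO_UNMAPPED_AREA, DEMO_HPAGE_SIZE)
are stand-ins invented for the example, and the signature is simplified;
only the placement of the #ifndef and the missing static mirror the
suggestion.

```c
#include <assert.h>

/* Header part: the prototype is visible unconditionally. */
unsigned long demo_get_unmapped_area(unsigned long addr, unsigned long len);

/*
 * Generic part: non-static, but compiled only when the architecture
 * does not provide its own definition of the same function.
 */
#ifndef HAVE_ARCH_DEMO_UNMAPPED_AREA
#define DEMO_HPAGE_SIZE	(2UL * 1024 * 1024)

unsigned long demo_get_unmapped_area(unsigned long addr, unsigned long len)
{
	assert(len > 0);

	/* Round the address hint up to a huge page boundary. */
	return (addr + DEMO_HPAGE_SIZE - 1) & ~(DEMO_HPAGE_SIZE - 1);
}
#endif
```

An architecture that defines HAVE_ARCH_DEMO_UNMAPPED_AREA supplies its own
body, and every caller, including one outside the file, links against
whichever definition was built.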


-- wli


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Mel Gorman

On Fri, 23 Mar 2007, William Lee Irwin III wrote:

> On Fri, 23 Mar 2007, Ken Chen wrote:
> > >-#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
> > >-unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> > >-   unsigned long len, unsigned long pgoff, unsigned long flags);
> > >-#else
> > >-static unsigned long
> > >+unsigned long
> > >hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> > > unsigned long len, unsigned long pgoff, unsigned long flags)
> >
> On Fri, Mar 23, 2007 at 03:03:57PM +0000, Mel Gorman wrote:
> > What is going on here? Why do arches not get to specify a
> > get_unmapped_area any more?
>
> Lack of compiletesting beyond x86-64 in all probability.

Ok, this will go kablamo on Power then even if it compiles. I don't
consider it a fundamental problem though. For the purposes of an RFC, it's
grand and something that can be worked with.


--
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread William Lee Irwin III
On Fri, 23 Mar 2007, Ken Chen wrote:
> >-#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
> >-unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long 
> >addr,
> >-unsigned long len, unsigned long pgoff, unsigned long flags);
> >-#else
> >-static unsigned long
> >+unsigned long
> >hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> > unsigned long len, unsigned long pgoff, unsigned long flags)
> 
On Fri, Mar 23, 2007 at 03:03:57PM +0000, Mel Gorman wrote:
> What is going on here? Why do arches not get to specify a 
> get_unmapped_area any more?

Lack of compiletesting beyond x86-64 in all probability.


-- wli


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Mel Gorman

On Fri, 23 Mar 2007, Ken Chen wrote:


On 3/21/07, Adam Litke <[EMAIL PROTECTED]> wrote:

The main reason I am advocating a set of pagetable_operations is to
enable the development of a new hugetlb interface.  During the hugetlb
BOFS at OLS last year, we talked about a character device that would
behave like /dev/zero.  Many of the people were talking about how they
just wanted to create MAP_PRIVATE hugetlb mappings without all the fuss
about the hugetlbfs filesystem.  /dev/zero is a familiar interface for
getting anonymous memory so bringing that model to huge pages would make
programming for anonymous huge pages easier.


I think we have enough infrastructure currently in hugetlbfs to
implement what Adam wants for something like a /dev/hugetlb char
device (except we can't afford to have a zero hugetlb page since it
will be too costly on some arch).

I really like the idea of having something similar to /dev/zero for
hugetlb page.  So I coded it up on top of existing hugetlbfs.  The
core change is really small and half of the patch is really just
moving things around.  I think this at least can partially fulfill the
goal.



Good stuff. Let's take a look.



Signed-off-by: Ken Chen <[EMAIL PROTECTED]>

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index f5c160c..56e58f5 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -27,6 +27,7 @@
#include 
#include 
#include 
+#include 

#include 
#include 
@@ -872,6 +873,13 @@ static const struct file_operations oldmem_fops = {
};
#endif

+#ifdef CONFIG_HUGETLBFS
+static const struct file_operations hugetlb_fops = {
+   .mmap   = hugetlb_zero_setup,
+   .get_unmapped_area  = hugetlb_get_unmapped_area,
+};
+#endif


Ok, so we'd behave similarly to shared memory and use the internal mount.
Seems reasonable.



+
static ssize_t kmsg_write(struct file * file, const char __user * buf,
  size_t count, loff_t *ppos)
{
@@ -939,6 +947,11 @@ static int memory_open(struct inode *
filp->f_op = &oldmem_fops;
break;
#endif
+#ifdef CONFIG_HUGETLBFS
+   case 13:
+   filp->f_op = &hugetlb_fops;
+   break;
+#endif
default:
return -ENXIO;
}
@@ -971,6 +984,9 @@ static const struct {
#ifdef CONFIG_CRASH_DUMP
{12,"oldmem",S_IRUSR | S_IWUSR | S_IRGRP, &oldmem_fops},
#endif
+#ifdef CONFIG_HUGETLBFS
+   {13, "hugetlb",S_IRUGO | S_IWUGO, &hugetlb_fops},
+#endif
};

static struct class *mem_class;
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 8c718a3..af24664 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -97,12 +97,7 @@ out:
/*
* Called under down_write(mmap_sem).
*/
-
-#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
-unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-   unsigned long len, unsigned long pgoff, unsigned long flags);
-#else
-static unsigned long
+unsigned long
hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags)


What is going on here? Why do arches not get to specify a 
get_unmapped_area any more?



{
@@ -150,7 +145,6 @@ full_search:
addr = ALIGN(vma->vm_end, HPAGE_SIZE);
}
}
-#endif

/*
* Read a page. Again trivial. If it didn't already exist
@@ -734,7 +728,7 @@ static int can_do_hugetlb_shm(void)
can_do_mlock());
}

-struct file *hugetlb_zero_setup(size_t size)
+struct file *hugetlb_file_setup(size_t size, int resv)
{
int error = -ENOMEM;
struct file *file;
@@ -771,7 +765,7 @@ struct file *hugetlb_zero_setup(size_t size)
goto out_file;

error = -ENOMEM;
-   if (hugetlb_reserve_pages(inode, 0, size >> HPAGE_SHIFT))
+   if (resv && hugetlb_reserve_pages(inode, 0, size >> HPAGE_SHIFT))
goto out_inode;



This looks like it should be a separate patch altogether. At first glance, 
it seems reasonable enough - just not jammed in with a char device.



d_instantiate(dentry, inode);
@@ -795,6 +789,18 @@ out_shm_unlock:
return ERR_PTR(error);
}

+int hugetlb_zero_setup(struct file *file, struct vm_area_struct *vma)
+{
+   file = hugetlb_file_setup(vma->vm_end - vma->vm_start, 0);
+   if (IS_ERR(file))
+   return PTR_ERR(file);
+
+   if (vma->vm_file)
+   fput(vma->vm_file);
+   vma->vm_file = file;
+   return hugetlbfs_file_mmap(file, vma);
+}
+
static int __init init_hugetlbfs_fs(void)
{
int error;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3f3e7a6..d2a2190 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -163,9 +163,12 @@ static inline struct hugetlbfs_sb_info *

extern const struct file_operations hugetlbfs_file_operations;
extern struct vm_operations_struct hugetlb_vm_ops;
-struct file 

Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread William Lee Irwin III
On Fri, Mar 23, 2007 at 01:44:38AM -0700, Ken Chen wrote:
> I think we have enough infrastructure currently in hugetlbfs to
> implement what Adam wants for something like a /dev/hugetlb char
> device (except we can't afford to have a zero hugetlb page since it
> will be too costly on some arch).
> I really like the idea of having something similar to /dev/zero for
> hugetlb page.  So I coded it up on top of existing hugetlbfs.  The
> core change is really small and half of the patch is really just
> moving things around.  I think this at least can partially fulfill the
> goal.
> Signed-off-by: Ken Chen <[EMAIL PROTECTED]>

I like this patch a lot, though I'm not likely to get around to testing
it today. If userspace testcode is available that would be great to see
posted so I can just boot into things and run that.


-- wli


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread William Lee Irwin III
On Fri, 23 Mar 2007, Ken Chen wrote:
> -#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
> -unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> -   unsigned long len, unsigned long pgoff, unsigned long flags);
> -#else
> -static unsigned long
> +unsigned long
> hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>     unsigned long len, unsigned long pgoff, unsigned long flags)

On Fri, Mar 23, 2007 at 03:03:57PM +, Mel Gorman wrote:
> What is going on here? Why do arches not get to specify a
> get_unmapped_area any more?

Lack of compiletesting beyond x86-64 in all probability.


-- wli


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Mel Gorman

On Fri, 23 Mar 2007, William Lee Irwin III wrote:


> On Fri, 23 Mar 2007, Ken Chen wrote:
> > -#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
> > -unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> > -   unsigned long len, unsigned long pgoff, unsigned long flags);
> > -#else
> > -static unsigned long
> > +unsigned long
> > hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> >     unsigned long len, unsigned long pgoff, unsigned long flags)
>
> On Fri, Mar 23, 2007 at 03:03:57PM +, Mel Gorman wrote:
> > What is going on here? Why do arches not get to specify a
> > get_unmapped_area any more?
>
> Lack of compiletesting beyond x86-64 in all probability.



Ok, this will go kablamo on Power then even if it compiles. I don't 
consider it a fundamental problem though. For the purposes of an RFC, it's 
grand and something that can be worked with.


--
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread William Lee Irwin III
On Fri, 23 Mar 2007, William Lee Irwin III wrote:
> Lack of compiletesting beyond x86-64 in all probability.

On Fri, Mar 23, 2007 at 03:15:55PM +, Mel Gorman wrote:
> Ok, this will go kablamo on Power then even if it compiles. I don't
> consider it a fundamental problem though. For the purposes of an RFC, it's
> grand and something that can be worked with.

He needs to un-#ifdef the prototype (which he already does), but he
needs to leave the definition under #ifdef while removing the static
qualifier. A relatively minor fixup.
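
A rough user-space model of that fixup might look like the following. It is only a sketch: demo_get_unmapped_area and DEMO_HPAGE_SIZE are made-up stand-ins for hugetlb_get_unmapped_area() and HPAGE_SIZE, and the toy body merely rounds an address up rather than searching the address space. What it is meant to show is the layout: the prototype is visible unconditionally, while the generic, non-static definition is compiled only when the arch does not define HAVE_ARCH_HUGETLB_UNMAPPED_AREA.

```c
#include <assert.h>

/* Hypothetical stand-in for HPAGE_SIZE (2 MiB here); must be a power of two. */
#define DEMO_HPAGE_SIZE (2UL * 1024 * 1024)

/*
 * Prototype outside any #ifdef: callers in other files (for example a
 * char device) can always reference the symbol.
 */
unsigned long demo_get_unmapped_area(unsigned long addr, unsigned long len);

#ifndef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
/*
 * Generic fallback, no longer static. An arch that defines
 * HAVE_ARCH_HUGETLB_UNMAPPED_AREA supplies its own definition instead.
 * This toy body only rounds addr up to a huge page boundary.
 */
unsigned long demo_get_unmapped_area(unsigned long addr, unsigned long len)
{
	/* the length is expected to be a whole number of huge pages */
	assert(len > 0 && len % DEMO_HPAGE_SIZE == 0);
	return (addr + DEMO_HPAGE_SIZE - 1) & ~(DEMO_HPAGE_SIZE - 1);
}
#endif
```

Building the same file with -DHAVE_ARCH_HUGETLB_UNMAPPED_AREA would leave only the prototype, which is the situation an arch such as Power relies on.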


-- wli


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Benjamin Herrenschmidt

> -#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
> -unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> -   unsigned long len, unsigned long pgoff, unsigned long flags);
> -#else
> -static unsigned long
> +unsigned long
> hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>     unsigned long len, unsigned long pgoff, unsigned long flags)
> {
> @@ -150,7 +145,6 @@ full_search:
>     addr = ALIGN(vma->vm_end, HPAGE_SIZE);
>   }
> }
> -#endif

WTF ? get_unmapped_area() -has- to be arch in some platforms like
power...

I'm trying to improve the whole get_unmapped_area() to better handle
multiple constraints (cacheability, page size, ...) though I haven't
quite yet settled on an interface I like.

Ben.




Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Ken Chen

On 3/23/07, William Lee Irwin III [EMAIL PROTECTED] wrote:

> I like this patch a lot, though I'm not likely to get around to testing
> it today. If userspace testcode is available that would be great to see
> posted so I can just boot into things and run that.


Here is the test code that I used:
(warning: x86 centric)

#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>

#define SIZE	(4*1024*1024UL)

int main(void)
{
	int fd;
	long i;
	char *addr;

	fd = open("/dev/hugetlb", O_RDWR);
	if (fd == -1) {
		perror("open failure");
		exit(1);
	}

	addr = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap failure");
		exit(2);
	}

	/* touch the mapping every 4 KiB so every huge page gets faulted in */
	for (i = 0; i < SIZE; i += 4096)
		addr[i] = 1;

	printf("success!\n");
	return 0;
}
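
The 4 MB size and the 4096-byte stride are the x86-centric parts of the test program above. One way to make them portable might be to read the huge page size from the "Hugepagesize:" line of /proc/meminfo. The helper below is only a sketch under that assumption; parse_hugepagesize_kb is a made-up name, and it parses a buffer rather than the file itself so it can be tried on a machine with no huge pages configured.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value (in kB) of the "Hugepagesize:" line found in a
 * /proc/meminfo-style buffer, or -1 if the line is absent or malformed.
 * Hypothetical helper, not part of the patch or of any library.
 */
long parse_hugepagesize_kb(const char *meminfo)
{
	const char *key = "Hugepagesize:";
	const char *p = meminfo;
	long kb;

	assert(meminfo != NULL);
	while (p && *p) {
		/* only match the key at the start of a line */
		if (strncmp(p, key, strlen(key)) == 0) {
			if (sscanf(p + strlen(key), "%ld kB", &kb) == 1)
				return kb;
			return -1;
		}
		/* advance to the character after the next newline, if any */
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}
```

With the result in hand, the mapping length could be a multiple of kb * 1024 bytes and the loop could step by that amount instead of 4096.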


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Ken Chen

On 3/23/07, William Lee Irwin III [EMAIL PROTECTED] wrote:

> On Fri, 23 Mar 2007, William Lee Irwin III wrote:
> > Lack of compiletesting beyond x86-64 in all probability.
>
> On Fri, Mar 23, 2007 at 03:15:55PM +, Mel Gorman wrote:
> > Ok, this will go kablamo on Power then even if it compiles. I don't
> > consider it a fundamental problem though. For the purposes of an RFC, it's
> > grand and something that can be worked with.
>
> He needs to un-#ifdef the prototype (which he already does), but he
> needs to leave the definition under #ifdef while removing the static
> qualifier. A relatively minor fixup.


Yes, sorry about that; I don't have access to non-x86-64 machines.  I
needed to move the function prototype to hugetlb.h and evidently
removed the #ifdef by mistake.  I'm not going to touch this in my next
cleanup patch; instead I will just declare a char-specific
file_operations struct in hugetlbfs and have the char device
reference it.

But nevertheless, the hugetlb_get_unmapped_area function prototype had
better be in a header file somewhere.

- Ken


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Andrew Morton
On Fri, 23 Mar 2007 01:44:38 -0700 Ken Chen [EMAIL PROTECTED] wrote:

> On 3/21/07, Adam Litke <[EMAIL PROTECTED]> wrote:
> > The main reason I am advocating a set of pagetable_operations is to
> > enable the development of a new hugetlb interface.  During the hugetlb
> > BOFS at OLS last year, we talked about a character device that would
> > behave like /dev/zero.  Many of the people were talking about how they
> > just wanted to create MAP_PRIVATE hugetlb mappings without all the fuss
> > about the hugetlbfs filesystem.  /dev/zero is a familiar interface for
> > getting anonymous memory so bringing that model to huge pages would make
> > programming for anonymous huge pages easier.
>
> I think we have enough infrastructure currently in hugetlbfs to
> implement what Adam wants for something like a /dev/hugetlb char
> device (except we can't afford to have a zero hugetlb page since it
> will be too costly on some arch).
>
> I really like the idea of having something similar to /dev/zero for
> hugetlb page.  So I coded it up on top of existing hugetlbfs.  The
> core change is really small and half of the patch is really just
> moving things around.  I think this at least can partially fulfill the
> goal.

Standing back and looking at this...

afaict the whole reason for this work is to provide a quick-n-easy way to
get private mappings of hugetlb pages.  With the emphasis on quick-n-easy.

We can do the same with hugetlbfs, but that involves (horror) fuss.

The way to avoid fuss is of course to do it once, do it properly then stick
it in a library which everyone uses.

But libraries are hard, for a number of distributional reasons.  It is
easier for us to distribute this functionality within the kernel.  In fact,
if Linus's tree included a ./userspace/libkernel/libhugetlb/ then we'd
probably provide this functionality in there.

This comes up regularly, and it's pretty sad.

Probably the kernel team should be maintaining, via existing processes, a
separate libkernel project, to fix these distributional problems.  The
advantage in this case is of course that our new hugetlb functionality
would be available to people on 2.6.18 kernels, not only on 2.6.22 and
later.

Am I wrong?


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Nish Aravamudan

On 3/23/07, Andrew Morton [EMAIL PROTECTED] wrote:

> On Fri, 23 Mar 2007 01:44:38 -0700 Ken Chen <[EMAIL PROTECTED]> wrote:
>
> > On 3/21/07, Adam Litke <[EMAIL PROTECTED]> wrote:
> > > The main reason I am advocating a set of pagetable_operations is to
> > > enable the development of a new hugetlb interface.  During the hugetlb
> > > BOFS at OLS last year, we talked about a character device that would
> > > behave like /dev/zero.  Many of the people were talking about how they
> > > just wanted to create MAP_PRIVATE hugetlb mappings without all the fuss
> > > about the hugetlbfs filesystem.  /dev/zero is a familiar interface for
> > > getting anonymous memory so bringing that model to huge pages would make
> > > programming for anonymous huge pages easier.
> >
> > I think we have enough infrastructure currently in hugetlbfs to
> > implement what Adam wants for something like a /dev/hugetlb char
> > device (except we can't afford to have a zero hugetlb page since it
> > will be too costly on some arch).
> >
> > I really like the idea of having something similar to /dev/zero for
> > hugetlb page.  So I coded it up on top of existing hugetlbfs.  The
> > core change is really small and half of the patch is really just
> > moving things around.  I think this at least can partially fulfill the
> > goal.
>
> Standing back and looking at this...
>
> afaict the whole reason for this work is to provide a quick-n-easy way to
> get private mappings of hugetlb pages.  With the emphasis on quick-n-easy.


I agree.


> We can do the same with hugetlbfs, but that involves (horror) fuss.


Yes.


> The way to avoid fuss is of course to do it once, do it properly then stick
> it in a library which everyone uses.


That's sort of what libhugetlbfs
(http://sourceforge.net/projects/libhugetlbfs for stable releases,
http://libhugetlbfs.ozlabs.org/ for development snapshots/git tree) is
for; while it currently only tries to abstract/provide functionality
via hugetlbfs, that's mostly because that is the only interface
available (or was, pending some sort of char dev being merged).


> But libraries are hard, for a number of distributional reasons.  It is
> easier for us to distribute this functionality within the kernel.  In fact,
> if Linus's tree included a ./userspace/libkernel/libhugetlb/ then we'd
> probably provide this functionality in there.


libhugetlbfs is available for both SLES10 and RHEL5.


> This comes up regularly, and it's pretty sad.


I agree. There is simply some functionality that is *very* closely
tied to the kernel.


> Probably the kernel team should be maintaining, via existing processes, a
> separate libkernel project, to fix these distributional problems.  The
> advantage in this case is of course that our new hugetlb functionality
> would be available to people on 2.6.18 kernels, not only on 2.6.22 and
> later.


That sounds like a good idea. For this hugetlb stuff, though, I plan
on simply taking advantage of /dev/hugetlb (or whatever it is called)
if it exists, and otherwise falling back to hugetlbfs (which
admittedly requires some admin intervention (mounting hugetlbfs,
permissions, and such), but then again, so does using hugepages in the
first place (either at boot-time or via /proc/sys/vm/nr_hugepages)).
Is that what you mean by available to 2.6.18 (falling back to
hugetlbfs) and 2.6.22 (using the chardev)?
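
The probing order described here might be sketched roughly as below. The names struct hp_candidate and open_first_available are invented for illustration, the /dev/hugetlb name is not final, and real hugetlbfs use would additionally have to locate the mount point (for example with hugetlbfs_find_path()) and unlink the file once it is open. The sketch only shows the "prefer the char device, otherwise fall back" step.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* One way of obtaining a descriptor: a path plus the open(2) flags to use. */
struct hp_candidate {
	const char *path;
	int flags;
};

/*
 * Try each candidate in order of preference and return the first
 * descriptor that opens, or -1 if none does. If used is non-NULL it
 * receives the index of the candidate that succeeded.
 * Hypothetical helper, not an existing libhugetlbfs function.
 */
int open_first_available(const struct hp_candidate *c, size_t n, size_t *used)
{
	size_t i;

	assert(c != NULL);
	for (i = 0; i < n; i++) {
		/* the mode argument only matters when flags include O_CREAT */
		int fd = open(c[i].path, c[i].flags, 0600);
		if (fd >= 0) {
			if (used)
				*used = i;
			return fd;
		}
	}
	return -1;
}
```

A caller could list the char device first with O_RDWR and a file under the hugetlbfs mount second with O_CREAT | O_RDWR, then mmap() whichever descriptor comes back.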


> Am I wrong?


I don't think so. And hugepages are hard enough to use (and with
enough architecture specific quirks) that it was worth creating
libhugetlbfs. While having some nice features like segment remapping
and overriding malloc, it is also meant to provide an API that is
useful for general use of hugepages: we currently export
gethugepagesize(), hugetlbfs_test_path() (verify a path is a valid
hugetlbfs mount), hugetlbfs_find_path() (gives you the hugetlbfs
mount) and hugetlbfs_unlinked_fd() (gives you an unlinked file in the
hugetlbfs mount).

Then again, maybe I'm missing some much bigger picture here and you
meant something completely different -- sorry for the noise in that
case :/

Thanks,
Nish


Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Andrew Morton
On Fri, 23 Mar 2007 22:32:31 -0700 Nish Aravamudan [EMAIL PROTECTED] wrote:

> > Probably the kernel team should be maintaining, via existing processes, a
> > separate libkernel project, to fix these distributional problems.  The
> > advantage in this case is of course that our new hugetlb functionality
> > would be available to people on 2.6.18 kernels, not only on 2.6.22 and
> > later.
>
> That sounds like a good idea. For this hugetlb stuff, though, I plan
> on simply taking advantage of /dev/hugetlb (or whatever it is called)
> if it exists, and otherwise falling back to hugetlbfs (which
> admittedly requires some admin intervention (mounting hugetlbfs,
> permissions, and such), but then again, so does using hugepages in the
> first place (either at boot-time or via /proc/sys/vm/nr_hugepages)).
> Is that what you mean by available to 2.6.18 (falling back to
> hugetlbfs) and 2.6.22 (using the chardev)?

My point is:

a) Ken observes that obtaining private hugetlb memory via hugetlbfs
   involves fuss.

b) the libhugetlbfs maintainers then go off and implement a no-fuss way of
   doing this.

c) voila, people can now use the new no-fuss interface on older kernels.
   Whereas Ken's kernel patch would require that they upgrade to a new
   kernel.

It wasn't a very big point ;) I'm assuming that users find that upgrading
libhugetlb is less costly than upgrading their kernel.


> > Am I wrong?
>
> I don't think so. And hugepages are hard enough to use (and with
> enough architecture specific quirks) that it was worth creating
> libhugetlbfs. While having some nice features like segment remapping
> and overriding malloc, it is also meant to provide an API that is
> useful for general use of hugepages: we currently export
> gethugepagesize(), hugetlbfs_test_path() (verify a path is a valid
> hugetlbfs mount), hugetlbfs_find_path() (gives you the hugetlbfs
> mount) and hugetlbfs_unlinked_fd() (gives you an unlinked file in the
> hugetlbfs mount).
>
> Then again, maybe I'm missing some much bigger picture here and you
> meant something completely different -- sorry for the noise in that
> case :/

You got it.

The fact that a kernel interface is hard to use really shouldn't be an
issue for us, because that hardness can be addressed in libraries.  Kernel
interfaces should be good, and complete, and maintainable, and etcetera. 
If that means that they end up hard to use, well, that's not necessarily a
bad thing.  I'm not sure that in all cases we want to be optimising for
ease-of-use just because libraries-are-hard.


But for non-programming reasons, we're just not there yet: people want to
program direct to the kernel interfaces simply because of the
distribution/coordination problems with libraries.  It would be nice to fix
that problem.


For a counter-example, look at futexes.  Their kernel interfaces are
*damned* hard to use.  But practically nobody is affected by that because
glibc solved the problem and programmers just use the pthread API.

More of this, please ;)



Re: [patch] rfc: introduce /dev/hugetlb

2007-03-23 Thread Sam Ravnborg
 
> But for non-programming reasons, we're just not there yet: people want to
> program direct to the kernel interfaces simply because of the
> distribution/coordination problems with libraries.  It would be nice to fix
> that problem.

What is then needed to get a small subset of user-space into the
kernel-development cycle?
Maybe a topic worth taking up at LKS...

The build system is ready anyway, but that is the smallest issue of all :-(

Sam