Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2)

2007-03-21 Thread Christoph Hellwig
On Wed, Mar 21, 2007 at 03:55:54PM +, Hugh Dickins wrote:
> On Mon, 19 Mar 2007, Adam Litke wrote:
> > Andrew, given the favorable review of these patches the last time around,
> > would you consider them for the -mm tree?  Does anyone else have any
> > objections?
> 
> I quite fail to understand the enthusiasm for these patches.  All they
> do is make the already ugly interfaces to hugetlb more obscure than at
> present, and open the door to even uglier stuff later.  Don't you need
> to wait for at least one other user of these interfaces to emerge,
> to get a better idea of whether they're appropriate?

*nod*
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2)

2007-03-21 Thread Hugh Dickins
On Mon, 19 Mar 2007, Adam Litke wrote:
> Andrew, given the favorable review of these patches the last time around,
> would you consider them for the -mm tree?  Does anyone else have any
> objections?

I quite fail to understand the enthusiasm for these patches.  All they
do is make the already ugly interfaces to hugetlb more obscure than at
present, and open the door to even uglier stuff later.  Don't you need
to wait for at least one other user of these interfaces to emerge,
to get a better idea of whether they're appropriate?

Hugh


Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2)

2007-03-20 Thread William Lee Irwin III
On Mon, Mar 19, 2007 at 01:05:02PM -0700, Adam Litke wrote:
> Andrew, given the favorable review of these patches the last time
> around, would you consider them for the -mm tree?  Does anyone else
> have any objections?

We need a new round of commentary for how it should integrate with
Nick Piggin's fault handling patches given that both introduce very
similar ->fault() methods, albeit at different places and for different
purposes.

I think things weren't entirely wrapped up last time but there was
general approval in concept and code-level issues had been gotten past.
I've forgotten the conclusion of hch and arjan's commentary on making
the pagetable operations mandatory. ISTR they were all cosmetic affairs
like that or whether they should be part of ->vm_ops as opposed to
fundamental issues.

The last thing I'd want to do is hold things back, so by no means
delay merging etc. on account of this, but I am curious on several
points. First, is there any demonstrable overhead to mandatory indirect
calls for the pagetable operations? Second, can case analysis for e.g.
file-backed vs. anon and/or COW vs. shared be avoided by the use of
the indirect function call, or more specifically, to any beneficial
effect? Well, I rearranged the code in such a manner ca. 2.6.6 so I
know the rearrangement is possible, but not the performance impact vs.
modern kernels, if any, never mind how the code ends up looking in
modern kernels. Third, could you use lmbench or some such to get direct
fork() and fault handling microbenchmarks? Kernel compiles are too
close to macrobenchmarks to say anything concrete there apart from that
other issues (e.g. SMP load balancing, NUMA, lock contention, etc.)
dominate indirect calls. If you have the time or interest to explore
any of these areas, I'd be very interested in hearing the results.
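
A rough idea of what a direct fault-handling microbenchmark measures, as a userspace sketch rather than lmbench itself: map anonymous memory, touch one byte per page, and time the loop. The function name time_first_touch_ns() is made up for illustration, and on kernels that back anonymous memory with larger pages the number of faults taken may be lower than the number of pages touched.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Map npages of anonymous memory, write one byte to each page so that the
 * first access to every page goes through the fault path, and return the
 * elapsed time in nanoseconds.  Returns -1 on failure.
 * (Hypothetical helper, not part of lmbench or the patch series.)
 */
static int64_t time_first_touch_ns(size_t npages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	struct timespec t0, t1;
	volatile unsigned char *mem;
	void *map;
	size_t len, i;
	int64_t elapsed;

	if (page_size <= 0 || npages == 0)
		return -1;
	len = npages * (size_t)page_size;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;
	mem = map;

	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0) {
		munmap(map, len);
		return -1;
	}
	for (i = 0; i < npages; i++)
		mem[i * (size_t)page_size] = 1;	/* first touch of this page */
	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0) {
		munmap(map, len);
		return -1;
	}
	munmap(map, len);

	elapsed = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
		+ (t1.tv_nsec - t0.tv_nsec);
	assert(elapsed >= 0);	/* a monotonic clock should not run backwards */
	return elapsed;
}
```

Dividing the result by npages gives an approximate per-fault cost, which is the kind of number a kernel compile cannot isolate.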

One thing I would like to see for sure is dropping the has_pt_op()
and pt_op() macros. The Linux-native convention is to open-code the
function pointer fetches, and the non-native convention is to wrap
things like defaulting (though they actually do something more involved)
in the analogue of pt_op() for the purposes of things like extensible
sets of operations bordering on OOP-ish method tables. So this ends up
as some sort of hybrid convention without the functionality of the
non-native call wrappers and without the clarity of open-coding. My
personal preference is that the function pointer table be mandatory and
the call to the function pointer be unconditional and the type
dispatch accomplished entirely through the function pointers, but I'm
not particularly insistent about that.


-- wli


Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2)

2007-03-20 Thread Dave Hansen
On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> For the common case (vma->pagetable_ops == NULL), we do almost the
> same thing as the current code: load and test.  The third instruction
> is different in that we jump for the common case instead of jumping in
> the hugetlb case.  I don't think this is a big deal though.  If it is,
> would an unlikely() macro fix it? 

I wouldn't worry about micro-optimizing it at that level.  The CPU does
enough stuff under the covers that I wouldn't worry about it at all.

I wonder if the real differential impact (if any) is likely to come from
the pagetable_ops cacheline being hot or cold, since it is in a
different place in the structure than the flags.  But, from a quick
glance I see a few vm_ops references preceding pagetable_ops references,
so the pagetable_ops cacheline might already be hot most of the time.  
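
The hot-or-cold question can be put in numbers using the offsets from the x86 listing in the patch posting (vm_flags at 0x18, pagetable_ops at 0x48). The sketch below assumes a 64-byte cache line and a structure that starts on a line boundary; both are assumptions, and the helper names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size in bytes; real hardware varies. */
#define ASSUMED_CACHE_LINE 64

/* Which cache line a field falls on, counting from the start of the struct. */
static size_t cache_line_index(size_t offset, size_t line_size)
{
	assert(line_size != 0);
	return offset / line_size;
}

/* Nonzero if two field offsets fall on the same cache line. */
static int same_cache_line(size_t a, size_t b, size_t line_size)
{
	return cache_line_index(a, line_size) == cache_line_index(b, line_size);
}
```

Under those assumptions same_cache_line(0x18, 0x48, ASSUMED_CACHE_LINE) is false: the two fields sit on lines 0 and 1, so the new test can touch a line the old test did not, unless an earlier access has already pulled it in.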

BTW, are there any other possible users for these things other than
large pages?

-- Dave



[PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2)

2007-03-19 Thread Adam Litke

Andrew, given the favorable review of these patches the last time around, would
you consider them for the -mm tree?  Does anyone else have any objections?

The page tables for hugetlb mappings are handled differently than page tables
for normal pages.  Rather than integrating multiple page size support into the
core VM (which would tremendously complicate the code) some hooks were created.
This allows hugetlb special cases to be handled "out of line" by a separate
interface.

Hugetlbfs was the huge page interface chosen.  At the time, large database
users were the only big users of huge pages and the hugetlbfs design meets
their needs pretty well.  Over time, hugetlbfs has been expanded to enable new
uses of huge page memory with varied results.  As features are added, the
semantics become a permanent part of the Linux API.  This makes maintenance of
hugetlbfs an increasingly difficult task and inhibits the addition of features
and functionality in support of ever-changing hardware.

To remedy the situation, I propose an API (currently called
pagetable_operations).  All of the current hugetlbfs-specific hooks are moved
into an operations struct that is attached to VMAs.  The end result is a more
explicit and IMO a cleaner interface between hugetlbfs and the core VM.  We are
then free to add other hugetlb interfaces (such as a /dev/zero-styled character
device) that can operate either in concert with or independent of hugetlbfs.

There should be no measurable performance impact for normal page users (we're
checking if pagetable_ops != NULL instead of checking for vm_flags &
VM_HUGETLB).  Of course we do increase the VMA size by one pointer.  For huge
pages, there is an added indirection for pt_op() calls.  This patch series does
not change the logic of the hugetlbfs operations, just moves them into the
pagetable_operations struct.
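
For readers who want the shape of the change in C, here is a minimal userspace mock of the two dispatch styles described above. Every name in it (mock_vma, mock_pagetable_ops, MOCK_VM_HUGETLB, the fault stubs) is a stand-in invented for illustration; the real structures and hooks are in the patches themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder flag value; not the kernel's VM_HUGETLB. */
#define MOCK_VM_HUGETLB 0x1UL

struct mock_vma;

/* Hooks that a special mapping type may provide. */
struct mock_pagetable_ops {
	int (*fault)(struct mock_vma *vma, unsigned long addr);
};

struct mock_vma {
	unsigned long vm_flags;
	const struct mock_pagetable_ops *pagetable_ops;	/* NULL for normal pages */
};

/* Stand-ins for the hugetlb and normal fault paths. */
static int mock_hugetlb_fault(struct mock_vma *vma, unsigned long addr)
{
	(void)vma;
	(void)addr;
	return 2;
}

static int mock_normal_fault(struct mock_vma *vma, unsigned long addr)
{
	(void)vma;
	(void)addr;
	return 1;
}

static const struct mock_pagetable_ops mock_hugetlb_ops = {
	.fault = mock_hugetlb_fault,
};

/* Current style: test a flag, then call the hugetlb code directly. */
static int fault_by_flag(struct mock_vma *vma, unsigned long addr)
{
	assert(vma != NULL);
	if (vma->vm_flags & MOCK_VM_HUGETLB)
		return mock_hugetlb_fault(vma, addr);
	return mock_normal_fault(vma, addr);
}

/* Proposed style: test the ops pointer, then call through it. */
static int fault_by_ops(struct mock_vma *vma, unsigned long addr)
{
	assert(vma != NULL);
	if (vma->pagetable_ops && vma->pagetable_ops->fault)
		return vma->pagetable_ops->fault(vma, addr);
	return mock_normal_fault(vma, addr);
}
```

In both versions a normal VMA costs one load and one test before falling through to the ordinary path; only the special mapping pays for the indirect call.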

I did some pretty basic benchmarking of these patches on ppc64, x86, and x86_64
to get a feel for the fast-path performance impact.  The following tables show
kernbench performance comparisons between a clean 2.6.20 kernel and one with my
patches applied.  These numbers seem well within statistical noise to me.

Changes since V1:
- Made hugetlbfs_pagetable_ops const (Thanks Arjan)

KernBench Comparison (ppc64)
----------------------------
                  2.6.20-clean  2.6.20-pgtable_ops  pct. diff
User    CPU time        708.82              708.59       0.03
System  CPU time         62.50               62.58      -0.13
Total   CPU time        771.32              771.17       0.02
Elapsed     time        115.40              115.35       0.04

KernBench Comparison (x86)
--------------------------
                  2.6.20-clean  2.6.20-pgtable_ops  pct. diff
User    CPU time       1382.62             1381.88       0.05
System  CPU time        146.06              146.86      -0.55
Total   CPU time       1528.68             1528.74      -0.00
Elapsed     time        394.92              396.70      -0.45

KernBench Comparison (x86_64)
-----------------------------
                  2.6.20-clean  2.6.20-pgtable_ops  pct. diff
User    CPU time        559.39              557.97       0.25
System  CPU time         65.10               66.17      -1.64
Total   CPU time        624.49              624.14       0.06
Elapsed     time        158.54              158.59      -0.03
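
The "pct. diff" figures appear to follow (clean - patched) / clean * 100, so a positive value means the patched kernel took less time. That convention is inferred from the numbers, not stated in the posting; pct_diff() below is a hypothetical helper for checking a row.

```c
#include <assert.h>

/*
 * Percent difference relative to the clean kernel.  Positive means the
 * patched kernel used less time (sign convention inferred from the tables).
 */
static double pct_diff(double clean, double patched)
{
	assert(clean != 0.0);
	return (clean - patched) / clean * 100.0;
}
```

For example, the ppc64 system time row gives pct_diff(62.50, 62.58), roughly -0.128, which rounds to the -0.13 shown in the table.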

The lack of a performance impact makes sense to me.  The following is a
simplified instruction comparison for each case:

2.6.20-clean                        2.6.20-pgtable_ops
----------------------------------  ------------------------------------------
/* Load vm_flags */                 /* Load pagetable_ops pointer */
mov   0x18(ecx),eax                 mov   0x48(ecx),eax
/* Test for VM_HUGETLB */           /* Test if it's NULL */
test  $0x40,eax                     test  eax,eax
/* If set, jump to call stub */     /* If so, jump away to main code */
jne   c0148f04                      je    c0148ba1
...                                 /* Lookup the operation's function pointer */
/* copy_hugetlb_page_range call */  mov   0x4(eax),ebx
c0148f04:                           /* Test if it's NULL */
mov   0xff98(ebp),ecx               test  ebx,ebx
mov   0xff9c(ebp),edx               /* If so, jump away to main code */
mov   0xffa0(ebp),eax               je    c0148ba1
call  c01536e0                      /* pagetable operation call */
                                    mov   0xff9c(ebp),edx
                                    mov   0xffa0(ebp),eax
                                    call  *ebx

For the common case (vma->pagetable_ops == NULL), we do almost the same thing
as the current code: load and test.  The third instruction is different in that
we jump for the common case instead of jumping in the hugetlb case.  I don't
think this is a big deal though.  If it is, would an unlikely() macro fix it?

Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API

2007-02-20 Thread Benjamin Herrenschmidt

> maybe. I'm not entirely convinced... (I like the cleanup potential a lot
> code wise.. but if it costs performance, then... well I'd hate to see
> linux get slower for hugetlbfs)
> 
> > If not, then I definitely wouldn't
> > mind creating a default_pagetable_ops and calling into that.
> 
> ... but without it to be honest, your patch adds nothing real.. there's
> ONE user of your code, and there's no real cleanup unless you get rid of
> all the special casing since the special casing is the really ugly
> part of hugetlbfs, not the actual code inside the special case..

Well... I disagree there too :-)

I've been working recently for example on some spufs improvements that
require similar tweaking of the user address space as hugetlbfs. The
problem I have is that while there are hooks in the generic code pretty
much everywhere I need, they are all hugetlb-specific; that is, they
call directly into the hugetlb code.

For now, I found ways of doing my stuff without hooking all over the
page table operations (well, I had no real choices) but I can imagine it
making sense to allow something (hugetlb being one of them) to take over
part of the user address space.

Ben.





Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API

2007-02-20 Thread Benjamin Herrenschmidt
On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote:
> On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > The page tables for hugetlb mappings are handled differently than page
> > tables for normal pages.  Rather than integrating multiple page size
> > support into the main VM (which would tremendously complicate the code)
> > some hooks were created.  This allows hugetlb special cases to be handled
> > "out of line" by a separate interface.
> 
> ok it makes sense to clean this up.. what I don't like is that there
> STILL are all the double cases... for this to work and be worth it both
> the common case and the hugetlb case should be using the ops structure
> always! Anything else and you're just replacing bad code with bad
> code ;(

I don't fully agree. I think it makes sense to have the "special" case
be a function pointer and the "normal" case stay where it is for
performance. You don't want to pay the cost of the function pointer
call in the normal case, do you?

Ben.




Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API

2007-02-19 Thread Arjan van de Ven
On Mon, 2007-02-19 at 13:34 -0600, Adam Litke wrote:
> On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote:
> > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > > The page tables for hugetlb mappings are handled differently than page
> > > tables for normal pages.  Rather than integrating multiple page size
> > > support into the main VM (which would tremendously complicate the code)
> > > some hooks were created.  This allows hugetlb special cases to be
> > > handled "out of line" by a separate interface.
> > 
> > ok it makes sense to clean this up.. what I don't like is that there
> > STILL are all the double cases... for this to work and be worth it both
> > the common case and the hugetlb case should be using the ops structure
> > always! Anything else and you're just replacing bad code with bad
> > code ;(
> 
> Hmm.  Do you think everyone would support an extra pointer indirection
> for every handle_pte_fault() call?  

maybe. I'm not entirely convinced... (I like the cleanup potential a lot
code wise.. but if it costs performance, then... well I'd hate to see
linux get slower for hugetlbfs)

> If not, then I definitely wouldn't
> mind creating a default_pagetable_ops and calling into that.

... but without it to be honest, your patch adds nothing real.. there's
ONE user of your code, and there's no real cleanup unless you get rid of
all the special casing since the special casing is the really ugly
part of hugetlbfs, not the actual code inside the special case..


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API

2007-02-19 Thread Adam Litke
On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote:
> On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > The page tables for hugetlb mappings are handled differently than page
> > tables for normal pages.  Rather than integrating multiple page size
> > support into the main VM (which would tremendously complicate the code)
> > some hooks were created.  This allows hugetlb special cases to be handled
> > "out of line" by a separate interface.
> 
> ok it makes sense to clean this up.. what I don't like is that there
> STILL are all the double cases... for this to work and be worth it both
> the common case and the hugetlb case should be using the ops structure
> always! Anything else and you're just replacing bad code with bad
> code ;(

Hmm.  Do you think everyone would support an extra pointer indirection
for every handle_pte_fault() call?  If not, then I definitely wouldn't
mind creating a default_pagetable_ops and calling into that.
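
A sketch of what the default-ops alternative could look like, again as a userspace mock: every VMA gets a table, normal mappings point at a default one, and the call site dispatches unconditionally. The mock_ names are invented for the example and the stubs only return a marker value; this is not code from the series.

```c
#include <assert.h>
#include <stddef.h>

struct mock_vma;

struct mock_pagetable_ops {
	int (*fault)(struct mock_vma *vma, unsigned long addr);
};

struct mock_vma {
	const struct mock_pagetable_ops *pagetable_ops;	/* never NULL in this variant */
};

/* Stand-ins for the normal and hugetlb fault paths. */
static int mock_default_fault(struct mock_vma *vma, unsigned long addr)
{
	(void)vma;
	(void)addr;
	return 1;
}

static int mock_hugetlb_fault(struct mock_vma *vma, unsigned long addr)
{
	(void)vma;
	(void)addr;
	return 2;
}

static const struct mock_pagetable_ops mock_default_pagetable_ops = {
	.fault = mock_default_fault,
};

static const struct mock_pagetable_ops mock_hugetlb_pagetable_ops = {
	.fault = mock_hugetlb_fault,
};

/* Attach a table at setup time; normal mappings fall back to the default. */
static void mock_vma_init(struct mock_vma *vma,
			  const struct mock_pagetable_ops *ops)
{
	vma->pagetable_ops = ops ? ops : &mock_default_pagetable_ops;
}

/* Call site with no special casing: one indirect call for every mapping. */
static int mock_handle_fault(struct mock_vma *vma, unsigned long addr)
{
	assert(vma->pagetable_ops && vma->pagetable_ops->fault);
	return vma->pagetable_ops->fault(vma, addr);
}
```

The trade-off is the one raised in this exchange: the double cases disappear from the call sites, but normal pages now pay for the pointer indirection on every call.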

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center



Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API

2007-02-19 Thread Arjan van de Ven
On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> The page tables for hugetlb mappings are handled differently than page tables
> for normal pages.  Rather than integrating multiple page size support into the
> main VM (which would tremendously complicate the code) some hooks were
> created.  This allows hugetlb special cases to be handled "out of line" by a
> separate interface.

ok it makes sense to clean this up.. what I don't like is that there
STILL are all the double cases... for this to work and be worth it both
the common case and the hugetlb case should be using the ops structure
always! Anything else and you're just replacing bad code with bad
code ;(



[PATCH 0/7] [RFC] hugetlb: pagetable_operations API

2007-02-19 Thread Adam Litke

The page tables for hugetlb mappings are handled differently than page tables
for normal pages.  Rather than integrating multiple page size support into the
main VM (which would tremendously complicate the code) some hooks were created.
This allows hugetlb special cases to be handled "out of line" by a separate
interface.

Hugetlbfs was the huge page interface chosen.  At the time, large database
users were the only big users of huge pages and the hugetlbfs design meets
their needs pretty well.  Over time, hugetlbfs has been expanded to enable new
uses of huge page memory with varied results.  As features are added, the
semantics become a permanent part of the Linux API.  This makes maintenance of
hugetlbfs an increasingly difficult task and inhibits the addition of features
and functionality in support of ever-changing hardware.

To remedy the situation, I propose an API (currently called
pagetable_operations).  All of the current hugetlbfs-specific hooks are moved
into an operations struct that is attached to VMAs.  The end result is a more
explicit and IMO a cleaner interface between hugetlbfs and the core VM.  We are
then free to add other hugetlb interfaces (such as a /dev/zero-styled character
device) that can operate either in concert with or independent of hugetlbfs.

There should be no measurable performance impact for normal page users (we're
checking if pagetable_ops != NULL instead of checking for vm_flags &
VM_HUGETLB).  Of course we do increase the VMA size by one pointer.  For huge
pages, there is an added indirection for pt_op() calls.  This patch series does
not change the logic of the hugetlbfs operations, just moves them into the
pagetable_operations struct.
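
As a rough illustration of the dispatch change described above, here is a
userspace sketch. It is not code from the patch series: struct vm_area, the
single fault member, the flag value, and the handler names are all
hypothetical stand-ins chosen only to show the flag test next to the
ops-pointer test.

```c
/* Userspace model only. Every name here is hypothetical, not the kernel's. */
#include <assert.h>
#include <stddef.h>

#define VM_HUGETLB 0x1UL	/* stand-in flag value for this model */

enum { PATH_NORMAL = 0, PATH_HUGE = 1 };

struct vm_area;

/* Hypothetical operations table; the real series defines its own members. */
struct pagetable_operations {
	int (*fault)(struct vm_area *vma, unsigned long address);
};

struct vm_area {
	unsigned long vm_flags;
	/* The one extra pointer per VMA; NULL means "normal pages". */
	const struct pagetable_operations *pagetable_ops;
};

static int normal_fault(struct vm_area *vma, unsigned long address)
{
	(void)vma;
	(void)address;
	return PATH_NORMAL;
}

static int huge_fault(struct vm_area *vma, unsigned long address)
{
	(void)vma;
	(void)address;
	return PATH_HUGE;
}

const struct pagetable_operations hugetlb_pagetable_ops = {
	.fault = huge_fault,
};

/* Existing style: the core path tests a flag and calls the hugetlb hook. */
int handle_fault_flag(struct vm_area *vma, unsigned long address)
{
	assert(vma != NULL);
	if (vma->vm_flags & VM_HUGETLB)
		return huge_fault(vma, address);
	return normal_fault(vma, address);
}

/* Proposed style: test the ops pointer, then make one indirect call. */
int handle_fault_ops(struct vm_area *vma, unsigned long address)
{
	assert(vma != NULL);
	if (vma->pagetable_ops != NULL && vma->pagetable_ops->fault != NULL)
		return vma->pagetable_ops->fault(vma, address);
	return normal_fault(vma, address);
}
```

In this model a VMA with pagetable_ops == NULL takes the same path under both
handlers, so normal pages pay only for the pointer comparison; the indirect
call is taken only when an ops table is attached.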

Comments?  Do you think it's as good of an idea as I do?


Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API

2007-02-19 Thread Arjan van de Ven
On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> The page tables for hugetlb mappings are handled differently than page tables
> for normal pages.  Rather than integrating multiple page size support into the
> main VM (which would tremendously complicate the code) some hooks were created.
> This allows hugetlb special cases to be handled "out of line" by a separate
> interface.

ok it makes sense to clean this up.. what I don't like is that there
STILL are all the double cases... for this to work and be worth it both
the common case and the hugetlb case should be using the ops structure
always! Anything else and you're just replacing bad code with bad
code ;(



Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API

2007-02-19 Thread Adam Litke
On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote:
> On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > The page tables for hugetlb mappings are handled differently than page tables
> > for normal pages.  Rather than integrating multiple page size support into the
> > main VM (which would tremendously complicate the code) some hooks were created.
> > This allows hugetlb special cases to be handled "out of line" by a separate
> > interface.
>
> ok it makes sense to clean this up.. what I don't like is that there
> STILL are all the double cases... for this to work and be worth it both
> the common case and the hugetlb case should be using the ops structure
> always! Anything else and you're just replacing bad code with bad
> code ;(

Hmm.  Do you think everyone would support an extra pointer indirection
for every handle_pte_fault() call?  If not, then I definitely wouldn't
mind creating a default_pagetable_ops and calling into that.
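
A minimal sketch of what the default_pagetable_ops variant could look like,
again as a userspace model rather than kernel code. Only the name
default_pagetable_ops comes from this thread; struct vm_area, vma_init(), the
fault member, and the handlers are assumptions made for illustration.

```c
/* Userspace model only. Apart from default_pagetable_ops, names are made up. */
#include <assert.h>
#include <stddef.h>

enum { PATH_NORMAL = 0, PATH_HUGE = 1 };

struct vm_area;

/* Hypothetical operations table with a single member for brevity. */
struct pagetable_operations {
	int (*fault)(struct vm_area *vma, unsigned long address);
};

struct vm_area {
	/* Never NULL in this variant: every VMA carries an ops table. */
	const struct pagetable_operations *pagetable_ops;
};

static int default_fault(struct vm_area *vma, unsigned long address)
{
	(void)vma;
	(void)address;
	return PATH_NORMAL;
}

static int hugetlb_fault(struct vm_area *vma, unsigned long address)
{
	(void)vma;
	(void)address;
	return PATH_HUGE;
}

const struct pagetable_operations default_pagetable_ops = {
	.fault = default_fault,
};

const struct pagetable_operations hugetlb_pagetable_ops = {
	.fault = hugetlb_fault,
};

/* Attach the given table, falling back to the default one. */
void vma_init(struct vm_area *vma, const struct pagetable_operations *ops)
{
	assert(vma != NULL);
	vma->pagetable_ops = ops ? ops : &default_pagetable_ops;
}

/* No flag test and no NULL test: always one indirect call. */
int handle_fault(struct vm_area *vma, unsigned long address)
{
	assert(vma != NULL && vma->pagetable_ops != NULL);
	return vma->pagetable_ops->fault(vma, address);
}
```

The trade-off raised above is visible here: the special-case branch is gone
from handle_fault(), but every call, including the normal-page one, now goes
through a function pointer.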

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center



Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API

2007-02-19 Thread Arjan van de Ven
On Mon, 2007-02-19 at 13:34 -0600, Adam Litke wrote:
> On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote:
> > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > > The page tables for hugetlb mappings are handled differently than page tables
> > > for normal pages.  Rather than integrating multiple page size support into the
> > > main VM (which would tremendously complicate the code) some hooks were created.
> > > This allows hugetlb special cases to be handled "out of line" by a separate
> > > interface.
> >
> > ok it makes sense to clean this up.. what I don't like is that there
> > STILL are all the double cases... for this to work and be worth it both
> > the common case and the hugetlb case should be using the ops structure
> > always! Anything else and you're just replacing bad code with bad
> > code ;(
>
> Hmm.  Do you think everyone would support an extra pointer indirection
> for every handle_pte_fault() call?

maybe. I'm not entirely convinced... (I like the cleanup potential a lot
code wise.. but if it costs performance, then... well I'd hate to see
linux get slower for hugetlbfs)

> If not, then I definitely wouldn't
> mind creating a default_pagetable_ops and calling into that.

... but without it to be honest, your patch adds nothing real.. there's
ONE user of your code, and there's no real cleanup unless you get rid of
all the special casing since the special casing is the really ugly
part of hugetlbfs, not the actual code inside the special case..


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org
