Re: [PATCH] Use 1TB segments

2007-10-03 Thread Will Schmidt
On Wed, 2007-10-03 at 13:13 +1000, Paul Mackerras wrote:
 Will Schmidt writes:
 
  I still need to test this code for performance issues, and this version
  could still use some cosmetic touchups, so I don't think we want this to
  go into a tree yet.  I am reposting this primarily to indicate the prior
  version isn't quite right, and so Jon can rebase to this version.  :-)
 
 The way we scan the ibm,processor-segment-sizes property could be
 nicer.  Were there any other cosmetic touchups you were thinking of,
 and if so what were they?  I didn't notice any leftover debugging
 printks or anything else that obviously needed cleaning up.

Correct..  nothing in the patch really *needs* to be cleaned up.  This
is mostly me being way more nit-picky than I need to be.  :-)   I don't
have any real issues with the patch (being candidate for) going into a
tree. 

The only obvious one is that the MMU_SEGSIZE_* #define's in mmu-hash64.h
appear to be duplicated.

The rest I can follow up on later; none of it affects the code outside
of #ifdef DEBUG's, and it should be a separate patch anyway.

Thanks,
-Will

 Paul.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] Use 1TB segments

2007-10-02 Thread Paul Mackerras
Olof Johansson writes:

  This makes the kernel use 1TB segments for all kernel mappings and for
  user addresses of 1TB and above, on machines which support them
  (currently POWER5+ and POWER6).
 
 PA6T supports them as well :)

In the patch, we don't actually hard-code the CPU_FTR_1T_SEGMENT bit
in the cputable entry for any processor; instead we look in the
ibm,processor-segment-sizes property in the cpu node(s) in the device
tree.  Do you want us to add the CPU_FTR_1T_SEGMENT bit to the
cputable entry for the PA6T, or will your firmware give the
ibm,processor-segment-sizes property in the device tree?

 Wouldn't it be possible to stick with 1TB segments for the low range
 for 64-bit processes as well, and have them allocate their hugepages
 at 1TB?

You mean, forbid hugepages below 1TB?  That would be a user-visible
ABI change.  There are linker scripts for generating executables whose
text and/or data can go in hugepages, and I believe they put the
text/data below 1TB.

Paul.


Re: [PATCH] Use 1TB segments

2007-10-02 Thread Paul Mackerras
Will Schmidt writes:

 I still need to test this code for performance issues, and this version
 could still use some cosmetic touchups, so I don't think we want this to
 go into a tree yet.  I am reposting this primarily to indicate the prior
 version isn't quite right, and so Jon can rebase to this version.  :-)

The way we scan the ibm,processor-segment-sizes property could be
nicer.  Were there any other cosmetic touchups you were thinking of,
and if so what were they?  I didn't notice any leftover debugging
printks or anything else that obviously needed cleaning up.

Paul.


Re: [PATCH] Use 1TB segments

2007-10-02 Thread Olof Johansson
On Wed, Oct 03, 2007 at 01:07:58PM +1000, Paul Mackerras wrote:
 Olof Johansson writes:
 
   This makes the kernel use 1TB segments for all kernel mappings and for
   user addresses of 1TB and above, on machines which support them
   (currently POWER5+ and POWER6).
  
  PA6T supports them as well :)
 
 In the patch, we don't actually hard-code the CPU_FTR_1T_SEGMENT bit
 in the cputable entry for any processor; instead we look in the
 ibm,processor-segment-sizes property in the cpu node(s) in the device
 tree. 

Yep, I see that. I just wanted to clarify it for the (future) commit
message.

 Do you want us to add the CPU_FTR_1T_SEGMENT bit to the
 cputable entry for the PA6T, or will your firmware give the
 ibm,processor-segment-sizes property in the device tree?

Thanks, but we've already got the properties there so it just works.

  Wouldn't it be possible to stick with 1TB segments for the low range
  for 64-bit processes as well, and have them allocate their hugepages
  at 1TB?
 
 You mean, forbid hugepages below 1TB?  That would be a user-visible
 ABI change.  There are linker scripts for generating executables whose
 text and/or data can go in hugepages, and I believe they put the
 text/data below 1TB.

Hm, good point. Bummer.


-Olof


Re: [PATCH] Use 1TB segments

2007-08-06 Thread Jon Tollefson
Paul Mackerras wrote:
 diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
   
A couple of hunks fail in this file when applying to the current tree.

...
 diff --git a/include/asm-powerpc/mmu-hash64.h b/include/asm-powerpc/mmu-hash64.h
 index 695962f..053f86b 100644
 --- a/include/asm-powerpc/mmu-hash64.h
 +++ b/include/asm-powerpc/mmu-hash64.h
 @@ -47,6 +47,8 @@ extern char initial_stab[];

  /* Bits in the SLB VSID word */
  #define SLB_VSID_SHIFT   12
 +#define SLB_VSID_SHIFT_1T	24
 +#define SLB_VSID_SSIZE_SHIFT	62
  #define SLB_VSID_B		ASM_CONST(0xc000000000000000)
  #define SLB_VSID_B_256M	ASM_CONST(0x0000000000000000)
  #define SLB_VSID_B_1T		ASM_CONST(0x4000000000000000)
 @@ -66,6 +68,7 @@ extern char initial_stab[];
  #define SLB_VSID_USER	(SLB_VSID_KP|SLB_VSID_KS|SLB_VSID_C)

  #define SLBIE_C		(0x08000000)
 +#define SLBIE_SSIZE_SHIFT	25

  /*
   * Hash table
 @@ -77,7 +80,7 @@ extern char initial_stab[];
  #define HPTE_V_AVPN_SHIFT	7
  #define HPTE_V_AVPN		ASM_CONST(0x3fffffffffffff80)
  #define HPTE_V_AVPN_VAL(x)	(((x) & HPTE_V_AVPN) >> HPTE_V_AVPN_SHIFT)
 -#define HPTE_V_COMPARE(x,y)	(!(((x) ^ (y)) & HPTE_V_AVPN))
 +#define HPTE_V_COMPARE(x,y)	(!(((x) ^ (y)) & 0xffffffffffffff80))
  #define HPTE_V_BOLTED		ASM_CONST(0x0000000000000010)
  #define HPTE_V_LOCK		ASM_CONST(0x0000000000000008)
  #define HPTE_V_LARGE		ASM_CONST(0x0000000000000004)
 @@ -164,16 +167,25 @@ struct mmu_psize_def
  #define MMU_SEGSIZE_256M 0
  #define MMU_SEGSIZE_1T   1

 +/*
 + * Supported segment sizes
 + */
 +#define MMU_SEGSIZE_256M 0
 +#define MMU_SEGSIZE_1T   1
   
It looks like this is repeating the definitions just above it.


Jon




Re: [PATCH] Use 1TB segments

2007-08-02 Thread Will Schmidt
Hi Paul, 
   just a few questions.  

On Wed, 2007-08-01 at 12:04 +1000, Paul Mackerras wrote:
 This makes the kernel use 1TB segments for all kernel mappings and for
 user addresses of 1TB and above, on machines which support them
 (currently POWER5+ and POWER6).  We don't currently use 1TB segments
 for user addresses  1T, since that would effectively prevent 32-bit
 processes from using huge pages unless we also had a way to revert to
 using 256MB segments.

I think I have a question about user addresses < 1T...  once I think on
it a bit more it'll either click, or I'll have a question
articulated. :-)


 -static inline void __tlbiel(unsigned long va, unsigned int psize)
 +static inline void __tlbiel(unsigned long va, int psize, int ssize)

 -static inline void tlbie(unsigned long va, int psize, int local)
 +static inline void tlbie(unsigned long va, int psize, int ssize, int local)

  static long native_hpte_insert(unsigned long hpte_group, unsigned long va,
   unsigned long pa, unsigned long rflags,
 - unsigned long vflags, int psize)
 + unsigned long vflags, int psize, int ssize)

  static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
 -  unsigned long va, int psize, int local)
 +  unsigned long va, int psize, int ssize,
 +  int local)

Is there technical reason why the 'local' variable remains at the end of
the parm list for these?   In other cases 'ssize' simply gets added to
the end of the parm list. 

 +static int __init htab_dt_scan_seg_sizes(unsigned long node,
 +					 const char *uname, int depth,
 +					 void *data)
 +{
 +	char *type = of_get_flat_dt_prop(node, "device_type", NULL);
 +	u32 *prop;
 +	unsigned long size = 0;
 +
 +	/* We are scanning "cpu" nodes only */
 +	if (type == NULL || strcmp(type, "cpu") != 0)
 +		return 0;
 +
 +	prop = (u32 *)of_get_flat_dt_prop(node, "ibm,processor-segment-sizes",
 +					  &size);
 +	if (prop != NULL && size >= 8) {
 +		if (prop[0] == 0x1c && prop[1] == 0x28) {

This is 0x1c indicating 2^28 for 256M; and 0x28 indicating 2^40 for 1TB
segments.

Will there ever be a segment size between the two?  Or will the
representation ever vary from this?  i.e. wondering if prop[0] will
always be for 256M and prop[1] for 1TB.

 +#define slb_vsid_shift(ssize)\
 +	((ssize) == MMU_SEGSIZE_256M ? SLB_VSID_SHIFT : SLB_VSID_SHIFT_1T)

 @@ -100,12 +106,13 @@ void slb_flush_and_rebolt(void)
   vflags = SLB_VSID_KERNEL | vmalloc_llp;
 
 	ksp_esid_data = mk_esid_data(get_paca()->kstack, 2);
 -	if ((ksp_esid_data & ESID_MASK) == PAGE_OFFSET)
 +	mask = (mmu_kernel_ssize == MMU_SEGSIZE_256M) ? ESID_MASK : ESID_MASK_1T;

Is this one worthy of a #define like the slb_vsid_shift() above? 

 +#define VSID_MULTIPLIER_256M ASM_CONST(200730139)/* 28-bit prime */

 +#define VSID_MULTIPLIER_1T   ASM_CONST(12538073) /* 24-bit prime */

Anything special in how this 24-bit prime value was selected?   (same
question could be for the 28-bit prime, though I see that value was
updated at least once a few years back)

-Will




Re: [PATCH] Use 1TB segments

2007-08-02 Thread Benjamin Herrenschmidt

 Is there technical reason why the 'local' variable remains at the end of
 the parm list for these?   In other cases 'ssize' simply gets added to
 the end of the parm list. 

Looks nicer to have psize and ssize together :-)

Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] Use 1TB segments

2007-08-02 Thread Will Schmidt
On Fri, 2007-08-03 at 08:37 +1000, Benjamin Herrenschmidt wrote:
  Is there technical reason why the 'local' variable remains at the end of
  the parm list for these?   In other cases 'ssize' simply gets added to
  the end of the parm list. 
 
 Looks nicer to have psize and ssize together :-)

Aah!   And here I thought there was some obscure register usage
optimization going on..   :-)

-Will

 
 Ben.
 
 



Re: [PATCH] Use 1TB segments

2007-08-02 Thread David Gibson
On Thu, Aug 02, 2007 at 03:41:23PM -0500, Will Schmidt wrote:
 Hi Paul, 
just a few questions.  
[snip]
  +#define VSID_MULTIPLIER_256M	ASM_CONST(200730139)	/* 28-bit prime */
 
  +#define VSID_MULTIPLIER_1T	ASM_CONST(12538073)	/* 24-bit prime */
 
 Anything special in how this 24-bit prime value was selected?   (same
 question could be for the 28-bit prime, though I see that value was
 updated at least once a few years back)

I can't speak to the 24-bit value specifically, but I can speak to the
28-bit one:  I did the rewrite of the SLB miss handler to use that
multiplicative hash, and changed the prime value when a bug report
showed problems in the original choice.

Afaict, the value of the prime doesn't matter all that much - in fact
it doesn't strictly even need to be prime, just co-prime to (2^36-1).
Originally, I picked the largest 28-bit prime, on the basis that a
large multiplier should give better scattering/folding.

That turned out to cause problems on some iSeries machines (which
couldn't do 16M pages) - we were filling up hash buckets with the
linear mapping.  I figured out that that was because a very large
28-bit number is, in the relevant modulus, sort of equivalent to a
small negative number.  For a big linear mapping, the hash bucket
selected strided gradually backwards through the table - there were
also differences in the high bits of the hash, but they were lost
because of the limited size of the hash table.

So, I changed the multiplier to the median 28-bit prime, in the hopes
that that would give better hash scattering across as many bits of the
VSID as possible.  I don't have much in the way of theoretical
justification for that, but it fixed the iSeries problem and I've
never heard of any regressions, so apparently it's not too bad.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson