Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-08-01 Thread David Mosberger-Tang
On 8/1/07, Zoltan Menyhart <[EMAIL PROTECTED]> wrote:

> You do have model specific I cache semantics.
> Not taking it into account will oblige you to flush in vain for the models
> which do not require it. Why do you want to take this option?

Given unlimited resources, your proposal makes perfect sense.  We
could have a Linux version for Merced, one for McKinley, one for
Madison, etc., etc.

(Un)fortunately, resources are limited and with that constraint in
place, rather than spending lots of time optimizing the kernel for
particular idiosyncrasies of a CPU model, it is generally much better
to optimize it for the things the hardware designers promised us would
stay the same across CPU models (i.e., the "architecture").  Sure, it
may mean that on occasion certain things are slightly slower than they
could be but it does have the decided advantage of letting the
maintainers sleep at night... ;-)  Moreover, higher-level
optimizations usually have much higher payoff, so even though you may
do things a bit more slowly at the lowest level, you may end up with a
faster system overall because you were able to spend more time
optimizing at a higher level.

  --david
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-08-01 Thread Zoltan Menyhart

Luck, Tony wrote:

This seems crazy to me.  Flushing should occur according to the
*architecture*, not model-by-model.  Even if we happen to get "lucky"
on pre-Montecito CPUs, that doesn't justify such ugly hacks.  Or you
really want to debug this *again* come next CPU?



Ditto.  The only reason we should ever have model specific checks should
be to work around model specific errata (e.g. the McKinley Errata #9 code
in patch.c).


You do have model specific I cache semantics.
Not taking it into account will oblige you to flush in vain for the models
which do not require it. Why do you want to take this option?


Thanks,

Zoltan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-08-01 Thread Zoltan Menyhart

Jim Hull wrote:

Not just crazy, but wrong - this *can* happen on pre-Montecito.  Even though
L1D is write-through and L2 was mixed I/D, the L1 I-cache could contain
stale instrutions if there are missing flushes.


I cannot agree with you.

In order to consider an L1 I-cache entry as valid, a corresponding
virtual -> physic address translation should be valid in one of the L1 ITLBs.

"See 6.1.1. Instruction TLBS" of the I2 Proc. Ref. Man. for SW Dev. & Opt.

You cannot have a valid L1 ITLB entry unless you have a corresponding valid
L2 ITLB entry.

When you remove a PTE (or switch off the exec bit) and you flush the L2 ITLB
matching the old translation (and you kill the corresponding L1 ITLBs),
you do invalidate the corresponding L1 I-cache entries.

Therefore CPU models without split L2 caches are safe.

Thanks,

Zoltan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-08-01 Thread Zoltan Menyhart

Jim Hull wrote:

Not just crazy, but wrong - this *can* happen on pre-Montecito.  Even though
L1D is write-through and L2 was mixed I/D, the L1 I-cache could contain
stale instrutions if there are missing flushes.


I cannot agree with you.

In order to consider an L1 I-cache entry as valid, a corresponding
virtual - physic address translation should be valid in one of the L1 ITLBs.

See 6.1.1. Instruction TLBS of the I2 Proc. Ref. Man. for SW Dev.  Opt.

You cannot have a valid L1 ITLB entry unless you have a corresponding valid
L2 ITLB entry.

When you remove a PTE (or switch off the exec bit) and you flush the L2 ITLB
matching the old translation (and you kill the corresponding L1 ITLBs),
you do invalidate the corresponding L1 I-cache entries.

Therefore CPU models without split L2 caches are safe.

Thanks,

Zoltan
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-08-01 Thread Zoltan Menyhart

Luck, Tony wrote:

This seems crazy to me.  Flushing should occur according to the
*architecture*, not model-by-model.  Even if we happen to get lucky
on pre-Montecito CPUs, that doesn't justify such ugly hacks.  Or you
really want to debug this *again* come next CPU?



Ditto.  The only reason we should ever have model specific checks should
be to work around model specific errata (e.g. the McKinley Errata #9 code
in patch.c).


You do have model specific I cache semantics.
Not taking it into account will oblige you to flush in vain for the models
which do not require it. Why do you want to take this option?


Thanks,

Zoltan
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-08-01 Thread David Mosberger-Tang
On 8/1/07, Zoltan Menyhart [EMAIL PROTECTED] wrote:

 You do have model specific I cache semantics.
 Not taking it into account will oblige you to flush in vain for the models
 which do not require it. Why do you want to take this option?

Given unlimited resources, your proposal makes perfect sense.  We
could have a Linux version for Merced, one for McKinley, one for
Madison, etc., etc.

(Un)fortunately, resources are limited and with that constraint in
place, rather than spending lots of time optimizing the kernel for
particular idiosyncrasies of a CPU model, it is generally much better
to optimize it for the things the hardware designers promised us would
stay the same across CPU models (i.e., the architecture).  Sure, it
may mean that on occasion certain things are slightly slower than they
could be but it does have the decided advantage of letting the
maintainers sleep at night... ;-)  Moreover, higher-level
optimizations usually have much higher payoff, so even though you may
do things a bit more slowly at the lowest level, you may end up with a
faster system overall because you were able to spend more time
optimizing at a higher level.

  --david
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-31 Thread Jim Hull
Not just crazy, but wrong - this *can* happen on pre-Montecito.  Even though
L1D is write-through and L2 was mixed I/D, the L1 I-cache could contain
stale instrutions if there are missing flushes. I think the only reason this
has never been observed is that the L1I is so small (32K) that it's likely
that any stale data has been displaced.

 -- Jim

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> KAMEZAWA Hiroyuki
> Sent: Monday, July 30, 2007 9:30 PM
> To: David Mosberger-Tang
> Cc: LKML; [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
> [EMAIL PROTECTED]; Christoph Lameter
> Subject: Re: [PATCH] flush icache before set_pte take6. [4/4] 
> optimization for cpus other than montecito
> 
> On Mon, 30 Jul 2007 22:15:50 -0600
> "David Mosberger-Tang" <[EMAIL PROTECTED]> wrote:
> 
> > This seems crazy to me.  Flushing should occur according to the
> > *architecture*, not model-by-model.  Even if we happen to 
> get "lucky"
> > on pre-Montecito CPUs, that doesn't justify such ugly hacks.  
> 
> I'm not sure this can happen before Montecito because L1 was 
> write-through
> and L2 was mixed. 
> 
> > Or you really want to debug this *again* come next CPU?
> 
> No. 
> I should add RFC to this patch. I just want to hear opinions.
> This is why I separated this patch. I can drop this.
> 
> Thanks,
> -Kame
> 
> 
> 
> 
> 
>  
> >   --david
> > 
> > On 7/30/07, KAMEZAWA Hiroyuki 
> <[EMAIL PROTECTED]> wrote:
> > >
> > > Add "L2 cache is separated? check flag" as read_mostly 
> global variable.
> > >
> > > This add one memory reference to global variable to page 
> faults of "executable"
> > > map in do_wp_page(page copy case), file-mapped page fault 
> and some system calls
> > > which does memory map changes. But not so bad as calling 
> sync_icache_dcache in
> > > architectures which doesn't need it.
> > >
> > > Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
> > >
> > >
> > > ---
> > >  arch/ia64/kernel/setup.c   |7 +++
> > >  include/asm-ia64/pgtable.h |3 ++-
> > >  2 files changed, 9 insertions(+), 1 deletion(-)
> > >
> > > Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
> > > 
> ===
> > > --- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
> > > +++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
> > > @@ -106,6 +106,8 @@ struct io_space io_space[MAX_IO_SPACES];
> > >  EXPORT_SYMBOL(io_space);
> > >  unsigned int num_io_spaces;
> > >
> > > +int separated_l2_icache_dcache __read_mostly;
> > > +
> > >  /*
> > >   * "flush_icache_range()" needs to know what processor 
> dependent stride size to use
> > >   * when it makes i-cache(s) coherent with d-caches.
> > > @@ -718,6 +720,11 @@ get_model_name(__u8 family, __u8 model)
> > > printk(KERN_ERR
> > >"%s: Table overflow. Some 
> processor model information will be missing\n",
> > >__FUNCTION__);
> > > +   /* Montecito has separated L2 Icache and Dcache. 
> This requires
> > > +  synchronize Icache and Dcache before set_pte() */
> > > +   if (family == 0x20)
> > > +   separated_l2_icache_dcache = 1;
> > > +
> > > return "Unknown";
> > >  }
> > >
> > > Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
> > > 
> ===
> > > --- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
> > > +++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
> > > @@ -489,9 +489,10 @@ extern struct page *zero_page_memmap_ptr
> > >   * as an executable pte.
> > >   */
> > >  extern void __sync_icache_dcache(pte_t pte);
> > > +extern int separated_l2_icache_dcache;
> > >  static inline void sync_icache_dcache(pte_t pte)
> > >  {
> > > -   if (pte_exec(pte))
> > > +   if (pte_exec(pte) && separated_l2_icache_dcache)
> > > __sync_icache_dcache(pte);
> > >  }
> > >  #define __HAVE_ARCH_SYNC_ICACHE_DCACHE
> > >
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe 
> linux-ia64" in
> > > the body of a message to [EMAIL PROTECTED]
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> > 
> > 
> > -- 
> > Mosberger Consulting LLC, http://www.mosberger-consulting.com/
> > 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-ia64" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-31 Thread Luck, Tony
> This seems crazy to me.  Flushing should occur according to the
> *architecture*, not model-by-model.  Even if we happen to get "lucky"
> on pre-Montecito CPUs, that doesn't justify such ugly hacks.  Or you
> really want to debug this *again* come next CPU?

Ditto.  The only reason we should ever have model specific checks should
be to work around model specific errata (e.g. the McKinley Errata #9 code
in patch.c).

-Tony
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-31 Thread Zoltan Menyhart

KAMEZAWA Hiroyuki wrote:


Could we discuss this in other thread as "optimization for some cpus" and
push bug-fix patches first ?

If take5 or part of take6(patch 1,2) are not acceptable, I'll continue this
work on -rc2.

Thanks,
-Kame


Sure.

Thanks,

Zoltan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-31 Thread KAMEZAWA Hiroyuki
On Tue, 31 Jul 2007 10:38:48 +0200
Zoltan Menyhart <[EMAIL PROTECTED]> wrote:

> David Mosberger-Tang wrote:
> > This seems crazy to me.  Flushing should occur according to the
> > *architecture*, not model-by-model.  Even if we happen to get "lucky"
> > on pre-Montecito CPUs, that doesn't justify such ugly hacks.  Or you
> > really want to debug this *again* come next CPU?
> > 
> >   --david
> 
> O.K. let's say we flush by default: the global flag is set.
> 
> We can have a (short) list of the CPU models which do not require
> this flush.
> 
> If all of the CPUs are on the list then clear the global flag. And:
> 
> static inline void sync_icache_dcache(pte_t pte) {
>if (pte_exec(pte) && global_flag)
>__sync_icache_dcache(pte);
> }
> 

Could we discuss this in other thread as "optimization for some cpus" and
push bug-fix patches first ?

If take5 or part of take6(patch 1,2) are not acceptable, I'll continue this
work on -rc2.

Thanks,
-Kame

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-31 Thread Zoltan Menyhart

David Mosberger-Tang wrote:

This seems crazy to me.  Flushing should occur according to the
*architecture*, not model-by-model.  Even if we happen to get "lucky"
on pre-Montecito CPUs, that doesn't justify such ugly hacks.  Or you
really want to debug this *again* come next CPU?

  --david


O.K. let's say we flush by default: the global flag is set.

We can have a (short) list of the CPU models which do not require
this flush.

If all of the CPUs are on the list then clear the global flag. And:

static inline void sync_icache_dcache(pte_t pte) {
  if (pte_exec(pte) && global_flag)
  __sync_icache_dcache(pte);
}

Thanks,

Zoltan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-31 Thread Zoltan Menyhart

David Mosberger-Tang wrote:

This seems crazy to me.  Flushing should occur according to the
*architecture*, not model-by-model.  Even if we happen to get lucky
on pre-Montecito CPUs, that doesn't justify such ugly hacks.  Or you
really want to debug this *again* come next CPU?

  --david


O.K. let's say we flush by default: the global flag is set.

We can have a (short) list of the CPU models which do not require
this flush.

If all of the CPUs are on the list then clear the global flag. And:

static inline void sync_icache_dcache(pte_t pte) {
  if (pte_exec(pte)  global_flag)
  __sync_icache_dcache(pte);
}

Thanks,

Zoltan
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-31 Thread KAMEZAWA Hiroyuki
On Tue, 31 Jul 2007 10:38:48 +0200
Zoltan Menyhart [EMAIL PROTECTED] wrote:

 David Mosberger-Tang wrote:
  This seems crazy to me.  Flushing should occur according to the
  *architecture*, not model-by-model.  Even if we happen to get lucky
  on pre-Montecito CPUs, that doesn't justify such ugly hacks.  Or you
  really want to debug this *again* come next CPU?
  
--david
 
 O.K. let's say we flush by default: the global flag is set.
 
 We can have a (short) list of the CPU models which do not require
 this flush.
 
 If all of the CPUs are on the list then clear the global flag. And:
 
 static inline void sync_icache_dcache(pte_t pte) {
if (pte_exec(pte)  global_flag)
__sync_icache_dcache(pte);
 }
 

Could we discuss this in other thread as optimization for some cpus and
push bug-fix patches first ?

If take5 or part of take6(patch 1,2) are not acceptable, I'll continue this
work on -rc2.

Thanks,
-Kame

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-31 Thread Zoltan Menyhart

KAMEZAWA Hiroyuki wrote:


Could we discuss this in other thread as optimization for some cpus and
push bug-fix patches first ?

If take5 or part of take6(patch 1,2) are not acceptable, I'll continue this
work on -rc2.

Thanks,
-Kame


Sure.

Thanks,

Zoltan
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-31 Thread Luck, Tony
 This seems crazy to me.  Flushing should occur according to the
 *architecture*, not model-by-model.  Even if we happen to get lucky
 on pre-Montecito CPUs, that doesn't justify such ugly hacks.  Or you
 really want to debug this *again* come next CPU?

Ditto.  The only reason we should ever have model specific checks should
be to work around model specific errata (e.g. the McKinley Errata #9 code
in patch.c).

-Tony
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-31 Thread Jim Hull
Not just crazy, but wrong - this *can* happen on pre-Montecito.  Even though
L1D is write-through and L2 was mixed I/D, the L1 I-cache could contain
stale instrutions if there are missing flushes. I think the only reason this
has never been observed is that the L1I is so small (32K) that it's likely
that any stale data has been displaced.

 -- Jim

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 KAMEZAWA Hiroyuki
 Sent: Monday, July 30, 2007 9:30 PM
 To: David Mosberger-Tang
 Cc: LKML; [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; Christoph Lameter
 Subject: Re: [PATCH] flush icache before set_pte take6. [4/4] 
 optimization for cpus other than montecito
 
 On Mon, 30 Jul 2007 22:15:50 -0600
 David Mosberger-Tang [EMAIL PROTECTED] wrote:
 
  This seems crazy to me.  Flushing should occur according to the
  *architecture*, not model-by-model.  Even if we happen to 
 get lucky
  on pre-Montecito CPUs, that doesn't justify such ugly hacks.  
 
 I'm not sure this can happen before Montecito because L1 was 
 write-through
 and L2 was mixed. 
 
  Or you really want to debug this *again* come next CPU?
 
 No. 
 I should add RFC to this patch. I just want to hear opinions.
 This is why I separated this patch. I can drop this.
 
 Thanks,
 -Kame
 
 
 
 
 
  
--david
  
  On 7/30/07, KAMEZAWA Hiroyuki 
 [EMAIL PROTECTED] wrote:
  
   Add L2 cache is separated? check flag as read_mostly 
 global variable.
  
   This add one memory reference to global variable to page 
 faults of executable
   map in do_wp_page(page copy case), file-mapped page fault 
 and some system calls
   which does memory map changes. But not so bad as calling 
 sync_icache_dcache in
   architectures which doesn't need it.
  
   Signed-off-by: KAMEZAWA Hiroyuki [EMAIL PROTECTED]
  
  
   ---
arch/ia64/kernel/setup.c   |7 +++
include/asm-ia64/pgtable.h |3 ++-
2 files changed, 9 insertions(+), 1 deletion(-)
  
   Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
   
 ===
   --- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
   +++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
   @@ -106,6 +106,8 @@ struct io_space io_space[MAX_IO_SPACES];
EXPORT_SYMBOL(io_space);
unsigned int num_io_spaces;
  
   +int separated_l2_icache_dcache __read_mostly;
   +
/*
 * flush_icache_range() needs to know what processor 
 dependent stride size to use
 * when it makes i-cache(s) coherent with d-caches.
   @@ -718,6 +720,11 @@ get_model_name(__u8 family, __u8 model)
   printk(KERN_ERR
  %s: Table overflow. Some 
 processor model information will be missing\n,
  __FUNCTION__);
   +   /* Montecito has separated L2 Icache and Dcache. 
 This requires
   +  synchronize Icache and Dcache before set_pte() */
   +   if (family == 0x20)
   +   separated_l2_icache_dcache = 1;
   +
   return Unknown;
}
  
   Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
   
 ===
   --- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
   +++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
   @@ -489,9 +489,10 @@ extern struct page *zero_page_memmap_ptr
 * as an executable pte.
 */
extern void __sync_icache_dcache(pte_t pte);
   +extern int separated_l2_icache_dcache;
static inline void sync_icache_dcache(pte_t pte)
{
   -   if (pte_exec(pte))
   +   if (pte_exec(pte)  separated_l2_icache_dcache)
   __sync_icache_dcache(pte);
}
#define __HAVE_ARCH_SYNC_ICACHE_DCACHE
  
   -
   To unsubscribe from this list: send the line unsubscribe 
 linux-ia64 in
   the body of a message to [EMAIL PROTECTED]
   More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
  
  
  -- 
  Mosberger Consulting LLC, http://www.mosberger-consulting.com/
  
 
 -
 To unsubscribe from this list: send the line unsubscribe 
 linux-ia64 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread KAMEZAWA Hiroyuki
On Tue, 31 Jul 2007 13:29:32 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> On Mon, 30 Jul 2007 22:15:50 -0600
> "David Mosberger-Tang" <[EMAIL PROTECTED]> wrote:
> 
> > This seems crazy to me.  Flushing should occur according to the
> > *architecture*, not model-by-model.  Even if we happen to get "lucky"
> > on pre-Montecito CPUs, that doesn't justify such ugly hacks.  
> 

BTW, how about the quality of patch[2/4] ?

If it is enough well, I want it to be merged

please..
-Kame

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread KAMEZAWA Hiroyuki
On Mon, 30 Jul 2007 22:15:50 -0600
"David Mosberger-Tang" <[EMAIL PROTECTED]> wrote:

> This seems crazy to me.  Flushing should occur according to the
> *architecture*, not model-by-model.  Even if we happen to get "lucky"
> on pre-Montecito CPUs, that doesn't justify such ugly hacks.  

I'm not sure this can happen before Montecito because L1 was write-through
and L2 was mixed. 

> Or you really want to debug this *again* come next CPU?

No. 
I should add RFC to this patch. I just want to hear opinions.
This is why I separated this patch. I can drop this.

Thanks,
-Kame





 
>   --david
> 
> On 7/30/07, KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
> >
> > Add "L2 cache is separated? check flag" as read_mostly global variable.
> >
> > This add one memory reference to global variable to page faults of 
> > "executable"
> > map in do_wp_page(page copy case), file-mapped page fault and some system 
> > calls
> > which does memory map changes. But not so bad as calling sync_icache_dcache 
> > in
> > architectures which doesn't need it.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
> >
> >
> > ---
> >  arch/ia64/kernel/setup.c   |7 +++
> >  include/asm-ia64/pgtable.h |3 ++-
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
> > ===
> > --- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
> > +++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
> > @@ -106,6 +106,8 @@ struct io_space io_space[MAX_IO_SPACES];
> >  EXPORT_SYMBOL(io_space);
> >  unsigned int num_io_spaces;
> >
> > +int separated_l2_icache_dcache __read_mostly;
> > +
> >  /*
> >   * "flush_icache_range()" needs to know what processor dependent stride 
> > size to use
> >   * when it makes i-cache(s) coherent with d-caches.
> > @@ -718,6 +720,11 @@ get_model_name(__u8 family, __u8 model)
> > printk(KERN_ERR
> >"%s: Table overflow. Some processor model 
> > information will be missing\n",
> >__FUNCTION__);
> > +   /* Montecito has separated L2 Icache and Dcache. This requires
> > +  synchronize Icache and Dcache before set_pte() */
> > +   if (family == 0x20)
> > +   separated_l2_icache_dcache = 1;
> > +
> > return "Unknown";
> >  }
> >
> > Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
> > ===
> > --- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
> > +++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
> > @@ -489,9 +489,10 @@ extern struct page *zero_page_memmap_ptr
> >   * as an executable pte.
> >   */
> >  extern void __sync_icache_dcache(pte_t pte);
> > +extern int separated_l2_icache_dcache;
> >  static inline void sync_icache_dcache(pte_t pte)
> >  {
> > -   if (pte_exec(pte))
> > +   if (pte_exec(pte) && separated_l2_icache_dcache)
> > __sync_icache_dcache(pte);
> >  }
> >  #define __HAVE_ARCH_SYNC_ICACHE_DCACHE
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> > the body of a message to [EMAIL PROTECTED]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> 
> -- 
> Mosberger Consulting LLC, http://www.mosberger-consulting.com/
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread David Mosberger-Tang
This seems crazy to me.  Flushing should occur according to the
*architecture*, not model-by-model.  Even if we happen to get "lucky"
on pre-Montecito CPUs, that doesn't justify such ugly hacks.  Or you
really want to debug this *again* come next CPU?

  --david

On 7/30/07, KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
>
> Add "L2 cache is separated? check flag" as read_mostly global variable.
>
> This add one memory reference to global variable to page faults of 
> "executable"
> map in do_wp_page(page copy case), file-mapped page fault and some system 
> calls
> which does memory map changes. But not so bad as calling sync_icache_dcache in
> architectures which doesn't need it.
>
> Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
>
>
> ---
>  arch/ia64/kernel/setup.c   |7 +++
>  include/asm-ia64/pgtable.h |3 ++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
> ===
> --- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
> +++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
> @@ -106,6 +106,8 @@ struct io_space io_space[MAX_IO_SPACES];
>  EXPORT_SYMBOL(io_space);
>  unsigned int num_io_spaces;
>
> +int separated_l2_icache_dcache __read_mostly;
> +
>  /*
>   * "flush_icache_range()" needs to know what processor dependent stride size 
> to use
>   * when it makes i-cache(s) coherent with d-caches.
> @@ -718,6 +720,11 @@ get_model_name(__u8 family, __u8 model)
> printk(KERN_ERR
>"%s: Table overflow. Some processor model information 
> will be missing\n",
>__FUNCTION__);
> +   /* Montecito has separated L2 Icache and Dcache. This requires
> +  synchronize Icache and Dcache before set_pte() */
> +   if (family == 0x20)
> +   separated_l2_icache_dcache = 1;
> +
> return "Unknown";
>  }
>
> Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
> ===
> --- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
> +++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
> @@ -489,9 +489,10 @@ extern struct page *zero_page_memmap_ptr
>   * as an executable pte.
>   */
>  extern void __sync_icache_dcache(pte_t pte);
> +extern int separated_l2_icache_dcache;
>  static inline void sync_icache_dcache(pte_t pte)
>  {
> -   if (pte_exec(pte))
> +   if (pte_exec(pte) && separated_l2_icache_dcache)
> __sync_icache_dcache(pte);
>  }
>  #define __HAVE_ARCH_SYNC_ICACHE_DCACHE
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Mosberger Consulting LLC, http://www.mosberger-consulting.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread KAMEZAWA Hiroyuki

Add "L2 cache is separated? check flag" as read_mostly global variable.

This add one memory reference to global variable to page faults of "executable"
map in do_wp_page(page copy case), file-mapped page fault and some system calls
which does memory map changes. But not so bad as calling sync_icache_dcache in
architectures which doesn't need it.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>


---
 arch/ia64/kernel/setup.c   |7 +++
 include/asm-ia64/pgtable.h |3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
===
--- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
+++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
@@ -106,6 +106,8 @@ struct io_space io_space[MAX_IO_SPACES];
 EXPORT_SYMBOL(io_space);
 unsigned int num_io_spaces;
 
+int separated_l2_icache_dcache __read_mostly;
+
 /*
  * "flush_icache_range()" needs to know what processor dependent stride size 
to use
  * when it makes i-cache(s) coherent with d-caches.
@@ -718,6 +720,11 @@ get_model_name(__u8 family, __u8 model)
printk(KERN_ERR
   "%s: Table overflow. Some processor model information 
will be missing\n",
   __FUNCTION__);
+   /* Montecito has separated L2 Icache and Dcache. This requires
+  synchronize Icache and Dcache before set_pte() */
+   if (family == 0x20)
+   separated_l2_icache_dcache = 1;
+
return "Unknown";
 }
 
Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
===
--- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
+++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
@@ -489,9 +489,10 @@ extern struct page *zero_page_memmap_ptr
  * as an executable pte.
  */
 extern void __sync_icache_dcache(pte_t pte);
+extern int separated_l2_icache_dcache;
 static inline void sync_icache_dcache(pte_t pte)
 {
-   if (pte_exec(pte))
+   if (pte_exec(pte) && separated_l2_icache_dcache)
__sync_icache_dcache(pte);
 }
 #define __HAVE_ARCH_SYNC_ICACHE_DCACHE

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread KAMEZAWA Hiroyuki

Add L2 cache is separated? check flag as read_mostly global variable.

This add one memory reference to global variable to page faults of executable
map in do_wp_page(page copy case), file-mapped page fault and some system calls
which does memory map changes. But not so bad as calling sync_icache_dcache in
architectures which doesn't need it.

Signed-off-by: KAMEZAWA Hiroyuki [EMAIL PROTECTED]


---
 arch/ia64/kernel/setup.c   |7 +++
 include/asm-ia64/pgtable.h |3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
===
--- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
+++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
@@ -106,6 +106,8 @@ struct io_space io_space[MAX_IO_SPACES];
 EXPORT_SYMBOL(io_space);
 unsigned int num_io_spaces;
 
+int separated_l2_icache_dcache __read_mostly;
+
 /*
  * flush_icache_range() needs to know what processor dependent stride size 
to use
  * when it makes i-cache(s) coherent with d-caches.
@@ -718,6 +720,11 @@ get_model_name(__u8 family, __u8 model)
printk(KERN_ERR
   %s: Table overflow. Some processor model information 
will be missing\n,
   __FUNCTION__);
+   /* Montecito has separated L2 Icache and Dcache. This requires
+  synchronize Icache and Dcache before set_pte() */
+   if (family == 0x20)
+   separated_l2_icache_dcache = 1;
+
return Unknown;
 }
 
Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
===
--- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
+++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
@@ -489,9 +489,10 @@ extern struct page *zero_page_memmap_ptr
  * as an executable pte.
  */
 extern void __sync_icache_dcache(pte_t pte);
+extern int separated_l2_icache_dcache;
 static inline void sync_icache_dcache(pte_t pte)
 {
-   if (pte_exec(pte))
+   if (pte_exec(pte)  separated_l2_icache_dcache)
__sync_icache_dcache(pte);
 }
 #define __HAVE_ARCH_SYNC_ICACHE_DCACHE

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread David Mosberger-Tang
This seems crazy to me.  Flushing should occur according to the
*architecture*, not model-by-model.  Even if we happen to get lucky
on pre-Montecito CPUs, that doesn't justify such ugly hacks.  Or you
really want to debug this *again* come next CPU?

  --david

On 7/30/07, KAMEZAWA Hiroyuki [EMAIL PROTECTED] wrote:

 Add L2 cache is separated? check flag as read_mostly global variable.

 This add one memory reference to global variable to page faults of 
 executable
 map in do_wp_page(page copy case), file-mapped page fault and some system 
 calls
 which does memory map changes. But not so bad as calling sync_icache_dcache in
 architectures which doesn't need it.

 Signed-off-by: KAMEZAWA Hiroyuki [EMAIL PROTECTED]


 ---
  arch/ia64/kernel/setup.c   |7 +++
  include/asm-ia64/pgtable.h |3 ++-
  2 files changed, 9 insertions(+), 1 deletion(-)

 Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
 ===
 --- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
 +++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
 @@ -106,6 +106,8 @@ struct io_space io_space[MAX_IO_SPACES];
  EXPORT_SYMBOL(io_space);
  unsigned int num_io_spaces;

 +int separated_l2_icache_dcache __read_mostly;
 +
  /*
   * flush_icache_range() needs to know what processor dependent stride size 
 to use
   * when it makes i-cache(s) coherent with d-caches.
 @@ -718,6 +720,11 @@ get_model_name(__u8 family, __u8 model)
 printk(KERN_ERR
%s: Table overflow. Some processor model information 
 will be missing\n,
__FUNCTION__);
 +   /* Montecito has separated L2 Icache and Dcache. This requires
 +  synchronize Icache and Dcache before set_pte() */
 +   if (family == 0x20)
 +   separated_l2_icache_dcache = 1;
 +
 return Unknown;
  }

 Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
 ===
 --- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
 +++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
 @@ -489,9 +489,10 @@ extern struct page *zero_page_memmap_ptr
   * as an executable pte.
   */
  extern void __sync_icache_dcache(pte_t pte);
 +extern int separated_l2_icache_dcache;
  static inline void sync_icache_dcache(pte_t pte)
  {
 -   if (pte_exec(pte))
 +   if (pte_exec(pte)  separated_l2_icache_dcache)
 __sync_icache_dcache(pte);
  }
  #define __HAVE_ARCH_SYNC_ICACHE_DCACHE

 -
 To unsubscribe from this list: send the line unsubscribe linux-ia64 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Mosberger Consulting LLC, http://www.mosberger-consulting.com/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread KAMEZAWA Hiroyuki
On Mon, 30 Jul 2007 22:15:50 -0600
David Mosberger-Tang [EMAIL PROTECTED] wrote:

 This seems crazy to me.  Flushing should occur according to the
 *architecture*, not model-by-model.  Even if we happen to get lucky
 on pre-Montecito CPUs, that doesn't justify such ugly hacks.  

I'm not sure this can happen before Montecito because L1 was write-through
and L2 was mixed. 

 Or you really want to debug this *again* come next CPU?

No. 
I should add RFC to this patch. I just want to hear opinions.
This is why I separated this patch. I can drop this.

Thanks,
-Kame





 
   --david
 
 On 7/30/07, KAMEZAWA Hiroyuki [EMAIL PROTECTED] wrote:
 
  Add L2 cache is separated? check flag as read_mostly global variable.
 
  This add one memory reference to global variable to page faults of 
  executable
  map in do_wp_page(page copy case), file-mapped page fault and some system 
  calls
  which does memory map changes. But not so bad as calling sync_icache_dcache 
  in
  architectures which doesn't need it.
 
  Signed-off-by: KAMEZAWA Hiroyuki [EMAIL PROTECTED]
 
 
  ---
   arch/ia64/kernel/setup.c   |7 +++
   include/asm-ia64/pgtable.h |3 ++-
   2 files changed, 9 insertions(+), 1 deletion(-)
 
  Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
  ===
  --- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
  +++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
  @@ -106,6 +106,8 @@ struct io_space io_space[MAX_IO_SPACES];
   EXPORT_SYMBOL(io_space);
   unsigned int num_io_spaces;
 
  +int separated_l2_icache_dcache __read_mostly;
  +
   /*
* flush_icache_range() needs to know what processor dependent stride 
  size to use
* when it makes i-cache(s) coherent with d-caches.
  @@ -718,6 +720,11 @@ get_model_name(__u8 family, __u8 model)
  printk(KERN_ERR
 %s: Table overflow. Some processor model 
  information will be missing\n,
 __FUNCTION__);
  +   /* Montecito has separated L2 Icache and Dcache. This requires
  +  synchronize Icache and Dcache before set_pte() */
  +   if (family == 0x20)
  +   separated_l2_icache_dcache = 1;
  +
  return Unknown;
   }
 
  Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
  ===
  --- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
  +++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
  @@ -489,9 +489,10 @@ extern struct page *zero_page_memmap_ptr
* as an executable pte.
*/
   extern void __sync_icache_dcache(pte_t pte);
  +extern int separated_l2_icache_dcache;
   static inline void sync_icache_dcache(pte_t pte)
   {
  -   if (pte_exec(pte))
  +   if (pte_exec(pte)  separated_l2_icache_dcache)
  __sync_icache_dcache(pte);
   }
   #define __HAVE_ARCH_SYNC_ICACHE_DCACHE
 
  -
  To unsubscribe from this list: send the line unsubscribe linux-ia64 in
  the body of a message to [EMAIL PROTECTED]
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
 
 -- 
 Mosberger Consulting LLC, http://www.mosberger-consulting.com/
 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread KAMEZAWA Hiroyuki
On Tue, 31 Jul 2007 13:29:32 +0900
KAMEZAWA Hiroyuki [EMAIL PROTECTED] wrote:

 On Mon, 30 Jul 2007 22:15:50 -0600
 David Mosberger-Tang [EMAIL PROTECTED] wrote:
 
  This seems crazy to me.  Flushing should occur according to the
  *architecture*, not model-by-model.  Even if we happen to get lucky
  on pre-Montecito CPUs, that doesn't justify such ugly hacks.  
 

BTW, how about the quality of patch[2/4] ?

If it is enough well, I want it to be merged

please..
-Kame

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/