Re: i915.ko WC writes are slow after ea8596bb2d8d379

2015-11-19 Thread Ingo Molnar

* Chris Wilson  wrote:

> > > A bisection pointed to 
> > > 
> > > commit ea8596bb2d8d37957f3e92db9511c50801689180
> > > Author: Masami Hiramatsu 
> > > Date:   Thu Jul 18 20:47:53 2013 +0900
> > > 
> > > kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() 
> > > functions
> > > 
> > > of which the active ingredient was just
> > > 
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > index b32ebf9..f4001e0 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
> > >  
> > >  config HAVE_TEXT_POKE_SMP
> > > bool
> > > -   select STOP_MACHINE if SMP

Ouch...

This is certainly an educative example of how pure 'code removal' patches can
have unintended side effects.

Is there a full fix patch available, and is anyone pushing that to Linus?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: i915.ko WC writes are slow after ea8596bb2d8d379

2015-11-19 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> On Wed, Nov 18, 2015 at 6:48 AM, Chris Wilson  
> wrote:
> > Although
> >
> > diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
> > index d2abbdb..ff4f029 100644
> > --- a/include/linux/stop_machine.h
> > +++ b/include/linux/stop_machine.h
> > @@ -97,7 +97,7 @@ static inline int try_stop_cpus(const struct cpumask 
> > *cpumask,
> >   * grabbing every spinlock (and more).  So the "read" side to such a
> >   * lock is anything which disables preemption.
> >   */
> > -#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
> > +#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)
> 
> [...]
> 
> This seems much better.  Having a set of stop_machine functions around
> that don't work depending on config seems dangerous.

Agreed.

Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: i915.ko WC writes are slow after ea8596bb2d8d379

2015-11-18 Thread Andy Lutomirski
On Wed, Nov 18, 2015 at 6:48 AM, Chris Wilson  wrote:
> Although
>
> diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
> index d2abbdb..ff4f029 100644
> --- a/include/linux/stop_machine.h
> +++ b/include/linux/stop_machine.h
> @@ -97,7 +97,7 @@ static inline int try_stop_cpus(const struct cpumask 
> *cpumask,
>   * grabbing every spinlock (and more).  So the "read" side to such a
>   * lock is anything which disables preemption.
>   */
> -#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
> +#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)

[...]


This seems much better.  Having a set of stop_machine functions around
that don't work depending on config seems dangerous.

--Andy


Re: i915.ko WC writes are slow after ea8596bb2d8d379

2015-11-18 Thread Chris Wilson
On Wed, Oct 08, 2014 at 05:10:59AM -0500, Chuck Ebbert wrote:
> On Wed, 8 Oct 2014 10:03:36 +0100
> Chris Wilson  wrote:
> 
> > 
> > I ran into a problem on a Sandybridge i5-2500s whilst measuring the
> > performance of GTT write-combining access. I found subsequent runs were
> > about 10-40x slower than the first. For example,
> > 
> > igt/gem_gtt_speed:
> > 
> > Time to read 16k through a GTT map: 325.285µs
> > Time to write 16k through a GTT map:  4.729µs
> > Time to clear 16k through a GTT map:  4.584µs
> > Time to clear 16k through a cached GTT map:   1.342µs
> > 
> > on the second run became:
> > 
> > Time to read 16k through a GTT map: 332.148µs
> > Time to write 16k through a GTT map:209.411µs
> > Time to clear 16k through a GTT map: 56.460µs
> > Time to clear 16k through a cached GTT map:  50.897µs
> > 
> > Naively I would say that we lost the wc on our ioremap.
> > /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> > runs.
> > 
> > A bisection pointed to 
> > 
> > commit ea8596bb2d8d37957f3e92db9511c50801689180
> > Author: Masami Hiramatsu 
> > Date:   Thu Jul 18 20:47:53 2013 +0900
> > 
> > kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() 
> > functions
> > 
> > of which the active ingredient was just
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b32ebf9..f4001e0 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
> >  
> >  config HAVE_TEXT_POKE_SMP
> > bool
> > -   select STOP_MACHINE if SMP
> >  
> >  config X86_DEV_DMA_OPS
> > bool
> > 
> > and adding that back into the current build, e.g.
> 
> Hmm, set_mtrr() uses stop_machine(). I wonder if your MTRRs are out of
> sync and your results depend on which CPU the test runs on?

(From the other reply, it did and is still required).

I have run into other issues where stop_machine() only runs an
irq-disabled callback on the local CPU, as opposed to halting all CPUs
and running the callback universally.
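
To make the difference concrete, here is a minimal userspace sketch of the two behaviours described above. It is a toy model, not kernel code: every `toy_*` name is hypothetical, the per-CPU "register" merely stands in for state such as an MTRR, and the "all CPUs" variant models the case where the cpumask passed to stop_machine() covers every online CPU.

```c
#include <assert.h>

#define NR_CPUS 4   /* matches the 4-CPU machine discussed in this thread */

/* Toy per-CPU state standing in for a per-CPU register such as an MTRR. */
struct toy_machine {
    int per_cpu_reg[NR_CPUS];
};

struct toy_update {
    struct toy_machine *m;
    int value;
    int cpu;        /* CPU the callback is currently "running" on */
};

/* Callback with the same shape stop_machine() takes: int (*fn)(void *). */
static int toy_set_reg(void *data)
{
    struct toy_update *u = data;

    u->m->per_cpu_reg[u->cpu] = u->value;
    return 0;
}

/* Model of the full primitive: the callback runs on every CPU. */
static int toy_stop_machine_all(int (*fn)(void *), struct toy_update *u)
{
    int ret = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        u->cpu = cpu;
        ret |= fn(u);
    }
    return ret;
}

/* Model of the fallback: only the calling CPU runs the callback. */
static int toy_stop_machine_local(int (*fn)(void *), struct toy_update *u,
                                  int local_cpu)
{
    assert(local_cpu >= 0 && local_cpu < NR_CPUS);
    u->cpu = local_cpu;
    return fn(u);
}

/* Number of CPUs whose register differs from CPU 0's copy. */
static int toy_count_out_of_sync(const struct toy_machine *m)
{
    int n = 0;

    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        if (m->per_cpu_reg[cpu] != m->per_cpu_reg[0])
            n++;
    return n;
}
```

In this model, an update issued through `toy_stop_machine_local()` leaves the other CPUs with stale state, while `toy_stop_machine_all()` brings them back in sync. Whether the real slowdown is caused by exactly this kind of divergence is not established by the sketch.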

My understanding is that the root cause of the issue is:

diff --git a/init/Kconfig b/init/Kconfig
index af09b4f..8235e0b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1993,8 +1993,7 @@ config INIT_ALL_POSSIBLE
 
  config STOP_MACHINE
  bool
  -   default y
  -   depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
  +   default y if SMP || HOTPLUG_CPU
  help
Need stop_machine() primitive.
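
As a quick check of what the Kconfig change above alters, the two conditions can be written as plain boolean functions and compared over all eight combinations. This is only a sketch: the function names are hypothetical, and it deliberately ignores `select` statements and any other dependencies between the symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Old rule: default y, depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU */
static bool stop_machine_old(bool smp, bool module_unload, bool hotplug_cpu)
{
    return (smp && module_unload) || hotplug_cpu;
}

/* Proposed rule: default y if SMP || HOTPLUG_CPU */
static bool stop_machine_new(bool smp, bool module_unload, bool hotplug_cpu)
{
    (void)module_unload;    /* no longer part of the condition */
    return smp || hotplug_cpu;
}

/* Count the configurations on which the two rules disagree. */
static int count_disagreements(void)
{
    int n = 0;

    for (int bits = 0; bits < 8; bits++) {
        bool smp = bits & 1;
        bool module_unload = bits & 2;
        bool hotplug_cpu = bits & 4;
        bool before = stop_machine_old(smp, module_unload, hotplug_cpu);
        bool after = stop_machine_new(smp, module_unload, hotplug_cpu);

        /* The new rule never disables what the old rule enabled. */
        assert(!before || after);
        if (before != after)
            n++;
    }
    return n;
}
```

The only combination on which the rules differ is SMP=y with MODULE_UNLOAD=n and HOTPLUG_CPU=n, which is the kind of configuration where the old rule left STOP_MACHINE off on an SMP kernel.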

Although

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index d2abbdb..ff4f029 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -97,7 +97,7 @@ static inline int try_stop_cpus(const struct cpumask *cpumask,
  * grabbing every spinlock (and more).  So the "read" side to such a
  * lock is anything which disables preemption.
  */
-#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)
 
 /**
  * stop_machine: freeze the machine on all CPUs and run this function
@@ -128,7 +128,7 @@ int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
 int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data,
   const struct cpumask *cpus);
 
-#else   /* CONFIG_STOP_MACHINE && CONFIG_SMP */
+#else   /* CONFIG_SMP */
 
 static inline int __stop_machine(int (*fn)(void *), void *data,
 const struct cpumask *cpus)
@@ -153,5 +153,5 @@ static inline int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data,
return __stop_machine(fn, data, cpus);
 }
 
-#endif /* CONFIG_STOP_MACHINE && CONFIG_SMP */
+#endif /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */
 #endif /* _LINUX_STOP_MACHINE */
diff --git a/init/Kconfig b/init/Kconfig
index af09b4f..44600a8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1991,13 +1991,6 @@ config INIT_ALL_POSSIBLE
  it was better to provide this option than to break all the archs
  and have several arch maintainers pursuing me down dark alleys.
 
-config STOP_MACHINE
-   bool
-   default y
-   depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
-   help
- Need stop_machine() primitive.
-
 source "block/Kconfig"
 
 config PREEMPT_NOTIFIERS
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index fd643d8..2dd1f306 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -513,7 +513,7 @@ static int __init cpu_stop_init(void)
 }
 early_initcall(cpu_stop_init);
 
-#ifdef CONFIG_STOP_MACHINE
+#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)
 
 int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
 {
@@ -613,4 +613,4 @@ int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data,
return ret ?: done.ret;
 }
 
-#endif /* CONFIG_STOP_MACHINE 

Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-09 Thread Chris Wilson
On Thu, Oct 09, 2014 at 09:46:37AM -0500, Chuck Ebbert wrote:
> Well they're all the same.
> 
> Hmm, x86info is not dumping all the variable MTRRs. You have 10, but
> it only prints the first 8. I don't know if it will show anything
> different, but can you try fixing it with this patch?

Source (https://github.com/dankamongmen/x86info) was slightly different,
but I followed the drift.

tldr: 8,9 appear to be identical on all cpus as well.

$ sudo ./x86info --mtrr --all-cpus
x86info v1.31pre
Found 4 CPUs.
CPU #1:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0006 (physbase:0x00 type: 0x06 
(write-back))
MTRRphysMask0 (0x201): 0x000f8800 (physmask:0xf8 valid:1)
MTRRphysBase1 (0x202): 0x8006 (physbase:0x08 type: 0x06 
(write-back))
MTRRphysMask1 (0x203): 0x000ff800 (physmask:0xff valid:1)
MTRRphysBase2 (0x204): 0x8e00 (physbase:0x08e000 type: 0x00 
(uncacheable))
MTRRphysMask2 (0x205): 0x000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x8d00 (physbase:0x08d000 type: 0x00 
(uncacheable))
MTRRphysMask3 (0x207): 0x000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x00010006 (physbase:0x10 type: 0x06 
(write-back))
MTRRphysMask4 (0x209): 0x000f8800 (physmask:0xf8 valid:1)
MTRRphysBase5 (0x20a): 0x00017000 (physbase:0x17 type: 0x00 
(uncacheable))
MTRRphysMask5 (0x20b): 0x000ff800 (physmask:0xff valid:1)
MTRRphysBase6 (0x20c): 0x00016f00 (physbase:0x16f000 type: 0x00 
(uncacheable))
MTRRphysMask6 (0x20d): 0x000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x00016e80 (physbase:0x16e800 type: 0x00 
(uncacheable))
MTRRphysMask7 (0x20f): 0x000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x00016e60 (physbase:0x16e600 type: 0x00 
(uncacheable))
MTRRphysMask8 (0x211): 0x000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x (physbase:0x00 type: 0x00 
(uncacheable))
MTRRphysMask9 (0x213): 0x (physmask:0x00 valid:0)
MTRRfix64K_0 (0x250): 0x0606060606060606
MTRRfix16K_8 (0x258): 0x0606060606060606
MTRRfix16K_A (0x259): 0x
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D 0x26a: 0x
MTRRfix4K_D8000 0x26b: 0x
MTRRfix4K_E 0x26c: 0x
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0c00 (fixed-range flag:1 enable flag:1 
default type:0x00 (uncacheable))

--
CPU #2:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0006 (physbase:0x00 type: 0x06 
(write-back))
MTRRphysMask0 (0x201): 0x000f8800 (physmask:0xf8 valid:1)
MTRRphysBase1 (0x202): 0x8006 (physbase:0x08 type: 0x06 
(write-back))
MTRRphysMask1 (0x203): 0x000ff800 (physmask:0xff valid:1)
MTRRphysBase2 (0x204): 0x8e00 (physbase:0x08e000 type: 0x00 
(uncacheable))
MTRRphysMask2 (0x205): 0x000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x8d00 (physbase:0x08d000 type: 0x00 
(uncacheable))
MTRRphysMask3 (0x207): 0x000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x00010006 (physbase:0x10 type: 0x06 
(write-back))
MTRRphysMask4 (0x209): 0x000f8800 (physmask:0xf8 valid:1)
MTRRphysBase5 (0x20a): 0x00017000 (physbase:0x17 type: 0x00 
(uncacheable))
MTRRphysMask5 (0x20b): 0x000ff800 (physmask:0xff valid:1)
MTRRphysBase6 (0x20c): 0x00016f00 (physbase:0x16f000 type: 0x00 
(uncacheable))
MTRRphysMask6 (0x20d): 0x000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x00016e80 (physbase:0x16e800 type: 0x00 
(uncacheable))
MTRRphysMask7 (0x20f): 0x000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x00016e60 (physbase:0x16e600 type: 0x00 
(uncacheable))
MTRRphysMask8 (0x211): 0x000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x (physbase:0x00 type: 0x00 
(uncacheable))
MTRRphysMask9 (0x213): 0x (physmask:0x00 valid:0)
MTRRfix64K_0 (0x250): 0x0606060606060606
MTRRfix16K_8 (0x258): 0x0606060606060606

Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-09 Thread Chuck Ebbert
On Thu, 9 Oct 2014 14:00:47 +0100
Chris Wilson  wrote:

> On Thu, Oct 09, 2014 at 07:44:16AM -0500, Chuck Ebbert wrote:
> > Could you try installing x86info and running "x86info --mtrr
> > --all-cpus" while running the broken kernel?
> 
> # /opt/xorg/src/intel-gpu-tools/tests/gem_gtt_speed 
> IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0+ x86_64)
> Time to read 16k through a GTT map: 318.643µs
> Time to write 16k through a GTT map:203.103µs
> Time to clear 16k through a GTT map: 53.098µs
> Time to clear 16k through a cached GTT map:  49.925µs
> 
> (i.e. bad kernel)
> 
> # x86info --mtrr --all-cpus
> x86info v1.30.  Dave Jones 2001-2011
> Feedback to .
> 
> Found 4 CPUs.
> CPU #1:
> Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
> Type: 0 (Original OEM)
> CPU Model (x86info's best guess): Unknown model. 
> Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 
> 3.30GHz
> 
> MTRR registers:
> MTRRcap (0xfe): 0x0d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 
> 0x1, vcnt field: 0x0a (10))
> MTRRphysBase0 (0x200): 0x0006 (physbase field:0x00 type 
> field: 0x06 (write-back))
> MTRRphysMask0 (0x201): 0x000f8800 (physmask field:0xf8 valid 
> flag: 1)
> MTRRphysBase1 (0x202): 0x8006 (physbase field:0x08 type 
> field: 0x06 (write-back))
> MTRRphysMask1 (0x203): 0x000ff800 (physmask field:0xff valid 
> flag: 1)
> MTRRphysBase2 (0x204): 0x8e00 (physbase field:0x08e000 type 
> field: 0x00 (uncacheable))
> MTRRphysMask2 (0x205): 0x000ffe000800 (physmask field:0xffe000 valid 
> flag: 1)
> MTRRphysBase3 (0x206): 0x8d00 (physbase field:0x08d000 type 
> field: 0x00 (uncacheable))
> MTRRphysMask3 (0x207): 0x000fff000800 (physmask field:0xfff000 valid 
> flag: 1)
> MTRRphysBase4 (0x208): 0x00010006 (physbase field:0x10 type 
> field: 0x06 (write-back))
> MTRRphysMask4 (0x209): 0x000f8800 (physmask field:0xf8 valid 
> flag: 1)
> MTRRphysBase5 (0x20a): 0x00017000 (physbase field:0x17 type 
> field: 0x00 (uncacheable))
> MTRRphysMask5 (0x20b): 0x000ff800 (physmask field:0xff valid 
> flag: 1)
> MTRRphysBase6 (0x20c): 0x00016f00 (physbase field:0x16f000 type 
> field: 0x00 (uncacheable))
> MTRRphysMask6 (0x20d): 0x000fff000800 (physmask field:0xfff000 valid 
> flag: 1)
> MTRRphysBase7 (0x20e): 0x00016e80 (physbase field:0x16e800 type 
> field: 0x00 (uncacheable))
> MTRRphysMask7 (0x20f): 0x000fff800800 (physmask field:0xfff800 valid 
> flag: 1)
> MTRRfix64K_0 (0x250): 0x0606060606060606
> MTRRfix16K_8 (0x258): 0x0606060606060606
> MTRRfix16K_A (0x259): 0x
> MTRRfix4K_C8000 (0x269): 0x0505050505050505
> MTRRfix4K_D 0x26a: 0x
> MTRRfix4K_D8000 0x26b: 0x
> MTRRfix4K_E 0x26c: 0x
> MTRRfix4K_E8000 0x26d: 0x0505050505050505
> MTRRfix4K_F 0x26e: 0x0505050505050505
> MTRRfix4K_F8000 0x26f: 0x0505050505050505
> MTRRdefType (0x2ff): 0x0c00 (fixed-range flag: 0x1, mtrr flag: 
> 0x1, type field: 0x00 (uncacheable))
> 



Well they're all the same.

Hmm, x86info is not dumping all the variable MTRRs. You have 10, but
it only prints the first 8. I don't know if it will show anything
different, but can you try fixing it with this patch?

--- a/mtrr.c
+++ b/mtrr.c
@@ -75,19 +75,23 @@
printf("0x%016llx\n", val);
 }
 
-static void decode_mtrrcap(int cpu, int msr)
+unsigned int decode_mtrrcap(int cpu, int msr)
 {
unsigned long long val;
+   unsigned int vcnt = 0;
int ret;
 
ret = mtrr_value(cpu,msr,&val);
if (ret) {
+   vcnt = (unsigned int)(val & IA32_MTRRCAP_VCNT);
printf("0x%016llx ", val);
printf("(smrr flag: 0x%01x, ",(unsigned int) (val & IA32_MTRRCAP_SMRR) >> 11 );
printf("wc flag: 0x%01x, ",(unsigned int) (val & IA32_MTRRCAP_WC) >> 10);
printf("fix flag: 0x%01x, ",(unsigned int) (val & IA32_MTRRCAP_FIX) >> 8);
-   printf("vcnt field: 0x%02x (%d))\n",(unsigned int) (val & IA32_MTRRCAP_VCNT) , (int) (val & IA32_MTRRCAP_VCNT));
+   printf("vcnt field: 0x%02x (%u))\n", vcnt, vcnt);
}
+
+   return vcnt;
 }
 
 static void decode_mtrr_deftype(int cpu, int msr)
@@ -142,7 +146,7 @@
 void dump_mtrrs(struct cpudata *cpu)
 {
unsigned long long val = 0;
-   unsigned int i;
+   unsigned int i, vcnt;
 
if (!(cpu->flags_edx & (X86_FEATURE_MTRR)))
return;
@@ -157,11 +161,11 @@
printf("MTRR registers:\n");
 
printf("MTRRcap (0xfe): ");
-   decode_mtrrcap(cpu->number, 0xfe);
+   vcnt = decode_mtrrcap(cpu->number, 0xfe);
 
set_max_phy_addr(cpu);
 
-   for (i = 0; i < 16; i+=2) {
+   for (i = 0; i < 2 * vcnt; i += 2) {
printf("MTRRphysBase%u (0x%x): ", i/2, (unsigned int) 
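
For reference, the field layout the patch relies on can be sketched as a small standalone decoder. This is not x86info code: the struct, macro and function names are hypothetical, and the bit positions (VCNT in bits 7:0, FIX bit 8, WC bit 10, SMRR bit 11) are taken from the shifts in the patch above.

```c
#include <assert.h>
#include <stdint.h>

#define MTRRCAP_VCNT_MASK   0xffu       /* bits 7:0: number of variable MTRRs */
#define MTRRCAP_FIX_BIT     8
#define MTRRCAP_WC_BIT      10
#define MTRRCAP_SMRR_BIT    11
#define MSR_MTRR_PHYSBASE0  0x200u      /* MTRRphysBase0, as in the dump */

struct mtrrcap_fields {
    unsigned int vcnt;
    unsigned int fix;
    unsigned int wc;
    unsigned int smrr;
};

/* Split a raw MTRRcap value into its fields. */
static struct mtrrcap_fields decode_mtrrcap_value(uint64_t val)
{
    struct mtrrcap_fields f;

    f.vcnt = (unsigned int)(val & MTRRCAP_VCNT_MASK);
    f.fix = (unsigned int)((val >> MTRRCAP_FIX_BIT) & 1);
    f.wc = (unsigned int)((val >> MTRRCAP_WC_BIT) & 1);
    f.smrr = (unsigned int)((val >> MTRRCAP_SMRR_BIT) & 1);
    return f;
}

/* MSR number of MTRRphysBase<n>; base and mask registers alternate. */
static unsigned int mtrr_physbase_msr(unsigned int n)
{
    assert(n <= MTRRCAP_VCNT_MASK);     /* vcnt is an 8-bit field */
    return MSR_MTRR_PHYSBASE0 + 2 * n;
}

/* MSR number of MTRRphysMask<n>. */
static unsigned int mtrr_physmask_msr(unsigned int n)
{
    return mtrr_physbase_msr(n) + 1;
}
```

With the value 0xd0a reported for this machine, the decoder gives vcnt = 10, so a loop bounded by a hard-coded 16 MSRs stops after pair 7 and never reaches MTRRphysBase8 (0x210) or MTRRphysBase9 (0x212).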

Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-09 Thread Chris Wilson
On Thu, Oct 09, 2014 at 07:44:16AM -0500, Chuck Ebbert wrote:
> Could you try installing x86info and running "x86info --mtrr
> --all-cpus" while running the broken kernel?

# /opt/xorg/src/intel-gpu-tools/tests/gem_gtt_speed 
IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0+ x86_64)
Time to read 16k through a GTT map: 318.643µs
Time to write 16k through a GTT map:203.103µs
Time to clear 16k through a GTT map: 53.098µs
Time to clear 16k through a cached GTT map:  49.925µs

(i.e. bad kernel)
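
To put the "10-40x" figure in context, the slowdown can be computed per operation. The sketch below pairs the bad-kernel timings above with the first-run timings from the original report quoted elsewhere in this thread; the struct and function names are hypothetical, and pairing these two particular runs is an assumption made for illustration.

```c
#include <assert.h>

struct gtt_timing {
    const char *op;
    double good_us;     /* first run, from the original report */
    double bad_us;      /* bad-kernel run shown above */
};

static const struct gtt_timing gtt_timings[] = {
    { "read 16k through a GTT map",         325.285, 318.643 },
    { "write 16k through a GTT map",          4.729, 203.103 },
    { "clear 16k through a GTT map",          4.584,  53.098 },
    { "clear 16k through a cached GTT map",   1.342,  49.925 },
};

/* Ratio of bad to good time; values above 1 mean the bad run is slower. */
static double gtt_slowdown(const struct gtt_timing *t)
{
    assert(t->good_us > 0.0);
    return t->bad_us / t->good_us;
}
```

On these numbers the read path is essentially unchanged, while the write and clear paths come out roughly 12x to 43x slower.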

# x86info --mtrr --all-cpus
x86info v1.30.  Dave Jones 2001-2011
Feedback to .

Found 4 CPUs.
CPU #1:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model. 
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 
0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0006 (physbase field:0x00 type field: 
0x06 (write-back))
MTRRphysMask0 (0x201): 0x000f8800 (physmask field:0xf8 valid flag: 
1)
MTRRphysBase1 (0x202): 0x8006 (physbase field:0x08 type field: 
0x06 (write-back))
MTRRphysMask1 (0x203): 0x000ff800 (physmask field:0xff valid flag: 
1)
MTRRphysBase2 (0x204): 0x8e00 (physbase field:0x08e000 type field: 
0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x000ffe000800 (physmask field:0xffe000 valid flag: 
1)
MTRRphysBase3 (0x206): 0x8d00 (physbase field:0x08d000 type field: 
0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x000fff000800 (physmask field:0xfff000 valid flag: 
1)
MTRRphysBase4 (0x208): 0x00010006 (physbase field:0x10 type field: 
0x06 (write-back))
MTRRphysMask4 (0x209): 0x000f8800 (physmask field:0xf8 valid flag: 
1)
MTRRphysBase5 (0x20a): 0x00017000 (physbase field:0x17 type field: 
0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x000ff800 (physmask field:0xff valid flag: 
1)
MTRRphysBase6 (0x20c): 0x00016f00 (physbase field:0x16f000 type field: 
0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x000fff000800 (physmask field:0xfff000 valid flag: 
1)
MTRRphysBase7 (0x20e): 0x00016e80 (physbase field:0x16e800 type field: 
0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x000fff800800 (physmask field:0xfff800 valid flag: 
1)
MTRRfix64K_0 (0x250): 0x0606060606060606
MTRRfix16K_8 (0x258): 0x0606060606060606
MTRRfix16K_A (0x259): 0x
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D 0x26a: 0x
MTRRfix4K_D8000 0x26b: 0x
MTRRfix4K_E 0x26c: 0x
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0c00 (fixed-range flag: 0x1, mtrr flag: 0x1, 
type field: 0x00 (uncacheable))

--

CPU #2:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model. 
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 
0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0006 (physbase field:0x00 type field: 
0x06 (write-back))
MTRRphysMask0 (0x201): 0x000f8800 (physmask field:0xf8 valid flag: 
1)
MTRRphysBase1 (0x202): 0x8006 (physbase field:0x08 type field: 
0x06 (write-back))
MTRRphysMask1 (0x203): 0x000ff800 (physmask field:0xff valid flag: 
1)
MTRRphysBase2 (0x204): 0x8e00 (physbase field:0x08e000 type field: 
0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x000ffe000800 (physmask field:0xffe000 valid flag: 
1)
MTRRphysBase3 (0x206): 0x8d00 (physbase field:0x08d000 type field: 
0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x000fff000800 (physmask field:0xfff000 valid flag: 
1)
MTRRphysBase4 (0x208): 0x00010006 (physbase field:0x10 type field: 
0x06 (write-back))
MTRRphysMask4 (0x209): 0x000f8800 (physmask field:0xf8 valid flag: 
1)
MTRRphysBase5 (0x20a): 0x00017000 (physbase field:0x17 type field: 
0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x000ff800 (physmask field:0xff valid flag: 
1)
MTRRphysBase6 (0x20c): 0x00016f00 (physbase field:0x16f000 type field: 
0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x000fff000800 (physmask field:0xfff000 valid flag: 
1)
MTRRphysBase7 (0x20e): 0x00016e80 (physbase field:0x16e800 type field: 
0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x000fff800800 (physmask field:0xfff800 valid flag: 
1)
MTRRfix64K_0 (0x250): 0x0606060606060606
MTRRfix16K_8 (0x258): 0x0606060606060606
MTRRfix16K_A (0x259): 

Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-09 Thread Chuck Ebbert
On Thu, 9 Oct 2014 07:53:31 +0100
Chris Wilson  wrote:

> # cat /proc/mtrr 
> reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
> reg01: base=0x08000 ( 2048MB), size=  256MB, count=1: write-back
> reg02: base=0x08e00 ( 2272MB), size=   32MB, count=1: uncachable
> reg03: base=0x08d00 ( 2256MB), size=   16MB, count=1: uncachable
> reg04: base=0x1 ( 4096MB), size= 2048MB, count=1: write-back
> reg05: base=0x17000 ( 5888MB), size=  256MB, count=1: uncachable
> reg06: base=0x16f00 ( 5872MB), size=   16MB, count=1: uncachable
> reg07: base=0x16e80 ( 5864MB), size=8MB, count=1: uncachable
> reg08: base=0x16e60 ( 5862MB), size=2MB, count=1: uncachable
> 

Well that's what the kernel thinks is in every CPU.
Could you try installing x86info and running "x86info --mtrr
--all-cpus" while running the broken kernel?


Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-09 Thread Chris Wilson
On Wed, Oct 08, 2014 at 02:36:49PM -0700, H. Peter Anvin wrote:
> On 10/08/2014 12:49 PM, Chris Wilson wrote:
> > 
> > Indeed, this appears to be the explanation. (And here I thought PAT
> > superseded mtrrs - i915.ko stopped trying to use assign an mtrr for its
> > GTT quite a while ago.)
> > 
> > Replacing the stop_machine there with on_each_cpu does the trick:
> > 
> 
> It should, but there seem to be quite a few drivers which still muck
> with MTRRs.  However, i915 is not one of them, it calls
> io_mapping_create_wc() followed by arch_phys_wc_add(), so I'm wondering
> what the heck is going on here.

This system also has a radeon GPU. Disabling it (not building in the
module) makes no difference to the wc speed.
 
> > Naively I would say that we lost the wc on our ioremap.
> > /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> > runs.
> 
> Could you tell me what the above looks like?

# cat /sys/kernel/debug/x86/pat_memtype_list
PAT memtype list:
write-back @ 0x8cf34000-0x8cf43000
write-back @ 0x8cf4d000-0x8cf4e000
write-back @ 0x8cf4d000-0x8cf5
write-back @ 0x8cf5-0x8cf51000
write-back @ 0x8cf51000-0x8cf52000
write-back @ 0x8cf52000-0x8cf53000
write-back @ 0x8cf53000-0x8cf55000
write-back @ 0x8cf55000-0x8cf56000
write-back @ 0x8cf9d000-0x8cf9e000
write-back @ 0x8cf9f000-0x8cfa
write-back @ 0x8cffc000-0x8cffd000
uncached-minus @ 0x8fc0-0x8fe0
write-combining @ 0x8fe0-0x9000
uncached-minus @ 0x9022-0x9024
uncached-minus @ 0x9030-0x9032
uncached-minus @ 0x9034-0x90341000
uncached-minus @ 0x9038-0x90381000
write-combining @ 0xa000-0xc000
write-combining @ 0xa0139000-0xa0159000
write-combining @ 0xa0159000-0xa0179000
write-combining @ 0xa0179000-0xa0199000
write-combining @ 0xc004-0xc025e000
write-combining @ 0xc025e000-0xc045e000
write-combining @ 0xc045e000-0xc045f000
write-combining @ 0xc045f000-0xc075f000
uncached-minus @ 0xf800-0xfc00
uncached-minus @ 0xfed0-0xfed01000
uncached-minus @ 0xfed1-0xfed16000
uncached-minus @ 0xfed1f000-0xfed2

(identical for good/bad runs)
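
A minimal sketch of how two captures of pat_memtype_list could be compared mechanically instead of by eye. The line format is assumed from the listing above; parse_memtype_list and diff_memtype_lists are hypothetical helper names, not part of any kernel tool.

```python
import re

# Assumed entry format, taken from the listing: "<type> @ 0x<start>-0x<end>"
_ENTRY = re.compile(
    r"^(?P<type>[\w-]+) @ 0x(?P<start>[0-9a-fA-F]+)-0x(?P<end>[0-9a-fA-F]+)$"
)


def parse_memtype_list(text):
    """Return a set of (type, start, end) tuples; the header line is skipped."""
    entries = set()
    for line in text.splitlines():
        m = _ENTRY.match(line.strip())
        if m:
            entries.add(
                (m.group("type"), int(m.group("start"), 16), int(m.group("end"), 16))
            )
    return entries


def diff_memtype_lists(good_text, bad_text):
    """Return (entries only in good, entries only in bad), both sorted."""
    good = parse_memtype_list(good_text)
    bad = parse_memtype_list(bad_text)
    return sorted(good - bad), sorted(bad - good)
```

Two empty lists correspond to the "identical for good/bad runs" observation, which would point away from PAT and towards some other source of the memory type.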

# cat /proc/mtrr 
reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
reg01: base=0x08000 ( 2048MB), size=  256MB, count=1: write-back
reg02: base=0x08e00 ( 2272MB), size=   32MB, count=1: uncachable
reg03: base=0x08d00 ( 2256MB), size=   16MB, count=1: uncachable
reg04: base=0x1 ( 4096MB), size= 2048MB, count=1: write-back
reg05: base=0x17000 ( 5888MB), size=  256MB, count=1: uncachable
reg06: base=0x16f00 ( 5872MB), size=   16MB, count=1: uncachable
reg07: base=0x16e80 ( 5864MB), size=8MB, count=1: uncachable
reg08: base=0x16e60 ( 5862MB), size=2MB, count=1: uncachable
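
A rough sketch for reading a /proc/mtrr listing like the one above and asking which registers cover a given address. It relies on the MB figure in parentheses rather than the hex base, since the hex digits in this archived copy look truncated; the helper names are hypothetical.

```python
import re

# One /proc/mtrr line: regNN: base=0x... (  NNNNMB), size= NNNN[KM]B, count=N: type
_MTRR_LINE = re.compile(
    r"^reg(?P<reg>\d+): base=0x(?P<base>[0-9a-fA-F]+) \(\s*(?P<base_mb>\d+)MB\), "
    r"size=\s*(?P<size>\d+)(?P<unit>[KM])B, count=(?P<count>\d+): (?P<type>.+)$"
)


def parse_proc_mtrr(text):
    """Parse /proc/mtrr text into a list of dicts (base in MB, size in KB)."""
    regs = []
    for line in text.splitlines():
        m = _MTRR_LINE.match(line.strip())
        if not m:
            continue
        size_kb = int(m.group("size")) * (1024 if m.group("unit") == "M" else 1)
        regs.append(
            {
                "reg": int(m.group("reg")),
                "base_mb": int(m.group("base_mb")),
                "size_kb": size_kb,
                "type": m.group("type").strip(),
            }
        )
    return regs


def types_covering(regs, addr_mb):
    """Memory types of every register whose range contains addr_mb."""
    addr_kb = addr_mb * 1024
    return [
        r["type"]
        for r in regs
        if r["base_mb"] * 1024 <= addr_kb < r["base_mb"] * 1024 + r["size_kb"]
    ]
```

Overlapping results (for instance write-back plus uncachable for the same address) are expected here, because the firmware carves uncachable holes out of larger write-back ranges.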

# cat /proc/iomem:
-0fff : reserved
1000-0009bbff : System RAM
0009bc00-0009 : reserved
000a-000b : PCI Bus :00
000c-000cdfff : Video ROM
000d-000d3fff : PCI Bus :00
000d4000-000d7fff : PCI Bus :00
000d8000-000dbfff : PCI Bus :00
000dc000-000d : PCI Bus :00
000e-000f : reserved
  000e-000e3fff : PCI Bus :00
  000e4000-000e7fff : PCI Bus :00
  000f-000f : System ROM
0010-1fff : System RAM
  0100-0161981b : Kernel code
  0161981c-01ca20ff : Kernel data
  01dac000-01e2dfff : Kernel bss
2000-201f : reserved
  2000-201f : pnp 00:05
2020-3fff : System RAM
4000-401f : reserved
  4000-401f : pnp 00:05
4020-8ccd2fff : System RAM
8ccd3000-8cd66fff : reserved
8cd67000-8cfe6fff : ACPI Non-volatile Storage
8cfe7000-8cffefff : ACPI Tables
8cfff000-8cff : System RAM
8d00-8f9f : reserved
  8da0-8f9f : Graphics Stolen Memory
8fa0-feaf : PCI Bus :00
  8fa0-8fa00fff : pnp 00:03
  8fc0-8fff : :00:02.0
  9000-900f : PCI Bus :04
9000-900f : PCI Bus :05
  9000-90003fff : :05:00.0
  9001-900107ff : :05:00.0
  9010-901f : PCI Bus :03
9010-90101fff : :03:00.0
  9020-902f : PCI Bus :01
9020-9021 : :01:00.0
9022-9023 : :01:00.0
9024-90243fff : :01:00.1
  9030-9031 : :00:19.0
9030-9031 : e1000e
  9033-903300ff : :00:1f.3
  9034-903407ff : :00:1f.2
9034-903407ff : ahci
  9035-903503ff : :00:1d.0
  9036-90363fff : :00:1b.0
  9037-903703ff : :00:1a.0
  9038-90380fff : :00:19.0
9038-90380fff : e1000e
  9039-90390fff : :00:16.3
  903a-903a000f : :00:16.0
  a000-bfff : :00:02.0
  c000-cfff : PCI Bus :01
c000-cfff : :01:00.0
  f800-fbff : PCI MMCONFIG  [bus 00-3f]
f800-fbff : reserved
  f800-fbff : pnp 00:03
fec0-fec00fff : reserved
  fec0-fec003ff : IOAPIC 0
fed0-fed003ff : HPET 0
  fed0-fed003ff : PNP0103:00

fed1-fed13fff : reserved

Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-09 Thread Chris Wilson
On Thu, Oct 09, 2014 at 07:44:16AM -0500, Chuck Ebbert wrote:
> Could you try installing x86info and running "x86info --mtrr
> --all-cpus" while running the broken kernel?

# /opt/xorg/src/intel-gpu-tools/tests/gem_gtt_speed 
IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0+ x86_64)
Time to read 16k through a GTT map: 318.643µs
Time to write 16k through a GTT map:203.103µs
Time to clear 16k through a GTT map: 53.098µs
Time to clear 16k through a cached GTT map:  49.925µs

(i.e. bad kernel)
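
For scale, a small sketch that puts these bad-kernel timings next to the first-run figures from the original report in this thread (4.729µs write, 4.584µs clear, 1.342µs cached clear, 325.285µs read). The slowdown helper is a hypothetical name; the numbers are copied from the thread.

```python
# Timings in microseconds, copied from the thread.
first_run = {"read": 325.285, "write": 4.729, "clear": 4.584, "clear (cached)": 1.342}
bad_kernel = {"read": 318.643, "write": 203.103, "clear": 53.098, "clear (cached)": 49.925}


def slowdown(before, after):
    """Ratio after/before for every test present in `before`."""
    return {name: after[name] / before[name] for name in before}


for name, ratio in slowdown(first_run, bad_kernel).items():
    print(f"{name:>15}: {ratio:5.1f}x")
```

Reads are unaffected (they are slow through a GTT map either way), while every write path is slower by more than an order of magnitude, consistent with the "10-40x" estimate.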

# x86info --mtrr --all-cpus
x86info v1.30.  Dave Jones 2001-2011
Feedback to da...@redhat.com.

Found 4 CPUs.
CPU #1:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model. 
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 
0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0006 (physbase field:0x00 type field: 
0x06 (write-back))
MTRRphysMask0 (0x201): 0x000f8800 (physmask field:0xf8 valid flag: 
1)
MTRRphysBase1 (0x202): 0x8006 (physbase field:0x08 type field: 
0x06 (write-back))
MTRRphysMask1 (0x203): 0x000ff800 (physmask field:0xff valid flag: 
1)
MTRRphysBase2 (0x204): 0x8e00 (physbase field:0x08e000 type field: 
0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x000ffe000800 (physmask field:0xffe000 valid flag: 
1)
MTRRphysBase3 (0x206): 0x8d00 (physbase field:0x08d000 type field: 
0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x000fff000800 (physmask field:0xfff000 valid flag: 
1)
MTRRphysBase4 (0x208): 0x00010006 (physbase field:0x10 type field: 
0x06 (write-back))
MTRRphysMask4 (0x209): 0x000f8800 (physmask field:0xf8 valid flag: 
1)
MTRRphysBase5 (0x20a): 0x00017000 (physbase field:0x17 type field: 
0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x000ff800 (physmask field:0xff valid flag: 
1)
MTRRphysBase6 (0x20c): 0x00016f00 (physbase field:0x16f000 type field: 
0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x000fff000800 (physmask field:0xfff000 valid flag: 
1)
MTRRphysBase7 (0x20e): 0x00016e80 (physbase field:0x16e800 type field: 
0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x000fff800800 (physmask field:0xfff800 valid flag: 
1)
MTRRfix64K_0 (0x250): 0x0606060606060606
MTRRfix16K_8 (0x258): 0x0606060606060606
MTRRfix16K_A (0x259): 0x
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D 0x26a: 0x
MTRRfix4K_D8000 0x26b: 0x
MTRRfix4K_E 0x26c: 0x
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0c00 (fixed-range flag: 0x1, mtrr flag: 0x1, 
type field: 0x00 (uncacheable))

--

CPU #2:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model. 
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 
0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0006 (physbase field:0x00 type field: 
0x06 (write-back))
MTRRphysMask0 (0x201): 0x000f8800 (physmask field:0xf8 valid flag: 
1)
MTRRphysBase1 (0x202): 0x8006 (physbase field:0x08 type field: 
0x06 (write-back))
MTRRphysMask1 (0x203): 0x000ff800 (physmask field:0xff valid flag: 
1)
MTRRphysBase2 (0x204): 0x8e00 (physbase field:0x08e000 type field: 
0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x000ffe000800 (physmask field:0xffe000 valid flag: 
1)
MTRRphysBase3 (0x206): 0x8d00 (physbase field:0x08d000 type field: 
0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x000fff000800 (physmask field:0xfff000 valid flag: 
1)
MTRRphysBase4 (0x208): 0x00010006 (physbase field:0x10 type field: 
0x06 (write-back))
MTRRphysMask4 (0x209): 0x000f8800 (physmask field:0xf8 valid flag: 
1)
MTRRphysBase5 (0x20a): 0x00017000 (physbase field:0x17 type field: 
0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x000ff800 (physmask field:0xff valid flag: 
1)
MTRRphysBase6 (0x20c): 0x00016f00 (physbase field:0x16f000 type field: 
0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x000fff000800 (physmask field:0xfff000 valid flag: 
1)
MTRRphysBase7 (0x20e): 0x00016e80 (physbase field:0x16e800 type field: 
0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x000fff800800 (physmask field:0xfff800 valid flag: 
1)
MTRRfix64K_0 (0x250): 0x0606060606060606
MTRRfix16K_8 (0x258): 0x0606060606060606
MTRRfix16K_A 

Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-09 Thread Chuck Ebbert
On Thu, 9 Oct 2014 14:00:47 +0100
Chris Wilson <ch...@chris-wilson.co.uk> wrote:

> On Thu, Oct 09, 2014 at 07:44:16AM -0500, Chuck Ebbert wrote:
> > Could you try installing x86info and running "x86info --mtrr
> > --all-cpus" while running the broken kernel?
> 
> # /opt/xorg/src/intel-gpu-tools/tests/gem_gtt_speed 
> IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0+ x86_64)
> Time to read 16k through a GTT map: 318.643µs
> Time to write 16k through a GTT map:203.103µs
> Time to clear 16k through a GTT map: 53.098µs
> Time to clear 16k through a cached GTT map:  49.925µs
> 
> (i.e. bad kernel)
> 
> # x86info --mtrr --all-cpus
> x86info v1.30.  Dave Jones 2001-2011
> Feedback to da...@redhat.com.
> 
> Found 4 CPUs.
> CPU #1:
> Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
> Type: 0 (Original OEM)
> CPU Model (x86info's best guess): Unknown model. 
> Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 
> 3.30GHz
> 
> MTRR registers:
> MTRRcap (0xfe): 0x0d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 
> 0x1, vcnt field: 0x0a (10))
> MTRRphysBase0 (0x200): 0x0006 (physbase field:0x00 type 
> field: 0x06 (write-back))
> MTRRphysMask0 (0x201): 0x000f8800 (physmask field:0xf8 valid 
> flag: 1)
> MTRRphysBase1 (0x202): 0x8006 (physbase field:0x08 type 
> field: 0x06 (write-back))
> MTRRphysMask1 (0x203): 0x000ff800 (physmask field:0xff valid 
> flag: 1)
> MTRRphysBase2 (0x204): 0x8e00 (physbase field:0x08e000 type 
> field: 0x00 (uncacheable))
> MTRRphysMask2 (0x205): 0x000ffe000800 (physmask field:0xffe000 valid 
> flag: 1)
> MTRRphysBase3 (0x206): 0x8d00 (physbase field:0x08d000 type 
> field: 0x00 (uncacheable))
> MTRRphysMask3 (0x207): 0x000fff000800 (physmask field:0xfff000 valid 
> flag: 1)
> MTRRphysBase4 (0x208): 0x00010006 (physbase field:0x10 type 
> field: 0x06 (write-back))
> MTRRphysMask4 (0x209): 0x000f8800 (physmask field:0xf8 valid 
> flag: 1)
> MTRRphysBase5 (0x20a): 0x00017000 (physbase field:0x17 type 
> field: 0x00 (uncacheable))
> MTRRphysMask5 (0x20b): 0x000ff800 (physmask field:0xff valid 
> flag: 1)
> MTRRphysBase6 (0x20c): 0x00016f00 (physbase field:0x16f000 type 
> field: 0x00 (uncacheable))
> MTRRphysMask6 (0x20d): 0x000fff000800 (physmask field:0xfff000 valid 
> flag: 1)
> MTRRphysBase7 (0x20e): 0x00016e80 (physbase field:0x16e800 type 
> field: 0x00 (uncacheable))
> MTRRphysMask7 (0x20f): 0x000fff800800 (physmask field:0xfff800 valid 
> flag: 1)
> MTRRfix64K_0 (0x250): 0x0606060606060606
> MTRRfix16K_8 (0x258): 0x0606060606060606
> MTRRfix16K_A (0x259): 0x
> MTRRfix4K_C8000 (0x269): 0x0505050505050505
> MTRRfix4K_D 0x26a: 0x
> MTRRfix4K_D8000 0x26b: 0x
> MTRRfix4K_E 0x26c: 0x
> MTRRfix4K_E8000 0x26d: 0x0505050505050505
> MTRRfix4K_F 0x26e: 0x0505050505050505
> MTRRfix4K_F8000 0x26f: 0x0505050505050505
> MTRRdefType (0x2ff): 0x0c00 (fixed-range flag: 0x1, mtrr flag: 
> 0x1, type field: 0x00 (uncacheable))
> 

<snip>

Well they're all the same.

Hmm, x86info is not dumping all the variable MTRRs. You have 10, but
it only prints the first 8. I don't know if it will show anything
different, but can you try fixing it with this patch?

--- a/mtrr.c
+++ b/mtrr.c
@@ -75,19 +75,23 @@
printf("0x%016llx\n", val);
 }
 
-static void decode_mtrrcap(int cpu, int msr)
+unsigned int decode_mtrrcap(int cpu, int msr)
 {
unsigned long long val;
+   unsigned int vcnt = 0;
int ret;
 
ret = mtrr_value(cpu,msr,&val);
if (ret) {
+   vcnt = (unsigned int)(val & IA32_MTRRCAP_VCNT);
printf("0x%016llx ", val);
printf("(smrr flag: 0x%01x, ",(unsigned int) (val & 
IA32_MTRRCAP_SMRR) >> 11 );
printf("wc flag: 0x%01x, ",(unsigned int) (val&IA32_MTRRCAP_WC) 
>> 10);
printf("fix flag: 0x%01x, ",(unsigned int) 
(val&IA32_MTRRCAP_FIX) >> 8);
-   printf("vcnt field: 0x%02x (%d))\n",(unsigned int) 
(val&IA32_MTRRCAP_VCNT) , (int) (val&IA32_MTRRCAP_VCNT));
+   printf("vcnt field: 0x%02x (%u))\n", vcnt, vcnt);
}
+
+   return vcnt;
 }
 
 static void decode_mtrr_deftype(int cpu, int msr)
@@ -142,7 +146,7 @@
 void dump_mtrrs(struct cpudata *cpu)
 {
unsigned long long val = 0;
-   unsigned int i;
+   unsigned int i, vcnt;
 
if (!(cpu->flags_edx & (X86_FEATURE_MTRR)))
return;
@@ -157,11 +161,11 @@
printf("MTRR registers:\n");
 
printf("MTRRcap (0xfe): ");
-   decode_mtrrcap(cpu->number, 0xfe);
+   vcnt = decode_mtrrcap(cpu->number, 0xfe);
 
set_max_phy_addr(cpu);
 
-   for (i = 0; i < 16; i+=2) {
+   for (i = 0; i < 2 * vcnt; i += 2) {
printf("MTRRphysBase%u (0x%x): ", i/2, (unsigned int) 0x200+i);
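
The idea behind the patch, sketched in Python for illustration only: take the number of variable-range registers from the VCNT field of MTRRcap instead of hard-coding 8. The bit positions follow the shifts used in the patch (bits 0-7 VCNT, bit 8 FIX, bit 10 WC, bit 11 SMRR); the function names are hypothetical and this is not x86info code.

```python
def decode_mtrrcap(val):
    """Split an IA32_MTRRCAP value into the fields x86info prints."""
    return {
        "vcnt": val & 0xFF,        # number of variable-range MTRR pairs
        "fix": (val >> 8) & 1,     # fixed-range MTRRs supported
        "wc": (val >> 10) & 1,     # write-combining type supported
        "smrr": (val >> 11) & 1,   # SMRR interface supported
    }


def variable_mtrr_msrs(vcnt):
    """(physbase MSR, physmask MSR) addresses for each variable-range pair."""
    return [(0x200 + 2 * i, 0x201 + 2 * i) for i in range(vcnt)]


cap = decode_mtrrcap(0xD0A)  # the MTRRcap value reported for this CPU
pairs = variable_mtrr_msrs(cap["vcnt"])
```

With vcnt taken from the register, the last pair visited is 0x212/0x213, i.e. MTRRphysBase9 and MTRRphysMask9, which the fixed loop bound of 16 never reached.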

Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-09 Thread Chris Wilson
On Thu, Oct 09, 2014 at 09:46:37AM -0500, Chuck Ebbert wrote:
> Well they're all the same.
> 
> Hmm, x86info is not dumping all the variable MTRRs. You have 10, but
> it only prints the first 8. I don't know if it will show anything
> different, but can you try fixing it with this patch?

Source (https://github.com/dankamongmen/x86info) was slightly different,
but I followed the drift.

tldr: 8,9 appear to be identical on all cpus as well.
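
For anyone checking the register pairs by hand, a hedged sketch of how a variable-range physbase/physmask pair decodes into a base, size and type. It assumes a 36-bit physical address width (suggested by the masks in the dump) and the values in the usage note are written out at full width as an assumption, since the hex in this archived copy appears truncated; decode_variable_mtrr is a hypothetical name.

```python
MTRR_TYPES = {
    0: "uncacheable",
    1: "write-combining",
    4: "write-through",
    5: "write-protect",
    6: "write-back",
}


def decode_variable_mtrr(physbase, physmask, phys_bits=36):
    """Return (base, size_in_bytes, type) or None if the valid bit is clear."""
    if not (physmask >> 11) & 1:
        return None
    full = (1 << phys_bits) - 1
    addr_mask = full & ~0xFFF          # address bits live above bit 11
    base = physbase & addr_mask
    mask = physmask & addr_mask
    size = (~mask & full) + 1          # contiguous mask assumed
    return base, size, MTRR_TYPES.get(physbase & 0xFF, "reserved")


# Register 1 (256MB write-back at 2GB) and register 8 (2MB uncacheable at 5862MB)
reg1 = decode_variable_mtrr(0x80000006, 0xFF0000800)
reg8 = decode_variable_mtrr(0x16E600000, 0xFFFE00800)
```

Both results agree with the reg01 and reg08 lines of the /proc/mtrr listing posted earlier in the thread, which is a useful cross-check that the kernel's view and the hardware registers match on this CPU.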

$ sudo ./x86info --mtrr --all-cpus
x86info v1.31pre
Found 4 CPUs.
CPU #1:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0006 (physbase:0x00 type: 0x06 
(write-back))
MTRRphysMask0 (0x201): 0x000f8800 (physmask:0xf8 valid:1)
MTRRphysBase1 (0x202): 0x8006 (physbase:0x08 type: 0x06 
(write-back))
MTRRphysMask1 (0x203): 0x000ff800 (physmask:0xff valid:1)
MTRRphysBase2 (0x204): 0x8e00 (physbase:0x08e000 type: 0x00 
(uncacheable))
MTRRphysMask2 (0x205): 0x000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x8d00 (physbase:0x08d000 type: 0x00 
(uncacheable))
MTRRphysMask3 (0x207): 0x000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x00010006 (physbase:0x10 type: 0x06 
(write-back))
MTRRphysMask4 (0x209): 0x000f8800 (physmask:0xf8 valid:1)
MTRRphysBase5 (0x20a): 0x00017000 (physbase:0x17 type: 0x00 
(uncacheable))
MTRRphysMask5 (0x20b): 0x000ff800 (physmask:0xff valid:1)
MTRRphysBase6 (0x20c): 0x00016f00 (physbase:0x16f000 type: 0x00 
(uncacheable))
MTRRphysMask6 (0x20d): 0x000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x00016e80 (physbase:0x16e800 type: 0x00 
(uncacheable))
MTRRphysMask7 (0x20f): 0x000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x00016e60 (physbase:0x16e600 type: 0x00 
(uncacheable))
MTRRphysMask8 (0x211): 0x000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x (physbase:0x00 type: 0x00 
(uncacheable))
MTRRphysMask9 (0x213): 0x (physmask:0x00 valid:0)
MTRRfix64K_0 (0x250): 0x0606060606060606
MTRRfix16K_8 (0x258): 0x0606060606060606
MTRRfix16K_A (0x259): 0x
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D 0x26a: 0x
MTRRfix4K_D8000 0x26b: 0x
MTRRfix4K_E 0x26c: 0x
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0c00 (fixed-range flag:1 enable flag:1 
default type:0x00 (uncacheable))

--
CPU #2:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0006 (physbase:0x00 type: 0x06 
(write-back))
MTRRphysMask0 (0x201): 0x000f8800 (physmask:0xf8 valid:1)
MTRRphysBase1 (0x202): 0x8006 (physbase:0x08 type: 0x06 
(write-back))
MTRRphysMask1 (0x203): 0x000ff800 (physmask:0xff valid:1)
MTRRphysBase2 (0x204): 0x8e00 (physbase:0x08e000 type: 0x00 
(uncacheable))
MTRRphysMask2 (0x205): 0x000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x8d00 (physbase:0x08d000 type: 0x00 
(uncacheable))
MTRRphysMask3 (0x207): 0x000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x00010006 (physbase:0x10 type: 0x06 
(write-back))
MTRRphysMask4 (0x209): 0x000f8800 (physmask:0xf8 valid:1)
MTRRphysBase5 (0x20a): 0x00017000 (physbase:0x17 type: 0x00 
(uncacheable))
MTRRphysMask5 (0x20b): 0x000ff800 (physmask:0xff valid:1)
MTRRphysBase6 (0x20c): 0x00016f00 (physbase:0x16f000 type: 0x00 
(uncacheable))
MTRRphysMask6 (0x20d): 0x000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x00016e80 (physbase:0x16e800 type: 0x00 
(uncacheable))
MTRRphysMask7 (0x20f): 0x000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x00016e60 (physbase:0x16e600 type: 0x00 
(uncacheable))
MTRRphysMask8 (0x211): 0x000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x (physbase:0x00 type: 0x00 
(uncacheable))
MTRRphysMask9 (0x213): 0x (physmask:0x00 valid:0)
MTRRfix64K_0 (0x250): 0x0606060606060606
MTRRfix16K_8 (0x258): 0x0606060606060606
MTRRfix16K_A 

Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-08 Thread Masami Hiramatsu
(2014/10/09 2:47), Chuck Ebbert wrote:
> On Wed, 8 Oct 2014 10:03:36 +0100
> Chris Wilson  wrote:
> 
>> and adding that back into the current build, e.g.
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 3632743..48a8a69 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -87,6 +87,7 @@ config X86
>> select HAVE_USER_RETURN_NOTIFIER
>> select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>> select HAVE_ARCH_JUMP_LABEL
>> +   select STOP_MACHINE
>> select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>> select SPARSE_IRQ
>> select GENERIC_FIND_FIRST_BIT
>>
>> fixes the regression.
>>
> 
> Looking closer at this, it seems most configs work by accident,
> because they have MOD_UNLOAD and/or HOTPLUG_CPU enabled. I take it
> you disabled both of those? stop_machine() is called from all kinds
> of places and almost none of them make sure STOP_MACHINE is selected.

I guess most of them expect that stop_machine() is not a configurable
feature...
If any of them requires stop_machine(), it should enable it in its
Kconfig entry (including ftrace and kprobes).

> $ find -name Kconf\* | xargs grep STOP_MACHINE
> ./init/Kconfig:config STOP_MACHINE
> 
> All these places use stop_machine():
> 
> mm/page_alloc.c, line 3886
> drivers/xen/manage.c, line 130
> drivers/char/hw_random/intel-rng.c, line 373
> arch/powerpc/mm/numa.c:
> line 1616
> line 1623 
> arch/powerpc/platforms/powernv/subcore.c, line 324
> arch/arm/kernel/kprobes.c, line 165
> arch/arm/kernel/patch.c:
> line 64
> line 71 
> arch/s390/kernel/jump_label.c, line 61
> arch/s390/kernel/kprobes.c:
> line 311
> line 320 
> arch/s390/kernel/time.c:
> line 820
> line 1590 
> arch/x86/kernel/cpu/mtrr/main.c, line 231
> arch/arm64/kernel/insn.c, line 181
> kernel/time/timekeeping.c, line 892
> kernel/trace/ftrace.c, line 2219
> kernel/module.c:
> line 770
> line 1861 
> 

BTW, as I sent a series of patches, the last two can be removed.
https://lkml.org/lkml/2014/8/25/142

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-08 Thread H. Peter Anvin
On 10/08/2014 12:49 PM, Chris Wilson wrote:
> 
> Indeed, this appears to be the explanation. (And here I thought PAT
> superseded mtrrs - i915.ko stopped trying to assign an mtrr for its
> GTT quite a while ago.)
> 
> Replacing the stop_machine there with on_each_cpu does the trick:
> 

It should, but there seem to be quite a few drivers which still muck
with MTRRs.  However, i915 is not one of them, it calls
io_mapping_create_wc() followed by arch_phys_wc_add(), so I'm wondering
what the heck is going on here.

> Naively I would say that we lost the wc on our ioremap.
> /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> runs.

Could you tell me what the above looks like?

-hpa






Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-08 Thread Chris Wilson
On Wed, Oct 08, 2014 at 05:10:59AM -0500, Chuck Ebbert wrote:
> On Wed, 8 Oct 2014 10:03:36 +0100
> Chris Wilson  wrote:
> 
> > 
> > I ran into a problem on a Sandybridge i5-2500s whilst measuring the
> > performance of GTT write-combining access. I found subsequent runs were
> > about 10-40x slower than the first. For example,
> > 
> > igt/gem_gtt_speed:
> > 
> > Time to read 16k through a GTT map: 325.285µs
> > Time to write 16k through a GTT map:  4.729µs
> > Time to clear 16k through a GTT map:  4.584µs
> > Time to clear 16k through a cached GTT map:   1.342µs
> > 
> > on the second run became:
> > 
> > Time to read 16k through a GTT map: 332.148µs
> > Time to write 16k through a GTT map:209.411µs
> > Time to clear 16k through a GTT map: 56.460µs
> > Time to clear 16k through a cached GTT map:  50.897µs
> > 
> > Naively I would say that we lost the wc on our ioremap.
> > /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> > runs.
> > 
> > A bisection pointed to 
> > 
> > commit ea8596bb2d8d37957f3e92db9511c50801689180
> > Author: Masami Hiramatsu 
> > Date:   Thu Jul 18 20:47:53 2013 +0900
> > 
> > kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() 
> > functions
> > 
> > of which the active ingredient was just
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b32ebf9..f4001e0 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
> >  
> >  config HAVE_TEXT_POKE_SMP
> > bool
> > -   select STOP_MACHINE if SMP
> >  
> >  config X86_DEV_DMA_OPS
> > bool
> > 
> > and adding that back into the current build, e.g.
> 
> Hmm, set_mtrr() uses stop_machine(). I wonder if your MTRRs are out of
> sync and your results depend on which CPU the test runs on?

Indeed, this appears to be the explanation. (And here I thought PAT
superseded mtrrs - i915.ko stopped trying to assign an mtrr for its
GTT quite a while ago.)

Replacing the stop_machine there with on_each_cpu does the trick:

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index f961de9..c0e37d5 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -151,7 +151,7 @@ struct set_mtrr_data {
  *
  * Returns nothing.
  */
-static int mtrr_rendezvous_handler(void *info)
+static void mtrr_rendezvous_handler(void *info)
 {
struct set_mtrr_data *data = info;
 
@@ -174,7 +174,6 @@ static int mtrr_rendezvous_handler(void *info)
} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
mtrr_if->set_all();
}
-   return 0;
 }
 
 static inline int types_compatible(mtrr_type type1, mtrr_type type2)
@@ -228,7 +227,7 @@ set_mtrr(unsigned int reg, unsigned long base, unsigned 
long size, mtrr_type typ
  .smp_type = type
};
 
-   stop_machine(mtrr_rendezvous_handler, &data, cpu_online_mask);
+   on_each_cpu_mask(cpu_online_mask, mtrr_rendezvous_handler, &data, true);
 }
 
 static void set_mtrr_from_inactive_cpu(unsigned int reg, unsigned long base,
@@ -240,8 +239,7 @@ static void set_mtrr_from_inactive_cpu(unsigned int reg, 
unsigned long base,
  .smp_type = type
};
 
-   stop_machine_from_inactive_cpu(mtrr_rendezvous_handler, &data,
-  cpu_callout_mask);
+   on_each_cpu_mask(cpu_callout_mask, mtrr_rendezvous_handler, &data, 
true);
 }
 
 /**

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-08 Thread Chuck Ebbert
On Wed, 8 Oct 2014 10:03:36 +0100
Chris Wilson  wrote:

> and adding that back into the current build, e.g.
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 3632743..48a8a69 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -87,6 +87,7 @@ config X86
> select HAVE_USER_RETURN_NOTIFIER
> select ARCH_BINFMT_ELF_RANDOMIZE_PIE
> select HAVE_ARCH_JUMP_LABEL
> +   select STOP_MACHINE
> select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
> select SPARSE_IRQ
> select GENERIC_FIND_FIRST_BIT
> 
> fixes the regression.
> 

Looking closer at this, it seems most configs work by accident,
because they have MOD_UNLOAD and/or HOTPLUG_CPU enabled. I take it
you disabled both of those? stop_machine() is called from all kinds
of places and almost none of them make sure STOP_MACHINE is selected.

$ find -name Kconf\* | xargs grep STOP_MACHINE
./init/Kconfig:config STOP_MACHINE

All these places use stop_machine():

mm/page_alloc.c, line 3886
drivers/xen/manage.c, line 130
drivers/char/hw_random/intel-rng.c, line 373
arch/powerpc/mm/numa.c:
line 1616
line 1623 
arch/powerpc/platforms/powernv/subcore.c, line 324
arch/arm/kernel/kprobes.c, line 165
arch/arm/kernel/patch.c:
line 64
line 71 
arch/s390/kernel/jump_label.c, line 61
arch/s390/kernel/kprobes.c:
line 311
line 320 
arch/s390/kernel/time.c:
line 820
line 1590 
arch/x86/kernel/cpu/mtrr/main.c, line 231
arch/arm64/kernel/insn.c, line 181
kernel/time/timekeeping.c, line 892
kernel/trace/ftrace.c, line 2219
kernel/module.c:
line 770
line 1861 
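
A quick way to check whether a given build falls into this trap, sketched under the assumption that an SMP kernel without CONFIG_STOP_MACHINE is the problematic combination described above. The helper names are hypothetical; the input is the text of a kernel .config file.

```python
def parse_config(text):
    """Map CONFIG_* names to their values; '# ... is not set' lines are skipped."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
    return opts


def stop_machine_at_risk(text):
    """True when SMP is enabled but STOP_MACHINE is not."""
    opts = parse_config(text)
    smp = opts.get("CONFIG_SMP") == "y"
    stop_machine = opts.get("CONFIG_STOP_MACHINE") == "y"
    return smp and not stop_machine
```

Typical use would be stop_machine_at_risk(open(".config").read()) in the build tree; a True result suggests the stop_machine() callers listed above would not get a real cross-CPU rendezvous.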


Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-08 Thread Chuck Ebbert
On Wed, 8 Oct 2014 10:03:36 +0100
Chris Wilson  wrote:

> 
> I ran into a problem on a Sandybridge i5-2500s whilst measuring the
> performance of GTT write-combining access. I found subsequent runs were
> about 10-40x slower than the first. For example,
> 
> igt/gem_gtt_speed:
> 
> Time to read 16k through a GTT map:          325.285µs
> Time to write 16k through a GTT map:           4.729µs
> Time to clear 16k through a GTT map:           4.584µs
> Time to clear 16k through a cached GTT map:    1.342µs
> 
> on the second run became:
> 
> Time to read 16k through a GTT map:          332.148µs
> Time to write 16k through a GTT map:         209.411µs
> Time to clear 16k through a GTT map:          56.460µs
> Time to clear 16k through a cached GTT map:   50.897µs
> 
> Naively I would say that we lost the wc on our ioremap.
> /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> runs.
> 
> A bisection pointed to 
> 
> commit ea8596bb2d8d37957f3e92db9511c50801689180
> Author: Masami Hiramatsu 
> Date:   Thu Jul 18 20:47:53 2013 +0900
> 
> kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() 
> functions
> 
> of which the active ingredient was just
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b32ebf9..f4001e0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
>  
>  config HAVE_TEXT_POKE_SMP
>  	bool
> -	select STOP_MACHINE if SMP
>  
>  config X86_DEV_DMA_OPS
>  	bool
> 
> and adding that back into the current build, e.g.

Hmm, set_mtrr() uses stop_machine(). I wonder if your MTRRs are out of
sync and your results depend on which CPU the test runs on?

> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 3632743..48a8a69 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -87,6 +87,7 @@ config X86
>  	select HAVE_USER_RETURN_NOTIFIER
>  	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>  	select HAVE_ARCH_JUMP_LABEL
> +	select STOP_MACHINE
>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>  	select SPARSE_IRQ
>  	select GENERIC_FIND_FIRST_BIT
> 
> fixes the regression.
> 
> For the record, this kernel build doesn't use modules, which seems relevant
> in light of ea8596bb2 "fixes a Kconfig dependency issue on STOP_MACHINE
> in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD".
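
One way to test the out-of-sync idea from user space is to read the same MTRR-related MSR on every CPU and compare. The following is a minimal sketch, assuming the msr driver is loaded (/dev/cpu/N/msr, root required) and using IA32_MTRR_DEF_TYPE (0x2ff) as the register to compare; the helper names are made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_MTRR_DEF_TYPE 0x2ffu	/* IA32_MTRR_DEF_TYPE */

/* Read one MSR on one CPU through the msr driver.
 * Returns 0 on success, -1 on any failure. */
int read_msr(int cpu, uint32_t msr, uint64_t *val)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* The msr device uses the file offset as the MSR number. */
	n = pread(fd, val, sizeof(*val), (off_t)msr);
	close(fd);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* Return the index of the first CPU whose value differs from CPU 0,
 * or -1 if all values agree. */
int first_mismatch(const uint64_t *vals, size_t ncpus)
{
	size_t i;

	assert(vals != NULL && ncpus > 0);
	for (i = 1; i < ncpus; i++)
		if (vals[i] != vals[0])
			return (int)i;
	return -1;
}
```

A caller would fill vals[] with read_msr(cpu, MSR_MTRR_DEF_TYPE, &vals[cpu]) for each online CPU, repeat for the variable-range registers starting at 0x200, and report first_mismatch().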


i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-08 Thread Chris Wilson

I ran into a problem on a Sandybridge i5-2500s whilst measuring the
performance of GTT write-combining access. I found subsequent runs were
about 10-40x slower than the first. For example,

igt/gem_gtt_speed:

Time to read 16k through a GTT map:          325.285µs
Time to write 16k through a GTT map:           4.729µs
Time to clear 16k through a GTT map:           4.584µs
Time to clear 16k through a cached GTT map:    1.342µs

on the second run became:

Time to read 16k through a GTT map:          332.148µs
Time to write 16k through a GTT map:         209.411µs
Time to clear 16k through a GTT map:          56.460µs
Time to clear 16k through a cached GTT map:   50.897µs

Naively I would say that we lost the wc on our ioremap.
/sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
runs.

A bisection pointed to 

commit ea8596bb2d8d37957f3e92db9511c50801689180
Author: Masami Hiramatsu 
Date:   Thu Jul 18 20:47:53 2013 +0900

kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() 
functions

of which the active ingredient was just

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b32ebf9..f4001e0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
 
 config HAVE_TEXT_POKE_SMP
 	bool
-	select STOP_MACHINE if SMP
 
 config X86_DEV_DMA_OPS
 	bool

and adding that back into the current build, e.g.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3632743..48a8a69 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -87,6 +87,7 @@ config X86
 	select HAVE_USER_RETURN_NOTIFIER
 	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select HAVE_ARCH_JUMP_LABEL
+	select STOP_MACHINE
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
 	select SPARSE_IRQ
 	select GENERIC_FIND_FIRST_BIT

fixes the regression.

For the record, this kernel build doesn't use modules, which seems relevant
in light of ea8596bb2 "fixes a Kconfig dependency issue on STOP_MACHINE
in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD".
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
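
For reference, the per-test slowdown behind the "10-40x" figure can be worked out directly from the two runs. On these numbers the read path is essentially unchanged (about 1.02x), while the write and clear paths slow down by roughly 12x to 44x. A small sketch, with the timings copied from the gem_gtt_speed output above and hypothetical names:

```c
#include <assert.h>
#include <stdio.h>

struct gtt_sample {
	const char *name;
	double first_us;	/* first run, microseconds */
	double second_us;	/* second run, microseconds */
};

/* Figures copied from the two gem_gtt_speed runs quoted above. */
static const struct gtt_sample samples[] = {
	{ "read 16k (GTT map)",         325.285, 332.148 },
	{ "write 16k (GTT map)",          4.729, 209.411 },
	{ "clear 16k (GTT map)",          4.584,  56.460 },
	{ "clear 16k (cached GTT map)",   1.342,  50.897 },
};

/* Slowdown factor of the second run relative to the first. */
double slowdown(const struct gtt_sample *s)
{
	assert(s->first_us > 0.0);
	return s->second_us / s->first_us;
}

/* Print one line per test with its slowdown factor. */
void print_slowdowns(void)
{
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("%-28s %6.1fx\n", samples[i].name,
		       slowdown(&samples[i]));
}
```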


Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-08 Thread Chris Wilson
On Wed, Oct 08, 2014 at 05:10:59AM -0500, Chuck Ebbert wrote:
> On Wed, 8 Oct 2014 10:03:36 +0100
> Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> 
> > 
> > I ran into a problem on a Sandybridge i5-2500s whilst measuring the
> > performance of GTT write-combining access. I found subsequent runs were
> > about 10-40x slower than the first. For example,
> > 
> > igt/gem_gtt_speed:
> > 
> > Time to read 16k through a GTT map:          325.285µs
> > Time to write 16k through a GTT map:           4.729µs
> > Time to clear 16k through a GTT map:           4.584µs
> > Time to clear 16k through a cached GTT map:    1.342µs
> > 
> > on the second run became:
> > 
> > Time to read 16k through a GTT map:          332.148µs
> > Time to write 16k through a GTT map:         209.411µs
> > Time to clear 16k through a GTT map:          56.460µs
> > Time to clear 16k through a cached GTT map:   50.897µs
> > 
> > Naively I would say that we lost the wc on our ioremap.
> > /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> > runs.
> > 
> > A bisection pointed to 
> > 
> > commit ea8596bb2d8d37957f3e92db9511c50801689180
> > Author: Masami Hiramatsu <masami.hiramatsu...@hitachi.com>
> > Date:   Thu Jul 18 20:47:53 2013 +0900
> > 
> > kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() 
> > functions
> > 
> > of which the active ingredient was just
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b32ebf9..f4001e0 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
> >  
> >  config HAVE_TEXT_POKE_SMP
> >  	bool
> > -	select STOP_MACHINE if SMP
> >  
> >  config X86_DEV_DMA_OPS
> >  	bool
> > 
> > and adding that back into the current build, e.g.
> 
> Hmm, set_mtrr() uses stop_machine(). I wonder if your MTRRs are out of
> sync and your results depend on which CPU the test runs on?

Indeed, this appears to be the explanation. (And here I thought PAT
superseded MTRRs - i915.ko stopped trying to assign an MTRR for its
GTT quite a while ago.)

Replacing the stop_machine there with on_each_cpu does the trick:

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index f961de9..c0e37d5 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -151,7 +151,7 @@ struct set_mtrr_data {
  *
  * Returns nothing.
  */
-static int mtrr_rendezvous_handler(void *info)
+static void mtrr_rendezvous_handler(void *info)
 {
 	struct set_mtrr_data *data = info;
 
@@ -174,7 +174,6 @@ static int mtrr_rendezvous_handler(void *info)
 	} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
 		mtrr_if->set_all();
 	}
-	return 0;
 }
 
 static inline int types_compatible(mtrr_type type1, mtrr_type type2)
@@ -228,7 +227,7 @@ set_mtrr(unsigned int reg, unsigned long base, unsigned long size, mtrr_type typ
 				      .smp_type = type
 				    };
 
-	stop_machine(mtrr_rendezvous_handler, &data, cpu_online_mask);
+	on_each_cpu_mask(cpu_online_mask, mtrr_rendezvous_handler, &data, true);
 }
 
 static void set_mtrr_from_inactive_cpu(unsigned int reg, unsigned long base,
@@ -240,8 +239,7 @@ static void set_mtrr_from_inactive_cpu(unsigned int reg, unsigned long base,
 				      .smp_type = type
 				    };
 
-	stop_machine_from_inactive_cpu(mtrr_rendezvous_handler, &data,
-				       cpu_callout_mask);
+	on_each_cpu_mask(cpu_callout_mask, mtrr_rendezvous_handler, &data, true);
 }
 
 /**

-- 
Chris Wilson, Intel Open Source Technology Centre
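
A toy user-space model (not kernel code) of why the two calls behave differently in this configuration. It assumes that, without CONFIG_STOP_MACHINE, stop_machine() degenerates to running the handler once on the calling CPU, whereas on_each_cpu_mask() runs it on every CPU in the mask. All names and the CPU count below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for the model */

/* Each modelled CPU has its own copy of one "MTRR" value. */
struct machine_model {
	unsigned int mtrr[NR_CPUS_MODEL];
};

/* Models the assumed fallback: the update runs on the calling CPU only. */
void update_local_only(struct machine_model *m, int this_cpu,
		       unsigned int val)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS_MODEL);
	m->mtrr[this_cpu] = val;
}

/* Models a broadcast: the update runs on every CPU set in mask. */
void update_on_mask(struct machine_model *m, unsigned int mask,
		    unsigned int val)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (mask & (1u << cpu))
			m->mtrr[cpu] = val;
}

/* True if every modelled CPU holds the same value. */
bool mtrrs_in_sync(const struct machine_model *m)
{
	int cpu;

	for (cpu = 1; cpu < NR_CPUS_MODEL; cpu++)
		if (m->mtrr[cpu] != m->mtrr[0])
			return false;
	return true;
}
```

After update_local_only() on CPU 0 the model is out of sync; after update_on_mask() with all four bits set it is consistent again. The model deliberately ignores the rendezvous and interrupt-disabling that the real stop_machine() provides.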


Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-08 Thread H. Peter Anvin
On 10/08/2014 12:49 PM, Chris Wilson wrote:
>
> Indeed, this appears to be the explanation. (And here I thought PAT
> superseded mtrrs - i915.ko stopped trying to assign an mtrr for its
> GTT quite a while ago.)
>
> Replacing the stop_machine there with on_each_cpu does the trick:
>

It should, but there seem to be quite a few drivers which still muck
with MTRRs.  However, i915 is not one of them, it calls
io_mapping_create_wc() followed by arch_phys_wc_add(), so I'm wondering
what the heck is going on here.

> Naively I would say that we lost the wc on our ioremap.
> /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> runs.

Could you tell me what the above looks like?

-hpa


Re: i915.ko WC writes are slow after ea8596bb2d8d379

2014-10-08 Thread Masami Hiramatsu
(2014/10/09 2:47), Chuck Ebbert wrote:
> On Wed, 8 Oct 2014 10:03:36 +0100
> Chris Wilson <ch...@chris-wilson.co.uk> wrote:
>
>> and adding that back into the current build, e.g.
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 3632743..48a8a69 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -87,6 +87,7 @@ config X86
>>  	select HAVE_USER_RETURN_NOTIFIER
>>  	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>>  	select HAVE_ARCH_JUMP_LABEL
>> +	select STOP_MACHINE
>>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>>  	select SPARSE_IRQ
>>  	select GENERIC_FIND_FIRST_BIT
>>
>> fixes the regression.
>>
>
> Looking closer at this, it seems most configs work by accident,
> because they have MOD_UNLOAD and/or HOTPLUG_CPU enabled. I take it
> you disabled both of those? stop_machine() is called from all kinds
> of places and almost none of them make sure STOP_MACHINE is selected.

I guess most of them assume that stop_machine() is not a configurable
feature...
If any of them requires stop_machine(), it should select STOP_MACHINE in
its own Kconfig entry (this includes ftrace and kprobes).

> $ find -name Kconf\* | xargs grep STOP_MACHINE
> ./init/Kconfig:config STOP_MACHINE
>
> All these places use stop_machine():
>
> mm/page_alloc.c, line 3886
> drivers/xen/manage.c, line 130
> drivers/char/hw_random/intel-rng.c, line 373
> arch/powerpc/mm/numa.c:
>     line 1616
>     line 1623
> arch/powerpc/platforms/powernv/subcore.c, line 324
> arch/arm/kernel/kprobes.c, line 165
> arch/arm/kernel/patch.c:
>     line 64
>     line 71
> arch/s390/kernel/jump_label.c, line 61
> arch/s390/kernel/kprobes.c:
>     line 311
>     line 320
> arch/s390/kernel/time.c:
>     line 820
>     line 1590
> arch/x86/kernel/cpu/mtrr/main.c, line 231
> arch/arm64/kernel/insn.c, line 181
> kernel/time/timekeeping.c, line 892
> kernel/trace/ftrace.c, line 2219
> kernel/module.c:
>     line 770
>     line 1861
>

BTW, with the patch series I sent, the last two (in kernel/module.c) can be removed.
https://lkml.org/lkml/2014/8/25/142

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com

