Re: [PATCH] drm/bridge: adv7511: Attach to DSI host at probe time

2019-06-25 Thread Matt Redfearn
Hi,

Any feedback on this patch?

Thanks,
Matt

On 24/04/2019 14:22, Matthew Redfearn wrote:
> In contrast to all of the DSI panel drivers in drivers/gpu/drm/panel,
> which attach to the DSI host via mipi_dsi_attach() at probe time, the
> ADV7533 bridge device does not. Instead it defers this to the point that
> the upstream device connects to its bridge via drm_bridge_attach().
> The generic Synopsys MIPI DSI host driver does not register its own
> drm_bridge until the MIPI DSI device has attached, but it does not call
> drm_bridge_attach() on the downstream device until the upstream device
> has attached. This leads to a chicken-and-egg failure and the DRM
> pipeline does not complete.
> Since all other mipi_dsi_device drivers call mipi_dsi_attach() in
> probe(), make the adv7533 mipi_dsi_device do the same. This ensures that
> the Synopsys MIPI DSI host registers its bridge such that it is
> available for the upstream device to connect to.
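The deadlock described above can be sketched as a toy state model in plain C (illustrative only; `model_mipi_dsi_attach` and `pipeline_completes` are stand-ins for the real kernel calls, not kernel APIs):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (not kernel code) of the ordering problem: the Synopsys DSI
 * host only registers its drm_bridge once the DSI device has attached,
 * while the old adv7533 driver only attached its DSI device once the
 * upstream device had connected to *its* bridge.
 */
struct dsi_host { bool bridge_registered; };
struct adv7533  { bool dsi_attached, bridge_attached; };

static void model_mipi_dsi_attach(struct dsi_host *host, struct adv7533 *adv)
{
	adv->dsi_attached = true;
	host->bridge_registered = true;	/* host registers its bridge now */
}

static bool pipeline_completes(struct dsi_host *host, struct adv7533 *adv,
			       bool attach_dsi_at_probe)
{
	if (attach_dsi_at_probe)
		model_mipi_dsi_attach(host, adv);	/* patched: in probe() */
	if (!host->bridge_registered)
		return false;		/* upstream never finds the bridge */
	adv->bridge_attached = true;	/* drm_bridge_attach() succeeds */
	if (!attach_dsi_at_probe)
		model_mipi_dsi_attach(host, adv);	/* old behavior: too late */
	return adv->dsi_attached && adv->bridge_attached;
}
```

With probe-time attach the pipeline completes; with the old ordering the upstream device never finds a registered bridge.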
> 
> Signed-off-by: Matt Redfearn 
> 
> ---
> 
>   drivers/gpu/drm/bridge/adv7511/adv7511_drv.c | 9 +
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c 
> b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
> index e7ddd3e3db9..ea36ac3a3de 100644
> --- a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
> +++ b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
> @@ -874,9 +874,6 @@ static int adv7511_bridge_attach(struct drm_bridge 
> *bridge)
> 				 &adv7511_connector_helper_funcs);
> 	drm_connector_attach_encoder(&adv->connector, bridge->encoder);
>   
> - if (adv->type == ADV7533)
> - ret = adv7533_attach_dsi(adv);
> -
>   if (adv->i2c_main->irq)
>   regmap_write(adv->regmap, ADV7511_REG_INT_ENABLE(0),
>ADV7511_INT0_HPD);
> @@ -1222,7 +1219,11 @@ static int adv7511_probe(struct i2c_client *i2c, const 
> struct i2c_device_id *id)
> 	drm_bridge_add(&adv7511->bridge);
>   
>   adv7511_audio_init(dev, adv7511);
> - return 0;
> +
> + if (adv7511->type == ADV7533)
> + return adv7533_attach_dsi(adv7511);
> + else
> + return 0;
>   
>   err_unregister_cec:
>   i2c_unregister_device(adv7511->i2c_cec);
> 


[PATCH v3 2/2] MIPS: memset.S: Add comments to fault fixup handlers

2018-05-23 Thread Matt Redfearn
It is not immediately obvious what the expected inputs to these fault
handlers are and how they calculate the number of unset bytes. Having
stared deeply at this in order to fix some corner cases, add some
comments to assist those who follow.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v3:
- Update comment on .Lbyte_fixup to reflect corrected behavior

Changes in v2:
- Add comments to fault handlers in new, separate, patch.

 arch/mips/lib/memset.S | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index fac26ce64b2..3a6f34ef5ff 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -232,16 +232,25 @@
 
 #ifdef CONFIG_CPU_MIPSR6
 .Lbyte_fixup\@:
+	/*
+	 * unset_bytes = (#bytes - (#unaligned bytes)) - (-#unaligned bytes remaining + 1) + 1
+	 *      a2     =              a2               -              t0                   + 1
+	 */
	PTR_SUBU	a2, t0
	jr		ra
	 PTR_ADDIU	a2, 1
 #endif /* CONFIG_CPU_MIPSR6 */
 
 .Lfirst_fixup\@:
+   /* unset_bytes already in a2 */
jr  ra
 nop
 
 .Lfwd_fixup\@:
+	/*
+	 * unset_bytes = partial_start_addr +  #bytes   -     fault_addr
+	 *      a2     =        t1          + (a2 & 3f) - $28->task->BUADDR
+	 */
PTR_L   t0, TI_TASK($28)
	andi	a2, 0x3f
LONG_L  t0, THREAD_BUADDR(t0)
@@ -250,6 +259,10 @@
 LONG_SUBU  a2, t0
 
 .Lpartial_fixup\@:
+	/*
+	 * unset_bytes = partial_end_addr +      #bytes      -     fault_addr
+	 *      a2     =        a0        + (a2 & STORMASK) - $28->task->BUADDR
+	 */
PTR_L   t0, TI_TASK($28)
	andi	a2, STORMASK
LONG_L  t0, THREAD_BUADDR(t0)
@@ -258,10 +271,15 @@
 LONG_SUBU  a2, t0
 
 .Llast_fixup\@:
+   /* unset_bytes already in a2 */
jr  ra
 nop
 
 .Lsmall_fixup\@:
+	/*
+	 * unset_bytes = end_addr - current_addr + 1
+	 *      a2     =    t1    -      a0      + 1
+	 */
	PTR_SUBU	a2, t1, a0
jr  ra
 PTR_ADDIU  a2, 1
-- 
2.7.4



[PATCH v3 1/2] MIPS: memset.S: Fix byte_fixup for MIPSr6

2018-05-23 Thread Matt Redfearn
The __clear_user function is defined to return the number of bytes that
could not be cleared. From the underlying memset / bzero implementation
this means setting register a2 to that number on return. Currently if a
page fault is triggered within the MIPSr6 version of setting the initial
unaligned bytes, the value loaded into a2 on return is meaningless.

During the MIPSr6 version of the initial unaligned bytes block, register
a2 contains the number of bytes to be set beyond the initial unaligned
bytes. The t0 register is initially set to the number of unaligned bytes
- STORSIZE, effectively a negative version of the number of unaligned
bytes. This is then incremented before each byte is saved.

The label .Lbyte_fixup\@ is jumped to on page fault. Currently the value
in a2 is incorrectly replaced by 0 - t0 + 1, effectively the number of
unaligned bytes remaining. This leads to the failures being reported by
the following test code:

static int __init test_clear_user(void)
{
int j, k;

pr_info("\n\n\nTesting clear_user\n");
for (j = 0; j < 512; j++) {
if ((k = clear_user(NULL+3, j)) != j) {
pr_err("clear_user (NULL %d) returned %d\n", j, k);
}
}
return 0;
}
late_initcall(test_clear_user);

Which reports:
[3.965439] Testing clear_user
[3.973169] clear_user (NULL 8) returned 6
[3.976782] clear_user (NULL 9) returned 6
[3.980390] clear_user (NULL 10) returned 6
[3.984052] clear_user (NULL 11) returned 6
[3.987524] clear_user (NULL 12) returned 6

Fix this by subtracting t0 from a2 (rather than $0), effectively giving:
unset_bytes = (#bytes - (#unaligned bytes)) - (-#unaligned bytes remaining + 1) + 1
     a2     =              a2               -              t0                   + 1

This fixes the value returned from __clear_user when the number of bytes
to set is > LONGSIZE and the address is invalid and unaligned.
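The contract exercised by the two test programs can be modeled in userland C (a sketch; `model_clear_user` and its accessible-range parameters are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userland model (not kernel code) of the __clear_user contract: clearing
 * stops at the first inaccessible byte and the return value is the number
 * of bytes that could NOT be cleared. Addresses in [valid_start, valid_end)
 * are treated as accessible.
 */
static size_t model_clear_user(size_t addr, size_t n,
			       size_t valid_start, size_t valid_end)
{
	size_t cleared = 0;

	for (size_t p = addr; p < addr + n; p++) {
		if (p < valid_start || p >= valid_end)
			break;		/* fault: stop clearing */
		cleared++;
	}
	return n - cleared;		/* bytes left unset */
}
```

For a completely inaccessible destination (like NULL+3) the whole length is returned; for a write starting 254 bytes before the end of a page, n - 254 is returned, matching the expected values in both tests.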

Unfortunately, this breaks the fixup handling for unaligned bytes after
the final long, where register a2 still contains the number of bytes
remaining to be set and the t0 register is set to 0 - the number of
unaligned bytes remaining.

Because t0 is now subtracted from a2 rather than 0, the number of
bytes unset is reported incorrectly:

static int __init test_clear_user(void)
{
char *test;
int j, k;

pr_info("\n\n\nTesting clear_user\n");
test = vmalloc(PAGE_SIZE);

for (j = 256; j < 512; j++) {
if ((k = clear_user(test + PAGE_SIZE - 254, j)) != j - 254) {
pr_err("clear_user (%px %d) returned %d\n",
test + PAGE_SIZE - 254, j, k);
}
}
return 0;
}
late_initcall(test_clear_user);

[3.976775] clear_user (c000df02 256) returned 4
[3.981957] clear_user (c000df02 257) returned 6
[3.986425] clear_user (c000df02 258) returned 8
[3.990850] clear_user (c000df02 259) returned 10
[3.995332] clear_user (c000df02 260) returned 12
[3.999815] clear_user (c000df02 261) returned 14

Fix this by ensuring that a2 is set to 0 while setting the final
unaligned bytes.
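The patched fixup arithmetic can be checked in plain C (a model of the register flow described above, not MIPS assembly; `byte_fixup_a2` is illustrative):

```c
#include <assert.h>

/*
 * Model of the patched MIPSr6 .Lbyte_fixup path. Before the byte stores:
 * t0 = -count (PTR_SUBU t0, $0, a2), a2 = 0 (the new "move a2, zero"),
 * then t0 is incremented by 1 up front and once per byte stored. On a
 * fault at byte index fault_index the fixup computes a2 = a2 - t0 + 1,
 * which should equal the number of bytes left unset.
 */
static long byte_fixup_a2(long count, long fault_index)
{
	long t0 = -count;	/* PTR_SUBU  t0, $0, a2 */
	long a2 = 0;		/* move      a2, zero (the fix) */

	t0 += 1;		/* PTR_ADDIU t0, 1 */
	t0 += fault_index;	/* one increment per byte stored */

	a2 -= t0;		/* PTR_SUBU  a2, t0 (the fix) */
	a2 += 1;		/* PTR_ADDIU a2, 1 */
	return a2;		/* expected: count - fault_index */
}
```

A fault on the very first byte reports the full count unset; each successfully stored byte reduces the reported remainder by one.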

Fixes: 8c56208aff77 ("MIPS: lib: memset: Add MIPS R6 support")
Cc: sta...@vger.kernel.org
Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>

---

Changes in v3:
New patch to fix fault handling during MIPSr6 version of setting
unaligned bytes.

Changes in v2: None

 arch/mips/lib/memset.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index 1cc306520a5..fac26ce64b2 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -195,6 +195,7 @@
 #endif
 #else
 PTR_SUBU   t0, $0, a2
+	move	a2, zero	/* No remaining longs */
PTR_ADDIU   t0, 1
STORE_BYTE(0)
STORE_BYTE(1)
@@ -231,7 +232,7 @@
 
 #ifdef CONFIG_CPU_MIPSR6
 .Lbyte_fixup\@:
-	PTR_SUBU	a2, $0, t0
+	PTR_SUBU	a2, t0
jr  ra
 PTR_ADDIU  a2, 1
 #endif /* CONFIG_CPU_MIPSR6 */
-- 
2.7.4




Re: [PATCH v2 4/4] MIPS: memset.S: Add comments to fault fixup handlers

2018-05-22 Thread Matt Redfearn

Hi James,

On 21/05/18 17:14, James Hogan wrote:

On Tue, Apr 17, 2018 at 04:40:03PM +0100, Matt Redfearn wrote:

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index 1cc306520a55..a06dabe99d4b 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -231,16 +231,25 @@
  
  #ifdef CONFIG_CPU_MIPSR6

  .Lbyte_fixup\@:
+   /*
+* unset_bytes = current_addr + 1
+*  a2 =  t0  + 1


The code looks more like a2 = 1 - t0 to me:


+*/
	PTR_SUBU	a2, $0, t0
jr  ra
 PTR_ADDIU  a2, 1


I.e. t0 counts up to 1 and then stops.


Well spotted. Which means this code is also wrong :-/

We have the count of bytes to zero in a2, but that gets clobbered in the 
exception handler and we always return a number of bytes uncopied within 
STORSIZE.


This test code:

static int __init __attribute__((optimize("O0"))) test_clear_user(void)
{
int j, k;

pr_info("\n\n\nTesting clear_user\n");

for (j = 0; j < 512; j++) {
if ((k = clear_user(NULL+3, j)) != j) {
pr_err("clear_user (NULL %d) returned %d\n", j, k);
}
}

return 0;
}
late_initcall(test_clear_user);

on a 64r6el kernel results in:

[3.965439] Testing clear_user
[3.973169] clear_user (NULL 8) returned 6
[3.976782] clear_user (NULL 9) returned 6
[3.980390] clear_user (NULL 10) returned 6
[3.984052] clear_user (NULL 11) returned 6
[3.987524] clear_user (NULL 12) returned 6
[3.991179] clear_user (NULL 13) returned 6
[3.994841] clear_user (NULL 14) returned 6
[3.998500] clear_user (NULL 15) returned 6
[4.002160] clear_user (NULL 16) returned 6
[4.005820] clear_user (NULL 17) returned 6
[4.009480] clear_user (NULL 18) returned 6
[4.013140] clear_user (NULL 19) returned 6
[4.016797] clear_user (NULL 20) returned 6
[4.020456] clear_user (NULL 21) returned 6

I'll post another fix soon, and update this comment to reflect the 
corrected behavior.


Thanks,
Matt



Anyway I've applied patch 3 for 4.18.

Cheers
James





Re: [PATCH v3 5/7] MIPS: perf: Allocate per-core counters on demand

2018-05-17 Thread Matt Redfearn

Hi James,

On 16/05/18 19:05, James Hogan wrote:

On Fri, Apr 20, 2018 at 11:23:07AM +0100, Matt Redfearn wrote:

Previously, when performance counters were per-core rather than
per-thread, the number available was divided by 2 on detection, and the
counters used by each thread in a core were "swizzled" to ensure
separation. However, this solution is suboptimal since it relies on a
couple of assumptions:
a) Always having 2 VPEs / core (number of counters was divided by 2)
b) Always having a number of counters implemented in the core that is
divisible by 2. For instance if an SoC implementation had a single
counter and 2 VPEs per core, then this logic would fail and no
performance counters would be available.
The mechanism also does not allow for one VPE in a core using more than
its allocation of the per-core counters to count multiple events even
though other VPEs may not be using them.

Fix this situation by instead allocating (and releasing) per-core
performance counters when they are requested. This approach removes the
above assumptions and fixes the shortcomings.

In order to do this:
Add additional logic to mipsxx_pmu_alloc_counter() to detect if a
sibling is using a per-core counter, and to allocate a per-core counter
in all sibling CPUs.
Similarly, add a mipsxx_pmu_free_counter() function to release a
per-core counter in all sibling CPUs when it is finished with.
A new spinlock, core_counters_lock, is introduced to ensure exclusivity
when allocating and releasing per-core counters.
Since counters are now allocated per-core on demand, rather than being
reserved per-thread at boot, all of the "swizzling" of counters is
removed.

The upshot is that in an SoC with 2 counters / thread, counters are
reported as:
Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
available to each CPU, irq 18

Running an instance of a test program on each of 2 threads in a
core, both threads can use their 2 counters to count 2 events:

taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
perf stat -e instructions:u,branches:u ./test_prog

  Performance counter stats for './test_prog':

  30002  instructions:u
  1  branches:u

0.005164264 seconds time elapsed
  Performance counter stats for './test_prog':

  30002  instructions:u
  1  branches:u

0.006139975 seconds time elapsed

In an SoC with 2 counters / core (which can be forced by setting
cpu_has_mipsmt_pertccounters = 0), counters are reported as:
Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
available to each core, irq 18

Running an instance of a test program on each of 2 threads in a
core, now only one thread manages to secure the performance counters to
count 2 events. The other thread does not get any counters.

taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
perf stat -e instructions:u,branches:u ./test_prog

  Performance counter stats for './test_prog':

 instructions:u
 branches:u

0.005179533 seconds time elapsed

  Performance counter stats for './test_prog':

  30002  instructions:u
  1  branches:u

0.005179467 seconds time elapsed

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>


While this sounds like an improvement in practice, being able to use
more counters on single threaded stuff than otherwise, I'm a little
concerned what would happen if a task was migrated to a different CPU
and the perf counters couldn't be obtained on the new CPU due to
counters already being in use. Would the values be incorrectly small?


This change was really forced by the new I7200 development. Current 
configurations have 2 counters per core, but each core has 3 VPEs - 
which means the current logic cannot correctly assign counters. IoW the 
2 assumptions stated in the commit log are no longer true.


Though you are right that if a task migrated to a core on which another 
VPE is already using the counters, this change would mean counters 
cannot be assigned. In that case we return EAGAIN. I'm not sure if that 
error would be handled gracefully by the scheduler and the task 
scheduled away again... The code events logic that backs this is tricky 
to follow.
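The on-demand scheme described in the commit message can be sketched as a toy single-core model (illustrative only; the real driver works per-sibling-CPU and serializes with core_counters_lock):

```c
#include <assert.h>

/*
 * Toy model (not the real driver) of on-demand per-core counter
 * allocation: all VPEs in a core share one in-use bitmap, an allocation
 * claims the first free counter, and allocation failure corresponds to
 * the -EAGAIN return discussed above.
 */
#define NCOUNTERS 2

static unsigned int core_counters_in_use;	/* shared by sibling VPEs */

static int alloc_core_counter(void)
{
	for (int i = 0; i < NCOUNTERS; i++) {
		if (!(core_counters_in_use & (1u << i))) {
			core_counters_in_use |= 1u << i;
			return i;
		}
	}
	return -1;	/* no free counter: the driver returns -EAGAIN */
}

static void free_core_counter(int i)
{
	core_counters_in_use &= ~(1u << i);
}
```

With 2 counters per core, a sibling that grabs both counters starves the other VPE until it frees one, which is exactly the migration concern raised above.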


Thanks,
Matt




Cheers
James




Re: [PATCH v3 4/7] MIPS: perf: Fix perf with MT counting other threads

2018-05-17 Thread Matt Redfearn

Hi James,

On 16/05/18 18:59, James Hogan wrote:

On Fri, Apr 20, 2018 at 11:23:06AM +0100, Matt Redfearn wrote:

diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index 7e2b7d38a774..fe50986e83c6 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -323,7 +323,11 @@ static int mipsxx_pmu_alloc_counter(struct cpu_hw_events 
*cpuc,
  
  static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)

  {
+   struct perf_event *event = container_of(evt, struct perf_event, hw);
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+#ifdef CONFIG_MIPS_MT_SMP
+   unsigned int range = evt->event_base >> 24;
+#endif /* CONFIG_MIPS_MT_SMP */
  
  	WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
  
@@ -331,11 +335,37 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)

(evt->config_base & M_PERFCTL_CONFIG_MASK) |
/* Make sure interrupt enabled. */
MIPS_PERFCTRL_IE;
-   if (IS_ENABLED(CONFIG_CPU_BMIPS5000))
+
+#ifdef CONFIG_CPU_BMIPS5000
+   {
/* enable the counter for the calling thread */
cpuc->saved_ctrl[idx] |=
(1 << (12 + vpe_id())) | BRCM_PERFCTRL_TC;
+   }
+#else
+#ifdef CONFIG_MIPS_MT_SMP
+   if (range > V) {
+   /* The counter is processor wide. Set it up to count all TCs. */
+   pr_debug("Enabling perf counter for all TCs\n");
+   cpuc->saved_ctrl[idx] |= M_TC_EN_ALL;
+   } else
+#endif /* CONFIG_MIPS_MT_SMP */
+   {
+   unsigned int cpu, ctrl;
  
+		/*

+* Set up the counter for a particular CPU when event->cpu is
+* a valid CPU number. Otherwise set up the counter for the CPU
+* scheduling this thread.
+*/
+   cpu = (event->cpu >= 0) ? event->cpu : smp_processor_id();
+
+	ctrl = M_PERFCTL_VPEID(cpu_vpe_id(&cpu_data[cpu]));
+   ctrl |= M_TC_EN_VPE;
+   cpuc->saved_ctrl[idx] |= ctrl;
+   pr_debug("Enabling perf counter for CPU%d\n", cpu);
+   }
+#endif /* CONFIG_CPU_BMIPS5000 */


I'm not a huge fan of the ifdefery tbh, I don't think it makes it very
easy to read having a combination of ifs and #ifdefs. I reckon
IF_ENABLED would be better, perhaps with having the BMIPS5000 case
return to avoid too much nesting.


OK, I'll try and tidy it up.

Thanks,
Matt



Otherwise the patch looks okay to me.

Thanks
James





[PATCH v4] MIPS: perf: Fix BMIPS5000 system mode counting

2018-05-15 Thread Matt Redfearn
When perf is used in system mode, i.e. specifying a set of CPUs to
count (perf -a -C cpu), event->cpu is set to the CPU number on which
events should be counted. The current BMIPS5000 variation of
mipsxx_pmu_enable_event only ever sets the counter to count the current
CPU, so system mode does not work.

Fix this by removing this BMIPS5000 specific path and integrating it
with the generic one. Since BMIPS5000 uses specific extensions to the
perf control register, different fields must be set up to count the
relevant CPU.
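The per-CPU field encoding the patch introduces can be sanity-checked in isolation (a sketch; the `BRCM_PERFCTRL_*` values are taken from the diff below, while the value of `MIPS_CPUID_TO_COUNTER_MASK` here is an assumption for 2 VPEs per core):

```c
#include <assert.h>

/* Field encodings from the patch; the mask value is an assumption
 * made for this sketch, not taken from the kernel headers. */
#define BRCM_PERFCTRL_VPEID(v)		(1u << (12 + (v)))
#define BRCM_PERFCTRL_TC		(1u << 30)
#define MIPS_CPUID_TO_COUNTER_MASK	0x1	/* assumed: 2 VPEs/core */

/* Build the BMIPS5000 control bits selecting which VPE a counter counts. */
static unsigned int bmips_ctrl_for_cpu(int cpu)
{
	unsigned int ctrl;

	ctrl = BRCM_PERFCTRL_VPEID(cpu & MIPS_CPUID_TO_COUNTER_MASK);
	ctrl |= BRCM_PERFCTRL_TC;	/* count only the selected TC */
	return ctrl;
}
```

Even CPUs select the bit-12 enable and odd CPUs the bit-13 enable, so event->cpu (rather than the calling CPU) determines which VPE is counted.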

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
Tested-by: Florian Fainelli <f.faine...@gmail.com>

---

Changes in v4:
Fix compiler error from BRCM_PERFCTRL_VPEID flagged by Florian.

Changes in v2:
New patch to fix BMIPS5000 system mode perf.

 arch/mips/include/asm/mipsregs.h |  1 +
 arch/mips/kernel/perf_event_mipsxx.c | 17 ++---
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index a4b02bc..6b0b06d2683 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -735,6 +735,7 @@
 #define MIPS_PERFCTRL_MT_EN_TC (_ULCAST_(2) << 20)
 
 /* PerfCnt control register MT extensions used by BMIPS5000 */
+#define BRCM_PERFCTRL_VPEID(v) (_ULCAST_(1) << (12 + (v)))
 #define BRCM_PERFCTRL_TC   (_ULCAST_(1) << 30)
 
 /* PerfCnt control register MT extensions used by Netlogic XLR */
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 5b8811643e6..77d7167e303 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -364,16 +364,7 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
/* Make sure interrupt enabled. */
MIPS_PERFCTRL_IE;
 
-#ifdef CONFIG_CPU_BMIPS5000
-   {
-   /* enable the counter for the calling thread */
-   unsigned int vpe_id;
-
-   vpe_id = smp_processor_id() & MIPS_CPUID_TO_COUNTER_MASK;
-   cpuc->saved_ctrl[idx] |= BIT(12 + vpe_id) | BRCM_PERFCTRL_TC;
-   }
-#else
-#ifdef CONFIG_MIPS_MT_SMP
+#if defined(CONFIG_MIPS_MT_SMP) && !defined(CONFIG_CPU_BMIPS5000)
if (range > V) {
/* The counter is processor wide. Set it up to count all TCs. */
pr_debug("Enabling perf counter for all TCs\n");
@@ -390,12 +381,16 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
 */
cpu = (event->cpu >= 0) ? event->cpu : smp_processor_id();
 
+#if defined(CONFIG_CPU_BMIPS5000)
+   ctrl = BRCM_PERFCTRL_VPEID(cpu & MIPS_CPUID_TO_COUNTER_MASK);
+   ctrl |= BRCM_PERFCTRL_TC;
+#else
   ctrl = M_PERFCTL_VPEID(cpu_vpe_id(&cpu_data[cpu]));
ctrl |= M_TC_EN_VPE;
+#endif
cpuc->saved_ctrl[idx] |= ctrl;
pr_debug("Enabling perf counter for CPU%d\n", cpu);
}
-#endif /* CONFIG_CPU_BMIPS5000 */
/*
 * We do not actually let the counter run. Leave it until start().
 */
-- 
2.7.4




Re: [PATCH v2] clocksource/drivers/mips-gic-timer: Add pr_fmt and reword pr_* messages

2018-05-14 Thread Matt Redfearn



On 29/03/18 10:49, Matt Redfearn wrote:

Several messages from the MIPS GIC driver include the text "GIC", "GIC
timer", etc, but the format is not standard. Add a pr_fmt of
"mips-gic-timer: " and reword the messages now that they will be
prefixed with the driver name.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>



ping?

Thanks,
Matt


---

Changes in v2:
Rebase on v4.16-rc7

  drivers/clocksource/mips-gic-timer.c | 18 ++
  1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/clocksource/mips-gic-timer.c b/drivers/clocksource/mips-gic-timer.c
index 986b6796b631..54f8a331b53a 100644
--- a/drivers/clocksource/mips-gic-timer.c
+++ b/drivers/clocksource/mips-gic-timer.c
@@ -5,6 +5,9 @@
   *
   * Copyright (C) 2012 MIPS Technologies, Inc.  All rights reserved.
   */
+
+#define pr_fmt(fmt) "mips-gic-timer: " fmt
+
  #include 
  #include 
  #include 
@@ -136,8 +139,7 @@ static int gic_clockevent_init(void)
  
   	ret = setup_percpu_irq(gic_timer_irq, &gic_compare_irqaction);

if (ret < 0) {
-   pr_err("GIC timer IRQ %d setup failed: %d\n",
-  gic_timer_irq, ret);
+   pr_err("IRQ %d setup failed (%d)\n", gic_timer_irq, ret);
return ret;
}
  
@@ -176,7 +178,7 @@ static int __init __gic_clocksource_init(void)
  
   	ret = clocksource_register_hz(&gic_clocksource, gic_frequency);

if (ret < 0)
-   pr_warn("GIC: Unable to register clocksource\n");
+   pr_warn("Unable to register clocksource\n");
  
  	return ret;

  }
@@ -188,7 +190,7 @@ static int __init gic_clocksource_of_init(struct device_node *node)
  
  	if (!mips_gic_present() || !node->parent ||

!of_device_is_compatible(node->parent, "mti,gic")) {
-   pr_warn("No DT definition for the mips gic driver\n");
+   pr_warn("No DT definition\n");
return -ENXIO;
}
  
@@ -196,7 +198,7 @@ static int __init gic_clocksource_of_init(struct device_node *node)

if (!IS_ERR(clk)) {
ret = clk_prepare_enable(clk);
if (ret < 0) {
-   pr_err("GIC failed to enable clock\n");
+   pr_err("Failed to enable clock\n");
clk_put(clk);
return ret;
}
@@ -204,12 +206,12 @@ static int __init gic_clocksource_of_init(struct device_node *node)
gic_frequency = clk_get_rate(clk);
} else if (of_property_read_u32(node, "clock-frequency",
&gic_frequency)) {
-   pr_err("GIC frequency not specified.\n");
+   pr_err("Frequency not specified\n");
return -EINVAL;
}
gic_timer_irq = irq_of_parse_and_map(node, 0);
if (!gic_timer_irq) {
-   pr_err("GIC timer IRQ not specified.\n");
+   pr_err("IRQ not specified\n");
return -EINVAL;
}
  
@@ -220,7 +222,7 @@ static int __init gic_clocksource_of_init(struct device_node *node)

ret = gic_clockevent_init();
if (!ret && !IS_ERR(clk)) {
if (clk_notifier_register(clk, &gic_clk_nb) < 0)
-   pr_warn("GIC: Unable to register clock notifier\n");
+   pr_warn("Unable to register clock notifier\n");
}
  
  	/* And finally start the counter */





Re: [PATCH 4.17 2/2] ssb: make SSB_PCICORE_HOSTMODE depend on SSB = y

2018-05-10 Thread Matt Redfearn

Hi,

On 10/05/18 12:26, Michael Büsch wrote:

On Thu, 10 May 2018 13:20:01 +0200
Rafał Miłecki  wrote:


On 10 May 2018 at 13:17, Michael Büsch  wrote:

On Thu, 10 May 2018 13:14:01 +0200
Rafał Miłecki  wrote:
  

From: Rafał Miłecki 

SSB_PCICORE_HOSTMODE protects MIPS specific code that calls not exported
symbols pcibios_enable_device and register_pci_controller. This code is
supposed to be compiled only with ssb builtin.

This fixes:
ERROR: "pcibios_enable_device" [drivers/ssb/ssb.ko] undefined!
ERROR: "register_pci_controller" [drivers/ssb/ssb.ko] undefined!
make[1]: *** [scripts/Makefile.modpost:92: __modpost] Error 1

Signed-off-by: Rafał Miłecki 
---
  drivers/ssb/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ssb/Kconfig b/drivers/ssb/Kconfig
index b3f5cae98ea6..c574dd210500 100644
--- a/drivers/ssb/Kconfig
+++ b/drivers/ssb/Kconfig
@@ -131,7 +131,7 @@ config SSB_DRIVER_PCICORE

  config SSB_PCICORE_HOSTMODE
   bool "Hostmode support for SSB PCI core"
- depends on SSB_DRIVER_PCICORE && SSB_DRIVER_MIPS
+ depends on SSB_DRIVER_PCICORE && SSB_DRIVER_MIPS && SSB = y
   help
 PCIcore hostmode operation (external PCI bus).
  



I think we also need to depend on PCI_DRIVERS_LEGACY.
See the other patch that floats around.


I believe it's already handled by SSB_PCIHOST_POSSIBLE's dependency on
PCI_DRIVERS_LEGACY.



That dependency seems to be wrong there.
Was it added among some other "let's just unbreak some random
build" change as well?


Yeah - that was commit 58eae1416b80 ("ssb: Disable PCI host for PCI_DRIVERS_GENERIC").




SSB_PCIHOST enables support for SSB on top of PCI. (Which is 99% of it
uses). I don't see how this uses the legacy API.

SSB_PCICORE_HOSTMODE enables PCI on top of SSB. Which is a MIPS corner
case. This uses the legacy MIPS API to register a PCI bus.



Yeah the dependency would seem to be in the wrong place and should be on 
SSB_PCICORE_HOSTMODE, in the same way as the bcma driver - commits 
664eadd6f44b ("bcma: Fix 'allmodconfig' and BCMA builds on MIPS 
targets") & 79ca239a68f8 ("bcma: Prevent build of PCI host features in 
module").


Thanks,
Matt



Re: Regression caused by commit 882164a4a928

2018-05-10 Thread Matt Redfearn

Hi Rafał,

On 10/05/18 11:41, Rafał Miłecki wrote:

On 7 May 2018 at 17:44, Larry Finger  wrote:

Although commit 882164a4a928 ("ssb: Prevent build of PCI host features in
module") appeared to be harmless, it leads to complete failure of drivers
b43. and b43legacy, and likely affects b44 as well. The problem is that
CONFIG_SSB_PCIHOST is undefined, which prevents the compilation of the code
that controls the PCI cores of the device. See
https://bugzilla.redhat.com/show_bug.cgi?id=1572349 for details.

As the underlying errors ("pcibios_enable_device" undefined, and
"register_pci_controller" undefined) do not appear on the architectures that
I have tested (x86_64, x86, and ppc), I suspect something in the
arch-specific code for your setup (MIPS?). As I have no idea on how to fix
that problem, would the following patch work for you?

diff --git a/drivers/ssb/Kconfig b/drivers/ssb/Kconfig
index 9371651d8017..3743533c8057 100644
--- a/drivers/ssb/Kconfig
+++ b/drivers/ssb/Kconfig
@@ -117,7 +117,7 @@ config SSB_SERIAL

  config SSB_DRIVER_PCICORE_POSSIBLE
 bool
-   depends on SSB_PCIHOST && SSB = y
+   depends on SSB_PCIHOST && (SSB = y || !MIPS)
 default y

  config SSB_DRIVER_PCICORE


I strongly suggest we take a step back, slow down a bit and look at
the original problem.

In driver_pcicore.c there is MIPS specific code. It's protected using
#ifdef CONFIG_SSB_PCICORE_HOSTMODE
(...)
#endif

If anyone has ever seen
ERROR: "pcibios_enable_device" [drivers/ssb/ssb.ko] undefined!
ERROR: "register_pci_controller" [drivers/ssb/ssb.ko] undefined!
make[1]: *** [scripts/Makefile.modpost:92: __modpost] Error 1
it means he managed to get CONFIG_SSB_PCICORE_HOSTMODE set on non-MIPS system.


I saw this on a MIPS system (to my knowledge, this does not happen on 
other arches due to the Kconfig rules you describe), which is what my 
original patch was attempting to fix, but appears to have caused 
problems on other arches.


Thanks,
Matt




We should rather answer how did that happen and fix it.

SSB_PCICORE_HOSTMODE depends on SSB_DRIVER_MIPS
SSB_DRIVER_MIPS depends on MIPS

How is that possible to set SSB_PCICORE_HOSTMODE with non-MIPS config?
Is there some mistake in Kconfig I can't see?




Re: Regression caused by commit 882164a4a928

2018-05-10 Thread Matt Redfearn

Hi Michael,

On 09/05/18 17:27, Michael Büsch wrote:

On Wed, 9 May 2018 13:55:43 +0100
Matt Redfearn <matt.redfe...@mips.com> wrote:


Hi Larry

On 07/05/18 16:44, Larry Finger wrote:

Matt,

Although commit 882164a4a928 ("ssb: Prevent build of PCI host features
in module") appeared to be harmless, it leads to complete failure of
drivers b43. and b43legacy, and likely affects b44 as well. The problem
is that CONFIG_SSB_PCIHOST is undefined, which prevents the compilation
of the code that controls the PCI cores of the device. See
https://bugzilla.redhat.com/show_bug.cgi?id=1572349 for details.


Sorry for the breakage :-/



As the underlying errors ("pcibios_enable_device" undefined, and
"register_pci_controller" undefined) do not appear on the architectures
that I have tested (x86_64, x86, and ppc), I suspect something in the
arch-specific code for your setup (MIPS?). As I have no idea on how to
fix that problem, would the following patch work for you?

diff --git a/drivers/ssb/Kconfig b/drivers/ssb/Kconfig
index 9371651d8017..3743533c8057 100644
--- a/drivers/ssb/Kconfig
+++ b/drivers/ssb/Kconfig
@@ -117,7 +117,7 @@ config SSB_SERIAL

   config SSB_DRIVER_PCICORE_POSSIBLE
      bool
-   depends on SSB_PCIHOST && SSB = y
+   depends on SSB_PCIHOST && (SSB = y || !MIPS)
      default y

   config SSB_DRIVER_PCICORE


I believe that the problem stems from these drivers being used for some
wireless AP functionality built into some MIPS based SoCs. The Kconfig
rules sort out building this additional functionality when configured
for MIPS (in a round about sort of way), but it allowed it even when SSB
is a module, leading to build failures. My patch was intended to prevent
that.

There was a similar issue in the same Kconfig file, introduced by
c5611df96804 and fixed by a9e6d44ddecc. It was fixed the same way as you
suggest. I've tested the above patch and it does work for MIPS
(preventing the PCICORE being built into the module).

Tested-by: Matt Redfearn <matt.redfe...@mips.com>



Could you please try this?

config SSB_DRIVER_PCICORE_POSSIBLE
depends on SSB_PCIHOST

config SSB_PCICORE_HOSTMODE
depends on SSB_DRIVER_PCICORE && SSB_DRIVER_MIPS && (SSB = y) && PCI_DRIVERS_LEGACY


The affected API pcibios_enable_device() and register_pci_controller()
is only used in HOSTMODE. So I think it makes sense to make HOSTMODE
depend on SSB=y and PCI_DRIVERS_LEGACY.

PCICore itself does not use the API, if hostmode is disabled.



Sure - I've tested the patch:

--- a/drivers/ssb/Kconfig
+++ b/drivers/ssb/Kconfig
@@ -117,7 +117,7 @@ config SSB_SERIAL

 config SSB_DRIVER_PCICORE_POSSIBLE
bool
-   depends on SSB_PCIHOST && SSB = y
+   depends on SSB_PCIHOST
default y

 config SSB_DRIVER_PCICORE
@@ -131,7 +131,7 @@ config SSB_DRIVER_PCICORE

 config SSB_PCICORE_HOSTMODE
bool "Hostmode support for SSB PCI core"
-   depends on SSB_DRIVER_PCICORE && SSB_DRIVER_MIPS
+   depends on SSB_DRIVER_PCICORE && SSB_DRIVER_MIPS && (SSB = y) && PCI_DRIVERS_LEGACY

help
  PCIcore hostmode operation (external PCI bus).


And this seems to work for MIPS, we don't get the build error from 
building the SSB module under nec_markeins allmodconfig, and 
SSB_PCICORE_HOSTMODE=y for bcm47xx allmodconfig, which selects SSB=y.


So this looks like a good fix for MIPS, at least.

Tested-by: Matt Redfearn <matt.redfe...@mips.com>

Thanks,
Matt






Re: [REVIEW][PATCH 08/22] signal/mips: Use force_sig_fault where appropriate

2018-05-10 Thread Matt Redfearn

Hi Eric,

On 10/05/18 03:39, Eric W. Biederman wrote:

Matt Redfearn <matt.redfe...@mips.com> writes:


Hi Eric,

On 20/04/18 15:37, Eric W. Biederman wrote:

Filling in struct siginfo before calling force_sig_info is a tedious and
error prone process, where once in a great while the wrong fields
are filled out, and siginfo has been inconsistently cleared.

Simplify this process by using the helper force_sig_fault, which
takes as parameters all of the information it needs, ensures
all of the fiddly bits of filling in struct siginfo are done properly
and then calls force_sig_info.

In short about a 5 line reduction in code for every time force_sig_info
is called, which makes the calling function clearer.

Cc: Ralf Baechle <r...@linux-mips.org>
Cc: James Hogan <jho...@kernel.org>
Cc: linux-m...@linux-mips.org
Signed-off-by: "Eric W. Biederman" <ebied...@xmission.com>
---
   arch/mips/kernel/traps.c | 65 ++--
   arch/mips/mm/fault.c | 19 --
   2 files changed, 23 insertions(+), 61 deletions(-)

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 967e9e4e795e..66ec4b0b484d 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -699,17 +699,11 @@ static int simulate_sync(struct pt_regs *regs, unsigned int opcode)
   asmlinkage void do_ov(struct pt_regs *regs)
   {
enum ctx_state prev_state;
-   siginfo_t info;
-
-   clear_siginfo(&info);
-   info.si_signo = SIGFPE;
-   info.si_code = FPE_INTOVF;
-   info.si_addr = (void __user *)regs->cp0_epc;
prev_state = exception_enter();
die_if_kernel("Integer overflow", regs);
-   force_sig_info(SIGFPE, &info, current);
+   force_sig_fault(SIGFPE, FPE_INTOVF, (void __user *)regs->cp0_epc, current);
exception_exit(prev_state);
   }
   @@ -722,32 +716,27 @@ asmlinkage void do_ov(struct pt_regs *regs)
   void force_fcr31_sig(unsigned long fcr31, void __user *fault_addr,
 struct task_struct *tsk)
   {
-   struct siginfo si;
-
-   clear_siginfo(&si);
-   si.si_addr = fault_addr;
-   si.si_signo = SIGFPE;
+   int si_code;


This is giving build errors in Linux next
(https://storage.kernelci.org/next/master/next-20180509/mips/defconfig+kselftest/build.log)

si_code would have ended up as 0 before from the clear_siginfo(), but perhaps


And si_code 0 is not a valid si_code to use with a floating point
siginfo layout.


int si_code = FPE_FLTUNK;

Would make a more sensible default?


FPE_FLTUNK would make a more sensible default.

I seem to remember someone telling me that case can never happen in
practice so I have simply not worried about it.  Perhaps I am
misremembering this.


It probably can't happen in practise - but the issue is that the kernel 
doesn't even compile because -Werror=maybe-uninitialized results in a 
build error since the compiler can't know that one of the branches will 
definitely be taken to set si_code.


Thanks,
Matt



Eric




Re: [REVIEW][PATCH 08/22] signal/mips: Use force_sig_fault where appropriate

2018-05-09 Thread Matt Redfearn

Hi Eric,

On 20/04/18 15:37, Eric W. Biederman wrote:

Filling in struct siginfo before calling force_sig_info is a tedious and
error-prone process, where once in a great while the wrong fields
are filled out, and siginfo has been inconsistently cleared.

Simplify this process by using the helper force_sig_fault, which
takes as parameters all of the information it needs, ensures
all of the fiddly bits of filling in struct siginfo are done properly,
and then calls force_sig_info.

In short, about a 5-line reduction in code for each call to force_sig_info,
which makes the calling function clearer.

Cc: Ralf Baechle 
Cc: James Hogan 
Cc: linux-m...@linux-mips.org
Signed-off-by: "Eric W. Biederman" 
---
  arch/mips/kernel/traps.c | 65 ++--
  arch/mips/mm/fault.c | 19 --
  2 files changed, 23 insertions(+), 61 deletions(-)

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 967e9e4e795e..66ec4b0b484d 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -699,17 +699,11 @@ static int simulate_sync(struct pt_regs *regs, unsigned int opcode)
  asmlinkage void do_ov(struct pt_regs *regs)
  {
enum ctx_state prev_state;
-   siginfo_t info;
-
-   clear_siginfo(&info);
-   info.si_signo = SIGFPE;
-   info.si_code = FPE_INTOVF;
-   info.si_addr = (void __user *)regs->cp0_epc;
  
  	prev_state = exception_enter();

die_if_kernel("Integer overflow", regs);
  
-	force_sig_info(SIGFPE, &info, current);

+   force_sig_fault(SIGFPE, FPE_INTOVF, (void __user *)regs->cp0_epc, current);
exception_exit(prev_state);
  }
  
@@ -722,32 +716,27 @@ asmlinkage void do_ov(struct pt_regs *regs)

  void force_fcr31_sig(unsigned long fcr31, void __user *fault_addr,
 struct task_struct *tsk)
  {
-   struct siginfo si;
-
-   clear_siginfo(&si);
-   si.si_addr = fault_addr;
-   si.si_signo = SIGFPE;
+   int si_code;


This is giving build errors in Linux next 
(https://storage.kernelci.org/next/master/next-20180509/mips/defconfig+kselftest/build.log)


si_code would have ended up as 0 before from the clear_siginfo(), but 
perhaps


int si_code = FPE_FLTUNK;

Would make a more sensible default?

Thanks,
Matt


  
  	if (fcr31 & FPU_CSR_INV_X)

-   si.si_code = FPE_FLTINV;
+   si_code = FPE_FLTINV;
else if (fcr31 & FPU_CSR_DIV_X)
-   si.si_code = FPE_FLTDIV;
+   si_code = FPE_FLTDIV;
else if (fcr31 & FPU_CSR_OVF_X)
-   si.si_code = FPE_FLTOVF;
+   si_code = FPE_FLTOVF;
else if (fcr31 & FPU_CSR_UDF_X)
-   si.si_code = FPE_FLTUND;
+   si_code = FPE_FLTUND;
else if (fcr31 & FPU_CSR_INE_X)
-   si.si_code = FPE_FLTRES;
+   si_code = FPE_FLTRES;
  
-	force_sig_info(SIGFPE, &si, tsk);

+   force_sig_fault(SIGFPE, si_code, fault_addr, tsk);
  }
  
  int process_fpemu_return(int sig, void __user *fault_addr, unsigned long fcr31)

  {
-   struct siginfo si;
+   int si_code;
struct vm_area_struct *vma;
  
-	clear_siginfo(&si);

switch (sig) {
case 0:
return 0;
@@ -757,23 +746,18 @@ int process_fpemu_return(int sig, void __user *fault_addr, unsigned long fcr31)
return 1;
  
  	case SIGBUS:

-   si.si_addr = fault_addr;
-   si.si_signo = sig;
-   si.si_code = BUS_ADRERR;
-   force_sig_info(sig, &si, current);
+   force_sig_fault(SIGBUS, BUS_ADRERR, fault_addr, current);
return 1;
  
  	case SIGSEGV:

-   si.si_addr = fault_addr;
-   si.si_signo = sig;
		down_read(&current->mm->mmap_sem);
vma = find_vma(current->mm, (unsigned long)fault_addr);
if (vma && (vma->vm_start <= (unsigned long)fault_addr))
-   si.si_code = SEGV_ACCERR;
+   si_code = SEGV_ACCERR;
else
-   si.si_code = SEGV_MAPERR;
+   si_code = SEGV_MAPERR;
		up_read(&current->mm->mmap_sem);
-   force_sig_info(sig, &si, current);
+   force_sig_fault(SIGSEGV, si_code, fault_addr, current);
return 1;
  
  	default:

@@ -896,10 +880,8 @@ asmlinkage void do_fpe(struct pt_regs *regs, unsigned long fcr31)
  void do_trap_or_bp(struct pt_regs *regs, unsigned int code, int si_code,
const char *str)
  {
-   siginfo_t info;
char b[40];
  
-	clear_siginfo(&info);

  #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_TRAP, str, regs, code, current->thread.trap_nr,
 SIGTRAP) == NOTIFY_STOP)
@@ -921,13 +903,9 @@ void do_trap_or_bp(struct pt_regs *regs, unsigned int code, int si_code,
case BRK_DIVZERO:
		scnprintf(b, sizeof(b), "%s instruction in kernel code", str);

Re: Regression caused by commit 882164a4a928

2018-05-09 Thread Matt Redfearn

Hi Larry

On 07/05/18 16:44, Larry Finger wrote:

Matt,

Although commit 882164a4a928 ("ssb: Prevent build of PCI host features 
in module") appeared to be harmless, it leads to complete failure of 
drivers b43 and b43legacy, and likely affects b44 as well. The problem 
is that CONFIG_SSB_PCIHOST is undefined, which prevents the compilation 
of the code that controls the PCI cores of the device. See 
https://bugzilla.redhat.com/show_bug.cgi?id=1572349 for details.


Sorry for the breakage :-/



As the underlying errors ("pcibios_enable_device" undefined, and 
"register_pci_controller" undefined) do not appear on the architectures 
that I have tested (x86_64, x86, and ppc), I suspect something in the 
arch-specific code for your setup (MIPS?). As I have no idea on how to 
fix that problem, would the following patch work for you?


diff --git a/drivers/ssb/Kconfig b/drivers/ssb/Kconfig
index 9371651d8017..3743533c8057 100644
--- a/drivers/ssb/Kconfig
+++ b/drivers/ssb/Kconfig
@@ -117,7 +117,7 @@ config SSB_SERIAL

  config SSB_DRIVER_PCICORE_POSSIBLE
     bool
-   depends on SSB_PCIHOST && SSB = y
+   depends on SSB_PCIHOST && (SSB = y || !MIPS)
     default y

  config SSB_DRIVER_PCICORE


I believe that the problem stems from these drivers being used for some 
wireless AP functionality built into some MIPS based SoCs. The Kconfig 
rules sort out building this additional functionality when configured 
for MIPS (in a roundabout sort of way), but they allowed it even when SSB 
was a module, leading to build failures. My patch was intended to prevent 
that.


There was a similar issue in the same Kconfig file, introduced by 
c5611df96804 and fixed by a9e6d44ddecc. It was fixed the same way as you 
suggest. I've tested the above patch and it does work for MIPS 
(preventing the PCICORE being built into the module).
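For reference, the rule with the suggested change applied reads:

```
config SSB_DRIVER_PCICORE_POSSIBLE
	bool
	depends on SSB_PCIHOST && (SSB = y || !MIPS)
	default y
```

The extra `|| !MIPS` term lets other architectures build the PCI core support even with SSB as a module, while MIPS still requires SSB to be built in.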


Tested-by: Matt Redfearn <matt.redfe...@mips.com>

Thanks & sorry again for the breakage,
Matt





Thanks,

Larry


Re: [RFC PATCH] MIPS: Oprofile: Drop support

2018-05-04 Thread Matt Redfearn

Hi Robert,

On 04/05/18 13:27, Robert Richter wrote:

On 04.05.18 12:03:12, Matt Redfearn wrote:

As said, oprofile version 0.9.x is still available for cpus that do
not support perf. What is the breakage?


The breakage I originally set out to fix was the MT support in perf.
https://www.linux-mips.org/archives/linux-mips/2018-04/msg00259.html

Since the perf code shares so much copied code from oprofile, those same
issues exist in oprofile and ought to be addressed. But as newer oprofile
userspace does not use the (MIPS) kernel oprofile code, then we could,
perhaps, just remove it (as per the RFC). That would break legacy tools
(0.9.x) though...


Those support perf:

  (CPU_MIPS32 || CPU_MIPS64 || CPU_R10000 || CPU_SB1 || CPU_CAVIUM_OCTEON || CPU_XLP || CPU_LOONGSON3)

Here is the total list of CPU_*:

  $ git grep -h config.CPU_ arch/mips/ | sort -u | wc -l
  79


To be fair, that list for oprofile is not much different:

arch/mips/oprofile/Makefile:

oprofile-$(CONFIG_CPU_MIPS32)   += op_model_mipsxx.o
oprofile-$(CONFIG_CPU_MIPS64)   += op_model_mipsxx.o
oprofile-$(CONFIG_CPU_R10000)   += op_model_mipsxx.o
oprofile-$(CONFIG_CPU_SB1)  += op_model_mipsxx.o
oprofile-$(CONFIG_CPU_XLR)  += op_model_mipsxx.o
oprofile-$(CONFIG_CPU_LOONGSON2)+= op_model_loongson2.o
oprofile-$(CONFIG_CPU_LOONGSON3)+= op_model_loongson3.o

However, since those are generally CPU families rather than individual 
CPUs, the number of models supported by each framework tells a different 
story:


git grep -h ops.cpu_type arch/mips/oprofile | wc -l
20

git grep -h pmu.name arch/mips/kernel/perf_event* | wc -l
17

The difference is mainly older CPUs - M14Kc, 20K, loongson1, etc. But 
yes you are right dropping it would kill profiling for them - that being 
the case I guess oprofile should remain and instead just remove support 
for the MT capable CPUs (34K, interAptiv) which are all supported by perf.


Thanks,
Matt




The comparison might not be accurate, but at least gives a hint
that there are many cpus not supporting perf. You would drop profiling
support for them altogether.

If it is too hard to also fix the oprofile code (code duplication
seems the main issue here), then it would be also ok to blacklist
newer cpus to enable oprofile kernel code (where it is broken).

-Robert



Re: [RFC PATCH] MIPS: Oprofile: Drop support

2018-05-04 Thread Matt Redfearn

Hi Robert,

On 04/05/18 11:26, Robert Richter wrote:

On 04.05.18 10:54:32, Matt Redfearn wrote:

perf is available for MIPS and supports many more CPU types than oprofile.
oprofile userspace seemingly has been broken since 1.0.0 - removing oprofile
support from the MIPS kernel would not break it more than it already is,


What do you mean with "oprofile is broken"? It looks like you modified
Kconfig to enable oprofile and perf in parallel, which is not intended
to work. Have you tried a kernel with oprofile disabled and perf
enabled?


Oh I see what you mean - previously I was trying v1.1.0 of the userspace 
with a kernel that has perf disabled - and that did not work (I assumed, 
naively, that the kernel oprofile code was required to run the oprofile 
userspace).


Thanks for the pointer - I confirmed that oprofile 1.1.0 userspace tools 
work with a kernel with "CONFIG_OPROFILE is not set", and 
"CONFIG_HW_PERF_EVENTS=y".




As said, oprofile version 0.9.x is still available for cpus that do
not support perf. What is the breakage?


The breakage I originally set out to fix was the MT support in perf. 
https://www.linux-mips.org/archives/linux-mips/2018-04/msg00259.html


Since the perf code shares so much copied code from oprofile, those same 
issues exist in oprofile and ought to be addressed. But as newer 
oprofile userspace does not use the (MIPS) kernel oprofile code, then we 
could, perhaps, just remove it (as per the RFC). That would break legacy 
tools (0.9.x) though...


Thanks,
Matt



Thanks,

-Robert



Re: [RFC PATCH] MIPS: Oprofile: Drop support

2018-05-04 Thread Matt Redfearn

Hi Robert,

On 04/05/18 10:30, Robert Richter wrote:

On 24.04.18 14:15:58, Matt Redfearn wrote:

On 24/04/18 14:05, James Hogan wrote:

On Tue, Apr 24, 2018 at 01:55:54PM +0100, Matt Redfearn wrote:

Since it appears that MIPS oprofile support is currently broken, core
oprofile is not getting many updates and not as many architectures
implement support for it compared to perf, remove the MIPS support.


That sounds reasonable to me. Any idea how long it's been broken?


Sorry, not yet. I haven't yet looked into where/how it's broken that would
narrow that down...


oprofile moved to perf syscall as kernel i/f with version 1.0.0. The


OK interesting. I guess this was the point at which MIPS' current 
Kconfig rule which only allows building oprofile or perf into a kernel 
broke oprofile userspace.




opcontrol script that was using the oprofile kernel i/f was removed:

  
https://sourceforge.net/p/oprofile/oprofile/ci/0c142c3a096d3e9ec42cc9b0ddad994fea60d135/

Thus, cpus that do not support the perf syscall are no longer
supported by 1.x releases.

  
https://sourceforge.net/p/oprofile/oprofile/ci/797d01dea0b82dbbdb0c21112a3de75990e011d2/

For those remainings there is still version 0.9.x available (tagged
PRE_RELEASE_1_0).

I am undecided whether removing oprofile kernel i/f falls under the
rule of "never break user space" here. Strictly seen, yes it breaks
those remainings. So if the perf syscall is not available as an
alternative, the oprofile kernel support shouldn't be removed.


perf is available for MIPS and supports many more CPU types than 
oprofile. oprofile userspace seemingly has been broken since 1.0.0 - 
removing oprofile support from the MIPS kernel would not break it more 
than it already is, but of course it would be better to fix it - if it 
is still useful and people still use it. That is the question that I was 
looking for answers for with this RFC - whether to spend the time & 
effort to fix oprofile, or if it can be removed since everyone uses perf.


Thanks,
Matt



-Robert



Re: [RFC PATCH] MIPS: Oprofile: Drop support

2018-04-24 Thread Matt Redfearn



On 24/04/18 14:05, James Hogan wrote:

On Tue, Apr 24, 2018 at 01:55:54PM +0100, Matt Redfearn wrote:

Since it appears that MIPS oprofile support is currently broken, core
oprofile is not getting many updates and not as many architectures
implement support for it compared to perf, remove the MIPS support.


That sounds reasonable to me. Any idea how long its been broken?


Sorry, not yet. I haven't yet looked into where/how it's broken that 
would narrow that down...


The other thing to bear in mind is that the userspace tools have not 
seen any MIPS additions since 2010, the last commit being to add 1004K 
and 34K support:

https://sourceforge.net/p/oprofile/oprofile/ci/master/tree/events/mips/

Though I'm not sure if anyone is maintaining a vendor specific fork 
containing further support.




I'll let it sit on the list for a bit in case anybody does object and
wants to fix it instead.


Cool, sounds good.

Thanks,
Matt



Thanks
James



[RFC PATCH] MIPS: Oprofile: Drop support

2018-04-24 Thread Matt Redfearn
The core oprofile code in drivers/oprofile/ has not seen significant
maintenance other than fixes to changes in other parts of the tree for
the last 5 years at least. It looks as though the perf tool has
more or less superseded its functionality.
Additionally the MIPS architecture support has bitrotted to an extent
meaning it is not currently functional.
For example the current Kconfig rule disallows HW_PERF from being
enabled in a kernel supporting OPROFILE, but attempting to use oprofile
on such a kernel results in:

$ op-check-perfevents -v
  perf_event_open syscall returned No such file or directory

Due to missing perf support

If this dependency is removed such that oprofile can be started, very
quickly an oops is triggered which appears to be a result of the
oprofile buffer having not been set up correctly:

$ operf: Profiler started
[   96.950415] CPU 1 Unable to handle kernel paging request at virtual address 
0008, epc == 804fcb14, ra == 809b6424
[   96.962266] Oops[#1]:
[   96.964821] CPU: 1 PID: 145 Comm: operf Not tainted 4.16.0+ #235
[   96.971516] $ 0   :  809b6500 d179d1bb 
[   96.977367] $ 4   :  000c 0001 8efab000
[   96.983215] $ 8   : 80e0 007f 002887fc 
[   96.989062] $12   : 8f495de4  8131c400 80e01000
[   96.994916] $16   : 8f495dfc 80d1  8efab000
[   97.000781] $20   :  80cf8aa0 8f495ef8 80d7
[   97.006630] $24   :  
[   97.012477] $28   : 8e936000 8f495d68 0001 809b6424
[   97.018329] Hi: 00210c94
[   97.021551] Lo: 4a54aef0
[   97.024810] epc   : 804fcb14 ring_buffer_lock_reserve+0x38/0x6fc
[   97.031526] ra: 809b6424 op_cpu_buffer_write_reserve+0x3c/0x78
[   97.038417] Status: 1100fc02KERNEL EXL
[   97.042820] Cause : 04808008 (ExcCode 02)
[   97.047286] BadVA : 0008
[   97.050503] PrId  : 0001a120 (MIPS interAptiv (multi))
[   97.056225] Modules linked in:
[   97.059667] Process operf (pid: 145, threadinfo=e9a0abb0, task=3ddccdc2, 
tls=77fcd490)
[   97.068484] Stack : 80e0 8132d400 0001 8efab000 8132d400 80d1 
80d0de64 0001
[   97.077836] 3973 8047763c 8293d343 80d7 8f495d98 d179d1bb 
0001 8f495dfc
[   97.087190] 0001 0001 8efab000  80cf8aa0 8f495ef8 
80d7 809b6424
[   97.096542] 8293e5d2 0016 8293d343 80d7 804c9c38 000b 
80d1 809b6500
[   97.105897] 81329ae8 8e937a90 8293e5d2 0016 8293d343 81329ae8 
81329ae8 804b4cf4
[   97.115251] ...
[   97.117992] Call Trace:
[   97.120743] [<804fcb14>] ring_buffer_lock_reserve+0x38/0x6fc
[   97.127080] [<809b6424>] op_cpu_buffer_write_reserve+0x3c/0x78
[   97.133604] [<809b6500>] op_add_code+0x80/0x114
[   97.138675] [<809b6610>] log_sample+0x7c/0xf0
[   97.143545] [<809b6928>] oprofile_add_sample+0x90/0xd4
[   97.149292] [<809ba2f0>] mipsxx_perfcount_handler+0x408/0x474
[   97.155728] [<80492c4c>] __handle_irq_event_percpu+0xbc/0x288
[   97.162141] [<80492e58>] handle_irq_event_percpu+0x40/0x98
[   97.168276] [<80498700>] handle_percpu_irq+0x98/0xc8
[   97.173833] [<8049204c>] generic_handle_irq+0x38/0x48
[   97.179504] [<807da298>] gic_handle_local_int+0xb0/0x108
[   97.185438] [<807da498>] gic_irq_dispatch+0x20/0x30
[   97.190887] [<8049204c>] generic_handle_irq+0x38/0x48
[   97.196539] [<80b7e098>] do_IRQ+0x28/0x34
[   97.201024] [<807d8ef0>] plat_irq_dispatch+0xf0/0x11c
[   97.206677] [<804060a8>] except_vec_vi_end+0xb8/0xc4
[   97.212227] [<80b7dc2c>] _raw_spin_unlock_irq+0x28/0x34
[   97.218067] [<80544428>] __add_to_page_cache_locked.part.44+0xe4/0x218
[   97.225353] [<80544930>] add_to_page_cache_lru+0x88/0x1bc
[   97.231419] [<805563e0>] read_cache_pages+0x9c/0x1bc
[   97.236981] [<806d4b38>] nfs_readpages+0x124/0x234
[   97.242344] [<805566e8>] __do_page_cache_readahead+0x1e8/0x344
[   97.248877] [<80556e70>] page_cache_sync_readahead+0x68/0x74
[   97.255199] [<80547354>] generic_file_read_iter+0x340/0xd50
[   97.261449] [<806c5a9c>] nfs_file_read+0x84/0x120
[   97.266726] [<805a8d28>] __vfs_read+0x144/0x198
[   97.271801] [<805a8e34>] vfs_read+0xb8/0x144
[   97.276589] [<805a8ee8>] kernel_read+0x28/0x3c
[   97.281585] [<805b00f4>] prepare_binprm+0x18c/0x1c4
[   97.287042] [<805b21c8>] do_execveat_common+0x3fc/0x6d8
[   97.292878] [<805b24dc>] do_execve+0x38/0x44
[   97.297669] [<80415a58>] syscall_common+0x34/0x58
[   97.302924] Code: afb0003c  8e22d4e0  afa20034 <8c820008> 14400170
24030020  8f82000c  26460010  00a09825

Since it appears that MIPS oprofile support is currently broken, core
oprofile is not getting many updates and not as many architectures
implement support for it compared to perf, remove the MIPS support.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Th

[RFC PATCH] MIPS: Oprofile: Drop support

2018-04-24 Thread Matt Redfearn
The core oprofile code in drivers/oprofile/ has not seen significant
maintenance other than fixes for changes in other parts of the tree for
at least the last 5 years. It looks as though the perf tool has
more or less superseded its functionality.
Additionally the MIPS architecture support has bitrotted to an extent
meaning it is not currently functional.
For example, the current Kconfig rules disallow HW_PERF_EVENTS from being
enabled in a kernel supporting OPROFILE, but attempting to use oprofile
on such a kernel results in:

$ op-check-perfevents -v
  perf_event_open syscall returned No such file or directory

due to the missing perf support.

If this dependency is removed such that oprofile can be started, very
quickly an oops is triggered which appears to be a result of the
oprofile buffer having not been set up correctly:

$ operf: Profiler started
[   96.950415] CPU 1 Unable to handle kernel paging request at virtual address 0008, epc == 804fcb14, ra == 809b6424
[   96.962266] Oops[#1]:
[   96.964821] CPU: 1 PID: 145 Comm: operf Not tainted 4.16.0+ #235
[   96.971516] $ 0   :  809b6500 d179d1bb 
[   96.977367] $ 4   :  000c 0001 8efab000
[   96.983215] $ 8   : 80e0 007f 002887fc 
[   96.989062] $12   : 8f495de4  8131c400 80e01000
[   96.994916] $16   : 8f495dfc 80d1  8efab000
[   97.000781] $20   :  80cf8aa0 8f495ef8 80d7
[   97.006630] $24   :  
[   97.012477] $28   : 8e936000 8f495d68 0001 809b6424
[   97.018329] Hi: 00210c94
[   97.021551] Lo: 4a54aef0
[   97.024810] epc   : 804fcb14 ring_buffer_lock_reserve+0x38/0x6fc
[   97.031526] ra: 809b6424 op_cpu_buffer_write_reserve+0x3c/0x78
[   97.038417] Status: 1100fc02 KERNEL EXL
[   97.042820] Cause : 04808008 (ExcCode 02)
[   97.047286] BadVA : 0008
[   97.050503] PrId  : 0001a120 (MIPS interAptiv (multi))
[   97.056225] Modules linked in:
[   97.059667] Process operf (pid: 145, threadinfo=e9a0abb0, task=3ddccdc2, tls=77fcd490)
[   97.068484] Stack : 80e0 8132d400 0001 8efab000 8132d400 80d1 
80d0de64 0001
[   97.077836] 3973 8047763c 8293d343 80d7 8f495d98 d179d1bb 
0001 8f495dfc
[   97.087190] 0001 0001 8efab000  80cf8aa0 8f495ef8 
80d7 809b6424
[   97.096542] 8293e5d2 0016 8293d343 80d7 804c9c38 000b 
80d1 809b6500
[   97.105897] 81329ae8 8e937a90 8293e5d2 0016 8293d343 81329ae8 
81329ae8 804b4cf4
[   97.115251] ...
[   97.117992] Call Trace:
[   97.120743] [<804fcb14>] ring_buffer_lock_reserve+0x38/0x6fc
[   97.127080] [<809b6424>] op_cpu_buffer_write_reserve+0x3c/0x78
[   97.133604] [<809b6500>] op_add_code+0x80/0x114
[   97.138675] [<809b6610>] log_sample+0x7c/0xf0
[   97.143545] [<809b6928>] oprofile_add_sample+0x90/0xd4
[   97.149292] [<809ba2f0>] mipsxx_perfcount_handler+0x408/0x474
[   97.155728] [<80492c4c>] __handle_irq_event_percpu+0xbc/0x288
[   97.162141] [<80492e58>] handle_irq_event_percpu+0x40/0x98
[   97.168276] [<80498700>] handle_percpu_irq+0x98/0xc8
[   97.173833] [<8049204c>] generic_handle_irq+0x38/0x48
[   97.179504] [<807da298>] gic_handle_local_int+0xb0/0x108
[   97.185438] [<807da498>] gic_irq_dispatch+0x20/0x30
[   97.190887] [<8049204c>] generic_handle_irq+0x38/0x48
[   97.196539] [<80b7e098>] do_IRQ+0x28/0x34
[   97.201024] [<807d8ef0>] plat_irq_dispatch+0xf0/0x11c
[   97.206677] [<804060a8>] except_vec_vi_end+0xb8/0xc4
[   97.212227] [<80b7dc2c>] _raw_spin_unlock_irq+0x28/0x34
[   97.218067] [<80544428>] __add_to_page_cache_locked.part.44+0xe4/0x218
[   97.225353] [<80544930>] add_to_page_cache_lru+0x88/0x1bc
[   97.231419] [<805563e0>] read_cache_pages+0x9c/0x1bc
[   97.236981] [<806d4b38>] nfs_readpages+0x124/0x234
[   97.242344] [<805566e8>] __do_page_cache_readahead+0x1e8/0x344
[   97.248877] [<80556e70>] page_cache_sync_readahead+0x68/0x74
[   97.255199] [<80547354>] generic_file_read_iter+0x340/0xd50
[   97.261449] [<806c5a9c>] nfs_file_read+0x84/0x120
[   97.266726] [<805a8d28>] __vfs_read+0x144/0x198
[   97.271801] [<805a8e34>] vfs_read+0xb8/0x144
[   97.276589] [<805a8ee8>] kernel_read+0x28/0x3c
[   97.281585] [<805b00f4>] prepare_binprm+0x18c/0x1c4
[   97.287042] [<805b21c8>] do_execveat_common+0x3fc/0x6d8
[   97.292878] [<805b24dc>] do_execve+0x38/0x44
[   97.297669] [<80415a58>] syscall_common+0x34/0x58
[   97.302924] Code: afb0003c  8e22d4e0  afa20034 <8c820008> 14400170
24030020  8f82000c  26460010  00a09825

Since MIPS oprofile support appears to be currently broken, the core
oprofile code is not getting many updates, and fewer architectures
implement support for it than for perf, remove the MIPS support.

Signed-off-by: Matt Redfearn 
---

This is meant as an RFC to determi

[PATCH] cifs: smbd: Fix printk format warning for iov on the stack

2018-04-24 Thread Matt Redfearn
Commit 4863cc758216 ("cifs: smbd: Avoid allocating iov on the stack")
(next-20180424) added a warning when the pdu size is not as expected,
but used a %lu for the printk warning. This results in the following
warning being emitted from MIPS allyesconfig builds:

fs/cifs/smbdirect.c:2106:3: warning: format '%lu' expects argument of
type 'long unsigned int', but argument 4 has type 'size_t' [-Wformat=]

Change the format specifier to %zu for the size_t argument.

Fixes: 4863cc758216 ("cifs: smbd: Avoid allocating iov on the stack")
Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>

---

This is new in next-20180424. Feel free to squash this if possible
before it hits master.

---
 fs/cifs/smbdirect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 24cea63e17f5..c62f7c95683c 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2103,7 +2103,7 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 	 */
 
 	if (rqst->rq_iov[0].iov_len != 4) {
-		log_write(ERR, "expected the pdu length in 1st iov, but got %lu\n", rqst->rq_iov[0].iov_len);
+		log_write(ERR, "expected the pdu length in 1st iov, but got %zu\n", rqst->rq_iov[0].iov_len);
 		return -EINVAL;
 	}
 	iov = &rqst->rq_iov[1];
-- 
2.7.4




Re: [PATCH v3 0/7] MIPS: perf: MT fixes and improvements

2018-04-23 Thread Matt Redfearn



On 20/04/18 23:51, Florian Fainelli wrote:

On 04/20/2018 03:23 AM, Matt Redfearn wrote:

This series addresses a few issues with how the MIPS performance
counters code supports the hardware multithreading MT ASE.

Firstly, implementations of the MT ASE may implement performance
counters
per core or per thread (TC). MIPS Technologies implementations signal this
via a bit in the implementation-specific Config7 register. Since this
register is implementation specific, checking it should be guarded by a
PRID check. This also replaces a bit defined by a magic number.

Secondly, the code currently uses vpe_id(), defined as
smp_processor_id(), divided by 2, to share core performance counters
between VPEs. This relies on a couple of assumptions of the hardware
implementation to function correctly (always 2 VPEs per core, and the
hardware reading only the least significant bit).

Finally, the method of sharing core performance counters between VPEs is
suboptimal since it does not allow one process running on a VPE to use
all of the performance counters available to it, because the kernel will
reserve half of the counters for the other VPE even if it may never use
them. This reservation at counter probe is replaced with an allocation
on use strategy.

Tested on a MIPS Creator CI40 (2C2T MIPS InterAptiv with per-TC
counters, though for the purposes of testing the per-TC availability was
hardcoded to allow testing both paths).

Series applies to v4.16


Sorry it took so long to get that tested.


Hi Florian,

Thanks for testing!



Sounds like you need to build test this on a BMIPS5000 configuration
(bmips_stb_defconfig should provide that):

In file included from ./arch/mips/include/asm/mach-generic/spaces.h:15:0,
  from ./arch/mips/include/asm/mach-bmips/spaces.h:16,
  from ./arch/mips/include/asm/addrspace.h:13,
  from ./arch/mips/include/asm/barrier.h:11,
  from ./include/linux/compiler.h:245,
  from ./include/linux/kernel.h:10,
  from ./include/linux/cpumask.h:10,
  from arch/mips/kernel/perf_event_mipsxx.c:18:
arch/mips/kernel/perf_event_mipsxx.c: In function 'mipsxx_pmu_enable_event':
./arch/mips/include/asm/mipsregs.h:738:52: error: suggest parentheses
around '+' in operand of '&' [-Werror=parentheses]
  #define BRCM_PERFCTRL_VPEID(v) (_ULCAST_(1) << (12 + v))

arch/mips/kernel/perf_event_mipsxx.c:385:10: note: in expansion of macro
'BRCM_PERFCTRL_VPEID'
ctrl = BRCM_PERFCTRL_VPEID(cpu & MIPS_CPUID_TO_COUNTER_MASK);
   ^~~
   CC  drivers/of/fdt_addres


Good spot - I've updated the patch to

+#define BRCM_PERFCTRL_VPEID(v) (_ULCAST_(1) << (12 + (v)))

to fix that.



after fixing that, I tried the following to see whether this would be a
good test case to exercise against:

perf record -a -C 0 taskset -c 1 /bin/true
perf record -a -C 1 taskset -c 0 /bin/true

and would not see anything related to /bin/true running in either case,
which seems like it does the right thing?


I've generally been testing using this code:

perf_test.S:

#include 

#define ITERATIONS 1


.text
.global __start
.type   __start, @function;
__start:
.set	noreorder
li  $2, ITERATIONS

1:
addiu   $2,$2,-1
bnez	$2, 1b
 nop

li  $2, __NR_exit
syscall


Makefile:
$(CC) $(ISA_FLAG) $(ABI_FLAG) $(ENDIAN_FLAG) -static -nostartfiles -O2 -o perf_test perf_test.S


Then running perf which should report the right counts:

taskset 1 perf stat -e instructions:u,branches:u ./perf_test
 Performance counter stats for './perf_test':

 30002  instructions:u
 1  branches:u

   0.005179467 seconds time elapsed


System mode should also work:

# perf stat -e instructions:u,branches:u -a -C 2
^C

 Performance counter stats for 'system wide':

  1416  instructions:u 

   198  branches:u 



   4.454874812 seconds time elapsed



Tested-by: Florian Fainelli <f.faine...@gmail.com>


Thanks!



BTW, for some reason not specifying -a -C  does lead to lockups,
consistently and for pretty much any perf command, this could be BMIPS
specific, did not get a chance to cross test on a different machine.


Interesting. FWIW, I don't get lockups on ci40 (MIPS InterAptiv). Is 
this a regression with this series applied or an existing problem?


Thanks,
Matt






Changes in v3:
New patch to detect feature presence in cpu-probe.c
Use flag in cpu_data set by cpu_probe.c to indicate feature presence.
- rebase on new feature detection

Changes in v2:
Fix mipsxx_pmu_enable_event for !#ifdef CONFIG_MIPS_MT_SMP
- Fix !#ifdef CONFIG_MIPS_PERF_SHARED_TC_COUNTERS build
- re-use cpuc variable in mipsxx_pmu_alloc_counter,
   mipsxx_pmu_free_counter rather than having sibling_ version.
Since BMIPS5000 does not implement per TC counters, we can remove the
check on cpu_has_mipsmt_pertccounters


Re: [PATCH] serial: 8250_early: Setup divider when uartclk is passed

2018-04-23 Thread Matt Redfearn

Hi Michal

On 23/04/18 10:18, Michal Simek wrote:

device->baud is always non zero value because it is checked already in
early_serial8250_setup() before init_port is called.


True, currently init_port is only called from the one location and so 
the test is a little redundant, though I don't see the harm in testing 
both inputs to the divisor calculation immediately before use such that 
any future call path avoids setting a bad divisor.




Fixes: 0ff3ab701963 ("serial: 8250_early: Only set divisor if valid clk & baud")
Cc: stable 


Even if the test is dropped going forward, I wouldn't consider its 
presence a "bug" such that a fix needs to be backported.


Thanks,
Matt


Signed-off-by: Michal Simek 
---

  drivers/tty/serial/8250/8250_early.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/8250/8250_early.c 
b/drivers/tty/serial/8250/8250_early.c
index ae6a256524d8..5cd8c36c8fcc 100644
--- a/drivers/tty/serial/8250/8250_early.c
+++ b/drivers/tty/serial/8250/8250_early.c
@@ -122,7 +122,7 @@ static void __init init_port(struct earlycon_device *device)
serial8250_early_out(port, UART_FCR, 0);/* no fifo */
serial8250_early_out(port, UART_MCR, 0x3);  /* DTR + RTS */
  
-	if (port->uartclk && device->baud) {
+	if (port->uartclk) {
divisor = DIV_ROUND_CLOSEST(port->uartclk, 16 * device->baud);
c = serial8250_early_in(port, UART_LCR);
serial8250_early_out(port, UART_LCR, c | UART_LCR_DLAB);




Re: [RFC. PATCH] earlycon: Remove hardcoded port->uartclk initialization in of_setup_earlycon

2018-04-23 Thread Matt Redfearn



On 23/04/18 10:27, Michal Simek wrote:

There is no reason to initialize uartclk to BASE_BAUD * 16 for DT based
systems.

Signed-off-by: Michal Simek <michal.si...@xilinx.com>
---

It looks from history like port->uartclk = BASE_BAUD * 16 was set up to
get a correct divisor calculation on x86 (divisor = 1), but it shouldn't be
needed on a DT-based system. That's why I think that there is no DT-based
system which really requires this line.


Hi Michal,

This is fine for the MIPS generic platform (tested on Boston board) 
which was broken by some earlycon changes a few versions ago. Also 
tested on a MIPS pistachio board. As long as the bootloader has 
configured the uart divisor, earlycon should work as long as my patch 
"serial: 8250_early: Only set divisor if valid clk & baud" is applied to 
avoid a bad divisor getting calculated.


Tested-by: Matt Redfearn <matt.redfe...@mips.com>

Thanks,
Matt


---
  drivers/tty/serial/earlycon.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/tty/serial/earlycon.c b/drivers/tty/serial/earlycon.c
index a24278380fec..c12b1edcdd8e 100644
--- a/drivers/tty/serial/earlycon.c
+++ b/drivers/tty/serial/earlycon.c
@@ -244,7 +244,6 @@ int __init of_setup_earlycon(const struct earlycon_id *match,
return -ENXIO;
}
port->mapbase = addr;
-   port->uartclk = BASE_BAUD * 16;
  
  	val = of_get_flat_dt_prop(node, "reg-offset", NULL);

if (val)




Re: [PATCH 3.18 45/52] MIPS: memset.S: Fix clobber of v1 in last_fixup

2018-04-23 Thread Matt Redfearn



On 23/04/18 08:16, Heiher wrote:

Hi,

IIRC, The v1 is a temporary register, value is not preserved across
function calls.


v1 is conventionally used for a function return value and as such can be 
changed by called functions. However, bzero is called from inline 
assembly and v1 is not in the clobbers list:

https://elixir.bootlin.com/linux/v4.17-rc1/source/arch/mips/include/asm/uaccess.h#L652
So the calling function does not expect that register to have been used 
and can legitimately expect its value to remain after the function call, 
which without this patch, it does not - as demonstrated by the test code.


Thanks,
Matt



I don't see any functions that generated by compiler to restore values
of v1 after clobbered it.

On Sun, Apr 22, 2018 at 9:54 PM, Greg Kroah-Hartman
<gre...@linuxfoundation.org> wrote:

3.18-stable review patch.  If anyone has any objections, please let me know.

--

From: Matt Redfearn <matt.redfe...@mips.com>

commit c96eebf07692e53bf4dd5987510d8b550e793598 upstream.

The label .Llast_fixup\@ is jumped to on page fault within the final
byte set loop of memset (on < MIPSR6 architectures). For some reason, in
this fault handler, the v1 register is randomly set to a2 & STORMASK.
This clobbers v1 for the calling function. This can be observed with the
following test code:

static int __init __attribute__((optimize("O0"))) test_clear_user(void)
{
   register int t asm("v1");
   char *test;
   int j, k;

   pr_info("\n\n\nTesting clear_user\n");
   test = vmalloc(PAGE_SIZE);

   for (j = 256; j < 512; j++) {
 t = 0xa5a5a5a5;
 if ((k = clear_user(test + PAGE_SIZE - 256, j)) != j - 256) {
 pr_err("clear_user (%px %d) returned %d\n", test + PAGE_SIZE - 256, j, 
k);
 }
 if (t != 0xa5a5a5a5) {
pr_err("v1 was clobbered to 0x%x!\n", t);
 }
   }

   return 0;
}
late_initcall(test_clear_user);

Which demonstrates that v1 is indeed clobbered (MIPS64):

Testing clear_user
v1 was clobbered to 0x1!
v1 was clobbered to 0x2!
v1 was clobbered to 0x3!
v1 was clobbered to 0x4!
v1 was clobbered to 0x5!
v1 was clobbered to 0x6!
v1 was clobbered to 0x7!

Since the number of bytes that could not be set is already contained in
a2, the andi placing a value in v1 is not necessary and actively
harmful in clobbering v1.

Reported-by: James Hogan <jho...@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
Cc: Ralf Baechle <r...@linux-mips.org>
Cc: linux-m...@linux-mips.org
Cc: sta...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/19109/
Signed-off-by: James Hogan <jho...@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
  arch/mips/lib/memset.S |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -210,7 +210,7 @@

  .Llast_fixup\@:
 	jr	ra
-	andi	v1, a2, STORMASK
+	nop
 
  .Lsmall_fixup\@:
 	PTR_SUBU	a2, t1, a0










[PATCH v3 2/7] MIPS: perf: More robustly probe for the presence of per-tc counters

2018-04-20 Thread Matt Redfearn
The presence of per TC performance counters is now detected by
cpu-probe.c and indicated by MIPS_CPU_MT_PER_TC_PERF_COUNTERS in
cpu_data. Switch detection of the feature to use this new flag rather
than blindly testing the implementation specific config7 register with a
magic number.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v3:
Use flag in cpu_data set by cpu_probe.c to indicate feature presence.

Changes in v2: None

 arch/mips/include/asm/cpu-features.h | 7 +++
 arch/mips/kernel/perf_event_mipsxx.c | 3 ---
 arch/mips/oprofile/op_model_mipsxx.c | 2 --
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/cpu-features.h 
b/arch/mips/include/asm/cpu-features.h
index 721b698bfe3c..69755d900c69 100644
--- a/arch/mips/include/asm/cpu-features.h
+++ b/arch/mips/include/asm/cpu-features.h
@@ -534,6 +534,13 @@
 # define cpu_has_shared_ftlb_entries 0
 #endif
 
+#ifdef CONFIG_MIPS_MT_SMP
+# define cpu_has_mipsmt_pertccounters \
+   (cpu_data[0].options & MIPS_CPU_MT_PER_TC_PERF_COUNTERS)
+#else
+# define cpu_has_mipsmt_pertccounters 0
+#endif /* CONFIG_MIPS_MT_SMP */
+
 /*
  * Guest capabilities
  */
diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index 6668f67a61c3..0595a974bc81 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -129,8 +129,6 @@ static struct mips_pmu mipspmu;
 
 
 #ifdef CONFIG_MIPS_PERF_SHARED_TC_COUNTERS
-static int cpu_has_mipsmt_pertccounters;
-
 static DEFINE_RWLOCK(pmuint_rwlock);
 
 #if defined(CONFIG_CPU_BMIPS5000)
@@ -1723,7 +1721,6 @@ init_hw_perf_events(void)
}
 
 #ifdef CONFIG_MIPS_PERF_SHARED_TC_COUNTERS
-   cpu_has_mipsmt_pertccounters = read_c0_config7() & (1<<19);
if (!cpu_has_mipsmt_pertccounters)
counters = counters_total_to_per_cpu(counters);
 #endif
diff --git a/arch/mips/oprofile/op_model_mipsxx.c 
b/arch/mips/oprofile/op_model_mipsxx.c
index c3e4c18ef8d4..7c04b17f4a48 100644
--- a/arch/mips/oprofile/op_model_mipsxx.c
+++ b/arch/mips/oprofile/op_model_mipsxx.c
@@ -36,7 +36,6 @@ static int perfcount_irq;
 #endif
 
 #ifdef CONFIG_MIPS_MT_SMP
-static int cpu_has_mipsmt_pertccounters;
 #define WHAT	(MIPS_PERFCTRL_MT_EN_VPE | \
 		 M_PERFCTL_VPEID(cpu_vpe_id(&current_cpu_data)))
 #define vpe_id()	(cpu_has_mipsmt_pertccounters ? \
@@ -326,7 +325,6 @@ static int __init mipsxx_init(void)
}
 
 #ifdef CONFIG_MIPS_MT_SMP
-   cpu_has_mipsmt_pertccounters = read_c0_config7() & (1<<19);
if (!cpu_has_mipsmt_pertccounters)
counters = counters_total_to_per_cpu(counters);
 #endif
-- 
2.7.4





[PATCH v3 7/7] MIPS: perf: Fix BMIPS5000 system mode counting

2018-04-20 Thread Matt Redfearn
When perf is used in system mode, i.e. specifying a set of CPUs to
count (perf -a -C cpu), event->cpu is set to the CPU number on which
events should be counted. The current BMIPS5000 variation of
mipsxx_pmu_enable_event only ever sets the counter to count the current
CPU, so system mode does not work.

Fix this by removing this BMIPS5000 specific path and integrating it
with the generic one. Since BMIPS5000 uses specific extensions to the
perf control register, different fields must be set up to count the
relevant CPU.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v3: None
Changes in v2:
New patch to fix BMIPS5000 system mode perf.

Florian, I don't have access to a BMIPS5000 board, but from code
inspection only I suspect this patch is necessary to have system mode
work. If someone could test that would be appreciated.

---
 arch/mips/include/asm/mipsregs.h |  1 +
 arch/mips/kernel/perf_event_mipsxx.c | 17 ++---
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index a4b02bc8..3e1fbb7aaa2a 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -735,6 +735,7 @@
 #define MIPS_PERFCTRL_MT_EN_TC (_ULCAST_(2) << 20)
 
 /* PerfCnt control register MT extensions used by BMIPS5000 */
+#define BRCM_PERFCTRL_VPEID(v) (_ULCAST_(1) << (12 + v))
 #define BRCM_PERFCTRL_TC   (_ULCAST_(1) << 30)
 
 /* PerfCnt control register MT extensions used by Netlogic XLR */
diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index 5b8811643e60..77d7167e303b 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -364,16 +364,7 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event 
*evt, int idx)
/* Make sure interrupt enabled. */
MIPS_PERFCTRL_IE;
 
-#ifdef CONFIG_CPU_BMIPS5000
-   {
-   /* enable the counter for the calling thread */
-   unsigned int vpe_id;
-
-   vpe_id = smp_processor_id() & MIPS_CPUID_TO_COUNTER_MASK;
-   cpuc->saved_ctrl[idx] |= BIT(12 + vpe_id) | BRCM_PERFCTRL_TC;
-   }
-#else
-#ifdef CONFIG_MIPS_MT_SMP
+#if defined(CONFIG_MIPS_MT_SMP) && !defined(CONFIG_CPU_BMIPS5000)
if (range > V) {
/* The counter is processor wide. Set it up to count all TCs. */
pr_debug("Enabling perf counter for all TCs\n");
@@ -390,12 +381,16 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event 
*evt, int idx)
 */
cpu = (event->cpu >= 0) ? event->cpu : smp_processor_id();
 
+#if defined(CONFIG_CPU_BMIPS5000)
+   ctrl = BRCM_PERFCTRL_VPEID(cpu & MIPS_CPUID_TO_COUNTER_MASK);
+   ctrl |= BRCM_PERFCTRL_TC;
+#else
ctrl = M_PERFCTL_VPEID(cpu_vpe_id(&cpu_data[cpu]));
ctrl |= M_TC_EN_VPE;
+#endif
cpuc->saved_ctrl[idx] |= ctrl;
pr_debug("Enabling perf counter for CPU%d\n", cpu);
}
-#endif /* CONFIG_CPU_BMIPS5000 */
/*
 * We do not actually let the counter run. Leave it until start().
 */
-- 
2.7.4



[PATCH v3 6/7] MIPS: perf: Fold vpe_id() macro into it's one last usage

2018-04-20 Thread Matt Redfearn
The vpe_id() macro is now used only in mipsxx_pmu_enable_event when
CONFIG_CPU_BMIPS5000 is defined. Fold the one remaining definition of the
macro into its usage and remove the now unused definitions.

Since we know that cpu_has_mipsmt_pertccounters == 0 on BMIPS5000,
remove the test on it and just set the counter to count the relevant
VPE.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>

---

Changes in v3: None
Changes in v2:
Since BMIPS5000 does not implement per TC counters, we can remove the
check on cpu_has_mipsmt_pertccounters.

 arch/mips/kernel/perf_event_mipsxx.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index a0aa1b79..5b8811643e60 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -132,18 +132,6 @@ static struct mips_pmu mipspmu;
 static DEFINE_SPINLOCK(core_counters_lock);
 
 static DEFINE_RWLOCK(pmuint_rwlock);
-
-#if defined(CONFIG_CPU_BMIPS5000)
-#define vpe_id()   (cpu_has_mipsmt_pertccounters ? \
-0 : (smp_processor_id() & MIPS_CPUID_TO_COUNTER_MASK))
-#else
-#define vpe_id()   (cpu_has_mipsmt_pertccounters ? \
-0 : cpu_vpe_id(&current_cpu_data))
-#endif
-
-#else /* !CONFIG_MIPS_PERF_SHARED_TC_COUNTERS */
-#define vpe_id()   0
-
 #endif /* CONFIG_MIPS_PERF_SHARED_TC_COUNTERS */
 
 static void resume_local_counters(void);
@@ -379,8 +367,10 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event 
*evt, int idx)
 #ifdef CONFIG_CPU_BMIPS5000
{
/* enable the counter for the calling thread */
-   cpuc->saved_ctrl[idx] |=
-   (1 << (12 + vpe_id())) | BRCM_PERFCTRL_TC;
+   unsigned int vpe_id;
+
+   vpe_id = smp_processor_id() & MIPS_CPUID_TO_COUNTER_MASK;
+   cpuc->saved_ctrl[idx] |= BIT(12 + vpe_id) | BRCM_PERFCTRL_TC;
}
 #else
 #ifdef CONFIG_MIPS_MT_SMP
-- 
2.7.4



[PATCH v3 5/7] MIPS: perf: Allocate per-core counters on demand

2018-04-20 Thread Matt Redfearn
Previously, when performance counters were per-core rather than
per-thread, the number available was divided by 2 on detection, and the
counters used by each thread in a core were "swizzled" to ensure
separation. However, this solution is suboptimal since it relies on a
couple of assumptions:
a) Always having 2 VPEs / core (number of counters was divided by 2)
b) Always having a number of counters implemented in the core that is
   divisible by 2. For instance if an SoC implementation had a single
   counter and 2 VPEs per core, then this logic would fail and no
   performance counters would be available.
The mechanism also does not allow for one VPE in a core using more than
its allocation of the per-core counters to count multiple events even
though other VPEs may not be using them.

Fix this situation by instead allocating (and releasing) per-core
performance counters when they are requested. This approach removes the
above assumptions and fixes the shortcomings.

In order to do this:
Add additional logic to mipsxx_pmu_alloc_counter() to detect if a
sibling is using a per-core counter, and to allocate a per-core counter
in all sibling CPUs.
Similarly, add a mipsxx_pmu_free_counter() function to release a
per-core counter in all sibling CPUs when it is finished with.
A new spinlock, core_counters_lock, is introduced to ensure exclusivity
when allocating and releasing per-core counters.
Since counters are now allocated per-core on demand, rather than being
reserved per-thread at boot, all of the "swizzling" of counters is
removed.

The upshot is that in an SoC with 2 counters / thread, counters are
reported as:
Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
available to each CPU, irq 18

Running an instance of a test program on each of 2 threads in a
core, both threads can use their 2 counters to count 2 events:

taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
perf stat -e instructions:u,branches:u ./test_prog

 Performance counter stats for './test_prog':

 30002  instructions:u
 1  branches:u

   0.005164264 seconds time elapsed
 Performance counter stats for './test_prog':

 30002  instructions:u
 1  branches:u

   0.006139975 seconds time elapsed

In an SoC with 2 counters / core (which can be forced by setting
cpu_has_mipsmt_pertccounters = 0), counters are reported as:
Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
available to each core, irq 18

Running an instance of a test program on each of 2 threads in a
core, now only one thread manages to secure the performance counters to
count 2 events. The other thread does not get any counters.

taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
perf stat -e instructions:u,branches:u ./test_prog

 Performance counter stats for './test_prog':

   <not counted>  instructions:u
   <not counted>  branches:u

   0.005179533 seconds time elapsed

 Performance counter stats for './test_prog':

 30002  instructions:u
 1  branches:u

   0.005179467 seconds time elapsed

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v3:
- rebase on new feature detection

Changes in v2:
- Fix !#ifdef CONFIG_MIPS_PERF_SHARED_TC_COUNTERS build
- re-use cpuc variable in mipsxx_pmu_alloc_counter,
  mipsxx_pmu_free_counter rather than having sibling_ version.

 arch/mips/kernel/perf_event_mipsxx.c | 130 +++
 1 file changed, 85 insertions(+), 45 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index fe50986e83c6..a0aa1b79 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -129,6 +129,8 @@ static struct mips_pmu mipspmu;
 
 
 #ifdef CONFIG_MIPS_PERF_SHARED_TC_COUNTERS
+static DEFINE_SPINLOCK(core_counters_lock);
+
 static DEFINE_RWLOCK(pmuint_rwlock);
 
 #if defined(CONFIG_CPU_BMIPS5000)
@@ -139,20 +141,6 @@ static DEFINE_RWLOCK(pmuint_rwlock);
 0 : cpu_vpe_id(&current_cpu_data))
 #endif
 
-/* Copied from op_model_mipsxx.c */
-static unsigned int vpe_shift(void)
-{
-   if (num_possible_cpus() > 1)
-   return 1;
-
-   return 0;
-}
-
-static unsigned int counters_total_to_per_cpu(unsigned int counters)
-{
-   return counters >> vpe_shift();
-}
-
 #else /* !CONFIG_MIPS_PERF_SHARED_TC_COUNTERS */
 #define vpe_id()   0
 
@@ -163,17 +151,8 @@ static void pause_local_counters(void);
 static irqreturn_t mipsxx_pmu_handle_irq(int, void *);
 static int mipsxx_pmu_handle_shared_irq(void);
 
-static unsigned int mipsxx_pmu_swizzle_perf_idx(unsigned int idx)
-{
-   if (vpe_id() == 1)
-   idx = (idx + 2) & 3;
-   return idx;
-}
-
 static u64 mipsxx_pmu_read_counter(unsigned int idx)
 {
-   idx = mipsxx_pmu_swizzle_perf_idx(idx);
-
switch (idx) {
   

[PATCH v3 4/7] MIPS: perf: Fix perf with MT counting other threads

2018-04-20 Thread Matt Redfearn
When perf is used in non-system mode, i.e. without specifying CPUs to
count on, check_and_calc_range falls into the case when it sets
M_TC_EN_ALL in the counter config_base. This has the impact of always
counting for all of the threads in a core, even when the user has not
requested it. For example this can be seen with a test program which
executes 30002 instructions and 1 branches running on one VPE and a
busy load on the other VPE in the core. Without this commit, the
expected count is not returned:

taskset 4 dd if=/dev/zero of=/dev/null count=10 & taskset 8 perf
stat -e instructions:u,branches:u ./test_prog

 Performance counter stats for './test_prog':

103235  instructions:u
 17015  branches:u

In order to fix this, remove check_and_calc_range entirely and perform
all of the logic in mipsxx_pmu_enable_event. Since
mipsxx_pmu_enable_event now requires the range of the event, ensure that
it is set by mipspmu_perf_event_encode in the same circumstances as
before (i.e. #ifdef CONFIG_MIPS_MT_SMP && num_possible_cpus() > 1).

The logic of mipsxx_pmu_enable_event now becomes:
If the CPU is a BMIPS5000, then use the special vpe_id() implementation
to select which VPE to count.
If the counter has a range greater than a single VPE, i.e. it is a
core-wide counter, then ensure that the counter is set up to count
events from all TCs (though, since this is true by definition, is this
necessary? Just enabling a core-wide counter in the per-VPE case appears
experimentally to return the same counts. This is left in for now as the
logic was present before).
If the event is set up to count a particular CPU (i.e. system mode),
then the VPE ID of that CPU is used for the counter.
Otherwise, the event should be counted on the CPU scheduling this thread
(this was the critical bit missing from the previous implementation) so
the VPE ID of this CPU is used for the counter.

With this commit, the same test as before returns the counts expected:

taskset 4 dd if=/dev/zero of=/dev/null count=10 & taskset 8 perf
stat -e instructions:u,branches:u ./test_prog

 Performance counter stats for './test_prog':

 30002  instructions:u
 1  branches:u

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>

---

Changes in v3: None
Changes in v2:
Fix mipsxx_pmu_enable_event for !#ifdef CONFIG_MIPS_MT_SMP

 arch/mips/kernel/perf_event_mipsxx.c | 78 ++--
 1 file changed, 39 insertions(+), 39 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index 7e2b7d38a774..fe50986e83c6 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -323,7 +323,11 @@ static int mipsxx_pmu_alloc_counter(struct cpu_hw_events 
*cpuc,
 
 static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
 {
+   struct perf_event *event = container_of(evt, struct perf_event, hw);
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+#ifdef CONFIG_MIPS_MT_SMP
+   unsigned int range = evt->event_base >> 24;
+#endif /* CONFIG_MIPS_MT_SMP */
 
WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
 
@@ -331,11 +335,37 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event 
*evt, int idx)
(evt->config_base & M_PERFCTL_CONFIG_MASK) |
/* Make sure interrupt enabled. */
MIPS_PERFCTRL_IE;
-   if (IS_ENABLED(CONFIG_CPU_BMIPS5000))
+
+#ifdef CONFIG_CPU_BMIPS5000
+   {
/* enable the counter for the calling thread */
cpuc->saved_ctrl[idx] |=
(1 << (12 + vpe_id())) | BRCM_PERFCTRL_TC;
+   }
+#else
+#ifdef CONFIG_MIPS_MT_SMP
+   if (range > V) {
+   /* The counter is processor wide. Set it up to count all TCs. */
+   pr_debug("Enabling perf counter for all TCs\n");
+   cpuc->saved_ctrl[idx] |= M_TC_EN_ALL;
+   } else
+#endif /* CONFIG_MIPS_MT_SMP */
+   {
+   unsigned int cpu, ctrl;
 
+   /*
+* Set up the counter for a particular CPU when event->cpu is
+* a valid CPU number. Otherwise set up the counter for the CPU
+* scheduling this thread.
+*/
+   cpu = (event->cpu >= 0) ? event->cpu : smp_processor_id();
+
ctrl = M_PERFCTL_VPEID(cpu_vpe_id(&cpu_data[cpu]));
+   ctrl |= M_TC_EN_VPE;
+   cpuc->saved_ctrl[idx] |= ctrl;
+   pr_debug("Enabling perf counter for CPU%d\n", cpu);
+   }
+#endif /* CONFIG_CPU_BMIPS5000 */
/*
 * We do not actually let the counter run. Leave it until start().
 */
@@ -649,13 +679,14 @@ static unsigned int mipspmu_perf_event_encode(const 
struct mips_perf_event *pev)
  * event_id.
  */
 #ifdef CONFIG_MIPS_MT_SMP
-

[PATCH v3 3/7] MIPS: perf: Use correct VPE ID when setting up VPE tracing

2018-04-20 Thread Matt Redfearn
There are a couple of FIXME's in the perf code which state that
cpu_data[event->cpu].vpe_id reports 0 for both CPUs. This is no longer
the case, since the vpe_id is used extensively by SMP CPS.

VPE local counting gets around this by using smp_processor_id() instead.
As it happens this does work correctly to count events on the right VPE,
but relies on 2 assumptions:
a) Always having 2 VPEs / core.
b) The hardware only paying attention to the least significant bit of
the PERFCTL.VPEID field.
If either of these assumptions change then the incorrect VPEs events
will be counted.

Fix this by replacing smp_processor_id() with
cpu_vpe_id(&current_cpu_data) in the vpe_id() macro, and pass vpe_id()
to M_PERFCTL_VPEID() when setting up PERFCTL.VPEID. The FIXME's can also
be removed since they no longer apply.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v3: None
Changes in v2: None

 arch/mips/kernel/perf_event_mipsxx.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index 0595a974bc81..7e2b7d38a774 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -135,12 +135,8 @@ static DEFINE_RWLOCK(pmuint_rwlock);
 #define vpe_id()   (cpu_has_mipsmt_pertccounters ? \
 0 : (smp_processor_id() & MIPS_CPUID_TO_COUNTER_MASK))
 #else
-/*
- * FIXME: For VSMP, vpe_id() is redefined for Perf-events, because
- * cpu_data[cpuid].vpe_id reports 0 for _both_ CPUs.
- */
 #define vpe_id()   (cpu_has_mipsmt_pertccounters ? \
-0 : smp_processor_id())
+0 : cpu_vpe_id(&current_cpu_data))
 #endif
 
 /* Copied from op_model_mipsxx.c */
@@ -1277,11 +1273,7 @@ static void check_and_calc_range(struct perf_event 
*event,
 */
hwc->config_base |= M_TC_EN_ALL;
} else {
-   /*
-* FIXME: cpu_data[event->cpu].vpe_id reports 0
-* for both CPUs.
-*/
-   hwc->config_base |= M_PERFCTL_VPEID(event->cpu);
+   hwc->config_base |= M_PERFCTL_VPEID(vpe_id());
hwc->config_base |= M_TC_EN_VPE;
}
} else
-- 
2.7.4



[PATCH v3 1/7] MIPS: Probe for MIPS MT perf counters per TC

2018-04-20 Thread Matt Redfearn
Processors implementing the MIPS MT ASE may have performance counters
implemented per core or per TC. Processors implemented by MIPS
Technologies signify presence per TC through a bit in the implementation
specific Config7 register. Currently the code which probes for their
presence blindly reads a magic number corresponding to this bit, despite
it potentially having a different meaning in the CPU implementation.

Since CPU features are generally detected by cpu-probe.c, perform the
detection here instead. Introduce cpu_set_mt_per_tc_perf which checks
the bit in config7 and call it from MIPS CPUs known to implement this
bit and the MT ASE, specifically, the 34K, 1004K and interAptiv.

Once the presence of the per-tc counter is indicated in cpu_data, tests
for it can be updated to use this flag.

Suggested-by: James Hogan <jho...@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>

---

Changes in v3:
New patch to detect feature presence in cpu-probe.c

Changes in v2: None

 arch/mips/include/asm/cpu.h  |  2 ++
 arch/mips/include/asm/mipsregs.h |  5 +
 arch/mips/kernel/cpu-probe.c | 12 
 3 files changed, 19 insertions(+)

diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
index d39324c4adf1..5b9d02ef4f60 100644
--- a/arch/mips/include/asm/cpu.h
+++ b/arch/mips/include/asm/cpu.h
@@ -418,6 +418,8 @@ enum cpu_type_enum {
	BIT_ULL(54)	/* CPU shares FTLB RAM with another */
 #define MIPS_CPU_SHARED_FTLB_ENTRIES \
	BIT_ULL(55)	/* CPU shares FTLB entries with another */
+#define MIPS_CPU_MT_PER_TC_PERF_COUNTERS \
+	BIT_ULL(56)	/* CPU has perf counters implemented per TC (MIPSMT ASE) */
 
 /*
  * CPU ASE encodings
diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index 858752dac337..a4b02bc8 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -684,6 +684,11 @@
 #define MIPS_CONF7_IAR (_ULCAST_(1) << 10)
 #define MIPS_CONF7_AR  (_ULCAST_(1) << 16)
 
+/* Config7 Bits specific to MIPS Technologies. */
+
+/* Performance counters implemented Per TC */
+#define MTI_CONF7_PTC  (_ULCAST_(1) << 19)
+
 /* WatchLo* register definitions */
 #define MIPS_WATCHLO_IRW   (_ULCAST_(0x7) << 0)
 
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index cf3fd549e16d..1241c2a23d90 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -414,6 +414,14 @@ static int __init ftlb_disable(char *s)
 
 __setup("noftlb", ftlb_disable);
 
+/*
+ * Check if the CPU has per tc perf counters
+ */
+static inline void cpu_set_mt_per_tc_perf(struct cpuinfo_mips *c)
+{
+   if (read_c0_config7() & MTI_CONF7_PTC)
+   c->options |= MIPS_CPU_MT_PER_TC_PERF_COUNTERS;
+}
 
 static inline void check_errata(void)
 {
@@ -1569,6 +1577,7 @@ static inline void cpu_probe_mips(struct cpuinfo_mips *c, 
unsigned int cpu)
c->cputype = CPU_34K;
c->writecombine = _CACHE_UNCACHED;
__cpu_name[cpu] = "MIPS 34Kc";
+   cpu_set_mt_per_tc_perf(c);
break;
case PRID_IMP_74K:
c->cputype = CPU_74K;
@@ -1589,6 +1598,7 @@ static inline void cpu_probe_mips(struct cpuinfo_mips *c, 
unsigned int cpu)
c->cputype = CPU_1004K;
c->writecombine = _CACHE_UNCACHED;
__cpu_name[cpu] = "MIPS 1004Kc";
+   cpu_set_mt_per_tc_perf(c);
break;
case PRID_IMP_1074K:
c->cputype = CPU_1074K;
@@ -1598,10 +1608,12 @@ static inline void cpu_probe_mips(struct cpuinfo_mips 
*c, unsigned int cpu)
case PRID_IMP_INTERAPTIV_UP:
c->cputype = CPU_INTERAPTIV;
__cpu_name[cpu] = "MIPS interAptiv";
+   cpu_set_mt_per_tc_perf(c);
break;
case PRID_IMP_INTERAPTIV_MP:
c->cputype = CPU_INTERAPTIV;
__cpu_name[cpu] = "MIPS interAptiv (multi)";
+   cpu_set_mt_per_tc_perf(c);
break;
case PRID_IMP_PROAPTIV_UP:
c->cputype = CPU_PROAPTIV;
-- 
2.7.4




[PATCH v3 0/7] MIPS: perf: MT fixes and improvements

2018-04-20 Thread Matt Redfearn
This series addresses a few issues with how the MIPS performance
counters code supports the hardware multithreading MT ASE.

Firstly, implementations of the MT ASE may implement performance
counters
per core or per thread(TC). MIPS Techologies implementations signal this
via a bit in the implmentation specific CONFIG7 register. Since this
register is implementation specific, checking it should be guarded by a
PRID check. This also replaces a bit defined by a magic number.

Secondly, the code currently uses vpe_id(), defined as
smp_processor_id(), divided by 2, to share core performance counters
between VPEs. This relies on a couple of assumptions of the hardware
implementation to function correctly (always 2 VPEs per core, and the
hardware reading only the least significant bit).

Finally, the method of sharing core performance counters between VPEs is
suboptimal since it does not allow one process running on a VPE to use
all of the performance counters available to it, because the kernel will
reserve half of the counters for the other VPE even if it may never use
them. This reservation at counter probe is replaced with an allocation
on use strategy.

Tested on a MIPS Creator CI40 (2C2T MIPS InterAptiv with per-TC
counters, though for the purposes of testing the per-TC availability was
hardcoded to allow testing both paths).

Series applies to v4.16


Changes in v3:
New patch to detect feature presence in cpu-probe.c
Use flag in cpu_data set by cpu_probe.c to indicate feature presence.
- rebase on new feature detection

Changes in v2:
Fix mipsxx_pmu_enable_event for !#ifdef CONFIG_MIPS_MT_SMP
- Fix !#ifdef CONFIG_MIPS_PERF_SHARED_TC_COUNTERS build
- re-use cpuc variable in mipsxx_pmu_alloc_counter,
  mipsxx_pmu_free_counter rather than having sibling_ version.
Since BMIPS5000 does not implement per TC counters, we can remove the
check on cpu_has_mipsmt_pertccounters.
New patch to fix BMIPS5000 system mode perf.

Matt Redfearn (7):
  MIPS: Probe for MIPS MT perf counters per TC
  MIPS: perf: More robustly probe for the presence of per-tc counters
  MIPS: perf: Use correct VPE ID when setting up VPE tracing
  MIPS: perf: Fix perf with MT counting other threads
  MIPS: perf: Allocate per-core counters on demand
  MIPS: perf: Fold vpe_id() macro into it's one last usage
  MIPS: perf: Fix BMIPS5000 system mode counting

 arch/mips/include/asm/cpu-features.h |   7 ++
 arch/mips/include/asm/cpu.h  |   2 +
 arch/mips/include/asm/mipsregs.h |   6 +
 arch/mips/kernel/cpu-probe.c |  12 ++
 arch/mips/kernel/perf_event_mipsxx.c | 232 +++
 arch/mips/oprofile/op_model_mipsxx.c |   2 -
 6 files changed, 150 insertions(+), 111 deletions(-)

-- 
2.7.4



Re: [PATCH v6 3/4] MIPS: vmlinuz: Use generic ashldi3

2018-04-18 Thread Matt Redfearn

Hi James,

On 18/04/18 00:09, James Hogan wrote:

On Wed, Apr 11, 2018 at 08:50:18AM +0100, Matt Redfearn wrote:

diff --git a/arch/mips/boot/compressed/Makefile 
b/arch/mips/boot/compressed/Makefile
index adce180f3ee4..e03f522c33ac 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -46,9 +46,12 @@ $(obj)/uart-ath79.c: 
$(srctree)/arch/mips/ath79/early_printk.c
  
  vmlinuzobjs-$(CONFIG_KERNEL_XZ) += $(obj)/ashldi3.o $(obj)/bswapsi.o
  
-extra-y += ashldi3.c bswapsi.c

-$(obj)/ashldi3.o $(obj)/bswapsi.o: KBUILD_CFLAGS += -I$(srctree)/arch/mips/lib
-$(obj)/ashldi3.c $(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c
+extra-y += ashldi3.c
+$(obj)/ashldi3.c: $(obj)/%.c: $(srctree)/lib/%.c
+   $(call cmd,shipped)
+
+extra-y += bswapsi.c
+$(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c
$(call cmd,shipped)


ci20_defconfig:

arch/mips/boot/compressed/ashldi3.c:4:10: fatal error: libgcc.h: No such file 
or directory
  #include "libgcc.h"
^~

It looks like it had already copied ashldi3.c from arch/mips/lib/ when
building an older commit, and it hasn't been regenerated from lib/ since
the Makefile changed, so it's still using the old version.

I think it should be using FORCE and if_changed like this:

diff --git a/arch/mips/boot/compressed/Makefile 
b/arch/mips/boot/compressed/Makefile
index e03f522c33ac..abe77add8789 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -47,12 +47,12 @@ $(obj)/uart-ath79.c: 
$(srctree)/arch/mips/ath79/early_printk.c
  vmlinuzobjs-$(CONFIG_KERNEL_XZ) += $(obj)/ashldi3.o $(obj)/bswapsi.o
  
  extra-y += ashldi3.c

-$(obj)/ashldi3.c: $(obj)/%.c: $(srctree)/lib/%.c
-   $(call cmd,shipped)
+$(obj)/ashldi3.c: $(obj)/%.c: $(srctree)/lib/%.c FORCE
+   $(call if_changed,shipped)
  
  extra-y += bswapsi.c

-$(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c
-   $(call cmd,shipped)
+$(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c FORCE
+   $(call if_changed,shipped)
  
  targets := $(notdir $(vmlinuzobjs-y))
  
That resolves the build failures when checking out old -> new without

cleaning, since the .ashldi3.c.cmd is missing so it gets rebuilt.

It should also resolve issues if the path it copies from is updated in
future since the .ashldi3.c.cmd will get updated.

If you checkout new -> old without cleaning, the now removed
arch/mips/lib/ashldi3.c will get added which will trigger regeneration,
so it won't error.

However if you do new -> old -> new then the .ashldi3.cmd file isn't
updated while at old, so you get the same error as above. I'm not sure
there's much we can practically do about that, aside perhaps avoiding
the issue in future by somehow auto-deleting stale .*.cmd files.

Cc'ing kbuild folk in case they have any bright ideas.

At least the straightforward old->new upgrade will work with the above
fixup though. If you're okay with it I'm happy to apply as a fixup.


Unbelievable how fragile this change is proving to be :-/
Yeah fixup looks good to me.

Thanks,
Matt



Cheers
James




[PATCH v2 4/4] MIPS: memset.S: Add comments to fault fixup handlers

2018-04-17 Thread Matt Redfearn
It is not immediately obvious what the expected inputs to these fault
handlers are and how they calculate the number of unset bytes. Having
stared deeply at this in order to fix some corner cases, add some
comments to assist those who follow.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v2:
- Add comments to fault handlers in new, separate, patch.

 arch/mips/lib/memset.S | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index 1cc306520a55..a06dabe99d4b 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -231,16 +231,25 @@
 
 #ifdef CONFIG_CPU_MIPSR6
 .Lbyte_fixup\@:
+   /*
+* unset_bytes = current_addr + 1
+*  a2 =  t0  + 1
+*/
PTR_SUBUa2, $0, t0
jr  ra
 PTR_ADDIU  a2, 1
 #endif /* CONFIG_CPU_MIPSR6 */
 
 .Lfirst_fixup\@:
+   /* unset_bytes already in a2 */
jr  ra
 nop
 
 .Lfwd_fixup\@:
+   /*
+* unset_bytes = partial_start_addr +  #bytes   - fault_addr
+*  a2 = t1 + (a2 & 3f) - $28->task->BUADDR
+*/
PTR_L   t0, TI_TASK($28)
andia2, 0x3f
LONG_L  t0, THREAD_BUADDR(t0)
@@ -249,6 +258,10 @@
 LONG_SUBU  a2, t0
 
 .Lpartial_fixup\@:
+   /*
+* unset_bytes = partial_end_addr +  #bytes - fault_addr
+*  a2 =   a0 + (a2 & STORMASK) - $28->task->BUADDR
+*/
PTR_L   t0, TI_TASK($28)
andia2, STORMASK
LONG_L  t0, THREAD_BUADDR(t0)
@@ -257,10 +270,15 @@
 LONG_SUBU  a2, t0
 
 .Llast_fixup\@:
+   /* unset_bytes already in a2 */
jr  ra
 nop
 
 .Lsmall_fixup\@:
+   /*
+* unset_bytes = end_addr - current_addr + 1
+*  a2 =t1-  a0  + 1
+*/
PTR_SUBUa2, t1, a0
jr  ra
 PTR_ADDIU  a2, 1
-- 
2.7.4




[PATCH v2 3/4] MIPS: memset.S: Reinstate delay slot indentation

2018-04-17 Thread Matt Redfearn
Assembly language within the MIPS kernel conventionally indents
instructions which are in a branch delay slot to make them easier to
see. Commit 8483b14aaa81 ("MIPS: lib: memset: Whitespace fixes") rather
inexplicably removed all of these indentations from memset.S. Reinstate
the convention for all instructions in a branch delay slot. This
effectively reverts the above commit, plus other locations introduced
with MIPSR6 support.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v2:
- Rebase delay slot indentation on v3 of "MIPS: memset.S: Fix return of
  __clear_user from Lpartial_fixup"

 arch/mips/lib/memset.S | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index f7327979a8f8..1cc306520a55 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -95,7 +95,7 @@
 
sltiu   t0, a2, STORSIZE/* very small region? */
bnezt0, .Lsmall_memset\@
-   andit0, a0, STORMASK/* aligned? */
+andi   t0, a0, STORMASK/* aligned? */
 
 #ifdef CONFIG_CPU_MICROMIPS
movet8, a1  /* used by 'swp' instruction */
@@ -103,12 +103,12 @@
 #endif
 #ifndef CONFIG_CPU_DADDI_WORKAROUNDS
beqzt0, 1f
-   PTR_SUBUt0, STORSIZE/* alignment in bytes */
+PTR_SUBU   t0, STORSIZE/* alignment in bytes */
 #else
.setnoat
li  AT, STORSIZE
beqzt0, 1f
-   PTR_SUBUt0, AT  /* alignment in bytes */
+PTR_SUBU   t0, AT  /* alignment in bytes */
.setat
 #endif
 
@@ -149,7 +149,7 @@
 1: ori t1, a2, 0x3f/* # of full blocks */
xorit1, 0x3f
beqzt1, .Lmemset_partial\@  /* no block to fill */
-   andit0, a2, 0x40-STORSIZE
+andi   t0, a2, 0x40-STORSIZE
 
PTR_ADDUt1, a0  /* end address */
.setreorder
@@ -174,7 +174,7 @@
.setat
 #endif
jr  t1
-   PTR_ADDUa0, t0  /* dest ptr */
+PTR_ADDU   a0, t0  /* dest ptr */
 
.setpush
.setnoreorder
@@ -186,7 +186,7 @@
 
beqza2, 1f
 #ifndef CONFIG_CPU_MIPSR6
-   PTR_ADDUa0, a2  /* What's left */
+PTR_ADDU   a0, a2  /* What's left */
R10KCBARRIER(0(ra))
 #ifdef __MIPSEB__
EX(LONG_S_R, a1, -1(a0), .Llast_fixup\@)
@@ -194,7 +194,7 @@
EX(LONG_S_L, a1, -1(a0), .Llast_fixup\@)
 #endif
 #else
-   PTR_SUBUt0, $0, a2
+PTR_SUBU   t0, $0, a2
PTR_ADDIU   t0, 1
STORE_BYTE(0)
STORE_BYTE(1)
@@ -210,11 +210,11 @@
 0:
 #endif
 1: jr  ra
-   movea2, zero
+move   a2, zero
 
 .Lsmall_memset\@:
beqza2, 2f
-   PTR_ADDUt1, a0, a2
+PTR_ADDU   t1, a0, a2
 
 1: PTR_ADDIU   a0, 1   /* fill bytewise */
R10KCBARRIER(0(ra))
@@ -222,7 +222,7 @@
 EX(sb, a1, -1(a0), .Lsmall_fixup\@)
 
 2: jr  ra  /* done */
-   movea2, zero
+move   a2, zero
.if __memset == 1
END(memset)
.set __memset, 0
@@ -238,7 +238,7 @@
 
 .Lfirst_fixup\@:
jr  ra
-   nop
+nop
 
 .Lfwd_fixup\@:
PTR_L   t0, TI_TASK($28)
@@ -246,7 +246,7 @@
LONG_L  t0, THREAD_BUADDR(t0)
LONG_ADDU   a2, t1
jr  ra
-   LONG_SUBU   a2, t0
+LONG_SUBU  a2, t0
 
 .Lpartial_fixup\@:
PTR_L   t0, TI_TASK($28)
@@ -254,7 +254,7 @@
LONG_L  t0, THREAD_BUADDR(t0)
LONG_ADDU   a2, a0
jr  ra
-   LONG_SUBU   a2, t0
+LONG_SUBU  a2, t0
 
 .Llast_fixup\@:
jr  ra
@@ -278,7 +278,7 @@
 LEAF(memset)
 EXPORT_SYMBOL(memset)
beqza1, 1f
-   movev0, a0  /* result */
+move   v0, a0  /* result */
 
andia1, 0xff/* spread fillword */
LONG_SLLt1, a1, 8
-- 
2.7.4




[PATCH v2 2/4] MIPS: uaccess: Add micromips clobbers to bzero invocation

2018-04-17 Thread Matt Redfearn
The micromips implementation of bzero additionally clobbers registers t7
& t8. Specify this in the clobbers list when invoking bzero.

Reported-by: James Hogan <jho...@kernel.org>
Fixes: 26c5e07d1478 ("MIPS: microMIPS: Optimise 'memset' core library 
function.")
Cc: sta...@vger.kernel.org
Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v2: None

 arch/mips/include/asm/uaccess.h | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h
index b71306947290..06629011a434 100644
--- a/arch/mips/include/asm/uaccess.h
+++ b/arch/mips/include/asm/uaccess.h
@@ -654,6 +654,13 @@ __clear_user(void __user *addr, __kernel_size_t size)
 {
__kernel_size_t res;
 
+#ifdef CONFIG_CPU_MICROMIPS
+/* micromips memset / bzero also clobbers t7 & t8 */
+#define bzero_clobbers "$4", "$5", "$6", __UA_t0, __UA_t1, "$15", "$24", "$31"
+#else
+#define bzero_clobbers "$4", "$5", "$6", __UA_t0, __UA_t1, "$31"
+#endif /* CONFIG_CPU_MICROMIPS */
+
if (eva_kernel_access()) {
__asm__ __volatile__(
"move\t$4, %1\n\t"
@@ -663,7 +670,7 @@ __clear_user(void __user *addr, __kernel_size_t size)
"move\t%0, $6"
: "=r" (res)
: "r" (addr), "r" (size)
-   : "$4", "$5", "$6", __UA_t0, __UA_t1, "$31");
+   : bzero_clobbers);
} else {
might_fault();
__asm__ __volatile__(
@@ -674,7 +681,7 @@ __clear_user(void __user *addr, __kernel_size_t size)
"move\t%0, $6"
: "=r" (res)
: "r" (addr), "r" (size)
-   : "$4", "$5", "$6", __UA_t0, __UA_t1, "$31");
+   : bzero_clobbers);
}
 
return res;
-- 
2.7.4



[PATCH v2 1/4] MIPS: memset.S: Fix clobber of v1 in last_fixup

2018-04-17 Thread Matt Redfearn
The label .Llast_fixup\@ is jumped to on page fault within the final
byte set loop of memset (on < MIPSR6 architectures). For some reason, in
this fault handler, the v1 register is randomly set to a2 & STORMASK.
This clobbers v1 for the calling function. This can be observed with the
following test code:

static int __init __attribute__((optimize("O0"))) test_clear_user(void)
{
        register int t asm("v1");
        char *test;
        int j, k;

        pr_info("\n\n\nTesting clear_user\n");
        test = vmalloc(PAGE_SIZE);

        for (j = 256; j < 512; j++) {
                t = 0xa5a5a5a5;
                if ((k = clear_user(test + PAGE_SIZE - 256, j)) != j - 256) {
                        pr_err("clear_user (%px %d) returned %d\n",
                               test + PAGE_SIZE - 256, j, k);
                }
                if (t != 0xa5a5a5a5) {
                        pr_err("v1 was clobbered to 0x%x!\n", t);
                }
        }

        return 0;
}
late_initcall(test_clear_user);

Which demonstrates that v1 is indeed clobbered (MIPS64):

Testing clear_user
v1 was clobbered to 0x1!
v1 was clobbered to 0x2!
v1 was clobbered to 0x3!
v1 was clobbered to 0x4!
v1 was clobbered to 0x5!
v1 was clobbered to 0x6!
v1 was clobbered to 0x7!

Since the number of bytes that could not be set is already contained in
a2, the andi placing a value in v1 is not necessary and actively
harmful in clobbering v1.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: sta...@vger.kernel.org
Reported-by: James Hogan <jho...@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v2: None

 arch/mips/lib/memset.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index 184819c1d5c8..f7327979a8f8 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -258,7 +258,7 @@
 
 .Llast_fixup\@:
jr  ra
-   andiv1, a2, STORMASK
+nop
 
 .Lsmall_fixup\@:
PTR_SUBUa2, t1, a0
-- 
2.7.4



[PATCH v3] MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup

2018-04-17 Thread Matt Redfearn
The __clear_user function is defined to return the number of bytes that
could not be cleared. From the underlying memset / bzero implementation
this means setting register a2 to that number on return. Currently if a
page fault is triggered within the memset_partial block, the value
loaded into a2 on return is meaningless.

The label .Lpartial_fixup\@ is jumped to on page fault. In order to work
out how many bytes failed to copy, the exception handler should find how
many bytes are left in the partial block (andi a2, STORMASK), add that to
the partial block end address (a2), and subtract the faulting address to
get the remainder. Currently it incorrectly subtracts the partial block
start address (t1), which additionally has been clobbered to
generate a jump target in memset_partial. Fix this by adding the block
end address instead.

This issue was found with the following test code:
  int j, k;
  for (j = 0; j < 512; j++) {
          if ((k = clear_user(NULL, j)) != j) {
                  pr_err("clear_user (NULL %d) returned %d\n", j, k);
          }
  }
Which now passes on Creator Ci40 (MIPS32) and Cavium Octeon II (MIPS64).

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: sta...@vger.kernel.org
Suggested-by: James Hogan <jho...@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>

---

Changes in v3:
- Just fix the issue at hand

Changes in v2:
- Use James Hogan's suggestion of replacing t1 with a0 to get the
  correct remainder count.

 arch/mips/lib/memset.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index 90bcdf1224ee..184819c1d5c8 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -252,7 +252,7 @@
PTR_L   t0, TI_TASK($28)
andia2, STORMASK
LONG_L  t0, THREAD_BUADDR(t0)
-   LONG_ADDU   a2, t1
+   LONG_ADDU   a2, a0
jr  ra
LONG_SUBU   a2, t0
 
-- 
2.7.4



[PATCH 2/3] MIPS: uaccess: Add micromips clobbers to bzero invocation

2018-04-17 Thread Matt Redfearn
The micromips implementation of bzero additionally clobbers registers t7
& t8. Specify this in the clobbers list when invoking bzero.

Reported-by: James Hogan <jho...@kernel.org>
Fixes: 26c5e07d1478 ("MIPS: microMIPS: Optimise 'memset' core library 
function.")
Cc: sta...@vger.kernel.org
Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

 arch/mips/include/asm/uaccess.h | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h
index b71306947290..06629011a434 100644
--- a/arch/mips/include/asm/uaccess.h
+++ b/arch/mips/include/asm/uaccess.h
@@ -654,6 +654,13 @@ __clear_user(void __user *addr, __kernel_size_t size)
 {
__kernel_size_t res;
 
+#ifdef CONFIG_CPU_MICROMIPS
+/* micromips memset / bzero also clobbers t7 & t8 */
+#define bzero_clobbers "$4", "$5", "$6", __UA_t0, __UA_t1, "$15", "$24", "$31"
+#else
+#define bzero_clobbers "$4", "$5", "$6", __UA_t0, __UA_t1, "$31"
+#endif /* CONFIG_CPU_MICROMIPS */
+
if (eva_kernel_access()) {
__asm__ __volatile__(
"move\t$4, %1\n\t"
@@ -663,7 +670,7 @@ __clear_user(void __user *addr, __kernel_size_t size)
"move\t%0, $6"
: "=r" (res)
: "r" (addr), "r" (size)
-   : "$4", "$5", "$6", __UA_t0, __UA_t1, "$31");
+   : bzero_clobbers);
} else {
might_fault();
__asm__ __volatile__(
@@ -674,7 +681,7 @@ __clear_user(void __user *addr, __kernel_size_t size)
"move\t%0, $6"
: "=r" (res)
: "r" (addr), "r" (size)
-   : "$4", "$5", "$6", __UA_t0, __UA_t1, "$31");
+   : bzero_clobbers);
}
 
return res;
-- 
2.7.4



[PATCH 3/3] MIPS: memset.S: Reinstate delay slot indentation

2018-04-17 Thread Matt Redfearn
Assembly language within the MIPS kernel conventionally indents
instructions which are in a branch delay slot to make them easier to
see. Commit 8483b14aaa81 ("MIPS: lib: memset: Whitespace fixes") rather
inexplicably removed all of these indentations from memset.S. Reinstate
the convention for all instructions in a branch delay slot. This
effectively reverts the above commit, plus other locations introduced
with MIPSR6 support.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

 arch/mips/lib/memset.S | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index 84e91f4fdf53..085cc86f624f 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -95,7 +95,7 @@
 
sltiu   t0, a2, STORSIZE/* very small region? */
bnezt0, .Lsmall_memset\@
-   andit0, a0, STORMASK/* aligned? */
+andi   t0, a0, STORMASK/* aligned? */
 
 #ifdef CONFIG_CPU_MICROMIPS
movet8, a1  /* used by 'swp' instruction */
@@ -103,12 +103,12 @@
 #endif
 #ifndef CONFIG_CPU_DADDI_WORKAROUNDS
beqzt0, 1f
-   PTR_SUBUt0, STORSIZE/* alignment in bytes */
+PTR_SUBU   t0, STORSIZE/* alignment in bytes */
 #else
.setnoat
li  AT, STORSIZE
beqzt0, 1f
-   PTR_SUBUt0, AT  /* alignment in bytes */
+PTR_SUBU   t0, AT  /* alignment in bytes */
.setat
 #endif
 
@@ -149,7 +149,7 @@
 1: ori t1, a2, 0x3f/* # of full blocks */
xorit1, 0x3f
beqzt1, .Lmemset_partial\@  /* no block to fill */
-   andit0, a2, 0x40-STORSIZE
+andi   t0, a2, 0x40-STORSIZE
 
PTR_ADDUt1, a0  /* end address */
.setreorder
@@ -174,7 +174,7 @@
.setat
 #endif
jr  t1
-   PTR_ADDUa0, t0  /* dest ptr */
+PTR_ADDU   a0, t0  /* dest ptr */
 
.setpush
.setnoreorder
@@ -186,7 +186,7 @@
 
beqza2, 1f
 #ifndef CONFIG_CPU_MIPSR6
-   PTR_ADDUa0, a2  /* What's left */
+PTR_ADDU   a0, a2  /* What's left */
R10KCBARRIER(0(ra))
 #ifdef __MIPSEB__
EX(LONG_S_R, a1, -1(a0), .Llast_fixup\@)
@@ -194,7 +194,7 @@
EX(LONG_S_L, a1, -1(a0), .Llast_fixup\@)
 #endif
 #else
-   PTR_SUBUt0, $0, a2
+PTR_SUBU   t0, $0, a2
PTR_ADDIU   t0, 1
STORE_BYTE(0)
STORE_BYTE(1)
@@ -210,11 +210,11 @@
 0:
 #endif
 1: jr  ra
-   movea2, zero
+move   a2, zero
 
 .Lsmall_memset\@:
beqza2, 2f
-   PTR_ADDUt1, a0, a2
+PTR_ADDU   t1, a0, a2
 
 1: PTR_ADDIU   a0, 1   /* fill bytewise */
R10KCBARRIER(0(ra))
@@ -222,7 +222,7 @@
 EX(sb, a1, -1(a0), .Lsmall_fixup\@)
 
 2: jr  ra  /* done */
-   movea2, zero
+move   a2, zero
.if __memset == 1
END(memset)
.set __memset, 0
@@ -238,7 +238,7 @@
 
 .Lfirst_fixup\@:
jr  ra
-   nop
+nop
 
 .Lfwd_fixup\@:
PTR_L   t0, TI_TASK($28)
@@ -246,7 +246,7 @@
LONG_L  t0, THREAD_BUADDR(t0)
LONG_ADDU   a2, t1
jr  ra
-   LONG_SUBU   a2, t0
+LONG_SUBU  a2, t0
 
 .Lpartial_fixup\@:
PTR_L   t0, TI_TASK($28)
@@ -278,7 +278,7 @@
 LEAF(memset)
 EXPORT_SYMBOL(memset)
beqza1, 1f
-   movev0, a0  /* result */
+move   v0, a0  /* result */
 
andia1, 0xff/* spread fillword */
LONG_SLLt1, a1, 8
-- 
2.7.4



[PATCH 1/3] MIPS: memset.S: Fix clobber of v1 in last_fixup

2018-04-17 Thread Matt Redfearn
The label .Llast_fixup\@ is jumped to on page fault within the final
byte set loop of memset (on < MIPSR6 architectures). For some reason, in
this fault handler, the v1 register is randomly set to a2 & STORMASK.
This clobbers v1 for the calling function. This can be observed with the
following test code:

static int __init __attribute__((optimize("O0"))) test_clear_user(void)
{
        register int t asm("v1");
        char *test;
        int j, k;

        pr_info("\n\n\nTesting clear_user\n");
        test = vmalloc(PAGE_SIZE);

        for (j = 256; j < 512; j++) {
                t = 0xa5a5a5a5;
                if ((k = clear_user(test + PAGE_SIZE - 256, j)) != j - 256) {
                        pr_err("clear_user (%px %d) returned %d\n",
                               test + PAGE_SIZE - 256, j, k);
                }
                if (t != 0xa5a5a5a5) {
                        pr_err("v1 was clobbered to 0x%x!\n", t);
                }
        }

        return 0;
}
late_initcall(test_clear_user);

Which demonstrates that v1 is indeed clobbered (MIPS64):

Testing clear_user
v1 was clobbered to 0x1!
v1 was clobbered to 0x2!
v1 was clobbered to 0x3!
v1 was clobbered to 0x4!
v1 was clobbered to 0x5!
v1 was clobbered to 0x6!
v1 was clobbered to 0x7!

Since the number of bytes that could not be set is already contained in
a2, the andi placing a value in v1 is not necessary and actively
harmful in clobbering v1.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: sta...@vger.kernel.org
Reported-by: James Hogan <jho...@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

 arch/mips/lib/memset.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index fa3bec269331..84e91f4fdf53 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -258,7 +258,7 @@
 
 .Llast_fixup\@:
jr  ra
-   andiv1, a2, STORMASK
+nop
 
 .Lsmall_fixup\@:
PTR_SUBUa2, t1, a0
-- 
2.7.4



[PATCH v2] MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup

2018-04-17 Thread Matt Redfearn
The __clear_user function is defined to return the number of bytes that
could not be cleared. From the underlying memset / bzero implementation
this means setting register a2 to that number on return. Currently if a
page fault is triggered within the memset_partial block, the value
loaded into a2 on return is meaningless.

The label .Lpartial_fixup\@ is jumped to on page fault. In order to work
out how many bytes failed to copy, the exception handler should find how
many bytes are left in the partial block (andi a2, STORMASK), add that to
the partial block end address (a2), and subtract the faulting address to
get the remainder. Currently it incorrectly subtracts the partial block
start address (t1), which additionally has been clobbered to
generate a jump target in memset_partial. Fix this by adding the block
end address instead.

Since this code is non-trivial to read, add comments to describe the
fault handling.

This issue was found with the following test code:
  int j, k;
  for (j = 0; j < 512; j++) {
          if ((k = clear_user(NULL, j)) != j) {
                  pr_err("clear_user (NULL %d) returned %d\n", j, k);
          }
  }
Which now passes on Creator Ci40 (MIPS32) and Cavium Octeon II (MIPS64).

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: sta...@vger.kernel.org
Suggested-by: James Hogan <jho...@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>

---

Changes in v2:
- Use James Hogan's suggestion of replacing t1 with a0 to get the
  correct remainder count.
- Add comments to .Lpartial_fixup to aid those who next try to decipher
  this code.

 arch/mips/lib/memset.S | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index 90bcdf1224ee..fa3bec269331 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -250,11 +250,11 @@
 
 .Lpartial_fixup\@:
PTR_L   t0, TI_TASK($28)
-   andia2, STORMASK
-   LONG_L  t0, THREAD_BUADDR(t0)
-   LONG_ADDU   a2, t1
+   andia2, STORMASK/* #Bytes beyond partial block */
+   LONG_L  t0, THREAD_BUADDR(t0)   /* Get faulting address */
+   LONG_ADDU   a2, a0  /* Add end address of partial block */
jr  ra
-   LONG_SUBU   a2, t0
+LONG_SUBU  a2, t0  /* a2 = partial_end + #bytes - fault */
 
 .Llast_fixup\@:
jr  ra
-- 
2.7.4



Re: [PATCH 2/2] MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup

2018-04-17 Thread Matt Redfearn

Hi James,

On 16/04/18 23:13, James Hogan wrote:

On Thu, Mar 29, 2018 at 10:28:24AM +0100, Matt Redfearn wrote:

The __clear_user function is defined to return the number of bytes that
could not be cleared. From the underlying memset / bzero implementation
this means setting register a2 to that number on return. Currently if a
page fault is triggered within the memset_partial block, the value
loaded into a2 on return is meaningless.

The label .Lpartial_fixup\@ is jumped to on page fault. Currently it
masks the remaining count of bytes (a2) with STORMASK, meaning that the
least significant 2 (32bit) or 3 (64bit) bits of the remaining count are
always clear.


Are you sure about that. It seems to do that *to ensure those bits are
set correctly*...


Secondly, .Lpartial_fixup\@ expects t1 to contain the end address of the
copy. This is set up by the initial block:
PTR_ADDUt1, a0  /* end address */
However, the .Lmemset_partial\@ block then reuses register t1 to
calculate a jump through a block of word copies. This leaves it no
longer containing the end address of the copy operation if a page fault
occurs, and the remaining bytes calculation is incorrect.

Fix these issues by removing the and of a2 with STORMASK, and replace t1
with register t2 in the .Lmemset_partial\@ block.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: sta...@vger.kernel.org
Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

  arch/mips/lib/memset.S | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index 90bcdf1224ee..3257dca58cad 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -161,19 +161,19 @@
  
  .Lmemset_partial\@:

R10KCBARRIER(0(ra))
-   PTR_LA  t1, 2f  /* where to start */
+   PTR_LA  t2, 2f  /* where to start */
  #ifdef CONFIG_CPU_MICROMIPS
LONG_SRLt7, t0, 1


Hmm, on microMIPS t7 isn't on the clobber list for __bzero, and nor is
t8...


  #endif
  #if LONGSIZE == 4
-   PTR_SUBUt1, FILLPTRG
+   PTR_SUBUt2, FILLPTRG
  #else
.setnoat
LONG_SRLAT, FILLPTRG, 1
-   PTR_SUBUt1, AT
+   PTR_SUBUt2, AT
.setat
  #endif
-   jr  t1
+   jr  t2
PTR_ADDUa0, t0  /* dest ptr */


^^^ note this...

  
  	.set		push

@@ -250,7 +250,6 @@
  
  .Lpartial_fixup\@:

PTR_L   t0, TI_TASK($28)
-   andia2, STORMASK


... this isn't right.

If I read correctly, t1 (after the above change stops clobbering it) is
the end of the full 64-byte blocks, i.e. the start address of the final
partial block.


The .Lfwd_fixup calculation (for full blocks) appears to be:

   a2 = ((len & 0x3f) + start_of_partial) - badvaddr

which is spot on. (len & 0x3f) is the partial block and remaining bytes
that haven't been set yet, add start_of_partial to get end of the full
range, subtract bad address to find how much didn't copy.


The calculation for .Lpartial_fixup however appears to (currently) do:

   a2 = ((len & STORMASK) + start_of_partial) - badvaddr

Which might make sense if start_of_partial (t1) was replaced with
end_of_partial, which does seem to be calculated as noted above, and put
in a0 ready for the final few bytes to be set.


LONG_L  t0, THREAD_BUADDR(t0)
LONG_ADDU   a2, t1


^^ So I think either it needs to just s/t1/a0/ here and not bother
preserving t1 above (smaller change and probably the original intent),
or preserve t1 and mask 0x3f instead of STORMASK like .Lfwd_fixup does
(which would work but seems needlessly complicated to me).

Does that make any sense or have I misunderstood some subtlety?


Thanks for taking the time to work this through - you're right, changing 
t1 to a0 in the fault handler does give the right result and is much 
less invasive.  Updated patch incoming :-)


Thanks,
Matt




Cheers
James


jr  ra
--
2.7.4




Re: [PATCH 1/2] MIPS: memset.S: EVA & fault support for small_memset

2018-04-17 Thread Matt Redfearn

Hi James,

On 16/04/18 21:22, James Hogan wrote:

On Thu, Mar 29, 2018 at 10:28:23AM +0100, Matt Redfearn wrote:

@@ -260,6 +260,11 @@
jr  ra
andiv1, a2, STORMASK


This patch looks good, well spotted!

But what's that v1 write about? Any ideas? Seems to go back to the git
epoch, and $3 isn't in the clobber lists when __bzero* is called.


No idea what the original intent was, but I've verified that v1 does 
indeed get clobbered if this path is hit. Patch incoming!


Thanks,
Matt



Cheers
James

  
+.Lsmall_fixup\@:

+	PTR_SUBU	a2, t1, a0
+	jr	ra
+	 PTR_ADDIU	a2, 1
+



[PATCH] MIPS: dts: Boston: Fix PCI bus dtc warnings:

2018-04-13 Thread Matt Redfearn
dtc recently (v1.4.4-8-g756ffc4f52f6) added PCI bus checks. Fix the
warnings now emitted:

arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@1000:
missing bus-range for PCI bridge
arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@1200:
missing bus-range for PCI bridge
arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@1400:
missing bus-range for PCI bridge

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

 arch/mips/boot/dts/img/boston.dts | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/mips/boot/dts/img/boston.dts 
b/arch/mips/boot/dts/img/boston.dts
index 2cd49b60e030..f7aad80c69ab 100644
--- a/arch/mips/boot/dts/img/boston.dts
+++ b/arch/mips/boot/dts/img/boston.dts
@@ -51,6 +51,8 @@
ranges = <0x0200 0 0x4000
  0x4000 0 0x4000>;
 
+   bus-range = <0x00 0xff>;
+
interrupt-map-mask = <0 0 0 7>;
interrupt-map = <0 0 0 1 _intc 1>,
<0 0 0 2 _intc 2>,
@@ -79,6 +81,8 @@
ranges = <0x0200 0 0x2000
  0x2000 0 0x2000>;
 
+   bus-range = <0x00 0xff>;
+
interrupt-map-mask = <0 0 0 7>;
interrupt-map = <0 0 0 1 _intc 1>,
<0 0 0 2 _intc 2>,
@@ -107,6 +111,8 @@
ranges = <0x0200 0 0x1600
  0x1600 0 0x10>;
 
+   bus-range = <0x00 0xff>;
+
interrupt-map-mask = <0 0 0 7>;
interrupt-map = <0 0 0 1 _intc 1>,
<0 0 0 2 _intc 2>,
-- 
2.7.4




[PATCH v2 6/6] MIPS: perf: Fix BMIPS5000 system mode counting

2018-04-12 Thread Matt Redfearn
When perf is used in system mode, i.e. specifying a set of CPUs to
count (perf -a -C cpu), event->cpu is set to the CPU number on which
events should be counted. The current BMIPS5000 variation of
mipsxx_pmu_enable_event only ever sets the counter to count the current
CPU, so system mode does not work.

Fix this by removing this BMIPS5000 specific path and integrating it
with the generic one. Since BMIPS5000 uses specific extensions to the
perf control register, different fields must be set up to count the
relevant CPU.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v2:
New patch to fix BMIPS5000 system mode perf.

Florian, I don't have access to a BMIPS5000 board, but from code
inspection only I suspect this patch is necessary to have system mode
work. If someone could test that would be appreciated.

---
 arch/mips/include/asm/mipsregs.h |  1 +
 arch/mips/kernel/perf_event_mipsxx.c | 17 ++---
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index a4b02bc8..3e1fbb7aaa2a 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -735,6 +735,7 @@
 #define MIPS_PERFCTRL_MT_EN_TC (_ULCAST_(2) << 20)
 
 /* PerfCnt control register MT extensions used by BMIPS5000 */
+#define BRCM_PERFCTRL_VPEID(v) (_ULCAST_(1) << (12 + v))
 #define BRCM_PERFCTRL_TC   (_ULCAST_(1) << 30)
 
 /* PerfCnt control register MT extensions used by Netlogic XLR */
diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index 389e346e9cf3..37cbb93aa521 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -366,16 +366,7 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event 
*evt, int idx)
/* Make sure interrupt enabled. */
MIPS_PERFCTRL_IE;
 
-#ifdef CONFIG_CPU_BMIPS5000
-   {
-   /* enable the counter for the calling thread */
-   unsigned int vpe_id;
-
-   vpe_id = smp_processor_id() & MIPS_CPUID_TO_COUNTER_MASK;
-   cpuc->saved_ctrl[idx] |= BIT(12 + vpe_id) | BRCM_PERFCTRL_TC;
-   }
-#else
-#ifdef CONFIG_MIPS_MT_SMP
+#if defined(CONFIG_MIPS_MT_SMP) && !defined(CONFIG_CPU_BMIPS5000)
if (range > V) {
/* The counter is processor wide. Set it up to count all TCs. */
pr_debug("Enabling perf counter for all TCs\n");
@@ -392,12 +383,16 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event 
*evt, int idx)
 */
cpu = (event->cpu >= 0) ? event->cpu : smp_processor_id();
 
+#if defined(CONFIG_CPU_BMIPS5000)
+   ctrl = BRCM_PERFCTRL_VPEID(cpu & MIPS_CPUID_TO_COUNTER_MASK);
+   ctrl |= BRCM_PERFCTRL_TC;
+#else
	ctrl = M_PERFCTL_VPEID(cpu_vpe_id(&cpu_data[cpu]));
ctrl |= M_TC_EN_VPE;
+#endif
cpuc->saved_ctrl[idx] |= ctrl;
pr_debug("Enabling perf counter for CPU%d\n", cpu);
}
-#endif /* CONFIG_CPU_BMIPS5000 */
/*
 * We do not actually let the counter run. Leave it until start().
 */
-- 
2.7.4



[PATCH v2 5/6] MIPS: perf: Fold vpe_id() macro into it's one last usage

2018-04-12 Thread Matt Redfearn
The vpe_id() macro is now used only in mipsxx_pmu_enable_event when
CONFIG_CPU_BMIPS5000 is defined. Fold the one remaining use of the
macro into its call site and remove the now unused definitions.

Since we know that cpu_has_mipsmt_pertccounters == 0 on BMIPS5000,
remove the test on it and just set the counter to count the relevant
VPE.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>

---

Changes in v2:
Since BMIPS5000 does not implement per TC counters, we can remove the
check on cpu_has_mipsmt_pertccounters.

 arch/mips/kernel/perf_event_mipsxx.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index 6c9b5d64a9ef..389e346e9cf3 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -134,18 +134,6 @@ static int cpu_has_mipsmt_pertccounters;
 static DEFINE_SPINLOCK(core_counters_lock);
 
 static DEFINE_RWLOCK(pmuint_rwlock);
-
-#if defined(CONFIG_CPU_BMIPS5000)
-#define vpe_id()   (cpu_has_mipsmt_pertccounters ? \
-0 : (smp_processor_id() & MIPS_CPUID_TO_COUNTER_MASK))
-#else
-#define vpe_id()   (cpu_has_mipsmt_pertccounters ? \
-     0 : cpu_vpe_id(&current_cpu_data))
-#endif
-
-#else /* !CONFIG_MIPS_PERF_SHARED_TC_COUNTERS */
-#define vpe_id()   0
-
 #endif /* CONFIG_MIPS_PERF_SHARED_TC_COUNTERS */
 
 static void resume_local_counters(void);
@@ -381,8 +369,10 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event 
*evt, int idx)
 #ifdef CONFIG_CPU_BMIPS5000
{
/* enable the counter for the calling thread */
-   cpuc->saved_ctrl[idx] |=
-   (1 << (12 + vpe_id())) | BRCM_PERFCTRL_TC;
+   unsigned int vpe_id;
+
+   vpe_id = smp_processor_id() & MIPS_CPUID_TO_COUNTER_MASK;
+   cpuc->saved_ctrl[idx] |= BIT(12 + vpe_id) | BRCM_PERFCTRL_TC;
}
 #else
 #ifdef CONFIG_MIPS_MT_SMP
-- 
2.7.4




[PATCH v2 4/6] MIPS: perf: Allocate per-core counters on demand

2018-04-12 Thread Matt Redfearn
Previously when performance counters are per-core, rather than
per-thread, the number available were divided by 2 on detection, and the
counters used by each thread in a core were "swizzled" to ensure
separation. However, this solution is suboptimal since it relies on a
couple of assumptions:
a) Always having 2 VPEs / core (number of counters was divided by 2)
b) Always having a number of counters implemented in the core that is
   divisible by 2. For instance if an SoC implementation had a single
   counter and 2 VPEs per core, then this logic would fail and no
   performance counters would be available.
The mechanism also does not allow for one VPE in a core using more than
its allocation of the per-core counters to count multiple events even
though other VPEs may not be using them.

Fix this situation by instead allocating (and releasing) per-core
performance counters when they are requested. This approach removes the
above assumptions and fixes the shortcomings.

In order to do this:
Add additional logic to mipsxx_pmu_alloc_counter() to detect if a
sibling is using a per-core counter, and to allocate a per-core counter
in all sibling CPUs.
Similarly, add a mipsxx_pmu_free_counter() function to release a
per-core counter in all sibling CPUs when it is finished with.
A new spinlock, core_counters_lock, is introduced to ensure exclusivity
when allocating and releasing per-core counters.
Since counters are now allocated per-core on demand, rather than being
reserved per-thread at boot, all of the "swizzling" of counters is
removed.

The upshot is that in an SoC with 2 counters / thread, counters are
reported as:
Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
available to each CPU, irq 18

Running an instance of a test program on each of 2 threads in a
core, both threads can use their 2 counters to count 2 events:

taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
perf stat -e instructions:u,branches:u ./test_prog

 Performance counter stats for './test_prog':

 30002  instructions:u
 1  branches:u

   0.005164264 seconds time elapsed
 Performance counter stats for './test_prog':

 30002  instructions:u
 1  branches:u

   0.006139975 seconds time elapsed

In an SoC with 2 counters / core (which can be forced by setting
cpu_has_mipsmt_pertccounters = 0), counters are reported as:
Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
available to each core, irq 18

Running an instance of a test program on each of 2 threads in a
core, now only one thread manages to secure the performance counters to
count 2 events. The other thread does not get any counters.

taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
perf stat -e instructions:u,branches:u ./test_prog

 Performance counter stats for './test_prog':

instructions:u
branches:u

   0.005179533 seconds time elapsed

 Performance counter stats for './test_prog':

 30002  instructions:u
 1  branches:u

   0.005179467 seconds time elapsed

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v2:
- Fix !#ifdef CONFIG_MIPS_PERF_SHARED_TC_COUNTERS build
- re-use cpuc variable in mipsxx_pmu_alloc_counter,
  mipsxx_pmu_free_counter rather than having sibling_ version.

 arch/mips/kernel/perf_event_mipsxx.c | 127 +++
 1 file changed, 85 insertions(+), 42 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index 0373087abee8..6c9b5d64a9ef 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -131,6 +131,8 @@ static struct mips_pmu mipspmu;
 #ifdef CONFIG_MIPS_PERF_SHARED_TC_COUNTERS
 static int cpu_has_mipsmt_pertccounters;
 
+static DEFINE_SPINLOCK(core_counters_lock);
+
 static DEFINE_RWLOCK(pmuint_rwlock);
 
 #if defined(CONFIG_CPU_BMIPS5000)
@@ -141,20 +143,6 @@ static DEFINE_RWLOCK(pmuint_rwlock);
 0 : cpu_vpe_id(&current_cpu_data))
 #endif
 
-/* Copied from op_model_mipsxx.c */
-static unsigned int vpe_shift(void)
-{
-   if (num_possible_cpus() > 1)
-   return 1;
-
-   return 0;
-}
-
-static unsigned int counters_total_to_per_cpu(unsigned int counters)
-{
-   return counters >> vpe_shift();
-}
-
 #else /* !CONFIG_MIPS_PERF_SHARED_TC_COUNTERS */
 #define vpe_id()   0
 
@@ -165,17 +153,8 @@ static void pause_local_counters(void);
 static irqreturn_t mipsxx_pmu_handle_irq(int, void *);
 static int mipsxx_pmu_handle_shared_irq(void);
 
-static unsigned int mipsxx_pmu_swizzle_perf_idx(unsigned int idx)
-{
-   if (vpe_id() == 1)
-   idx = (idx + 2) & 3;
-   return idx;
-}
-
 static u64 mipsxx_pmu_read_counter(unsigned int idx)
 {
-	idx = mipsxx_pmu_swizzle_perf_idx(idx);

[PATCH v2 3/6] MIPS: perf: Fix perf with MT counting other threads

2018-04-12 Thread Matt Redfearn
When perf is used in non-system mode, i.e. without specifying CPUs to
count on, check_and_calc_range falls into the case when it sets
M_TC_EN_ALL in the counter config_base. This has the impact of always
counting for all of the threads in a core, even when the user has not
requested it. For example this can be seen with a test program which
executes 30002 instructions and 1 branches running on one VPE and a
busy load on the other VPE in the core. Without this commit, the
expected count is not returned:

taskset 4 dd if=/dev/zero of=/dev/null count=10 & taskset 8 perf
stat -e instructions:u,branches:u ./test_prog

 Performance counter stats for './test_prog':

103235  instructions:u
 17015  branches:u

In order to fix this, remove check_and_calc_range entirely and perform
all of the logic in mipsxx_pmu_enable_event. Since
mipsxx_pmu_enable_event now requires the range of the event, ensure that
it is set by mipspmu_perf_event_encode in the same circumstances as
before (i.e. #ifdef CONFIG_MIPS_MT_SMP && num_possible_cpus() > 1).

The logic of mipsxx_pmu_enable_event now becomes:
If the CPU is a BMIPS5000, then use the special vpe_id() implementation
to select which VPE to count.
If the counter has a range greater than a single VPE, i.e. it is a
core-wide counter, then ensure that the counter is set up to count
events from all TCs (though, since this is true by definition, is this
necessary? Just enabling a core-wide counter in the per-VPE case appears
experimentally to return the same counts. This is left in for now as the
logic was present before).
If the event is set up to count a particular CPU (i.e. system mode),
then the VPE ID of that CPU is used for the counter.
Otherwise, the event should be counted on the CPU scheduling this thread
(this was the critical bit missing from the previous implementation) so
the VPE ID of this CPU is used for the counter.

With this commit, the same test as before returns the counts expected:

taskset 4 dd if=/dev/zero of=/dev/null count=10 & taskset 8 perf
stat -e instructions:u,branches:u ./test_prog

 Performance counter stats for './test_prog':

 30002  instructions:u
 1  branches:u

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>

---

Changes in v2:
Fix mipsxx_pmu_enable_event for !#ifdef CONFIG_MIPS_MT_SMP

 arch/mips/kernel/perf_event_mipsxx.c | 78 ++--
 1 file changed, 39 insertions(+), 39 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index 239c4ca89fb0..0373087abee8 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -325,7 +325,11 @@ static int mipsxx_pmu_alloc_counter(struct cpu_hw_events 
*cpuc,
 
 static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
 {
+   struct perf_event *event = container_of(evt, struct perf_event, hw);
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+#ifdef CONFIG_MIPS_MT_SMP
+   unsigned int range = evt->event_base >> 24;
+#endif /* CONFIG_MIPS_MT_SMP */
 
WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
 
@@ -333,11 +337,37 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event 
*evt, int idx)
(evt->config_base & M_PERFCTL_CONFIG_MASK) |
/* Make sure interrupt enabled. */
MIPS_PERFCTRL_IE;
-   if (IS_ENABLED(CONFIG_CPU_BMIPS5000))
+
+#ifdef CONFIG_CPU_BMIPS5000
+   {
/* enable the counter for the calling thread */
cpuc->saved_ctrl[idx] |=
(1 << (12 + vpe_id())) | BRCM_PERFCTRL_TC;
+   }
+#else
+#ifdef CONFIG_MIPS_MT_SMP
+   if (range > V) {
+   /* The counter is processor wide. Set it up to count all TCs. */
+   pr_debug("Enabling perf counter for all TCs\n");
+   cpuc->saved_ctrl[idx] |= M_TC_EN_ALL;
+   } else
+#endif /* CONFIG_MIPS_MT_SMP */
+   {
+   unsigned int cpu, ctrl;
 
+   /*
+* Set up the counter for a particular CPU when event->cpu is
+* a valid CPU number. Otherwise set up the counter for the CPU
+* scheduling this thread.
+*/
+   cpu = (event->cpu >= 0) ? event->cpu : smp_processor_id();
+
+   ctrl = M_PERFCTL_VPEID(cpu_vpe_id(_data[cpu]));
+   ctrl |= M_TC_EN_VPE;
+   cpuc->saved_ctrl[idx] |= ctrl;
+   pr_debug("Enabling perf counter for CPU%d\n", cpu);
+   }
+#endif /* CONFIG_CPU_BMIPS5000 */
/*
 * We do not actually let the counter run. Leave it until start().
 */
@@ -651,13 +681,14 @@ static unsigned int mipspmu_perf_event_encode(const 
struct mips_perf_event *pev)
  * event_id.
  */
 #ifdef CONFIG_MIPS_MT

[PATCH v2 3/6] MIPS: perf: Fix perf with MT counting other threads

2018-04-12 Thread Matt Redfearn
When perf is used in non-system mode, i.e. without specifying CPUs to
count on, check_and_calc_range falls into the case when it sets
M_TC_EN_ALL in the counter config_base. This has the impact of always
counting for all of the threads in a core, even when the user has not
requested it. For example this can be seen with a test program which
executes 30002 instructions and 1 branches running on one VPE and a
busy load on the other VPE in the core. Without this commit, the
expected count is not returned:

taskset 4 dd if=/dev/zero of=/dev/null count=10 & taskset 8 perf
stat -e instructions:u,branches:u ./test_prog

 Performance counter stats for './test_prog':

103235  instructions:u
 17015  branches:u

In order to fix this, remove check_and_calc_range entirely and perform
all of the logic in mipsxx_pmu_enable_event. Since
mipsxx_pmu_enable_event now requires the range of the event, ensure that
it is set by mipspmu_perf_event_encode in the same circumstances as
before (i.e. when CONFIG_MIPS_MT_SMP is defined and num_possible_cpus() > 1).

The logic of mipsxx_pmu_enable_event now becomes:
If the CPU is a BMIPS5000, use its special vpe_id() implementation to
select which VPE to count.
If the counter has a range wider than a single VPE, i.e. it is a
core-wide counter, set the counter up to count events from all TCs
(though, since a core-wide counter counts all TCs by definition, this
may be redundant; experimentally, enabling it as a per-VPE counter
returns the same counts. It is kept for now since the logic was present
before).
If the event is set up to count a particular CPU (i.e. system mode),
use that CPU's VPE ID for the counter.
Otherwise, the event should be counted on the CPU scheduling this thread
(the critical case missing from the previous implementation), so use
this CPU's VPE ID for the counter.

With this commit, the same test as before returns the counts expected:

taskset 4 dd if=/dev/zero of=/dev/null count=10 & taskset 8 perf
stat -e instructions:u,branches:u ./test_prog

 Performance counter stats for './test_prog':

 30002  instructions:u
 1  branches:u

Signed-off-by: Matt Redfearn 

---

Changes in v2:
Fix mipsxx_pmu_enable_event for !#ifdef CONFIG_MIPS_MT_SMP

 arch/mips/kernel/perf_event_mipsxx.c | 78 ++--
 1 file changed, 39 insertions(+), 39 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 239c4ca89fb0..0373087abee8 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -325,7 +325,11 @@ static int mipsxx_pmu_alloc_counter(struct cpu_hw_events *cpuc,
 
 static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
 {
+   struct perf_event *event = container_of(evt, struct perf_event, hw);
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+#ifdef CONFIG_MIPS_MT_SMP
+   unsigned int range = evt->event_base >> 24;
+#endif /* CONFIG_MIPS_MT_SMP */
 
WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
 
@@ -333,11 +337,37 @@ static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
(evt->config_base & M_PERFCTL_CONFIG_MASK) |
/* Make sure interrupt enabled. */
MIPS_PERFCTRL_IE;
-   if (IS_ENABLED(CONFIG_CPU_BMIPS5000))
+
+#ifdef CONFIG_CPU_BMIPS5000
+   {
/* enable the counter for the calling thread */
cpuc->saved_ctrl[idx] |=
(1 << (12 + vpe_id())) | BRCM_PERFCTRL_TC;
+   }
+#else
+#ifdef CONFIG_MIPS_MT_SMP
+   if (range > V) {
+   /* The counter is processor wide. Set it up to count all TCs. */
+   pr_debug("Enabling perf counter for all TCs\n");
+   cpuc->saved_ctrl[idx] |= M_TC_EN_ALL;
+   } else
+#endif /* CONFIG_MIPS_MT_SMP */
+   {
+   unsigned int cpu, ctrl;
 
+   /*
+* Set up the counter for a particular CPU when event->cpu is
+* a valid CPU number. Otherwise set up the counter for the CPU
+* scheduling this thread.
+*/
+   cpu = (event->cpu >= 0) ? event->cpu : smp_processor_id();
+
+   ctrl = M_PERFCTL_VPEID(cpu_vpe_id(&cpu_data[cpu]));
+   ctrl |= M_TC_EN_VPE;
+   cpuc->saved_ctrl[idx] |= ctrl;
+   pr_debug("Enabling perf counter for CPU%d\n", cpu);
+   }
+#endif /* CONFIG_CPU_BMIPS5000 */
/*
 * We do not actually let the counter run. Leave it until start().
 */
@@ -651,13 +681,14 @@ static unsigned int mipspmu_perf_event_encode(const struct mips_perf_event *pev)
  * event_id.
  */
 #ifdef CONFIG_MIPS_MT_SMP
-   return ((unsigned int)pev->

[PATCH v2 2/6] MIPS: perf: Use correct VPE ID when setting up VPE tracing

2018-04-12 Thread Matt Redfearn
There are a couple of FIXMEs in the perf code which state that
cpu_data[event->cpu].vpe_id reports 0 for both CPUs. This is no longer
the case, since the vpe_id is used extensively by SMP CPS.

VPE local counting gets around this by using smp_processor_id() instead.
As it happens this does work correctly to count events on the right VPE,
but relies on 2 assumptions:
a) Always having 2 VPEs / core.
b) The hardware only paying attention to the least significant bit of
the PERFCTL.VPEID field.
If either of these assumptions changes, then the wrong VPE's events
will be counted.

Fix this by replacing smp_processor_id() with
cpu_vpe_id(&current_cpu_data) in the vpe_id() macro, and pass vpe_id()
to M_PERFCTL_VPEID() when setting up PERFCTL.VPEID. The FIXMEs can also
be removed since they no longer apply.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v2: None

 arch/mips/kernel/perf_event_mipsxx.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index f3ec4a36921d..239c4ca89fb0 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -137,12 +137,8 @@ static DEFINE_RWLOCK(pmuint_rwlock);
 #define vpe_id()   (cpu_has_mipsmt_pertccounters ? \
 0 : (smp_processor_id() & MIPS_CPUID_TO_COUNTER_MASK))
 #else
-/*
- * FIXME: For VSMP, vpe_id() is redefined for Perf-events, because
- * cpu_data[cpuid].vpe_id reports 0 for _both_ CPUs.
- */
 #define vpe_id()   (cpu_has_mipsmt_pertccounters ? \
-0 : smp_processor_id())
+0 : cpu_vpe_id(&current_cpu_data))
 #endif
 
 /* Copied from op_model_mipsxx.c */
@@ -1279,11 +1275,7 @@ static void check_and_calc_range(struct perf_event *event,
 */
hwc->config_base |= M_TC_EN_ALL;
} else {
-   /*
-* FIXME: cpu_data[event->cpu].vpe_id reports 0
-* for both CPUs.
-*/
-   hwc->config_base |= M_PERFCTL_VPEID(event->cpu);
+   hwc->config_base |= M_PERFCTL_VPEID(vpe_id());
hwc->config_base |= M_TC_EN_VPE;
}
} else
-- 
2.7.4




[PATCH v2 1/6] MIPS: perf: More robustly probe for the presence of per-tc counters

2018-04-12 Thread Matt Redfearn
Processors implementing the MIPS MT ASE may have performance counters
implemented per core or per TC. Processors implemented by MIPS
Technologies signify presence per TC through a bit in the implementation
specific Config7 register. Currently the code which probes for their
presence blindly reads a magic number corresponding to this bit, despite
it potentially having a different meaning in the CPU implementation.

The test of Config7.PTC was previously enabled when CONFIG_BMIPS5000 was
enabled. However, according to [florian], the BMIPS5000 manual does not
define this bit, so we can assume it is 0 and the feature is not
supported.

Introduce probe_mipsmt_pertccounters() to probe for the presence of per-TC
counters. This detects the ASEs implemented in the CPU, and reads any
implementation specific bit flagging their presence. In the case of MIPS
implementations, this bit is Config7.PTC. A definition of this bit is
added in mipsregs.h for MIPS Technologies. No other implementations
support this feature.

Signed-off-by: Matt Redfearn <matt.redfe...@mips.com>
---

Changes in v2: None

 arch/mips/include/asm/mipsregs.h |  5 +
 arch/mips/kernel/perf_event_mipsxx.c | 29 -
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index 858752dac337..a4b02bc8 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -684,6 +684,11 @@
 #define MIPS_CONF7_IAR (_ULCAST_(1) << 10)
 #define MIPS_CONF7_AR  (_ULCAST_(1) << 16)
 
+/* Config7 Bits specific to MIPS Technologies. */
+
+/* Performance counters implemented Per TC */
+#define MTI_CONF7_PTC  (_ULCAST_(1) << 19)
+
 /* WatchLo* register definitions */
 #define MIPS_WATCHLO_IRW   (_ULCAST_(0x7) << 0)
 
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 6668f67a61c3..f3ec4a36921d 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -1708,6 +1708,33 @@ static const struct mips_perf_event *xlp_pmu_map_raw_event(u64 config)
	return &raw_event;
 }
 
+#ifdef CONFIG_MIPS_PERF_SHARED_TC_COUNTERS
+/*
+ * The MIPS MT ASE specifies that performance counters may be implemented
+ * per core or per TC. If implemented per TC then all Linux CPUs have their
+ * own unique counters. If implemented per core, then VPEs in the core must
+ * treat the counters as a shared resource.
+ * Probe for the presence of per-TC counters
+ */
+static int probe_mipsmt_pertccounters(void)
+{
+   struct cpuinfo_mips *c = &current_cpu_data;
+
+   /* Non-MT cores by definition cannot implement per-TC counters */
+   if (!cpu_has_mipsmt)
+   return 0;
+
+   switch (c->processor_id & PRID_COMP_MASK) {
+   case PRID_COMP_MIPS:
+   /* MTI implementations use CONFIG7.PTC to signify presence */
+   return read_c0_config7() & MTI_CONF7_PTC;
+   default:
+   break;
+   }
+   return 0;
+}
+#endif /* CONFIG_MIPS_PERF_SHARED_TC_COUNTERS */
+
 static int __init
 init_hw_perf_events(void)
 {
@@ -1723,7 +1750,7 @@ init_hw_perf_events(void)
}
 
 #ifdef CONFIG_MIPS_PERF_SHARED_TC_COUNTERS
-   cpu_has_mipsmt_pertccounters = read_c0_config7() & (1<<19);
+   cpu_has_mipsmt_pertccounters = probe_mipsmt_pertccounters();
if (!cpu_has_mipsmt_pertccounters)
counters = counters_total_to_per_cpu(counters);
 #endif
-- 
2.7.4


